galaaz 0.4.0 → 0.4.1

Files changed (53)
  1. checksums.yaml +4 -4
  2. data/README.md +325 -32
  3. data/Rakefile +14 -0
  4. data/bin/galaaz +0 -3
  5. data/bin/gknit +28 -0
  6. data/bin/gstudio +6 -0
  7. data/bin/gstudio.rb +6 -0
  8. data/bin/ogk~ +4 -0
  9. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +335 -0
  10. data/blogs/galaaz_ggplot/galaaz_ggplot.html +460 -0
  11. data/blogs/galaaz_ggplot/galaaz_ggplot.md +327 -0
  12. data/blogs/galaaz_ggplot/midwest.Rmd +39 -0
  13. data/blogs/galaaz_ggplot/midwest.html +188 -0
  14. data/blogs/galaaz_ggplot/midwest.png +0 -0
  15. data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
  16. data/examples/50Plots_MasterList/Images/midwest-scatterplot.PNG +0 -0
  17. data/examples/50Plots_MasterList/ScatterPlot.rb +159 -0
  18. data/examples/R/calc.R +21 -0
  19. data/examples/R/java_interop.R +29 -0
  20. data/examples/{baseball.csv → misc/baseball.csv} +0 -0
  21. data/examples/{ggplot.rb → misc/ggplot.rb} +0 -0
  22. data/examples/misc/moneyball.rb +33 -0
  23. data/examples/{baseball.rb → misc/moneyball.rb~} +0 -0
  24. data/examples/misc/subsetting.rb +374 -0
  25. data/examples/{subsetting.rb → misc/subsetting.rb~} +0 -0
  26. data/lib/{expression.rb → R/expression.rb} +0 -0
  27. data/lib/{r.rb → R/r.rb} +1 -0
  28. data/lib/R/r.rb~ +121 -0
  29. data/lib/{r_methods.rb → R/r_methods.rb} +0 -0
  30. data/lib/{rbinary_operators.rb → R/rbinary_operators.rb} +0 -0
  31. data/lib/{rclosure.rb → R/rclosure.rb} +0 -0
  32. data/lib/{rdata_frame.rb → R/rdata_frame.rb} +0 -0
  33. data/lib/{renvironment.rb → R/renvironment.rb} +0 -0
  34. data/lib/{rexpression.rb → R/rexpression.rb} +0 -0
  35. data/lib/{rindexed_object.rb → R/rindexed_object.rb} +0 -0
  36. data/lib/{rlanguage.rb → R/rlanguage.rb} +0 -0
  37. data/lib/{rlist.rb → R/rlist.rb} +0 -0
  38. data/lib/{rmatrix.rb → R/rmatrix.rb} +0 -0
  39. data/lib/{rmd_indexed_object.rb → R/rmd_indexed_object.rb} +0 -0
  40. data/lib/{robject.rb → R/robject.rb} +0 -0
  41. data/lib/{rpkg.rb → R/rpkg.rb} +0 -0
  42. data/lib/{rsupport.rb → R/rsupport.rb} +0 -0
  43. data/lib/{rsupport_scope.rb → R/rsupport_scope.rb} +0 -0
  44. data/lib/{rsymbol.rb → R/rsymbol.rb} +0 -0
  45. data/lib/{ruby_callback.rb → R/ruby_callback.rb} +0 -0
  46. data/lib/{ruby_extensions.rb → R/ruby_extensions.rb} +0 -0
  47. data/lib/{runary_operators.rb → R/runary_operators.rb} +0 -0
  48. data/lib/{rvector.rb → R/rvector.rb} +0 -0
  49. data/lib/galaaz.rb +2 -1
  50. data/lib/util/exec_ruby.rb +44 -0
  51. data/specs/tmp.rb +167 -1
  52. data/version.rb +1 -1
  53. metadata +63 -28
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 472e7863d96eb66bf0aa0e78194c021c5a128049
- data.tar.gz: b5742fe889c830c83f1c53bd091ffa43d3ec44fb
+ metadata.gz: 64681d29dd707019a0461ac473194ade6accfb56
+ data.tar.gz: 34f0d93c32e4c3f2deb6a15ff8d02ba684a28396
  SHA512:
- metadata.gz: 0f39228250f568f09d322dc1b70647982cdfa785e895695e30dab20ba595f5a5fe42ec5f56076c3deddbde8a7ff3172a7addc35c913315bb3f2c2d151d998107
- data.tar.gz: 8e407929466a0d8152c0eb8c7ea5b28bea0ed2818c3abcc732659845448c01898dd6af7cf0332233bcc5ce29d9718285a1a747c2f1fc72ea08ad401aa712d1ad
+ metadata.gz: 1651b248d716539fcb2892b0398de6190db001fc02656cca187422b9f9bf10b6ac93486033058874cd87e910f71abe9350816563a4e762d0ee4f5a8887eccc68
+ data.tar.gz: d0599bd0afb5602fd3be61b97ed1a46a0b72c87179a1a8a9b081ce135bc60896387e9ad9287bfbf133d9a14a6e4fdb3478aaa2f583c94ee4f478882cfea29404
data/README.md CHANGED
@@ -1,32 +1,325 @@
- # Running Ruby and R - The Polyglot Environment
-
- TruffleRuby (the Polyglot implementation of Ruby) can access, through the Polyglot interface, any other
- language available in the environment. For instance, in the code below, TruffleRuby makes a call to
- JavaScript:
-
- require 'json'
-
- obj = {time: Time.now,
- msg: 'Hello World',
- payload: (1..10).to_a }
-
- encoded = JSON.dump(obj)
-
- js_obj = Polyglot.eval('js', 'JSON.parse').call(encoded)
- puts js_obj[:time]
- puts js_obj[:msg]
- puts js_obj[:payload].join(' ')
-
- Calling R is similar to the above. For example, in R, method 'c' concatenates its arguments making a vector:
-
- vec = Polyglot.eval('R', 'c').call(1, 2, 3)
- puts vec[0]
- p vec
- > 1
- > #<Truffle::Interop::Foreign@5d03df76>
-
- As can be seen, vec is a vector whose first element (indexed at 0 - Ruby indexing) is 1.
- Inspecting vec shows that it is a Truffle::Interop object. Although it is possible to work with
- Interop objects in a program, doing so is hard and error prone. Below, we show how integration of
- Ruby and R can greatly simplify the development of Polyglot applications.
-
+ Introduction
+ ============
+
+ Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language with a large community and a very large set of libraries, and it is well suited to web development. However, it lacks libraries for data science, statistics, scientific plotting and machine learning. On the other hand, R is considered one of the most powerful languages for solving all of the above problems. Maybe the strongest competitor to R is Python, with libraries such as NumPy, pandas, SciPy, scikit-learn and a few more.
+
+ With Galaaz we do not intend to re-implement any of the scientific libraries in R. Instead, we allow for very tight coupling between the two languages, to the point that the Ruby developer does not need to know that there is an R engine running. For this to happen we use new technologies provided by Oracle, with GraalVM, TruffleRuby and FastR:
+
+ GraalVM is a universal virtual machine for running applications written in JavaScript,
+ Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin, and LLVM-based languages
+ such as C and C++.
+
+ GraalVM removes the isolation between programming languages and enables interoperability in a
+ shared runtime. It can run either standalone or in the context of OpenJDK, Node.js,
+ Oracle Database, or MySQL.
+
+ GraalVM allows you to write polyglot applications with a seamless way to pass values from one
+ language to another. With GraalVM there is no copying or marshalling necessary as it is with
+ other polyglot systems. This lets you achieve high performance when language boundaries are
+ crossed. Most of the time there is no additional cost for crossing a language boundary at all.
+
+ Often developers have to make uncomfortable compromises that require them to rewrite
+ their software in other languages. For example:
+
+ * “That library is not available in my language. I need to rewrite it.”
+ * “That language would be the perfect fit for my problem, but we cannot run it in our environment.”
+ * “That problem is already solved in my language, but the language is too slow.”
+
+ With GraalVM we aim to allow developers to freely choose the right language for the task at
+ hand without making compromises.
+
+ What does Galaaz mean
+ ---------------------
+
+ Galaaz is the Portuguese name for "Galahad". From Wikipedia:
+
+ Sir Galahad (/ˈɡæləhæd/; sometimes referred to as Galeas /ɡəˈliːəs/ or Galath /ˈɡæləθ/),
+ in Arthurian legend, is a knight of King Arthur's Round Table and one of the three
+ achievers of the Holy Grail. He is the illegitimate son of Sir Lancelot and Elaine of
+ Corbenic, and is renowned for his gallantry and purity as the most perfect of all knights.
+ Emerging quite late in the medieval Arthurian tradition, Sir Galahad first appears in the
+ Lancelot–Grail cycle, and his story is taken up in later works such as the Post-Vulgate
+ Cycle and Sir Thomas Malory's Le Morte d'Arthur. His name should not be mistaken with
+ Galehaut, a different knight from Arthurian legend.
+
+ Galaaz Demo
+ ===========
+
+ Prerequisites
+ -------------
+
+ - GraalVM (>= rc7)
+ - TruffleRuby
+ - FastR
+
+ The following R packages will be automatically installed when necessary, but could be installed prior to the demo if desired:
+
+ - ggplot2
+ - gridExtra
+
+ Installation of R packages requires a development environment. On Linux, the GNU compiler and tools should be enough. I am not sure what is needed on the Mac.
+
+ In order to run the 'specs' the following Ruby package is necessary:
+
+ - gem install rspec
+
+ Preparation
+ -----------
+
+ - gem install galaaz
+
+ Running the demo
+ ----------------
+
+ The ggplot for this demo was extracted from: <http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html>.
+
+ On the console do
+
+ > galaaz master_list:scatter_plot
+
+ Running other demos
+ -------------------
+
+ Running on the console
+
+ > galaaz -T
+
+ will show a list of all available demos. To run any of the demos in the list, replace 'rake' with 'galaaz' in the call. For instance, one of the examples in the list is 'rake sthda:bar'. In order to run this example just do 'galaaz sthda:bar'. Doing 'galaaz sthda:all' will run all demos in the sthda category. Some of the examples require 'rspec' to be available. To install 'rspec' just do 'gem install rspec'.
+
+ The demo code
+ =============
+
+ The following is the Ruby code and plot for the above example. There is a small difference between the code in the example and the code below. If the example is run, the plot will appear on the screen; below, we generate an 'svg' image and then include it in this document. In order to generate an image, the R.svg device is used. To generate the plot on the screen, use the R.awt device, as noted in the code comments.
+
+ ``` truby
+ require 'galaaz'
+ require 'ggplot'
+
+ # load package and data
+ R.options(scipen: 999) # turn-off scientific notation like 1e+48
+ R.theme_set(R.theme_bw) # pre-set the bw theme.
+
+ midwest = ~:midwest
+ # midwest <- read.csv("http://goo.gl/G1K41K") # bkup data source
+
+ # R.awt # run the awt device if the plot should show on the screen
+ R.svg # run the svg device if an image should be generated
+
+ # Scatterplot
+ gg = midwest.ggplot(E.aes(x: :area, y: :poptotal)) +
+   R.geom_point(E.aes(col: :state, size: :popdensity)) +
+   R.geom_smooth(method: "loess", se: false) +
+   R.xlim(R.c(0, 0.1)) +
+   R.ylim(R.c(0, 500000)) +
+   R.labs(subtitle: "Area Vs Population",
+          y: "Population",
+          x: "Area",
+          title: "Scatterplot",
+          caption: "Source: midwest")
+
+ R.png('midwest.png') # this line is not necessary with the awt device
+ puts gg
+
+ R.dev__off # R.dev__off turns off the device. If using awt, the plot
+            # window will be closed
+ ```
+
+ <img src="https://user-images.githubusercontent.com/3999729/46742999-87bc2480-cc7e-11e8-9f16-31c3437e4a58.PNG" alt="Midwest Plot" style="width:70.0%" />
+
+ In R, the code to generate this plot is the following
+
+ ``` r
+ # install.packages("ggplot2")
+ # load package and data
+ options(scipen=999) # turn-off scientific notation like 1e+48
+ library(ggplot2)
+ theme_set(theme_bw()) # pre-set the bw theme.
+ data("midwest", package = "ggplot2")
+ # midwest <- read.csv("http://goo.gl/G1K41K") # bkup data source
+
+ # Scatterplot
+ gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
+   geom_point(aes(col=state, size=popdensity)) +
+   geom_smooth(method="loess", se=F) +
+   xlim(c(0, 0.1)) +
+   ylim(c(0, 500000)) +
+   labs(subtitle="Area Vs Population",
+        y="Population",
+        x="Area",
+        title="Scatterplot",
+        caption = "Source: midwest")
+
+ plot(gg)
+ ```
+
+ Note that the two versions are very similar. The Ruby code requires the use of "R." before calling any function; for instance, the R function 'geom\_point' becomes 'R.geom\_point' in Ruby. R named parameters such as (col = state, size = popdensity) become (col: :state, size: :popdensity) in Ruby.
+
+ One last point that needs to be observed is the call to the 'aes' function. In Ruby, instead of doing 'R.aes', we use 'E.aes'. The explanation of why E.aes is needed is an advanced topic in R and depends on what is known as Non-standard Evaluation (NSE) in R. In short, function 'aes' is lazily evaluated in R, i.e., in R when calling geom\_point(aes(col=state, size=popdensity)), function geom\_point receives as argument something similar to a string containing 'aes(col=state, size=popdensity)', and the aes function will be evaluated inside the geom\_point function. In Ruby, there is no lazy evaluation and doing R.aes would try to evaluate aes immediately. In order to delay the evaluation of function aes we need to use E.aes. Readers interested in NSE in R are directed to <http://adv-r.had.co.nz/Computing-on-the-language.html>.
+
+ An extension to the example
+ ===========================
+
+ If the two versions are so similar, why would one use Ruby instead of R, and what good is Galaaz after all?
+
+ Ruby is a modern OO language with numerous very useful constructs such as classes, modules, blocks, procs, etc. The example above focuses on the coupling of both languages and does not show the use of other Ruby constructs. The following, more complex example uses some of them. This is certainly not very well written or robust Ruby code, but it gives an idea of how strongly Ruby and R are coupled.
+
+ Let's imagine that we work in a corporation that has its own plot themes, and so has defined a 'CorpTheme' module. Plots in this corporation should not have grids, numbers in labels should not use scientific notation and the preferred color is blue.
+
+ ``` truby
+ # corp_theme.rb
+ # defines the corporate theme for all plots
+
+ module CorpTheme
+
+   #--------------------------------------------------------------------------------------
+   # Defines the plot theme (visualization). In this theme we remove major and minor
+   # grids, borders and background. We also turn-off scientific notation.
+   #--------------------------------------------------------------------------------------
+
+   def self.global_theme
+
+     R.options(scipen: 999) # turn-off scientific notation like 1e+48
+
+     # remove major grids
+     global_theme = R.theme(panel__grid__major: E.element_blank())
+     # remove minor grids
+     global_theme = global_theme + R.theme(panel__grid__minor: E.element_blank)
+     # remove border
+     global_theme = global_theme + R.theme(panel__border: E.element_blank)
+     # remove background
+     global_theme = global_theme + R.theme(panel__background: E.element_blank)
+     # Change axis font
+     global_theme = global_theme +
+       R.theme(axis__text: E.element_text(size: 8, color: "#000080"))
+     # change color of axis titles
+     global_theme = global_theme +
+       R.theme(axis__title: E.element_text(
+         color: "#000080",
+         face: "bold",
+         size: 8,
+         hjust: 1))
+   end
+
+ end
+ ```
+
+ We now define a ScatterPlot class:
+
+ ``` truby
+ # ScatterPlot.rb
+ # creates a scatter plot and allows some configuration
+
+ class ScatterPlot
+
+   attr_accessor :title
+   attr_accessor :subtitle
+   attr_accessor :caption
+   attr_accessor :x_label
+   attr_accessor :y_label
+
+   #--------------------------------------------------------------------------------------
+   # Initialize the plot with the data and the x and y variables
+   #--------------------------------------------------------------------------------------
+
+   def initialize(data, x:, y:)
+     @data = data
+     @x = x
+     @y = y
+   end
+
+   #--------------------------------------------------------------------------------------
+   # Define groupings by color and size
+   #--------------------------------------------------------------------------------------
+
+   def group_by(color:, size:)
+     @color_by = color
+     @size_by = size
+   end
+
+   #--------------------------------------------------------------------------------------
+   # Add a smoothing line. If confidence is true, also add a confidence interval; if
+   # false, do not add the confidence interval
+   #--------------------------------------------------------------------------------------
+
+   def add_smoothing_line(method:, confidence: true)
+     @method = method
+     @confidence = confidence
+   end
+
+   #--------------------------------------------------------------------------------------
+   # Builds the plot labels: title, subtitle, caption and axis labels
+   # @param title [String] The title to add to the graph
+   # @return the result of R.labs, which can be added to a graph
+   #--------------------------------------------------------------------------------------
+
+   def graph_params(title: "", subtitle: "", caption: "", x_label: "", y_label: "")
+     R.labs(
+       title: title,
+       subtitle: subtitle,
+       caption: caption,
+       y: y_label, # ggplot2's labs() names the axis labels 'x' and 'y'
+       x: x_label,
+     )
+   end
+
+   #--------------------------------------------------------------------------------------
+   # Prepare the plot's points
+   #--------------------------------------------------------------------------------------
+
+   def points
+     params = {}
+     params[:col] = @color_by if @color_by
+     params[:size] = @size_by if @size_by
+     R.geom_point(E.aes(params))
+   end
+
+   #--------------------------------------------------------------------------------------
+   # Plots the scatterplot
+   #--------------------------------------------------------------------------------------
+
+   def plot(device = 'awt')
+     device == 'awt' ? R.awt : R.svg
+
+     gg = @data.ggplot(E.aes(x: @x, y: @y)) +
+       points +
+       R.geom_smooth(method: @method, se: @confidence) +
+       R.xlim(R.c(0, 0.1)) +
+       R.ylim(R.c(0, 500000)) +
+       graph_params(title: @title,
+                    subtitle: @subtitle,
+                    y_label: @y_label,
+                    x_label: @x_label,
+                    caption: @caption) +
+       CorpTheme.global_theme
+
+     R.png('scatter_plot.png') if !(device == 'awt')
+     puts gg
+     R.dev__off
+
+   end
+
+ end
+ ```
+
+ And this is the final code for making the scatter plot with the midwest data
+
+ ``` truby
+ require 'galaaz'
+ require 'ggplot'
+
+ sp = ScatterPlot.new(~:midwest, x: :area, y: :poptotal)
+ sp.title = "Midwest Dataset - Scatterplot"
+ sp.subtitle = "Area Vs Population"
+ sp.caption = "Source: midwest"
+ sp.x_label = "Area"
+ sp.y_label = "Population"
+ sp.group_by(color: :state, size: :popdensity) # try sp.group_by(color: :state)
+ # available methods: "lm", "glm", "loess", "gam"
+ sp.add_smoothing_line(method: "glm")
+ sp.plot('svg')
+
+ # require input from the user so that the script does not end removing the plot from
+ # the screen
+ ```
+
+ ![Midwest Plot with 'glm' function and modified theme](https://user-images.githubusercontent.com/3999729/47120345-a903ae80-d244-11e8-9be3-a0db13cf51ab.PNG)
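The new README's explanation of `E.aes` rests on the difference between evaluating a call immediately and capturing it for later. The plain-Ruby sketch below illustrates only that distinction. `EagerProxy`, `LazyProxy` and `DeferredCall` are hypothetical names invented for this note; they are not part of Galaaz and say nothing about how `R` and `E` are actually implemented.

```ruby
# Hypothetical stand-ins, NOT Galaaz classes: they only contrast eager
# evaluation with capturing an unevaluated call.

# A captured call: remembers the function name and its named arguments.
DeferredCall = Struct.new(:name, :args) do
  # Render the captured call as R-like source text.
  def to_r
    pairs = args.map { |key, value| "#{key} = #{value}" }
    "#{name}(#{pairs.join(', ')})"
  end
end

# Eager proxy: tries to evaluate the call right away. Symbols stand for
# column names that only make sense once a data frame is in scope, so
# evaluating them immediately fails.
module EagerProxy
  def self.method_missing(name, **args)
    unresolved = args.values.grep(Symbol)
    raise NameError, "object '#{unresolved.first}' not found" unless unresolved.empty?
    "evaluated #{name}"
  end

  def self.respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Lazy proxy: never evaluates, it only records what was asked for.
module LazyProxy
  def self.method_missing(name, **args)
    DeferredCall.new(name, args)
  end

  def self.respond_to_missing?(_name, _include_private = false)
    true
  end
end

eager_error = nil
begin
  EagerProxy.aes(col: :state, size: :popdensity)
rescue NameError => e
  eager_error = e.message
end

deferred = LazyProxy.aes(col: :state, size: :popdensity)
puts eager_error
puts deferred.to_r
```

In this toy model the receiving function would be the one to evaluate the captured call once the data is available, which mirrors the README's description of `aes` being evaluated inside `geom_point`.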
data/Rakefile CHANGED
@@ -78,6 +78,7 @@ geoms = FileList['examples/sthda_ggplot/**/*.rb']
  specs = FileList['specs/**/*.rb']
  master_list = FileList['examples/50Plots_MasterList/**/*.rb']
  islr = FileList['examples/islr/**/*.rb']
+ misc = FileList['examples/misc/**/*.rb']

  #===========================================================================================
  # Creates tasks for all specs.
@@ -131,6 +132,19 @@ islr.each do |f|
  Desc
  end

+ #===========================================================================================
+ # Creates tasks for misc examples
+ # Running 'rake misc:all' will run all misc examples
+ #===========================================================================================
+
+ misc.each do |f|
+   task_name = File.basename(f, ".rb")
+   dir_name = File.dirname(f)
+   MakeTask.new("misc", dir_name, task_name, false, <<-Desc)
+     Executes misc #{task_name}
+   Desc
+ end
+
  task :default => "sthda:all"
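The added `misc` loop derives a task name and a directory from each example file. A minimal sketch of that derivation follows, using a hard-coded list in place of `FileList` so it runs without Rake. The two paths come from the file list of this release; the `misc:<name>` form is an assumption based on the `sthda:bar` naming shown in the README, since `MakeTask` itself is not part of this diff.

```ruby
# Stand-in for FileList['examples/misc/**/*.rb'] (two files from this release).
files = ['examples/misc/moneyball.rb', 'examples/misc/subsetting.rb']

tasks = files.map do |f|
  task_name = File.basename(f, '.rb') # file name without directory or extension
  dir_name  = File.dirname(f)         # directory that holds the example
  # Assumed naming scheme: "<group>:<task_name>", as in 'rake sthda:bar'.
  { name: "misc:#{task_name}", dir: dir_name }
end

tasks.each { |t| puts "#{t[:name]} (#{t[:dir]})" }
```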
data/bin/galaaz CHANGED
@@ -1,8 +1,5 @@
  #!/usr/bin/env ruby

- # Dir.chdir(File.expand_path('..', Dir.pwd))
  Dir.chdir(File.dirname(File.expand_path('..', __FILE__)))

- puts Dir.pwd
-
  system ("rake #{ARGV[0]}")
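The `Dir.chdir` line that survives in `bin/galaaz` resolves the gem's root directory from the script's own location, which is presumably where the Rakefile lives. The sketch below walks through the two path steps on a made-up install path; the path is purely hypothetical and the example assumes a Unix-style filesystem.

```ruby
# Hypothetical location of the installed script (stands in for __FILE__).
script = '/opt/gems/galaaz-0.4.1/bin/galaaz'

# Step 1: '..' relative to the script path yields the directory containing it.
bin_dir = File.expand_path('..', script)

# Step 2: the parent of bin/ is the gem root.
gem_root = File.dirname(bin_dir)

puts bin_dir
puts gem_root
```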
data/bin/gknit ADDED
@@ -0,0 +1,28 @@
+ #!/usr/bin/env ruby
+
+ dir = Dir.pwd
+ puts "Knitting #{dir}/#{ARGV[0]}"
+
+ req_gal = "require '../lib/galaaz'"
+
+ code = <<EOF
+ library('knitr');
+ library('rmarkdown');
+
+ # define the galaaz engine for processing Ruby chunks in
+ # rmarkdown
+ eng_galaaz = function(options) {
+   block_code = paste(options$code, collapse = "\\n");
+   code = paste0("GalaazUtil.exec_ruby(", shQuote(block_code), ")");
+   out = eval.polyglot("ruby", code);
+   engine_output(options, block_code, out)
+ }
+
+ knit_engines$set(galaaz = eng_galaaz)
+
+ eval.polyglot("ruby", "require %q(galaaz)")
+
+ rmarkdown::render("#{dir}/#{ARGV[0]}")
+ EOF
+
+ exec "Rscript --jvm --polyglot -e '#{code}'"
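`gknit` passes the generated R code to `Rscript` by interpolating it between single quotes in one shell command string, although the R code itself contains single quotes. The sketch below uses Ruby's standard `Shellwords` module to approximate how a POSIX shell would tokenize such a line, and shows two ways of keeping the code intact. It is an illustration with a shortened, made-up R snippet, not a tested patch for `gknit`, and nothing here is executed by a shell.

```ruby
require 'shellwords'

# Shortened, made-up stand-in for the R code that gknit generates.
code = %q{library('knitr'); cat("done")}

# Same pattern as gknit: code wrapped in single quotes inside one string.
shell_line = "Rscript -e '#{code}'"

# Shellwords.split approximates POSIX shell tokenization of that line.
seen_by_rscript = Shellwords.split(shell_line).last
puts seen_by_rscript

# Option 1: escape the code before interpolating it into a shell string.
escaped_line = "Rscript -e #{Shellwords.escape(code)}"
round_trip = Shellwords.split(escaped_line).last

# Option 2: build an argument list; Kernel#exec and Kernel#system accept
# separate arguments and then bypass the shell entirely.
argv = ['Rscript', '-e', code]
```

With the single-quote pattern, the inner quotes are consumed by the shell, so R receives `library(knitr)` rather than `library('knitr')`. That still happens to work because `library` accepts an unquoted package name, but code with quoted string literals would be altered.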
data/bin/gstudio ADDED
@@ -0,0 +1,6 @@
+ #!/usr/bin/env ruby
+
+ Dir.chdir(File.dirname(File.expand_path('.', __FILE__)))
+ puts "Loading Pry and Galaaz... Please wait!"
+
+ exec "ruby --polyglot --jvm -Xsingle_threaded -S pry -r ./gstudio.rb"
data/bin/gstudio.rb ADDED
@@ -0,0 +1,6 @@
+ require 'galaaz'
+ require 'ggplot'
+
+ Pry.config.editor = ENV['VISUAL']
+
+ Pry.config.prompt_name = 'galaaz'