galaaz 0.4.8 → 0.4.9

Binary file
File without changes
@@ -0,0 +1,27 @@
1
+ @book{Wilkinson:grammar_of_graphics,
2
+ author = {Wilkinson, Leland},
3
+ title = {The Grammar of Graphics (Statistics and Computing)},
4
+ year = {2005},
5
+ isbn = {0387245448},
6
+ publisher = {Springer-Verlag},
7
+ address = {Berlin, Heidelberg},
8
+ }
9
+
10
+ @article{Knuth:literate_programming,
11
+ author = {Knuth, Donald E.},
12
+ title = {Literate Programming},
13
+ journal = {Comput. J.},
14
+ issue_date = {May 1984},
15
+ volume = {27},
16
+ number = {2},
17
+ month = may,
18
+ year = {1984},
19
+ issn = {0010-4620},
20
+ pages = {97--111},
21
+ numpages = {15},
22
+ url = {http://dx.doi.org/10.1093/comjnl/27.2.97},
23
+ doi = {10.1093/comjnl/27.2.97},
24
+ acmid = {479},
25
+ publisher = {Oxford University Press},
26
+ address = {Oxford, UK},
27
+ }
@@ -693,6 +693,93 @@ puts @flights_sm.head.as__data__frame
693
693
  puts @flights_sm.head.as__data__frame
694
694
  ```
695
695
 
696
+ ## Summarising data
697
+
698
+ Function 'summarise' calculates summaries for the data frame. When no 'group_by' is used
699
+ a single value is obtained from the data frame:
700
+
701
+ ```{ruby summarise}
702
+ puts @flights.summarise(delay: E.mean(:dep_delay, na__rm: true)).as__data__frame
703
+ ```
704
+
705
+ When a data frame is grouped with 'group_by', summaries apply to each group:
706
+
707
+ ```{ruby summarise_group_by}
708
+ by_day = @flights.group_by(:year, :month, :day)
709
+ puts by_day.summarise(delay: :dep_delay.mean(na__rm: true)).head.as__data__frame
710
+ ```
711
+
712
+ Next we put many operations together by piping them one after the other:
713
+
714
+ ```{ruby pipping}
715
+ delays = @flights.
716
+ group_by(:dest).
717
+ summarise(
718
+ count: E.n,
719
+ dist: :distance.mean(na__rm: true),
720
+ delay: :arr_delay.mean(na__rm: true)).
721
+ filter(:count > 20, :dest != "NHL")
722
+
723
+ puts delays.as__data__frame
724
+ ```
725
+
726
+ # Using Data Table
727
+
728
+ ```{ruby fread}
729
+ R.library('data.table')
730
+ R.install_and_loads('curl')
731
+
732
+ input = "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
733
+ @flights = R.fread(input)
734
+ puts @flights
735
+ puts @flights.dim
736
+ ```
737
+
738
+ ```{ruby data_table}
739
+
740
+ data_table = R.data__table(
741
+ ID: R.c("b","b","b","a","a","c"),
742
+ a: (1..6),
743
+ b: (7..12),
744
+ c: (13..18)
745
+ )
746
+
747
+ puts data_table
748
+ puts data_table.ID
749
+ ```
750
+
751
+ ```{ruby subset_i}
752
+ # subset rows in i
753
+ ans = @flights[(:origin.eq "JFK") & (:month.eq 6)]
754
+ puts ans.head
755
+
756
+ # Get the first two rows from flights.
757
+
758
+ ans = @flights[(1..2)]
759
+ puts ans
760
+
761
+ # Sort flights first by column origin in ascending order, and then by dest in descending order:
762
+
763
+ # ans = @flights[E.order(:origin, -(:dest))]
764
+ # puts ans.head
765
+
766
+ ```
767
+
768
+ ```{ruby select_j}
769
+ # Select column(s) in j
770
+ # select arr_delay column, but return it as a vector.
771
+
772
+ ans = @flights[:all, :arr_delay]
773
+ puts ans.head
774
+
775
+ # Select arr_delay column, but return as a data.table instead.
776
+
777
+ ans = @flights[:all, :arr_delay.list]
778
+ puts ans.head
779
+
780
+ ans = @flights[:all, E.list(:arr_delay, :dep_delay)]
781
+ ```
782
+
696
783
  # Graphics in Galaaz
697
784
 
698
785
  Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are
@@ -731,8 +818,33 @@ the data frame with the necessary data:
731
818
  puts @mtcars
732
819
  ```
733
820
  Now, let's plot the diverging bar plot. When using gKnit, there is no need to call
734
- 'R.awt' to create a plotting device, since gKnit does take care of it:
735
-
821
+ 'R.awt' to create a plotting device, since gKnit does take care of it. Galaaz
822
+ provides integration with ggplot. The interested reader should check online for more
823
+ information on ggplot, since describing how ggplot works is outside the scope of
824
+ this manual. We give here only a brief description of how this plot is generated.
825
+
826
+ ggplot implements the 'grammar of graphics'. In this approach, plots are built by
827
+ adding layers to the plot. On the first layer we describe what we want on the 'x'
828
+ and 'y' axis of the plot. In this case, we have 'car_name' on the 'x' axis and
829
+ 'mpg\_z' on the 'y' axis. Then the type of graph is specified by adding
830
+ 'geom\_bar' (for a bar graph). We specify that our bars should be filled using
831
+ 'mpg\_type', which is either 'above' or 'below', giving two colours for
832
+ filling. On the next layer we specify the labels for the graph, then we add the
833
+ title and subtitle. Finally, bars in a bar chart usually run vertically,
834
+ but in this graph we want the bars laid out horizontally, so we add 'coord\_flip'.
835
+
836
+ ```{ruby diverging_bar, fig.width = 9.1, fig.height = 6.5}
837
+ require 'ggplot'
838
+
839
+ puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
840
+ R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
841
+ R.scale_fill_manual(name: 'Mileage',
842
+ labels: R.c('Above Average', 'Below Average'),
843
+ values: R.c('above': '#00ba38', 'below': '#f8766d')) +
844
+ R.labs(subtitle: "Normalised mileage from 'mtcars'",
845
+ title: "Diverging Bars") +
846
+ R.coord_flip
847
+ ```
736
848
 
737
849
 
738
850
  [TO BE CONTINUED...]
@@ -552,22 +552,22 @@ double
552
552
  ## (eval):1:in `exec_ruby'
553
553
  ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `instance_eval'
554
554
  ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `exec_ruby'
555
- ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:657:in `block in initialize'
555
+ ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:650:in `block in initialize'
556
556
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
557
557
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
558
558
  ## (eval):3:in `function(...) {\n rb_method(...)'
559
559
  ## unknown.r:1:in `in_dir'
560
560
  ## unknown.r:1:in `block_exec:BLOCK0'
561
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:102:in `block_exec'
562
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:92:in `call_block'
563
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
564
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
561
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:102:in `block_exec'
562
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:92:in `call_block'
563
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
564
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
565
565
  ## unknown.r:1:in `withCallingHandlers'
566
566
  ## unknown.r:1:in `process_file'
567
567
  ## unknown.r:1:in `<no source>:BLOCK1'
568
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/output.R:129:in `<no source>'
568
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/output.R:129:in `<no source>'
569
569
  ## unknown.r:1:in `<no source>:BLOCK1'
570
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/rmarkdown/R/render.R:162:in `<no source>'
570
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/rmarkdown/R/render.R:162:in `<no source>'
571
571
  ## <REPL>:5:in `<repl wrapper>'
572
572
  ## &lt;REPL&gt;:1</code></pre>
573
573
  <p>Here is a vector with logical values</p>
@@ -2385,6 +2385,237 @@ puts @flights_sm.head.as__data__frame</code></pre>
2385
2385
  ## 5 2013 1 1 -6 -25 762 116 19 394.1379
2386
2386
  ## 6 2013 1 1 -4 12 719 150 -16 287.6000</code></pre>
2387
2387
  </div>
2388
+ <div id="summarising-data" class="section level2">
2389
+ <h2>Summarising data</h2>
2390
+ <p>Function ‘summarise’ calculates summaries for the data frame. When no ‘group_by’ is used a single value is obtained from the data frame:</p>
2391
+ <pre class="ruby"><code>puts @flights.summarise(delay: E.mean(:dep_delay, na__rm: true)).as__data__frame</code></pre>
2392
+ <pre><code>## delay
2393
+ ## 1 12.63907</code></pre>
2394
+ <p>When a data frame is grouped with ‘group_by’, summaries apply to each group:</p>
2395
+ <pre class="ruby"><code>by_day = @flights.group_by(:year, :month, :day)
2396
+ puts by_day.summarise(delay: :dep_delay.mean(na__rm: true)).head.as__data__frame</code></pre>
2397
+ <pre><code>## year month day delay
2398
+ ## 1 2013 1 1 11.548926
2399
+ ## 2 2013 1 2 13.858824
2400
+ ## 3 2013 1 3 10.987832
2401
+ ## 4 2013 1 4 8.951595
2402
+ ## 5 2013 1 5 5.732218
2403
+ ## 6 2013 1 6 7.148014</code></pre>
2404
+ <p>Next we put many operations together by piping them one after the other:</p>
2405
+ <pre class="ruby"><code>delays = @flights.
2406
+ group_by(:dest).
2407
+ summarise(
2408
+ count: E.n,
2409
+ dist: :distance.mean(na__rm: true),
2410
+ delay: :arr_delay.mean(na__rm: true)).
2411
+ filter(:count &gt; 20, :dest != &quot;NHL&quot;)
2412
+
2413
+ puts delays.as__data__frame</code></pre>
2414
+ <pre><code>## dest count dist delay
2415
+ ## 1 ABQ 254 1826.00000 4.38188976
2416
+ ## 2 ACK 265 199.00000 4.85227273
2417
+ ## 3 ALB 439 143.00000 14.39712919
2418
+ ## 4 ATL 17215 757.10822 11.30011285
2419
+ ## 5 AUS 2439 1514.25297 6.01990875
2420
+ ## 6 AVL 275 583.58182 8.00383142
2421
+ ## 7 BDL 443 116.00000 7.04854369
2422
+ ## 8 BGR 375 378.00000 8.02793296
2423
+ ## 9 BHM 297 865.99663 16.87732342
2424
+ ## 10 BNA 6333 758.21348 11.81245891
2425
+ ## 11 BOS 15508 190.63696 2.91439222
2426
+ ## 12 BQN 896 1578.98326 8.24549550
2427
+ ## 13 BTV 2589 265.09154 8.95099602
2428
+ ## 14 BUF 4681 296.80837 8.94595186
2429
+ ## 15 BUR 371 2465.00000 8.17567568
2430
+ ## 16 BWI 1781 179.41830 10.72673385
2431
+ ## 17 BZN 36 1882.00000 7.60000000
2432
+ ## 18 CAE 116 603.55172 41.76415094
2433
+ ## 19 CAK 864 397.00000 19.69833729
2434
+ ## 20 CHO 52 305.00000 9.50000000
2435
+ ## 21 CHS 2884 632.91678 10.59296847
2436
+ ## 22 CLE 4573 414.17428 9.18161129
2437
+ ## 23 CLT 14064 538.02730 7.36031885
2438
+ ## 24 CMH 3524 476.55505 10.60132291
2439
+ ## 25 CRW 138 444.00000 14.67164179
2440
+ ## 26 CVG 3941 575.15986 15.36456376
2441
+ ## 27 DAY 1525 537.10230 12.68048606
2442
+ ## 28 DCA 9705 211.00618 9.06695204
2443
+ ## 29 DEN 7266 1614.67836 8.60650021
2444
+ ## 30 DFW 8738 1383.04303 0.32212685
2445
+ ## 31 DSM 569 1020.88752 19.00573614
2446
+ ## 32 DTW 9384 498.12852 5.42996346
2447
+ ## 33 EGE 213 1735.70892 6.30434783
2448
+ ## 34 FLL 12055 1070.06877 8.08212154
2449
+ ## 35 GRR 765 605.78170 18.18956044
2450
+ ## 36 GSO 1606 449.84184 14.11260054
2451
+ ## 37 GSP 849 595.95995 15.93544304
2452
+ ## 38 HNL 707 4972.67468 -1.36519258
2453
+ ## 39 HOU 2115 1420.15508 7.17618819
2454
+ ## 40 IAD 5700 224.84684 13.86420212
2455
+ ## 41 IAH 7198 1407.20672 4.24079040
2456
+ ## 42 ILM 110 500.00000 4.63551402
2457
+ ## 43 IND 2077 652.26288 9.94043412
2458
+ ## 44 JAC 25 1875.60000 28.09523810
2459
+ ## 45 JAX 2720 824.67610 11.84483416
2460
+ ## 46 LAS 5997 2240.96148 0.25772849
2461
+ ## 47 LAX 16174 2468.62236 0.54711094
2462
+ ## 48 LGB 668 2465.00000 -0.06202723
2463
+ ## 49 MCI 2008 1097.69522 14.51405836
2464
+ ## 50 MCO 14082 943.11057 5.45464309
2465
+ ## 51 MDW 4113 718.04595 12.36422360
2466
+ ## 52 MEM 1789 954.20123 10.64531435
2467
+ ## 53 MHT 1009 207.02973 14.78755365
2468
+ ## 54 MIA 11728 1091.55244 0.29905978
2469
+ ## 55 MKE 2802 733.38151 14.16722038
2470
+ ## 56 MSN 572 803.95455 20.19604317
2471
+ ## 57 MSP 7185 1017.40167 7.27016886
2472
+ ## 58 MSY 3799 1177.70571 6.49017497
2473
+ ## 59 MVY 221 173.00000 -0.28571429
2474
+ ## 60 MYR 59 550.66102 4.60344828
2475
+ ## 61 OAK 312 2576.00000 3.07766990
2476
+ ## 62 OKC 346 1325.00000 30.61904762
2477
+ ## 63 OMA 849 1135.56655 14.69889841
2478
+ ## 64 ORD 17283 729.00081 5.87661475
2479
+ ## 65 ORF 1536 288.52344 10.94909344
2480
+ ## 66 PBI 6554 1028.83811 8.56297210
2481
+ ## 67 PDX 1354 2445.56573 5.14157973
2482
+ ## 68 PHL 1632 94.32353 10.12719014
2483
+ ## 69 PHX 4656 2141.30326 2.09704733
2484
+ ## 70 PIT 2875 334.06122 7.68099053
2485
+ ## 71 PSE 365 1617.00000 7.87150838
2486
+ ## 72 PVD 376 160.00000 16.23463687
2487
+ ## 73 PWM 2352 276.12840 11.66040210
2488
+ ## 74 RDU 8163 426.75769 10.05238095
2489
+ ## 75 RIC 2454 281.40465 20.11125320
2490
+ ## 76 ROC 2416 259.25083 11.56064461
2491
+ ## 77 RSW 3537 1072.85327 3.23814963
2492
+ ## 78 SAN 2737 2437.29923 3.13916574
2493
+ ## 79 SAT 686 1578.34111 6.94537178
2494
+ ## 80 SAV 804 709.18408 15.12950601
2495
+ ## 81 SDF 1157 645.98358 12.66938406
2496
+ ## 82 SEA 3923 2412.66531 -1.09909910
2497
+ ## 83 SFO 13331 2577.92356 2.67289152
2498
+ ## 84 SJC 329 2569.00000 3.44817073
2499
+ ## 85 SJU 5819 1599.83365 2.52052659
2500
+ ## 86 SLC 2467 1986.98662 0.17625459
2501
+ ## 87 SMF 284 2521.00000 12.10992908
2502
+ ## 88 SNA 825 2434.00000 -7.86822660
2503
+ ## 89 SRQ 1211 1044.65153 3.08243131
2504
+ ## 90 STL 4339 878.72321 11.07846451
2505
+ ## 91 STT 522 1626.98276 -3.83590734
2506
+ ## 92 SYR 1761 205.92164 8.90392501
2507
+ ## 93 TPA 7466 1003.93557 7.40852503
2508
+ ## 94 TUL 315 1215.00000 33.65986395
2509
+ ## 95 TVC 101 652.38614 12.96842105
2510
+ ## 96 TYS 631 638.80983 24.06920415
2511
+ ## 97 XNA 1036 1142.50579 7.46572581</code></pre>
2512
+ </div>
2513
+ </div>
2514
+ <div id="using-data-table" class="section level1">
2515
+ <h1>Using Data Table</h1>
2516
+ <pre class="ruby"><code>R.library('data.table')
2517
+ R.install_and_loads('curl')
2518
+
2519
+ input = &quot;https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv&quot;
2520
+ @flights = R.fread(input)
2521
+ puts @flights
2522
+ puts @flights.dim</code></pre>
2523
+ <pre><code>## year month day dep_delay arr_delay carrier origin dest air_time
2524
+ ## 1: 2014 1 1 14 13 AA JFK LAX 359
2525
+ ## 2: 2014 1 1 -3 13 AA JFK LAX 363
2526
+ ## 3: 2014 1 1 2 9 AA JFK LAX 351
2527
+ ## 4: 2014 1 1 -8 -26 AA LGA PBI 157
2528
+ ## 5: 2014 1 1 2 1 AA JFK LAX 350
2529
+ ## ---
2530
+ ## 253312: 2014 10 31 1 -30 UA LGA IAH 201
2531
+ ## 253313: 2014 10 31 -5 -14 UA EWR IAH 189
2532
+ ## 253314: 2014 10 31 -8 16 MQ LGA RDU 83
2533
+ ## 253315: 2014 10 31 -4 15 MQ LGA DTW 75
2534
+ ## 253316: 2014 10 31 -5 1 MQ LGA SDF 110
2535
+ ## distance hour
2536
+ ## 1: 2475 9
2537
+ ## 2: 2475 11
2538
+ ## 3: 2475 19
2539
+ ## 4: 1035 7
2540
+ ## 5: 2475 13
2541
+ ## ---
2542
+ ## 253312: 1416 14
2543
+ ## 253313: 1400 8
2544
+ ## 253314: 431 11
2545
+ ## 253315: 502 11
2546
+ ## 253316: 659 8
2547
+ ## [1] 253316 11</code></pre>
2548
+ <pre class="ruby"><code>
2549
+ data_table = R.data__table(
2550
+ ID: R.c(&quot;b&quot;,&quot;b&quot;,&quot;b&quot;,&quot;a&quot;,&quot;a&quot;,&quot;c&quot;),
2551
+ a: (1..6),
2552
+ b: (7..12),
2553
+ c: (13..18)
2554
+ )
2555
+
2556
+ puts data_table
2557
+ puts data_table.ID</code></pre>
2558
+ <pre><code>## ID a b c
2559
+ ## 1: b 1 7 13
2560
+ ## 2: b 2 8 14
2561
+ ## 3: b 3 9 15
2562
+ ## 4: a 4 10 16
2563
+ ## 5: a 5 11 17
2564
+ ## 6: c 6 12 18
2565
+ ## [1] &quot;b&quot; &quot;b&quot; &quot;b&quot; &quot;a&quot; &quot;a&quot; &quot;c&quot;</code></pre>
2566
+ <pre class="ruby"><code># subset rows in i
2567
+ ans = @flights[(:origin.eq &quot;JFK&quot;) &amp; (:month.eq 6)]
2568
+ puts ans.head
2569
+
2570
+ # Get the first two rows from flights.
2571
+
2572
+ ans = @flights[(1..2)]
2573
+ puts ans
2574
+
2575
+ # Sort flights first by column origin in ascending order, and then by dest in descending order:
2576
+
2577
+ # ans = @flights[E.order(:origin, -(:dest))]
2578
+ # puts ans.head</code></pre>
2579
+ <pre><code>## year month day dep_delay arr_delay carrier origin dest air_time
2580
+ ## 1: 2014 6 1 -9 -5 AA JFK LAX 324
2581
+ ## 2: 2014 6 1 -10 -13 AA JFK LAX 329
2582
+ ## 3: 2014 6 1 18 -1 AA JFK LAX 326
2583
+ ## 4: 2014 6 1 -6 -16 AA JFK LAX 320
2584
+ ## 5: 2014 6 1 -4 -45 AA JFK LAX 326
2585
+ ## 6: 2014 6 1 -6 -23 AA JFK LAX 329
2586
+ ## distance hour
2587
+ ## 1: 2475 8
2588
+ ## 2: 2475 12
2589
+ ## 3: 2475 7
2590
+ ## 4: 2475 10
2591
+ ## 5: 2475 18
2592
+ ## 6: 2475 14
2593
+ ## year month day dep_delay arr_delay carrier origin dest air_time
2594
+ ## 1: 2014 1 1 14 13 AA JFK LAX 359
2595
+ ## 2: 2014 1 1 -3 13 AA JFK LAX 363
2596
+ ## distance hour
2597
+ ## 1: 2475 9
2598
+ ## 2: 2475 11</code></pre>
2599
+ <pre class="ruby"><code># Select column(s) in j
2600
+ # select arr_delay column, but return it as a vector.
2601
+
2602
+ ans = @flights[:all, :arr_delay]
2603
+ puts ans.head
2604
+
2605
+ # Select arr_delay column, but return as a data.table instead.
2606
+
2607
+ ans = @flights[:all, :arr_delay.list]
2608
+ puts ans.head
2609
+
2610
+ ans = @flights[:all, E.list(:arr_delay, :dep_delay)]</code></pre>
2611
+ <pre><code>## [1] 13 13 9 -26 1 0
2612
+ ## arr_delay
2613
+ ## 1: 13
2614
+ ## 2: 13
2615
+ ## 3: 9
2616
+ ## 4: -26
2617
+ ## 5: 1
2618
+ ## 6: 0</code></pre>
2388
2619
  </div>
2389
2620
  <div id="graphics-in-galaaz" class="section level1">
2390
2621
  <h1>Graphics in Galaaz</h1>
@@ -2482,7 +2713,19 @@ puts @mtcars</code></pre>
2482
2713
  ## Lotus Europa Lotus Europa 1.71 above
2483
2714
  ## Fiat 128 Fiat 128 2.04 above
2484
2715
  ## Toyota Corolla Toyota Corolla 2.29 above</code></pre>
2485
- <p>Now, lets plot the diverging bar plot. When using gKnit, there is no need to call ‘R.awt’ to create a plotting device, since gKnit does take care of it:</p>
2716
+ <p>Now, let's plot the diverging bar plot. When using gKnit, there is no need to call ‘R.awt’ to create a plotting device, since gKnit does take care of it. Galaaz provides integration with ggplot. The interested reader should check online for more information on ggplot, since describing how ggplot works is outside the scope of this manual. We give here only a brief description of how this plot is generated.</p>
2717
+ <p>ggplot implements the ‘grammar of graphics’. In this approach, plots are built by adding layers to the plot. On the first layer we describe what we want on the ‘x’ and ‘y’ axes of the plot. In this case, we have ‘car_name’ on the ‘x’ axis and ‘mpg_z’ on the ‘y’ axis. Then the type of graph is specified by adding ‘geom_bar’ (for a bar graph). We specify that our bars should be filled using ‘mpg_type’, which is either ‘above’ or ‘below’, giving two colours for filling. On the next layer we specify the labels for the graph, then we add the title and subtitle. Finally, bars in a bar chart usually run vertically, but in this graph we want the bars laid out horizontally, so we add ‘coord_flip’.</p>
2718
+ <pre class="ruby"><code>require 'ggplot'
2719
+
2720
+ puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
2721
+ R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
2722
+ R.scale_fill_manual(name: 'Mileage',
2723
+ labels: R.c('Above Average', 'Below Average'),
2724
+ values: R.c('above': '#00ba38', 'below': '#f8766d')) +
2725
+ R.labs(subtitle: &quot;Normalised mileage from 'mtcars'&quot;,
2726
+ title: &quot;Diverging Bars&quot;) +
2727
+ R.coord_flip</code></pre>
2728
+ <p><img src="" /><!-- --></p>
2486
2729
  <p>[TO BE CONTINUED…]</p>
2487
2730
  </div>
2488
2731
  <div id="contributing" class="section level1">
@@ -221,22 +221,22 @@ vec = R.c(1, hello, 5)
221
221
  ## (eval):1:in `exec_ruby'
222
222
  ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `instance_eval'
223
223
  ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `exec_ruby'
224
- ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:657:in `block in initialize'
224
+ ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:650:in `block in initialize'
225
225
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
226
226
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
227
227
  ## (eval):3:in `function(...) {\n rb_method(...)'
228
228
  ## unknown.r:1:in `in_dir'
229
229
  ## unknown.r:1:in `block_exec:BLOCK0'
230
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:102:in `block_exec'
231
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:92:in `call_block'
232
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
233
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
230
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:102:in `block_exec'
231
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:92:in `call_block'
232
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
233
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
234
234
  ## unknown.r:1:in `withCallingHandlers'
235
235
  ## unknown.r:1:in `process_file'
236
236
  ## unknown.r:1:in `<no source>:BLOCK1'
237
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/output.R:129:in `<no source>'
237
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/knitr/R/output.R:129:in `<no source>'
238
238
  ## unknown.r:1:in `<no source>:BLOCK1'
239
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/rmarkdown/R/render.R:162:in `<no source>'
239
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc16/jre/languages/R/library/rmarkdown/R/render.R:162:in `<no source>'
240
240
  ## <REPL>:5:in `<repl wrapper>'
241
241
  ## <REPL>:1
242
242
  ```
@@ -1764,6 +1764,288 @@ puts @flights_sm.head.as__data__frame
1764
1764
  ## 6 2013 1 1 -4 12 719 150 -16 287.6000
1765
1765
  ```
1766
1766
 
1767
+ ## Summarising data
1768
+
1769
+ Function 'summarise' calculates summaries for the data frame. When no 'group_by' is used
1770
+ a single value is obtained from the data frame:
1771
+
1772
+
1773
+ ```ruby
1774
+ puts @flights.summarise(delay: E.mean(:dep_delay, na__rm: true)).as__data__frame
1775
+ ```
1776
+
1777
+ ```
1778
+ ## delay
1779
+ ## 1 12.63907
1780
+ ```
1781
+
1782
+ When a data frame is grouped with 'group_by', summaries apply to each group:
1783
+
1784
+
1785
+ ```ruby
1786
+ by_day = @flights.group_by(:year, :month, :day)
1787
+ puts by_day.summarise(delay: :dep_delay.mean(na__rm: true)).head.as__data__frame
1788
+ ```
1789
+
1790
+ ```
1791
+ ## year month day delay
1792
+ ## 1 2013 1 1 11.548926
1793
+ ## 2 2013 1 2 13.858824
1794
+ ## 3 2013 1 3 10.987832
1795
+ ## 4 2013 1 4 8.951595
1796
+ ## 5 2013 1 5 5.732218
1797
+ ## 6 2013 1 6 7.148014
1798
+ ```
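
To make the semantics of the grouped summary concrete, here is a minimal plain-Ruby sketch of what a grouped mean with missing values removed computes. It does not use Galaaz or dplyr: the `Row` struct, the helper names `mean_without_nil` and `mean_delay_by_day`, and the four rows of data are all hypothetical and exist only for illustration.

```ruby
# Plain-Ruby illustration only; NOT the Galaaz/dplyr API.
# Row and the data below are hypothetical.
Row = Struct.new(:year, :month, :day, :dep_delay)

# Mean that skips nil values, similar in spirit to passing `na__rm: true`.
def mean_without_nil(values)
  present = values.compact
  return nil if present.empty?

  present.sum.to_f / present.size
end

# Group rows by (year, month, day) and compute one mean delay per group.
def mean_delay_by_day(rows)
  rows.group_by { |r| [r.year, r.month, r.day] }
      .transform_values { |group| mean_without_nil(group.map(&:dep_delay)) }
end

rows = [
  Row.new(2013, 1, 1, 10),
  Row.new(2013, 1, 1, nil), # missing delay, ignored by the mean
  Row.new(2013, 1, 1, 20),
  Row.new(2013, 1, 2, 4)
]

# Without grouping: a single value for the whole data set.
overall = mean_without_nil(rows.map(&:dep_delay))
puts "overall: #{overall.round(2)}"

# With grouping: one value per (year, month, day) key.
summary = mean_delay_by_day(rows)
summary.each { |key, delay| puts "#{key.join('-')}: #{delay}" }
```

The ungrouped call collapses every row into one number, while the grouped call returns one number per key, which mirrors the difference between the two 'summarise' outputs shown above.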
1799
+
1800
+ Next we put many operations together by piping them one after the other:
1801
+
1802
+
1803
+ ```ruby
1804
+ delays = @flights.
1805
+ group_by(:dest).
1806
+ summarise(
1807
+ count: E.n,
1808
+ dist: :distance.mean(na__rm: true),
1809
+ delay: :arr_delay.mean(na__rm: true)).
1810
+ filter(:count > 20, :dest != "NHL")
1811
+
1812
+ puts delays.as__data__frame
1813
+ ```
1814
+
1815
+ ```
1816
+ ## dest count dist delay
1817
+ ## 1 ABQ 254 1826.00000 4.38188976
1818
+ ## 2 ACK 265 199.00000 4.85227273
1819
+ ## 3 ALB 439 143.00000 14.39712919
1820
+ ## 4 ATL 17215 757.10822 11.30011285
1821
+ ## 5 AUS 2439 1514.25297 6.01990875
1822
+ ## 6 AVL 275 583.58182 8.00383142
1823
+ ## 7 BDL 443 116.00000 7.04854369
1824
+ ## 8 BGR 375 378.00000 8.02793296
1825
+ ## 9 BHM 297 865.99663 16.87732342
1826
+ ## 10 BNA 6333 758.21348 11.81245891
1827
+ ## 11 BOS 15508 190.63696 2.91439222
1828
+ ## 12 BQN 896 1578.98326 8.24549550
1829
+ ## 13 BTV 2589 265.09154 8.95099602
1830
+ ## 14 BUF 4681 296.80837 8.94595186
1831
+ ## 15 BUR 371 2465.00000 8.17567568
1832
+ ## 16 BWI 1781 179.41830 10.72673385
1833
+ ## 17 BZN 36 1882.00000 7.60000000
1834
+ ## 18 CAE 116 603.55172 41.76415094
1835
+ ## 19 CAK 864 397.00000 19.69833729
1836
+ ## 20 CHO 52 305.00000 9.50000000
1837
+ ## 21 CHS 2884 632.91678 10.59296847
1838
+ ## 22 CLE 4573 414.17428 9.18161129
1839
+ ## 23 CLT 14064 538.02730 7.36031885
1840
+ ## 24 CMH 3524 476.55505 10.60132291
1841
+ ## 25 CRW 138 444.00000 14.67164179
1842
+ ## 26 CVG 3941 575.15986 15.36456376
1843
+ ## 27 DAY 1525 537.10230 12.68048606
1844
+ ## 28 DCA 9705 211.00618 9.06695204
1845
+ ## 29 DEN 7266 1614.67836 8.60650021
1846
+ ## 30 DFW 8738 1383.04303 0.32212685
1847
+ ## 31 DSM 569 1020.88752 19.00573614
1848
+ ## 32 DTW 9384 498.12852 5.42996346
1849
+ ## 33 EGE 213 1735.70892 6.30434783
1850
+ ## 34 FLL 12055 1070.06877 8.08212154
1851
+ ## 35 GRR 765 605.78170 18.18956044
1852
+ ## 36 GSO 1606 449.84184 14.11260054
1853
+ ## 37 GSP 849 595.95995 15.93544304
1854
+ ## 38 HNL 707 4972.67468 -1.36519258
1855
+ ## 39 HOU 2115 1420.15508 7.17618819
1856
+ ## 40 IAD 5700 224.84684 13.86420212
1857
+ ## 41 IAH 7198 1407.20672 4.24079040
1858
+ ## 42 ILM 110 500.00000 4.63551402
1859
+ ## 43 IND 2077 652.26288 9.94043412
1860
+ ## 44 JAC 25 1875.60000 28.09523810
1861
+ ## 45 JAX 2720 824.67610 11.84483416
1862
+ ## 46 LAS 5997 2240.96148 0.25772849
1863
+ ## 47 LAX 16174 2468.62236 0.54711094
1864
+ ## 48 LGB 668 2465.00000 -0.06202723
1865
+ ## 49 MCI 2008 1097.69522 14.51405836
1866
+ ## 50 MCO 14082 943.11057 5.45464309
1867
+ ## 51 MDW 4113 718.04595 12.36422360
1868
+ ## 52 MEM 1789 954.20123 10.64531435
1869
+ ## 53 MHT 1009 207.02973 14.78755365
1870
+ ## 54 MIA 11728 1091.55244 0.29905978
1871
+ ## 55 MKE 2802 733.38151 14.16722038
1872
+ ## 56 MSN 572 803.95455 20.19604317
1873
+ ## 57 MSP 7185 1017.40167 7.27016886
1874
+ ## 58 MSY 3799 1177.70571 6.49017497
1875
+ ## 59 MVY 221 173.00000 -0.28571429
1876
+ ## 60 MYR 59 550.66102 4.60344828
1877
+ ## 61 OAK 312 2576.00000 3.07766990
1878
+ ## 62 OKC 346 1325.00000 30.61904762
1879
+ ## 63 OMA 849 1135.56655 14.69889841
1880
+ ## 64 ORD 17283 729.00081 5.87661475
1881
+ ## 65 ORF 1536 288.52344 10.94909344
1882
+ ## 66 PBI 6554 1028.83811 8.56297210
1883
+ ## 67 PDX 1354 2445.56573 5.14157973
1884
+ ## 68 PHL 1632 94.32353 10.12719014
1885
+ ## 69 PHX 4656 2141.30326 2.09704733
1886
+ ## 70 PIT 2875 334.06122 7.68099053
1887
+ ## 71 PSE 365 1617.00000 7.87150838
1888
+ ## 72 PVD 376 160.00000 16.23463687
1889
+ ## 73 PWM 2352 276.12840 11.66040210
1890
+ ## 74 RDU 8163 426.75769 10.05238095
1891
+ ## 75 RIC 2454 281.40465 20.11125320
1892
+ ## 76 ROC 2416 259.25083 11.56064461
1893
+ ## 77 RSW 3537 1072.85327 3.23814963
1894
+ ## 78 SAN 2737 2437.29923 3.13916574
1895
+ ## 79 SAT 686 1578.34111 6.94537178
1896
+ ## 80 SAV 804 709.18408 15.12950601
1897
+ ## 81 SDF 1157 645.98358 12.66938406
1898
+ ## 82 SEA 3923 2412.66531 -1.09909910
1899
+ ## 83 SFO 13331 2577.92356 2.67289152
1900
+ ## 84 SJC 329 2569.00000 3.44817073
1901
+ ## 85 SJU 5819 1599.83365 2.52052659
1902
+ ## 86 SLC 2467 1986.98662 0.17625459
1903
+ ## 87 SMF 284 2521.00000 12.10992908
1904
+ ## 88 SNA 825 2434.00000 -7.86822660
1905
+ ## 89 SRQ 1211 1044.65153 3.08243131
1906
+ ## 90 STL 4339 878.72321 11.07846451
1907
+ ## 91 STT 522 1626.98276 -3.83590734
1908
+ ## 92 SYR 1761 205.92164 8.90392501
1909
+ ## 93 TPA 7466 1003.93557 7.40852503
1910
+ ## 94 TUL 315 1215.00000 33.65986395
1911
+ ## 95 TVC 101 652.38614 12.96842105
1912
+ ## 96 TYS 631 638.80983 24.06920415
1913
+ ## 97 XNA 1036 1142.50579 7.46572581
1914
+ ```
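
The chained pipeline above can be read as three steps: group, summarise, filter. The following plain-Ruby sketch walks through the same three steps on a handful of made-up records. It is an illustration under stated assumptions, not Galaaz code: `summarise_by_dest`, its `min_count` and `excluded` parameters, and the flight records are hypothetical.

```ruby
# Plain-Ruby illustration only; NOT the Galaaz/dplyr API.
# The records are made-up numbers.
flights = [
  { dest: "ATL", distance: 760,  arr_delay: 12 },
  { dest: "ATL", distance: 754,  arr_delay: nil },
  { dest: "ATL", distance: 757,  arr_delay: 8 },
  { dest: "BZN", distance: 1882, arr_delay: 7 },
  { dest: "HNL", distance: 4983, arr_delay: -3 },
  { dest: "HNL", distance: 4963, arr_delay: 1 }
]

# Mean that ignores nil entries.
def mean(values)
  present = values.compact
  present.empty? ? nil : present.sum.to_f / present.size
end

# Step 1: group by destination.
# Step 2: reduce each group to a count and two means.
# Step 3: keep groups with more than min_count rows whose dest is not excluded.
def summarise_by_dest(flights, min_count:, excluded:)
  grouped = flights.group_by { |f| f[:dest] }

  summaries = grouped.map { |dest, group|
    {
      dest: dest,
      count: group.size,
      dist: mean(group.map { |f| f[:distance] }),
      delay: mean(group.map { |f| f[:arr_delay] })
    }
  }

  summaries.select { |s| s[:count] > min_count && !excluded.include?(s[:dest]) }
end

delays = summarise_by_dest(flights, min_count: 1, excluded: ["HNL"])
delays.each { |row| puts row }
```

Note that the filter runs on the summarised rows, not on the raw flights, so the count threshold applies to whole groups.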
1915
+
1916
+ # Using Data Table
1917
+
1918
+
1919
+ ```ruby
1920
+ R.library('data.table')
1921
+ R.install_and_loads('curl')
1922
+
1923
+ input = "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
1924
+ @flights = R.fread(input)
1925
+ puts @flights
1926
+ puts @flights.dim
1927
+ ```
1928
+
1929
+ ```
1930
+ ## year month day dep_delay arr_delay carrier origin dest air_time
1931
+ ## 1: 2014 1 1 14 13 AA JFK LAX 359
1932
+ ## 2: 2014 1 1 -3 13 AA JFK LAX 363
1933
+ ## 3: 2014 1 1 2 9 AA JFK LAX 351
1934
+ ## 4: 2014 1 1 -8 -26 AA LGA PBI 157
1935
+ ## 5: 2014 1 1 2 1 AA JFK LAX 350
1936
+ ## ---
1937
+ ## 253312: 2014 10 31 1 -30 UA LGA IAH 201
1938
+ ## 253313: 2014 10 31 -5 -14 UA EWR IAH 189
1939
+ ## 253314: 2014 10 31 -8 16 MQ LGA RDU 83
1940
+ ## 253315: 2014 10 31 -4 15 MQ LGA DTW 75
1941
+ ## 253316: 2014 10 31 -5 1 MQ LGA SDF 110
1942
+ ## distance hour
1943
+ ## 1: 2475 9
1944
+ ## 2: 2475 11
1945
+ ## 3: 2475 19
1946
+ ## 4: 1035 7
1947
+ ## 5: 2475 13
1948
+ ## ---
1949
+ ## 253312: 1416 14
1950
+ ## 253313: 1400 8
1951
+ ## 253314: 431 11
1952
+ ## 253315: 502 11
1953
+ ## 253316: 659 8
1954
+ ## [1] 253316 11
1955
+ ```
1956
+
1957
+
1958
+ ```ruby
1959
+
1960
+ data_table = R.data__table(
1961
+ ID: R.c("b","b","b","a","a","c"),
1962
+ a: (1..6),
1963
+ b: (7..12),
1964
+ c: (13..18)
1965
+ )
1966
+
1967
+ puts data_table
1968
+ puts data_table.ID
1969
+ ```
1970
+
1971
+ ```
1972
+ ## ID a b c
1973
+ ## 1: b 1 7 13
1974
+ ## 2: b 2 8 14
1975
+ ## 3: b 3 9 15
1976
+ ## 4: a 4 10 16
1977
+ ## 5: a 5 11 17
1978
+ ## 6: c 6 12 18
1979
+ ## [1] "b" "b" "b" "a" "a" "c"
1980
+ ```
1981
+
1982
+
1983
+ ```ruby
1984
+ # subset rows in i
1985
+ ans = @flights[(:origin.eq "JFK") & (:month.eq 6)]
1986
+ puts ans.head
1987
+
1988
+ # Get the first two rows from flights.
1989
+
1990
+ ans = @flights[(1..2)]
1991
+ puts ans
1992
+
1993
+ # Sort flights first by column origin in ascending order, and then by dest in descending order:
1994
+
1995
+ # ans = @flights[E.order(:origin, -(:dest))]
1996
+ # puts ans.head
1997
+ ```
1998
+
1999
+ ```
2000
+ ## year month day dep_delay arr_delay carrier origin dest air_time
2001
+ ## 1: 2014 6 1 -9 -5 AA JFK LAX 324
2002
+ ## 2: 2014 6 1 -10 -13 AA JFK LAX 329
2003
+ ## 3: 2014 6 1 18 -1 AA JFK LAX 326
2004
+ ## 4: 2014 6 1 -6 -16 AA JFK LAX 320
2005
+ ## 5: 2014 6 1 -4 -45 AA JFK LAX 326
2006
+ ## 6: 2014 6 1 -6 -23 AA JFK LAX 329
2007
+ ## distance hour
2008
+ ## 1: 2475 8
2009
+ ## 2: 2475 12
2010
+ ## 3: 2475 7
2011
+ ## 4: 2475 10
2012
+ ## 5: 2475 18
2013
+ ## 6: 2475 14
2014
+ ## year month day dep_delay arr_delay carrier origin dest air_time
2015
+ ## 1: 2014 1 1 14 13 AA JFK LAX 359
2016
+ ## 2: 2014 1 1 -3 13 AA JFK LAX 363
2017
+ ## distance hour
2018
+ ## 1: 2475 9
2019
+ ## 2: 2475 11
2020
+ ```
2021
+
2022
+
2023
+ ```ruby
2024
+ # Select column(s) in j
2025
+ # select arr_delay column, but return it as a vector.
2026
+
2027
+ ans = @flights[:all, :arr_delay]
2028
+ puts ans.head
2029
+
2030
+ # Select arr_delay column, but return as a data.table instead.
2031
+
2032
+ ans = @flights[:all, :arr_delay.list]
2033
+ puts ans.head
2034
+
2035
+ ans = @flights[:all, E.list(:arr_delay, :dep_delay)]
2036
+ ```
2037
+
2038
+ ```
2039
+ ## [1] 13 13 9 -26 1 0
2040
+ ## arr_delay
2041
+ ## 1: 13
2042
+ ## 2: 13
2043
+ ## 3: 9
2044
+ ## 4: -26
2045
+ ## 5: 1
2046
+ ## 6: 0
2047
+ ```
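
The difference between subsetting rows ("i") and selecting columns ("j"), and between getting a column back as a vector or as a table, can be sketched in plain Ruby with an array of hashes. This is only an analogy under stated assumptions: the three records and all variable names are hypothetical, and none of this is the data.table or Galaaz API.

```ruby
# Plain-Ruby analogy only; NOT the data.table/Galaaz API.
# The records are made-up.
flights = [
  { origin: "JFK", month: 6, arr_delay: -5 },
  { origin: "LGA", month: 6, arr_delay: 3 },
  { origin: "JFK", month: 1, arr_delay: 13 }
]

# "i": keep only the rows that satisfy both conditions.
jfk_june = flights.select { |f| f[:origin] == "JFK" && f[:month] == 6 }

# "i" with positions: the first two rows.
first_two = flights.first(2)

# "j" as a vector: the bare values, column name dropped.
arr_delay_vector = flights.map { |f| f[:arr_delay] }

# "j" as a table: every row still carries the column name.
arr_delay_table = flights.map { |f| f.slice(:arr_delay) }

puts jfk_june.inspect
puts first_two.inspect
puts arr_delay_vector.inspect
puts arr_delay_table.inspect
```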
2048
+
1767
2049
  # Graphics in Galaaz
1768
2050
 
1769
2051
  Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are
@@ -1872,8 +2154,37 @@ puts @mtcars
1872
2154
  ## Toyota Corolla Toyota Corolla 2.29 above
1873
2155
  ```
1874
2156
  Now, let's plot the diverging bar plot. When using gKnit, there is no need to call
1875
- 'R.awt' to create a plotting device, since gKnit does take care of it:
2157
+ 'R.awt' to create a plotting device, since gKnit does take care of it. Galaaz
2158
+ provides integration with ggplot. The interested reader should check online for more
2159
+ information on ggplot, since describing how ggplot works is outside the scope of
2160
+ this manual. We give here only a brief description of how this plot is generated.
2161
+
2162
+ ggplot implements the 'grammar of graphics'. In this approach, plots are built by
2163
+ adding layers to the plot. On the first layer we describe what we want on the 'x'
2164
+ and 'y' axis of the plot. In this case, we have 'car_name' on the 'x' axis and
2165
+ 'mpg\_z' on the 'y' axis. Then the type of graph is specified by adding
2166
+ 'geom\_bar' (for a bar graph). We specify that our bars should be filled using
2167
+ 'mpg\_type', which is either 'above' or 'below', giving two colours for
2168
+ filling. On the next layer we specify the labels for the graph, then we add the
2169
+ title and subtitle. Finally, bars in a bar chart usually run vertically,
2169
+ but in this graph we want the bars laid out horizontally, so we add 'coord\_flip'.
2171
+
2172
+
2173
+ ```ruby
2174
+ require 'ggplot'
2175
+
2176
+ puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
2177
+ R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
2178
+ R.scale_fill_manual(name: 'Mileage',
2179
+ labels: R.c('Above Average', 'Below Average'),
2180
+ values: R.c('above': '#00ba38', 'below': '#f8766d')) +
2181
+ R.labs(subtitle: "Normalised mileage from 'mtcars'",
2182
+ title: "Diverging Bars") +
2183
+ R.coord_flip
2184
+ ```
2185
+
1876
2186
 
2187
+ ![](/home/rbotafogo/desenv/galaaz/blogs/manual/manual_files/figure-html/diverging_bar.png)<!-- -->
1877
2188
 
1878
2189
 
1879
2190
  [TO BE CONTINUED...]