statsailr 0.7.1 → 0.7.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 51e84bd76b9ed92c47e11f3baac54ade7ef338b4ab8a1935e5aaac106f559c65
4
- data.tar.gz: 2f68acb064cdb8297052ee9246a49da73db1a70750d2290b83eebe2e21c15db2
3
+ metadata.gz: 8701d54a846ae0d81c7f91080586c7dd81b05c081743ba84606c46ecc53f25a8
4
+ data.tar.gz: 7f2636ea0b00008f160e303b389d977b94e7698a1eb800603850cff38901b14e
5
5
  SHA512:
6
- metadata.gz: c1c37162684765074a27dd2a2ec2373c939ab5b125df6ae5a7ba1bd86dfd66fe9ece1e5f39a525db403a296daae8cf594687b8ba4d405cff78202472324b1389
7
- data.tar.gz: 4f0908c784e59eefec18664ba79ea33cc24148ac3c5de2a005b64553a2c9fd5a48d60d9821140f3c3b2065ea757c4f8700ef1d8e2ff99171a175ed137c747cdc
6
+ metadata.gz: 4007addec1d2a15f05c62bef40850a31648b4e2099ac7cf79890f0cb480854561990b478420bf3ab750089c88b1013b7fe4261c15738cb608b893257c2ef5134
7
+ data.tar.gz: e38f0f64d035ab90bfdd8c052f63c1b4f0cd605cac841ddaefa18de6af127658f2c6453b4fbd74b7dfcf31e29920ba26c9b6db8b8211ee452ad983dce8c0ae3e
data/README.md CHANGED
@@ -1,16 +1,16 @@
1
1
  # StatSailr
2
2
 
3
- StatSailr provides a platform for users to focus on statistics. The backend statistics engine is [R](https://www.r-project.org/), so the results are reliable. The SataSailr script consists of three major blocks, TOPLEVEL, DATA, and PROC. Each block has its way of writing instructions, which works as an intuitive interface for R.
3
+ StatSailr provides a platform for users to focus on statistics. The backend statistics engine is [R](https://www.r-project.org/), so the results are reliable. The StatSailr script consists of three major blocks, TOPLEVEL, DATA, and PROC. Each block has its way of writing instructions, which works as an intuitive interface for R.
4
4
 
5
5
 
6
6
  ## Overview
7
7
 
8
- StatSailr is a Ruby program that enables users to manipulate data and to apply statistical procedures in an intuiitive way. StatSailr converts StatSailr script into R's internal representation, and executes it. The SataSailr script consists of three major blocks, TOPLEVEL, DATA, and PROC. TOPLEVEL loads and saves datasets. DATA blocks utilize DataSailr package as its backend, which enables wrting data manipulation insturctions in a rowwise way. PROC blocks have a series of PROC instructions, which are converted to R functions and are executed sequentially.
8
+ StatSailr is a Ruby program that enables users to manipulate data and to apply statistical procedures in an intuitive way. StatSailr converts StatSailr script into R's internal representation, and executes it. The StatSailr script consists of three major blocks, TOPLEVEL, DATA, and PROC. TOPLEVEL loads and saves datasets. DATA blocks utilize the DataSailr package as their backend, which enables writing data manipulation instructions in a rowwise way. PROC blocks have a series of PROC instructions, which are converted to R functions and are executed sequentially.
9
9
 
10
10
 
11
11
  ### Quick Introduction
12
12
 
13
- The following is an example of StatSailr script. It consists of TOPLEVEL instruction, DATA block and PROC blocks.
13
+ The following is an example of StatSailr script. It consists of TOPLEVEL, DATA and PROC blocks.
14
14
 
15
15
  ```
16
16
  READ builtin="mtcars"
@@ -30,6 +30,8 @@ END
30
30
  PROC REG data=new_mtcars
31
31
  lm hp ~ powerful
32
32
  END
33
+
34
+ SAVE new_mtcars file="./new_mtcars.rda" type="rdata"
33
35
  ```
34
36
 
35
37
  Save this script as, say, create_new_mtcars.slr and run.
@@ -124,7 +126,7 @@ $ gem install statsailr
124
126
  * Then, 'sailr' and 'sailrREPL' become available.
125
127
 
126
128
 
127
- ## Grammar of StatSalr
129
+ ## Grammar of StatSailr
128
130
 
129
131
  StatSailr script consists of three parts, TOPLEVEL, DATA block and PROC block.
130
132
 
@@ -140,7 +142,7 @@ TOPLEVEL statements import and save datasets, and also StatSailr's current worki
140
142
 
141
143
  Datasets can come from built-in datasets and files. In R, built-in datasets can be used by data() function, and StatSailr READ with 'builtin=' option does the same job.
142
144
 
143
- When importing datasets from files, currently there are three types of files availble, RDS, RDATA and CSV. RDS contains a single R object, and when you import it, you can neme the object using 'as=' option. If you omit 'as=' option, the name is created based on the filename. RDATA can contain multiple R objects, but their names cannot be changed when importing that are decided when saveing. CSV is a comma separated values file.
145
+ When importing datasets from files, currently there are three types of files available, RDS, RDATA and CSV. RDS contains a single R object, and when you import it, you can name the object using 'as=' option. If you omit 'as=' option, the name is created based on the filename. RDATA can contain multiple R objects, but their names cannot be changed when importing because they are decided when saving. CSV is a comma separated values file.
144
146
 
145
147
  These dataset types are decided as follows. If you specify 'type' option, its type is used. If you do not specify it, it is inferred from the file extension.
146
148
 
@@ -162,7 +164,7 @@ SAVE new_mtcars file="./new_mtcars.csv" type="csv"
162
164
 
163
165
  * show and change working directory
164
166
 
165
- The concept of working directory is really important. If you run your SailrScript in an unintentional place and ouput some data, those data might overwrite your importan data.
167
+ The concept of working directory is really important. If you run your StatSailr script in an unintentional place and output some data, those data might overwrite your important data.
166
168
 
167
169
  The default working directory should be the directory where StatSailr script file exists. If you do not specify script file, such as when you run REPL, the default working directory should be the directory where you start your command (such as REPL).
168
170
 
@@ -176,7 +178,7 @@ SETWD "~/sailr_workspace"
176
178
 
177
179
  ### DATA block
178
180
 
179
- DATA block starts with the line of DATA, new dataset name and DATA options. For DATA options, 'set=' option is required which specify the input dataset. (Note that unlinke PROC options where 'data=' usually speifies input dataset, "set=" does the same job in DATA block. This difference comes from just an aesthetic reason.) Lines that follws the first DATA line represent how to manipulate input dataset. The lines are writtein in DataSailr script. END keyword specifies the end of DATA block.
181
+ DATA block starts with the line of DATA, new dataset name and DATA options. For DATA options, the 'set=' option is required, which specifies the input dataset. (Note that unlike PROC options, where 'data=' usually specifies the input dataset, "set=" does the same job in DATA block. This difference comes from just an aesthetic reason.) Lines that follow the first DATA line represent how to manipulate the input dataset. The lines are written in DataSailr script. END keyword specifies the end of DATA block.
180
182
 
181
183
  ```
182
184
  DATA new_dataset set=ori_dataset
@@ -194,24 +196,23 @@ The DataSailr script is described in detail at [its official website](https://da
194
196
 
195
197
  Briefly speaking,
196
198
 
197
- 1. Rowwise dataset manipulation
198
- + Varables correspond to column names.
199
+ 1. Row by row dataset processing
200
+ + Variable names correspond to column names.
199
201
  2. Simplified available types
200
- + Int, Double and String(=Characters) are basic types, that can be used in DataSailr script and also those values can be assigned to column value (of dataset).
201
- + Regular expression and boolean are not assigned to dataset. They can be held by variables, but do not modify dataset.
202
+ + Int, Double and String are basic types, that can be used in DataSailr script and also those values can be assigned to data sets.
203
+ + Regular expression and Boolean do not affect dataset.
202
204
  + Regular expression is used for if condition and extracting substrings.
203
205
  + Boolean is internal type that is used for if condition.
204
- 3. Assignment operator (=) creates new column with the column name same as the variable left-hand-side(LHS) of assignment operator.
206
+ 3. Assignment operator (=) creates new column or updates the existing column with the same column name as the variable left-hand-side of assignment operator.
205
207
  + If the variable already exits, the column is updated.
206
- + Exceptions are assigning regular expressions and boolen, which do not modify dataset. Variables pointing to those objects are only used in the script.
207
- 4. Control flow can be done using if-(else if)-(else) statement.
208
- + Condition part needs parentheses (), and statement part require curly braces.
208
+ + Exceptions are assigning Regular Expressions, which do not modify dataset. This is used to reuse them at different lines of code.
209
+ 4. If-else statement is the only control flow statement.
209
210
  5. Arithmetic operators
210
211
  6. Built-in functions
211
212
  + Mainly used to manipulate strings.
212
- 7. Regular expression
213
+ 7. Regular Expression
213
214
  8. UTF-8
214
- + Use UTF-8 for script and dataset. It is highly recommended that dataset should be saved using UTF-8 beforehand.
215
+ + It is highly recommended to use UTF-8 for script and dataset. Data set needs to be saved in UTF-8 beforehand.
215
216
  9. push!() and discard!() built-in functions
216
217
  + push!() can create multiple rows from current row.
217
218
  + discard!() can filter out specific rows by being used with if statements.
@@ -228,7 +229,7 @@ The PROCs gem holds basic PROC settings, such as PRINT and PLOT, and its main cl
228
229
 
229
230
  #### Format
230
231
 
231
- A typical PROC block looks like the follwing. The first line start with PROC, followed by PROC command name and PROC options. The PROC first line is followed by a list of instuctions with their main and optional arguments. The PROC block ends with END keyword.
232
+ A typical PROC block looks like the following. The first line starts with PROC, followed by PROC command name and PROC options. The PROC first line is followed by a list of instructions with their main and optional arguments. The PROC block ends with END keyword.
232
233
 
233
234
  ```
234
235
  PROC COMMAND proc_opts
@@ -241,7 +242,7 @@ END
241
242
  * COMMAND
242
243
  + PROC command name
243
244
  * proc_opts
244
- + This parameter can be refered from any instructions in this block. In other words, this can be seen as global settings of this PROC block.
245
+ + This parameter can be referred to from any instruction in this block. In other words, this can be seen as global settings of this PROC block.
245
246
  + Internally, this parameter is managed by RBridge::ParamManager.
246
247
  * proc_statement line
247
248
  + Each line consists of instruction, main argument and optional arguments. Main argument and optional arguments are separated by slash(/).
@@ -253,7 +254,7 @@ END
253
254
  + How main argument part is parsed varies on each instruction. This is defined in main_arg_and_how_to_treat variable in setting.
254
255
  + opt_args
255
256
  + Optional arguments. Values specified in opt_args are also passed to R function's argument.
256
- + opt_args part consists of (a) key-value(s) or key(s).
257
+ + opt_args part consists of (a) key-value(s).
257
258
  + Internally, this argument parsing is conducted by methods in STSBlockParseProcOpts.
258
259
 
259
260
 
@@ -1,8 +1,82 @@
1
1
  require "r_bridge"
2
2
  require_relative "./sts_block_parse_proc_opts.rb"
3
3
 
4
+ module QuotedStringSupport
5
+ def interpret_escape_sequences(str)
6
+ # This deals with escape sequences in double quoted string literals
7
+ # The behavior should be same as libsailr (or datasailr)
8
+ new_str = ""
9
+ str_array = str.split(//)
10
+ idx = 0
11
+ while( idx < str_array.size) do
12
+ c = str_array[idx]
13
+ if(c == "\\")
14
+ idx = idx + 1
15
+ c = str_array[idx]
16
+ raise "Tokenizer error: double quoted string literal should never end with \\" if idx >= str_array.size
17
+ case c
18
+ when 't'
19
+ new_str << "\t"
20
+ when 'n'
21
+ new_str << "\n"
22
+ when 'r'
23
+ new_str << "\r"
24
+ when "\\"
25
+ new_str << "\\"
26
+ when "\'"
27
+ new_str << "\'"
28
+ when "\""
29
+ new_str << "\""
30
+ when '?'
31
+ new_str << '?'
32
+ else
33
+ new_str << c
34
+ end
35
+ else
36
+ new_str << c
37
+ end
38
+ idx = idx + 1
39
+ end
40
+ return new_str
41
+ end
42
+
43
+ def escape_backslashes(str)
44
+ str.gsub("\\", "\\\\")
45
+ end
46
+ end
47
+
4
48
  module BlockSupport
5
- def type_adjust(obj , type)
49
+ include QuotedStringSupport
50
+ class QuotedStringR
51
+ include QuotedStringSupport
52
+ def initialize(str, quote_type)
53
+ raise ":dq or :sq should be specified for quote_type" unless [:dq, :sq].include? quote_type
54
+ @ori_str = str
55
+ @quote_type = quote_type
56
+ end
57
+
58
+ def to_s
59
+ to_s_for_r_bridge
60
+ end
61
+
62
+ def to_s_for_r_bridge
63
+ if @quote_type == :dq
64
+ interpret_escape_sequences( @ori_str )
65
+ else
66
+ @ori_str
67
+ end
68
+ end
69
+
70
+ def to_s_for_r_parsing
71
+ if @quote_type == :dq
72
+ %q{"} + @ori_str + %q{"}
73
+ elsif @quote_type == :sq
74
+ %q{'} + escape_backslashes( @ori_str ) + %q{'}
75
+ end
76
+ end
77
+ end
78
+
79
+ def type_adjust(obj , type , *opts)
6
80
  case type
7
81
  when :ident
8
82
  if obj.is_a?(String)
@@ -22,6 +96,28 @@ module BlockSupport
22
96
  else
23
97
  raise "GramNode with inconsistent type(#{type.to_s}) and object(#{obj.class})"
24
98
  end
99
+ when :sq_string
100
+ if obj.is_a?(String)
101
+ unless opts.include?( :retain_input_string )
102
+ # default behavior
103
+ result = obj
104
+ else
105
+ result = QuotedStringR.new( obj, :sq )
106
+ end
107
+ else
108
+ raise "GramNode with inconsistent type(#{type.to_s}) and object(#{obj.class})"
109
+ end
110
+ when :dq_string
111
+ if obj.is_a?(String)
112
+ unless opts.include?( :retain_input_string )
113
+ # default behavior
114
+ result = interpret_escape_sequences( obj )
115
+ else
116
+ result = QuotedStringR.new( obj, :dq )
117
+ end
118
+ else
119
+ raise "GramNode with inconsistent type(#{type.to_s}) and object(#{obj.class})"
120
+ end
25
121
  when :sign
26
122
  if obj.is_a?(String)
27
123
  result = RBridge::SignR.new(obj)
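The escape-sequence handling added in this file can be summarized with a short stand-alone sketch. It is not the gem's code: `ESCAPE_TABLE` and `interpret_escapes_sketch` are hypothetical names, and the sketch assumes the same rules as the diff above (`\t`, `\n` and `\r` map to control characters, any other escaped character maps to itself, and a trailing lone backslash is an error).

```ruby
# Hypothetical stand-alone sketch; names are illustrative, not part of statsailr.
ESCAPE_TABLE = { "t" => "\t", "n" => "\n", "r" => "\r" }.freeze

# Interprets backslash escapes in the body of a double quoted literal.
def interpret_escapes_sketch(str)
  # Each match is a backslash plus the following character (if any).
  str.gsub(/\\(.?)/m) do
    ch = Regexp.last_match(1)
    # An empty capture means the backslash was the last character.
    raise ArgumentError, "string literal must not end with a lone backslash" if ch.empty?
    # Known escapes are translated; everything else passes through unchanged.
    ESCAPE_TABLE.fetch(ch, ch)
  end
end

p interpret_escapes_sketch('col1\tcol2')
p interpret_escapes_sketch('say \"hi\"')
```

Single-quoted literals skip this step entirely in the diff above, which is why `type_adjust` keeps `:sq_string` and `:dq_string` as separate cases.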
@@ -137,10 +233,16 @@ class ProcBlock
137
233
  idx = 0
138
234
  while idx < proc_stmt_arg_ori.size() do
139
235
  elem = proc_stmt_arg_ori[idx]
140
- if( elem.type == :sign && elem.e1 == "/" )
141
- break
236
+ next_elem = proc_stmt_arg_ori[idx + 1]
237
+ next_next_elem = proc_stmt_arg_ori[idx + 2]
238
+ if(elem.type == :sign && elem.e1 == "/" ) then
239
+ if( ! next_elem.nil? && next_elem.type == :ident && # After /, optional arguments start
240
+ ! next_next_elem.nil? && next_next_elem.type == :sign && next_next_elem.e1 == "=") ||
241
+ next_elem.nil? then # After /, there is nothing
242
+ break
243
+ end
142
244
  else
143
- x = type_adjust( elem.e1, elem.type )
245
+ x = type_adjust( elem.e1, elem.type, :retain_input_string )
144
246
  proc_stmt_arg << x
145
247
  idx = idx + 1
146
248
  end
@@ -50,7 +50,6 @@ class STSBlockParseProcOpts
50
50
  # arg_opt : IDENT = primary
51
51
  # | IDENT = array
52
52
  # | IDENT = func
53
- # | IDENT
54
53
  #
55
54
  # parimary : STRING
56
55
  # | NUM
@@ -87,15 +86,15 @@ class STSBlockParseProcOpts
87
86
  else
88
87
  opt_value = ident() # According to BNF, this should be parimary(). However, ident() is more direct and makes sense here.
89
88
  end
90
- elsif [:string, :num].include? peek.type
89
+ elsif [:dq_string, :sq_string, :num].include? peek.type
91
90
  next_token()
92
91
  opt_value = primary()
93
92
  else
94
93
  p current_token()
95
- raise "the token should be :ident or primaries such as :ident, :num and :string after = . Current token: " + current_token().type.to_s
94
+ raise "the token should be :ident or primaries such as :ident, :num and :string after = . Next token: " + peek.type.to_s
96
95
  end
97
96
  else
98
- opt_value = true
97
+ raise "proc instruction optional arguments should be in the form of a sequence of 'key = value'"
99
98
  end
100
99
  @result_hash[opt_key.to_s] = opt_value
101
100
  end
@@ -127,12 +126,14 @@ class STSBlockParseProcOpts
127
126
  case current_token.type
128
127
  when :ident
129
128
  return ident()
130
- when :string
129
+ when :dq_string
130
+ return string()
131
+ when :sq_string
131
132
  return string()
132
133
  when :num
133
134
  return num()
134
135
  else
135
- raise "the current token should be :ident, :string or :num."
136
+ raise "the current token should be :ident, :dq_string, :sq_string or :num."
136
137
  end
137
138
  end
138
139
 
@@ -155,8 +156,13 @@ class STSBlockParseProcOpts
155
156
  end
156
157
 
157
158
  def string()
158
- raise "the current token should be string" if current_token.type != :string
159
- return type_adjust( current_token.e1, :string)
159
+ if (current_token.type == :dq_string)
160
+ return type_adjust( current_token.e1, :dq_string)
161
+ elsif (current_token.type == :sq_string)
162
+ return type_adjust( current_token.e1, :sq_string)
163
+ else
164
+ raise "the current token should be string (:dq_string or :sq_string)"
165
+ end
160
166
  end
161
167
 
162
168
  def num()
@@ -40,6 +40,50 @@ module LazyFuncGeneratorSettingUtility
40
40
  return RBridge.create_strvec( ary.map(){|elem| elem.to_s } )
41
41
  end
42
42
 
43
+ def read_symbols_or_functions_as_strvec(ary)
44
+ deapth_ary = Array.new( ary.size )
45
+ idx = 0
46
+ last_idx = ary.size - 1
47
+ deapth = 0
48
+ while( idx <= last_idx )
49
+ if( ary[idx].is_a?(RBridge::SymbolR) && (ary[idx + 1].is_a?(RBridge::SignR) && ary[idx + 1].to_s == "("))
50
+ # function starts
51
+ deapth_ary[idx] = deapth
52
+ deapth = deapth + 1
53
+ idx = idx + 1
54
+ deapth_ary[idx] = deapth
55
+ elsif( ary[idx].is_a?(RBridge::SignR) && ary[idx].to_s == "(")
56
+ # parenthesis starts
57
+ deapth = deapth + 1
58
+ deapth_ary[idx] = deapth
59
+ elsif( ary[idx].is_a?(RBridge::SignR) && ary[idx].to_s == ")")
60
+ # parenthesis ends or function ends
61
+ deapth_ary[idx] = deapth
62
+ deapth = deapth - 1
63
+ else
64
+ deapth_ary[idx] = deapth
65
+ end
66
+ idx = idx + 1
67
+ end
68
+
69
+ result_ary = []
70
+ ary.zip( deapth_ary).each(){|elem, deapth|
71
+ if elem.respond_to? :to_s_for_r_parsing
72
+ elem_str = elem.to_s_for_r_parsing
73
+ else
74
+ elem_str = elem.to_s
75
+ end
76
+
77
+ if( deapth == 0)
78
+ result_ary.push( elem_str )
79
+ else
80
+ result_ary.last << " " << elem_str
81
+ end
82
+ }
83
+
84
+ return RBridge.create_strvec( result_ary )
85
+ end
86
+
43
87
  def result( name , *addl )
44
88
  if addl.empty?
45
89
  return RBridge::RResultName.new(name)
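The grouping performed by `read_symbols_or_functions_as_strvec` can be sketched without RBridge. The version below is a simplified model under stated assumptions: tokens are plain strings rather than `RBridge::SymbolR` / `RBridge::SignR` objects, `group_top_level` is a hypothetical name, and a leading `(` starts its own group instead of failing.

```ruby
# Groups tokens so that each top-level symbol or whole function call
# becomes one string, mirroring the depth tracking in the diff above.
def group_top_level(tokens)
  depth = 0
  groups = []
  tokens.each do |tok|
    # An opening parenthesis belongs to the nesting level it opens.
    depth += 1 if tok == "("
    if depth.zero? || groups.empty?
      groups << tok.dup            # top-level token starts a new group
    else
      groups.last << " " << tok    # nested token is appended to the current group
    end
    # A closing parenthesis belongs to the level it closes.
    depth -= 1 if tok == ")"
  end
  groups
end

p group_top_level(%w[hp factor ( cyl ) wt])
```

In the gem the resulting array is passed to `RBridge.create_strvec`, so each group would become one element of an R character vector.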