statsailr_procs_base 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +10 -0
- data/Gemfile +10 -0
- data/LICENSE.txt +675 -0
- data/README.md +195 -0
- data/Rakefile +12 -0
- data/bin/console +15 -0
- data/bin/setup +8 -0
- data/lib/statsailr_procs_base.rb +16 -0
- data/lib/statsailr_procs_base/check_statsailr_version.rb +18 -0
- data/lib/statsailr_procs_base/path.rb +7 -0
- data/lib/statsailr_procs_base/proc_setting/common_utility.R +16 -0
- data/lib/statsailr_procs_base/proc_setting/proc_cat.R +65 -0
- data/lib/statsailr_procs_base/proc_setting/proc_cat.rb +44 -0
- data/lib/statsailr_procs_base/proc_setting/proc_common/dev_copy.R +6 -0
- data/lib/statsailr_procs_base/proc_setting/proc_common/dev_copy.rb +15 -0
- data/lib/statsailr_procs_base/proc_setting/proc_common/factor.R +7 -0
- data/lib/statsailr_procs_base/proc_setting/proc_common/factor.rb +15 -0
- data/lib/statsailr_procs_base/proc_setting/proc_common/numeric.R +7 -0
- data/lib/statsailr_procs_base/proc_setting/proc_common/numeric.rb +15 -0
- data/lib/statsailr_procs_base/proc_setting/proc_mult.R +12 -0
- data/lib/statsailr_procs_base/proc_setting/proc_mult.rb +35 -0
- data/lib/statsailr_procs_base/proc_setting/proc_plot.R +52 -0
- data/lib/statsailr_procs_base/proc_setting/proc_plot.rb +54 -0
- data/lib/statsailr_procs_base/proc_setting/proc_print.R +60 -0
- data/lib/statsailr_procs_base/proc_setting/proc_print.rb +44 -0
- data/lib/statsailr_procs_base/proc_setting/proc_reg.rb +17 -0
- data/lib/statsailr_procs_base/proc_setting/proc_two.R +33 -0
- data/lib/statsailr_procs_base/proc_setting/proc_two.rb +38 -0
- data/lib/statsailr_procs_base/proc_setting/proc_uni.R +105 -0
- data/lib/statsailr_procs_base/proc_setting/proc_uni.rb +27 -0
- data/lib/statsailr_procs_base/version.rb +7 -0
- data/statsailr_procs_base.gemspec +37 -0
- metadata +80 -0
data/README.md
ADDED
@@ -0,0 +1,195 @@
# About statsailr_procs_base gem

The 'statsailr_procs_base' gem provides a collection of fundamental PROC settings and makes those PROCs available to StatSailr programs. The 'statsailr' gem itself does not contain PROC settings, so this gem is essential for StatSailr to work as a useful statistics system.

These basic PROC settings are separated from the main 'statsailr' gem into the 'statsailr_procs_base' gem for maintainability. The StatSailr system (i.e. main system + PROCs) is still under development and is updated frequently. The PROCs and the main system do not have specific release cycles yet. For those reasons, it is better to update them as separate gems.

## Installation

This gem is usually installed together with the 'statsailr' gem, following the dependency setting in statsailr.gemspec. If you want to try the most up-to-date version, install it manually from GitHub.

## How each StatSailr's PROC functionality is defined

How each PROC block behaves and which instructions it supports are defined by PROC settings, which this gem provides. StatSailr calls the path_to_proc_setting method of the gem that provides PROCs (StatSailr::ProcsBase::path_to_proc_setting in this gem) and uses the returned path to access the gem's PROC settings. The method returns the base directory path that contains the PROC settings (i.e. the lib/statsailr_procs_base/proc_setting directory). Based on the PROCs called from StatSailr, the corresponding setting files are identified and the settings are loaded. Based on the settings, appropriate R functions are generated and executed.

Following these settings, StatSailr converts a PROC block to a series of R functions. PROC settings are written as Ruby modules. Internally, StatSailr creates a Ruby object for each block, and those objects extend their functionality by mixing in the corresponding PROC setting module. Currently, the file and module names are determined from the PROC's command name. For example, if the block starts with 'PROC CAT', the 'ProcCat' module written in 'proc_cat.rb' is used to extend the functionality.

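The naming convention described above can be sketched in plain Ruby. This is only an illustration under assumptions: the helper names `proc_module_name` and `proc_setting_file` are hypothetical and are not part of StatSailr, which may derive the names differently.

```ruby
# Hypothetical helpers (not StatSailr API): derive the setting module name
# and the setting file name from a PROC command name such as "CAT".
def proc_module_name(command)
  "Proc" + command.downcase.capitalize
end

def proc_setting_file(command)
  "proc_" + command.downcase + ".rb"
end

puts proc_module_name("CAT")   # ProcCat
puts proc_setting_file("CAT")  # proc_cat.rb
```
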
## Understanding PROC settings

The following is a typical PROC setting, followed by an explanation of each part.

```
# proc_print.rb

module ProcPrint
  def setting_for_head( setting )
    setting.libname = "utils"
    setting.func_name = "head"
    setting.main_arg_and_how_to_treat = [ "n", "read_as_intvec", "allow_nil"]
    setting.runtime_args = { "x" => param("data")}
    setting.store_result = false
    setting.print_opt = true
  end
end
```

* module ProcPrint
  + This module holds the settings for the 'PROC PRINT' block.
* def setting_for_head
  + This method holds the setting for the 'head' instruction.
* setting.libname
  + When libname is specified, the function is looked up in that package.
  + In the above example, this is equivalent to utils::head().
  + When libname is set to nil, the function is looked up in an environment or in the global environment.
* setting.envname
  + When libname is nil and envname is specified, the function is looked up in this environment.
  + This lets each PROC implement its own R functions within its own environment, which prevents function name conflicts.
* setting.func_name
  + The name of the R function to be called.
  + A package name or environment name can be specified as mentioned above.
* setting.main_arg_and_how_to_treat
  + This is specified as an Array of length 3.
  + The first element is the name of the R function argument that the main argument is passed to.
  + The second element is how to treat the main argument.
  + The third element is whether the main argument may be omitted: "allow_nil" or "no_nil".
  + In the above example, the value specified for the main argument is interpreted as an intvec and is passed to the R function's 'n' argument.
  + "allow_nil" allows users to omit the main argument, in which case head's default value of 6 is used. (See head documentation https://stat.ethz.ch/R-manual/R-devel/library/utils/html/head.html )
  + If the instruction does not take a main argument, nil or [nil, nil, nil] is specified.
* setting.runtime_args
  + This is specified as a Hash or nil.
  + This setting is used to pass objects that are generated at runtime to the R function.
  + These objects include objects specified in PROC options and results of previous instructions within the same block.
  + To define them, the param(), result() and one_from() methods can be used. Details are described later.
* setting.store_result
  + true/false
    + If this setting is true, this instruction's result can be accessed via setting.runtime_args.
    + If set to false, this instruction's result cannot be accessed.
* setting.print_opt
  + true/false
    + If this setting is true, the result is printed out using the "print" function.
    + If set to false, the result is not printed out.
  + String
    + If a String value is set, the result is printed out using the name specified.
* setting.plot_opt
  + true/false
    + This setting takes effect only when the graphics device does not show on a display and output goes to a file.
    + If true, StatSailr calls dev.copy() at the end of the current instruction and saves the graphics device content to a file.
    + If the current graphics device outputs to a display, this setting is ignored.

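The lookup order described for libname and envname can be illustrated with a small sketch. It assumes a hypothetical `Setting` Struct and a hypothetical `r_function_ref` helper. Neither is the real StatSailr setting class, and the actual lookup is performed on the R side.

```ruby
# Hypothetical stand-in for the setting object (not the real StatSailr class).
Setting = Struct.new(:libname, :envname, :func_name)

# Build the R expression that names the function, following the order
# described above: package first, then environment, then a plain name.
def r_function_ref(setting)
  if setting.libname
    "#{setting.libname}::#{setting.func_name}"
  elsif setting.envname
    "#{setting.envname}$#{setting.func_name}"
  else
    setting.func_name
  end
end

puts r_function_ref(Setting.new("utils", nil, "head"))     # utils::head
puts r_function_ref(Setting.new(nil, "sts_cat", "table"))  # sts_cat$table
```
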
## More about main_arg_and_how_to_treat

The following methods can be used to parse a PROC instruction's main argument and convert it to an R object. The method name needs to be specified as a String.

* read_as_formula
* read_as_strvec
* read_as_one_str
* read_as_numvec
* read_as_intvec
* read_as_realvec
* read_as_symbol
* read_symbols_as_strvec

## More about setting.runtime_args

This setting is used to access objects that are generated at runtime and pass them to the R function. These objects include objects that are specified in PROC options and results that are returned from previous instructions within the same block.

* param() method
  + param() accesses an object specified in a PROC option.
  + (The name 'param' is used because PROC option parameters are managed by RBridge::ParamManager.)
  + For example, "x" => param("data") means taking the object that is named "data" in the PROC options and passing it to the "x" argument of the R function.
* result() method
  + result() accesses the result object of the instruction specified by instruction name.
  + If result() takes multiple instruction names as arguments, it returns a ResultNameArray object.
  + In that case, the last result among the specified instructions is used.
* Pointer to R object
  + An R object that is statically generated can also be used. Usually this serves as a default value.
* one_from() method
  + one_from() can take a ResultName, ResultNameArray, ParamName or pointer to an R object.
  + one_from() can take multiple arguments, and their order defines the priority of which object to use.
  + (e.g.) 'setting.runtime_args = {"data" => one_from( result("factor", "numeric"), param("data")) }' means the following:
    + If "factor" or "numeric" instructions have already run, use the last result among them.
    + If not, the object that is specified by "data" in the PROC options is used.

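The priority rule of one_from() can be modelled in a few lines of plain Ruby. This is a sketch under assumptions, not the StatSailr implementation: plain Hashes stand in for the result and parameter managers, `last_result` and `pick_one_from` are hypothetical names, and "last result" is assumed to mean the most recently executed of the named instructions.

```ruby
# Hypothetical model: `results` maps instruction names to their results in
# execution order; Ruby Hashes preserve insertion order.
def last_result(results, *instruction_names)
  name = results.keys.reverse.find { |k| instruction_names.include?(k) }
  name && results[name]
end

# Return the first candidate that is available (not nil).
def pick_one_from(*candidates)
  candidates.find { |c| !c.nil? }
end

options = { "data" => "iris" }

# A "factor" instruction has already run, so its result wins.
results = { "factor" => "iris_with_factor" }
puts pick_one_from(last_result(results, "factor", "numeric"), options["data"])

# No earlier instruction: fall back to the "data" PROC option.
puts pick_one_from(last_result({}, "factor", "numeric"), options["data"])
```
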
## Enable custom R function

StatSailr PROC instructions can also call custom R functions that are not defined in R's libraries. This is done with the source_r_file() method, which is provided by the ProcSettingModule module. source_r_file() makes the R functions defined in the given file available to R and to StatSailr PROC instructions.

The following example enables the functions in proc_scatter.R. (The setting file name is proc_scatter.rb, and File.basename(__FILE__ , ".rb") + ".R" replaces the extension with ".R".) Note that source_r_file()'s first argument takes the absolute path of the directory containing the R file. This restriction prevents unintended files from being loaded.

```
# proc_scatter.rb
module ProcScatter
  include ProcSettingModule
  ...
  source_r_file( __dir__, File.basename(__FILE__ , ".rb") + ".R")
  ...
  ...
end
```

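The file name expression passed to source_r_file() can be checked on its own. The path below is a hypothetical placeholder. In a real setting file, `__FILE__` supplies the file path and `__dir__` supplies the absolute directory of that file.

```ruby
# File.basename strips the directory and, when given, the suffix.
setting_file = "/path/to/proc_setting/proc_scatter.rb"  # hypothetical path

r_file = File.basename(setting_file, ".rb") + ".R"
puts r_file                      # proc_scatter.R
puts File.dirname(setting_file)  # /path/to/proc_setting
```
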
## Validate PROC options

Each PROC option (i.e. an option that follows the PROC command) can be validated. The option name, the option value type and whether the option is required can be checked, and types are converted appropriately if necessary.

For example, many PROCs require the data option. To guarantee that the data option exists, pass "data" with "required: true" to validate_option().

* validate_option() method takes the following arguments
  + option_name : the option to validate
  + is_a : the value assigned to the option should belong to the specified type(s).
    + "Float", "Integer", "String" or "SymbolR" can be specified.
    + An Array can be used to allow several types.
  + as : the value is finally treated as the specified type.
  + required : whether the PROC command requires the option

The following example specifies that PROC PRINT requires the 'data' option, that SymbolR or String is accepted for its value, and that the value is finally treated as SymbolR.

```
module ProcPrint
  include ProcSettingModule
  ...
  validate_option("data", is_a: ["SymbolR", "String"], as: "SymbolR", required: true)
end
```

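The checks listed above (presence, accepted types, final type) can be sketched as a standalone function. Everything here is an assumption made for illustration: `check_option` is a hypothetical name, `SymbolR` is modelled with a small Struct, and only the String to SymbolR conversion is shown. The real validate_option() may behave differently.

```ruby
# Hypothetical model of SymbolR; the real class is defined elsewhere.
SymbolR = Struct.new(:name)

# Hypothetical validator (not the StatSailr implementation).
def check_option(options, option_name, is_a:, as:, required:)
  unless options.key?(option_name)
    raise ArgumentError, "option '#{option_name}' is required" if required
    return nil
  end

  value = options[option_name]
  allowed = Array(is_a)
  unless allowed.include?(value.class.name)
    raise TypeError, "option '#{option_name}' must be one of #{allowed.join(', ')}"
  end

  # Only the String -> SymbolR conversion is sketched here.
  (as == "SymbolR" && value.is_a?(String)) ? SymbolR.new(value) : value
end

p check_option({ "data" => "iris" }, "data",
               is_a: ["SymbolR", "String"], as: "SymbolR", required: true)
```
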
## Share common settings with multiple settings

StatSailr PROC instruction settings can be shared by multiple setting modules. A shared setting can be included using the add_setting_from() method defined in ProcSettingModule. The included file is also written as a module, and its format is almost the same.

In the following example, ProcPlot includes DevCopySetting, which provides the functionality of the dev.copy() function.

```
# proc_plot.rb
module ProcPlot
  include ProcSettingModule
  add_setting_from( __dir__, "proc_common/dev_copy.rb" ) # This file should have DevCopySetting module.
  ...
  ...
end

# proc_common/dev_copy.rb
module DevCopySetting
  def setting_for_dev_copy( setting )
    ...
    ...
  end
end
```

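The effect of sharing a setting module can be seen with ordinary Ruby mixins. In this sketch a direct `include` stands in for add_setting_from(), whose internals are not reproduced here, and a plain Hash stands in for the setting object. Both are assumptions made for illustration.

```ruby
# Shared setting module, written once.
module DevCopySetting
  def setting_for_dev_copy(setting)
    setting[:func_name] = "dev_copy"
    setting
  end
end

# Two PROC setting modules reuse it (a direct include stands in for
# add_setting_from in this sketch).
module ProcPlot
  include DevCopySetting
end

module ProcScatter
  include DevCopySetting
end

# An object that mixes in a PROC module also gains the shared setting method.
block = Object.new.extend(ProcPlot)
p block.setting_for_dev_copy({})
```
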
## Contributing

Bug reports are welcome on GitHub at https://github.com/niceume/statsailr_procs_base.

## License

The gem is available as open source under the terms of the [GPL v3 License](https://www.gnu.org/licenses/gpl-3.0.en.html).

data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,15 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/setup"
require "statsailr_procs_base"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start

require "irb"
IRB.start(__FILE__)
data/bin/setup
ADDED
data/lib/statsailr_procs_base.rb
ADDED
@@ -0,0 +1,16 @@
# frozen_string_literal: true

require_relative "statsailr_procs_base/version"
require_relative "statsailr_procs_base/path"
require_relative "statsailr_procs_base/check_statsailr_version"

module StatSailr
  module ProcsBase
    class Error < StandardError; end

    # Your code goes here...
  end
end

StatSailr::ProcsBase::check_statsailr_version()

data/lib/statsailr_procs_base/check_statsailr_version.rb
ADDED
@@ -0,0 +1,18 @@
module StatSailr
  module ProcsBase
    def self.check_statsailr_version
      statsailr_version = StatSailr::VERSION
      statsailr_version_at_least = "0.7.1"
      return version_should_be_at_least( statsailr_version , statsailr_version_at_least )
    end

    def self.version_should_be_at_least( gem_version, str_version )
      if( Gem::Version.new(gem_version) >= Gem::Version.new(str_version) )
        return true
      else
        puts "\e[1m" + "WARNING: statsailr gem (#{gem_version}) needs to be newer or equal to ver. #{str_version}" + "\e[22m"
        return false
      end
    end
  end
end
data/lib/statsailr_procs_base/proc_setting/common_utility.R
ADDED
@@ -0,0 +1,16 @@
if_exist_else = function( name, lst, default_val ){
  if( name %in% names(lst)){
    return( lst[[name]] )
  }else{
    return( default_val )
  }
}

get_pkg_fun = function(x) {
  if(length(grep("::", x))>0) {
    parts<-strsplit(x, "::")[[1]]
    getExportedValue(parts[1], parts[2])
  } else {
    x
  }
}
data/lib/statsailr_procs_base/proc_setting/proc_cat.R
ADDED
@@ -0,0 +1,65 @@
sts_cat = new.env()

sts_cat$table = function( data , vars , missing = FALSE, ... ){
  if( (! is.character(vars)) || length(vars) != 2 ){
    stop("vars argument requires character vector with size of 2")
  }

  df_for_table = data[ , vars ]
  if( missing ){
    freq_ori = table(df_for_table, useNA = "ifany")
  }else{
    freq_ori = table(df_for_table)
  }

  freq = addmargins( freq_ori )
  allper = format( addmargins( prop.table( freq_ori) * 100), digits = 4)
  rowper = format( addmargins( prop.table( freq_ori, 1) * 100), digits = 4)
  colper = format( addmargins( prop.table( freq_ori, 2) * 100), digits = 4)

  nrow = nrow(freq)
  ncol = ncol(freq)
  nrow_ori = nrow(freq_ori)

  temp = as.array( numeric( nrow * ncol * 4 ))
  dim( temp ) = c( nrow * 4, ncol )
  result = as.table( temp )

  colnames(result) = c( colnames( freq_ori ) , "Total")

  row_label1 = c( rep("%",nrow_ori ) , "%")
  row_label2 = c( rep("Row%",nrow_ori ) , "")
  row_label3 = c( rep("Col%",nrow_ori ) , "")

  rownames(result) = as.vector( mapply( c, c( rownames( freq_ori ), "Total" ), row_label1 , row_label2 , row_label3 ))

  cat( paste( "\t", names(freq), "\n", sep = "\t", collapse = "\t" ))
  for( irow in seq(1, nrow) ){
    for( icol in seq(1, ncol) ){
      result[ (irow-1)*4 + 1, icol] = freq[irow, icol]
    }
    for( icol in seq(1, ncol) ){
      result[ (irow-1)*4 + 2, icol] = allper[irow, icol]
    }
    for( icol in seq(1, ncol) ){
      if(irow != nrow && icol != ncol){
        result[ (irow-1)*4 + 3, icol] = rowper[irow, icol]
      }else{
        result[ (irow-1)*4 + 3, icol] = NA
      }
    }
    for( icol in seq(1, ncol) ){
      if(icol != ncol && irow!= nrow){
        result[ (irow-1)*4 + 4, icol] = colper[irow, icol]
      }else{
        result[ (irow-1)*4 + 4, icol] = NA
      }
    }
  }

  cat( paste( vars[1] , " vs ", vars[2], "\n" ) )
  print( result )

  return( freq_ori )
}

data/lib/statsailr_procs_base/proc_setting/proc_cat.rb
ADDED
@@ -0,0 +1,44 @@
module ProcCat
  include ProcSettingModule

  source_r_file( __dir__, File.basename(__FILE__ , ".rb") + ".R")
  validate_option("data", is_a: ["SymbolR", "String"], as: "SymbolR" , required: true)

  def setting_for_table( setting )
    setting.libname = nil
    setting.envname = "sts_cat"
    setting.func_name = "table"
    setting.main_arg_and_how_to_treat = ["vars", :read_as_strvec, :no_nil]
    setting.runtime_args = {"data" => param("data") }
    setting.store_result = true
    setting.print_opt = false
  end

  def setting_for_xtabs( setting )
    setting.libname = "stats"
    setting.func_name = "xtabs"
    setting.main_arg_and_how_to_treat = ["formula", :read_as_formula, :no_nil]
    setting.runtime_args = {"data" => param("data") }
    setting.store_result = true
    setting.print_opt = true
  end

  def setting_for_fisher_test( setting )
    setting.libname = "stats"
    setting.func_name = "fisher.test"
    setting.main_arg_and_how_to_treat = nil
    setting.runtime_args = {"x" => one_from( result("table"), result("xtabs") ) }
    setting.store_result = true
    setting.print_opt = true
  end

  def setting_for_chisq_test( setting )
    setting.libname = "stats"
    setting.func_name = "chisq.test"
    setting.main_arg_and_how_to_treat = nil
    setting.runtime_args = {"x" => one_from( result("table"), result("xtabs") ) }
    setting.store_result = true
    setting.print_opt = true
  end
end

data/lib/statsailr_procs_base/proc_setting/proc_common/dev_copy.rb
ADDED
@@ -0,0 +1,15 @@
module DevCopySetting
  include ProcSettingModule
  source_r_file(__dir__, File.basename(__FILE__ , ".rb") + ".R")

  def setting_for_dev_copy( setting )
    setting.libname = nil
    setting.envname = "sts_dev_copy"
    setting.func_name = "dev_copy"
    setting.main_arg_and_how_to_treat = [ "device" , :read_as_symbol , :allow_nil ]
    setting.runtime_args = {}
    setting.store_result = false
    setting.print_opt = false
  end
end
