statsailr 0.7.1 → 0.7.2
- checksums.yaml +4 -4
- data/README.md +20 -19
- data/lib/statsailr/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz:
-   data.tar.gz:
+   metadata.gz: 428f469213d12c0613eb7eeab1690cad296ac94cff448bea96f140b55f2d8a24
+   data.tar.gz: 684382167f17c762e22c752ac3303ec6a5666cbf83f02ef672f515167e43d2b6
  SHA512:
-   metadata.gz:
-   data.tar.gz:
+   metadata.gz: 64352807559d052da5415f75f0917f629146cf5b9671bdce7b294720795404b954f16569930d5ad6d68c40465a46efc2ab2a2d9e4d5d9f18b2673926d98045e5
+   data.tar.gz: 4bf58f63503187e3e8a33aae1485ef226123c03e8741593c2f6cf434e90d24b055d119b921edf8ccc69f43e5d93398ec46fcb6067eb2f1be4c9cd9f4c192361f
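The SHA256 and SHA512 entries recorded in checksums.yaml can be checked by hand. The following is a minimal Ruby sketch, assuming the `.gem` archive has already been unpacked so that `metadata.gz` and `data.tar.gz` exist as files. `digests_match?` is a hypothetical helper written for this illustration (it is not part of RubyGems or statsailr), and the payload is a stand-in string rather than real package data.

```ruby
require "digest"

# Hypothetical helper (not part of RubyGems or statsailr): returns true when
# both the SHA256 and SHA512 hex digests of `bytes` equal the expected values.
def digests_match?(bytes, expected_sha256:, expected_sha512:)
  Digest::SHA256.hexdigest(bytes) == expected_sha256 &&
    Digest::SHA512.hexdigest(bytes) == expected_sha512
end

# Stand-in payload. With a real gem this would be something like
# File.binread("data.tar.gz"), and the expected values would be the
# hex strings listed in checksums.yaml.
payload = "example payload"
sha256 = Digest::SHA256.hexdigest(payload)
sha512 = Digest::SHA512.hexdigest(payload)

# Unmodified bytes match both digests.
puts digests_match?(payload, expected_sha256: sha256, expected_sha512: sha512)
# Any change to the bytes makes the comparison fail.
puts digests_match?(payload + "x", expected_sha256: sha256, expected_sha512: sha512)
```

Comparing both digest families mirrors the layout of checksums.yaml, which lists each archive member once under SHA256 and once under SHA512.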
data/README.md
CHANGED
@@ -1,16 +1,16 @@
  # StatSailr

- StatSailr provides a platform for users to focus on statistics. The backend statistics engine is [R](https://www.r-project.org/), so the results are reliable. The
+ StatSailr provides a platform for users to focus on statistics. The backend statistics engine is [R](https://www.r-project.org/), so the results are reliable. The StatSailr script consists of three major blocks, TOPLEVEL, DATA, and PROC. Each block has its own way of writing instructions, which works as an intuitive interface for R.


  ## Overview

- StatSailr is a Ruby program that enables users to manipulate data and to apply statistical procedures in an
+ StatSailr is a Ruby program that enables users to manipulate data and to apply statistical procedures in an intuitive way. StatSailr converts a StatSailr script into R's internal representation and executes it. The StatSailr script consists of three major blocks, TOPLEVEL, DATA, and PROC. TOPLEVEL loads and saves datasets. DATA blocks use the DataSailr package as their backend, which enables writing data manipulation instructions in a row-wise way. PROC blocks have a series of PROC instructions, which are converted to R functions and executed sequentially.


  ### Quick Introduction

- The following is an example of StatSailr script. It consists of TOPLEVEL
+ The following is an example of StatSailr script. It consists of TOPLEVEL, DATA and PROC blocks.

  ```
  READ builtin="mtcars"
@@ -30,6 +30,8 @@ END
  PROC REG data=new_mtcars
  lm hp ~ powerful
  END
+
+ SAVE new_mtcars file="./new_mtcars.rda" type="rdata"
  ```

  Save this script as, say, create_new_mtcars.slr and run.
@@ -124,7 +126,7 @@ $ gem install statsailr
  * Then, 'sailr' and 'sailrREPL' become available.


- ## Grammar of
+ ## Grammar of StatSailr

  StatSailr script consists of three parts, TOPLEVEL, DATA block and PROC block.

@@ -140,7 +142,7 @@ TOPLEVEL statements import and save datasets, and also StatSailr's current worki

  Datasets can come from built-in datasets and files. In R, built-in datasets can be used by data() function, and StatSailr READ with 'builtin=' option does the same job.

- When importing datasets from files, currently there are three types of files
+ When importing datasets from files, currently there are three types of files available, RDS, RDATA and CSV. RDS contains a single R object, and when you import it, you can name the object using the 'as=' option. If you omit the 'as=' option, the name is created based on the filename. RDATA can contain multiple R objects, but their names are decided when saving and cannot be changed when importing. CSV is a comma-separated values file.

  These dataset types are decided as follows. If you specify 'type' option, its type is used. If you do not specify it, it is inferred from the file extension.

@@ -162,7 +164,7 @@ SAVE new_mtcars file="./new_mtcars.csv" type="csv"

  * show and change working directory

- The concept of working directory is really important. If you run your
+ The concept of working directory is really important. If you run your StatSailr script in an unintentional place and output some data, those data might overwrite your important data.

  The default working directory should be the directory where StatSailr script file exists. If you do not specify script file, such as when you run REPL, the default working directory should be the directory where you start your command (such as REPL).

@@ -176,7 +178,7 @@ SETWD "~/sailr_workspace"

  ### DATA block

- DATA block starts with the line of DATA, new dataset name and DATA options. For DATA options, 'set=' option is required which specifies the input dataset. (Note that
+ DATA block starts with the line of DATA, new dataset name and DATA options. For DATA options, 'set=' option is required which specifies the input dataset. (Note that unlike PROC options where 'data=' usually specifies input dataset, "set=" does the same job in DATA block. This difference comes from just an aesthetic reason.) Lines that follow the first DATA line represent how to manipulate input dataset. The lines are written in DataSailr script. END keyword specifies the end of DATA block.

  ```
  DATA new_dataset set=ori_dataset
@@ -194,24 +196,23 @@ The DataSailr script is described in detail at [its official website](https://da

  Briefly speaking,

- 1.
-     +
+ 1. Row by row dataset processing
+     + Variable names correspond to column names.
  2. Simplified available types
-     + Int, Double and String
-     + Regular expression and
+     + Int, Double and String are basic types that can be used in DataSailr script, and their values can be assigned to datasets.
+     + Regular expression and Boolean do not affect dataset.
      + Regular expression is used for if condition and extracting substrings.
      + Boolean is internal type that is used for if condition.
- 3. Assignment operator (=) creates new column with the column name
+ 3. Assignment operator (=) creates new column or updates the existing column with the same column name as the variable on the left-hand side of the assignment operator.
      + If the variable already exists, the column is updated.
-     + Exceptions are assigning
- 4.
-     + Condition part needs parentheses (), and statement part require curly braces.
+     + Exceptions are assigning Regular Expressions, which do not modify dataset. This is used to reuse them at different lines of code.
+ 4. If-else statement is the only control flow statement.
  5. Arithmetic operators
  6. Built-in functions
      + Mainly used to manipulate strings.
- 7. Regular
+ 7. Regular Expression
  8. UTF-8
-     +
+     + It is highly recommended to use UTF-8 for script and dataset. Data set needs to be saved in UTF-8 beforehand.
  9. push!() and discard!() built-in functions
      + push!() can create multiple rows from current row.
      + discard!() can filter out specific rows by being used with if statements.
@@ -228,7 +229,7 @@ The PROCs gem holds basic PROC settings, such as PRINT and PLOT, and its main cl

  #### Format

- A typical PROC block looks like the
+ A typical PROC block looks like the following. The first line starts with PROC, followed by PROC command name and PROC options. The PROC first line is followed by a list of instructions with their main and optional arguments. The PROC block ends with END keyword.

  ```
  PROC COMMAND proc_opts
@@ -241,7 +242,7 @@ END
  * COMMAND
      + PROC command name
  * proc_opts
-     + This parameter can be
+     + This parameter can be referred to from any instruction in this block. In other words, this can be seen as global settings of this PROC block.
      + Internally, this parameter is managed by RBridge::ParamManager.
  * proc_statement line
      + Each line consists of instruction, main argument and optional arguments. Main argument and optional arguments are separated by slash(/).
data/lib/statsailr/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: statsailr
  version: !ruby/object:Gem::Version
-   version: 0.7.
+   version: 0.7.2
  platform: ruby
  authors:
  - Toshihiro Umehara
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2021-02-
+ date: 2021-02-24 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: r_bridge