spout 0.11.0.beta2 → 0.11.0.beta3
- checksums.yaml +4 -4
- data/CHANGELOG.md +1 -0
- data/README.md +65 -30
- data/lib/spout/commands/deploy.rb +14 -3
- data/lib/spout/commands/help.rb +3 -1
- data/lib/spout/tasks/engine.rake +1 -1
- data/lib/spout/tests.rb +17 -8
- data/lib/spout/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0e8b31dbf0428ffd5c44e2cc701090313d57b5e8
+  data.tar.gz: d4d827b412188a66799084fca3fa306279264b39
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b078712373a6c6460bb791dd01c4a515bc9f787afc5d6d989fb1f9645b332c8ac1c6f6abf1a7cedcd51d694a08cecf77df06f792502d68de42a509779e0c2d51
+  data.tar.gz: e648d5ee7df044dc62bbfb64c6ef75cfba76aec05ef2e5f2f6f3f24270ce7bc8f424100e98288820ecde5377991ed40823002a1903bef96acb5bbd61801f6bda
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -4,9 +4,13 @@
 [](https://gemnasium.com/sleepepi/spout)
 [](https://codeclimate.com/github/sleepepi/spout)
 
-Turn your CSV data dictionary into a JSON repository. Collaborate with others to
+Turn your CSV data dictionary into a JSON repository. Collaborate with others to
+update the data dictionary in JSON format. Generate new Data Dictionary from the
+JSON repository. Test and validate your data dictionary using built-in tests, or
+add your own tests and validations.
 
-Spout has been used extensively to curate and clean datasets available on the
+Spout has been used extensively to curate and clean datasets available on the
+[National Sleep Research Resource](https://sleepdata.org).
 
 ## Installation
 
@@ -36,13 +40,19 @@ spout import data_dictionary.csv
 
 The CSV should contain at minimal the two column headers:
 
-`id`: This column will give the variable its name, and also be used to name the
+`id`: This column will give the variable its name, and also be used to name the
+file, i.e. `<id>.json`
 
-`folder`: This can be blank, however it is used to place variables into a folder
+`folder`: This can be blank, however it is used to place variables into a folder
+hiearchy. The folder column can contain forward slashes `/` to place a variable
+into a subfolder. An example may be, `id`: `myvarid`,
+`folder`: `Demographics/Subfolder` would create a file
+`variables/Demographics/Subfolder/myvarid.json`
 
 Other columns that will be interpreted include:
 
-`display_name`: The variable name as it is presented to the user. The display
+`display_name`: The variable name as it is presented to the user. The display
+name should be fit on a single line.
 
 `description`: A longer description of the variable.
 
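Reviewer sketch for the README hunk above: the added text says a row with `id` `myvarid` and `folder` `Demographics/Subfolder` ends up at `variables/Demographics/Subfolder/myvarid.json`. The Ruby below illustrates that mapping only. The helper name `variable_path` and the treatment of a blank folder (file placed directly under `variables/`) are assumptions made for illustration, not Spout's importer code.

```ruby
require 'csv'

# Hypothetical helper (not part of Spout): derive the JSON path for one CSV row.
def variable_path(row)
  # A blank folder parses as nil, so normalise it to an empty string first.
  folder = row['folder'].to_s.strip
  # Forward slashes in the folder column become nested directories.
  File.join('variables', *folder.split('/'), "#{row['id']}.json")
end

csv_text = <<~DATA
  id,folder,display_name
  myvarid,Demographics/Subfolder,My Variable
  age,,Age
DATA

paths = CSV.parse(csv_text, headers: true).map { |row| variable_path(row) }
puts paths
```

The first row reproduces the path quoted in the README; the second shows what this sketch does when `folder` is left blank.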
@@ -58,13 +68,19 @@ Other columns that will be interpreted include:
 - `datetime`
 - `file`
 
-`domain`: The name of the domain that is associated with the variable.
+`domain`: The name of the domain that is associated with the variable.
+Typically, only variable of type `choices` have domains. These domains then
+reside in `domains` folder.
 
-`units`: A string of the associated that are appended to variable values, or
+`units`: A string of the associated that are appended to variable values, or
+added to coordinates in graphs representing the variable.
 
-`calculation`: A calculation represented using algebraic expressions along with
+`calculation`: A calculation represented using algebraic expressions along with
+`id` of other variables.
 
-`labels`: A series of different names for the variable that are semi-colon `;`
+`labels`: A series of different names for the variable that are semi-colon `;`
+separated. These labels are commonly synonyms, or related terms used primarily
+for searching.
 
 All other columns get grouped into a hash labeled `other`.
 
@@ -91,13 +107,15 @@ Other columns that are imported include:
 
 ### Test your repository
 
-If you created your data dictionary repository using `spout new`, you can go
+If you created your data dictionary repository using `spout new`, you can go
+ahead and test using:
 
 ```
 spout test
 ```
 
-If not, you can add the following to your `test` directory to include all Spout
+If not, you can add the following to your `test` directory to include all Spout
+tests, or just a subset of Spout tests.
 
 `test/dictionary_test.rb`
 
@@ -129,9 +147,11 @@ end
 
 Then run either `spout test` or `bundle exec rake` to run your tests.
 
-You can also use Spout iterators to create custom tests for variables, forms,
+You can also use Spout iterators to create custom tests for variables, forms,
+and domains in your data dictionary.
 
-**Example Custom Test 1:** Test that `integer` and `numeric` variables have a
+**Example Custom Test 1:** Test that `integer` and `numeric` variables have a
+valid unit type
 
 ```ruby
 class DictionaryTest < Minitest::Test
@@ -176,31 +196,41 @@ end
 
 ### Test your data dictionary coverage of your dataset
 
-Spout lets you generate a nice visual coverage report that displays how well the
+Spout lets you generate a nice visual coverage report that displays how well the
+data dictionary covers your dataset. Place your dataset csvs into
+`./csvs/<version>/` and then run the following Spout command:
 
 ```
 spout coverage
 ```
 
-This will generate an `index.html` file that can be opened and viewed in any
+This will generate an `index.html` file that can be opened and viewed in any
+browser.
 
-Spout coverage validates that values stored in your dataset match up with
+Spout coverage validates that values stored in your dataset match up with
+variables and domains defined in your data dictionary.
 
 ### Identify outliers in your dataset
 
-Spout lets you generate detect outliers in your underlying datasets. Place your
+Spout lets you generate detect outliers in your underlying datasets. Place your
+dataset csvs into `./csvs/<version>/` and then run the following Spout command:
 
 ```
 spout outliers
 ```
 
-This will generate an `outliers.html` file that can be opened and viewed in any
+This will generate an `outliers.html` file that can be opened and viewed in any
+browser.
 
-Spout outliers computes the
+Spout outliers computes the
+[inner and outer fences](http://www.wikihow.com/Calculate-Outliers) to identify
+minor and major outliers in the dataset.
 
 ### Create a CSV Data Dictionary from your JSON repository
 
-Provide an optional version parameter to name the folder the CSVs will be
+Provide an optional version parameter to name the folder the CSVs will be
+generated in, defaults to what is in `VERSION` file, or if that does not
+exist `1.0.0`.
 
 ```
 spout export
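Reviewer sketch for the outliers paragraph in the hunk above: inner and outer fences are conventionally placed 1.5 and 3 interquartile ranges beyond the first and third quartiles, with values outside the inner fences counted as minor outliers and values outside the outer fences as major ones. The Ruby below is a minimal illustration under that convention. The helper names and the linear-interpolation quartile rule are assumptions; the diff does not show how `spout outliers` computes quartiles, so its results may differ at the margins.

```ruby
# Percentile by linear interpolation on a sorted array.
# This is one of several quartile conventions (assumed here for illustration).
def percentile(sorted, fraction)
  rank = fraction * (sorted.length - 1)
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

# Inner fences sit 1.5 IQR beyond the quartiles, outer fences 3 IQR beyond.
def fences(values)
  sorted = values.sort
  q1 = percentile(sorted, 0.25)
  q3 = percentile(sorted, 0.75)
  iqr = q3 - q1
  { inner: [q1 - 1.5 * iqr, q3 + 1.5 * iqr],
    outer: [q1 - 3.0 * iqr, q3 + 3.0 * iqr] }
end

# :major lies outside the outer fences, :minor outside the inner fences only.
def classify(value, limits)
  return :major unless value.between?(*limits[:outer])
  return :minor unless value.between?(*limits[:inner])
  :none
end

limits = fences([1, 2, 3, 4, 5, 6, 7, 8, 9])
labels = [5, 14, 20].map { |value| classify(value, limits) }
p limits
p labels
```

For the sample data the quartiles are 3 and 7, so 14 falls between the inner and outer fences and 20 falls beyond the outer fence.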
@@ -220,7 +250,8 @@ spout graphs
 
 This command generates JSON charts and tables of each variable in a dataset
 
-Requires a Spout YAML configuration file, `.spout.yml`, in the root of the data
+Requires a Spout YAML configuration file, `.spout.yml`, in the root of the data
+dictionary that defines the variables used to create the charts:
 
 - `visit`: This variable is used to separate subject encounters in a histogram
 - `charts`: Array of choices, numeric, or integer variables for charts
@@ -238,15 +269,18 @@ charts:
 title: Race
 ```
 
-To only generate graphs for a few select variables, add the variable names after
+To only generate graphs for a few select variables, add the variable names after
+the `spout graphs` command.
 
-For example, the command below will only generate graphs for the two variables
+For example, the command below will only generate graphs for the two variables
+`ahi` and `bmi`.
 
 ```
 spout g ahi bmi
 ```
 
-You can also specify a limit to the amount of rows to read in from the CSV files
+You can also specify a limit to the amount of rows to read in from the CSV files
+by specifying the `-rows` flag.
 
 ```
 spout g --rows=10 ahi
@@ -255,7 +289,8 @@ spout g --rows=10 ahi
 This will generate a graph for ahi for the first 10 rows of each dataset CSV.
 
 
-This will generate charts and tables for each variable in the dataset plotted
+This will generate charts and tables for each variable in the dataset plotted
+against the variables listed under `charts`.
 
 ### Example Variable that references a Domain and a Form
 
@@ -308,7 +343,8 @@ This will generate charts and tables for each variable in the dataset plotted ag
 spout deploy NAME
 ```
 
-This command pushes a tagged version of the data dictionary to a webserver
+This command pushes a tagged version of the data dictionary to a webserver
+specified in the `.spout.yml` file.
 
 ```
 webservers:
@@ -339,8 +375,9 @@ The following steps are run:
   - `CHANGELOG.md` top line should include version, ex: `## 0.1.0`
   - Git Repo should have zero uncommitted changes
 - **Tests Pass**
-  - `spout t` passes for RC and FINAL versions
-
+  - `spout t` passes for RC and FINAL versions
+- **Dataset Coverage Check**
+  - `spout c` passes for RC and FINAL versions
 - **Graph Generation**
   - `spout g` is run
   - Graphs are pushed to server
@@ -350,6 +387,4 @@ The following steps are run:
 - **Documentation Uploads**
   - `README.md` and `KNOWNISSUES.md` are uploaded
 - **Server-Side Updates**
-  - Server checks out branch of specified tag
-  - Server runs `load_data_dictionary!` for specified dataset slug
   - Server refreshes dataset folder to reflect new dataset and data dictionaries
data/lib/spout/commands/deploy.rb
CHANGED
@@ -49,6 +49,9 @@ module Spout
         @version = version
         @skip_checks = !(argv.delete('--skip-checks').nil? && argv.delete('--no-checks').nil?)
 
+        @skip_tests = !(argv.delete('--skip-tests').nil? && argv.delete('--no-tests').nil?)
+        @skip_coverage = !(argv.delete('--skip-coverage').nil? && argv.delete('--no-coverage').nil?)
+
         @skip_variables = !(argv.delete('--skip-variables').nil? && argv.delete('--no-variables').nil?)
         @skip_dataset = !(argv.delete('--skip-dataset').nil? && argv.delete('--no-dataset').nil?)
         @skip_dictionary = !(argv.delete('--skip-dictionary').nil? && argv.delete('--no-dictionary').nil?)
@@ -80,6 +83,7 @@ module Spout
         config_file_load
         version_check
         test_check
+        coverage_check
         user_authorization
         upload_variables
         dataset_uploads
@@ -176,12 +180,12 @@ module Spout
       end
 
       def test_check
-        if @
+        if @skip_tests
           puts ' Spout Tests: ' + 'SKIP'.colorize(:blue)
           return
         end
 
-        print
+        print ' Spout Tests: '
 
         stdout = quietly do
           `spout t`
@@ -193,8 +197,15 @@ module Spout
           message = "#{INDENT}spout t".colorize(:white) + " had errors or failures".colorize(:red) + "\n#{INDENT}Please fix all errors and failures and then run spout deploy again."
           failure message
         end
+      end
+
+      def coverage_check
+        if @skip_coverage
+          puts ' Dataset Coverage: ' + 'SKIP'.colorize(:blue)
+          return
+        end
 
-        puts '
+        puts ' Dataset Coverage: ' + 'NOT IMPLEMENTED'.colorize(:yellow)
       end
 
       def user_authorization
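Reviewer sketch for the flag handling added in `deploy.rb`: each `@skip_*` ivar is set with `!(argv.delete('--skip-x').nil? && argv.delete('--no-x').nil?)`. `Array#delete` returns the removed element, or `nil` when it is absent, so the expression is true when either spelling was passed and it strips the flag from `argv` as a side effect. The Ruby below isolates that pattern; the helper name `skip_flag?` is hypothetical and does not exist in Spout.

```ruby
# Hypothetical helper mirroring the expression used in the diff.
def skip_flag?(argv, name)
  # Array#delete removes every matching element and returns it, or nil if none matched.
  !(argv.delete("--skip-#{name}").nil? && argv.delete("--no-#{name}").nil?)
end

argv = ['production', '--skip-tests', 'age', '--no-coverage']
skip_tests    = skip_flag?(argv, 'tests')     # '--skip-tests' was present
skip_coverage = skip_flag?(argv, 'coverage')  # '--no-coverage' was present
skip_dataset  = skip_flag?(argv, 'dataset')   # neither spelling was present

# Because && short-circuits, the second delete is skipped once the first one matches.
both = ['--skip-tests', '--no-tests']
both_result = skip_flag?(both, 'tests')

p argv
p [skip_tests, skip_coverage, skip_dataset]
p both
```

One consequence of the short-circuit: when both spellings of a flag are passed, only the `--skip-` form is removed and the `--no-` form stays in `argv`.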
data/lib/spout/commands/help.rb
CHANGED
@@ -138,7 +138,9 @@ Optional Flags:
 to a maximum of N rows
 <variable> Only deploy specified variable(s)
 Ex: spout deploy production age gender
---skip-checks Skip
+--skip-checks Skip git tag and version checks
+--skip-tests Skip data dictionary tests
+--skip-coverage Skip dataset coverage check
 --skip-variables Skip upload of dataset variables
 --skip-dataset Skip upload of dataset CSVs
 --skip-dictionary Skip upload of data dictionary
data/lib/spout/tasks/engine.rake
CHANGED
data/lib/spout/tests.rb
CHANGED
@@ -3,18 +3,16 @@ require 'json'
 
 require 'minitest/autorun'
 require 'minitest/reporters'
-require 'ansi/code'
 require 'colorize'
 
 module Minitest
   module Reporters
     class SpoutReporter < BaseReporter
-      include ANSI::Code
       include RelativePosition
 
       def start
         super
-        print
+        print 'Loaded Suite test'.colorize(:white)
         puts
         puts
         puts 'Started'
@@ -23,13 +21,13 @@ module Minitest
 
       def report
         super
-        puts 'Finished in %.5f seconds.'
+        puts format('Finished in %.5f seconds.', total_time)
         puts
-        print(
-        print(', %d assertions, '
+        print format('%d tests', count).colorize(:white)
+        print format(', %d assertions, ', assertions)
         color = failures.zero? && errors.zero? ? :green : :red
-        print(
-        print(
+        print format('%d failures, %d errors, ', failures, errors).colorize(color)
+        print format('%d skips', skips).colorize(:yellow)
         puts
         puts
       end
@@ -50,6 +48,17 @@ module Minitest
 
       protected
 
+      def print_colored_status(test)
+        color = if test.passed?
+                  :green
+                elsif test.skipped?
+                  :yellow
+                else
+                  :red
+                end
+        print pad_mark(result(test).to_s.upcase).colorize(color)
+      end
+
       def before_suite(suite)
         puts suite
       end
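Reviewer sketch for the reporter changes in `tests.rb`: the new code builds its summary strings with `Kernel#format` and picks a colour per test from `passed?` and `skipped?`. The Ruby below reproduces both pieces without the `colorize` or `minitest-reporters` gems. `FakeResult` and `status_color` are stand-ins invented for illustration, and the numbers passed to `format` are made up.

```ruby
# Stand-in for a Minitest result object; only the two predicates used in the diff.
FakeResult = Struct.new(:passed, :skipped) do
  def passed?
    passed
  end

  def skipped?
    skipped
  end
end

# Same branching as print_colored_status: green for pass, yellow for skip, red otherwise.
def status_color(test)
  if test.passed?
    :green
  elsif test.skipped?
    :yellow
  else
    :red
  end
end

colors = [
  FakeResult.new(true, false),   # passing test
  FakeResult.new(false, true),   # skipped test
  FakeResult.new(false, false)   # failure or error
].map { |test| status_color(test) }

# Kernel#format fills the placeholders from the arguments that follow the template.
elapsed = format('Finished in %.5f seconds.', 0.123456789)
summary = format('%d failures, %d errors, ', 1, 0)

p colors
puts elapsed
puts summary
```

In the gem itself the resulting strings are then passed through `colorize`, which this sketch leaves out so that it runs on the standard library alone.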
data/lib/spout/version.rb
CHANGED