spout 0.11.0.beta2 → 0.11.0.beta3
- checksums.yaml +4 -4
- data/CHANGELOG.md +1 -0
- data/README.md +65 -30
- data/lib/spout/commands/deploy.rb +14 -3
- data/lib/spout/commands/help.rb +3 -1
- data/lib/spout/tasks/engine.rake +1 -1
- data/lib/spout/tests.rb +17 -8
- data/lib/spout/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0e8b31dbf0428ffd5c44e2cc701090313d57b5e8
+  data.tar.gz: d4d827b412188a66799084fca3fa306279264b39
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b078712373a6c6460bb791dd01c4a515bc9f787afc5d6d989fb1f9645b332c8ac1c6f6abf1a7cedcd51d694a08cecf77df06f792502d68de42a509779e0c2d51
+  data.tar.gz: e648d5ee7df044dc62bbfb64c6ef75cfba76aec05ef2e5f2f6f3f24270ce7bc8f424100e98288820ecde5377991ed40823002a1903bef96acb5bbd61801f6bda
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -4,9 +4,13 @@
 [![Dependency Status](https://gemnasium.com/sleepepi/spout.svg)](https://gemnasium.com/sleepepi/spout)
 [![Code Climate](https://codeclimate.com/github/sleepepi/spout/badges/gpa.svg)](https://codeclimate.com/github/sleepepi/spout)
 
-Turn your CSV data dictionary into a JSON repository. Collaborate with others to
+Turn your CSV data dictionary into a JSON repository. Collaborate with others to
+update the data dictionary in JSON format. Generate new Data Dictionary from the
+JSON repository. Test and validate your data dictionary using built-in tests, or
+add your own tests and validations.
 
-Spout has been used extensively to curate and clean datasets available on the
+Spout has been used extensively to curate and clean datasets available on the
+[National Sleep Research Resource](https://sleepdata.org).
 
 ## Installation
 
@@ -36,13 +40,19 @@ spout import data_dictionary.csv
 
 The CSV should contain at minimal the two column headers:
 
-`id`: This column will give the variable its name, and also be used to name the
+`id`: This column will give the variable its name, and also be used to name the
+file, i.e. `<id>.json`
 
-`folder`: This can be blank, however it is used to place variables into a folder
+`folder`: This can be blank, however it is used to place variables into a folder
+hiearchy. The folder column can contain forward slashes `/` to place a variable
+into a subfolder. An example may be, `id`: `myvarid`,
+`folder`: `Demographics/Subfolder` would create a file
+`variables/Demographics/Subfolder/myvarid.json`
 
 Other columns that will be interpreted include:
 
-`display_name`: The variable name as it is presented to the user. The display
+`display_name`: The variable name as it is presented to the user. The display
+name should be fit on a single line.
 
 `description`: A longer description of the variable.
 
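As an illustration of the `id` and `folder` columns described in the hunk above, the sketch below maps one CSV row to the JSON path given in the README example. The helper name `variable_json_path` is hypothetical and is not part of Spout. The handling of a blank `folder` (file placed directly under `variables/`) is an assumption, since the README only says the column may be blank.

```ruby
# Hypothetical helper (not Spout's importer): builds the JSON path that the
# README describes for a variable's `id` and optional `folder` columns.
def variable_json_path(id, folder = nil)
  # Split the folder on forward slashes and drop empty segments. A blank
  # folder is assumed here to place the file directly under `variables/`.
  segments = folder.to_s.split('/').reject(&:empty?)
  File.join('variables', *segments, "#{id}.json")
end

puts variable_json_path('myvarid', 'Demographics/Subfolder')
puts variable_json_path('myvarid')
```

The first call reproduces the README's own example path, `variables/Demographics/Subfolder/myvarid.json`.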
@@ -58,13 +68,19 @@ Other columns that will be interpreted include:
 - `datetime`
 - `file`
 
-`domain`: The name of the domain that is associated with the variable.
+`domain`: The name of the domain that is associated with the variable.
+Typically, only variable of type `choices` have domains. These domains then
+reside in `domains` folder.
 
-`units`: A string of the associated that are appended to variable values, or
+`units`: A string of the associated that are appended to variable values, or
+added to coordinates in graphs representing the variable.
 
-`calculation`: A calculation represented using algebraic expressions along with
+`calculation`: A calculation represented using algebraic expressions along with
+`id` of other variables.
 
-`labels`: A series of different names for the variable that are semi-colon `;`
+`labels`: A series of different names for the variable that are semi-colon `;`
+separated. These labels are commonly synonyms, or related terms used primarily
+for searching.
 
 All other columns get grouped into a hash labeled `other`.
 
@@ -91,13 +107,15 @@ Other columns that are imported include:
 
 ### Test your repository
 
-If you created your data dictionary repository using `spout new`, you can go
+If you created your data dictionary repository using `spout new`, you can go
+ahead and test using:
 
 ```
 spout test
 ```
 
-If not, you can add the following to your `test` directory to include all Spout
+If not, you can add the following to your `test` directory to include all Spout
+tests, or just a subset of Spout tests.
 
 `test/dictionary_test.rb`
 
@@ -129,9 +147,11 @@ end
 
 Then run either `spout test` or `bundle exec rake` to run your tests.
 
-You can also use Spout iterators to create custom tests for variables, forms,
+You can also use Spout iterators to create custom tests for variables, forms,
+and domains in your data dictionary.
 
-**Example Custom Test 1:** Test that `integer` and `numeric` variables have a
+**Example Custom Test 1:** Test that `integer` and `numeric` variables have a
+valid unit type
 
 ```ruby
 class DictionaryTest < Minitest::Test
@@ -176,31 +196,41 @@ end
 
 ### Test your data dictionary coverage of your dataset
 
-Spout lets you generate a nice visual coverage report that displays how well the
+Spout lets you generate a nice visual coverage report that displays how well the
+data dictionary covers your dataset. Place your dataset csvs into
+`./csvs/<version>/` and then run the following Spout command:
 
 ```
 spout coverage
 ```
 
-This will generate an `index.html` file that can be opened and viewed in any
+This will generate an `index.html` file that can be opened and viewed in any
+browser.
 
-Spout coverage validates that values stored in your dataset match up with
+Spout coverage validates that values stored in your dataset match up with
+variables and domains defined in your data dictionary.
 
 ### Identify outliers in your dataset
 
-Spout lets you generate detect outliers in your underlying datasets. Place your
+Spout lets you generate detect outliers in your underlying datasets. Place your
+dataset csvs into `./csvs/<version>/` and then run the following Spout command:
 
 ```
 spout outliers
 ```
 
-This will generate an `outliers.html` file that can be opened and viewed in any
+This will generate an `outliers.html` file that can be opened and viewed in any
+browser.
 
-Spout outliers computes the
+Spout outliers computes the
+[inner and outer fences](http://www.wikihow.com/Calculate-Outliers) to identify
+minor and major outliers in the dataset.
 
 ### Create a CSV Data Dictionary from your JSON repository
 
-Provide an optional version parameter to name the folder the CSVs will be
+Provide an optional version parameter to name the folder the CSVs will be
+generated in, defaults to what is in `VERSION` file, or if that does not
+exist `1.0.0`.
 
 ```
 spout export
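The README text in the hunk above says that `spout outliers` uses inner and outer fences to separate minor from major outliers. The sketch below shows the usual fence arithmetic (1.5 and 3 times the interquartile range). It is not Spout's code: the method names `median`, `fences`, and `classify` are hypothetical, and the median-of-halves quartile convention is an assumption that may differ from what Spout does internally.

```ruby
# Illustrative only: fences from quartiles, using a median-of-halves
# convention. Assumes at least two numeric values.
def median(sorted)
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

def fences(values)
  sorted = values.sort
  half = sorted.size / 2
  q1 = median(sorted.first(half)) # lower half (middle value excluded when n is odd)
  q3 = median(sorted.last(half))  # upper half
  iqr = q3 - q1
  {
    inner: [q1 - 1.5 * iqr, q3 + 1.5 * iqr],
    outer: [q1 - 3.0 * iqr, q3 + 3.0 * iqr]
  }
end

# Beyond the outer fences: major outlier. Between inner and outer: minor.
def classify(value, f)
  return :major unless value.between?(*f[:outer])
  return :minor unless value.between?(*f[:inner])
  :none
end

f = fences([2, 4, 4, 5, 6, 7, 8, 9])
p f
p classify(15, f)
p classify(20, f)
```

For this sample, Q1 is 4.0 and Q3 is 7.5, so the inner fences are -1.25 and 12.75 and the outer fences are -6.5 and 18.0. A value of 15 falls between them and is classed as minor, while 20 is major.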
@@ -220,7 +250,8 @@ spout graphs
 
 This command generates JSON charts and tables of each variable in a dataset
 
-Requires a Spout YAML configuration file, `.spout.yml`, in the root of the data
+Requires a Spout YAML configuration file, `.spout.yml`, in the root of the data
+dictionary that defines the variables used to create the charts:
 
 - `visit`: This variable is used to separate subject encounters in a histogram
 - `charts`: Array of choices, numeric, or integer variables for charts
@@ -238,15 +269,18 @@ charts:
 title: Race
 ```
 
-To only generate graphs for a few select variables, add the variable names after
+To only generate graphs for a few select variables, add the variable names after
+the `spout graphs` command.
 
-For example, the command below will only generate graphs for the two variables
+For example, the command below will only generate graphs for the two variables
+`ahi` and `bmi`.
 
 ```
 spout g ahi bmi
 ```
 
-You can also specify a limit to the amount of rows to read in from the CSV files
+You can also specify a limit to the amount of rows to read in from the CSV files
+by specifying the `-rows` flag.
 
 ```
 spout g --rows=10 ahi
@@ -255,7 +289,8 @@ spout g --rows=10 ahi
 This will generate a graph for ahi for the first 10 rows of each dataset CSV.
 
 
-This will generate charts and tables for each variable in the dataset plotted
+This will generate charts and tables for each variable in the dataset plotted
+against the variables listed under `charts`.
 
 ### Example Variable that references a Domain and a Form
 
@@ -308,7 +343,8 @@ This will generate charts and tables for each variable in the dataset plotted ag
 spout deploy NAME
 ```
 
-This command pushes a tagged version of the data dictionary to a webserver
+This command pushes a tagged version of the data dictionary to a webserver
+specified in the `.spout.yml` file.
 
 ```
 webservers:
@@ -339,8 +375,9 @@ The following steps are run:
   - `CHANGELOG.md` top line should include version, ex: `## 0.1.0`
   - Git Repo should have zero uncommitted changes
 - **Tests Pass**
-  - `spout t` passes for RC and FINAL versions
-
+  - `spout t` passes for RC and FINAL versions
+- **Dataset Coverage Check**
+  - `spout c` passes for RC and FINAL versions
 - **Graph Generation**
   - `spout g` is run
   - Graphs are pushed to server
@@ -350,6 +387,4 @@ The following steps are run:
 - **Documentation Uploads**
   - `README.md` and `KNOWNISSUES.md` are uploaded
 - **Server-Side Updates**
-  - Server checks out branch of specified tag
-  - Server runs `load_data_dictionary!` for specified dataset slug
   - Server refreshes dataset folder to reflect new dataset and data dictionaries
data/lib/spout/commands/deploy.rb
CHANGED
@@ -49,6 +49,9 @@ module Spout
         @version = version
         @skip_checks = !(argv.delete('--skip-checks').nil? && argv.delete('--no-checks').nil?)
 
+        @skip_tests = !(argv.delete('--skip-tests').nil? && argv.delete('--no-tests').nil?)
+        @skip_coverage = !(argv.delete('--skip-coverage').nil? && argv.delete('--no-coverage').nil?)
+
         @skip_variables = !(argv.delete('--skip-variables').nil? && argv.delete('--no-variables').nil?)
         @skip_dataset = !(argv.delete('--skip-dataset').nil? && argv.delete('--no-dataset').nil?)
         @skip_dictionary = !(argv.delete('--skip-dictionary').nil? && argv.delete('--no-dictionary').nil?)
@@ -80,6 +83,7 @@ module Spout
         config_file_load
         version_check
         test_check
+        coverage_check
         user_authorization
         upload_variables
         dataset_uploads
@@ -176,12 +180,12 @@ module Spout
       end
 
       def test_check
-        if @
+        if @skip_tests
           puts ' Spout Tests: ' + 'SKIP'.colorize(:blue)
           return
         end
 
-        print
+        print ' Spout Tests: '
 
         stdout = quietly do
           `spout t`
@@ -193,8 +197,15 @@ module Spout
           message = "#{INDENT}spout t".colorize(:white) + " had errors or failures".colorize(:red) + "\n#{INDENT}Please fix all errors and failures and then run spout deploy again."
           failure message
         end
+      end
+
+      def coverage_check
+        if @skip_coverage
+          puts ' Dataset Coverage: ' + 'SKIP'.colorize(:blue)
+          return
+        end
 
-        puts '
+        puts ' Dataset Coverage: ' + 'NOT IMPLEMENTED'.colorize(:yellow)
       end
 
       def user_authorization
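The `@skip_*` assignments in `deploy.rb` all share one idiom: `Array#delete` returns the removed element, or `nil` when it was absent, so the negated expression is true when either spelling of a flag was passed. The sketch below isolates that idiom in a hypothetical helper, `flag_given?`. Spout inlines the expression and defines no such method.

```ruby
# Hypothetical helper mirroring the expression used in the diff.
def flag_given?(argv, primary, alternate)
  # Array#delete removes every matching element and returns it, or nil if
  # nothing matched. `&&` short-circuits, so `alternate` is only looked up
  # (and removed) when `primary` was absent.
  !(argv.delete(primary).nil? && argv.delete(alternate).nil?)
end

argv = ['production', '--no-tests', 'age']
skip_tests = flag_given?(argv, '--skip-tests', '--no-tests')
skip_coverage = flag_given?(argv, '--skip-coverage', '--no-coverage')

p skip_tests     # true: the alternate spelling was present
p skip_coverage  # false: neither spelling was present
p argv           # the flag has been removed, leaving ["production", "age"]
```

One consequence of the short-circuit is that when both spellings are passed together, only the first is removed and the second stays in `argv`.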
data/lib/spout/commands/help.rb
CHANGED
@@ -138,7 +138,9 @@ Optional Flags:
                           to a maximum of N rows
   <variable>              Only deploy specified variable(s)
                           Ex: spout deploy production age gender
-  --skip-checks           Skip
+  --skip-checks           Skip git tag and version checks
+  --skip-tests            Skip data dictionary tests
+  --skip-coverage         Skip dataset coverage check
   --skip-variables        Skip upload of dataset variables
   --skip-dataset          Skip upload of dataset CSVs
   --skip-dictionary       Skip upload of data dictionary
data/lib/spout/tasks/engine.rake
CHANGED
data/lib/spout/tests.rb
CHANGED
@@ -3,18 +3,16 @@ require 'json'
 
 require 'minitest/autorun'
 require 'minitest/reporters'
-require 'ansi/code'
 require 'colorize'
 
 module Minitest
   module Reporters
     class SpoutReporter < BaseReporter
-      include ANSI::Code
       include RelativePosition
 
       def start
         super
-        print
+        print 'Loaded Suite test'.colorize(:white)
         puts
         puts
         puts 'Started'
@@ -23,13 +21,13 @@ module Minitest
 
       def report
         super
-        puts 'Finished in %.5f seconds.'
+        puts format('Finished in %.5f seconds.', total_time)
         puts
-        print(
-        print(', %d assertions, '
+        print format('%d tests', count).colorize(:white)
+        print format(', %d assertions, ', assertions)
         color = failures.zero? && errors.zero? ? :green : :red
-        print(
-        print(
+        print format('%d failures, %d errors, ', failures, errors).colorize(color)
+        print format('%d skips', skips).colorize(:yellow)
         puts
         puts
       end
@@ -50,6 +48,17 @@ module Minitest
 
       protected
 
+      def print_colored_status(test)
+        color = if test.passed?
+                  :green
+                elsif test.skipped?
+                  :yellow
+                else
+                  :red
+                end
+        print pad_mark(result(test).to_s.upcase).colorize(color)
+      end
+
       def before_suite(suite)
         puts suite
       end
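The reporter changes above do two things: they build summary strings with `Kernel#format`, and they pick a color from a test's status. The sketch below reproduces both without the `colorize` or `minitest-reporters` gems, so it runs on plain Ruby. `FakeResult` and `status_color` are hypothetical stand-ins for illustration; the real method prints a colorized mark rather than returning a symbol.

```ruby
# Stand-in for a Minitest result object. Only the two predicates used by the
# color selection in the diff are modeled.
FakeResult = Struct.new(:passed, :skipped) do
  def passed?
    passed
  end

  def skipped?
    skipped
  end
end

# Same branching as print_colored_status, returning the color symbol instead
# of printing a colorized status mark.
def status_color(test)
  if test.passed?
    :green
  elsif test.skipped?
    :yellow
  else
    :red
  end
end

# Kernel#format fills printf-style placeholders; %.5f keeps five decimals.
puts format('Finished in %.5f seconds.', 1.5)
puts format('%d failures, %d errors, ', 0, 2)
p status_color(FakeResult.new(false, true))
```

Because a passed test is checked first, a result is only treated as skipped or failed when `passed?` is false, which matches the branch order in the diff.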
data/lib/spout/version.rb
CHANGED