spout 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 25fe4f0b1ed9242f0e8d1689e52bc372e285161f
- data.tar.gz: e69ffdc296da47898fddf7f1a44d25ab3eb49b9c
+ metadata.gz: b17c5cb1d0059611bb4ebb9807ffb83f5015a765
+ data.tar.gz: 3c3e5d7493590ecfffcb6a7534d73b4ce0f56bee
  SHA512:
- metadata.gz: 00b32e110a7dd21da41d5a7738f9802d33b7bf607fb581821e6fc375b992e9bc80abc10be3da2f140294363742f4ef89c2a8b8df97dfd07aca248f212a087586
- data.tar.gz: c62938092d52b3a67318139fe2a33cacf836c67a2d9b6d46674ec17d02e546139b4acb456dfb7ebec1aa5a933f9c33e8ba8147184b332aa24b19231e729f6b30
+ metadata.gz: 50846a11d41e8654a283a5836bd9d5496745bdf4c0707e2a7494ceca86633ac743ac2f9fda054185a5ebe6f0c72bf218414c4b7d71fa05deda1541ab368f2bad
+ data.tar.gz: 08da58a7d5533b2210144e66d30158e825131a24f968f956cd7d64aae584084b3c97aed6acdb00d47351132158ca44dbab0aa9862874681d010ba4d08438d8b3
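`checksums.yaml` records a SHA1 and a SHA512 hex digest for each of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Rebuilding the release for 0.2.0 therefore changes all four values. The following is a minimal sketch that produces digests in the same shape with Ruby's standard `digest` library. `file_checksums` and `sha512_matches?` are hypothetical helpers and are not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: hex digests of one file, keyed by algorithm name
# the way checksums.yaml groups its entries.
def file_checksums(path)
  {
    'SHA1' => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# True when the file's SHA512 equals a recorded hex value (case-insensitive).
def sha512_matches?(path, expected_hex)
  file_checksums(path)['SHA512'] == expected_hex.to_s.downcase
end
```

A `.gem` is a plain tar archive, so `tar -xf spout-0.2.0.gem` should yield the `metadata.gz` and `data.tar.gz` files whose digests are listed above.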
data/CHANGELOG.md CHANGED
@@ -1,4 +1,11 @@
- ## 0.1.0
+ ## 0.2.0 (June 26, 2013)
+
+ ### Enhancements
+ - Domains can now be imported using `spout import_domains CSVFILE`
+ - Data Dictionary can now be exported to the Hybrid data dictionary CSV format along with an optional version number:
+ - `spout hybrid [1.0.0]`
+
+ ## 0.1.0 (May 21, 2013)
 
  ### Enhancements
  - Existing Data Dictionaries can be converted to JSON format from a CSV file
data/README.md CHANGED
@@ -34,7 +34,7 @@ spout import data_dictionary.csv
 
  The CSV should contain at minimal the two column headers:
 
- `id`: This column will give the variable it's name, and also be used to name the file, i.e. `<id>.json`
+ `id`: This column will give the variable its name, and also be used to name the file, i.e. `<id>.json`
  `folder`: This can be blank, however it is used to place variables into a folder hiearchy. The folder column can contain forward slashes `/` to place a variable into a subfolder. An example may be, `id`: `myvarid`, `folder`: `Demographics/Subfolder` would create a file `variables/Demographics/Subfolder/myvarid.json`
 
  Other columns that will be interpreted include:
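The `id` and `folder` columns described in this hunk determine where each variable's JSON file is written. The following is a minimal sketch that assumes a CSV row arrives as a plain Hash of header => value, which is what `CSV::Row#to_hash` produces. `variable_from_row` is a hypothetical name. Its logic mirrors the import task in this release: the `id` is downcased for the file name, blank optional fields are dropped, and leftover columns go under `other`. It returns the path and body instead of touching the filesystem.

```ruby
require 'json'

# Sketch mirroring import_variables from this release: turn one CSV row
# (a Hash of header => value) into the JSON file path and body the task
# would write. Blank optional fields are dropped; leftover columns are
# grouped under "other".
def variable_from_row(row)
  row = row.dup # leave the caller's hash untouched
  folder = row.delete('folder').to_s
  id = row.delete('id').to_s
  hash = { 'id' => id }
  hash['display_name'] = row.delete('display_name')
  hash['description'] = row.delete('description').to_s
  hash['type'] = row.delete('type')
  %w(domain units calculation).each do |key|
    value = row.delete(key).to_s
    hash[key] = value unless value.empty?
  end
  labels = row.delete('labels').to_s.split(';')
  hash['labels'] = labels unless labels.empty?
  hash['other'] = row unless row.empty?
  path = File.join('variables', folder, id.downcase + '.json')
  [path, JSON.pretty_generate(hash) + "\n"]
end
```

For a row with `id` = `myvarid` and `folder` = `Demographics/Subfolder`, the returned path is `variables/Demographics/Subfolder/myvarid.json`, which matches the README example.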
@@ -64,6 +64,24 @@ Other columns that will be interpreted include:
 
  All other columns get grouped into a hash labeled `other`.
 
+ #### Importing domains from an existing CSV file
+
+ ```
+ spout import_domains data_dictionary_domains.csv
+ ```
+
+ The CSV should contain at minimal three column headers:
+
+ `domain_id`: The name of the associated domain for the choice/option.
+ `value`: The value of the choice/option.
+ `display_name`: The display name of the choice/option.
+
+ Other columns that are imported include:
+
+ `description`: A longer description of the choice/option.
+ `folder`: The name of the folder path where the domain resides.
+
+
  ### Test your repository
 
  If you created your data dictionary repository using `spout new`, you can go ahead and test using:
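The domains CSV documented in this hunk is grouped by `domain_id`, producing one JSON array of options per domain. The following sketch shows that grouping and assumes the CSV text is already in memory. `group_domain_options` is a hypothetical helper that follows `import_domains` from this release in three ways: it uses the same character sanitizing, the last `folder` seen for a domain wins, and rows with a blank required value are skipped. It differs in two ways: it raises `ArgumentError` on a missing header instead of printing a message and calling `exit(1)`, and it returns path => options instead of writing files.

```ruby
require 'csv'

REQUIRED_DOMAIN_COLUMNS = %w(domain_id value display_name).freeze
UNSAFE_NAME_CHARS = /[^a-zA-Z0-9_\/\.-]/

# Sketch following import_domains from this release, but returning
# { json_path => [options] } instead of writing files, and raising
# ArgumentError (rather than printing and exiting) on a missing header.
def group_domain_options(csv_text)
  domains = {}
  CSV.parse(csv_text, headers: true) do |line|
    row = line.to_hash
    REQUIRED_DOMAIN_COLUMNS.each do |column|
      raise ArgumentError, "missing column header #{column}" unless row.key?(column)
    end
    # Rows with a blank required value are skipped, as in the task.
    next if REQUIRED_DOMAIN_COLUMNS.any? { |column| row[column].to_s.empty? }
    folder = File.join('domains', row['folder'].to_s).gsub(UNSAFE_NAME_CHARS, '_')
    name = row['domain_id'].to_s.gsub(UNSAFE_NAME_CHARS, '_')
    entry = (domains[name] ||= { 'options' => [] })
    entry['folder'] = folder # the last row seen for a domain decides its folder
    entry['options'] << {
      'value' => row['value'].to_s,
      'display_name' => row['display_name'].to_s,
      'description' => row['description'].to_s
    }
  end
  domains.map { |name, entry| [File.join(entry['folder'], name.downcase + '.json'), entry['options']] }.to_h
end
```

Each value in the returned hash is the array that the task would write to that path with `JSON.pretty_generate`.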
@@ -99,7 +117,7 @@ Then run either `spout test` or `bundle exec rake` to run your tests.
 
  ### Create a CSV Data Dictionary from your JSON repository
 
- Provide an optional version parameter to name the folder the CSVs will be generated in, defaults to 1.0.0 currently.
+ Provide an optional version parameter to name the folder the CSVs will be generated in, defaults to 1.0.0.
 
  ```
  spout export
@@ -116,3 +134,18 @@ or
  ```
  bundle exec rake dd:create [VERSION=1.0.0]
  ```
+
+
+ ### Export to the Hybrid Data Dictionary format from your JSON repository
+
+ Exporting to a format compatible with [Hybrid](https://github.com/sleepepi/hybrid) is also available.
+
+ ```
+ spout hybrid
+ ```
+
+ You can optionally provide a version string
+
+ ```
+ spout hybrid [1.0.0]
+ ```
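The optional version in `spout hybrid [1.0.0]` only names the output folder. The following sketch shows where the file ends up, assuming the behavior shown elsewhere in this diff. First, `new_data_dictionary_export` replaces characters outside `[a-zA-Z0-9.-]` with `_`. Second, the `dd:create` task falls back to `1.0.0` when no `VERSION` is set. `hybrid_output_path` is a hypothetical name.

```ruby
# Sketch combining two pieces of this diff: the CLI replaces characters
# outside [a-zA-Z0-9.-] in the version argument with "_", and the rake
# task falls back to "1.0.0" when no VERSION is passed.
def hybrid_output_path(version_arg = nil)
  version = version_arg.to_s.gsub(/[^a-zA-Z0-9\.-]/, '_').strip
  version = '1.0.0' if version.empty?
  File.join('dd', version, 'hybrid.csv')
end
```

Under these assumptions, `spout hybrid` writes `dd/1.0.0/hybrid.csv` and `spout hybrid 2.0.0-rc` writes `dd/2.0.0-rc/hybrid.csv`. The standard `spout export` writes `variables.csv` and `domains.csv` into the same kind of versioned folder.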
data/lib/spout/actions.rb CHANGED
@@ -11,8 +11,12 @@ module Spout
  system "bundle exec rake"
  when 'import', 'i', 'im', 'imp', '--import', '-i', '-im', '-imp'
  import_from_csv(argv)
+ when 'import_domain', '--import_domain', 'import_domains', '--import_domains'
+ import_from_csv(argv, 'domains')
  when 'export', 'e', 'ex', 'exp', '--export', '-e', '-ex', '-exp'
  new_data_dictionary_export(argv)
+ when 'hybrid', '-hybrid', '--hybrid', 'y', 'hy', '-y', '-hy'
+ new_data_dictionary_export(argv, 'hybrid')
  else
  help
  end
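The new `when` branches add more aliases to the command `case`. The following sketch expresses the same aliases as a lookup table. `COMMAND_ALIASES` and `resolve_command` are hypothetical, and only the commands visible in this hunk are listed.

```ruby
# Sketch: the aliases visible in this hunk as a lookup table instead of a
# case statement. The action symbols are illustrative, not the gem's API;
# commands outside this hunk (new, test, version) are left out.
COMMAND_ALIASES = {
  import: %w(import i im imp --import -i -im -imp),
  import_domains: %w(import_domain --import_domain import_domains --import_domains),
  export: %w(export e ex exp --export -e -ex -exp),
  hybrid: %w(hybrid -hybrid --hybrid y hy -y -hy)
}.freeze

# Returns the action for a command word, or :help when nothing matches
# (the case statement's else branch).
def resolve_command(word)
  COMMAND_ALIASES.each do |action, aliases|
    return action if aliases.include?(word.to_s)
  end
  :help
end
```

One benefit of the table is that duplicate aliases become easy to detect. For the lists above, `COMMAND_ALIASES.values.flatten.tally.select { |_, n| n > 1 }` is empty.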
@@ -20,7 +24,7 @@ module Spout
 
  protected
 
- def import_from_csv(argv)
+ def csv_usage
  usage = <<-EOT
 
  Usage: spout import CSVFILE
@@ -28,12 +32,15 @@ Usage: spout import CSVFILE
  The CSVFILE must be the location of a valid CSV file.
 
  EOT
+ usage
+ end
 
+ def import_from_csv(argv, type = "")
  csv_file = File.join(argv[1].to_s.strip)
  if File.exists?(csv_file)
- system "bundle exec rake dd:import CSV=#{csv_file}"
+ system "bundle exec rake dd:import CSV=#{csv_file} #{'TYPE='+type if type.to_s != ''}"
  else
- puts usage
+ puts csv_usage
  end
  end
 
@@ -43,23 +50,28 @@ EOT
  Usage: spout COMMAND [ARGS]
 
  The most common spout commands are:
- [n]ew Create a new Spout dictionary. "spout new my_dd" creates a
- new data dictionary called MyDD in "./my_dd"
- [t]est Running the test file
- [i]mport Import a CSV file into the JSON repository
- [e]xport Export the JSON respository to a CSV
- [v]ersion Returns the version of Spout
-
- Each command can be referenced by the first letter: Ex: `spout t`, for test
+ [n]ew Create a new Spout dictionary.
+ "spout new my_dd" creates a new data
+ dictionary called MyDD in "./my_dd"
+ [t]est Running the test file
+ [i]mport Import a CSV file into the JSON repository
+ [e]xport [1.0.0] Export the JSON respository to a CSV
+ h[y]brid [1.0.0] Export the JSON repository in the Hybrid
+ Dictionary format
+ [v]ersion Returns the version of Spout
+
+ Commands can be referenced by the first letter:
+ Ex: `spout t`, for test
 
  EOT
  puts help_message
  end
 
- def new_data_dictionary_export(argv)
+ def new_data_dictionary_export(argv, type = '')
  version = argv[1].to_s.gsub(/[^a-zA-Z0-9\.-]/, '_').strip
  version_string = (version == '' ? "" : "VERSION=#{version}")
- system "bundle exec rake dd:create #{version_string}"
+ type_string = type.to_s == '' ? "" : "TYPE=#{type}"
+ system "bundle exec rake dd:create #{version_string} #{type_string}"
  end
 
  def new_template_dictionary(argv)
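Both `import_from_csv` and `new_data_dictionary_export` now pass `TYPE=` to rake alongside `CSV=` or `VERSION=`, and rake exposes these as environment variables to the tasks. The following sketch assembles the same invocation as an argument array. This is a swapped-in technique, not the gem's code: when `Kernel#system` receives separate arguments, it runs the command without a shell, so a CSV path containing spaces stays a single argument. `rake_command` is a hypothetical helper.

```ruby
# Sketch: build the rake invocation used by new_data_dictionary_export and
# import_from_csv as an argument array. Empty options are omitted; the
# version is sanitized the same way the CLI does it.
def rake_command(task, version: nil, csv: nil, type: nil)
  args = ['bundle', 'exec', 'rake', task]
  cleaned = version.to_s.gsub(/[^a-zA-Z0-9\.-]/, '_').strip
  args << "VERSION=#{cleaned}" unless cleaned.empty?
  args << "CSV=#{csv}" unless csv.to_s.empty?
  args << "TYPE=#{type}" unless type.to_s.empty?
  args
end

# Usage (not run here): system(*rake_command('dd:create', version: '1.0.0', type: 'hybrid'))
```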
@@ -19,75 +19,231 @@ namespace :dd do
  desc 'Create Data Dictionary from repository'
  task :create do
 
- folder = "dd/#{ENV['VERSION'] || Spout::Application.new.version}"
+ folder = "dd/#{ENV['VERSION'] || '1.0.0'}"
  FileUtils.mkpath folder
 
- CSV.open("#{folder}/variables.csv", "wb") do |csv|
- keys = %w(id display_name description type units domain labels calculation)
- csv << ['folder'] + keys
- Dir.glob("variables/**/*.json").each do |file|
- if json = JSON.parse(File.read(file)) rescue false
- variable_folder = file.gsub(/variables\//, '').split('/')[0..-2].join('/')
- csv << [variable_folder] + keys.collect{|key| json[key].kind_of?(Array) ? json[key].join(';') : json[key].to_s}
- end
- end
- end
- CSV.open("#{folder}/domains.csv", "wb") do |csv|
- keys = %w(value display_name description)
- csv << ['folder', 'id'] + keys
- Dir.glob("domains/**/*.json").each do |file|
- if json = JSON.parse(File.read(file)) rescue false
- domain_folder = file.gsub(/domains\//, '').split('/')[0..-2].join('/')
- domain_name = file.gsub(/domains\//, '').split('/').last.to_s.gsub(/.json/, '')
- json.each do |hash|
- csv << [domain_folder, domain_name] + keys.collect{|key| hash[key]}
- end
- end
- end
+ case ENV['TYPE']
+ when 'hybrid'
+ hybrid_export(folder)
+ else
+ standard_export(folder)
  end
 
+
  puts "Data Dictionary Created in #{folder}"
  end
 
  desc 'Initialize JSON repository from a CSV file: CSV=datadictionary.csv'
  task :import do
- additional_csv_info = "\n\nFor additional information on specifying CSV column headers before import see:\n\n " + "https://github.com/sleepepi/spout#generate-a-new-repository-from-an-existing-csv-file".colorize( :light_cyan ) + "\n\n"
-
  puts ENV['CSV'].inspect
  if File.exists?(ENV['CSV'].to_s)
- CSV.parse( File.open(ENV['CSV'].to_s, 'r:iso-8859-1:utf-8'){|f| f.read}, headers: true ) do |line|
- row = line.to_hash
- if not row.keys.include?('id')
- puts "\nMissing column header `".colorize( :red ) + "id".colorize( :light_cyan ) + "` in data dictionary.".colorize( :red ) + additional_csv_info
- exit(1)
+ ENV['TYPE'] == 'domains' ? import_domains : import_variables
+ else
+ puts "\nPlease specify a valid CSV file.".colorize( :red ) + additional_csv_info
+ end
+ end
+ end
+
+ def standard_export(folder)
+ CSV.open("#{folder}/variables.csv", "wb") do |csv|
+ keys = %w(id display_name description type units domain labels calculation)
+ csv << ['folder'] + keys
+ Dir.glob("variables/**/*.json").each do |file|
+ if json = JSON.parse(File.read(file)) rescue false
+ variable_folder = variable_folder_path(file)
+ csv << [variable_folder] + keys.collect{|key| json[key].kind_of?(Array) ? json[key].join(';') : json[key].to_s}
+ end
+ end
+ end
+ CSV.open("#{folder}/domains.csv", "wb") do |csv|
+ keys = %w(value display_name description)
+ csv << ['folder', 'domain_id'] + keys
+ Dir.glob("domains/**/*.json").each do |file|
+ if json = JSON.parse(File.read(file)) rescue false
+ domain_folder = domain_folder_path(file)
+ domain_name = extract_domain_name(file)
+ json.each do |hash|
+ csv << [domain_folder, domain_name] + keys.collect{|key| hash[key]}
+ end
+ end
+ end
+ end
+ end
+
+ def extract_domain_name(file)
+ file.gsub(/domains\//, '').split('/').last.to_s.gsub(/.json/, '')
+ end
+
+ def domain_folder_path(file)
+ file.gsub(/domains\//, '').split('/')[0..-2].join('/')
+ end
+
+ def variable_folder_path(file)
+ file.gsub(/variables\//, '').split('/')[0..-2].join('/')
+ end
+
+ def hybrid_concept_type(json)
+ if json['hybrid'] and json['hybrid']['type'].to_s != ''
+ json['hybrid']['type']
+ else
+ hybrid_concept_type_map(json['type'])
+ end
+ end
+
+ def hybrid_concept_type_map(variable_type)
+ hybrid_types = { "choices" => "categorical",
+ "numeric" => "continuous",
+ "integer" => "continuous",
+ "string" => "free text",
+ "text" => "free text",
+ "date" => "datetime",
+ "time" => "datetime",
+ "file" => "free text" }
+ hybrid_types[variable_type] || variable_type
+ end
+
+ def hybrid_property(json, property)
+ json['hybrid'] ? json['hybrid'][property] : ''
+ end
+
+ def hybrid_export(folder)
+ domain_parents = {}
+ CSV.open("#{folder}/hybrid.csv", "wb") do |csv|
+ csv << ["Folder", "Short Name", "Description", "Concept Type", "Units", "Terms", "Internal Terms", "Parents", "Children", "Field Values", "Sensitivity", "Display Name", "Commonly Used", "Calculation", "Source Name", "Source File"]
+ Dir.glob("variables/**/*.json").each do |file|
+ if json = JSON.parse(File.read(file)) rescue false
+ if json['domain'].to_s != ''
+ domain_parents[json['domain'].to_s.downcase] ||= []
+ domain_parents[json['domain'].to_s.downcase] << json['id'].to_s
  end
- next if row['id'] == ''
- folder = File.join('variables', row.delete('folder').to_s)
- FileUtils.mkpath folder
- hash = {}
- id = row.delete('id')
- hash['id'] = id
- hash['display_name'] = row.delete('display_name')
- hash['description'] = row.delete('description').to_s
- hash['type'] = row.delete('type')
- domain = row.delete('domain').to_s
- hash['domain'] = domain if domain != ''
- units = row.delete('units').to_s
- hash['units'] = units if units != ''
- calculation = row.delete('calculation').to_s
- hash['calculation'] = calculation if calculation != ''
- labels = row.delete('labels').to_s.split(';')
- hash['labels'] = labels if labels.size > 0
- hash['other'] = row unless row.empty?
-
- file_name = File.join(folder, id.downcase + '.json')
- File.open(file_name, 'w') do |file|
- file.write(JSON.pretty_generate(hash))
+ row = [
+ variable_folder_path(file), # Folder
+ json['id'], # Short Name
+ json['description'], # Description
+ hybrid_concept_type(json), # Concept Type
+ json['units'], # Units
+ (json['labels'] || []).join(';'), # Terms
+ '', # Internal Terms
+ '', # Parents
+ '', # Children
+ '', # Field Values
+ hybrid_property(json, 'access level'), # Sensitivity
+ json['display_name'], # Display Name
+ hybrid_property(json, 'most commonly used'), # Commonly Used
+ json['calculation'], # Calculation
+ hybrid_property(json, 'SOURCE'), # Source Name
+ hybrid_property(json, 'filename') # Source File
+ ]
+ csv << row
+ end
+ end
+ Dir.glob("domains/**/*.json").each do |file|
+ if json = JSON.parse(File.read(file)) rescue false
+ json.each do |option|
+ row = [
+ domain_folder_path(file), # Folder
+ extract_domain_name(file)+'_'+option['value'].to_s, # Short Name
+ option['description'], # Description
+ 'boolean', # Concept Type
+ '', # Units
+ '', # Terms
+ option['value'], # Internal Terms
+ (domain_parents[extract_domain_name(file).downcase] || []).join(';'), # Parents
+ '', # Children
+ '', # Field Values
+ '0', # Sensitivity
+ option['display_name'], # Display Name
+ '', # Commonly Used
+ '', # Calculation
+ ]
+ csv << row
  end
- puts " create".colorize( :green ) + " #{file_name}"
  end
- else
- puts "\nPlease specify a valid CSV file.".colorize( :red ) + additional_csv_info
  end
  end
  end
+
+ def import_variables
+ CSV.parse( File.open(ENV['CSV'].to_s, 'r:iso-8859-1:utf-8'){|f| f.read}, headers: true ) do |line|
+ row = line.to_hash
+ if not row.keys.include?('id')
+ puts "\nMissing column header `".colorize( :red ) + "id".colorize( :light_cyan ) + "` in data dictionary.".colorize( :red ) + additional_csv_info
+ exit(1)
+ end
+ next if row['id'] == ''
+ folder = File.join('variables', row.delete('folder').to_s)
+ FileUtils.mkpath folder
+ hash = {}
+ id = row.delete('id')
+ hash['id'] = id
+ hash['display_name'] = row.delete('display_name')
+ hash['description'] = row.delete('description').to_s
+ hash['type'] = row.delete('type')
+ domain = row.delete('domain').to_s
+ hash['domain'] = domain if domain != ''
+ units = row.delete('units').to_s
+ hash['units'] = units if units != ''
+ calculation = row.delete('calculation').to_s
+ hash['calculation'] = calculation if calculation != ''
+ labels = row.delete('labels').to_s.split(';')
+ hash['labels'] = labels if labels.size > 0
+ hash['other'] = row unless row.empty?
+
+ file_name = File.join(folder, id.downcase + '.json')
+ File.open(file_name, 'w') do |file|
+ file.write(JSON.pretty_generate(hash) + "\n")
+ end
+ puts " create".colorize( :green ) + " #{file_name}"
+ end
+ end
+
+ def import_domains
+ domains = {}
+
+ CSV.parse( File.open(ENV['CSV'].to_s, 'r:iso-8859-1:utf-8'){|f| f.read}, headers: true ) do |line|
+ row = line.to_hash
+ if not row.keys.include?('domain_id')
+ puts "\nMissing column header `".colorize( :red ) + "domain_id".colorize( :light_cyan ) + "` in data dictionary.".colorize( :red ) + additional_csv_info
+ exit(1)
+ end
+ if not row.keys.include?('value')
+ puts "\nMissing column header `".colorize( :red ) + "value".colorize( :light_cyan ) + "` in data dictionary.".colorize( :red ) + additional_csv_info
+ exit(1)
+ end
+ if not row.keys.include?('display_name')
+ puts "\nMissing column header `".colorize( :red ) + "display_name".colorize( :light_cyan ) + "` in data dictionary.".colorize( :red ) + additional_csv_info
+ exit(1)
+ end
+
+ next if row['domain_id'].to_s == '' or row['value'].to_s == '' or row['display_name'].to_s == ''
+ folder = File.join('domains', row['folder'].to_s).gsub(/[^a-zA-Z0-9_\/\.-]/, '_')
+ domain_name = row['domain_id'].to_s.gsub(/[^a-zA-Z0-9_\/\.-]/, '_')
+ domains[domain_name] ||= {}
+ domains[domain_name]["folder"] = folder
+ domains[domain_name]["options"] ||= []
+
+ hash = {}
+ hash['value'] = row.delete('value').to_s
+ hash['display_name'] = row.delete('display_name').to_s
+ hash['description'] = row.delete('description').to_s
+
+ domains[domain_name]["options"] << hash
+ end
+
+ domains.each do |domain_name, domain_hash|
+ folder = domain_hash["folder"]
+ FileUtils.mkpath folder
+
+ file_name = File.join(folder, domain_name.downcase + '.json')
+
+ File.open(file_name, 'w') do |file|
+ file.write(JSON.pretty_generate(domain_hash["options"]) + "\n")
+ end
+ puts " create".colorize( :green ) + " #{file_name}"
+ end
+
+ end
+
+ def additional_csv_info
+ "\n\nFor additional information on specifying CSV column headers before import see:\n\n " + "https://github.com/sleepepi/spout#generate-a-new-repository-from-an-existing-csv-file".colorize( :light_cyan ) + "\n\n"
+ end
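`hybrid_export` builds one CSV with a 16-column header. It writes a row per variable, then a `boolean` row per domain option whose `Parents` column lists the variables that use that domain. As written in this diff, the domain rows carry only 14 values, so their `Source Name` and `Source File` fields are simply absent. The following sketch restates the mapping and row logic as pure functions, assuming variables and options arrive as already-parsed JSON hashes. All names are hypothetical. The sketch makes two deliberate changes: it uses `File.basename` to get the domain name, and it pads rows to the header width.

```ruby
# Sketch of pieces of hybrid_export from this diff as pure functions.
HYBRID_HEADER = ['Folder', 'Short Name', 'Description', 'Concept Type', 'Units',
                 'Terms', 'Internal Terms', 'Parents', 'Children', 'Field Values',
                 'Sensitivity', 'Display Name', 'Commonly Used', 'Calculation',
                 'Source Name', 'Source File'].freeze

# Spout variable type => Hybrid concept type, copied from hybrid_concept_type_map.
HYBRID_TYPES = {
  'choices' => 'categorical', 'numeric' => 'continuous', 'integer' => 'continuous',
  'string' => 'free text', 'text' => 'free text', 'date' => 'datetime',
  'time' => 'datetime', 'file' => 'free text'
}.freeze

# A non-blank json['hybrid']['type'] wins; otherwise map the Spout type,
# passing unknown types through unchanged.
def hybrid_concept_type(json)
  override = (json['hybrid'] || {})['type'].to_s
  return override unless override.empty?
  HYBRID_TYPES[json['type']] || json['type']
end

# Lowercased domain name => ids of the variables that use that domain.
def domain_parents_from(variables)
  parents = {}
  variables.each do |json|
    domain = json['domain'].to_s
    next if domain.empty?
    (parents[domain.downcase] ||= []) << json['id'].to_s
  end
  parents
end

# File.basename removes only a trailing ".json"; gsub(/.json/, '') in the
# diff has an unescaped dot, so it would also cut "xjson" out of "xjson_a".
def domain_name(path)
  File.basename(path, '.json')
end

# One domain option as a 'boolean' concept row, padded with blanks to the
# header width (the diff's domain rows stop after "Calculation").
def domain_option_row(path, option, parents)
  name = domain_name(path)
  folder = File.dirname(path).sub(%r{\Adomains/?}, '')
  values = [folder, "#{name}_#{option['value']}", option['description'], 'boolean',
            '', '', option['value'], (parents[name.downcase] || []).join(';'),
            '', '', '0', option['display_name'], '', '']
  values + [''] * (HYBRID_HEADER.size - values.size)
end
```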
data/lib/spout/version.rb CHANGED
@@ -1,7 +1,7 @@
  module Spout
  module VERSION #:nodoc:
  MAJOR = 0
- MINOR = 1
+ MINOR = 2
  TINY = 0
  BUILD = nil # nil, "pre", "rc", "rc2"
 
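The version hunk only bumps `MINOR`; how the constants are combined into a string lies outside this hunk. The following sketch works under that assumption, using a hypothetical `DemoVersion` module and the common pattern of dropping a `nil` `BUILD` and joining the remaining components with dots.

```ruby
# Sketch under an assumption: only the component constants are visible in
# this hunk, so this joining rule is illustrative, not the gem's code.
module DemoVersion
  MAJOR = 0
  MINOR = 2
  TINY = 0
  BUILD = nil # nil, "pre", "rc", "rc2"

  # compact drops a nil BUILD, giving "0.2.0" for a final release.
  STRING = [MAJOR, MINOR, TINY, BUILD].compact.join('.')
end

# Same rule as a function, so a pre-release tag can be tried out.
def version_string(major, minor, tiny, build = nil)
  [major, minor, tiny, build].compact.join('.')
end
```

With `BUILD = "rc2"`, the string becomes `0.2.0.rc2`, which RubyGems treats as a pre-release of 0.2.0.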
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: spout
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  platform: ruby
  authors:
  - Remo Mueller
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-05-21 00:00:00.000000000 Z
+ date: 2013-06-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake
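The metadata hunk bumps the gemspec version from 0.1.0 to 0.2.0. The following short sketch shows how RubyGems orders and matches such versions, using `Gem::Version` and `Gem::Requirement` from the standard `rubygems` library. The two helper names are hypothetical.

```ruby
require 'rubygems'

# Order two releases with Gem::Version, which compares dot-separated
# segments numerically ("0.10.0" is newer than "0.2.0").
def newer_release?(candidate, current)
  Gem::Version.new(candidate) > Gem::Version.new(current)
end

# Whether a Gemfile-style requirement string would accept a given version.
def requirement_allows?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end
```

With these helpers, `requirement_allows?('~> 0.1.0', '0.2.0')` is false, because `~> 0.1.0` means `>= 0.1.0` and `< 0.2`. A project pinned that way stays on 0.1.x until the constraint is loosened.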