spout 0.8.0.beta1 → 0.8.0.beta2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 55751ba1e111b174e9501fb52b1dd9a484b9ee1c
4
- data.tar.gz: ca81a991a37e425a4166dd28e6aeffa183378b1c
3
+ metadata.gz: 9e4150e7322cba034877fabe6e4a6cec8bdcd444
4
+ data.tar.gz: 25d7aba466f6667316378a3582f6625c1f541bf7
5
5
  SHA512:
6
- metadata.gz: 6fec0a99eeb2b26af88652c7b6b79b3d5de8b750593aac2196275e1a1d365dda0814f47343ac312adaf9bb41bd29065a5d6dee8092de8927bdd41136d6ffbca0
7
- data.tar.gz: 1d80934213ba8633506850aeabc142764f578d6fa727ffe7ac5ac76cc07422b4e0759d53e7200c5c686075077d3a37cd1101d7b650294d147db5a2f38f1e43eb
6
+ metadata.gz: 04137f2579d7da8bf3bcc5ad4105d7ccfe78799656015bbdc0b3e0b34a2503522eb90312919969dad7ec0fbaa6f60231aec7f90bda1e8bbfd8817b28e06e6aea
7
+ data.tar.gz: 98fae07b2e79ee985c0ef849b6e0e2706c824d0f7d5c767ca2c97b24523fefd92da973b4ccb762273c814a2843f365dadbca5535a1d7f3f388aa1d5f9ead8336
data/CHANGELOG.md CHANGED
@@ -1,10 +1,11 @@
1
1
  ## 0.8.0
2
2
 
3
3
  ### Enhancements
4
- - Added `spout json` command that generates JSON charts and tables of each variable in a dataset
4
+ - Added `spout graphs` command that generates JSON charts and tables of each variable in a dataset
5
5
  - This command requires a .spout.yml file to be specified to identify the following variables:
6
6
  - `visit`: This variable is used to separate subject encounters in a histogram
7
7
  - `charts`: Array of choices, numeric, or integer variables for charts
8
+ - The `spout pngs` command now renders the histogram form for each variable
8
9
  - **Gem Changes**
9
10
  - Updated to colorize 0.7.2
10
11
  - Use of Ruby 2.1.2 is now recommended
@@ -12,16 +13,16 @@
12
13
  ## 0.7.0 (April 16, 2014)
13
14
 
14
15
  ### Enhancements
15
- - Added `spout graphs` command that generates pie charts and histograms of each variable in a dataset
16
+ - Added `spout pngs` command that generates pie charts and histograms of each variable in a dataset
16
17
  - The following flags are available:
17
- - `spout g --type-numeric`
18
- - `spout g --type-integer`
19
- - `spout g --type-choices`
20
- - `spout g --size-lg`
21
- - `spout g --size-sm`
22
- - `spout g --type-numeric --size-sm`
18
+ - `spout p --type-numeric`
19
+ - `spout p --type-integer`
20
+ - `spout p --type-choices`
21
+ - `spout p --size-lg`
22
+ - `spout p --size-sm`
23
+ - `spout p --type-numeric --size-sm`
23
24
  - For specific variables the following can be used:
24
- - `spout g --id-<variable_id>`
25
+ - `spout p --id-<variable_id>`
25
26
 
26
27
  ## 0.6.0 (March 7, 2014)
27
28
 
data/README.md CHANGED
@@ -146,35 +146,35 @@ You can optionally provide a version string
146
146
  spout export [1.0.0]
147
147
  ```
148
148
 
149
- ### Generate graphs for data in your dataset
149
+ ### Generate images for data in your dataset
150
150
 
151
- Spout lets you generate graphs for each variable defined in your dataset. Make sure to run `spout coverage` first to validate that your data dictionary and dataset match.
151
+ Spout lets you generate images for each variable defined in your dataset. Make sure to run `spout coverage` first to validate that your data dictionary and dataset match.
152
152
 
153
153
  This command will take some time, and requires [PhantomJS](http://phantomjs.org/) to be installed on your system.
154
154
 
155
155
  ```
156
- spout graphs
156
+ spout pngs
157
157
  ```
158
158
 
159
- The following flags can be passed to the `spout graphs` command:
159
+ The following flags can be passed to the `spout pngs` command:
160
160
 
161
- - `spout g --type-numeric`
162
- - `spout g --type-integer`
163
- - `spout g --type-choices`
164
- - `spout g --size-lg`
165
- - `spout g --size-sm`
166
- - `spout g --type-numeric --size-sm`
161
+ - `spout p --type-numeric`
162
+ - `spout p --type-integer`
163
+ - `spout p --type-choices`
164
+ - `spout p --size-lg`
165
+ - `spout p --size-sm`
166
+ - `spout p --type-numeric --size-sm`
167
167
 
168
168
  For specific variables the following can be used:
169
- - `spout g --id-<variable_id>`
169
+ - `spout p --id-<variable_id>`
170
170
 
171
- Generated graphs are placed in: `./graphs/`
171
+ Generated images are placed in: `./images/`
172
172
 
173
173
 
174
174
  ### Generate charts and tables for data in your dataset
175
175
 
176
176
  ```
177
- spout json
177
+ spout graphs
178
178
  ```
179
179
 
180
180
  This command generates JSON charts and tables of each variable in a dataset
@@ -194,6 +194,23 @@ charts:
194
194
  - race
195
195
  ```
196
196
 
197
+ To only generate graphs for a few select variables, add the variable names after the `spout graphs` command.
198
+
199
+ For example, the command below will only generate graphs for the two variables `ahi` and `bmi`.
200
+
201
+ ```
202
+ spout g ahi bmi
203
+ ```
204
+
205
+ You can also limit the number of rows read from the CSV files by specifying the `-rows` flag:
206
+
207
+ ```
208
+ spout g -rows=10 ahi
209
+ ```
210
+
211
+ This will generate a graph for `ahi` using the first 10 rows of each dataset CSV.
212
+
213
+
197
214
  This will generate charts and tables for each variable in the dataset plotted against the variables listed under `charts`.
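
Reviewer note: the README text above describes passing variable names and a `-rows` limit to `spout graphs`. The sketch below restates how such arguments could be split into a row limit and a list of variable ids. It loosely follows the argument handling that appears later in this diff in `Spout::Commands::Graphs#initialize`; the helper name `parse_graph_arguments` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper (not part of spout): splits command arguments into
# an optional row limit and a list of variable ids.
def parse_graph_arguments(arguments)
  argv_string = arguments.join(',')
  number_of_rows = nil

  # Look for a "-rows=<digits>" token anywhere in the joined arguments.
  if (match_data = argv_string.match(/-rows=(\d*)/))
    number_of_rows = match_data[1].to_i
    argv_string = argv_string.gsub(match_data[0], '')
  end

  # Whatever remains is treated as variable ids; empty entries are dropped.
  valid_ids = argv_string.split(',').reject { |s| s == '' }
  [number_of_rows, valid_ids]
end

rows, ids = parse_graph_arguments(['-rows=10', 'ahi'])
puts "rows=#{rows.inspect} ids=#{ids.inspect}"
```

When no `-rows` token is present the limit stays `nil`, which this sketch takes to mean "read every row".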
198
215
 
199
216
 
data/lib/spout/actions.rb CHANGED
@@ -21,9 +21,9 @@ module Spout
21
21
  new_data_dictionary_export(argv, 'hybrid')
22
22
  when 'coverage', '-coverage', '--coverage', 'c', '-c'
23
23
  coverage_report(argv)
24
+ when 'pngs', '-pngs', '--pngs', 'p', '-p'
25
+ generate_images(argv.last(argv.size - 1))
24
26
  when 'graphs', '-graphs', '--graphs', 'g', '-g'
25
- generate_graphs(argv.last(argv.size - 1))
26
- when 'json', 'j'
27
27
  generate_charts_and_tables(argv.last(argv.size - 1))
28
28
  else
29
29
  help
@@ -70,9 +70,12 @@ The most common spout commands are:
70
70
  dictionary format
71
71
  [c]overage Coverage report, requires dataset CSVs
72
72
  in `<project_name>/csvs/`
73
- [g]raphs Generates graphs for each variable in a
73
+ [p]ngs Generates images for each variable in a
74
74
  dataset and places them
75
- in `<project_name>/graphs/`
75
+ in `<project_name>/images/<version>/`
76
+ [g]raphs Generates JSON graphs for each variable
77
+ in a dataset and places them
78
+ in `<project_name>/graphs/<version>/`
76
79
  [v]ersion Returns the version of Spout
77
80
 
78
81
  Commands can be referenced by the first letter:
@@ -131,7 +134,7 @@ EOT
131
134
  flags.select{|f| f[0..((param.size + 3) - 1)] == "--#{param}-" and f.length > param.size + 3}.collect{|f| f[(param.size + 3)..-1]}
132
135
  end
133
136
 
134
- def generate_graphs(flags)
137
+ def generate_images(flags)
135
138
  params = {}
136
139
  params['types'] = flag_values(flags, 'type')
137
140
  params['variable_ids'] = flag_values(flags, 'id')
@@ -139,7 +142,7 @@ EOT
139
142
 
140
143
  params_string = params.collect{|key, values| "#{key}=#{values.join(',')}"}.join(' ')
141
144
 
142
- system "bundle exec rake spout:graphs #{params_string}"
145
+ system "bundle exec rake spout:images #{params_string}"
143
146
  end
144
147
 
145
148
  def generate_charts_and_tables(variables)
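
Reviewer note: `generate_images` relies on `flag_values` to turn flags such as `--type-numeric` into values for the rake task. The sketch below is a simplified restatement of that prefix matching using `start_with?` rather than index arithmetic; it is an illustration, not the gem's code. Only the `types` and `variable_ids` keys are shown because those are the ones visible in this hunk.

```ruby
# Simplified restatement of the prefix-based flag extraction shown above.
def flag_values(flags, param)
  prefix = "--#{param}-"
  flags.select { |f| f.start_with?(prefix) && f.length > prefix.length }
       .collect { |f| f[prefix.length..-1] }
end

flags = ['--type-numeric', '--size-sm', '--id-ahi']

params = {}
params['types'] = flag_values(flags, 'type')
params['variable_ids'] = flag_values(flags, 'id')

# Build the "key=value,value" string that is appended to the rake command.
params_string = params.collect { |key, values| "#{key}=#{values.join(',')}" }.join(' ')
puts params_string
```

A bare prefix such as `--type-` yields no value, because the flag must be longer than the prefix to be selected.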
@@ -0,0 +1,75 @@
1
+ require 'yaml'
2
+
3
+ require 'spout/helpers/subject_loader'
4
+ require 'spout/models/coverage_result'
5
+
6
+ module Spout
7
+ module Commands
8
+ class Coverage
9
+ def initialize(standard_version)
10
+ @standard_version = standard_version
11
+
12
+ @variable_files = []
13
+ @valid_ids = []
14
+ @number_of_rows = 100
15
+
16
+ spout_config = YAML.load_file('.spout.yml')
17
+ @visit = (spout_config.kind_of?(Hash) ? spout_config['visit'].to_s.strip : '')
18
+
19
+ @subject_loader = Spout::Helpers::SubjectLoader.new(@variable_files, @valid_ids, @standard_version, @number_of_rows, @visit)
20
+ @subject_loader.load_subjects_from_csvs_part_one! # Not Part Two which is essentially cleaning the data
21
+ @subjects = @subject_loader.subjects
22
+
23
+ run_coverage_report!
24
+ end
25
+
26
+ def run_coverage_report!
27
+ choice_variables = []
28
+
29
+ Dir.glob("variables/**/*.json").each do |file|
30
+ if json = JSON.parse(File.read(file)) rescue false
31
+ choice_variables << json['id'] if json['type'] == 'choices'
32
+ end
33
+ end
34
+
35
+ @matching_results = []
36
+ csv_names = ['tmp_csv_file.csv']
37
+
38
+ all_column_headers = @subjects.size > 0 ? @subjects.first.class.instance_methods(false).select{|k| k != :_visit}.reject{|k| k.to_s[-1] == '='} : []
39
+ all_column_headers.each do |column|
40
+ csv = 'tmp_csv_file.csv'
41
+ scr = Spout::Models::CoverageResult.new(csv, column.to_s, @subjects.collect(&column))
42
+ @matching_results << [ csv, column, scr ]
43
+ end
44
+
45
+
46
+ @matching_results.sort!{|a,b| [b[2].number_of_errors, a[0].to_s, a[1].to_s] <=> [a[2].number_of_errors, b[0].to_s, b[1].to_s]}
47
+
48
+ @coverage_results = []
49
+
50
+ csv_names.each do |csv_name|
51
+ total_column_count = @matching_results.select{|mr| mr[0] == csv_name}.count
52
+ mapped_column_count = @matching_results.select{|mr| mr[0] == csv_name and mr[2].number_of_errors == 0}.count
53
+ @coverage_results << [ csv_name, total_column_count, mapped_column_count ]
54
+ end
55
+
56
+ coverage_folder = File.join(Dir.pwd, 'coverage')
57
+ FileUtils.mkpath coverage_folder
58
+ coverage_file = File.join(coverage_folder, 'index.html')
59
+
60
+ print "\nGenerating: index.html\n\n"
61
+
62
+ File.open(coverage_file, 'w+') do |file|
63
+ erb_location = File.join( File.dirname(__FILE__), '../views/index.html.erb' )
64
+ file.puts ERB.new(File.read(erb_location)).result(binding)
65
+ end
66
+
67
+ open_command = 'open' if RUBY_PLATFORM.match(/darwin/) != nil
68
+ open_command = 'start' if RUBY_PLATFORM.match(/mingw/) != nil
69
+
70
+ system "#{open_command} #{coverage_file}" if ['start', 'open'].include?(open_command)
71
+ puts "#{coverage_file}\n\n"
72
+ end
73
+ end
74
+ end
75
+ end
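
Reviewer note: the comparator in `run_coverage_report!` swaps `a` and `b` only in the first array element, so results sort by error count descending and then by CSV name and column ascending. A minimal sketch with made-up data, where the `Struct` stands in for `Spout::Models::CoverageResult` and is an assumption for illustration:

```ruby
# Stand-in for CoverageResult; only number_of_errors is modelled here.
result_class = Struct.new(:number_of_errors)

matching_results = [
  ['tmp_csv_file.csv', :bmi, result_class.new(0)],
  ['tmp_csv_file.csv', :ahi, result_class.new(2)],
  ['tmp_csv_file.csv', :age, result_class.new(0)]
]

# Most errors first; ties are broken by CSV name, then column name.
matching_results.sort! do |a, b|
  [b[2].number_of_errors, a[0].to_s, a[1].to_s] <=> [a[2].number_of_errors, b[0].to_s, b[1].to_s]
end

sorted_columns = matching_results.collect { |row| row[1] }

# Per-CSV totals, as in the coverage summary: all columns vs. error-free columns.
total_column_count = matching_results.count { |mr| mr[0] == 'tmp_csv_file.csv' }
mapped_column_count = matching_results.count { |mr| mr[0] == 'tmp_csv_file.csv' && mr[2].number_of_errors == 0 }

puts sorted_columns.inspect
puts "#{mapped_column_count} of #{total_column_count} columns mapped"
```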
@@ -2,179 +2,127 @@ require 'csv'
2
2
  require 'fileutils'
3
3
  require 'rubygems'
4
4
  require 'json'
5
+ require 'yaml'
6
+
7
+ require 'spout/helpers/subject_loader'
8
+ require 'spout/helpers/chart_types'
5
9
 
6
10
  module Spout
7
11
  module Commands
8
12
  class Graphs
13
+ def initialize(variables, standard_version)
14
+ @standard_version = standard_version
9
15
 
10
- def initialize(types, variable_ids, sizes)
11
-
12
- total_index_count = Dir.glob("variables/**/*.json").count
13
-
14
- last_completed = 0
15
-
16
- options_folder = 'graphs'
17
- FileUtils.mkpath( options_folder )
18
- tmp_options_file = File.join( options_folder, 'options.json' )
19
-
20
- Dir.glob("csvs/*.csv").each do |csv_file|
21
- puts "Working on: #{csv_file}"
22
- t = Time.now
23
- csv_table = CSV.table(csv_file, encoding: 'iso-8859-1').by_col!
24
- puts "Loaded #{csv_file} in #{Time.now - t} seconds."
25
-
26
- total_header_count = csv_table.headers.count
27
- csv_table.headers.each_with_index do |header, index|
28
- puts "Column #{ index + 1 } of #{ total_header_count } for #{header} in #{csv_file}"
29
- if variable_file = Dir.glob("variables/**/#{header.downcase}.json", File::FNM_CASEFOLD).first
30
- json = JSON.parse(File.read(variable_file)) rescue json = nil
31
- next unless json
32
- next unless ["choices", "numeric", "integer"].include?(json["type"])
33
- next unless types.size == 0 or types.include?(json['type'])
34
- next unless variable_ids.size == 0 or variable_ids.include?(json['id'].to_s.downcase)
35
-
36
- basename = File.basename(variable_file).gsub(/\.json$/, '').downcase
37
- col_data = csv_table[header]
38
-
39
- case json["type"] when "choices"
40
- domain_file = Dir.glob("domains/**/#{json['domain']}.json").first
41
- domain_json = JSON.parse(File.read(domain_file)) rescue domain_json = nil
42
- next unless domain_json
43
-
44
- create_pie_chart_options_file(col_data, tmp_options_file, domain_json)
45
- when 'numeric', 'integer'
46
- create_line_chart_options_file(col_data, tmp_options_file, json["units"])
47
- else
48
- next
49
- end
50
-
51
- run_phantom_js("#{basename}-lg.png", 600, tmp_options_file) if sizes.size == 0 or sizes.include?('lg')
52
- run_phantom_js("#{basename}.png", 75, tmp_options_file) if sizes.size == 0 or sizes.include?('sm')
53
- end
54
- end
55
- end
56
- File.delete(tmp_options_file) if File.exists?(tmp_options_file)
57
- end
16
+ spout_config = YAML.load_file('.spout.yml')
58
17
 
59
- def graph_values(col_data)
60
- categories = []
18
+ @visit = ''
61
19
 
62
- col_data = col_data.select{|v| !['', 'null'].include?(v.to_s.strip.downcase)}.collect(&:to_f)
20
+ if spout_config.kind_of?(Hash)
21
+ @visit = spout_config['visit'].to_s.strip
63
22
 
64
- all_integers = false
65
- all_integers = (col_data.count{|i| i.denominator != 1} == 0)
66
-
67
- minimum = col_data.min || 0
68
- maximum = col_data.max || 100
69
-
70
- default_max_buckets = 30
71
- max_buckets = all_integers ? [maximum - minimum + 1, default_max_buckets].min : default_max_buckets
72
- bucket_size = (maximum - minimum + 1).to_f / max_buckets
73
-
74
- (0..(max_buckets-1)).each do |bucket|
75
- val_min = (bucket_size * bucket) + minimum
76
- val_max = bucket_size * (bucket + 1) + minimum
77
- # Greater or equal to val_min, less than val_max
78
- # categories << "'#{val_min} to #{val_max}'"
79
- categories << "#{all_integers || (maximum - minimum) > (default_max_buckets / 2) ? val_min.round : "%0.02f" % val_min}"
23
+ chart_variables = if spout_config['charts'].kind_of?(Array)
24
+ spout_config['charts'].select{|c| c.kind_of?(Hash)}
25
+ else
26
+ []
27
+ end
28
+ else
29
+ puts "The YAML file needs to be in the following format:"
30
+ puts "---\nvisit: visit_variable_name\ncharts:\n- chart: age_variable_name\n title: Age\n- chart: gender_variable_name\n title: Gender\n- chart: race_variable_name\n title: Race\n"
31
+ exit
80
32
  end
81
33
 
82
- new_values = []
83
- (0..max_buckets-1).each do |bucket|
84
- val_min = (bucket_size * bucket) + minimum
85
- val_max = bucket_size * (bucket + 1) + minimum
86
- # Greater or equal to val_min, less than val_max
87
- new_values << col_data.count{|i| i >= val_min and i < val_max}
34
+ if Spout::Helpers::ChartTypes::get_json(@visit, 'variable') == nil
35
+ if @visit == ''
36
+ puts "The visit variable in .spout.yml can't be blank."
37
+ else
38
+ puts "Could not find the following visit variable: #{@visit}"
39
+ end
40
+ exit
41
+ end
42
+ missing_variables = chart_variables.select{|c| Spout::Helpers::ChartTypes::get_json(c['chart'], 'variable') == nil}
43
+ if missing_variables.count > 0
44
+ puts "Could not find the following chart variable#{'s' unless missing_variables.size == 1}: #{missing_variables.collect{|c| c['chart']}.join(', ')}"
45
+ exit
88
46
  end
89
47
 
90
- values = []
91
-
92
- values << { name: '', data: new_values, showInLegend: false }
48
+ argv_string = variables.join(',')
49
+ @number_of_rows = nil
93
50
 
94
- [ values, categories ]
95
- end
51
+ if match_data = argv_string.match(/-rows=(\d*)/)
52
+ @number_of_rows = match_data[1].to_i
53
+ argv_string.gsub!(match_data[0], '')
54
+ end
96
55
 
56
+ @valid_ids = argv_string.split(',').compact.reject{|s| s == ''}
97
57
 
98
- def create_pie_chart_options_file(values, options_file, domain_json)
58
+ @chart_variables = chart_variables.unshift( { "chart" => @visit, "title" => 'Histogram' } )
99
59
 
100
- values.select!{|v| !['', 'null'].include?(v.to_s.strip.downcase) }
101
- counts = values.group_by{|a| a}.collect{|k,v| [(domain_json.select{|h| h['value'] == k.to_s}.first['display_name'] rescue (k.to_s == '' ? 'NULL' : k)), v.count]}
60
+ @variable_files = Dir.glob('variables/**/*.json')
102
61
 
103
- total_count = counts.collect(&:last).inject(&:+)
62
+ t = Time.now
63
+ FileUtils.mkpath "graphs/#{@standard_version}"
104
64
 
105
- data = counts.collect{|value, count| [value, (count * 100.0 / total_count)]}
65
+ @subject_loader = Spout::Helpers::SubjectLoader.new(@variable_files, @valid_ids, @standard_version, @number_of_rows, @visit)
106
66
 
107
- File.open(options_file, "w") do |outfile|
108
- outfile.puts <<-eos
109
- {
110
- "title": {
111
- "text": ""
112
- },
67
+ @subject_loader.load_subjects_from_csvs!
68
+ @subjects = @subject_loader.subjects
69
+ compute_tables_and_charts
113
70
 
114
- "credits": {
115
- "enabled": false,
116
- },
117
- "series": [{
118
- "type": "pie",
119
- "name": "",
120
- "data": #{data.to_json}
121
- }]
122
- }
123
- eos
124
- end
71
+ puts "Took #{Time.now - t} seconds."
125
72
  end
126
73
 
74
+ def compute_tables_and_charts
75
+ variable_files_count = @variable_files.count
76
+ @variable_files.each_with_index do |variable_file, file_index|
77
+ json = JSON.parse(File.read(variable_file)) rescue json = nil
78
+ next unless json
79
+ next unless @valid_ids.include?(json["id"].to_s.downcase) or @valid_ids.size == 0
80
+ next unless ["numeric", "integer", "choices"].include?(json["type"])
81
+ variable_name = json['id'].to_s.downcase
82
+ next unless Spout::Models::Subject.method_defined?(variable_name)
83
+
84
+ puts "#{file_index+1} of #{variable_files_count}: #{variable_file.gsub(/(^variables\/|\.json$)/, '').gsub('/', ' / ')}"
85
+
86
+
87
+ stats = {
88
+ charts: {},
89
+ tables: {}
90
+ }
91
+
92
+ @chart_variables.each do |chart_type_hash|
93
+ chart_type = chart_type_hash["chart"]
94
+ chart_title = chart_type_hash["title"].downcase
95
+
96
+ if chart_type == @visit
97
+ filtered_subjects = @subjects.select{ |s| s.send(chart_type) != nil } # and s.send(variable_name) != nil
98
+ if filtered_subjects.count > 0
99
+ stats[:charts][chart_title] = Spout::Helpers::ChartTypes::chart_histogram(chart_type, filtered_subjects, json, variable_name)
100
+ stats[:tables][chart_title] = Spout::Helpers::ChartTypes::table_arbitrary(chart_type, filtered_subjects, json, variable_name)
101
+ end
102
+ else
103
+ filtered_subjects = @subjects.select{ |s| s.send(chart_type) != nil } # and s.send(variable_name) != nil
104
+ if filtered_subjects.count > 0
105
+ stats[:charts][chart_title] = Spout::Helpers::ChartTypes::chart_arbitrary(chart_type, filtered_subjects, json, variable_name, visits)
106
+ stats[:tables][chart_title] = visits.collect do |visit_display_name, visit_value|
107
+ visit_subjects = filtered_subjects.select{ |s| s._visit == visit_value }
108
+ unknown_subjects = visit_subjects.select{ |s| s.send(variable_name) == nil }
109
+ (visit_subjects.count > 0 && visit_subjects.count != unknown_subjects.count) ? Spout::Helpers::ChartTypes::table_arbitrary(chart_type, visit_subjects, json, variable_name, visit_display_name) : nil
110
+ end.compact
111
+ end
112
+ end
113
+ end
114
+
115
+ chart_json_file = File.join('graphs', @standard_version, "#{json['id']}.json")
116
+ File.open(chart_json_file, 'w') { |file| file.write( JSON.pretty_generate(stats) + "\n" ) }
127
117
 
128
- def create_line_chart_options_file(values, options_file, units)
129
- ( series, categories ) = graph_values(values)
130
-
131
- File.open(options_file, "w") do |outfile|
132
- outfile.puts <<-eos
133
- {
134
- "chart": {
135
- "type": "areaspline"
136
- },
137
- "title": {
138
- "text": ""
139
- },
140
- "credits": {
141
- "enabled": false,
142
- },
143
- "xAxis": {
144
- "categories": #{categories.to_json},
145
- "labels": {
146
- "step": #{(categories.size.to_f / 12).ceil}
147
- },
148
- "title": {
149
- "text": #{units.to_json}
150
- }
151
- },
152
- "yAxis": {
153
- "maxPadding": 0,
154
- "minPadding": 0,
155
- "title": {
156
- "text": "Count"
157
- }
158
- },
159
- "series": #{series.to_json}
160
- }
161
- eos
162
118
  end
163
119
  end
164
120
 
165
- def run_phantom_js(png_name, width, tmp_options_file)
166
- graph_path = File.join(Dir.pwd, 'graphs', png_name)
167
- directory = File.join( File.dirname(__FILE__), '..', 'support', 'javascripts' )
168
-
169
- open_command = if RUBY_PLATFORM.match(/mingw/) != nil
170
- 'phantomjs.exe'
171
- else
172
- 'phantomjs'
121
+ # [["Visit 1", "1"], ["Visit 2", "2"], ["CVD Outcomes", "3"]]
122
+ def visits
123
+ @visits ||= begin
124
+ Spout::Helpers::ChartTypes::domain_array(@visit)
173
125
  end
174
-
175
- phantomjs_command = "#{open_command} #{directory}/highcharts-convert.js -infile #{tmp_options_file} -outfile #{graph_path} -scale 2.5 -width #{width} -constr Chart"
176
- # puts phantomjs_command
177
- `#{phantomjs_command}`
178
126
  end
179
127
 
180
128
  end
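
Reviewer note: the new `Graphs#initialize` reads `visit` and `charts` from `.spout.yml`, keeps only hash entries under `charts`, and prepends a histogram entry keyed on the visit variable. The sketch below reproduces that filtering on an inline YAML string instead of a file. The sample values are taken from the format hint printed by the command; the bare string entry is added here purely to show that non-hash entries are dropped.

```ruby
require 'yaml'

# Inline stand-in for .spout.yml; the last entry is deliberately not a hash.
sample_yaml = <<~CONFIG
  ---
  visit: visit_variable_name
  charts:
  - chart: age_variable_name
    title: Age
  - gender_variable_name
CONFIG

spout_config = YAML.load(sample_yaml)

visit = spout_config.kind_of?(Hash) ? spout_config['visit'].to_s.strip : ''

# Keep only chart entries that are hashes (chart/title pairs).
chart_variables =
  if spout_config.kind_of?(Hash) && spout_config['charts'].kind_of?(Array)
    spout_config['charts'].select { |c| c.kind_of?(Hash) }
  else
    []
  end

# The visit variable is always charted first, as a histogram.
chart_variables = chart_variables.unshift({ 'chart' => visit, 'title' => 'Histogram' })

puts chart_variables.inspect
```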