spout 0.8.0.beta1 → 0.8.0.beta2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 55751ba1e111b174e9501fb52b1dd9a484b9ee1c
4
- data.tar.gz: ca81a991a37e425a4166dd28e6aeffa183378b1c
3
+ metadata.gz: 9e4150e7322cba034877fabe6e4a6cec8bdcd444
4
+ data.tar.gz: 25d7aba466f6667316378a3582f6625c1f541bf7
5
5
  SHA512:
6
- metadata.gz: 6fec0a99eeb2b26af88652c7b6b79b3d5de8b750593aac2196275e1a1d365dda0814f47343ac312adaf9bb41bd29065a5d6dee8092de8927bdd41136d6ffbca0
7
- data.tar.gz: 1d80934213ba8633506850aeabc142764f578d6fa727ffe7ac5ac76cc07422b4e0759d53e7200c5c686075077d3a37cd1101d7b650294d147db5a2f38f1e43eb
6
+ metadata.gz: 04137f2579d7da8bf3bcc5ad4105d7ccfe78799656015bbdc0b3e0b34a2503522eb90312919969dad7ec0fbaa6f60231aec7f90bda1e8bbfd8817b28e06e6aea
7
+ data.tar.gz: 98fae07b2e79ee985c0ef849b6e0e2706c824d0f7d5c767ca2c97b24523fefd92da973b4ccb762273c814a2843f365dadbca5535a1d7f3f388aa1d5f9ead8336
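The digests in `checksums.yaml` can be compared against the unpacked `metadata.gz` and `data.tar.gz` files. The following is a minimal sketch, assuming both files sit in one directory; `checksum_mismatches` is a hypothetical helper name, not part of Spout or RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compares files in `dir` against a hash shaped like
# checksums.yaml ({ 'SHA1' => { 'data.tar.gz' => '...' }, 'SHA512' => ... }).
# Returns an array of [algorithm, file_name] pairs whose digest differs.
def checksum_mismatches(checksums, dir)
  algorithms = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }
  mismatches = []
  checksums.each do |algorithm, files|
    digest_class = algorithms.fetch(algorithm)
    files.each do |file_name, expected|
      actual = digest_class.file(File.join(dir, file_name)).hexdigest
      mismatches << [algorithm, file_name] unless actual == expected
    end
  end
  mismatches
end

# Usage (assumes the gem has been unpacked into ./spout-unpacked):
#   require 'yaml'
#   checksums = YAML.load_file('spout-unpacked/checksums.yaml')
#   puts checksum_mismatches(checksums, 'spout-unpacked').inspect
```

An empty result means every listed digest matched.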
data/CHANGELOG.md CHANGED
@@ -1,10 +1,11 @@
1
1
  ## 0.8.0
2
2
 
3
3
  ### Enhancements
4
- - Added `spout json` command that generates JSON charts and tables of each variable in a dataset
4
+ - Added `spout graphs` command that generates JSON charts and tables of each variable in a dataset
5
5
  - This command requires a .spout.yml file to be specified to identify the following variables:
6
6
  - `visit`: This variable is used to separate subject encounters in a histogram
7
7
  - `charts`: Array of choices, numeric, or integer variables for charts
8
+ - The `spout pngs` command now renders the histogram form for each variable
8
9
  - **Gem Changes**
9
10
  - Updated to colorize 0.7.2
10
11
  - Use of Ruby 2.1.2 is now recommended
@@ -12,16 +13,16 @@
12
13
  ## 0.7.0 (April 16, 2014)
13
14
 
14
15
  ### Enhancements
15
- - Added `spout graphs` command that generates pie charts and histograms of each variable in a dataset
16
+ - Added `spout pngs` command that generates pie charts and histograms of each variable in a dataset
16
17
  - The following flags are available:
17
- - `spout g --type-numeric`
18
- - `spout g --type-integer`
19
- - `spout g --type-choices`
20
- - `spout g --size-lg`
21
- - `spout g --size-sm`
22
- - `spout g --type-numeric --size-sm`
18
+ - `spout p --type-numeric`
19
+ - `spout p --type-integer`
20
+ - `spout p --type-choices`
21
+ - `spout p --size-lg`
22
+ - `spout p --size-sm`
23
+ - `spout p --type-numeric --size-sm`
23
24
  - For specific variables the following can be used:
24
- - `spout g --id-<variable_id>`
25
+ - `spout p --id-<variable_id>`
25
26
 
26
27
  ## 0.6.0 (March 7, 2014)
27
28
 
data/README.md CHANGED
@@ -146,35 +146,35 @@ You can optionally provide a version string
146
146
  spout export [1.0.0]
147
147
  ```
148
148
 
149
- ### Generate graphs for data in your dataset
149
+ ### Generate images for data in your dataset
150
150
 
151
- Spout lets you generate graphs for each variable defined in your dataset. Make sure to run `spout coverage` first to validate that your data dictionary and dataset match.
151
+ Spout lets you generate images for each variable defined in your dataset. Make sure to run `spout coverage` first to validate that your data dictionary and dataset match.
152
152
 
153
153
  This command will take some time, and requires [PhantomJS](http://phantomjs.org/) to be installed on your system.
154
154
 
155
155
  ```
156
- spout graphs
156
+ spout pngs
157
157
  ```
158
158
 
159
- The following flags can be passed to the `spout graphs` command:
159
+ The following flags can be passed to the `spout pngs` command:
160
160
 
161
- - `spout g --type-numeric`
162
- - `spout g --type-integer`
163
- - `spout g --type-choices`
164
- - `spout g --size-lg`
165
- - `spout g --size-sm`
166
- - `spout g --type-numeric --size-sm`
161
+ - `spout p --type-numeric`
162
+ - `spout p --type-integer`
163
+ - `spout p --type-choices`
164
+ - `spout p --size-lg`
165
+ - `spout p --size-sm`
166
+ - `spout p --type-numeric --size-sm`
167
167
 
168
168
  For specific variables the following can be used:
169
- - `spout g --id-<variable_id>`
169
+ - `spout p --id-<variable_id>`
170
170
 
171
- Generated graphs are placed in: `./graphs/`
171
+ Generated images are placed in: `./images/`
172
172
 
173
173
 
174
174
  ### Generate charts and tables for data in your dataset
175
175
 
176
176
  ```
177
- spout json
177
+ spout graphs
178
178
  ```
179
179
 
180
180
  This command generates JSON charts and tables of each variable in a dataset
@@ -194,6 +194,23 @@ charts:
194
194
  - race
195
195
  ```
196
196
 
197
+ To only generate graphs for a few select variables, add the variable names after the `spout graphs` command.
198
+
199
+ For example, the command below will only generate graphs for the two variables `ahi` and `bmi`.
200
+
201
+ ```
202
+ spout g ahi bmi
203
+ ```
204
+
205
+ You can also limit the number of rows read in from the CSV files by specifying the `-rows` flag.
206
+
207
+ ```
208
+ spout g -rows=10 ahi
209
+ ```
210
+
211
+ This will generate a graph for ahi for the first 10 rows of each dataset CSV.
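How the arguments after `spout graphs` separate into a row limit and a list of variable ids can be sketched as follows. This is a simplified variant of the parsing in `Spout::Commands::Graphs#initialize`, which joins the arguments with commas and matches `/-rows=(\d*)/`; here each argument is inspected on its own and at least one digit is required. The method name `parse_graph_arguments` is hypothetical.

```ruby
# Hypothetical helper: splits command-line arguments into a row limit
# (nil when no -rows flag is given) and the remaining variable ids.
def parse_graph_arguments(arguments)
  number_of_rows = nil
  variable_ids = []
  arguments.each do |argument|
    if (match = argument.match(/\A-rows=(\d+)\z/))
      number_of_rows = match[1].to_i
    elsif !argument.strip.empty?
      variable_ids << argument
    end
  end
  [number_of_rows, variable_ids]
end

rows, ids = parse_graph_arguments(['-rows=10', 'ahi'])
puts "rows=#{rows.inspect} ids=#{ids.inspect}"
```

An empty id list corresponds to the default of generating graphs for every variable.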
212
+
213
+
197
214
  This will generate charts and tables for each variable in the dataset plotted against the variables listed under `charts`.
198
215
 
199
216
 
data/lib/spout/actions.rb CHANGED
@@ -21,9 +21,9 @@ module Spout
21
21
  new_data_dictionary_export(argv, 'hybrid')
22
22
  when 'coverage', '-coverage', '--coverage', 'c', '-c'
23
23
  coverage_report(argv)
24
+ when 'pngs', '-pngs', '--pngs', 'p', '-p'
25
+ generate_images(argv.last(argv.size - 1))
24
26
  when 'graphs', '-graphs', '--graphs', 'g', '-g'
25
- generate_graphs(argv.last(argv.size - 1))
26
- when 'json', 'j'
27
27
  generate_charts_and_tables(argv.last(argv.size - 1))
28
28
  else
29
29
  help
@@ -70,9 +70,12 @@ The most common spout commands are:
70
70
  dictionary format
71
71
  [c]overage Coverage report, requires dataset CSVs
72
72
  in `<project_name>/csvs/`
73
- [g]raphs Generates graphs for each variable in a
73
+ [p]ngs Generates images for each variable in a
74
74
  dataset and places them
75
- in `<project_name>/graphs/`
75
+ in `<project_name>/images/<version>/`
76
+ [g]raphs Generates JSON graphs for each variable
77
+ in a dataset and places them
78
+ in `<project_name>/graphs/<version>/`
76
79
  [v]ersion Returns the version of Spout
77
80
 
78
81
  Commands can be referenced by the first letter:
@@ -131,7 +134,7 @@ EOT
131
134
  flags.select{|f| f[0..((param.size + 3) - 1)] == "--#{param}-" and f.length > param.size + 3}.collect{|f| f[(param.size + 3)..-1]}
132
135
  end
133
136
 
134
- def generate_graphs(flags)
137
+ def generate_images(flags)
135
138
  params = {}
136
139
  params['types'] = flag_values(flags, 'type')
137
140
  params['variable_ids'] = flag_values(flags, 'id')
@@ -139,7 +142,7 @@ EOT
139
142
 
140
143
  params_string = params.collect{|key, values| "#{key}=#{values.join(',')}"}.join(' ')
141
144
 
142
- system "bundle exec rake spout:graphs #{params_string}"
145
+ system "bundle exec rake spout:images #{params_string}"
143
146
  end
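The flag handling in `generate_images` turns flags such as `--type-numeric` into `key=value` pairs for the rake task. The sketch below restates `flag_values` with `start_with?`, which should behave the same as the index arithmetic shown in the hunk, and builds the parameter string from the two keys visible in this diff. `build_params_string` is a hypothetical name used only for illustration.

```ruby
# Returns the suffix of every flag of the form "--<param>-<value>".
def flag_values(flags, param)
  prefix = "--#{param}-"
  flags.select { |flag| flag.start_with?(prefix) && flag.length > prefix.length }
       .map { |flag| flag[prefix.length..-1] }
end

# Hypothetical wrapper: builds the "key=v1,v2 key=v3" string that is
# appended to the rake command. Only the keys visible in the diff are used.
def build_params_string(flags)
  params = {}
  params['types'] = flag_values(flags, 'type')
  params['variable_ids'] = flag_values(flags, 'id')
  params.map { |key, values| "#{key}=#{values.join(',')}" }.join(' ')
end

puts build_params_string(['--type-numeric', '--size-sm', '--id-ahi'])
# prints "types=numeric variable_ids=ahi"
```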
144
147
 
145
148
  def generate_charts_and_tables(variables)
@@ -0,0 +1,75 @@
1
+ require 'yaml'
2
+
3
+ require 'spout/helpers/subject_loader'
4
+ require 'spout/models/coverage_result'
5
+
6
+ module Spout
7
+ module Commands
8
+ class Coverage
9
+ def initialize(standard_version)
10
+ @standard_version = standard_version
11
+
12
+ @variable_files = []
13
+ @valid_ids = []
14
+ @number_of_rows = 100
15
+
16
+ spout_config = YAML.load_file('.spout.yml')
17
+ @visit = (spout_config.kind_of?(Hash) ? spout_config['visit'].to_s.strip : '')
18
+
19
+ @subject_loader = Spout::Helpers::SubjectLoader.new(@variable_files, @valid_ids, @standard_version, @number_of_rows, @visit)
20
+ @subject_loader.load_subjects_from_csvs_part_one! # Not Part Two which is essentially cleaning the data
21
+ @subjects = @subject_loader.subjects
22
+
23
+ run_coverage_report!
24
+ end
25
+
26
+ def run_coverage_report!
27
+ choice_variables = []
28
+
29
+ Dir.glob("variables/**/*.json").each do |file|
30
+ if json = JSON.parse(File.read(file)) rescue false
31
+ choice_variables << json['id'] if json['type'] == 'choices'
32
+ end
33
+ end
34
+
35
+ @matching_results = []
36
+ csv_names = ['tmp_csv_file.csv']
37
+
38
+ all_column_headers = @subjects.size > 0 ? @subjects.first.class.instance_methods(false).select{|k| k != :_visit}.reject{|k| k.to_s[-1] == '='} : []
39
+ all_column_headers.each do |column|
40
+ csv = 'tmp_csv_file.csv'
41
+ scr = Spout::Models::CoverageResult.new(csv, column.to_s, @subjects.collect(&column))
42
+ @matching_results << [ csv, column, scr ]
43
+ end
44
+
45
+
46
+ @matching_results.sort!{|a,b| [b[2].number_of_errors, a[0].to_s, a[1].to_s] <=> [a[2].number_of_errors, b[0].to_s, b[1].to_s]}
47
+
48
+ @coverage_results = []
49
+
50
+ csv_names.each do |csv_name|
51
+ total_column_count = @matching_results.select{|mr| mr[0] == csv_name}.count
52
+ mapped_column_count = @matching_results.select{|mr| mr[0] == csv_name and mr[2].number_of_errors == 0}.count
53
+ @coverage_results << [ csv_name, total_column_count, mapped_column_count ]
54
+ end
55
+
56
+ coverage_folder = File.join(Dir.pwd, 'coverage')
57
+ FileUtils.mkpath coverage_folder
58
+ coverage_file = File.join(coverage_folder, 'index.html')
59
+
60
+ print "\nGenerating: index.html\n\n"
61
+
62
+ File.open(coverage_file, 'w+') do |file|
63
+ erb_location = File.join( File.dirname(__FILE__), '../views/index.html.erb' )
64
+ file.puts ERB.new(File.read(erb_location)).result(binding)
65
+ end
66
+
67
+ open_command = 'open' if RUBY_PLATFORM.match(/darwin/) != nil
68
+ open_command = 'start' if RUBY_PLATFORM.match(/mingw/) != nil
69
+
70
+ system "#{open_command} #{coverage_file}" if ['start', 'open'].include?(open_command)
71
+ puts "#{coverage_file}\n\n"
72
+ end
73
+ end
74
+ end
75
+ end
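The ordering and per-CSV totals computed in `run_coverage_report!` can be illustrated in isolation. The sketch below assumes a simple stand-in struct, `CoverageEntry`, in place of the `[csv, column, CoverageResult]` triples used in the file above; it is not the gem's own model.

```ruby
# Stand-in for the [csv, column, coverage_result] triples in the diff.
CoverageEntry = Struct.new(:csv, :column, :number_of_errors)

# Columns with the most errors first, then by CSV name and column name.
def sort_coverage(entries)
  entries.sort_by { |entry| [-entry.number_of_errors, entry.csv.to_s, entry.column.to_s] }
end

# One [csv_name, total_columns, columns_without_errors] row per CSV.
def coverage_summary(entries)
  entries.group_by(&:csv).map do |csv_name, rows|
    [csv_name, rows.count, rows.count { |row| row.number_of_errors == 0 }]
  end
end

entries = [
  CoverageEntry.new('tmp_csv_file.csv', 'age', 0),
  CoverageEntry.new('tmp_csv_file.csv', 'bmi', 2),
  CoverageEntry.new('tmp_csv_file.csv', 'ahi', 0)
]
puts sort_coverage(entries).map(&:column).inspect
puts coverage_summary(entries).inspect
```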
@@ -2,179 +2,127 @@ require 'csv'
2
2
  require 'fileutils'
3
3
  require 'rubygems'
4
4
  require 'json'
5
+ require 'yaml'
6
+
7
+ require 'spout/helpers/subject_loader'
8
+ require 'spout/helpers/chart_types'
5
9
 
6
10
  module Spout
7
11
  module Commands
8
12
  class Graphs
13
+ def initialize(variables, standard_version)
14
+ @standard_version = standard_version
9
15
 
10
- def initialize(types, variable_ids, sizes)
11
-
12
- total_index_count = Dir.glob("variables/**/*.json").count
13
-
14
- last_completed = 0
15
-
16
- options_folder = 'graphs'
17
- FileUtils.mkpath( options_folder )
18
- tmp_options_file = File.join( options_folder, 'options.json' )
19
-
20
- Dir.glob("csvs/*.csv").each do |csv_file|
21
- puts "Working on: #{csv_file}"
22
- t = Time.now
23
- csv_table = CSV.table(csv_file, encoding: 'iso-8859-1').by_col!
24
- puts "Loaded #{csv_file} in #{Time.now - t} seconds."
25
-
26
- total_header_count = csv_table.headers.count
27
- csv_table.headers.each_with_index do |header, index|
28
- puts "Column #{ index + 1 } of #{ total_header_count } for #{header} in #{csv_file}"
29
- if variable_file = Dir.glob("variables/**/#{header.downcase}.json", File::FNM_CASEFOLD).first
30
- json = JSON.parse(File.read(variable_file)) rescue json = nil
31
- next unless json
32
- next unless ["choices", "numeric", "integer"].include?(json["type"])
33
- next unless types.size == 0 or types.include?(json['type'])
34
- next unless variable_ids.size == 0 or variable_ids.include?(json['id'].to_s.downcase)
35
-
36
- basename = File.basename(variable_file).gsub(/\.json$/, '').downcase
37
- col_data = csv_table[header]
38
-
39
- case json["type"] when "choices"
40
- domain_file = Dir.glob("domains/**/#{json['domain']}.json").first
41
- domain_json = JSON.parse(File.read(domain_file)) rescue domain_json = nil
42
- next unless domain_json
43
-
44
- create_pie_chart_options_file(col_data, tmp_options_file, domain_json)
45
- when 'numeric', 'integer'
46
- create_line_chart_options_file(col_data, tmp_options_file, json["units"])
47
- else
48
- next
49
- end
50
-
51
- run_phantom_js("#{basename}-lg.png", 600, tmp_options_file) if sizes.size == 0 or sizes.include?('lg')
52
- run_phantom_js("#{basename}.png", 75, tmp_options_file) if sizes.size == 0 or sizes.include?('sm')
53
- end
54
- end
55
- end
56
- File.delete(tmp_options_file) if File.exists?(tmp_options_file)
57
- end
16
+ spout_config = YAML.load_file('.spout.yml')
58
17
 
59
- def graph_values(col_data)
60
- categories = []
18
+ @visit = ''
61
19
 
62
- col_data = col_data.select{|v| !['', 'null'].include?(v.to_s.strip.downcase)}.collect(&:to_f)
20
+ if spout_config.kind_of?(Hash)
21
+ @visit = spout_config['visit'].to_s.strip
63
22
 
64
- all_integers = false
65
- all_integers = (col_data.count{|i| i.denominator != 1} == 0)
66
-
67
- minimum = col_data.min || 0
68
- maximum = col_data.max || 100
69
-
70
- default_max_buckets = 30
71
- max_buckets = all_integers ? [maximum - minimum + 1, default_max_buckets].min : default_max_buckets
72
- bucket_size = (maximum - minimum + 1).to_f / max_buckets
73
-
74
- (0..(max_buckets-1)).each do |bucket|
75
- val_min = (bucket_size * bucket) + minimum
76
- val_max = bucket_size * (bucket + 1) + minimum
77
- # Greater or equal to val_min, less than val_max
78
- # categories << "'#{val_min} to #{val_max}'"
79
- categories << "#{all_integers || (maximum - minimum) > (default_max_buckets / 2) ? val_min.round : "%0.02f" % val_min}"
23
+ chart_variables = if spout_config['charts'].kind_of?(Array)
24
+ spout_config['charts'].select{|c| c.kind_of?(Hash)}
25
+ else
26
+ []
27
+ end
28
+ else
29
+ puts "The YAML file needs to be in the following format:"
30
+ puts "---\nvisit: visit_variable_name\ncharts:\n- chart: age_variable_name\n title: Age\n- chart: gender_variable_name\n title: Gender\n- chart: race_variable_name\n title: Race\n"
31
+ exit
80
32
  end
81
33
 
82
- new_values = []
83
- (0..max_buckets-1).each do |bucket|
84
- val_min = (bucket_size * bucket) + minimum
85
- val_max = bucket_size * (bucket + 1) + minimum
86
- # Greater or equal to val_min, less than val_max
87
- new_values << col_data.count{|i| i >= val_min and i < val_max}
34
+ if Spout::Helpers::ChartTypes::get_json(@visit, 'variable') == nil
35
+ if @visit == ''
36
+ puts "The visit variable in .spout.yml can't be blank."
37
+ else
38
+ puts "Could not find the following visit variable: #{@visit}"
39
+ end
40
+ exit
41
+ end
42
+ missing_variables = chart_variables.select{|c| Spout::Helpers::ChartTypes::get_json(c['chart'], 'variable') == nil}
43
+ if missing_variables.count > 0
44
+ puts "Could not find the following chart variable#{'s' unless missing_variables.size == 1}: #{missing_variables.collect{|c| c['chart']}.join(', ')}"
45
+ exit
88
46
  end
89
47
 
90
- values = []
91
-
92
- values << { name: '', data: new_values, showInLegend: false }
48
+ argv_string = variables.join(',')
49
+ @number_of_rows = nil
93
50
 
94
- [ values, categories ]
95
- end
51
+ if match_data = argv_string.match(/-rows=(\d*)/)
52
+ @number_of_rows = match_data[1].to_i
53
+ argv_string.gsub!(match_data[0], '')
54
+ end
96
55
 
56
+ @valid_ids = argv_string.split(',').compact.reject{|s| s == ''}
97
57
 
98
- def create_pie_chart_options_file(values, options_file, domain_json)
58
+ @chart_variables = chart_variables.unshift( { "chart" => @visit, "title" => 'Histogram' } )
99
59
 
100
- values.select!{|v| !['', 'null'].include?(v.to_s.strip.downcase) }
101
- counts = values.group_by{|a| a}.collect{|k,v| [(domain_json.select{|h| h['value'] == k.to_s}.first['display_name'] rescue (k.to_s == '' ? 'NULL' : k)), v.count]}
60
+ @variable_files = Dir.glob('variables/**/*.json')
102
61
 
103
- total_count = counts.collect(&:last).inject(&:+)
62
+ t = Time.now
63
+ FileUtils.mkpath "graphs/#{@standard_version}"
104
64
 
105
- data = counts.collect{|value, count| [value, (count * 100.0 / total_count)]}
65
+ @subject_loader = Spout::Helpers::SubjectLoader.new(@variable_files, @valid_ids, @standard_version, @number_of_rows, @visit)
106
66
 
107
- File.open(options_file, "w") do |outfile|
108
- outfile.puts <<-eos
109
- {
110
- "title": {
111
- "text": ""
112
- },
67
+ @subject_loader.load_subjects_from_csvs!
68
+ @subjects = @subject_loader.subjects
69
+ compute_tables_and_charts
113
70
 
114
- "credits": {
115
- "enabled": false,
116
- },
117
- "series": [{
118
- "type": "pie",
119
- "name": "",
120
- "data": #{data.to_json}
121
- }]
122
- }
123
- eos
124
- end
71
+ puts "Took #{Time.now - t} seconds."
125
72
  end
126
73
 
74
+ def compute_tables_and_charts
75
+ variable_files_count = @variable_files.count
76
+ @variable_files.each_with_index do |variable_file, file_index|
77
+ json = JSON.parse(File.read(variable_file)) rescue json = nil
78
+ next unless json
79
+ next unless @valid_ids.include?(json["id"].to_s.downcase) or @valid_ids.size == 0
80
+ next unless ["numeric", "integer", "choices"].include?(json["type"])
81
+ variable_name = json['id'].to_s.downcase
82
+ next unless Spout::Models::Subject.method_defined?(variable_name)
83
+
84
+ puts "#{file_index+1} of #{variable_files_count}: #{variable_file.gsub(/(^variables\/|\.json$)/, '').gsub('/', ' / ')}"
85
+
86
+
87
+ stats = {
88
+ charts: {},
89
+ tables: {}
90
+ }
91
+
92
+ @chart_variables.each do |chart_type_hash|
93
+ chart_type = chart_type_hash["chart"]
94
+ chart_title = chart_type_hash["title"].downcase
95
+
96
+ if chart_type == @visit
97
+ filtered_subjects = @subjects.select{ |s| s.send(chart_type) != nil } # and s.send(variable_name) != nil
98
+ if filtered_subjects.count > 0
99
+ stats[:charts][chart_title] = Spout::Helpers::ChartTypes::chart_histogram(chart_type, filtered_subjects, json, variable_name)
100
+ stats[:tables][chart_title] = Spout::Helpers::ChartTypes::table_arbitrary(chart_type, filtered_subjects, json, variable_name)
101
+ end
102
+ else
103
+ filtered_subjects = @subjects.select{ |s| s.send(chart_type) != nil } # and s.send(variable_name) != nil
104
+ if filtered_subjects.count > 0
105
+ stats[:charts][chart_title] = Spout::Helpers::ChartTypes::chart_arbitrary(chart_type, filtered_subjects, json, variable_name, visits)
106
+ stats[:tables][chart_title] = visits.collect do |visit_display_name, visit_value|
107
+ visit_subjects = filtered_subjects.select{ |s| s._visit == visit_value }
108
+ unknown_subjects = visit_subjects.select{ |s| s.send(variable_name) == nil }
109
+ (visit_subjects.count > 0 && visit_subjects.count != unknown_subjects.count) ? Spout::Helpers::ChartTypes::table_arbitrary(chart_type, visit_subjects, json, variable_name, visit_display_name) : nil
110
+ end.compact
111
+ end
112
+ end
113
+ end
114
+
115
+ chart_json_file = File.join('graphs', @standard_version, "#{json['id']}.json")
116
+ File.open(chart_json_file, 'w') { |file| file.write( JSON.pretty_generate(stats) + "\n" ) }
127
117
 
128
- def create_line_chart_options_file(values, options_file, units)
129
- ( series, categories ) = graph_values(values)
130
-
131
- File.open(options_file, "w") do |outfile|
132
- outfile.puts <<-eos
133
- {
134
- "chart": {
135
- "type": "areaspline"
136
- },
137
- "title": {
138
- "text": ""
139
- },
140
- "credits": {
141
- "enabled": false,
142
- },
143
- "xAxis": {
144
- "categories": #{categories.to_json},
145
- "labels": {
146
- "step": #{(categories.size.to_f / 12).ceil}
147
- },
148
- "title": {
149
- "text": #{units.to_json}
150
- }
151
- },
152
- "yAxis": {
153
- "maxPadding": 0,
154
- "minPadding": 0,
155
- "title": {
156
- "text": "Count"
157
- }
158
- },
159
- "series": #{series.to_json}
160
- }
161
- eos
162
118
  end
163
119
  end
164
120
 
165
- def run_phantom_js(png_name, width, tmp_options_file)
166
- graph_path = File.join(Dir.pwd, 'graphs', png_name)
167
- directory = File.join( File.dirname(__FILE__), '..', 'support', 'javascripts' )
168
-
169
- open_command = if RUBY_PLATFORM.match(/mingw/) != nil
170
- 'phantomjs.exe'
171
- else
172
- 'phantomjs'
121
+ # [["Visit 1", "1"], ["Visit 2", "2"], ["CVD Outcomes", "3"]]
122
+ def visits
123
+ @visits ||= begin
124
+ Spout::Helpers::ChartTypes::domain_array(@visit)
173
125
  end
174
-
175
- phantomjs_command = "#{open_command} #{directory}/highcharts-convert.js -infile #{tmp_options_file} -outfile #{graph_path} -scale 2.5 -width #{width} -constr Chart"
176
- # puts phantomjs_command
177
- `#{phantomjs_command}`
178
126
  end
179
127
 
180
128
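The way `Graphs#initialize` reads `.spout.yml` and builds its chart list can be sketched on its own. The example below assumes the YAML layout printed by the error message in the hunk above (a `visit` key plus a `charts` array of `chart`/`title` hashes); `chart_variables_from_config` is a hypothetical name, and it takes the YAML text as a string rather than reading the file.

```ruby
require 'yaml'

# Hypothetical helper: returns the chart list with the visit variable
# prepended as the 'Histogram' chart, or nil when the YAML is not a hash.
def chart_variables_from_config(yaml_text)
  config = YAML.load(yaml_text)
  return nil unless config.is_a?(Hash)

  visit = config['visit'].to_s.strip
  charts = config['charts'].is_a?(Array) ? config['charts'].select { |c| c.is_a?(Hash) } : []
  [{ 'chart' => visit, 'title' => 'Histogram' }] + charts
end

sample_config = <<~YAML
  ---
  visit: visit_variable_name
  charts:
  - chart: age_variable_name
    title: Age
  - chart: gender_variable_name
    title: Gender
YAML

chart_variables_from_config(sample_config).each do |chart|
  puts "#{chart['title']}: #{chart['chart']}"
end
```

Entries under `charts` that are not hashes are dropped, which matches the `select{|c| c.kind_of?(Hash)}` filter in the diff.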
  end