kbaum-munger 0.1.4

Sign up to get free protection for your applications and to get access to all the features.
data/.gitignore ADDED
@@ -0,0 +1,5 @@
1
+ pkg
2
+ *~
3
+ report.html
4
+ coverage
5
+ doc
data/README ADDED
@@ -0,0 +1,90 @@
1
+ Munger Ruby Reporting Library
2
+ =============================
3
+
4
+ Munger is basically a simple data munging and reporting library
5
+ for Ruby as an alternative to Ruport, which did not fill my needs
6
+ in ways that convinced me to start over rather than try to fork or
7
+ patch it. Apologies to the Ruport chaps, whom I am sure are
8
+ smashing blokes - it just didn't wiggle my worm.
9
+
10
+ See the Wiki for details : http://github.com/schacon/munger/wikis
11
+
12
+ 3-Part Reporting
13
+ =============================
14
+
15
+ Munger creates reports in three stages, much like an Apollo rocket. My
16
+ main problem with Ruport was the coupling of different parts of these
17
+ stages in ways that didn't make the data easily re-usable, cacheable or
18
+ didn't give me enough control. I like to have my data separate from my
19
+ report, which should be renderable however I want.
20
+
21
+ * Stage 1 - Data Munging *
22
+
23
+ The first stage is getting a dataset that has all the information you need.
24
+ I like to call this stage 'munging' (pronounced: 'MON'-day + chan-'GING'),
25
+ which is taking a simple set of data (from a SQL query, perhaps) and
26
+ transforming fields, adding derived data, pivoting, etc - and making it into
27
+ a table of all the actual data-points you need.
28
+
29
+ * Stage 2 - Report Formatting *
30
+
31
+ Then there is the Reporting. To me, this means taking your massaged dataset
32
+ and doing all the fun reporting to it. This includes grouping, subgrouping,
33
+ sorting, column ordering, multi-level aggregation (sums, avg, etc) and
34
+ highlighting important information (values that are too small, too high, etc).
35
+
36
+ It can be argued that pivoting should be at this level, rather than the first,
37
+ but I decided to put it there instead, mostly because I really think of the
38
+ pivoted data as a different data set and also for performance reasons - the
39
+ pivot data can be a bear to produce, and I plan on caching the first stage and
40
+ then running different reporting options on it.
41
+
42
+ * Stage 3 - Output Rendering *
43
+
44
+ Now that I have my super spiffy report, I want to be able to render it however
45
+ I want, possibly in multiple formats - HTML and XLS are the most important to
46
+ me, but PDF, text, csv, etc will also likely be produced eventually.
47
+
48
+
49
+ Examples
50
+ =============================
51
+
52
+ The starting data can be ActiveRecord collections or an array of Hashes.
53
+
54
+ # webpage_hit table has ip_address, hit_date, action, referrer #
55
+
56
+ * Simple Example *
57
+
58
+ hits = WebpageHits.find(:all, :conditions => ['hit_date > ?', 1.days.ago])
59
+ @table_data = Munger::Report.new(:data => data)
60
+ @table_data.sort('hit_date').aggregate(:count => :action)
61
+ html_table = Munger::Render::Html.new(@table_data).render
62
+
63
+
64
+ * More Complex Example *
65
+
66
+ hits = WebpageHits.find(:all, :conditions => ['hit_date > ?', 7.days.ago])
67
+
68
+ data = Munger::Data.new
69
+ data.transform_column('hit_date') { |row| row.hit_date.day }
70
+ data.add_column('controller') { |row| row.action.split('/').first }
71
+
72
+ day_columns = data.pivot('hit_date', 'action', 'ip_address', :count)
73
+
74
+ @table_data = Munger::Report.new(:data => data,
75
+ :columns => [:action] + day_columns,
76
+ :aggregate => {:sum => day_columns})
77
+
78
+ @table_data.sort('action').subgroup('controller')
79
+ @table_data.process.style_cells('low_traffic', :only => new_columns) do |cell, row|
80
+ # highlight any index pages that have < 500 hits
81
+ cell.to_i < 500 if row.action =~ /index/
82
+ end
83
+
84
+ html_table = Munger::Render::Html.new(@table_data).render
85
+
86
+
87
+
88
+
89
+
90
+
data/Rakefile ADDED
@@ -0,0 +1,21 @@
1
+ require 'rubygems'
2
+ #Gem::manage_gems
3
+ require 'rake/gempackagetask'
4
+ require 'rake/rdoctask'
5
+ require 'spec/rake/spectask'
6
+
7
+ begin
8
+ require 'jeweler'
9
+ Jeweler::Tasks.new do |gemspec|
10
+ gemspec.name = "kbaum-munger"
11
+ gemspec.version="0.1.4"
12
+ gemspec.summary = "fork of munger reporting to create a gem"
13
+ gemspec.description = "A different and possibly longer explanation of"
14
+ gemspec.email = "karl@weshopnetwork.com"
15
+ gemspec.homepage = "http://github.com/kbaum/munger"
16
+ gemspec.authors = ["Karl Baum"]
17
+ end
18
+ rescue LoadError
19
+ puts "Jeweler not available. Install it with: sudo gem install jeweler"
20
+ end
21
+
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.4
@@ -0,0 +1,30 @@
1
+ require File.dirname(__FILE__) + "/example_helper"
2
+ include ExampleHelper
3
+
4
+ data = Munger::Data.load_data(test_data)
5
+
6
+ data.add_column([:advert, :rate]) do |row|
7
+ rate = (row.clicks / row.airtime)
8
+ [row.advert.capitalize, rate]
9
+ end
10
+
11
+ #data.filter_rows { |row| row.rate > 10 }
12
+
13
+ #new_columns = data.pivot('airtime', 'advert', 'rate', :average)
14
+
15
+ report = Munger::Report.from_data(data)
16
+ report.columns(:advert => 'Spot', :airdate => 'Air Date', :airtime => 'Airtime', :rate => 'Rate')
17
+ report.sort = [['airtime', :asc], ['rate', :asc]]
18
+ #report.subgroup('airtime')
19
+ #report.aggregate(Proc.new {|arr| arr.inject(0) {|total, i| i * i + (total - 30) }} => :airtime, :avg => :rate)
20
+ report.process
21
+
22
+ report.style_cells('myRed', :only => :rate) { |cell, row| (cell.to_i < 10) }
23
+
24
+ #puts html = Munger::Render.to_html(report, :classes => {:table => 'other-class'} )
25
+ puts text = Munger::Render.to_text(report)
26
+
27
+
28
+ f = File.open('test.html', 'w')
29
+ f.write(html)
30
+ f.close
@@ -0,0 +1,2 @@
1
+ GET /example | Status: 200 | Params: {:format=>"html"}
2
+ GET /favicon.ico | Status: 404 | Params: {}
@@ -0,0 +1,23 @@
1
+ require File.expand_path(File.dirname(__FILE__) + "/../lib/munger")
2
+
3
+ require 'fileutils'
4
+ require 'logger'
5
+ require 'pp'
6
+
7
+ module ExampleHelper
8
+ def test_data
9
+ [
10
+ {:advert => "spot 1", :airtime => 15, :airdate => "2008-01-01", :clicks => 301},
11
+ {:advert => "spot 1", :airtime => 30, :airdate => "2008-01-02", :clicks => 199},
12
+ {:advert => "spot 1", :airtime => 30, :airdate => "2008-01-03", :clicks => 234},
13
+ {:advert => "spot 1", :airtime => 15, :airdate => "2008-01-04", :clicks => 342},
14
+ {:advert => "spot 2", :airtime => 30, :airdate => "2008-01-01", :clicks => 172},
15
+ {:advert => "spot 2", :airtime => 15, :airdate => "2008-01-02", :clicks => 217},
16
+ {:advert => "spot 2", :airtime => 90, :airdate => "2008-01-03", :clicks => 1023},
17
+ {:advert => "spot 2", :airtime => 30, :airdate => "2008-01-04", :clicks => 321},
18
+ {:advert => "spot 3", :airtime => 60, :airdate => "2008-01-01", :clicks => 512},
19
+ {:advert => "spot 3", :airtime => 30, :airdate => "2008-01-02", :clicks => 813},
20
+ {:advert => "spot 3", :airtime => 15, :airdate => "2008-01-03", :clicks => 333},
21
+ ]
22
+ end
23
+ end
@@ -0,0 +1,100 @@
1
+ require 'rubygems'
2
+ require 'sinatra'
3
+ require File.expand_path(File.dirname(__FILE__) + "/../lib/munger")
4
+
5
+ get '/' do
6
+ data = Munger::Data.load_data(test_data)
7
+
8
+ report = Munger::Report.from_data(data)
9
+ report.process
10
+
11
+ out = Munger::Render.to_html(report, :classes => {:table => 'other-class'} )
12
+ show(out)
13
+ end
14
+
15
+ get '/pivot' do
16
+ data = Munger::Data.load_data(test_data)
17
+
18
+ data.add_column([:advert, :rate]) do |row|
19
+ rate = (row.clicks / row.airtime)
20
+ [row.advert.capitalize, rate]
21
+ end
22
+
23
+ new_columns = data.pivot('airtime', 'advert', 'rate', :average)
24
+
25
+ report = Munger::Report.from_data(data)
26
+ report.columns([:advert] + new_columns.sort)
27
+ report.process
28
+
29
+ report.style_cells('myRed', :only => new_columns) { |cell, row| (cell.to_i < 10 && cell.to_i > 0) }
30
+
31
+ out = Munger::Render.to_html(report, :classes => {:table => 'other-class'} )
32
+
33
+ show(out)
34
+ end
35
+
36
+ get '/example' do
37
+ data = Munger::Data.load_data(test_data)
38
+
39
+ data.add_column([:advert, :rate]) do |row|
40
+ rate = (row.clicks / row.airtime)
41
+ [row.advert.capitalize, rate]
42
+ end
43
+
44
+ #data.filter_rows { |row| row.rate > 10 }
45
+ #new_columns = data.pivot('airtime', 'advert', 'rate', :average)
46
+
47
+ report = Munger::Report.from_data(data)
48
+ report.columns(:advert => 'Spot', :airdate => 'Air Date', :airtime => 'Airtime', :rate => 'Rate')
49
+ report.sort = [['airtime', :asc], ['rate', :asc]]
50
+ report.subgroup('airtime', :with_titles => true)
51
+ report.aggregate(Proc.new {|arr| arr.inject(0) {|total, i| i * i + (total - 30) }} => :airtime, :average => :rate)
52
+ report.process
53
+
54
+ report.style_cells('myRed', :only => :rate) { |cell, row| (cell.to_i < 10) }
55
+
56
+ out = Munger::Render.to_html(report, :classes => {:table => 'other-class'} )
57
+
58
+ show(out)
59
+ end
60
+
61
+ def test_data
62
+ [
63
+ {:advert => "spot 1", :airtime => 15, :airdate => "2008-01-01", :clicks => 301},
64
+ {:advert => "spot 1", :airtime => 30, :airdate => "2008-01-02", :clicks => 199},
65
+ {:advert => "spot 1", :airtime => 30, :airdate => "2008-01-03", :clicks => 234},
66
+ {:advert => "spot 1", :airtime => 15, :airdate => "2008-01-04", :clicks => 342},
67
+ {:advert => "spot 2", :airtime => 30, :airdate => "2008-01-01", :clicks => 172},
68
+ {:advert => "spot 2", :airtime => 15, :airdate => "2008-01-02", :clicks => 217},
69
+ {:advert => "spot 2", :airtime => 90, :airdate => "2008-01-03", :clicks => 1023},
70
+ {:advert => "spot 2", :airtime => 30, :airdate => "2008-01-04", :clicks => 321},
71
+ {:advert => "spot 3", :airtime => 60, :airdate => "2008-01-01", :clicks => 512},
72
+ {:advert => "spot 3", :airtime => 30, :airdate => "2008-01-02", :clicks => 813},
73
+ {:advert => "spot 3", :airtime => 15, :airdate => "2008-01-03", :clicks => 333},
74
+ ]
75
+ end
76
+
77
+ def show(data)
78
+ %Q(
79
+ <html>
80
+ <head>
81
+ <style>
82
+ .myRed { background: #e44; }
83
+
84
+ tr.group0 { background: #bbb;}
85
+ tr.group1 { background: #ddd;}
86
+
87
+ tr.groupHeader1 { background: #ccc;}
88
+
89
+ table tr td {padding: 0 15px;}
90
+ table tr th { background: #aaa; padding: 5px; }
91
+ body { font-family: verdana, "Lucida Grande", arial, helvetica, sans-serif;
92
+ color: #333; }
93
+ </style>
94
+ </head>
95
+ <body>
96
+ #{data}
97
+ </body>
98
+ </html>
99
+ )
100
+ end
File without changes
data/lib/munger.rb ADDED
@@ -0,0 +1,16 @@
1
+ $:.unshift(File.dirname(__FILE__)) unless
2
+ $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
3
+
4
+ require 'munger/data'
5
+ require 'munger/report'
6
+ require 'munger/item'
7
+
8
+ require 'munger/render'
9
+ require 'munger/render/csv'
10
+ require 'munger/render/html'
11
+ require 'munger/render/sortable_html'
12
+ require 'munger/render/text'
13
+
14
+ module Munger
15
+ VERSION = '0.1.3'
16
+ end
@@ -0,0 +1,232 @@
1
+ module Munger #:nodoc:
2
+
3
+ # this class is a data munger
4
+ # it takes raw data (arrays of hashes, basically)
5
+ # and can manipulate it in various interesting ways
6
+ class Data
7
+
8
+ attr_accessor :data
9
+
10
+ # will accept active record collection or array of hashes
11
+ def initialize(options = {})
12
+ @data = options[:data] if options[:data]
13
+ yield self if block_given?
14
+ end
15
+
16
+ def <<(data)
17
+ add_data(data)
18
+ end
19
+
20
+ def add_data(data)
21
+ if @data
22
+ @data = @data + data
23
+ else
24
+ @data = data
25
+ end
26
+ @data
27
+ end
28
+
29
+
30
+ #--
31
+ # NOTE:
32
+ # The name seems redundant; why:
33
+ # Munger::Data.load_data(data)
34
+ # and not:
35
+ # Munger::Data.load(data)
36
+ #++
37
+ def self.load_data(data, options = {})
38
+ Data.new(:data => data)
39
+ end
40
+
41
+ def columns
42
+ @columns ||= clean_data(@data.first).to_hash.keys
43
+ rescue
44
+ puts clean_data(@data.first).to_hash.inspect
45
+ end
46
+
47
+ # :default: The default value to use for the column in existing rows.
48
+ # Set to nil if not specified.
49
+ # if a block is passed, you can set the values manually
50
+ def add_column(names, options = {})
51
+ default = options[:default] || nil
52
+ @data.each_with_index do |row, index|
53
+ if block_given?
54
+ col_data = yield Item.ensure(row)
55
+ else
56
+ col_data = default
57
+ end
58
+
59
+ if names.is_a? Array
60
+ names.each_with_index do |col, i|
61
+ row[col] = col_data[i]
62
+ end
63
+ else
64
+ row[names] = col_data
65
+ end
66
+ @data[index] = Item.ensure(row)
67
+ end
68
+ end
69
+ alias :add_columns :add_column
70
+ alias :transform_column :add_column
71
+ alias :transform_columns :add_column
72
+
73
+ def clean_data(hash_or_ar)
74
+ if hash_or_ar.is_a? Hash
75
+ return Item.ensure(hash_or_ar)
76
+ elsif hash_or_ar.respond_to? :attributes
77
+ return Item.ensure(hash_or_ar.attributes)
78
+ end
79
+ hash_or_ar
80
+ end
81
+
82
+ def filter_rows
83
+ new_data = []
84
+
85
+ @data.each do |row|
86
+ row = Item.ensure(row)
87
+ if (yield row)
88
+ new_data << row
89
+ end
90
+ end
91
+
92
+ @data = new_data
93
+ end
94
+
95
+ # group the data like sql
96
+ def group(groups, agg_hash = {})
97
+ data_hash = {}
98
+
99
+ agg_columns = []
100
+ agg_hash.each do |key, columns|
101
+ Data.array(columns).each do |col| # column name
102
+ agg_columns << col
103
+ end
104
+ end
105
+ agg_columns = agg_columns.uniq.compact
106
+
107
+ @data.each do |row|
108
+ row_key = Data.array(groups).map { |rk| row[rk] }
109
+ data_hash[row_key] ||= {:cells => {}, :data => {}, :count => 0}
110
+ focus = data_hash[row_key]
111
+ focus[:data] = clean_data(row)
112
+
113
+ agg_columns.each do |col|
114
+ focus[:cells][col] ||= []
115
+ focus[:cells][col] << row[col]
116
+ end
117
+ focus[:count] += 1
118
+ end
119
+
120
+ new_data = []
121
+ new_keys = []
122
+
123
+ data_hash.each do |row_key, data|
124
+ new_row = data[:data]
125
+ agg_hash.each do |key, columns|
126
+ Data.array(columns).each do |col| # column name
127
+ newcol = ''
128
+ if key.is_a?(Array) && key[1].is_a?(Proc)
129
+ newcol = key[0].to_s + '_' + col.to_s
130
+ new_row[newcol] = key[1].call(data[:cells][col])
131
+ else
132
+ newcol = key.to_s + '_' + col.to_s
133
+ case key
134
+ when :average
135
+ sum = data[:cells][col].inject { |sum, a| sum + a }
136
+ new_row[newcol] = (sum / data[:count])
137
+ when :count
138
+ new_row[newcol] = data[:count]
139
+ else
140
+ new_row[newcol] = data[:cells][col].inject { |sum, a| sum + a }
141
+ end
142
+ end
143
+ new_keys << newcol
144
+ end
145
+ end
146
+ new_data << Item.ensure(new_row)
147
+ end
148
+
149
+ @data = new_data
150
+ new_keys.compact
151
+ end
152
+
153
+ def pivot(columns, rows, value, aggregation = :sum)
154
+ data_hash = {}
155
+
156
+ @data.each do |row|
157
+ column_key = Data.array(columns).map { |rk| row[rk] }
158
+ row_key = Data.array(rows).map { |rk| row[rk] }
159
+ data_hash[row_key] ||= {}
160
+ data_hash[row_key][column_key] ||= {:sum => 0, :data => {}, :count => 0}
161
+ focus = data_hash[row_key][column_key]
162
+ focus[:data] = clean_data(row)
163
+ focus[:count] += 1
164
+ focus[:sum] += row[value]
165
+ end
166
+
167
+ new_data = []
168
+ new_keys = {}
169
+
170
+ data_hash.each do |row_key, row_hash|
171
+ new_row = {}
172
+ row_hash.each do |column_key, data|
173
+ column_key.each do |ckey|
174
+ new_row.merge!(data[:data])
175
+ case aggregation
176
+ when :average
177
+ new_row[ckey] = (data[:sum] / data[:count])
178
+ when :count
179
+ new_row[ckey] = data[:count]
180
+ else
181
+ new_row[ckey] = data[:sum]
182
+ end
183
+ new_keys[ckey] = true
184
+ end
185
+ end
186
+ new_data << Item.ensure(new_row)
187
+ end
188
+
189
+ @data = new_data
190
+ new_keys.keys
191
+ end
192
+
193
+ def self.array(string_or_array)
194
+ if string_or_array.is_a? Array
195
+ return string_or_array
196
+ else
197
+ return [string_or_array]
198
+ end
199
+ end
200
+
201
+ def size
202
+ @data.size
203
+ end
204
+ alias :length :size
205
+
206
+ def valid?
207
+ if ((@data.size > 0) &&
208
+ (@data.respond_to? :each_with_index) &&
209
+ (@data.first.respond_to? :keys)) &&
210
+ (!@data.first.is_a? String)
211
+ return true
212
+ else
213
+ return false
214
+ end
215
+ rescue
216
+ false
217
+ end
218
+
219
+ # cols is an array of column names, if given the nested arrays are built in this order
220
+ def to_a(cols=nil)
221
+ array = []
222
+ cols ||= self.columns
223
+ @data.each do |row|
224
+ array << cols.inject([]){ |a,col| a << row[col] }
225
+ end
226
+ array
227
+ end
228
+
229
+ end
230
+
231
+ end
232
+