shades 0.11

checksums.yaml ADDED
@@ -0,0 +1,15 @@
1
+ ---
2
+ !binary "U0hBMQ==":
3
+ metadata.gz: !binary |-
4
+ Y2M3MzYxNDM1Y2VlYTY5NDQxMzFhMTgzZmIyNjExMzg3MGY2MTIyMg==
5
+ data.tar.gz: !binary |-
6
+ ZWJjZDEyZjVkM2NmMzhjMzMwN2MwZjAzMDVkN2VkZTUzMGFhMzJiOA==
7
+ !binary "U0hBNTEy":
8
+ metadata.gz: !binary |-
9
+ NWZjNGY4OTk5YmNhMDhlYWNkYWU0ZGIwYzVkNTViMzE1ZTIwN2EzZDdiNDYy
10
+ YjMwNmEwOTVlYWFiMmM0ZTNkMWQ4MTQwNGM3MDYxZThlMzA1NDJmZjA0NGRj
11
+ YThiNzMwMDEyOTQ3ZjE0YWJiNDRlNjdiODIwNTk2MjdmMzFlM2E=
12
+ data.tar.gz: !binary |-
13
+ MmJkYWU5NGI2NzhlODcyNzdmMWE1MGIzMzg3YzNjMDQwNTZjNTc3YWI2ZDZk
14
+ ZWNkNjQ4MzA5M2IwYzYyMWFjNTRmMWFkMTk5Y2UxOGI0NzhkMjFlZDU4NDRh
15
+ MWY5ZjNmZGEzYzdhMTQ1MDc4Y2EwNjMyMTBiOTAzY2NiOTM4MTk=
data/.gitignore ADDED
@@ -0,0 +1,19 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ coverage
6
+ InstalledFiles
7
+ lib/bundler/man
8
+ pkg
9
+ rdoc
10
+ spec/reports
11
+ test/tmp
12
+ test/version_tmp
13
+ tmp
14
+
15
+ # YARD artifacts
16
+ .yardoc
17
+ _yardoc
18
+ doc/
19
+ tags
data/LICENSE ADDED
@@ -0,0 +1,18 @@
1
+ Copyright (c) 2013 Dietrich Featherston
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
4
+ this software and associated documentation files (the "Software"), to deal in
5
+ the Software without restriction, including without limitation the rights to
6
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
7
+ the Software, and to permit persons to whom the Software is furnished to do so,
8
+ subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
15
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
16
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
17
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
18
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,83 @@
1
+ # Shades
2
+
3
+ Get a new perspective on your data. In-memory [OLAP cubing](http://en.wikipedia.org/wiki/OLAP_cube), histograms, and more for Ruby.
4
+
5
+ ![](https://dl.dropboxusercontent.com/u/1133314/i/shades.gif)
6
+
7
+ ## As a command line utility for OLAP cubing
8
+
9
+ The ```shades``` utility accepts whitespace-delimited data, one event per line, preceded by two comment lines describing the dimensions and measures within.
10
+
11
+ ```
12
+ # dimensions: timestamp transactionid customer item
13
+ # measures: quantity amount
14
+ 1371958271 1 jack golfclubs 3 75.00
15
+ 1371937693 1 jack gin 2 40.00
16
+ 1371979661 2 jane jar 6 6.00
17
+ ```
18
+
19
+ Each line will be parsed as a ```Shades::Event``` according to the metadata given in the first two lines. So the line
20
+
21
+ ```
22
+ 1371937693 1 jack gin 2 40.00
23
+ ```
24
+
25
+ creates a ```Shades::Event``` of the form:
26
+
27
+ ```
28
+ dimensions:
29
+ timestamp = 1371937693
30
+ transactionid = 1
31
+ customer = jack
32
+ item = gin
33
+ measures:
34
+ quantity = 2
35
+ amount = 40.00
36
+ ```
37
+
38
+ We can then perform simple aggregations. This query finds the total amount each customer has spent:
39
+
40
+ ```> cat transactions.txt | shades "sum(amount) by customer"```
41
+
42
+ ```
43
+ customer amount
44
+ jack 115.00
45
+ jane 6.00
46
+ ```
47
+
48
+ ## As a command line utility for histogramming
49
+
50
+ Histograms are indispensable for understanding value distributions in a data set, especially distributions with a long tail or heavy skew, such as response times in computer systems or the cost of goods on Amazon. It is typically difficult to pick appropriate bin widths if you don't already have a solid understanding of the data. Shades implements dynamically rebalancing histograms based on [this paper](http://pages.cs.wisc.edu/~donjerko/hist.pdf), so the bins adapt to your data set.
51
+
52
+ Say another file with the same structure as above includes one-minute system load averages as the measure ```load1```:
53
+
54
+ ```
55
+ cat hoststats.txt | histo load1
56
+ 0.174 ( 7) #######
57
+ 0.805 ( 30) ##############################
58
+ 1.974 ( 11) ###########
59
+ 2.936 ( 10) ##########
60
+ 3.911 ( 8) ########
61
+ 5.164 ( 5) #####
62
+ 6.744 ( 7) #######
63
+ 7.852 ( 4) ####
64
+ 9.310 ( 1) #
65
+ 20.250 ( 1) #
66
+ ```
67
+
68
+ Each of these lines is a histogram bucket, with the bucket's mean value on the left and the number of items in the bucket in parentheses. So the line ```5.164 ( 5) #####``` can be read as "there are 5 values with a mean close to 5.164".
69
+
70
+ You can even feed data cubing output from above into the ```histo``` utility. Let's say we look back at the customer transaction data from above. To get a sense of the distribution of transaction amounts, you would simply do the following.
71
+
72
+ ```
73
+ cat transactions.txt | shades -p "sum(amount) by transactionid" | histo amount
74
+ ```
75
+
76
+ ## Use in code
77
+
78
+ Shades also offers a public OLAP cubing API. See the ```shades``` and ```histo``` utilities for examples of building data cubes and histograms, respectively.
79
+
80
+ ## Roadmap
81
+
82
+ - Add 'where' clauses for filtering
83
+ - Numerosity bounding of output from ```shades``` by only including the top ranking rows in a set of dimensions.
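The README's "sum(amount) by customer" example can be illustrated with a minimal standalone sketch. It does not use the Shades API; the method name `sum_by` and the inline `INPUT` data are assumptions made for illustration, and the parsing only mirrors the two-header-line input format described in the README.

```ruby
# Standalone illustration of "sum(<measure>) by <dimension>".
# Does not use the Shades library; names here are hypothetical.
INPUT = <<~DATA
  # dimensions: timestamp transactionid customer item
  # measures: quantity amount
  1371958271 1 jack golfclubs 3 75.00
  1371937693 1 jack gin 2 40.00
  1371979661 2 jane jar 6 6.00
DATA

def sum_by(input, measure, dimension)
  dimensions = nil
  measures = nil
  totals = Hash.new(0.0)
  input.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if line.start_with?("#")
      # header lines name the fields: "# dimensions: ..." / "# measures: ..."
      name, *fields = line.scan(/\w+/)
      dimensions = fields if name == "dimensions"
      measures = fields if name == "measures"
      next
    end
    # ignore data lines that arrive before both headers are known
    next if dimensions.nil? || measures.nil?
    values = line.split(/\s+/)
    row = (dimensions + measures).zip(values).to_h
    totals[row[dimension]] += Float(row[measure])
  end
  totals
end

sum_by(INPUT, "amount", "customer").each do |customer, amount|
  puts format("%-8s %8.2f", customer, amount)
end
```

Grouping on a single dimension keeps the sketch short; a cube over several dimensions would key the totals on the combination of their values instead.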
data/Rakefile ADDED
@@ -0,0 +1 @@
1
+ require "bundler/gem_tasks"
data/bin/histo ADDED
@@ -0,0 +1,27 @@
1
+ #!/usr/bin/env ruby
2
+ $: << File.realpath(File.dirname(__FILE__) + "/../lib")
3
+
4
+ require 'shades'
5
+
6
+ def main(mkey, max_buckets, output_width)
7
+
8
+ # set up the histogram data to accept streaming input
9
+ histo = Shades::DynamicHistogram.new(max_buckets)
10
+ p = Shades::StreamParser.new do |e|
11
+ histo.add(e.measure(mkey))
12
+ end
13
+
14
+ # stream stdin lines to the parser
15
+ $stdin.each_line do |line|
16
+ p << line
17
+ end
18
+
19
+ $stdout.puts histo.ascii_art
20
+ end
21
+
22
+ measure = ARGV[-1]
23
+ max_buckets = 10
24
+ output_width = 30
25
+
26
+ main(measure, max_buckets, output_width)
27
+
data/bin/shades ADDED
@@ -0,0 +1,50 @@
1
+ #!/usr/bin/env ruby
2
+ $: << File.realpath(File.dirname(__FILE__) + "/../lib")
3
+
4
+ require 'getoptlong'
5
+ require 'shades'
6
+
7
+ def main(parseable_output, q)
8
+
9
+ events_in = []
10
+ p = Shades::StreamParser.new do |e|
11
+ events_in << e
12
+ end
13
+
14
+ $stdin.each_line do |line|
15
+ p << line
16
+ end
17
+
18
+ events_out = q.run(events_in)
19
+ fmt = Shades::Formatter.new(" ")
20
+ if parseable_output
21
+ fmt.text($stdout, events_out)
22
+ else
23
+ fmt.pretty_text($stdout, events_out)
24
+ end
25
+ end
26
+
27
+ opt_parser = GetoptLong.new
28
+ opt_parser.set_options(
29
+ # create output that can itself be used as input to this program
30
+ ["-p", "--parseable", GetoptLong::NO_ARGUMENT]
31
+ )
32
+
33
+ parseable_output = false
34
+
35
+ begin
36
+ begin
37
+ opt,arg = opt_parser.get_option
38
+ break if not opt
39
+ case opt
40
+ when "-p", "--parseable"
41
+ parseable_output = true
42
+ end
43
+ rescue => err
44
+ $stderr.puts err.message
45
+ end
46
+ end while 1
47
+
48
+ qs = ARGV[-1]
49
+ q = Shades::QueryParser::parse(qs)
50
+ main(parseable_output, q)
data/lib/cube.rb ADDED
@@ -0,0 +1,238 @@
1
+ module Shades
2
+ class Query
3
+ def initialize(opts)
4
+ dimensions = []
5
+ @dcs = opts[:categorizations].map do |d|
6
+ dimensions.push(d)
7
+ DimensionComputer.new(d, d)
8
+ end unless opts[:categorizations].nil?
9
+ @mrs = opts[:rollups].map do |r|
10
+ MeasureRollup.new(r[:measure], r[:measure], r[:stat]) unless opts[:rollups].nil?
11
+ end unless opts[:rollups].nil?
12
+ @sorting = opts[:sorting]
13
+ if !does_sorting?
14
+ # got a query with no sorting parameters. but surely we should
15
+ # return the info with some meaningful ordering
16
+ # so, we choose to order by the first measure returned in the query, largest values first
17
+ @sorting = [{:key => outbound_measures.first, :asc => false}]
18
+ end
19
+ @pre = Processor.new(@dcs)
20
+ end
21
+
22
+ def rollup_list
23
+ @mrs
24
+ end
25
+
26
+ def does_sorting?
27
+ !@sorting.nil? && !@sorting.empty?
28
+ end
29
+
30
+ def does_rollups?
31
+ !@dcs.nil? && !@dcs.empty? && !@mrs.nil? && !@mrs.empty?
32
+ end
33
+
34
+ def outbound_measures
35
+ @mrs.map { |r| r.outbound_measure }
36
+ end
37
+
38
+ def run(events_in)
39
+ if does_rollups?
40
+ aggregator = Aggregator.new(self)
41
+ events_in.each do |event|
42
+ eout = @pre.send(event)
43
+ if !eout.nil?
44
+ aggregator.add(eout)
45
+ end
46
+ end
47
+ results = aggregator.snapshot
48
+ end
49
+ if !@sorting.nil?
50
+ results.sort! do |a, b|
51
+ multicompare(a, b)
52
+ end
53
+ end
54
+ results
55
+ end
56
+
57
+ def multicompare(a, b)
58
+ c = 0
59
+ @sorting.each do |s|
60
+ v1 = lookup(a, s[:key])
61
+ v2 = lookup(b, s[:key])
62
+ asc = s[:asc]
63
+ if v1 < v2
64
+ return asc ? -1 : 1 # the first sort key that differs decides the order
65
+ elsif v2 < v1
66
+ return asc ? 1 : -1
67
+ end
68
+ end
69
+ c
70
+ end
71
+
72
+ def lookup(e, k)
73
+ v = e.dimension(k)
74
+ if v.nil?
75
+ v = e.measure(k)
76
+ end
77
+ natural_order(v)
78
+ end
79
+
80
+ def natural_order(o)
81
+ begin
82
+ return Float(o)
83
+ rescue
84
+ end
85
+ o
86
+ end
87
+ end
88
+
89
+ class Processor
90
+
91
+ def initialize(dcs)
92
+ @dcs = dcs
93
+ end
94
+
95
+ def send(event)
96
+ # optionally filter stuff out here by some kind of predicate in the future
97
+ ###
98
+ # remap inbound to outbound dimensions
99
+ d = {}
100
+ dlist = []
101
+ @dcs.each do |dc|
102
+ dlist.push(dc.outbound_dimension)
103
+ d[dc.outbound_dimension] = dc.get_value(event)
104
+ end
105
+ m = {}
106
+ mlist = []
107
+ event.measures.each do |k|
108
+ mlist.push(k)
109
+ m[k] = event.measure(k)
110
+ end
111
+ Event.new(Metadata.new(dlist, mlist), d, m)
112
+ end
113
+ end
114
+
115
+ class DimensionComputer
116
+
117
+ def initialize(inbound, outbound)
118
+ @inbound = inbound
119
+ @outbound = outbound
120
+ end
121
+
122
+ def outbound_dimension
123
+ @outbound
124
+ end
125
+
126
+ def get_value(event)
127
+ event.dimension(@inbound)
128
+ end
129
+ end
130
+
131
+ class MeasureRollup
132
+
133
+ attr_accessor :stat
134
+ attr_accessor :outbound
135
+
136
+ def initialize(inbound, outbound, stat)
137
+ @inbound = inbound
138
+ @outbound = outbound
139
+ @stat = stat
140
+ end
141
+
142
+ def outbound_measure
143
+ @outbound
144
+ end
145
+ def get_value(event)
146
+ event.measure(@inbound)
147
+ end
148
+ end
149
+
150
+ class Aggregator
151
+
152
+ def initialize(query)
153
+ @query = query
154
+ @state = {}
155
+ end
156
+
157
+ def add(event)
158
+ agg_event = AggEvent.new(@query, event)
159
+ if @state.has_key?(agg_event.key)
160
+ @state[event.key].add(agg_event)
161
+ else
162
+ @state[event.key] = agg_event
163
+ end
164
+ end
165
+
166
+ def snapshot
167
+ @state.values
168
+ end
169
+ end
170
+
171
+ class AggEvent
172
+
173
+ attr_accessor :metadata
174
+
175
+ def initialize(query, event)
176
+ @query = query
177
+ @key = event.key
178
+ @dvalues = {}
179
+ @dlist = []
180
+ event.dimensions.each do |k|
181
+ @dvalues[k] = event.dimension(k)
182
+ @dlist.push(k)
183
+ end
184
+ @rollup_info_by_measure = {}
185
+ @stats_by_measure = {}
186
+ @mlist = []
187
+ @query.rollup_list.each do |r|
188
+ @mlist.push(r.outbound_measure)
189
+ outbound_measure = r.outbound_measure
190
+ @rollup_info_by_measure[outbound_measure] = r
191
+ initial_value = r.get_value(event)
192
+ stat = r.stat.call
193
+ stat.add(initial_value)
194
+ @stats_by_measure[outbound_measure] = stat
195
+ end
196
+ @metadata = Metadata.new(@dlist, @mlist)
197
+ end
198
+
199
+ def key
200
+ @key
201
+ end
202
+
203
+ def dimensions
204
+ @dlist
205
+ end
206
+
207
+ def dimension(d)
208
+ @dvalues[d]
209
+ end
210
+
211
+ def measure(m)
212
+ if @stats_by_measure.has_key?(m)
213
+ @stats_by_measure[m].get_value
214
+ else
215
+ 0.0
216
+ end
217
+ end
218
+
219
+ def measures
220
+ @mlist
221
+ end
222
+
223
+ def add(event)
224
+ measures.each do |k|
225
+ value = @rollup_info_by_measure[k].get_value(event)
226
+ @stats_by_measure[k].add(value)
227
+ end
228
+ end
229
+
230
+ def line
231
+ f = []
232
+ f << dimensions.map { |k| '%s' % dimension(k) }
233
+ f << measures.map { |k| '%.5f' % measure(k) }
234
+ f.join("\t")
235
+ end
236
+ end
237
+
238
+ end
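`Query#multicompare` orders results by a list of sort keys. The sketch below shows one common way to do multi-key comparison, in which the first key that differs decides the order and later keys only break ties. It is a standalone illustration: rows are plain hashes rather than Shades events, and the name `compare_by_keys` is an assumption, not part of the gem.

```ruby
# Compare two rows using an ordered list of sort keys.
# Each entry of `sorting` looks like { :key => "amount", :asc => false }.
def compare_by_keys(a, b, sorting)
  sorting.each do |s|
    c = a[s[:key]] <=> b[s[:key]]
    # skip keys that tie (or cannot be compared) and try the next one
    next if c.nil? || c.zero?
    return s[:asc] ? c : -c
  end
  0
end

rows = [
  { "customer" => "jane", "amount" => 6.0 },
  { "customer" => "jack", "amount" => 115.0 },
  { "customer" => "jill", "amount" => 6.0 },
]
sorting = [
  { :key => "amount",   :asc => false }, # largest amounts first
  { :key => "customer", :asc => true },  # then alphabetical
]

rows.sort { |a, b| compare_by_keys(a, b, sorting) }.each do |r|
  puts format("%-8s %8.2f", r["customer"], r["amount"])
end
```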
data/lib/formatter.rb ADDED
@@ -0,0 +1,52 @@
1
+ module Shades
2
+ class Formatter
3
+ def initialize(spacer = " ")
4
+ @spacer = spacer
5
+ end
6
+ def text(out, events)
7
+ metadata = events[0].metadata unless events.empty?
8
+ lines = []
9
+ out.puts "# dimensions: %s" % (metadata.dimensions.join(@spacer))
10
+ out.puts "# measures: %s" % (metadata.measures.join(@spacer))
11
+ events.each do |e|
12
+ out.puts e.line
13
+ end
14
+ end
15
+ def pretty_text(out, events)
16
+ metadata = events[0].metadata unless events.empty?
17
+ lines = []
18
+ (events.length+1).times {|i|lines[i] = []}
19
+ metadata.dimensions.each do |d|
20
+ w = find_longest_field(d, events)
21
+ row = 1
22
+ events.each do |e|
23
+ v = e.dimension(d)
24
+ lines[row] << "%-#{w}s" % v
25
+ row += 1
26
+ end
27
+ lines[0] << "%-#{w}s" % d
28
+ end
29
+ metadata.measures.each do |m|
30
+ row = 1
31
+ max_value = find_abs_max(m, events) + 1.0
32
+ vlen = Integer(Math::log10(10*max_value)) + 5
33
+ w = [vlen, m.length].max
34
+ events.each do |e|
35
+ v = e.measure(m)
36
+ lines[row] << "%#{w}.4f" % v
37
+ row +=1
38
+ end
39
+ lines[0] << "%-#{w}s" % m
40
+ end
41
+ lines.map{|l|l.join(@spacer)}.each do |line|
42
+ out.puts line
43
+ end
44
+ end
45
+ def find_abs_max(m, events)
46
+ events.inject(Float::MIN) { |max, e| [max, e.measure(m).abs].max }
47
+ end
48
+ def find_longest_field(d, events)
49
+ events.inject(d.length) { |max, e| [max, e.dimension(d).length].max }
50
+ end
51
+ end
52
+ end
data/lib/histo.rb ADDED
@@ -0,0 +1,129 @@
1
+ # Histogram utilities
2
+ module Shades
3
+
4
+ # streaming histograms:
5
+ # Ruby implementation modeled on the Clojure library from BigML: https://github.com/bigmlcom/histogram
6
+ class DynamicHistogram
7
+
8
+ def initialize(max_size)
9
+ @res = StreamReservoir.new(max_size)
10
+ end
11
+
12
+ def add(f)
13
+ @res.add(StreamBin.new(1, f))
14
+ @res.compress
15
+ end
16
+
17
+ def lines
18
+ @res.lines
19
+ end
20
+
21
+ def ascii_art
22
+ @res.histo_text
23
+ end
24
+ end
25
+
26
+ class StreamReservoir
27
+
28
+ def initialize(max_size)
29
+ @max_size = max_size
30
+ @n = 0
31
+ @bins = []
32
+ end
33
+
34
+ def add(bin)
35
+ @n += bin.count
36
+ # find the index at which to insert this bin so that bins stay ordered by mean
37
+ i = placement(bin)
38
+ @bins.insert(i, bin)
39
+ end
40
+
41
+ def placement(bin)
42
+ @bins.length.times do |i|
43
+ if @bins[i].mean >= bin.mean
44
+ return i
45
+ end
46
+ end
47
+ return @bins.length
48
+ end
49
+
50
+ def compress
51
+ while @bins.length > @max_size
52
+ min_gap_index = -1
53
+ min_gap = Float::MAX
54
+ # find the pair of adjacent bins whose means are closest together
55
+ (@bins.length-1).times do |i|
56
+ bin_a = @bins[i]
57
+ bin_b = @bins[i+1]
58
+ gap = bin_b.mean - bin_a.mean
59
+ if min_gap > gap
60
+ min_gap = gap
61
+ min_gap_index = i
62
+ end
63
+ end
64
+ # and merge that bin with the one to its right
65
+ prevbin = @bins[min_gap_index]
66
+ nextbin = @bins.delete_at(min_gap_index+1)
67
+ prevbin.merge(nextbin)
68
+ end
69
+ end
70
+
71
+ def lines
72
+ a = []
73
+ @bins.each do |b|
74
+ a << "%8d %10.4f" % [b.count, b.mean]
75
+ end
76
+ a.join("\n")
77
+ end
78
+
79
+ ## outputs a histogram of the form
80
+ # 0.502 ( 27) ##############################
81
+ # 1.108 ( 14) ###############
82
+ # 1.731 ( 7) #######
83
+ # 2.343 ( 3) ###
84
+ # 3.138 ( 4) ####
85
+ # 3.968 ( 6) ######
86
+ # 4.548 ( 4) ####
87
+ # 5.225 ( 2) ##
88
+ # 5.990 ( 2) ##
89
+ # 8.720 ( 1) #
90
+ ##
91
+ ## So, the line above that reads "0.502 ( 27) ##############################"
92
+ ## can be read as: "There are 27 values close to 0.502"
93
+ def histo_text
94
+ a = []
95
+ max_bin_count = 1
96
+ width = 30
97
+ @bins.each do |b|
98
+ if b.count > max_bin_count
99
+ max_bin_count = b.count
100
+ end
101
+ end
102
+ @bins.each do |b|
103
+ repeat = width * Float(b.count)/Float(max_bin_count)
104
+ a << "%10.3f (%3d) %s" % [b.mean, b.count, '#' * repeat]
105
+ end
106
+ a.join("\n")
107
+ end
108
+ end
109
+
110
+ class StreamBin
111
+
112
+ attr_accessor :count
113
+ attr_accessor :sum
114
+
115
+ def initialize(count, sum)
116
+ @count = count
117
+ @sum = sum
118
+ end
119
+
120
+ def merge(sb)
121
+ @count += sb.count
122
+ @sum += sb.sum
123
+ end
124
+
125
+ def mean
126
+ Float(@sum) / Float(@count)
127
+ end
128
+ end
129
+ end
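The insert-then-compress approach used by `DynamicHistogram` can be sketched in isolation. The version below is a simplified standalone illustration, not the gem's code: the names `Bin` and `SketchHistogram` are assumptions. Every value enters as its own bin, and whenever the bin limit is exceeded the two adjacent bins with the closest means are merged.

```ruby
# A bin tracks how many values it holds and their sum.
Bin = Struct.new(:count, :sum) do
  def mean
    sum.to_f / count
  end
end

class SketchHistogram
  attr_reader :bins

  def initialize(max_bins)
    @max_bins = max_bins
    @bins = []
  end

  def add(value)
    # keep bins ordered by mean: insert before the first bin at or above value
    i = @bins.index { |b| b.mean >= value } || @bins.length
    @bins.insert(i, Bin.new(1, value))
    merge_closest while @bins.length > @max_bins
    self
  end

  private

  # merge the adjacent pair of bins whose means are closest together
  def merge_closest
    i = (0...@bins.length - 1).min_by { |j| @bins[j + 1].mean - @bins[j].mean }
    right = @bins.delete_at(i + 1)
    @bins[i].count += right.count
    @bins[i].sum += right.sum
  end
end

h = SketchHistogram.new(3)
[1.0, 1.2, 5.0, 9.0, 9.4].each { |v| h.add(v) }
h.bins.each { |b| puts format("%10.3f (%3d)", b.mean, b.count) }
```

Because merged bins keep only a count and a sum, memory stays bounded by the bin limit no matter how many values are streamed in.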
data/lib/model.rb ADDED
@@ -0,0 +1,65 @@
1
+ module Shades
2
+ class Metadata
3
+ attr_accessor :dimensions
4
+ attr_accessor :measures
5
+ def initialize(dimensions, measures)
6
+ @dimensions = dimensions
7
+ @measures = measures
8
+ end
9
+
10
+ # parse an event line that adheres to this metadata
11
+ def parse_event(line, sep)
12
+ values = line.split(sep)
13
+ d = {}
14
+ @dimensions.zip(values.take(@dimensions.length)).each do |k, v|
15
+ d[k] = v.strip
16
+ end
17
+ m = {}
18
+ @measures.zip(values.drop(@dimensions.length)).each do |k, v|
19
+ m[k] = Float(v.strip)
20
+ end
21
+ Event.new(self, d, m)
22
+ end
23
+ end
24
+
25
+ class Event
26
+
27
+ attr_accessor :metadata
28
+
29
+ def initialize(metadata, dimensions, measures)
30
+ @metadata = metadata
31
+ @dvalues = dimensions
32
+ @mvalues = measures
33
+ @key = @dvalues.keys.map{ |k| k + "=" + @dvalues.fetch(k) }.join(";")
34
+ end
35
+
36
+ def key
37
+ @key
38
+ end
39
+
40
+ def dimension(d)
41
+ @dvalues[d]
42
+ end
43
+
44
+ def measure(m)
45
+ @mvalues[m]
46
+ end
47
+
48
+ def dimensions
49
+ @metadata.dimensions
50
+ end
51
+
52
+ def measures
53
+ @metadata.measures
54
+ end
55
+
56
+ def line
57
+ f = []
58
+ f << @metadata.dimensions.map { |k| '%s' % dimension(k) }
59
+ f << @metadata.measures.map { |k| '%.5f' % measure(k) }
60
+ $stderr.puts f.inspect if $DEBUG
61
+ f.join("\t")
62
+ end
63
+ end
64
+ end
65
+
data/lib/queryparser.rb ADDED
@@ -0,0 +1,166 @@
1
+ module Shades
2
+
3
+ ## queries are of the form:
4
+ ## <stat-type> <measure>[, [<stat-type>] <measure>]* by <dimension>[, <dimension>] order by <dimension|measure>[, <dimension|measure>]*
5
+ ## for example, to get the mean load1, and load5 measures by unique combination of host role and kernel version:
6
+ ## mean load1, load5 by role, kernelversion
7
+ class QueryParser
8
+
9
+ def self.parse(qs)
10
+ parts = qs.scan(/\w+/)
11
+ tokens = []
12
+ t = BeginRollupToken.new
13
+ parts.each do |p|
14
+ t = t.emit(p)
15
+ tokens << t
16
+ end
17
+ rollups = rollups_pass(tokens)
18
+ categorizations = categorizations_pass(tokens)
19
+ sorting = sorting_pass(tokens)
20
+ Query.new(:categorizations => categorizations, :rollups => rollups, :sorting => sorting)
21
+ end
22
+
23
+ def self.rollups_pass(tokens)
24
+ stat = nil
25
+ r = []
26
+ tokens.each do |t|
27
+ if t.kind_of? StatTypeToken
28
+ stat = t.stat
29
+ elsif t.kind_of? MeasureRefToken
30
+ r << { :measure => t.text, :stat => stat }
31
+ end
32
+ end
33
+ r
34
+ end
35
+
36
+ def self.categorizations_pass(tokens)
37
+ d = []
38
+ tokens.each do |t|
39
+ if t.kind_of? DimensionRefToken
40
+ d << t.text
41
+ end
42
+ end
43
+ d
44
+ end
45
+
46
+ def self.sorting_pass(tokens)
47
+ s = []
48
+ tokens.each do |t|
49
+ if t.kind_of? SortKeyToken
50
+ s << { :key => t.text, :asc => true }
51
+ end
52
+ end
53
+ s
54
+ end
55
+ end
56
+
57
+ class Token
58
+ attr_accessor :text
59
+ def initialize(s)
60
+ @text = s
61
+ end
62
+ def to_s
63
+ @text
64
+ end
65
+ end
66
+
67
+ class BeginRollupToken < Token
68
+ def initialize
69
+ super("<begin>")
70
+ end
71
+ def emit(s)
72
+ StatTypeToken::parse(s)
73
+ end
74
+ end
75
+
76
+ class StatTypeToken < Token
77
+ attr_accessor :stat
78
+ def initialize(name, stat)
79
+ super(name)
80
+ @stat = stat
81
+ end
82
+
83
+ ## context free parsing of the next token to be called by the prior token
84
+ def self.parse(s)
85
+ stat = Stats::StatsType::get(s)
86
+ if stat.nil?
87
+ nil
88
+ else
89
+ StatTypeToken.new(s, stat)
90
+ end
91
+ end
92
+
93
+ ## given the next string, parse and return the next token
94
+ def emit(s)
95
+ # a measure must always follow a stat
96
+ MeasureRefToken::parse(s)
97
+ end
98
+ end
99
+
100
+ class MeasureRefToken < Token
101
+ def self.parse(s)
102
+ MeasureRefToken.new(s)
103
+ end
104
+ def emit(s)
105
+ if s.downcase.eql?("by")
106
+ BeginCategorizationToken::parse(s)
107
+ else
108
+ t = StatTypeToken::parse(s)
109
+ if !t.nil?
110
+ t
111
+ else
112
+ MeasureRefToken::parse(s)
113
+ end
114
+ end
115
+ end
116
+ end
117
+
118
+ class BeginCategorizationToken < Token
119
+ def self.parse(s)
120
+ BeginCategorizationToken.new(s)
121
+ end
122
+ def emit(s)
123
+ DimensionRefToken::parse(s)
124
+ end
125
+ end
126
+
127
+ class DimensionRefToken < Token
128
+ def self.parse(s)
129
+ DimensionRefToken.new(s)
130
+ end
131
+ def emit(s)
132
+ if s.downcase.eql?("order")
133
+ OrderToken::parse(s)
134
+ else
135
+ DimensionRefToken::parse(s)
136
+ end
137
+ end
138
+ end
139
+
140
+ class OrderToken < Token
141
+ def self.parse(s)
142
+ OrderToken.new(s)
143
+ end
144
+ def emit(s)
145
+ BeginSortingToken::parse(s)
146
+ end
147
+ end
148
+
149
+ class BeginSortingToken < Token
150
+ def self.parse(s)
151
+ BeginSortingToken.new(s)
152
+ end
153
+ def emit(s)
154
+ SortKeyToken::parse(s)
155
+ end
156
+ end
157
+
158
+ class SortKeyToken < Token
159
+ def self.parse(s)
160
+ SortKeyToken.new(s)
161
+ end
162
+ def emit(s)
163
+ SortKeyToken::parse(s)
164
+ end
165
+ end
166
+ end
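The token classes above form a small state machine: each word of the query is interpreted according to what came before it. The sketch below condenses the same idea into a single method. It is a standalone approximation under stated assumptions: `parse_query` and the returned hash layout are hypothetical and do not mirror `Shades::QueryParser`'s actual return type.

```ruby
# Stat names recognised by this sketch (the gem's stats.rb defines sum and mean).
STAT_NAMES = %w[sum mean].freeze

# Split a query such as "sum(amount) by customer order by amount" into
# rollups, grouping dimensions, and sort keys.
def parse_query(qs)
  query = { :rollups => [], :dimensions => [], :sort_keys => [] }
  state = :rollups
  stat = nil
  qs.scan(/\w+/).each do |w|
    word = w.downcase
    case state
    when :rollups
      if word == "by"
        state = :dimensions
      elsif STAT_NAMES.include?(word)
        stat = word # applies to every measure until the next stat name
      else
        query[:rollups] << { :measure => w, :stat => stat }
      end
    when :dimensions
      if word == "order"
        state = :order
      else
        query[:dimensions] << w
      end
    when :order
      state = :sort_keys # consumes the "by" that follows "order"
    when :sort_keys
      query[:sort_keys] << w
    end
  end
  query
end

p parse_query("mean load1, load5 by role, kernelversion order by role")
```

As in the original tokenizer, punctuation is discarded by `scan(/\w+/)`, so `sum(amount)` and `sum amount` are read the same way.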
data/lib/shades.rb ADDED
@@ -0,0 +1,10 @@
1
+ require 'model'
2
+ require 'cube'
3
+ require 'histo'
4
+ require 'stats'
5
+ require 'streamparser'
6
+ require 'queryparser'
7
+ require 'formatter'
8
+
9
+ module Shades
10
+ end
data/lib/stats.rb ADDED
@@ -0,0 +1,68 @@
1
+ module Stats
2
+
3
+ SUM = lambda { Sum.new(0.0) }
4
+ MEAN = lambda { Mean.new(0.0) }
5
+
6
+ class StatsType
7
+ def self.get(name)
8
+ if name.eql?("sum")
9
+ SUM
10
+ elsif name.eql?("mean")
11
+ MEAN
12
+ end
13
+ end
14
+ end
15
+
16
+ class Sum < StatsType
17
+
18
+ def initialize(d)
19
+ @val = d
20
+ end
21
+
22
+ def add(d)
23
+ @val += d
24
+ end
25
+
26
+ def remove(d)
27
+ @val -= d
28
+ end
29
+
30
+ def merge(stat)
31
+ @val += stat.get_value
32
+ end
33
+
34
+ def get_value
35
+ @val
36
+ end
37
+ end
38
+
39
+ class Mean < StatsType
40
+
41
+ attr_accessor :sum
42
+ attr_accessor :n
43
+
44
+ def initialize(d)
45
+ @sum = 0.0
46
+ @n = 0.0
47
+ end
48
+
49
+ def add(d)
50
+ @sum += d
51
+ @n += 1.0
52
+ end
53
+
54
+ def remove(d)
55
+ @sum -= d
56
+ @n -= 1.0
57
+ end
58
+
59
+ def merge(stat)
60
+ @sum += stat.sum
61
+ @n += stat.n
62
+ end
63
+
64
+ def get_value
65
+ @sum / @n
66
+ end
67
+ end
68
+ end
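`SUM` and `MEAN` are lambdas rather than shared objects, so each aggregation group can call the lambda and receive a fresh accumulator. The sketch below illustrates that factory pattern with a mergeable running mean. It is a standalone example: `RunningMean` and `MEAN_FACTORY` are illustrative names, not part of the gem, and unlike the gem's `Mean` it returns `nil` for an empty accumulator.

```ruby
# A running mean that keeps only a sum and a count, so two partial
# results can be merged without revisiting the underlying values.
class RunningMean
  attr_reader :sum, :n

  def initialize
    @sum = 0.0
    @n = 0
  end

  def add(x)
    @sum += x
    @n += 1
    self
  end

  def merge(other)
    @sum += other.sum
    @n += other.n
    self
  end

  def value
    @n.zero? ? nil : @sum / @n
  end
end

# Calling the factory yields an independent accumulator per group.
MEAN_FACTORY = lambda { RunningMean.new }

jack = MEAN_FACTORY.call.add(75.0).add(40.0)
jane = MEAN_FACTORY.call.add(6.0)
puts jack.value
puts jane.value
puts jack.merge(jane).value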
data/lib/streamparser.rb ADDED
@@ -0,0 +1,31 @@
1
+ module Shades
2
+ # parse a stream of events with whitespace-delimited fields preceded by metadata headers
3
+ class StreamParser
4
+
5
+ def initialize(&receiver)
6
+ @dimensions = nil
7
+ @measures = nil
8
+ @receiver = receiver
9
+ end
10
+
11
+ def <<(line)
12
+ line.strip!
13
+ if !@metadata.nil?
14
+ event = @metadata.parse_event(line, /\s+/)
15
+ @receiver.call(event)
16
+ elsif line.start_with?("#")
17
+ parts = line.scan(/\w+/)
18
+ if parts[0].eql?("dimensions")
19
+ @dimensions = parts.drop(1)
20
+ elsif parts[0].eql?("measures")
21
+ @measures = parts.drop(1)
22
+ end
23
+ if !@dimensions.nil? && !@measures.nil?
24
+ @metadata = Shades::Metadata.new(@dimensions, @measures)
25
+ end
26
+ else
27
+ $stderr.puts "discarding line received before metadata"
28
+ end
29
+ end
30
+ end
31
+ end
data/script/build-ctags ADDED
@@ -0,0 +1,31 @@
1
+ #!/bin/sh
2
+ #/ Usage: build-ctags [<options>]
3
+ #/ Build tags file in the RAILS_ROOT directory. Any <options> provided are
4
+ #/ passed to ctags.
5
+ #/
6
+ #/ This command requires exuberant ctags to be installed.
7
+ set -e
8
+
9
+ # show usage message
10
+ if [ "$1" = "--help" ]; then
11
+ grep ^#/ <"$0" | cut -c4-
12
+ exit 2
13
+ fi
14
+
15
+ # change into RAILS_ROOT
16
+ cd "$(dirname "$0")/.."
17
+
18
+ # check that ctags version is correct
19
+ ctags --version >/dev/null 2>&1 || {
20
+ echo "ctags not found or too old." 1>&2
21
+ exit 1
22
+ }
23
+
24
+ # Run ctags with a bunch of ruby related config
25
+ RUBYALIAS='/.*alias(_method)?[[:space:]]+:([[:alnum:]_=!?]+),?[[:space:]]+:([[:alnum:]_=!]+)/\\2/f/'
26
+ exec ctags -R \
27
+ --tag-relative=yes \
28
+ --totals=yes \
29
+ --extra=+f \
30
+ --fields=+iaS \
31
+ --regex-ruby="$RUBYALIAS" "$@"
data/shades.gemspec ADDED
@@ -0,0 +1,28 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'shades'
3
+ s.version = '0.11'
4
+
5
+ s.summary = "Get a new perspective on your data. In-memory data cubing of event data for Ruby."
6
+ s.description = <<-EOF
7
+ Shades computes data cubes for you from events composed of dimensions and measures.
8
+ EOF
9
+
10
+ s.files = `git ls-files`.split("\n")
11
+ s.require_path = 'lib'
12
+ s.require_paths = %w[lib]
13
+ s.executables = ["shades", "histo"]
14
+
15
+ s.add_development_dependency 'rake-compiler'
16
+ s.add_development_dependency 'rspec'
17
+ s.add_development_dependency 'rdoc'
18
+
19
+ s.has_rdoc = true
20
+ s.rdoc_options += ['--title', 'shades', '--line-numbers', '--inline-source', '--main', 'README.md']
21
+ s.extra_rdoc_files += ['README.md', *Dir['lib/**/*.rb']]
22
+
23
+ s.authors = ["Dietrich Featherston"]
24
+ s.email = "d@d2fn.com"
25
+ s.homepage = "https://github.com/d2fn/shades-rb"
26
+ s.rubyforge_project = "shades"
27
+ s.license = "MIT"
28
+ end
data/transactions.txt ADDED
@@ -0,0 +1,5 @@
1
+ # dimensions: timestamp transactionid customer item
2
+ # measures: quantity amount
3
+ 1371958271 1 jack golfclubs 3 75.00
4
+ 1371937693 1 jack gin 2 40.00
5
+ 1371979661 2 jane jar 6 6.00
metadata ADDED
@@ -0,0 +1,123 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: shades
3
+ version: !ruby/object:Gem::Version
4
+ version: '0.11'
5
+ platform: ruby
6
+ authors:
7
+ - Dietrich Featherston
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2013-06-25 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rake-compiler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ! '>='
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ! '>='
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rspec
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ! '>='
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ! '>='
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rdoc
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ! '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ! '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: ! ' Shades computes data cubes for you from events composed of dimensions
56
+ and measures.
57
+
58
+ '
59
+ email: d@d2fn.com
60
+ executables:
61
+ - shades
62
+ - histo
63
+ extensions: []
64
+ extra_rdoc_files:
65
+ - README.md
66
+ - lib/cube.rb
67
+ - lib/formatter.rb
68
+ - lib/histo.rb
69
+ - lib/model.rb
70
+ - lib/queryparser.rb
71
+ - lib/shades.rb
72
+ - lib/stats.rb
73
+ - lib/streamparser.rb
74
+ files:
75
+ - .gitignore
76
+ - LICENSE
77
+ - README.md
78
+ - Rakefile
79
+ - bin/histo
80
+ - bin/shades
81
+ - lib/cube.rb
82
+ - lib/formatter.rb
83
+ - lib/histo.rb
84
+ - lib/model.rb
85
+ - lib/queryparser.rb
86
+ - lib/shades.rb
87
+ - lib/stats.rb
88
+ - lib/streamparser.rb
89
+ - script/build-ctags
90
+ - shades.gemspec
91
+ - transactions.txt
92
+ homepage: https://github.com/d2fn/shades-rb
93
+ licenses:
94
+ - MIT
95
+ metadata: {}
96
+ post_install_message:
97
+ rdoc_options:
98
+ - --title
99
+ - shades
100
+ - --line-numbers
101
+ - --inline-source
102
+ - --main
103
+ - README.md
104
+ require_paths:
105
+ - lib
106
+ required_ruby_version: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ! '>='
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ required_rubygems_version: !ruby/object:Gem::Requirement
112
+ requirements:
113
+ - - ! '>='
114
+ - !ruby/object:Gem::Version
115
+ version: '0'
116
+ requirements: []
117
+ rubyforge_project: shades
118
+ rubygems_version: 2.0.3
119
+ signing_key:
120
+ specification_version: 4
121
+ summary: Get a new perspective on your data. In-memory data cubing of event data for
122
+ Ruby.
123
+ test_files: []