RubyGems - shades - Versions diffs - 0.11 - Mend

shades 0.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

checksums.yaml ADDED

@@ -0,0 +1,15 @@
+---
+!binary "U0hBMQ==":
+  metadata.gz: !binary |-
+    Y2M3MzYxNDM1Y2VlYTY5NDQxMzFhMTgzZmIyNjExMzg3MGY2MTIyMg==
+  data.tar.gz: !binary |-
+    ZWJjZDEyZjVkM2NmMzhjMzMwN2MwZjAzMDVkN2VkZTUzMGFhMzJiOA==
+!binary "U0hBNTEy":
+  metadata.gz: !binary |-
+    NWZjNGY4OTk5YmNhMDhlYWNkYWU0ZGIwYzVkNTViMzE1ZTIwN2EzZDdiNDYy
+    YjMwNmEwOTVlYWFiMmM0ZTNkMWQ4MTQwNGM3MDYxZThlMzA1NDJmZjA0NGRj
+    YThiNzMwMDEyOTQ3ZjE0YWJiNDRlNjdiODIwNTk2MjdmMzFlM2E=
+  data.tar.gz: !binary |-
+    MmJkYWU5NGI2NzhlODcyNzdmMWE1MGIzMzg3YzNjMDQwNTZjNTc3YWI2ZDZk
+    ZWNkNjQ4MzA5M2IwYzYyMWFjNTRmMWFkMTk5Y2UxOGI0NzhkMjFlZDU4NDRh
+    MWY5ZjNmZGEzYzdhMTQ1MDc4Y2EwNjMyMTBiOTAzY2NiOTM4MTk=

data/.gitignore ADDED

@@ -0,0 +1,19 @@
+*.gem
+*.rbc
+.bundle
+.config
+coverage
+InstalledFiles
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp
+# YARD artifacts
+.yardoc
+_yardoc
+doc/
+tags

data/LICENSE ADDED

@@ -0,0 +1,18 @@
+Copyright (c) 2013 Dietrich Featherston
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,83 @@
+# Shades
+Get a new perspective on your data. In-memory [OLAP cubing](http://en.wikipedia.org/wiki/OLAP_cube), histograms, and more for Ruby.
+![](https://dl.dropboxusercontent.com/u/1133314/i/shades.gif)
+## As a command line utility for OLAP cubing
+The ```shades``` utility accepts whitespace-delimited data, one event per line, preceded by two commented lines describing the dimensions and measures within.
+```
+# dimensions: timestamp transactionid customer item
+# measures: quantity amount
+1371958271 1 jack golfclubs  3 75.00
+1371937693 1 jack gin        2 40.00
+1371979661 2 jane jar        6  6.00
+```
+Each line will be parsed as a ```Shades::Event``` according to the metadata given in the first two lines. So the line
+```
+1371937693 1 jack gin        2 40.00
+```
+will create a ```Shades::Event``` of the form:
+```
+dimensions:
+  timestamp      = 1371937693
+  transactionid  = 1
+  customer       = jack
+  item           = gin
+measures:
+  quantity       = 2
+  amount         = 40.00
+```
+Then we can perform simple aggregations. This one finds the total amount each customer has spent:
+```> cat transactions.txt | shades "sum(amount) by customer"```
+```
+customer  amount
+jack      115.00
+jane        6.00
+```
+## As a command line utility for histogramming
+Histograms are indispensable for understanding value distributions in a data set, especially distributions with a long tail or heavy skew, like response times in computer systems or cost of goods on Amazon. Typically it is difficult to pick appropriate bin widths if you don't already have a solid understanding of the data. Shades implements dynamic rebalancing histograms based on [this paper](http://pages.cs.wisc.edu/~donjerko/hist.pdf) so they always make sense for your data set.
+Say another file with the same structure as above includes one-minute system load averages as ```load1```:
+```
+cat hoststats.txt | histo load1
+     0.174 (  7) #######
+     0.805 ( 30) ##############################
+     1.974 ( 11) ###########
+     2.936 ( 10) ##########
+     3.911 (  8) ########
+     5.164 (  5) #####
+     6.744 (  7) #######
+     7.852 (  4) ####
+     9.310 (  1) #
+    20.250 (  1) #
+```
+Each of these lines is a histogram bucket with the average value on the left and the number of items in the bucket in parentheses. So the line ```5.164 (  5) #####``` can be read as "there are 5 values with a mean close to 5.164".
+You can even feed data cubing output from above into the ```histo``` utility. Let's say we look back at the customer transaction data from above. To get a sense of the distribution of transaction amounts, you would simply do the following.
+```
+cat transactions.txt | shades -p "sum(amount) by transactionid" | histo amount
+```
+## Use in code
+Shades also offers a public OLAP cubing API. See the ```shades``` and ```histo``` utilities for examples of building data cubes and histograms, respectively.
+## Roadmap
+- Add 'where' clauses for filtering
+- Numerosity bounding of output from ```shades``` by only including the top ranking rows in a set of dimensions.
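
The aggregation the README describes for `sum(amount) by customer` can be sketched in plain Ruby. This is a standalone illustration that does not use the gem's classes; `sum_by` and `INPUT` are hypothetical names introduced here, and the sketch assumes both header lines appear before any data line.

```ruby
# Standalone sketch; does not require the shades gem.
INPUT = <<~DATA
  # dimensions: timestamp transactionid customer item
  # measures: quantity amount
  1371958271 1 jack golfclubs  3 75.00
  1371937693 1 jack gin        2 40.00
  1371979661 2 jane jar        6  6.00
DATA

# Hypothetical helper: total one measure, grouped by one dimension.
def sum_by(text, measure, dimension)
  dims = nil
  measures = nil
  totals = Hash.new(0.0)
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?("#")
      # header lines name the columns, dimensions first, then measures
      parts = line.scan(/\w+/)
      dims = parts.drop(1) if parts[0] == "dimensions"
      measures = parts.drop(1) if parts[0] == "measures"
      next
    end
    # pair each column name with the field in the same position
    row = (dims + measures).zip(line.split(/\s+/)).to_h
    totals[row[dimension]] += Float(row[measure])
  end
  totals
end

totals = sum_by(INPUT, "amount", "customer")
totals.each { |customer, amount| puts format("%-8s %8.2f", customer, amount) }
```

For the sample data this yields 115.00 for jack and 6.00 for jane, matching the table in the README.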

data/Rakefile ADDED

@@ -0,0 +1 @@
+require "bundler/gem_tasks"

data/bin/histo ADDED

@@ -0,0 +1,27 @@
+#!/usr/bin/env ruby
+$: << File.realpath(File.dirname(__FILE__) + "/../lib")
+require 'shades'
+def main(mkey, max_buckets, output_width)
+  # set up the histogram data to accept streaming input
+  histo = Shades::DynamicHistogram.new(max_buckets)
+  p = Shades::StreamParser.new do |e|
+    histo.add(e.measure(mkey))
+  end
+  # stream stdin lines to the parser
+  $stdin.each_line do |line|
+    p << line
+  end
+  $stdout.puts histo.ascii_art
+end
+measure = ARGV[-1]
+max_buckets = 10
+output_width = 30
+main(measure, max_buckets, output_width)

data/bin/shades ADDED

@@ -0,0 +1,50 @@
+#!/usr/bin/env ruby
+$: << File.realpath(File.dirname(__FILE__) + "/../lib")
+require 'getoptlong'
+require 'shades'
+def main(parseable_output, q)
+  events_in = []
+  p = Shades::StreamParser.new do |e|
+    events_in << e
+  end
+  $stdin.each_line do |line|
+    p << line
+  end
+  events_out = q.run(events_in)
+  fmt = Shades::Formatter.new(" ")
+  if parseable_output
+    fmt.text($stdout, events_out)
+  else
+    fmt.pretty_text($stdout, events_out)
+  end
+end
+opt_parser = GetoptLong.new
+opt_parser.set_options(
+    # create output that can itself be used as input to this program
+    ["-p", "--parseable", GetoptLong::NO_ARGUMENT]
+  )
+parseable_output = false
+begin
+  begin
+    opt,arg = opt_parser.get_option
+    break if not opt
+    case opt
+      # `"-p" || "--parseable"` evaluates to just "-p"; list both names instead
+      when "-p", "--parseable"
+        parseable_output = true
+    end
+  rescue => err
+    $stderr.puts err.message
+  end
+end while 1
+qs = ARGV[-1]
+q = Shades::QueryParser::parse(qs)
+main(parseable_output, q)

data/lib/cube.rb ADDED

@@ -0,0 +1,238 @@
+module Shades
+  class Query
+    def initialize(opts)
+      dimensions = []
+      @dcs = opts[:categorizations].map do |d|
+        dimensions.push(d)
+        DimensionComputer.new(d, d)
+      end unless opts[:categorizations].nil?
+      @mrs = opts[:rollups].map do |r|
+        MeasureRollup.new(r[:measure], r[:measure], r[:stat])
+      end unless opts[:rollups].nil?
+      @sorting = opts[:sorting]
+      if !does_sorting?
+        # got a query with no sorting parameters. but surely we should
+        # return the info with some meaningful ordering
+        # so, we choose to order by the first measure returned in the query, largest values first
+        @sorting = [{:key => outbound_measures.first, :asc => false}]
+      end
+      @pre = Processor.new(@dcs)
+    end
+    def rollup_list
+      @mrs
+    end
+    def does_sorting?
+      !@sorting.nil? && !@sorting.empty?
+    end
+    def does_rollups?
+      !@dcs.nil? && !@dcs.empty? && !@mrs.nil? && !@mrs.empty?
+    end
+    def outbound_measures
+      @mrs.map { |r| r.outbound_measure }
+    end
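+    def run(events_in)
+      # start from an empty result so the sort below never sees nil
+      # when the query has no rollups to compute
+      results = []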
+      if does_rollups?
+        aggregator = Aggregator.new(self)
+        events_in.each do |event|
+          eout = @pre.send(event)
+          if !eout.nil?
+            aggregator.add(eout)
+          end
+        end
+        results = aggregator.snapshot
+      end
+      if !@sorting.nil?
+        results.sort! do |a, b|
+          multicompare(a, b)
+        end
+      end
+      results
+    end
+    def multicompare(a, b)
+      # earlier sort keys take precedence: stop at the first key that differs
+      @sorting.each do |s|
+        v1 = lookup(a, s[:key])
+        v2 = lookup(b, s[:key])
+        asc = s[:asc]
+        if v1 < v2
+          return asc ? -1 : 1
+        elsif v2 < v1
+          return asc ? 1 : -1
+        end
+      end
+      # equal on every sort key
+      0
+    end
+    def lookup(e, k)
+      v = e.dimension(k)
+      if v.nil?
+        v = e.measure(k)
+      end
+      natural_order(v)
+    end
+    def natural_order(o)
+      begin
+        return Float(o)
+      rescue
+      end
+      o
+    end
+  end
+  class Processor
+    def initialize(dcs)
+      @dcs = dcs
+    end
+    def send(event)
+      # optionally filter stuff out here by some kind of predicate in the future
+      ###
+      # remap inbound to outbound dimensions
+      d = {}
+      dlist = []
+      @dcs.each do |dc|
+        dlist.push(dc.outbound_dimension)
+        d[dc.outbound_dimension] = dc.get_value(event)
+      end
+      m = {}
+      mlist = []
+      event.measures.each do |k|
+        mlist.push(k)
+        m[k] = event.measure(k)
+      end
+      Event.new(Metadata.new(dlist, mlist), d, m)
+    end
+  end
+  class DimensionComputer
+    def initialize(inbound, outbound)
+      @inbound = inbound
+      @outbound = outbound
+    end
+    def outbound_dimension
+      @outbound
+    end
+    def get_value(event)
+      event.dimension(@inbound)
+    end
+  end
+  class MeasureRollup
+    attr_accessor :stat
+    attr_accessor :outbound
+    def initialize(inbound, outbound, stat)
+      @inbound = inbound
+      @outbound = outbound
+      @stat = stat
+    end
+    def outbound_measure
+      @outbound
+    end
+    def get_value(event)
+      event.measure(@inbound)
+    end
+  end
+  class Aggregator
+    def initialize(query)
+      @query = query
+      @state = {}
+    end
+    def add(event)
+      agg_event = AggEvent.new(@query, event)
+      if @state.has_key?(agg_event.key)
+        @state[event.key].add(agg_event)
+      else
+        @state[event.key] = agg_event
+      end
+    end
+    def snapshot
+      @state.values
+    end
+  end
+  class AggEvent
+    attr_accessor :metadata
+    def initialize(query, event)
+      @query = query
+      @key = event.key
+      @dvalues = {}
+      @dlist = []
+      event.dimensions.each do |k|
+        @dvalues[k] = event.dimension(k)
+        @dlist.push(k)
+      end
+      @rollup_info_by_measure = {}
+      @stats_by_measure = {}
+      @mlist = []
+      @query.rollup_list.each do |r|
+        @mlist.push(r.outbound_measure)
+        outbound_measure = r.outbound_measure
+        @rollup_info_by_measure[outbound_measure] = r
+        initial_value = r.get_value(event)
+        stat = r.stat.call
+        stat.add(initial_value)
+        @stats_by_measure[outbound_measure] = stat
+      end
+      @metadata = Metadata.new(@dlist, @mlist)
+    end
+    def key
+      @key
+    end
+    def dimensions
+      @dlist
+    end
+    def dimension(d)
+      @dvalues[d]
+    end
+    def measure(m)
+      if @stats_by_measure.has_key?(m)
+        @stats_by_measure[m].get_value
+      else
+        0.0
+      end
+    end
+    def measures
+      @mlist
+    end
+    def add(event)
+      measures.each do |k|
+        value = @rollup_info_by_measure[k].get_value(event)
+        @stats_by_measure[k].add(value)
+      end
+    end
+    def line
+      f = []
+      f << dimensions.map { |k| '%s' % dimension(k) }
+      f << measures.map   { |k| '%.5f' % measure(k) }
+      f.join("\t")
+    end
+  end
+end
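
A multi-key comparison in which earlier sort keys take precedence over later ones can be sketched as follows. This is a standalone illustration over plain hashes, not the gem's `Query` class; `compare_rows` is a hypothetical name, and treating incomparable values as equal is an assumption of this sketch.

```ruby
# Hypothetical helper: compare two rows on a list of sort keys.
# Each sort key is a hash of the form { key: <column>, asc: true|false }.
def compare_rows(a, b, sorting)
  sorting.each do |s|
    c = a[s[:key]] <=> b[s[:key]]
    # move on to the next key when this one cannot break the tie
    next if c.nil? || c.zero?
    return s[:asc] ? c : -c
  end
  0
end

rows = [
  { "customer" => "jane", "amount" => 6.0 },
  { "customer" => "jack", "amount" => 115.0 },
  { "customer" => "amy",  "amount" => 6.0 }
]
# largest amounts first, ties broken alphabetically by customer
sorting = [{ key: "amount", asc: false }, { key: "customer", asc: true }]
sorted = rows.sort { |a, b| compare_rows(a, b, sorting) }
puts sorted.map { |r| r["customer"] }.join(" ")
```

Returning at the first differing key is what makes the first key primary; a comparator that keeps looping and overwrites its result would let the last key win instead.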

data/lib/formatter.rb ADDED

@@ -0,0 +1,52 @@
+module Shades
+  class Formatter
+    def initialize(spacer = " ")
+      @spacer = spacer
+    end
+    def text(out, events)
+      # nothing to print (and no metadata to read) without events
+      return if events.empty?
+      metadata = events[0].metadata
+      out.puts "# dimensions: %s" % (metadata.dimensions.join(@spacer))
+      out.puts "# measures: %s"   % (metadata.measures.join(@spacer))
+      events.each do |e|
+        out.puts e.line
+      end
+    end
+    def pretty_text(out, events)
+      # nothing to print (and no metadata to read) without events
+      return if events.empty?
+      metadata = events[0].metadata
+      lines = []
+      (events.length+1).times {|i|lines[i] = []}
+      metadata.dimensions.each do |d|
+        w = find_longest_field(d, events)
+        row = 1
+        events.each do |e|
+          v = e.dimension(d)
+          lines[row] << "%-#{w}s" % v
+          row += 1
+        end
+        lines[0] << "%-#{w}s" % d
+      end
+      metadata.measures.each do |m|
+        row = 1
+        max_value = find_abs_max(m, events) + 1.0
+        vlen = Integer(Math::log10(10*max_value)) + 5
+        w = [vlen, m.length].max
+        events.each do |e|
+          v = e.measure(m)
+          lines[row] << "%#{w}.4f" % v
+          row +=1
+        end
+        lines[0] << "%-#{w}s" % m
+      end
+      lines.map{|l|l.join(@spacer)}.each do |line|
+        out.puts line
+      end
+    end
+    def find_abs_max(m, events)
+      events.inject(Float::MIN) { |max, e| [max, e.measure(m).abs].max }
+    end
+    def find_longest_field(d, events)
+      events.inject(d.length) { |max, e| [max, e.dimension(d).length].max }
+    end
+  end
+end

data/lib/histo.rb ADDED

@@ -0,0 +1,129 @@
+# Histogram utilities
+module Shades
+  # streaming histograms:
+  # implementation of the clojure library from BigML: https://github.com/bigmlcom/histogram
+  class DynamicHistogram
+    def initialize(max_size)
+      @res = StreamReservoir.new(max_size)
+    end
+    def add(f)
+      @res.add(StreamBin.new(1, f))
+      @res.compress
+    end
+    def lines
+      @res.lines
+    end
+    def ascii_art
+      @res.histo_text
+    end
+  end
+  class StreamReservoir
+    def initialize(max_size)
+      @max_size = max_size
+      @n = 0
+      @bins = []
+    end
+    def add(bin)
+      @n += bin.count
+      # find the bin index at which to place this data
+      i = placement(bin)
+      @bins.insert(i, bin)
+    end
+    def placement(bin)
+      @bins.length.times do |i|
+        if @bins[i].mean >= bin.mean
+          return i
+        end
+      end
+      return @bins.length
+    end
+    def compress
+      while @bins.length > @max_size
+        min_gap_index = -1
+        min_gap = Float::MAX
+        # find the adjacent pair of bins whose means are closest together
+        (@bins.length-1).times do |i|
+          bin_a = @bins[i]
+          bin_b = @bins[i+1]
+          gap = bin_b.mean - bin_a.mean
+          if min_gap > gap
+            min_gap = gap
+            min_gap_index = i
+          end
+        end
+        # and merge that bin with the one to its right
+        prevbin = @bins[min_gap_index]
+        nextbin = @bins.delete_at(min_gap_index+1)
+        prevbin.merge(nextbin)
+      end
+    end
+    def lines
+      a = []
+      @bins.each do |b|
+        a << "%8d %10.4f" % [b.count, b.mean]
+      end
+      a.join("\n")
+    end
+    ## outputs a histogram of the form
+    # 0.502 ( 27) ##############################
+    # 1.108 ( 14) ###############
+    # 1.731 (  7) #######
+    # 2.343 (  3) ###
+    # 3.138 (  4) ####
+    # 3.968 (  6) ######
+    # 4.548 (  4) ####
+    # 5.225 (  2) ##
+    # 5.990 (  2) ##
+    # 8.720 (  1) #
+    ##
+    ## So, the line above that reads "0.502 ( 27) ##############################"
+    ## can be read as: "There are 27 values close to 0.502"
+    def histo_text
+      a = []
+      max_bin_count = 1
+      width = 30
+      @bins.each do |b|
+        if b.count > max_bin_count
+          max_bin_count = b.count
+        end
+      end
+      @bins.each do |b|
+        repeat = width * Float(b.count)/Float(max_bin_count)
+        a << "%10.3f (%3d) %s" % [b.mean, b.count, '#' * repeat]
+      end
+      a.join("\n")
+    end
+  end
+  class StreamBin
+    attr_accessor :count
+    attr_accessor :sum
+    def initialize(count, sum)
+      @count = count
+      @sum = sum
+    end
+    def merge(sb)
+      @count += sb.count
+      @sum += sb.sum
+    end
+    def mean
+      Float(@sum) / Float(@count)
+    end
+  end
+end
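
The insert-then-compress idea behind the streaming histogram can be sketched without the gem. In this standalone illustration each bin is a plain `[count, sum]` pair rather than a `StreamBin`, and `add_value` is a hypothetical name; it folds the placement and compression steps into one function.

```ruby
# Hypothetical helper: add one value to a mean-sorted list of bins, then
# merge the closest adjacent bins until at most max_bins remain.
# Each bin is [count, sum]; its mean is sum / count.
def add_value(bins, value, max_bins)
  # keep bins ordered by mean: insert before the first bin whose mean is >= value
  i = bins.index { |count, sum| sum / count >= value } || bins.length
  bins.insert(i, [1, value.to_f])
  while bins.length > max_bins
    # index of the left bin in the adjacent pair with the smallest gap in means
    j = (0...bins.length - 1).min_by do |k|
      bins[k + 1][1] / bins[k + 1][0] - bins[k][1] / bins[k][0]
    end
    right = bins.delete_at(j + 1)
    bins[j] = [bins[j][0] + right[0], bins[j][1] + right[1]]
  end
  bins
end

bins = []
[1.0, 1.2, 5.0, 9.0, 9.1].each { |v| add_value(bins, v, 3) }
bins.each { |count, sum| puts format("%10.3f (%3d) %s", sum / count, count, "#" * count) }
```

With a limit of three bins, the two values near 1 and the two values near 9 end up merged, while 5.0 keeps a bin of its own.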

data/lib/model.rb ADDED

@@ -0,0 +1,65 @@
+module Shades
+  class Metadata
+    attr_accessor :dimensions
+    attr_accessor :measures
+    def initialize(dimensions, measures)
+      @dimensions = dimensions
+      @measures = measures
+    end
+    # parse an event line that adheres to this metadata
+    def parse_event(line, sep)
+      values = line.split(sep)
+      d = {}
+      @dimensions.zip(values.take(@dimensions.length)).each do |k, v|
+        d[k] = v.strip
+      end
+      m = {}
+      @measures.zip(values.drop(@dimensions.length)).each do |k, v|
+        m[k] = Float(v.strip)
+      end
+      Event.new(self, d, m)
+    end
+  end
+  class Event
+    attr_accessor :metadata
+    def initialize(metadata, dimensions, measures)
+      @metadata = metadata
+      @dvalues = dimensions
+      @mvalues = measures
+      @key = @dvalues.keys.map{ |k| k + "=" + @dvalues.fetch(k) }.join(";")
+    end
+    def key
+      @key
+    end
+    def dimension(d)
+      @dvalues[d]
+    end
+    def measure(m)
+      @mvalues[m]
+    end
+    def dimensions
+      @metadata.dimensions
+    end
+    def measures
+      @metadata.measures
+    end
+    def line
+      f = []
+      f << @metadata.dimensions.map { |k| '%s' % dimension(k) }
+      f << @metadata.measures.map   { |k| '%.5f' % measure(k) }
+      # no debug printing here: it would end up in the parseable output
+      f.join("\t")
+    end
+  end
+end

data/lib/queryparser.rb ADDED

@@ -0,0 +1,166 @@
+module Shades
+  ## queries are of the form:
+  ## <stat-type> <measure>[, [<stat-type>] <measure>]* by <dimension>[, <dimension>] order by <dimension|measure>[, <dimension|measure>]*
+  ## for example, to get the mean load1, and load5 measures by unique combination of host role and kernel version:
+  ##   mean load1, load5 by role, kernelversion
+  class QueryParser
+    def self.parse(qs)
+      parts = qs.scan(/\w+/)
+      tokens = []
+      t = BeginRollupToken.new
+      parts.each do |p|
+        t = t.emit(p)
+        tokens << t
+      end
+      rollups = rollups_pass(tokens)
+      categorizations = categorizations_pass(tokens)
+      sorting = sorting_pass(tokens)
+      Query.new(:categorizations => categorizations, :rollups => rollups, :sorting => sorting)
+    end
+    def self.rollups_pass(tokens)
+      stat = nil
+      r = []
+      tokens.each do |t|
+        if t.kind_of? StatTypeToken
+          stat = t.stat
+        elsif t.kind_of? MeasureRefToken
+          r << { :measure => t.text, :stat => stat }
+        end
+      end
+      r
+    end
+    def self.categorizations_pass(tokens)
+      d = []
+      tokens.each do |t|
+        if t.kind_of? DimensionRefToken
+          d << t.text
+        end
+      end
+      d
+    end
+    def self.sorting_pass(tokens)
+      s = []
+      tokens.each do |t|
+        if t.kind_of? SortKeyToken
+          s << { :key => t.text, :asc => true }
+        end
+      end
+      s
+    end
+  end
+  class Token
+    attr_accessor :text
+    def initialize(s)
+      @text = s
+    end
+    def to_s
+      @text
+    end
+  end
+  class BeginRollupToken < Token
+    def initialize
+      super("<begin>")
+    end
+    def emit(s)
+      StatTypeToken::parse(s)
+    end
+  end
+  class StatTypeToken < Token
+    attr_accessor :stat
+    def initialize(name, stat)
+      super(name)
+      @stat = stat
+    end
+    ## context free parsing of the next token to be called by the prior token
+    def self.parse(s)
+      stat = Stats::StatsType::get(s)
+      if stat.nil?
+        nil
+      else
+        StatTypeToken.new(s, stat)
+      end
+    end
+    ## given the next string, parse and return the next token
+    def emit(s)
+      # a measure must always follow a stat
+      MeasureRefToken::parse(s)
+    end
+  end
+  class MeasureRefToken < Token
+    def self.parse(s)
+      MeasureRefToken.new(s)
+    end
+    def emit(s)
+      if s.downcase.eql?("by")
+        BeginCategorizationToken::parse(s)
+      else
+        t = StatTypeToken::parse(s)
+        if !t.nil?
+          t
+        else
+          MeasureRefToken::parse(s)
+        end
+      end
+    end
+  end
+  class BeginCategorizationToken < Token
+    def self.parse(s)
+      BeginCategorizationToken.new(s)
+    end
+    def emit(s)
+      DimensionRefToken::parse(s)
+    end
+  end
+  class DimensionRefToken < Token
+    def self.parse(s)
+      DimensionRefToken.new(s)
+    end
+    def emit(s)
+      if s.downcase.eql?("order")
+        OrderToken::parse(s)
+      else
+        DimensionRefToken::parse(s)
+      end
+    end
+  end
+  class OrderToken < Token
+    def self.parse(s)
+      OrderToken.new(s)
+    end
+    def emit(s)
+      BeginSortingToken::parse(s)
+    end
+  end
+  class BeginSortingToken < Token
+    def self.parse(s)
+      BeginSortingToken.new(s)
+    end
+    def emit(s)
+      SortKeyToken::parse(s)
+    end
+  end
+  class SortKeyToken < Token
+    def self.parse(s)
+      SortKeyToken.new(s)
+    end
+    def emit(s)
+      SortKeyToken::parse(s)
+    end
+  end
+end
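
The query grammar in the comment at the top of this file can be read as a small state machine over the words of the query. The following standalone sketch illustrates that reading with a symbol for the current state instead of token classes; `parse_query` and `STAT_NAMES` are hypothetical names, and the sketch simplifies the gem's behavior (for example, it does not reject a query that starts without a stat type).

```ruby
# Stat names recognized by this sketch (the gem's stats.rb defines sum and mean).
STAT_NAMES = %w[sum mean].freeze

# Hypothetical helper: split a query string into rollups, categorizations,
# and sort keys.
def parse_query(qs)
  query = { rollups: [], categorizations: [], sorting: [] }
  state = :rollups
  stat = nil
  # \w+ drops commas and parentheses, so "sum(amount)" yields "sum", "amount"
  qs.scan(/\w+/).each do |word|
    case state
    when :rollups
      if word.downcase == "by"
        state = :dimensions
      elsif STAT_NAMES.include?(word)
        stat = word
      else
        # a measure inherits the most recent stat type
        query[:rollups] << { measure: word, stat: stat }
      end
    when :dimensions
      if word.downcase == "order"
        state = :order
      else
        query[:categorizations] << word
      end
    when :order
      # consume the "by" that follows "order"
      state = :sorting
    when :sorting
      query[:sorting] << { key: word, asc: true }
    end
  end
  query
end

q = parse_query("mean load1, load5 by role, kernelversion order by role")
puts q.inspect
```

Carrying the last stat type forward is what lets `mean load1, load5` apply `mean` to both measures.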

data/lib/shades.rb ADDED

@@ -0,0 +1,10 @@
+require 'model'
+require 'cube'
+require 'histo'
+require 'stats'
+require 'streamparser'
+require 'queryparser'
+require 'formatter'
+module Shades
+end

data/lib/stats.rb ADDED

@@ -0,0 +1,68 @@
+module Stats
+  SUM  = lambda { Sum.new(0.0)  }
+  MEAN = lambda { Mean.new(0.0) }
+  class StatsType
+    def self.get(name)
+      if name.eql?("sum")
+        SUM
+      elsif name.eql?("mean")
+        MEAN
+      end
+    end
+  end
+  class Sum < StatsType
+    def initialize(d)
+      @val = d
+    end
+    def add(d)
+      @val += d
+    end
+    def remove(d)
+      @val -= d
+    end
+    def merge(stat)
+      @val += stat.get_value
+    end
+    def get_value
+      @val
+    end
+  end
+  class Mean < StatsType
+    attr_accessor :sum
+    attr_accessor :n
+    def initialize(d)
+      @sum = 0.0
+      @n = 0.0
+    end
+    def add(d)
+      @sum += d
+      @n += 1.0
+    end
+    def remove(d)
+      @sum -= d
+      @n -= 1.0
+    end
+    def merge(stat)
+      @sum += stat.sum
+      @n += stat.n
+    end
+    def get_value
+      @sum / @n
+    end
+  end
+end
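
The pattern in this file, a lambda that builds a fresh accumulator for each group plus a `merge` that combines partial results, can be sketched on its own. `RunningMean` and `NEW_MEAN` are hypothetical names for this illustration; returning `nil` for an empty accumulator is an assumption of the sketch, not the gem's behavior.

```ruby
# Hypothetical accumulator: tracks a running sum and count so that
# partial means can be merged exactly.
class RunningMean
  attr_reader :sum, :n

  def initialize
    @sum = 0.0
    @n = 0
  end

  def add(x)
    @sum += x
    @n += 1
    self
  end

  # fold another accumulator's partial result into this one
  def merge(other)
    @sum += other.sum
    @n += other.n
    self
  end

  # nil rather than NaN when nothing has been added yet
  def value
    @n.zero? ? nil : @sum / @n
  end
end

# a factory lambda hands every group its own fresh accumulator
NEW_MEAN = lambda { RunningMean.new }

left = NEW_MEAN.call
[1, 2, 3].each { |x| left.add(x) }
right = NEW_MEAN.call
[4, 5].each { |x| right.add(x) }
left.merge(right)
puts left.value
```

Keeping the sum and count separate, rather than storing only the mean, is what makes the merge exact.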

data/lib/streamparser.rb ADDED

@@ -0,0 +1,31 @@
+module Shades
+  # parse a stream of events with whitespace-delimited fields preceded by metadata headers
+  class StreamParser
+    def initialize(&receiver)
+      @dimensions = nil
+      @measures = nil
+      # set once both header lines have been seen
+      @metadata = nil
+      @receiver = receiver
+    end
+    def <<(line)
+      line.strip!
+      if !@metadata.nil?
+        event = @metadata.parse_event(line, /\s+/)
+        @receiver.call(event)
+      elsif line.start_with?("#")
+        parts = line.scan(/\w+/)
+        if parts[0].eql?("dimensions")
+          @dimensions = parts.drop(1)
+        elsif parts[0].eql?("measures")
+          @measures = parts.drop(1)
+        end
+        if !@dimensions.nil? && !@measures.nil?
+          @metadata = Shades::Metadata.new(@dimensions, @measures)
+        end
+      else
+        $stderr.puts "discarding line received before metadata"
+      end
+    end
+  end
+end

data/script/build-ctags ADDED

@@ -0,0 +1,31 @@
+#!/bin/sh
+#/ Usage: build-ctags [<options>]
+#/ Build tags file in the RAILS_ROOT directory. Any <options> provided are
+#/ passed to ctags.
+#/
+#/ This command requires exuberant ctags to be installed.
+set -e
+# show usage message
+if [ "$1" = "--help" ]; then
+    grep ^#/ <"$0" | cut -c4-
+    exit 2
+fi
+# change into RAILS_ROOT
+cd "$(dirname "$0")/.."
+# check that ctags version is correct
+ctags --version >/dev/null 2>&1 || {
+    echo "ctags not found or too old." 1>&2
+    exit 1
+}
+# Run ctags with a bunch of ruby related config
+RUBYALIAS='/.*alias(_method)?[[:space:]]+:([[:alnum:]_=!?]+),?[[:space:]]+:([[:alnum:]_=!]+)/\\2/f/'
+exec ctags -R \
+    --tag-relative=yes \
+    --totals=yes \
+    --extra=+f \
+    --fields=+iaS \
+    --regex-ruby="$RUBYALIAS" "$@"

data/shades.gemspec ADDED

@@ -0,0 +1,28 @@
+Gem::Specification.new do |s|
+  s.name = 'shades'
+  s.version = '0.11'
+  s.summary = "Get a new perspective on your data. In-memory data cubing of event data for Ruby."
+  s.description = <<-EOF
+    Shades computes data cubes for you from events composed of dimensions and measures.
+  EOF
+  s.files = `git ls-files`.split("\n")
+  s.require_path = 'lib'
+  s.require_paths = %w[lib]
+  s.executables = ["shades", "histo"]
+  s.add_development_dependency 'rake-compiler'
+  s.add_development_dependency 'rspec'
+  s.add_development_dependency 'rdoc'
+  s.has_rdoc = true
+  s.rdoc_options += ['--title', 'shades', '--line-numbers', '--inline-source', '--main', 'README.md']
+  s.extra_rdoc_files += ['README.md', *Dir['lib/**/*.rb']]
+  s.authors = ["Dietrich Featherston"]
+  s.email = "d@d2fn.com"
+  s.homepage = "https://github.com/d2fn/shades-rb"
+  s.rubyforge_project = "shades"
+  s.license = "MIT"
+end

data/transactions.txt ADDED

@@ -0,0 +1,5 @@
+# dimensions: timestamp transactionid customer item
+# measures: quantity amount
+1371958271 1 jack golfclubs  3 75.00
+1371937693 1 jack gin        2 40.00
+1371979661 2 jane jar        6  6.00

metadata ADDED

@@ -0,0 +1,123 @@
+--- !ruby/object:Gem::Specification
+name: shades
+version: !ruby/object:Gem::Version
+  version: '0.11'
+platform: ruby
+authors:
+- Dietrich Featherston
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2013-06-25 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rake-compiler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rdoc
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+description: ! '    Shades computes data cubes for you from events composed of dimensions
+  and measures.
+'
+email: d@d2fn.com
+executables:
+- shades
+- histo
+extensions: []
+extra_rdoc_files:
+- README.md
+- lib/cube.rb
+- lib/formatter.rb
+- lib/histo.rb
+- lib/model.rb
+- lib/queryparser.rb
+- lib/shades.rb
+- lib/stats.rb
+- lib/streamparser.rb
+files:
+- .gitignore
+- LICENSE
+- README.md
+- Rakefile
+- bin/histo
+- bin/shades
+- lib/cube.rb
+- lib/formatter.rb
+- lib/histo.rb
+- lib/model.rb
+- lib/queryparser.rb
+- lib/shades.rb
+- lib/stats.rb
+- lib/streamparser.rb
+- script/build-ctags
+- shades.gemspec
+- transactions.txt
+homepage: https://github.com/d2fn/shades-rb
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options:
+- --title
+- shades
+- --line-numbers
+- --inline-source
+- --main
+- README.md
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project: shades
+rubygems_version: 2.0.3
+signing_key:
+specification_version: 4
+summary: Get a new perspective on your data. In-memory data cubing of event data for
+  Ruby.
+test_files: []