RubyGems - cluster_eval - Versions diffs - 0.1.1 - Mend

cluster_eval 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

checksums.yaml +7 -0
data/.gitignore +14 -0
data/.travis.yml +24 -0
data/Gemfile +10 -0
data/README.md +74 -0
data/Rakefile +10 -0
data/bin/cluster_eval +52 -0
data/bin/console +14 -0
data/bin/setup +7 -0
data/cluster_eval.gemspec +35 -0
data/ext/ConfusionMatrix.cpp +191 -0
data/ext/ConfusionMatrix.hpp +56 -0
data/ext/clusteval.cpp +25 -0
data/ext/extconf.rb +9 -0
data/ext/prettyprint.hpp +445 -0
data/lib/cluster_eval.rb +6 -0
data/lib/cluster_eval/version.rb +3 -0
data/tests/test_confusion_matrix_small.rb +54 -0
data/tests/test_confusion_matrix_small_random.rb +54 -0
data/tests/test_helper.rb +17 -0
metadata +152 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 9de67d89a54d6c2bf313a057797e3b56c74f0c6c
+  data.tar.gz: 4d5c3e6d8a5765078a214c8a14234e21526d8659
+SHA512:
+  metadata.gz: d8b451b4f4f0e123ff50bc276e5870969ac067348ac36789a25f8269fff769b2dbd00fbc76ab908478c39865e509804a94b15ba88228e4c6bfd2800d1ff56c1c
+  data.tar.gz: 662a605c5267911780356e2f4727f424addd0971457797075458d787ead697f1e92b534bae5100318b050bd39cdf160b4bddb4a489121bcc2f07f03a0c7774ca

data/.gitignore ADDED

@@ -0,0 +1,14 @@
+*~
+*.o
+*.so
+Makefile
+ext/test.rb
+/.bundle/
+/.yardoc
+/Gemfile.lock
+/_yardoc/
+/coverage/
+/doc/
+/pkg/
+/spec/reports/
+/tmp/

data/.travis.yml ADDED

@@ -0,0 +1,24 @@
+language: ruby
+compiler: gcc
+rvm:
+- 1.9.3
+- 2.0.0
+- 2.1.1
+- ruby-head
+before_install:
+- gem install thor
+- gem install rice
+- gem install bundler
+- gem install rake
+- gem install minitest
+- gem install minitest-reporters
+- sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
+- sudo apt-get update -qq
+- sudo apt-get install -qq g++-4.8
+- export CXX="g++-4.8"
+- export CC="gcc-4.8"
+- sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90
+install:
+- cd ext/ && ruby extconf.rb && make

data/Gemfile ADDED

@@ -0,0 +1,10 @@
+source 'https://rubygems.org'
+# Specify your gem's dependencies in cluster_eval.gemspec
+gemspec
+gem "bundler", "~> 1.8"
+gem "rake", "~> 10.0"
+gem "minitest", "~> 5.4"
+gem "minitest-reporters", "~> 1.0"
+gem 'thor', '~> 0.19'

data/README.md ADDED

@@ -0,0 +1,74 @@
+# ClusterEval
+Evaluates clusterings of a dataset using a variety of scores.
+[![Build Status](https://travis-ci.org/sbonisso/cluster_eval.svg?branch=master)](https://travis-ci.org/sbonisso/cluster_eval)
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'cluster_eval'
+```
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install cluster_eval
+Note: you will need a version of g++ that supports [C++11](https://gcc.gnu.org/projects/cxx0x.html).
+## Usage
+This can be used as a library, or through the installed command line program `cluster_eval`.
+```
+$ cluster_eval help eval
+Usage:
+  cluster_eval eval [options]
+Options:
+  -a, [--cluster-file-a=CLUSTER_FILE_A]  # cluster file A
+  -b, [--cluster-file-b=CLUSTER_FILE_B]  # cluster file B
+  -y, [--type=TYPE]                      # type of index to compute
+```
+The `type` argument selects the index to compute: the [Rand index](http://en.wikipedia.org/wiki/Rand_index), the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index), the [Fowlkes-Mallows index](http://en.wikipedia.org/wiki/Fowlkes%E2%80%93Mallows_index), the [adjusted Rand index](http://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index), or all four at once.
+It accepts one of the values [rand/jaccard/fm/adj_rand/all].
+Each cluster file must contain two tab-separated columns of integers: the first column is the sample ID and the second is the cluster ID. The sample IDs need not be sorted, but every ID from 0 to n-1 must be present, where n is the number of samples.
+For example, given two files `clust_a.tab` and `clust_b.tab`, we could run the following:
+```
+$ cluster_eval eval -a clust_a.tab -b clust_b.tab -y rand
+0.643
+$ cluster_eval eval -a clust_a.tab -b clust_b.tab -y jaccard
+0.286
+$ cluster_eval eval -a clust_a.tab -b clust_b.tab -y fm
+0.456
+$ cluster_eval eval -a clust_a.tab -b clust_b.tab -y adj_rand
+0.200
+$ cluster_eval eval -a clust_a.tab -b clust_b.tab -y all
+rand	jaccard	fm	adj_rand
+0.643	0.286	0.456	0.200
+```
+## Development
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `bin/console` for an interactive prompt that will allow you to experiment.
+To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release` to create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+## Contributing
+1. Fork it ( https://github.com/[my-github-username]/cluster_eval/fork )
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a new Pull Request
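The README's input rules (two tab-separated integer columns, sample IDs covering 0 to n-1) can be checked before the data reaches the extension. The following is a minimal sketch, assuming a hypothetical helper `parse_clustering` that is not part of the gem; it takes any enumerable of lines and splits on tabs the same way `bin/cluster_eval` does:

```ruby
# Hypothetical helper, not part of the gem: parse "sample_id<TAB>cluster_id"
# lines into {sample_id => cluster_id} and enforce the README's ID rules.
def parse_clustering(lines)
  clustering = {}
  lines.each do |line|
    next if line.strip.empty?                  # skip blank lines
    fields = line.chomp.split("\t")            # same split as bin/cluster_eval
    raise ArgumentError, "expected two columns: #{line.inspect}" if fields.size < 2
    sample_id = Integer(fields[0])             # raises on non-numeric input
    cluster_id = Integer(fields[1])
    raise ArgumentError, "duplicate sample ID #{sample_id}" if clustering.key?(sample_id)
    clustering[sample_id] = cluster_id
  end
  # The extension stores assignments in a vector indexed by sample ID,
  # so the IDs must be exactly 0..n-1.
  unless clustering.keys.sort == (0...clustering.size).to_a
    raise ArgumentError, "sample IDs must cover 0..#{clustering.size - 1}"
  end
  # Cluster IDs also index vectors (clust_map_1_/clust_map_2_), so no negatives.
  if clustering.values.any? { |c| c < 0 }
    raise ArgumentError, "cluster IDs must be non-negative"
  end
  clustering
end
```

For instance, `parse_clustering(File.foreach("clust_a.tab"))` returns a Hash that can be passed to `ClusterEval::ConfusionMatrix.new` just as `bin/cluster_eval` passes its hashes. The sketch uses `Integer()` rather than the script's `to_i`, so a malformed field raises an error instead of silently becoming 0.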

data/Rakefile ADDED

@@ -0,0 +1,10 @@
+require "bundler/gem_tasks"
+require 'rake'
+require 'rake/testtask'
+Rake::TestTask.new(:all) do |t|
+  t.pattern = "tests/test_*.rb"
+end
+task :default => :all

data/bin/cluster_eval ADDED

@@ -0,0 +1,52 @@
+#!/usr/bin/env ruby
+require 'thor'
+require 'cluster_eval'
+class CLUSTER_EVAL_BIN < Thor
+  method_option :cluster_file_a, :aliases => "-a", :desc => "cluster file A"
+  method_option :cluster_file_b, :aliases => "-b", :desc => "cluster file B"
+  method_option :type, :aliases => "-y",
+  :desc => "type of index to compute [rand/jaccard/fm/adj_rand/all]"
+  desc 'eval [options]', 'evaluate clusterings'
+  def eval()
+    clust_a_f = options["cluster_file_a"]
+    clust_b_f = options["cluster_file_b"]
+    index_type = options["type"]
+    clust_a_h = {}
+    IO.foreach(clust_a_f) do |line|
+      ary = line.chomp.split("\t")
+      clust_a_h[ary[0].to_i] = ary[1].to_i
+    end
+    clust_b_h = {}
+    IO.foreach(clust_b_f) do |line|
+      ary = line.chomp.split("\t")
+      clust_b_h[ary[0].to_i] = ary[1].to_i
+    end
+    cm = ClusterEval::ConfusionMatrix.new(clust_a_h, clust_b_h)
+    if index_type == "rand" then
+      puts '%0.3f' % cm.get_rand_index
+    elsif index_type == "jaccard" then
+      puts '%0.3f' % cm.get_jaccard_index
+    elsif index_type == "fm" then
+      puts '%0.3f' % cm.get_fm_index
+    elsif index_type == "adj_rand" then
+      puts '%0.3f' % cm.get_adj_rand_index
+    elsif index_type == "all" then
+      puts ['rand', 'jaccard', 'fm', 'adj_rand'].join("\t")
+      puts ['%0.3f' % cm.get_rand_index,
+            '%0.3f' % cm.get_jaccard_index,
+            '%0.3f' % cm.get_fm_index,
+            '%0.3f' % cm.get_adj_rand_index].join("\t")
+    else
+      raise 'invalid index type'
+    end
+  end
+end
+#
+CLUSTER_EVAL_BIN.start(ARGV)
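The `if`/`elsif` chain in `eval` maps each `--type` value to one `ConfusionMatrix` method. One way to keep the valid values, the `all` header and the formatting in one place is a lookup table. This is a hypothetical refactor sketch: the constant `INDEX_METHODS` and the helper `format_indices` are illustrative names, and the code relies only on the four `get_*_index` methods that `ext/clusteval.cpp` binds:

```ruby
# Hypothetical refactor: map each --type value to the bound method name.
# Hash insertion order (Ruby 1.9+) fixes the column order for "all".
INDEX_METHODS = {
  "rand"     => :get_rand_index,
  "jaccard"  => :get_jaccard_index,
  "fm"       => :get_fm_index,
  "adj_rand" => :get_adj_rand_index
}.freeze

# Returns the lines the command would print for the given index type.
def format_indices(cm, index_type)
  if index_type == "all"
    values = INDEX_METHODS.values.map { |m| "%0.3f" % cm.public_send(m) }
    [INDEX_METHODS.keys.join("\t"), values.join("\t")]
  elsif INDEX_METHODS.key?(index_type)
    ["%0.3f" % cm.public_send(INDEX_METHODS[index_type])]
  else
    raise ArgumentError, "invalid index type: #{index_type.inspect}"
  end
end
```

Inside the Thor command, `puts format_indices(cm, index_type)` would print each returned line. Because any object that responds to the four methods works, the formatting can be tested with a stub, without compiling the extension.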

data/bin/console ADDED

@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+require "bundler/setup"
+require "cluster_eval"
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+require "irb"
+IRB.start

data/bin/setup ADDED

@@ -0,0 +1,7 @@
+#!/bin/bash
+set -euo pipefail
+IFS=$'\n\t'
+bundle install
+# Do any other automated setup that you need to do here

data/cluster_eval.gemspec ADDED

@@ -0,0 +1,35 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'cluster_eval/version'
+Gem::Specification.new do |spec|
+  spec.name          = "cluster_eval"
+  spec.version       = ClusterEval::VERSION
+  spec.authors       = ["sbonisso"]
+  spec.email         = ["sbonisso@ucsd.edu"]
+  # if spec.respond_to?(:metadata)
+  #   spec.metadata['allowed_push_host'] = "TODO: Set to 'http://mygemserver.com' to prevent pushes to rubygems.org, or delete to allow pushes to any server."
+  # end
+  spec.summary       = %q{Evaluation of clusterings}
+  spec.description   = %q{Evaluate partitionings of different clustering approaches. Provides different metrics to use.}
+  spec.homepage      = "https://github.com/sbonisso/cluster_eval"
+  spec.license       = "MIT"
+  spec.files         = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir        = "bin"
+  #spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.executables = "cluster_eval"
+  spec.require_paths = ["lib", "ext"]
+  spec.extensions = ["ext/extconf.rb"]
+  spec.add_dependency "thor", '~> 0.19'
+  spec.add_dependency "rice", '~> 1.7'
+  spec.add_development_dependency "bundler", "~> 1.8"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "minitest", "~> 5.4"
+  spec.add_development_dependency "minitest-reporters", "~> 1.0"
+end
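The `spec.files` line packages everything `git ls-files` reports, except paths matching `%r{^(test|spec|features)/}`. Because the pattern requires a slash directly after `test`, this gem's `tests/` directory is not filtered out, which is consistent with the `data/tests/*.rb` files appearing in the changed-file list. The check below applies the same `reject` call to sample paths. `ALT_FILTER` is a hypothetical variant, relevant only if keeping `tests/` out of the gem was the intent:

```ruby
# The exclusion pattern used by spec.files in cluster_eval.gemspec.
GEMSPEC_FILTER = %r{^(test|spec|features)/}
# Hypothetical variant that would also drop a tests/ directory.
ALT_FILTER = %r{^(tests?|spec|features)/}

def kept(files, filter)
  files.reject { |f| f.match(filter) }
end

paths = ["lib/cluster_eval.rb", "tests/test_helper.rb", "spec/example_spec.rb"]
p kept(paths, GEMSPEC_FILTER)  # => ["lib/cluster_eval.rb", "tests/test_helper.rb"]
p kept(paths, ALT_FILTER)      # => ["lib/cluster_eval.rb"]
```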

data/ext/ConfusionMatrix.cpp ADDED

@@ -0,0 +1,191 @@
+#include "ConfusionMatrix.hpp"
+/**
+ *
+ */
+ConfusionMatrix::ConfusionMatrix() {
+    a_ = 0;
+    b_ = 0;
+    c_ = 0;
+    d_ = 0;
+}
+/**
+ *
+ */
+ConfusionMatrix::~ConfusionMatrix() {}
+/**
+ * each ruby hash are pairs of {read_id => clust_id}
+ */
+ConfusionMatrix::ConfusionMatrix(Rice::Hash c1_hsh, Rice::Hash c2_hsh) {
+    a_ = 0;
+    b_ = 0;
+    c_ = 0;
+    d_ = 0;
+    populate_vect(c1_hsh, clust_1_);
+    populate_vect(c2_hsh, clust_2_);
+    //
+    int max_clust_1 = 0;
+    for(int i = 0; i < (int)clust_1_.size(); i++) {
+	if(clust_1_[i] > max_clust_1) {
+	    max_clust_1 = clust_1_[i];
+	}
+    }
+    int max_clust_2 = 0;
+    for(int i = 0; i < (int)clust_2_.size(); i++) {
+	if(clust_2_[i] > max_clust_2) {
+	    max_clust_2 = clust_2_[i];
+	}
+    }
+    // set up clust_id => [id_1, id_2, ...]
+    clust_map_1_.resize(max_clust_1+1);
+    clust_map_2_.resize(max_clust_2+1);
+    //
+    for(int i = 0; i < (int)clust_1_.size(); i++) {
+    	clust_map_1_[clust_1_[i]].push_back(i);
+    }
+    for(int i = 0; i < (int)clust_2_.size(); i++) {
+    	clust_map_2_[clust_2_[i]].push_back(i);
+    }
+    //
+    compute_confusion_matrix();
+}
+/**
+ * given a ruby hash, populate a vector of read_id as index, cluster_id as value
+ */
+void ConfusionMatrix::populate_vect(Rice::Hash &hsh,
+				    std::vector<int> &clust_v) {
+    int len = hsh.size();
+    clust_v.resize(len);
+    //
+    Rice::Hash::iterator it = hsh.begin();
+    Rice::Hash::iterator it_end = hsh.end();
+    int i = 0;
+    for(; it != it_end; ++it) {
+	Rice::Hash::Entry en(*it);
+	int x = FIX2INT(en.key);
+	int y = FIX2INT(en.value.value());
+	clust_v[x] = y;
+	i++;
+    }
+}
+/**
+ *
+ */
+void ConfusionMatrix::print_matrix() {
+    std::cout<<a_<<"\t"<<b_<<"\t"<<c_<<"\t"<<d_<<std::endl;
+}
+/**
+ * return ruby array with contents of confusion matrix: [a, b, c, d]
+ */
+Rice::Array ConfusionMatrix::get_confusion_matrix() {
+    Rice::Array ary;
+    ary.push(a_);
+    ary.push(b_);
+    ary.push(c_);
+    ary.push(d_);
+    return ary;
+}
+/**
+ * compute confusion matrix entries a, b, c, and d
+ * for the matrix:
+ *    ---------
+ *    | a | b |
+ *    ---------
+ *    | c | d |
+ *    ---------
+ */
+void ConfusionMatrix::compute_confusion_matrix() {
+    std::pair<int,int> v1 = int_and_diff(clust_1_, clust_map_2_);
+    std::pair<int,int> v2 = int_and_diff(clust_2_, clust_map_1_);
+    //
+    int n = (int)clust_1_.size();
+    a_ = v1.first;
+    c_ = v1.second;
+    b_ = v2.second;
+    d_ = ((n*(n-1))/2) - (a_+c_+b_);
+}
+/**
+ * return Rand index: (a+d)/(a+b+c+d)
+ */
+double ConfusionMatrix::get_rand_index() {
+    return (double)(a_+d_)/(double)(a_+b_+c_+d_);
+}
+/**
+ * return Jaccard index: a/(a+c+b)
+ */
+double ConfusionMatrix::get_jaccard_index() {
+    return (double)(a_)/(double)(a_+c_+b_);
+}
+/**
+ * returns Fowlkes-Mallows (FM) index: a/sqrt((a+c)*(a+b))
+ */
+double ConfusionMatrix::get_fm_index() {
+    double d1 = (double)(a_+c_);
+    double d2 = (double)(a_+b_);
+    return (double)a_/sqrt(d1*d2);
+}
+/**
+ * returns the adjusted Rand index using the confusion matrix
+ */
+double ConfusionMatrix::get_adj_rand_index() {
+    double total = a_+b_+c_+d_;
+    double n1 = ((double)(b_+a_)*(c_+a_) + (double)(b_+d_)*(c_+d_));
+    double num = (total*((double)(a_+d_))) - n1;
+    double denom = (total*total - n1);
+    return (num/denom);
+}
+/**
+ * get counts of: <intersection, difference> between the vector of
+ * cluster assignments (A) and cluster assignment groupings (B)
+ */
+std::pair<int,int> ConfusionMatrix::int_and_diff(std::vector<int> clust_v,
+						 std::vector<std::vector<int>> clust_map) {
+    //
+    int intsect = 0;
+    int diff = 0;
+    for(int i = 0; i < (int)clust_map.size(); i++) {
+	//
+	std::vector<int> vs = clust_map[i];
+	int len = (int) vs.size();
+	for(int i = 0; i < len; i++) {
+	    int val_i = vs[i];
+	    for(int j = 0; j < len; j++) {
+		if(i < j) {
+		    int val_j = vs[j];
+		    bool t1 = clust_v[val_i] == clust_v[val_j];
+		    if(t1) { intsect++; }
+		    else { diff++; }
+		}
+	    }
+	}
+    }
+    //
+    return std::pair<int,int>(intsect, diff);
+}
+/**
+ * compute confusion matrix in naive N**2 time, used for comparing
+ * (i.e., sanity check) testing results
+ */
+Rice::Array ConfusionMatrix::get_confusion_matrix_naive() {
+    int len = (int)clust_1_.size();
+    int a = 0; int b = 0; int c = 0; int d = 0;
+    for(int i = 0; i < len; i++) {
+	for(int j = 0; j < len; j++) {
+	    if(i >= j) { continue; }
+	    bool t1 = (clust_1_[i] == clust_1_[j]);
+	    bool t2 = (clust_2_[i] == clust_2_[j]);
+	    if(t1 && t2) { a += 1; }
+	    else if(!t1 && !t2) { d += 1; }
+	    else if(!t1 && t2) { c += 1; }
+	    else if(t1 && !t2) { b += 1; }
+	    else {}
+	}
+    }
+    //
+    Rice::Array ary;
+    ary.push(a);
+    ary.push(b);
+    ary.push(c);
+    ary.push(d);
+    return ary;
+}
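To check the extension's numbers without compiling it, the pair counting can be reproduced in plain Ruby. This is an illustrative re-implementation, not code from the gem, and the method names are hypothetical. `pair_counts_naive` mirrors `get_confusion_matrix_naive`. `pair_counts_grouped` follows the idea behind `compute_confusion_matrix`/`int_and_diff`: it visits only pairs that share a cluster in one clustering, so its cost grows with the sum of squared cluster sizes rather than with n². Both return `[a, b, c, d]` in the order `get_confusion_matrix` uses: a is the number of pairs in the same cluster in both clusterings, b in the first only, c in the second only, and d in neither.

```ruby
# Illustrative pure-Ruby re-implementation, not code from the gem.
# clust_1 / clust_2: arrays indexed by sample ID holding cluster IDs.

# O(n^2): classify every pair i < j, like get_confusion_matrix_naive.
def pair_counts_naive(clust_1, clust_2)
  a = b = c = d = 0
  n = clust_1.size
  (0...n).each do |i|
    ((i + 1)...n).each do |j|
      same_1 = clust_1[i] == clust_1[j]
      same_2 = clust_2[i] == clust_2[j]
      if same_1 && same_2 then a += 1
      elsif same_1 then b += 1
      elsif same_2 then c += 1
      else d += 1
      end
    end
  end
  [a, b, c, d]
end

# For pairs that share a cluster in `grouping`, count how many also share a
# cluster in `other` and how many do not (the job int_and_diff does in C++).
def same_and_diff(other, grouping)
  members = Hash.new { |h, k| h[k] = [] }
  grouping.each_with_index { |cluster, id| members[cluster] << id }
  same = diff = 0
  members.each_value do |ids|
    ids.combination(2) do |i, j|
      if other[i] == other[j] then same += 1 else diff += 1 end
    end
  end
  [same, diff]
end

# Grouped version: only within-cluster pairs are visited; d is the remainder.
def pair_counts_grouped(clust_1, clust_2)
  a, c = same_and_diff(clust_1, clust_2)  # pairs together in clustering 2
  _, b = same_and_diff(clust_2, clust_1)  # pairs together in clustering 1
  n = clust_1.size
  d = n * (n - 1) / 2 - (a + b + c)
  [a, b, c, d]
end

# The four scores, using the same formulas as the C++ getters.
def indices(a, b, c, d)
  total = (a + b + c + d).to_f
  n1 = (a + b) * (a + c) + (b + d) * (c + d)
  {
    rand:     (a + d) / total,
    jaccard:  a / (a + b + c).to_f,
    fm:       a / Math.sqrt((a + b) * (a + c)),
    adj_rand: (total * (a + d) - n1) / (total * total - n1)
  }
end
```

For example, `indices(*pair_counts_grouped(ids_a, ids_b))` returns all four scores in a Hash. Ruby integers have arbitrary precision, so the pair counts and the products in `n1` cannot overflow in this sketch.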

data/ext/ConfusionMatrix.hpp ADDED

@@ -0,0 +1,56 @@
+#ifndef CONFUSION_MATRIX_HPP
+#define CONFUSION_MATRIX_HPP
+#include <iostream>
+#include <string>
+#include <vector>
+#include <utility>
+#include <cmath>
+#include "prettyprint.hpp"
+#ifndef CPPPROG
+#include "rice/String.hpp"
+#include "rice/Array.hpp"
+#include "rice/Hash.hpp"
+#endif
+class ConfusionMatrix {
+protected:
+    void compute_confusion_matrix();
+    int a_;
+    int b_;
+    int c_;
+    int d_;
+    //
+    std::vector<int> clust_1_;
+    std::vector<int> clust_2_;
+    //
+    std::vector<std::vector<int> > clust_map_1_;
+    std::vector<std::vector<int> > clust_map_2_;
+    // fill in clust_vect from hash (ruby hash)
+    void populate_vect(Rice::Hash &hsh, std::vector<int> &clust_v);
+    //
+    std::pair<int,int> int_and_diff(std::vector<int> clust_v,
+				    std::vector<std::vector<int>> clust_map);
+public:
+    ConfusionMatrix();
+    ConfusionMatrix(Rice::Hash c1_hsh, Rice::Hash c2_hsh);
+    virtual ~ConfusionMatrix();
+    void print_matrix();
+    Rice::Array get_confusion_matrix();
+    // for computing various indices
+    double get_rand_index();
+    double get_jaccard_index();
+    double get_fm_index();
+    double get_adj_rand_index();
+    // compute using naive N^2 for testing purposes
+    Rice::Array get_confusion_matrix_naive();
+};
+#endif
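The header declares four scoring methods. For reference, these are the formulas `ConfusionMatrix.cpp` evaluates, in terms of the pair counts a (same cluster in both clusterings), b (same in the first only), c (same in the second only) and d (different in both), with $N = a+b+c+d = n(n-1)/2$ for $n$ samples. The final equality is an algebraically equivalent closed form of the adjusted Rand expression, included for comparison with references that state it that way:

$$
\begin{aligned}
\text{Rand} &= \frac{a+d}{a+b+c+d}, \qquad
\text{Jaccard} = \frac{a}{a+b+c}, \qquad
\text{FM} = \frac{a}{\sqrt{(a+b)(a+c)}},\\[4pt]
\text{ARI} &= \frac{N(a+d) - \big[(a+b)(a+c) + (b+d)(c+d)\big]}{N^2 - \big[(a+b)(a+c) + (b+d)(c+d)\big]}
= \frac{2(ad - bc)}{(a+b)(b+d) + (a+c)(c+d)}.
\end{aligned}
$$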

data/ext/clusteval.cpp ADDED

@@ -0,0 +1,25 @@
+#include "rice/Data_Type.hpp"
+#include "rice/Constructor.hpp"
+#include "rice/Hash.hpp"
+#include "ConfusionMatrix.hpp"
+using namespace Rice;
+extern "C"
+void Init_ClusterEval()
+{
+    Module rb_cModule = define_module("ClusterEval");
+    Data_Type<ConfusionMatrix> rb_cConfusionMatrix =
+	define_class_under<ConfusionMatrix>(rb_cModule,"ConfusionMatrix")
+	.define_constructor(Constructor<ConfusionMatrix>())
+	//.define_constructor(Constructor<ConfusionMatrix,std::string,std::string>())
+	.define_constructor(Constructor<ConfusionMatrix,Hash,Hash>())
+	.define_method("get_rand_index", &ConfusionMatrix::get_rand_index)
+	.define_method("get_jaccard_index", &ConfusionMatrix::get_jaccard_index)
+	.define_method("get_fm_index", &ConfusionMatrix::get_fm_index)
+	.define_method("get_adj_rand_index", &ConfusionMatrix::get_adj_rand_index)
+	.define_method("print_matrix", &ConfusionMatrix::print_matrix)
+	.define_method("get_confusion_matrix", &ConfusionMatrix::get_confusion_matrix)
+	.define_method("get_confusion_matrix_naive", &ConfusionMatrix::get_confusion_matrix_naive);
+}

data/ext/extconf.rb ADDED

@@ -0,0 +1,9 @@
+require 'mkmf-rice'
+$CPPFLAGS << ' -std=c++11'
+$CPPFLAGS << ' -Wno-deprecated'
+$warnflags = $warnflags.split("-Wimplicit-function-declaration").join(" ")
+$warnflags = $warnflags.split("-Wdeclaration-after-statement").join(" ")
+create_makefile('ClusterEval')

data/ext/prettyprint.hpp ADDED

@@ -0,0 +1,445 @@
+//          Copyright Louis Delacroix 2010 - 2014.
+// Distributed under the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE_1_0.txt or copy at
+//          http://www.boost.org/LICENSE_1_0.txt)
+//
+// A pretty printing library for C++
+//
+// Usage:
+// Include this header, and operator<< will "just work".
+#ifndef H_PRETTY_PRINT
+#define H_PRETTY_PRINT
+#include <cstddef>
+#include <iterator>
+#include <memory>
+#include <ostream>
+#include <set>
+#include <tuple>
+#include <type_traits>
+#include <unordered_set>
+#include <utility>
+#include <valarray>
+namespace pretty_print
+{
+    namespace detail
+    {
+        // SFINAE type trait to detect whether T::const_iterator exists.
+        struct sfinae_base
+        {
+            using yes = char;
+            using no  = yes[2];
+        };
+        template <typename T>
+        struct has_const_iterator : private sfinae_base
+        {
+        private:
+            template <typename C> static yes & test(typename C::const_iterator*);
+            template <typename C> static no  & test(...);
+        public:
+            static const bool value = sizeof(test<T>(nullptr)) == sizeof(yes);
+            using type =  T;
+        };
+        template <typename T>
+        struct has_begin_end : private sfinae_base
+        {
+        private:
+            template <typename C>
+            static yes & f(typename std::enable_if<
+                std::is_same<decltype(static_cast<typename C::const_iterator(C::*)() const>(&C::begin)),
+                             typename C::const_iterator(C::*)() const>::value>::type *);
+            template <typename C> static no & f(...);
+            template <typename C>
+            static yes & g(typename std::enable_if<
+                std::is_same<decltype(static_cast<typename C::const_iterator(C::*)() const>(&C::end)),
+                             typename C::const_iterator(C::*)() const>::value, void>::type*);
+            template <typename C> static no & g(...);
+        public:
+            static bool const beg_value = sizeof(f<T>(nullptr)) == sizeof(yes);
+            static bool const end_value = sizeof(g<T>(nullptr)) == sizeof(yes);
+        };
+    }  // namespace detail
+    // Holds the delimiter values for a specific character type
+    template <typename TChar>
+    struct delimiters_values
+    {
+        using char_type = TChar;
+        const char_type * prefix;
+        const char_type * delimiter;
+        const char_type * postfix;
+    };
+    // Defines the delimiter values for a specific container and character type
+    template <typename T, typename TChar>
+    struct delimiters
+    {
+        using type = delimiters_values<TChar>;
+        static const type values;
+    };
+    // Functor to print containers. You can use this directly if you want
+    // to specify a non-default delimiters type. The printing logic can
+    // be customized by specializing the nested template.
+    template <typename T,
+              typename TChar = char,
+              typename TCharTraits = ::std::char_traits<TChar>,
+              typename TDelimiters = delimiters<T, TChar>>
+    struct print_container_helper
+    {
+        using delimiters_type = TDelimiters;
+        using ostream_type = std::basic_ostream<TChar, TCharTraits>;
+        template <typename U>
+        struct printer
+        {
+            static void print_body(const U & c, ostream_type & stream)
+            {
+                using std::begin;
+                using std::end;
+                auto it = begin(c);
+                const auto the_end = end(c);
+                if (it != the_end)
+                {
+                    for ( ; ; )
+                    {
+                        stream << *it;
+                        if (++it == the_end) break;
+                        if (delimiters_type::values.delimiter != NULL)
+                            stream << delimiters_type::values.delimiter;
+                    }
+                }
+            }
+        };
+        print_container_helper(const T & container)
+        : container_(container)
+        { }
+        inline void operator()(ostream_type & stream) const
+        {
+            if (delimiters_type::values.prefix != NULL)
+                stream << delimiters_type::values.prefix;
+            printer<T>::print_body(container_, stream);
+            if (delimiters_type::values.postfix != NULL)
+                stream << delimiters_type::values.postfix;
+        }
+    private:
+        const T & container_;
+    };
+    // Specialization for pairs
+    template <typename T, typename TChar, typename TCharTraits, typename TDelimiters>
+    template <typename T1, typename T2>
+    struct print_container_helper<T, TChar, TCharTraits, TDelimiters>::printer<std::pair<T1, T2>>
+    {
+        using ostream_type = print_container_helper<T, TChar, TCharTraits, TDelimiters>::ostream_type;
+        static void print_body(const std::pair<T1, T2> & c, ostream_type & stream)
+        {
+            stream << c.first;
+            if (print_container_helper<T, TChar, TCharTraits, TDelimiters>::delimiters_type::values.delimiter != NULL)
+                stream << print_container_helper<T, TChar, TCharTraits, TDelimiters>::delimiters_type::values.delimiter;
+            stream << c.second;
+        }
+    };
+    // Specialization for tuples
+    template <typename T, typename TChar, typename TCharTraits, typename TDelimiters>
+    template <typename ...Args>
+    struct print_container_helper<T, TChar, TCharTraits, TDelimiters>::printer<std::tuple<Args...>>
+    {
+        using ostream_type = print_container_helper<T, TChar, TCharTraits, TDelimiters>::ostream_type;
+        using element_type = std::tuple<Args...>;
+        template <std::size_t I> struct Int { };
+        static void print_body(const element_type & c, ostream_type & stream)
+        {
+            tuple_print(c, stream, Int<0>());
+        }
+        static void tuple_print(const element_type &, ostream_type &, Int<sizeof...(Args)>)
+        {
+        }
+        static void tuple_print(const element_type & c, ostream_type & stream,
+                                typename std::conditional<sizeof...(Args) != 0, Int<0>, std::nullptr_t>::type)
+        {
+            stream << std::get<0>(c);
+            tuple_print(c, stream, Int<1>());
+        }
+        template <std::size_t N>
+        static void tuple_print(const element_type & c, ostream_type & stream, Int<N>)
+        {
+            if (print_container_helper<T, TChar, TCharTraits, TDelimiters>::delimiters_type::values.delimiter != NULL)
+                stream << print_container_helper<T, TChar, TCharTraits, TDelimiters>::delimiters_type::values.delimiter;
+            stream << std::get<N>(c);
+            tuple_print(c, stream, Int<N + 1>());
+        }
+    };
+    // Prints a print_container_helper to the specified stream.
+    template<typename T, typename TChar, typename TCharTraits, typename TDelimiters>
+    inline std::basic_ostream<TChar, TCharTraits> & operator<<(
+        std::basic_ostream<TChar, TCharTraits> & stream,
+        const print_container_helper<T, TChar, TCharTraits, TDelimiters> & helper)
+    {
+        helper(stream);
+        return stream;
+    }
+    // Basic is_container template; specialize to derive from std::true_type for all desired container types
+    template <typename T>
+    struct is_container : public std::integral_constant<bool,
+                                                        detail::has_const_iterator<T>::value &&
+                                                        detail::has_begin_end<T>::beg_value  &&
+                                                        detail::has_begin_end<T>::end_value> { };
+    template <typename T, std::size_t N>
+    struct is_container<T[N]> : std::true_type { };
+    template <std::size_t N>
+    struct is_container<char[N]> : std::false_type { };
+    template <typename T>
+    struct is_container<std::valarray<T>> : std::true_type { };
+    template <typename T1, typename T2>
+    struct is_container<std::pair<T1, T2>> : std::true_type { };
+    template <typename ...Args>
+    struct is_container<std::tuple<Args...>> : std::true_type { };
+    // Default delimiters
+    template <typename T> struct delimiters<T, char> { static const delimiters_values<char> values; };
+    template <typename T> const delimiters_values<char> delimiters<T, char>::values = { "[", ", ", "]" };
+    template <typename T> struct delimiters<T, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename T> const delimiters_values<wchar_t> delimiters<T, wchar_t>::values = { L"[", L", ", L"]" };
+    // Delimiters for (multi)set and unordered_(multi)set
+    template <typename T, typename TComp, typename TAllocator>
+    struct delimiters< ::std::set<T, TComp, TAllocator>, char> { static const delimiters_values<char> values; };
+    template <typename T, typename TComp, typename TAllocator>
+    const delimiters_values<char> delimiters< ::std::set<T, TComp, TAllocator>, char>::values = { "{", ", ", "}" };
+    template <typename T, typename TComp, typename TAllocator>
+    struct delimiters< ::std::set<T, TComp, TAllocator>, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename T, typename TComp, typename TAllocator>
+    const delimiters_values<wchar_t> delimiters< ::std::set<T, TComp, TAllocator>, wchar_t>::values = { L"{", L", ", L"}" };
+    template <typename T, typename TComp, typename TAllocator>
+    struct delimiters< ::std::multiset<T, TComp, TAllocator>, char> { static const delimiters_values<char> values; };
+    template <typename T, typename TComp, typename TAllocator>
+    const delimiters_values<char> delimiters< ::std::multiset<T, TComp, TAllocator>, char>::values = { "{", ", ", "}" };
+    template <typename T, typename TComp, typename TAllocator>
+    struct delimiters< ::std::multiset<T, TComp, TAllocator>, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename T, typename TComp, typename TAllocator>
+    const delimiters_values<wchar_t> delimiters< ::std::multiset<T, TComp, TAllocator>, wchar_t>::values = { L"{", L", ", L"}" };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    struct delimiters< ::std::unordered_set<T, THash, TEqual, TAllocator>, char> { static const delimiters_values<char> values; };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    const delimiters_values<char> delimiters< ::std::unordered_set<T, THash, TEqual, TAllocator>, char>::values = { "{", ", ", "}" };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    struct delimiters< ::std::unordered_set<T, THash, TEqual, TAllocator>, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    const delimiters_values<wchar_t> delimiters< ::std::unordered_set<T, THash, TEqual, TAllocator>, wchar_t>::values = { L"{", L", ", L"}" };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    struct delimiters< ::std::unordered_multiset<T, THash, TEqual, TAllocator>, char> { static const delimiters_values<char> values; };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    const delimiters_values<char> delimiters< ::std::unordered_multiset<T, THash, TEqual, TAllocator>, char>::values = { "{", ", ", "}" };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    struct delimiters< ::std::unordered_multiset<T, THash, TEqual, TAllocator>, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename T, typename THash, typename TEqual, typename TAllocator>
+    const delimiters_values<wchar_t> delimiters< ::std::unordered_multiset<T, THash, TEqual, TAllocator>, wchar_t>::values = { L"{", L", ", L"}" };
+    // Delimiters for pair and tuple
+    template <typename T1, typename T2> struct delimiters<std::pair<T1, T2>, char> { static const delimiters_values<char> values; };
+    template <typename T1, typename T2> const delimiters_values<char> delimiters<std::pair<T1, T2>, char>::values = { "(", ", ", ")" };
+    template <typename T1, typename T2> struct delimiters< ::std::pair<T1, T2>, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename T1, typename T2> const delimiters_values<wchar_t> delimiters< ::std::pair<T1, T2>, wchar_t>::values = { L"(", L", ", L")" };
+    template <typename ...Args> struct delimiters<std::tuple<Args...>, char> { static const delimiters_values<char> values; };
+    template <typename ...Args> const delimiters_values<char> delimiters<std::tuple<Args...>, char>::values = { "(", ", ", ")" };
+    template <typename ...Args> struct delimiters< ::std::tuple<Args...>, wchar_t> { static const delimiters_values<wchar_t> values; };
+    template <typename ...Args> const delimiters_values<wchar_t> delimiters< ::std::tuple<Args...>, wchar_t>::values = { L"(", L", ", L")" };
+    // Type-erasing helper class for easy use of custom delimiters.
+    // Requires TCharTraits = std::char_traits<TChar> and TChar = char or wchar_t, and MyDelims needs to be defined for TChar.
+    // Usage: "cout << pretty_print::custom_delims<MyDelims>(x)".
+    struct custom_delims_base
+    {
+        virtual ~custom_delims_base() { }
+        virtual std::ostream & stream(::std::ostream &) = 0;
+        virtual std::wostream & stream(::std::wostream &) = 0;
+    };
+    template <typename T, typename Delims>
+    struct custom_delims_wrapper : custom_delims_base
+    {
+        custom_delims_wrapper(const T & t_) : t(t_) { }
+        std::ostream & stream(std::ostream & s)
+        {
+            return s << print_container_helper<T, char, std::char_traits<char>, Delims>(t);
+        }
+        std::wostream & stream(std::wostream & s)
+        {
+            return s << print_container_helper<T, wchar_t, std::char_traits<wchar_t>, Delims>(t);
+        }
+    private:
+        const T & t;
+    };
+    template <typename Delims>
+    struct custom_delims
+    {
+        template <typename Container>
+        custom_delims(const Container & c) : base(new custom_delims_wrapper<Container, Delims>(c)) { }
+        std::unique_ptr<custom_delims_base> base;
+    };
+    template <typename TChar, typename TCharTraits, typename Delims>
+    inline std::basic_ostream<TChar, TCharTraits> & operator<<(std::basic_ostream<TChar, TCharTraits> & s, const custom_delims<Delims> & p)
+    {
+        return p.base->stream(s);
+    }
+    // A wrapper for a C-style array given as pointer-plus-size.
+    // Usage: std::cout << pretty_print_array(arr, n) << std::endl;
+    template<typename T>
+    struct array_wrapper_n
+    {
+        typedef const T * const_iterator;
+        typedef T value_type;
+        array_wrapper_n(const T * const a, size_t n) : _array(a), _n(n) { }
+        inline const_iterator begin() const { return _array; }
+        inline const_iterator end() const { return _array + _n; }
+    private:
+        const T * const _array;
+        size_t _n;
+    };
+    // A wrapper for hash-table based containers that offer local iterators to each bucket.
+    // Usage: std::cout << bucket_print(m, 4) << std::endl;  (Prints bucket 4, i.e. the fifth bucket, of container m.)
+    template <typename T>
+    struct bucket_print_wrapper
+    {
+        typedef typename T::const_local_iterator const_iterator;
+        typedef typename T::size_type size_type;
+        const_iterator begin() const
+        {
+            return m_map.cbegin(n);
+        }
+        const_iterator end() const
+        {
+            return m_map.cend(n);
+        }
+        bucket_print_wrapper(const T & m, size_type bucket) : m_map(m), n(bucket) { }
+    private:
+        const T & m_map;
+        const size_type n;
+    };
+}   // namespace pretty_print
+// Global accessor functions for the convenience wrappers
+template<typename T>
+inline pretty_print::array_wrapper_n<T> pretty_print_array(const T * const a, size_t n)
+{
+    return pretty_print::array_wrapper_n<T>(a, n);
+}
+template <typename T> pretty_print::bucket_print_wrapper<T>
+bucket_print(const T & m, typename T::size_type n)
+{
+    return pretty_print::bucket_print_wrapper<T>(m, n);
+}
+// Main magic entry point: An overload snuck into namespace std.
+// Can we do better?
+namespace std
+{
+    // Prints a container to the stream using default delimiters
+    template<typename T, typename TChar, typename TCharTraits>
+    inline typename enable_if< ::pretty_print::is_container<T>::value,
+                              basic_ostream<TChar, TCharTraits> &>::type
+    operator<<(basic_ostream<TChar, TCharTraits> & stream, const T & container)
+    {
+        return stream << ::pretty_print::print_container_helper<T, TChar, TCharTraits>(container);
+    }
+}
+#endif  // H_PRETTY_PRINT

data/lib/cluster_eval.rb ADDED

@@ -0,0 +1,6 @@
+require "cluster_eval/version"
+require 'ClusterEval'
+module ClusterEval
+end

data/lib/cluster_eval/version.rb ADDED

@@ -0,0 +1,3 @@
+module ClusterEval
+  VERSION = "0.1.1"
+end

data/tests/test_confusion_matrix_small.rb ADDED

@@ -0,0 +1,54 @@
+require_relative 'test_helper'
+require 'cluster_eval'
+#
+# test tiny example
+#
+class TestConfusionMatrixSmall < Minitest::Test
+  #
+  def setup
+    @h1= {0 => 0, 1 => 0, 2 => 1, 3 => 1, 4 => 1, 5 => 2, 6 => 2, 7 => 2}
+    @h2= {0 => 0, 1 => 1, 2 => 1, 3 => 1, 4 => 1, 5 => 1, 6 => 2, 7 => 2}
+  end
+  #
+  #
+  def test_small_confusion_matrix()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    cm_ary = cm.get_confusion_matrix
+    assert_equal(4, cm_ary[0], 'Incorrect num for a')
+    assert_equal(3, cm_ary[1], 'Incorrect num for b')
+    assert_equal(7, cm_ary[2], 'Incorrect num for c')
+    assert_equal(14, cm_ary[3], 'Incorrect num for d')
+    #
+    cm_naive = cm.get_confusion_matrix_naive
+    assert_equal(cm_ary, cm_naive, 'confusion matrices do not agree')
+  end
+  #
+  #
+  def test_small_jaccard()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    ji_val = cm.get_jaccard_index
+    assert_in_delta(0.28571, ji_val, 0.0001, 'incorrect Jaccard index')
+  end
+  #
+  #
+  def test_small_rand()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    rand_val = cm.get_rand_index
+    assert_in_delta(0.64285, rand_val, 0.0001, 'incorrect Rand index')
+  end
+  #
+  #
+  def test_small_fm()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    fm_val = cm.get_fm_index
+    assert_in_delta(0.45584, fm_val, 0.0001, 'incorrect FM index')
+  end
+  #
+  #
+  def test_small_adj_rand()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    rand_val = cm.get_adj_rand_index
+    assert_in_delta(0.2, rand_val, 0.0001, 'incorrect adjusted Rand index')
+  end
+end
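The expected values in `TestConfusionMatrixSmall` are consistent with the standard pair-counting definitions. Over all 28 unordered pairs of the 8 elements, `a` counts pairs placed together by both clusterings, `b` pairs together only in `@h1`, `c` pairs together only in `@h2`, and `d` pairs separated by both. From these, Jaccard is a/(a+b+c), Rand is (a+d)/(a+b+c+d), and Fowlkes-Mallows is a/sqrt((a+b)(a+c)). The Ruby sketch below re-derives the test's numbers under those assumed definitions. It is not the gem's implementation (that lives in the C++ extension), and the helper names `naive_pair_counts`, `jaccard_index`, `rand_index` and `fowlkes_mallows_index` are hypothetical.

```ruby
# Hypothetical helpers, not part of the cluster_eval API. Each clustering is a
# Hash mapping element id => cluster label, as in the tests above.

# Classify every unordered pair of elements, O(n^2):
#   a: same cluster in both clusterings
#   b: same cluster in h1 only
#   c: same cluster in h2 only
#   d: different clusters in both
def naive_pair_counts(h1, h2)
  a = b = c = d = 0
  h1.keys.combination(2) do |i, j|
    same1 = h1[i] == h1[j]
    same2 = h2[i] == h2[j]
    if same1 && same2
      a += 1
    elsif same1
      b += 1
    elsif same2
      c += 1
    else
      d += 1
    end
  end
  [a, b, c, d]
end

# Fraction of "together in at least one clustering" pairs that agree.
def jaccard_index(a, b, c, _d)
  a.fdiv(a + b + c)
end

# Fraction of all pairs on which the two clusterings agree.
def rand_index(a, b, c, d)
  (a + d).fdiv(a + b + c + d)
end

# Geometric mean of pairwise precision and recall.
def fowlkes_mallows_index(a, b, c, _d)
  a / Math.sqrt((a + b) * (a + c))
end

h1 = {0 => 0, 1 => 0, 2 => 1, 3 => 1, 4 => 1, 5 => 2, 6 => 2, 7 => 2}
h2 = {0 => 0, 1 => 1, 2 => 1, 3 => 1, 4 => 1, 5 => 1, 6 => 2, 7 => 2}
counts = naive_pair_counts(h1, h2)
p counts                                     # prints [4, 3, 7, 14]
puts jaccard_index(*counts).round(5)         # prints 0.28571
puts rand_index(*counts).round(5)            # prints 0.64286
puts fowlkes_mallows_index(*counts).round(5) # prints 0.45584
```

Enumerating pairs directly keeps the definitions obvious at the cost of quadratic time, which is fine for fixtures this small.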

data/tests/test_confusion_matrix_small_random.rb ADDED

@@ -0,0 +1,54 @@
+require_relative 'test_helper'
+require 'cluster_eval'
+#
+# test two random (small) clusterings
+#
+class TestConfusionMatrixSmallRandom < Minitest::Test
+  #
+  def setup
+    @h1 = {0 => 0, 1 => 1, 2 => 2, 3 => 0, 4 => 3, 5 => 4, 6 => 5, 7 => 1}
+    @h2 = {0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 2, 6 => 2, 7 => 2}
+  end
+  #
+  #
+  def test_small_rand_confusion_matrix()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    cm_ary = cm.get_confusion_matrix
+    assert_equal(0, cm_ary[0], 'Incorrect num for a')
+    assert_equal(2, cm_ary[1], 'Incorrect num for b')
+    assert_equal(8, cm_ary[2], 'Incorrect num for c')
+    assert_equal(18, cm_ary[3], 'Incorrect num for d')
+    #
+    cm_naive = cm.get_confusion_matrix_naive
+    assert_equal(cm_ary, cm_naive, 'confusion matrices do not agree')
+  end
+  #
+  #
+  def test_small_rand_jaccard()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    ji_val = cm.get_jaccard_index
+    assert_in_delta(0.000, ji_val, 0.0001, 'incorrect Jaccard index')
+  end
+  #
+  #
+  def test_small_rand_rand()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    rand_val = cm.get_rand_index
+    assert_in_delta(0.64285, rand_val, 0.0001, 'incorrect Rand index')
+  end
+  #
+  #
+  def test_small_rand_fm()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    fm_val = cm.get_fm_index
+    assert_in_delta(0.0000, fm_val, 0.0001, 'incorrect FM index')
+  end
+  #
+  #
+  def test_small_rand_adj_rand()
+    cm = ClusterEval::ConfusionMatrix.new(@h1,@h2)
+    rand_val = cm.get_adj_rand_index
+    assert_in_delta(-0.129032, rand_val, 0.0001, 'incorrect adjusted Rand index')
+  end
+end
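Both test files also check that `get_confusion_matrix` agrees with `get_confusion_matrix_naive`. One common way to get the same four counts without enumerating every pair is a contingency table of label co-occurrences. Whether `ConfusionMatrix.cpp` uses this approach is not shown in this diff. The sketch below assumes it, and also writes the adjusted Rand index directly in terms of the four counts, which explains the negative expected value (-0.129032) in the random test. A value near 0 means chance-level agreement, and worse-than-chance agreement falls below it. The helper names `choose2`, `table_pair_counts` and `adjusted_rand_index` are hypothetical, and `Array#sum` needs Ruby 2.4 or later.

```ruby
# Hypothetical helpers, not part of the cluster_eval API.

# Number of unordered pairs among x items.
def choose2(x)
  x * (x - 1) / 2
end

# Pair counts [a, b, c, d] from a contingency table of label co-occurrences,
# linear in the number of elements plus label combinations instead of O(n^2).
def table_pair_counts(h1, h2)
  table = Hash.new(0) # [label in h1, label in h2] => shared element count
  rows  = Hash.new(0) # label in h1 => cluster size
  cols  = Hash.new(0) # label in h2 => cluster size
  h1.each do |elem, l1|
    l2 = h2.fetch(elem) # raises KeyError if h2 lacks an element of h1
    table[[l1, l2]] += 1
    rows[l1] += 1
    cols[l2] += 1
  end
  a = table.values.sum { |v| choose2(v) }  # together in both
  b = rows.values.sum { |v| choose2(v) } - a # together in h1 only
  c = cols.values.sum { |v| choose2(v) } - a # together in h2 only
  d = choose2(h1.size) - a - b - c           # apart in both
  [a, b, c, d]
end

# Adjusted Rand index in pair-count form. Returns NaN when the denominator
# is 0, e.g. when both clusterings put every element in its own cluster.
def adjusted_rand_index(a, b, c, d)
  2.0 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))
end

h1 = {0 => 0, 1 => 1, 2 => 2, 3 => 0, 4 => 3, 5 => 4, 6 => 5, 7 => 1}
h2 = {0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 2, 6 => 2, 7 => 2}
counts = table_pair_counts(h1, h2)
p counts                                   # prints [0, 2, 8, 18]
puts adjusted_rand_index(*counts).round(6) # prints -0.129032
```

Applied to the fixtures in `TestConfusionMatrixSmall`, the same functions give [4, 3, 7, 14] and an adjusted Rand index of 0.2, matching that file's expectations.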

data/tests/test_helper.rb ADDED

@@ -0,0 +1,17 @@
+# minitest stuff
+require 'rubygems'
+gem 'minitest'
+require 'minitest/autorun'
+# minitest-reporters
+require "minitest/reporters"
+#Minitest::Reporters.use!
+Minitest::Reporters.use! Minitest::Reporters::SpecReporter.new
+def get_data_dir()
+  "#{File.dirname(__FILE__)}/data/"
+end
+## if/when this becomes a gem, remove the load-path setup below
+$LOAD_PATH.unshift("#{File.dirname(__FILE__)}/../lib/")
+$LOAD_PATH.unshift("#{File.dirname(__FILE__)}/../ext/")

metadata ADDED

@@ -0,0 +1,152 @@
+--- !ruby/object:Gem::Specification
+name: cluster_eval
+version: !ruby/object:Gem::Version
+  version: 0.1.1
+platform: ruby
+authors:
+- sbonisso
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-06-06 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: thor
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.19'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.19'
+- !ruby/object:Gem::Dependency
+  name: rice
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.7'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.7'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.8'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.8'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.4'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.4'
+- !ruby/object:Gem::Dependency
+  name: minitest-reporters
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+description: Evaluate partitionings of different clustering approaches. Provides different
+  metrics to use.
+email:
+- sbonisso@ucsd.edu
+executables:
+- cluster_eval
+extensions:
+- ext/extconf.rb
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".travis.yml"
+- Gemfile
+- README.md
+- Rakefile
+- bin/cluster_eval
+- bin/console
+- bin/setup
+- cluster_eval.gemspec
+- ext/ConfusionMatrix.cpp
+- ext/ConfusionMatrix.hpp
+- ext/clusteval.cpp
+- ext/extconf.rb
+- ext/prettyprint.hpp
+- lib/cluster_eval.rb
+- lib/cluster_eval/version.rb
+- tests/test_confusion_matrix_small.rb
+- tests/test_confusion_matrix_small_random.rb
+- tests/test_helper.rb
+homepage: https://github.com/sbonisso/cluster_eval
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+- ext
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.6
+signing_key:
+specification_version: 4
+summary: Evaluation of clusterings
+test_files: []
+has_rdoc: