cabalist 0.0.1 → 0.0.2

Files changed (38)
  1. data/.gitignore +4 -0
  2. data/.travis.yml +4 -0
  3. data/Gemfile +4 -0
  4. data/LICENSE +0 -0
  5. data/README.md +121 -0
  6. data/Rakefile +17 -0
  7. data/cabalist.gemspec +42 -0
  8. data/lib/cabalist.rb +13 -2
  9. data/lib/cabalist/configuration.rb +20 -0
  10. data/lib/cabalist/frontend.rb +111 -0
  11. data/lib/cabalist/model_additions.rb +127 -0
  12. data/lib/cabalist/railtie.rb +14 -0
  13. data/lib/cabalist/version.rb +3 -0
  14. data/lib/cabalist/views/classifier.haml +82 -0
  15. data/lib/cabalist/views/index.haml +21 -0
  16. data/lib/cabalist/views/layout.haml +39 -0
  17. data/lib/generators/cabalist/classifier/USAGE +0 -0
  18. data/lib/generators/cabalist/classifier/classifier_generator.rb +37 -0
  19. data/lib/generators/cabalist/classifier/templates/migrations/add_cabalist.rb.erb +11 -0
  20. data/lib/generators/cabalist/install/USAGE +0 -0
  21. data/lib/generators/cabalist/install/install_generator.rb +26 -0
  22. data/lib/generators/cabalist/install/templates/initializers/cabalist.rb +4 -0
  23. data/lib/generators/cabalist/install/templates/public/images/eye_12x9.png +0 -0
  24. data/lib/generators/cabalist/install/templates/public/images/logo.png +0 -0
  25. data/lib/generators/cabalist/install/templates/public/images/ncirl.png +0 -0
  26. data/lib/generators/cabalist/install/templates/public/images/pen_12x12.png +0 -0
  27. data/lib/generators/cabalist/install/templates/public/images/symbol.png +0 -0
  28. data/lib/generators/cabalist/install/templates/public/images/target_12x12.png +0 -0
  29. data/lib/generators/cabalist/install/templates/public/images/x_14x14.png +0 -0
  30. data/lib/generators/cabalist/install/templates/public/javascripts/cabalist.js +23 -0
  31. data/lib/generators/cabalist/install/templates/public/stylesheets/cabalist.css +197 -0
  32. data/spec/cabalist/frontend_spec.rb +79 -0
  33. data/spec/cabalist/model_additions_spec.rb +104 -0
  34. data/spec/cabalist/performance_benchmark_spec.rb +76 -0
  35. data/spec/spec_helper.rb +30 -0
  36. metadata +164 -12
  37. data/lib/cabalist/manager.rb +0 -30
  38. data/lib/cabalist/object_hooks.rb +0 -59
data/.gitignore ADDED
@@ -0,0 +1,4 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
data/.travis.yml ADDED
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.2
4
+ - 1.9.3
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "http://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in cabalist.gemspec
4
+ gemspec
data/LICENSE ADDED
File without changes
data/README.md ADDED
@@ -0,0 +1,121 @@
1
+ Introduction to Cabalist [![Build Status](https://secure.travis-ci.org/marcinwyszynski/cabalist.png?branch=master)](http://travis-ci.org/marcinwyszynski/cabalist)
2
+ ------------------------
3
+
4
+ Cabalist is conceived as a simple way of adding some smarts (machine learning capabilities) to your Ruby on Rails models without having to dig deep into mind-boggling AI algorithms. Using it is meant to be as straightforward as adding a few lines to your existing code and running a Rails generator or two.
5
+
6
+ Installation
7
+ ------------
8
+
9
+ First and foremost add Cabalist to your Gemfile as dependency:
10
+
11
+ ```ruby
12
+ gem 'cabalist'
13
+ ```
14
+
15
+ ... and run the Bundler command (bundle) to install it and its own dependencies. Once that is sorted out you will want to run an installer - a generator that comes with Cabalist:
16
+
17
+ ```bash
18
+ $ rails g cabalist:install
19
+ ```
20
+
21
+ Running this command will create a new initializer (cabalist.rb) in your config/initializers directory, copy gem assets to the public folder and add a route to the GUI. GUI is just one way of interacting with Cabalist so if you do not need it, you may simply remove the route and use Cabalist capabilities through the API it provides.
22
+
23
+ Basic setup
24
+ -----------
25
+
26
+ To add Cabalist capabilities to your model, you need to specify which attributes should be used as features (predictors) by the machine learning algorithm and what the class variable for the model should be - that is, which attribute Cabalist tries to infer. You would do it like this:
27
+
28
+ ```ruby
29
+ class Cat < ActiveRecord::Base
30
+ # attributes: name, color, gender, good
31
+ acts_as_cabalist :features => [:color, :gender],
32
+ :class_variable => :good
33
+ end
34
+ ```
35
+
36
+ Before you can use the power of AI, you will have to run another generator that creates a migration adding a timestamp field to this model - autoclassified_at. This is used to distinguish records that you have classified yourself from those classified by the algorithm. Later you can use this distinction to validate your model - see if it performs to your satisfaction. You run the generator like this:
37
+
38
+ ```bash
39
+ $ rails g cabalist:classifier <Class>
40
+ ```
41
+
42
+ ...where <Class> is the name of the class you want to enable Cabalist for. The generator will also ask whether this classifier should be accessible through the GUI. If you don't use the GUI, this has no effect. If you do, you can still choose which classes it exposes to the user by editing the frontend_classes setting in config/initializers/cabalist.rb.
43
+
44
+ Using Cabalist
45
+ --------------
46
+
47
+ Depending on the amount of data to crunch, creating a prediction model may well take a while. The good thing is that this happens only once for each model as once computed, the model is stored in LevelDB (local key-value store).
48
+
49
+ Now that Cabalist has set up its shop, all Cabalist-enabled models gain access to two methods - classify and classify!. The first method will infer the value of the attribute designated as a class variable:
50
+
51
+ ```ruby
52
+ cat = Cat::new(:name => 'Filemon', :color => 'white', :gender => 'M')
53
+ cat.classify
54
+ => 'y' # ... which means Cabalist thinks this cat is good
55
+ ```
56
+
57
+ The latter method will set the class variable field to the predicted value and return object instance (self).
58
+
59
+ ```ruby
60
+ cat = Cat::new(:name => 'Filemon', :color => 'white', :gender => 'M')
61
+ cat.classify!
62
+ => <Cat:0x00000101433f58 @name="Filemon", @color="white", @gender="M", @good="y">
63
+ ```
64
+
65
+ Defaults explained
66
+ ------------------
67
+
68
+ By default, the collection Cabalist loads in order to build a prediction data set is the result of the 'manually_classified' scope of the Cabalist-enabled class. This scope is provided for you by Cabalist and looks at your class variable (whether it is set) and at the autoclassified_at attribute (whether it is nil). Still, you may have a very different idea of what data should be used to train your model, and you can create an appropriate class method to gather it. You pass that method name as a symbol to the :collection option of the acts_as_cabalist method. Like so:
69
+
70
+ ```ruby
71
+ class Cat < ActiveRecord::Base
72
+ # attributes: name, color, gender, good
73
+ acts_as_cabalist :features => [:color, :gender],
74
+ :class_variable => :good,
75
+ :collection => :cats_i_care_about
76
+ end
77
+ ```
78
+
79
+ The other thing you can change is the algorithm used for generating predictions. By default Cabalist uses [ID3](http://en.wikipedia.org/wiki/ID3_algorithm), a decision tree learning algorithm - a rather arbitrary choice, mind you. You can easily change it to any of the following algorithms:
80
+ - :hyperpipes for [Hyperpipes](http://code.google.com/p/ourmine/wiki/HyperPipes)
81
+ - :ib1 for [Simple Instance Based Learning](http://en.wikipedia.org/wiki/Instance-based_learning)
82
+ - :id3 for [Iterative Dichotomiser 3](http://en.wikipedia.org/wiki/ID3_algorithm)
83
+ - :one_r for [One Attribute Rule](http://www.soc.napier.ac.uk/~peter/vldb/dm/node8.html)
84
+ - :prism for [PRISM](http://www.sciencedirect.com/science/article/pii/S0020737387800032)
85
+ - :zero_r for [ZeroR](http://chem-eng.utoronto.ca/~datamining/dmc/zeror.htm)
86
+
87
+ All algorithms come from an excellent [ai4r](https://github.com/SergioFierens/ai4r) gem. You can choose a specific algorithm to use by your Cabalist model by passing one of the options mentioned above to the :algorithm option like so:
88
+
89
+ ```ruby
90
+ class Cat < ActiveRecord::Base
91
+ # attributes: name, color, gender, good
92
+ acts_as_cabalist :features => [:color, :gender],
93
+ :class_variable => :good,
94
+ :algorithm => :prism
95
+ end
96
+ ```
97
+
98
+ You can use different algorithms in different models and I would encourage you to give each one a go - perhaps except for ZeroR which is only really good for benchmarking (all it does is return the most popular result of a class variable).
99
+
100
+ Helping your Cabalist
101
+ ---------------------
102
+
103
+ So far we've used raw data, derived directly from attributes in your model. You may want to pre-process your data before you pass it to Cabalist. Please remember that you know your domain best and the more smarts you put into AI, the more smarts it will throw back at you. So let's imagine that instead of passing the color attribute directly, we may want to have a method which will tell us whether the color is light or dark - presumably this has something to do with a cat's character:
104
+
105
+ ```ruby
106
+ class Cat < ActiveRecord::Base
107
+ # attributes: name, color, gender, good
108
+ acts_as_cabalist( :features => [:light_or_dark, :gender],
109
+ :class_variable => :good )
110
+
111
+ def light_or_dark
112
+ if %w(white yellow orange grey).include?(color)
113
+ 'light'
114
+ else
115
+ 'dark'
116
+ end
117
+ end
118
+ end
119
+ ```
120
+
121
+ Ok, so much for now. Happy categorizing :)
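Reviewer note: the README says the manual/automatic distinction can be used to validate a model, but does not show how. One possible reading is to compare predictions against the manually assigned classes and report the share that match. The sketch below uses plain Ruby only; `LABELLED`, `PREDICT` and `accuracy` are hypothetical names made up for illustration and are not part of the gem. In an application the predictor would presumably be something like a call to `classify` on each record.

```ruby
# Hypothetical hand-labelled records: [features, manually assigned class].
LABELLED = [
  [%w[white M], 'y'],
  [%w[black F], 'n'],
  [%w[grey F],  'y'],
  [%w[black M], 'y']
].freeze

# Stand-in predictor (an assumption, not the gem's algorithm):
# black cats are predicted 'n', everything else 'y'.
PREDICT = ->(features) { features.first == 'black' ? 'n' : 'y' }

# Share of records whose predicted class equals the manual class.
def accuracy(records, predictor)
  return 0.0 if records.empty?

  hits = records.count { |features, expected| predictor.call(features) == expected }
  hits.to_f / records.size
end

puts accuracy(LABELLED, PREDICT)
```

With this toy data the stand-in predictor gets three of the four records right.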
data/Rakefile ADDED
@@ -0,0 +1,17 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rspec/core/rake_task'
3
+
4
+ desc 'Default: run specs.'
5
+ task :default => :spec
6
+
7
+ desc "Run specs"
8
+ RSpec::Core::RakeTask.new do |t|
9
+ t.pattern = "./spec/**/*_spec.rb"
10
+ end
11
+
12
+ desc "Generate code coverage"
13
+ RSpec::Core::RakeTask.new(:coverage) do |t|
14
+ t.pattern = "./spec/**/*_spec.rb"
15
+ t.rcov = true
16
+ t.rcov_opts = ['--exclude', 'spec']
17
+ end
data/cabalist.gemspec ADDED
@@ -0,0 +1,42 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "cabalist/version"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "cabalist"
7
+ s.version = Cabalist::VERSION
8
+ s.authors = ["Marcin Wyszynski"]
9
+ s.email = ["marcin.pixie@gmail.com"]
10
+ s.homepage = "http://github.com/marcinwyszynski/cabalist"
11
+ s.summary = %q{Minimum setup machine learning (classification) library for Ruby on Rails applications.}
12
+ s.description = <<-EOF
13
+ Cabalist is conceived as a simple way of adding some smarts
14
+ (machine learning capabilities) to your Ruby on Rails models
15
+ without having to dig deep into mind-boggling AI algorithms.
16
+ Using it is meant to be as straightforward as adding a few
17
+ lines to your existing code and running a Rails generator or two.
18
+ EOF
19
+
20
+ s.rubyforge_project = "cabalist"
21
+
22
+ s.files = `git ls-files`.split("\n")
23
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
24
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
25
+ s.require_paths = ["lib"]
26
+
27
+ # Gem dependencies
28
+ s.add_dependency('ai4r')
29
+ s.add_dependency('haml', '>= 3.0')
30
+ s.add_dependency('kaminari', '>= 0.13.0')
31
+ s.add_dependency('leveldb-ruby')
32
+ s.add_dependency('padrino-helpers')
33
+ s.add_dependency('rake')
34
+ s.add_dependency('sinatra')
35
+
36
+ # Gem development dependencies
37
+ s.add_development_dependency('activerecord')
38
+ s.add_development_dependency('rspec')
39
+ s.add_development_dependency('sqlite3')
40
+ s.add_development_dependency('with_model')
41
+
42
+ end
data/lib/cabalist.rb CHANGED
@@ -1,2 +1,13 @@
1
- require File.dirname(__FILE__) + '/cabalist/manager'
2
- require File.dirname(__FILE__) + '/cabalist/object_hooks'
1
+ require "cabalist/version"
2
+ require "cabalist/configuration"
3
+ require "cabalist/frontend"
4
+ require "cabalist/model_additions"
5
+ require "cabalist/railtie" if defined? Rails
6
+
7
+ module Cabalist
8
+
9
+ def self.config
10
+ yield Cabalist::Configuration.instance
11
+ end
12
+
13
+ end
data/lib/cabalist/configuration.rb ADDED
@@ -0,0 +1,20 @@
1
+ require 'leveldb'
2
+ require 'singleton'
3
+
4
+ module Cabalist
5
+ class Configuration
6
+
7
+ include Singleton
8
+
9
+ attr_accessor :db_path, :frontend_classes
10
+
11
+ def initialize
12
+ self.frontend_classes = []
13
+ end
14
+
15
+ def database
16
+ LevelDB::DB::new(db_path)
17
+ end
18
+
19
+ end
20
+ end
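Reviewer note: `Cabalist.config` in lib/cabalist.rb yields the singleton `Configuration` to a block, which is presumably how the generated initializer sets `db_path` and `frontend_classes`. The initializer template itself is not shown in this diff, so the following is only a sketch of that pattern. `DemoCabalist` is a stand-in module (so the example runs without leveldb installed), and the path and class name are invented.

```ruby
require 'singleton'

# Stand-in for the gem's module; not the real Cabalist namespace.
module DemoCabalist
  class Configuration
    include Singleton

    attr_accessor :db_path, :frontend_classes

    def initialize
      self.frontend_classes = []
    end
  end

  # Same shape as Cabalist.config in the diff: yield the singleton to a block.
  def self.config
    yield Configuration.instance
  end
end

# Hypothetical initializer body; both values are made up for illustration.
DemoCabalist.config do |c|
  c.db_path = '/tmp/cabalist_demo_db'
  c.frontend_classes = ['Cat']
end

puts DemoCabalist::Configuration.instance.db_path
```

Because the configuration is a singleton, values set in the block are visible to every later call to `Configuration.instance`.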
data/lib/cabalist/frontend.rb ADDED
@@ -0,0 +1,111 @@
1
+ require 'haml'
2
+ require 'kaminari/sinatra'
3
+ require 'padrino-helpers'
4
+ require 'sinatra/base'
5
+
6
+ module Cabalist
7
+ class Frontend < Sinatra::Base
8
+
9
+ PER_PAGE = 25
10
+
11
+ before do
12
+ @classes = Cabalist::Configuration.instance.frontend_classes
13
+ @app_name = begin
14
+ ::Rails.root.to_s.split('/').last.humanize.titlecase
15
+ rescue
16
+ 'Test Rails Application'
17
+ end
18
+ end
19
+
20
+ # Index page, shows a dashboard and links to classifiers.
21
+ get '/' do
22
+ haml :index
23
+ end
24
+
25
+ # List all (classified and non-classified) records for <class_name>.
26
+ get "/:class_name/all/?:page?" do
27
+ page = params[:page].to_i < 1 ? 1 : params[:page].to_i
28
+ klass = params[:class_name].titleize.constantize
29
+ @collection = klass::page(page).per(PER_PAGE)
30
+ haml :classifier,
31
+ :locals => { :klass => klass,
32
+ :page => page,
33
+ :scope => 'all',
34
+ :total => klass.count }
35
+ end
36
+
37
+ # List non-classified records for <class_name>.
38
+ get "/:class_name/none/?:page?" do
39
+ klass = params[:class_name].titleize.constantize
40
+ page = params[:page].to_i < 1 ? 1 : params[:page].to_i
41
+ @collection = klass::not_classified \
42
+ .page(page).per(PER_PAGE)
43
+ haml :classifier,
44
+ :locals => { :klass => klass,
45
+ :page => page,
46
+ :scope => 'none',
47
+ :total => klass::not_classified.count }
48
+ end
49
+
50
+ # List manually classified records for <class_name>
51
+ get "/:class_name/manual/?:page?" do
52
+ klass = params[:class_name].titleize.constantize
53
+ page = params[:page].to_i < 1 ? 1 : params[:page].to_i
54
+ @collection = klass::manually_classified \
55
+ .page(page).per(PER_PAGE)
56
+ haml :classifier,
57
+ :locals => { :klass => klass,
58
+ :page => page,
59
+ :scope => 'manual',
60
+ :total => klass::manually_classified.count }
61
+ end
62
+
63
+ # List automatically classified records for <class_name>
64
+ get "/:class_name/auto/?:page?" do
65
+ klass = params[:class_name].titleize.constantize
66
+ page = params[:page].to_i < 1 ? 1 : params[:page].to_i
67
+ @collection = klass::auto_classified. \
68
+ page(page).per(PER_PAGE)
69
+ haml :classifier,
70
+ :locals => { :klass => klass,
71
+ :page => page,
72
+ :scope => 'auto',
73
+ :total => klass::auto_classified.count }
74
+ end
75
+
76
+ # Set the value of class variable for an object
77
+ # of <class_name> with ID <id>
78
+ post "/:class_name/teach/:id" do
79
+ klass = params[:class_name].titleize.constantize
80
+ obj = klass::find(params[:id])
81
+ if params[:classification_freeform].empty?
82
+ new_class = params[:classification]
83
+ else
84
+ new_class = params[:classification_freeform]
85
+ end
86
+ obj.teach(new_class)
87
+ if obj.save
88
+ redirect back
89
+ else
90
+ 'failure'
91
+ end
92
+ end
93
+
94
+ # Automatically classify the object of class
95
+ # <class_name> with ID <id>
96
+ post "/:class_name/autoclassify/:id" do
97
+ klass = params[:class_name].titleize.constantize
98
+ obj = klass::find(params[:id])
99
+ obj.save if obj.classify!
100
+ redirect back
101
+ end
102
+
103
+ # Rebuild the model used for classification
104
+ post "/:class_name/retrain" do
105
+ klass = params[:class_name].titleize.constantize
106
+ klass::train_model
107
+ redirect back
108
+ end
109
+
110
+ end
111
+ end
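Reviewer note: each listing route in frontend.rb repeats the same ternary to turn the optional `:page` parameter into a page number of at least 1. A minimal sketch of that rule as a single helper follows. `normalize_page` is a hypothetical name, not something defined in the gem.

```ruby
# Convert a raw route parameter (String or nil) into a page number >= 1.
# nil.to_i and non-numeric strings both give 0, which is clamped to 1.
def normalize_page(raw)
  [raw.to_i, 1].max
end

puts normalize_page(nil)   # missing parameter
puts normalize_page('3')   # ordinary page number
puts normalize_page('-2')  # negative input
```

Extracting the rule this way would keep the four listing routes consistent if the clamping logic ever changed.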
data/lib/cabalist/model_additions.rb ADDED
@@ -0,0 +1,127 @@
1
+ require 'ai4r'
2
+
3
+ module Cabalist
4
+ module ModelAdditions
5
+
6
+ def acts_as_cabalist(options = {})
7
+
8
+ # Make sure that all required options are set
9
+ raise 'No features specified' \
10
+ unless options.has_key?(:features)
11
+ raise 'Expecting an Array of features' \
12
+ unless options[:features].instance_of?(Array)
13
+ raise 'No class variable specified' \
14
+ unless options.has_key?(:class_variable)
15
+
16
+ # Set some sensible defaults for other options, if required
17
+ collection = options[:collection] || :manually_classified
18
+ algorithm = options[:algorithm] || :id3
19
+
20
+ # Select an algorithm for the classifier
21
+ classifier = case algorithm
22
+ when :hyperpipes then ::Ai4r::Classifiers::Hyperpipes
23
+ when :ib1 then ::Ai4r::Classifiers::IB1
24
+ when :id3 then ::Ai4r::Classifiers::ID3
25
+ when :one_r then ::Ai4r::Classifiers::OneR
26
+ when :prism then ::Ai4r::Classifiers::Prism
27
+ when :zero_r then ::Ai4r::Classifiers::ZeroR
28
+ else raise 'Unknown algorithm provided'
29
+ end
30
+
31
+ # Create scopes
32
+ scope :manually_classified,
33
+ where("autoclassified_at IS NULL AND %s IS NOT NULL" %
34
+ options[:class_variable])
35
+ scope :auto_classified,
36
+ where("autoclassified_at IS NOT NULL AND %s IS NOT NULL" %
37
+ options[:class_variable])
38
+ scope :not_classified,
39
+ where("autoclassified_at IS NULL AND %s IS NULL" %
40
+ options[:class_variable])
41
+
42
+ # Return object as an Array of features
43
+ send(:define_method, :get_features, lambda {
44
+ options[:features].map { |f| self.send(f) }
45
+ })
46
+
47
+ # Return the value of a class variable
48
+ send(:define_method, :get_class_variable, lambda {
49
+ self.send(options[:class_variable])
50
+ })
51
+
52
+ # Set the value of the class variable
53
+ send(:define_method, :set_class_variable, lambda { |c|
54
+ self.send("#{options[:class_variable]}=".to_sym, c) or self
55
+ })
56
+
57
+ # Return an Array of feature names (attributes/methods)
58
+ send(:define_singleton_method, :get_feature_names, lambda {
59
+ options[:features]
60
+ })
61
+
62
+ # Return the name of a class variable
63
+ send(:define_singleton_method, :get_class_variable_name, lambda {
64
+ options[:class_variable]
65
+ })
66
+
67
+ # Build a prediction model from scratch
68
+ send(:define_singleton_method, :build_model, lambda {
69
+ classifier::new.build(
70
+ Ai4r::Data::DataSet::new({
71
+ :data_items => send(collection).map do |el|
72
+ el.get_features.push(el.get_class_variable)
73
+ end,
74
+ :data_labels => get_feature_names + [get_class_variable_name]
75
+ })
76
+ )
77
+ })
78
+
79
+ # Build a prediction model and store it in the LevelDB
80
+ send(:define_singleton_method, :train_model, lambda {
81
+ _model = build_model
82
+ Cabalist::Configuration.instance.database.put(name,
83
+ Marshal::dump(_model))
84
+ return _model
85
+ })
86
+
87
+ # Return prediction model for the class
88
+ send(:define_singleton_method, :classifier, lambda {
89
+ _stored = Cabalist::Configuration.instance.database.get(self.name)
90
+ return _stored ? Marshal.load(_stored) : train_model
91
+ })
92
+
93
+ # Show possible values for the classification.
94
+ define_singleton_method(
95
+ :class_variable_domain,
96
+ lambda { self.classifier.data_set.build_domain(-1).to_a }
97
+ )
98
+
99
+ # Create a 'classify' method which will provide a classification
100
+ # for any new object.
101
+ send(:define_method, :classify, lambda {
102
+ begin
103
+ self.class::classifier.eval(get_features)
104
+ rescue
105
+ nil
106
+ end
107
+ })
108
+
109
+ # Create a 'classify!' method which will get a classification
110
+ # for any new object and apply it to the current instance.
111
+ send(:define_method, :classify!, lambda {
112
+ set_class_variable(classify)
113
+ self.autoclassified_at = DateTime::now
114
+ })
115
+
116
+ # Create a 'teach' method which will manually set the classification
117
+ # and set the autoclassification timestamp to nil so that the new entry
118
+ # can be treated as basis for learning.
119
+ send(:define_method, :teach, lambda { |new_class|
120
+ set_class_variable(new_class)
121
+ self.autoclassified_at = nil
122
+ })
123
+
124
+ end
125
+
126
+ end
127
+ end
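Reviewer note: `train_model` and `classifier` in model_additions.rb together form a load-or-train cache. The model is serialized with `Marshal` under the class name, and later lookups deserialize it instead of rebuilding. The sketch below illustrates that flow under loud assumptions: a plain Hash (`STORE`) stands in for LevelDB, and `MajorityModel` is a made-up classifier that just returns the most frequent class (similar in spirit to ZeroR), not an ai4r class. Rows carry the class value in the last position, mirroring how `build_model` assembles `data_items`.

```ruby
# Hypothetical stand-in for the LevelDB store used by the gem.
STORE = {}

# Made-up classifier: remembers the most frequent class value.
class MajorityModel
  attr_reader :label

  # rows: arrays of feature values with the class value last.
  def build(rows)
    counts = Hash.new(0)
    rows.each { |row| counts[row.last] += 1 }
    @label = counts.max_by { |_, n| n }.first
    self
  end

  # Ignores the features and returns the stored majority class.
  def eval(_features)
    @label
  end
end

# Invented training rows: color, gender, good.
ROWS = [
  %w[white M y],
  %w[black F n],
  %w[grey F y]
].freeze

# Build a model and store its serialized form under the given key.
def train_model(key)
  model = MajorityModel.new.build(ROWS)
  STORE[key] = Marshal.dump(model)
  model
end

# Load the stored model if present, otherwise train and store one.
def classifier(key)
  stored = STORE[key]
  stored ? Marshal.load(stored) : train_model(key)
end

puts classifier('Cat').eval(%w[white M]) # first call trains and stores
puts classifier('Cat').eval(%w[black F]) # second call loads the stored copy
```

One consequence of this design is that the stored model does not pick up newly taught records until it is retrained, which is presumably why the frontend exposes a retrain route.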