suggestor 0.0.3

@@ -0,0 +1 @@
1
+ .DS_Store
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "http://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in suggestor.gemspec
4
+ gemspec
@@ -0,0 +1,64 @@
1
+ # Suggestor
2
+ ## Recommendations gem
3
+
4
+ Suggestor is a gem that helps you relate data. For example, given User, Movie and Review classes,
5
+ where each Review belongs to a User and a Movie and has a rating attribute, the gem uses that
6
+ information to correlate the records and return results such as related Movies, similar Users (based on their
7
+ tastes) and the like.
8
+
9
+ ## Usage
10
+
11
+ The gem needs a data structure like this:
12
+
13
+ data = '{"1": {"10": 10, "12": 1}, "2": {"11": 5, "12": 4}}'
14
+
15
+ Each top-level key ("1" or "2") corresponds, in this example, to a user id. Each one gives access to that user's related items (movies).
16
+
17
+ In the example, user "1" has seen the movies with ids "10" and "12" and rated them 10 and 1, respectively. User "2" works the same way.
18
+
19
+ After loading the gem with the data:
20
+
21
+ engine = Suggestor::Engine.new
22
+ engine.load_data(data)
23
+
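Since `load_data` passes its argument to `JSON.parse`, ratings held in a Ruby hash need to be serialized first. A minimal sketch using only the standard `json` library; the `ratings` hash is a made-up example, not data shipped with the gem:

```ruby
require 'json'

# Hypothetical in-memory ratings: user id => { movie id => rating }
ratings = {
  "1" => { "10" => 10, "12" => 1 },
  "2" => { "11" => 5,  "12" => 4 }
}

# Serialize to the JSON string form that load_data parses
data = JSON.generate(ratings)

# Round-trip check: parsing gives back string keys at both levels
parsed = JSON.parse(data)
puts parsed["1"]["12"]
```

The resulting `data` string is what would be handed to `engine.load_data`.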
24
+ We can start to get some results.
25
+
26
+
27
+ ### Similar items
28
+
29
+ For example, we can get similar users:
30
+
31
+ engine.similar_items_to("1")
32
+
33
+ Which will return a structure like
34
+
35
+ {id: similarity_score, id2: similarity_score }
36
+
37
+ Thus, you can load the data and save their similarity scores for later use.
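Because the result is a plain hash of id to score, picking the closest matches is just a sort. A small sketch, with made-up scores standing in for the engine's output:

```ruby
# Hypothetical scores, shaped like the hash similar_items_to returns
scores = { "2" => 0.027, "3" => 0.5, "4" => 0.0 }

# Sort by score, highest first, and keep the two closest ids
closest = scores.sort_by { |_id, score| -score }.first(2)

closest.each { |id, score| puts "#{id}: #{score}" }
```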
38
+
39
+ Now, that's fine and all, but what about Mr. Bob, who always rates everything
40
+ higher? Maybe ID4 is not that good after all. If that happens, Suggestor allows you to change the algorithm used:
41
+
42
+ engine.similar_items_to("1", :algorithm => :pearson_correlation)
43
+
44
+ There are two implemented methods, Euclidean Distance and Pearson Correlation.
45
+
46
+ Use Euclidean Distance (default) to compare items and get suggestions based on
47
+ values that are normalized or not subjective (like user points earned by actions on a web site).
48
+
49
+ Use Pearson Correlation if there's some bias in the data. The algorithm
50
+ takes into account whether a user tends to grade higher or lower, and returns more accurate suggestions than Euclidean Distance in that case.
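To make the difference concrete, here is a standalone sketch of both scores on two plain rating hashes. It follows the same formulas as the gem's algorithm classes but does not use them; the method names `euclidean_score` and `pearson_score` and the sample ratings are assumptions made for this illustration.

```ruby
# Euclidean: 1 / (1 + sum of squared differences over shared items)
def euclidean_score(a, b)
  shared = a.keys & b.keys
  return 0.0 if shared.empty?
  sum_sq = shared.inject(0.0) { |sum, k| sum + (a[k] - b[k])**2 }
  1 / (1 + sum_sq)
end

# Pearson: covariance of the shared ratings over the product of their spreads
def pearson_score(a, b)
  shared = a.keys & b.keys
  n = shared.size
  return 0.0 if n.zero?

  sum_a    = shared.inject(0.0) { |s, k| s + a[k] }
  sum_b    = shared.inject(0.0) { |s, k| s + b[k] }
  sum_a_sq = shared.inject(0.0) { |s, k| s + a[k]**2 }
  sum_b_sq = shared.inject(0.0) { |s, k| s + b[k]**2 }
  products = shared.inject(0.0) { |s, k| s + a[k] * b[k] }

  numerator   = products - (sum_a * sum_b / n)
  denominator = Math.sqrt((sum_a_sq - sum_a**2 / n) * (sum_b_sq - sum_b**2 / n))
  return 0.0 if denominator.zero?

  numerator / denominator
end

# Hypothetical users: bob ranks movies in the same order as alice,
# but always two points higher
alice = { "m1" => 1, "m2" => 2, "m3" => 3 }
bob   = { "m1" => 3, "m2" => 4, "m3" => 5 }

puts euclidean_score(alice, bob) # low: the constant offset counts as distance
puts pearson_score(alice, bob)   # 1.0: the offset is ignored, only the trend matters
```

With these ratings the Euclidean score is 1/13, while Pearson reports a perfect match, which is the behaviour you want when a user simply grades more generously.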
51
+
52
+ ### Suggested items
53
+
54
+ Most interestingly, the gem allows you to get suggestions based on the data.
55
+ For example, which movies should user "2" watch, based on their reviews and the tastes of similar users?
56
+
57
+ engine.recommented_related_items_for("2", :algorithm => :pearson_correlation)
58
+
59
+ As before, the structure returned will be
60
+
61
+ {id: similarity_score, id2: similarity_score }
62
+
63
+ But in this case, the keys are movie ids and the values are the scores computed for them. You
64
+ can easily save this data to a DB, since movie ratings tend to stabilize over time and won't change that often.
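The idea behind the suggestions can be sketched as a similarity-weighted average: every other user's rating for an unseen movie counts in proportion to how similar that user is. The code below is a standalone illustration, not the gem's implementation. It keeps the weighted totals and the similarity sums in two separate hashes, whereas the 0.0.3 source assigns both to a single hash, so the numbers it prints will not match the gem's output. The method names and sample data are hypothetical.

```ruby
# Euclidean similarity: 1 / (1 + sum of squared differences over shared items)
def euclidean_score(a, b)
  shared = a.keys & b.keys
  return 0.0 if shared.empty?
  1 / (1 + shared.inject(0.0) { |sum, k| sum + (a[k] - b[k])**2 })
end

# Similarity-weighted average rating for every item `main` has not rated yet
def recommend(collection, main)
  totals  = Hash.new(0.0) # item => sum of (rating * similarity)
  weights = Hash.new(0.0) # item => sum of similarities

  collection.each do |other, ratings|
    next if other == main
    score = euclidean_score(collection[main], ratings)
    next if score <= 0 # nothing in common

    ratings.each do |item, rating|
      next if collection[main].key?(item) # already rated by main
      totals[item]  += rating * score
      weights[item] += score
    end
  end

  totals.each_with_object({}) do |(item, total), rankings|
    rankings[item] = total / weights[item]
  end
end

# Hypothetical ratings: ann has not seen "m3"
collection = {
  "ann" => { "m1" => 5, "m2" => 3 },
  "ben" => { "m1" => 5, "m2" => 3, "m3" => 4 },
  "cat" => { "m1" => 4, "m2" => 3, "m3" => 2 }
}

suggestions = recommend(collection, "ann")
# m3 => (4 * 1.0 + 2 * 0.5) / (1.0 + 0.5)
puts suggestions.inspect
```

Here ben's rating weighs twice as much as cat's because ben's tastes match ann's exactly, so the estimate for "m3" lands closer to 4 than to 2.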
@@ -0,0 +1,2 @@
1
+ require 'bundler'
2
+ Bundler::GemHelper.install_tasks
@@ -0,0 +1,16 @@
1
+ require_relative '../lib/suggestor'
2
+
3
+ engine = Suggestor::Engine.new
4
+
5
+ # I'm using test data of users and their movie ratings
6
+ # Each user (identified by id) has a hash of movie ids and
7
+ # the rating they gave each one
8
+ json = File.read("test/test.json")
9
+
10
+ engine.load_data(json)
11
+
12
+ # Let's get some similar users
13
+ puts engine.similar_items_to("2").inspect
14
+
15
+ # So, now that we know them, why not get some recommendations?
16
+ puts engine.recommented_related_items_for("2", algorithm: :euclidean_distance)
@@ -0,0 +1,3 @@
1
+ require_relative 'suggestor/engine'
2
+
3
+
@@ -0,0 +1,44 @@
1
+ require_relative 'recommendation_algorithm'
2
+
3
+ module Suggestor
4
+ module Algorithms
5
+
6
+ # The Euclidean distance compares two structures of
7
+ # data by mapping them on a chart, with one related element
8
+ # on each axis.
9
+
10
+ # For example, if we are dealing with users and movies, related
11
+ # by user reviews of each movie, each pair of shared movie
12
+ # ratings is used as the axes (LOTR on one, and The Matrix on the other).
13
+
14
+ # The user ratings are used to position users on the chart. Thus,
15
+ # if a user rates LOTR with 1 and The Matrix with 5, they are positioned
16
+ # at [1,5].
17
+
18
+ # The closer two users are, the more similar their tastes are.
19
+ # More info at:
20
+ # http://en.wikipedia.org/wiki/Euclidean_metric
21
+ # http://en.wikipedia.org/wiki/Distance_correlation
22
+
23
+ class EuclideanDistance
24
+
25
+ include RecommendationAlgorithm
26
+
27
+ def similarity_score_between(first, second)
28
+ return 0.0 if no_shared_items_between?(first, second)
29
+ inverse_of_sum_of_squares_between(first, second)
30
+ end
31
+
32
+ def inverse_of_sum_of_squares_between(first, second)
33
+ 1/(1+sum_squares_of_shared_items_between(first, second))
34
+ end
35
+
36
+ def sum_squares_of_shared_items_between(first, second)
37
+ shared_items_between(first, second).inject(0.0) do |sum, item|
38
+ sum + (values_for(first)[item] - values_for(second)[item])**2
39
+ end
40
+ end
41
+
42
+ end
43
+ end
44
+ end
@@ -0,0 +1,105 @@
1
+ require_relative 'recommendation_algorithm'
2
+
3
+ module Suggestor
4
+ module Algorithms
5
+
6
+ # The Pearson Correlation calculates a coefficient
7
+ # between two related items from the main element.
8
+
9
+ # For example, if we are dealing with users and movies, related
10
+ # by user reviews of each movie, each pair of users
11
+ # is used as the axes ("Alvaro" on one, and "Andres" on the other).
12
+
13
+ # The users' movie ratings are used to position movies on the chart.
14
+ # Thus, if "Alvaro" rates LOTR with 1 and "Andres" with 3,
15
+ # the movie is positioned at [1,3].
16
+
17
+ # A "best-fit line" is traced through all items, lying
18
+ # as close as possible to all of them. If the two users have the same
19
+ # ratings, it shows as a perfect diagonal (score of 1).
20
+
21
+ # The closer the movies are to the line, the more similar the users' tastes are.
22
+
23
+ # The great thing about using Pearson Correlation is that it compensates
24
+ # for bias when evaluating the results. Thus, a user that always rates movies
25
+ # with high scores won't skew the results.
26
+
27
+ # It's probably a better fit for subjective reviews (movie reviews, profile points, etc).
28
+
29
+ # More info at: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
30
+
31
+ class PearsonCorrelation
32
+
33
+ include RecommendationAlgorithm
34
+
35
+ def similarity_score_between(first, second)
36
+ return 0.0 if no_shared_items_between?(first, second)
37
+
38
+ calculate_all_sums_for(first, second)
39
+ numerator = difference_from_total_and_normalize_values
40
+ # the denominator is 0.0 when either side has no variance in its shared values
41
+ denominator = square_root_from_differences_of_sums
42
+
43
+ return 0.0 if denominator == 0
44
+
45
+ numerator / denominator
46
+
47
+ end
48
+
49
+ private
50
+
51
+ def calculate_all_sums_for(first,second)
52
+
53
+ shared_items = shared_items_between(first, second)
54
+ @total_related_items = shared_items.size
55
+
56
+ #simplify access
57
+ first_values = values_for(first)
58
+ second_values = values_for(second)
59
+
60
+ @first_values_sum = @second_values_sum = @first_square_values_sum = \
61
+ @second_square_values_sum = @products_sum = 0.0
62
+
63
+ shared_items.each do |item|
64
+
65
+ # Gets the corresponding value for each item on both elements
66
+ # For ex., the rating of the same movie by different users
67
+ first_value = first_values[item]
68
+ second_value = second_values[item]
69
+
70
+ # Will add all the related items values for the first
71
+ # and second item
72
+ # For ex., all movie recommendations ratings
73
+ @first_values_sum += first_value
74
+ @second_values_sum += second_value
75
+
76
+ # Adds the squares of both elements
77
+ @first_square_values_sum += first_value ** 2
78
+ @second_square_values_sum += second_value ** 2
79
+
80
+ # Adds the product of both values
81
+ @products_sum += first_value*second_value
82
+ end
83
+
84
+ end
85
+
86
+ def difference_from_total_and_normalize_values
87
+ product = @first_values_sum * @second_values_sum
88
+ normalized = product / @total_related_items
89
+ @products_sum - normalized
90
+ end
91
+
92
+ def square_root_from_differences_of_sums
93
+
94
+ power_left_result = ( @first_values_sum ** 2 ) / @total_related_items
95
+ equation_left = @first_square_values_sum - power_left_result
96
+
97
+ power_right_result = ( @second_values_sum ** 2 ) / @total_related_items
98
+ equation_right = @second_square_values_sum - power_right_result
99
+ Math.sqrt(equation_left * equation_right)
100
+
101
+ end
102
+
103
+ end
104
+ end
105
+ end
@@ -0,0 +1,108 @@
1
+ module Suggestor
2
+ module Algorithms
3
+ module RecommendationAlgorithm
4
+
5
+ attr_accessor :collection
6
+
7
+ def initialize(collection)
8
+ @collection = collection
9
+ end
10
+
11
+ # returns similar items based on their similarity score
12
+ # for example, similar users based on their movies reviews
13
+ def similar_items_to(main)
14
+
15
+ # just compare those that aren't the main item
16
+ compare_to = collection.dup
17
+ compare_to.delete(main)
18
+
19
+ # return results based on their score
20
+ compare_to.keys.inject({}) do |result, other|
21
+ result.merge!({other => similarity_score_between(main,other)})
22
+ end
23
+
24
+ end
25
+
26
+ # returns recommended related items for the main user
27
+ # The most important feature. For example, a user will get
28
+ # movie recommendations based on their past movie reviews
29
+ # and how these compare with those of others
30
+ def recommented_related_items_for(main)
31
+
32
+ @similarities = @totals = Hash.new(0)
33
+ @main = main
34
+
35
+ create_similarities_totals
36
+ generate_rankings
37
+
38
+ end
39
+
40
+ def no_shared_items_between?(first,second)
41
+ shared_items_between(first,second).empty?
42
+ end
43
+
44
+ def shared_items_between(first,second)
45
+ return [] unless values_for(first) && values_for(second)
46
+ related_keys_for(first).select do |item|
47
+ related_keys_for(second).include? item
48
+ end
49
+ end
50
+
51
+ private
52
+
53
+ def main_already_has?(related)
54
+ collection[@main].has_key?(related)
55
+ end
56
+
57
+ def values_for(id)
58
+ collection[id.to_s]
59
+ end
60
+
61
+ def related_keys_for(id)
62
+ values_for(id).keys
63
+ end
64
+
65
+ def add_to_totals(other,item,score)
66
+ @totals[item] += collection[other][item]*score
67
+ @similarities[item] += score
68
+ end
69
+
70
+ def generate_rankings
71
+ @rankings = {}
72
+
73
+ @totals.each_pair do |item, total|
74
+ normalized_value = (total / @similarities[item])
75
+ @rankings.merge!( { item => normalized_value} )
76
+ end
77
+
78
+ @rankings
79
+
80
+ end
81
+
82
+ def create_similarities_totals
83
+
84
+ collection.keys.each do |other|
85
+
86
+ # won't bother comparing if the compared item is the same
87
+ # as the main, or if their score is 0 or below (nothing in common)
88
+ next if other == @main
89
+ score = similarity_score_between(@main,other)
90
+ next if score <= 0
91
+
92
+ # will collect the results, but only for related items
93
+ # that the main item doesn't already have
94
+ # For ex., if they have already seen a movie they won't
95
+ # get it suggested
96
+ collection[other].keys.each do |item|
97
+
98
+ unless main_already_has?(item)
99
+ add_to_totals(other,item,score)
100
+ end
101
+
102
+ end
103
+ end
104
+ end
105
+
106
+ end
107
+ end
108
+ end
@@ -0,0 +1,13 @@
1
+ require 'delegate'
2
+
3
+ module Suggestor
4
+
5
+ class Datum < DelegateClass(Hash)
6
+
7
+ def initialize(hash)
8
+ super(hash)
9
+ end
10
+
11
+ end
12
+
13
+ end
@@ -0,0 +1,63 @@
1
+ require 'json'
2
+ require_relative 'algorithms/euclidean_distance'
3
+ require_relative 'algorithms/pearson_correlation'
4
+
5
+ module Suggestor
6
+
7
+ class WrongInputFormat < StandardError; end
8
+
9
+ class Engine
10
+
11
+ attr_accessor :collection
12
+
13
+ def initialize
14
+ @collection = {}
15
+ end
16
+
17
+ def load_data(input)
18
+ add_to_collection(input)
19
+ end
20
+
21
+ def similarity_score_for(first, second, opts={})
22
+ opts[:algorithm] ||= :euclidean_distance
23
+ strategy_for(opts[:algorithm]).similarity_score_between(first, second)
24
+ end
25
+
26
+ def similar_items_to(item, opts={})
27
+ opts[:algorithm] ||= :euclidean_distance
28
+ strategy_for(opts[:algorithm]).similar_items_to(item)
29
+ end
30
+
31
+ def recommented_related_items_for(item, opts={})
32
+ opts[:algorithm] ||= :euclidean_distance
33
+ strategy_for(opts[:algorithm]).recommented_related_items_for(item)
34
+ end
35
+
36
+ private
37
+
38
+ def strategy_for(algorithm)
39
+ constantize(classify(algorithm)).new(collection)
40
+ end
41
+
42
+ # based on Rails' code
43
+ def classify(name)
44
+ name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
45
+ end
46
+
47
+ def constantize(name)
48
+ Suggestor::Algorithms.const_get(name)
49
+ end
50
+
51
+ def add_to_collection(input)
52
+ @collection.merge! parse_from_json(input)
53
+ end
54
+
55
+ def parse_from_json(json)
56
+ JSON.parse(json)
57
+ rescue JSON::ParserError, TypeError => ex
58
+ raise WrongInputFormat, "Wrong Data format: #{ex.message}"
59
+ end
60
+
61
+ end
62
+
63
+ end
@@ -0,0 +1,3 @@
1
+ module Suggestor
2
+ VERSION = "0.0.3"
3
+ end
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "suggestor/version"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "suggestor"
7
+ s.version = Suggestor::VERSION
8
+ s.platform = Gem::Platform::RUBY
9
+ s.authors = ["Alvaro Pereyra"]
10
+ s.email = ["alvaro@xendacentral.com"]
11
+ s.homepage = ""
12
+ s.summary = %q{Suggestor allows you to get suggestions of related items in your data}
13
+ s.description = %q{Suggestor allows you to get suggestions of related items in your data}
14
+
15
+ s.rubyforge_project = "suggestor"
16
+
17
+ s.files = `git ls-files`.split("\n")
18
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
19
+ s.executables = []
20
+ s.require_paths = ["lib"]
21
+ end
@@ -0,0 +1,30 @@
1
+ require 'minitest/autorun'
2
+ require_relative '../lib/suggestor/algorithms/euclidean_distance'
3
+ require_relative '../lib/suggestor/engine'
4
+
5
+ describe Suggestor::Algorithms::EuclideanDistance do
6
+
7
+ before do
8
+ @data_string = File.read("test/test.json")
9
+ @suggestor = Suggestor::Engine.new
10
+ @suggestor.load_data(@data_string)
11
+ @algorithm = Suggestor::Algorithms::EuclideanDistance.new(@suggestor.collection)
12
+ end
13
+
14
+ describe "when building up recommendations" do
15
+
16
+ it "must return a list of shared items between two people" do
17
+ @algorithm.shared_items_between(1,2).must_be :==, ["1","2"]
18
+ end
19
+
20
+ it "must return 0 as similarity record if two elements have no shared items" do
21
+ @algorithm.similarity_score_between(1,99).must_be :==, 0
22
+ end
23
+
24
+ it "must return 1 as similarity record if two elements have equal related values" do
25
+ puts @algorithm.shared_items_between(1,1).inspect
26
+ @algorithm.similarity_score_between(1,1).must_be :==, 1
27
+ end
28
+
29
+ end
30
+ end
@@ -0,0 +1,34 @@
1
+ require 'minitest/autorun'
2
+ require_relative '../lib/suggestor/algorithms/pearson_correlation'
3
+ require_relative '../lib/suggestor/engine'
4
+
5
+ describe Suggestor::Algorithms::PearsonCorrelation do
6
+
7
+ before do
8
+ @data_string = File.read("test/test.json")
9
+ @suggestor = Suggestor::Engine.new
10
+ @suggestor.load_data(@data_string)
11
+ @algorithm = Suggestor::Algorithms::PearsonCorrelation.new(@suggestor.collection)
12
+ end
13
+
14
+ describe "when building up recommendations" do
15
+
16
+ it "must return a list of shared items between two people" do
17
+ @algorithm.shared_items_between(1,2).must_be :==, ["1","2"]
18
+ end
19
+
20
+ it "must return 0 as similarity record if two elements have no shared items" do
21
+ @algorithm.similarity_score_between(1,4).must_be :==, 0
22
+ end
23
+
24
+ it "must return 1 as similarity record if two elements have equal related values" do
25
+ @algorithm.similarity_score_between(1,1).must_be :==, 1
26
+ end
27
+
28
+ it "must return 0 as similarity record if one of the elements does not exist" do
29
+ @algorithm.similarity_score_between(1,99).must_be :==, 0
30
+ end
31
+
32
+
33
+ end
34
+ end
@@ -0,0 +1,48 @@
1
+ require 'minitest/autorun'
2
+ require_relative '../lib/suggestor'
3
+
4
+ describe Suggestor::Engine do
5
+ before do
6
+ @suggestor = Suggestor::Engine.new
7
+ @data_string = File.read("test/test.json")
8
+ end
9
+
10
+ describe "when loading up the data structure" do
11
+ it "must raise an exception with invalid data" do
12
+ lambda{ @suggestor.load_data("GIBBERISH}") }.must_raise Suggestor::WrongInputFormat
13
+ end
14
+
15
+ it "must return a hash structure if data is ok" do
16
+ @suggestor.load_data(@data_string).must_be_instance_of Hash
17
+ end
18
+
19
+ end
20
+
21
+ describe "when accessing the data after loading it" do
22
+
23
+ before do
24
+ @suggestor.load_data(@data_string)
25
+ end
26
+
27
+ it "must return a similarity score between two elements" do
28
+ @suggestor.similarity_score_for("1","1").must_be :==, 1
29
+ end
30
+
31
+ it "must return similar items from the base one with euclidean distance" do
32
+ expected = {"2"=>0.02702702702702703, "3"=>0.02702702702702703}
33
+ @suggestor.similar_items_to("1").must_be :==, expected
34
+ end
35
+
36
+ it "must return similar items from the base one with pearson correlation" do
37
+ expected = {"1"=>1.0, "3"=>0.0}
38
+ @suggestor.similar_items_to("2",:algorithm => :pearson_correlation).must_be :==, expected
39
+ end
40
+
41
+ it "must return recommended items for the base one with euclidean distance" do
42
+ expected = {"4"=>1.0}
43
+ @suggestor.recommented_related_items_for("2").must_be :==, expected
44
+ end
45
+
46
+ end
47
+
48
+ end
@@ -0,0 +1,20 @@
1
+ {
2
+ "1":
3
+ {
4
+ "1" : 10,
5
+ "2": 3
6
+ }
7
+ ,
8
+ "2":
9
+ {
10
+ "2": 3,
11
+ "5": 1,
12
+ "1": 4,
13
+ "3": 6
14
+ },
15
+ "3":
16
+ {
17
+ "1": 4,
18
+ "4": 6
19
+ }
20
+ }
metadata ADDED
@@ -0,0 +1,76 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: suggestor
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.0.3
6
+ platform: ruby
7
+ authors:
8
+ - Alvaro Pereyra
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2011-09-19 00:00:00 -05:00
14
+ default_executable:
15
+ dependencies: []
16
+
17
+ description: Suggestor allows you to get suggestions of related items in your data
18
+ email:
19
+ - alvaro@xendacentral.com
20
+ executables: []
21
+
22
+ extensions: []
23
+
24
+ extra_rdoc_files: []
25
+
26
+ files:
27
+ - .gitignore
28
+ - Gemfile
29
+ - README.md
30
+ - Rakefile
31
+ - demos/playing_around.rb
32
+ - lib/suggestor.rb
33
+ - lib/suggestor/algorithms/euclidean_distance.rb
34
+ - lib/suggestor/algorithms/pearson_correlation.rb
35
+ - lib/suggestor/algorithms/recommendation_algorithm.rb
36
+ - lib/suggestor/datum.rb
37
+ - lib/suggestor/engine.rb
38
+ - lib/suggestor/version.rb
39
+ - suggestor.gemspec
40
+ - test/euclidean_test.rb
41
+ - test/pearon_correlation.rb
42
+ - test/suggestor_test.rb
43
+ - test/test.json
44
+ has_rdoc: true
45
+ homepage: ""
46
+ licenses: []
47
+
48
+ post_install_message:
49
+ rdoc_options: []
50
+
51
+ require_paths:
52
+ - lib
53
+ required_ruby_version: !ruby/object:Gem::Requirement
54
+ none: false
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ version: "0"
59
+ required_rubygems_version: !ruby/object:Gem::Requirement
60
+ none: false
61
+ requirements:
62
+ - - ">="
63
+ - !ruby/object:Gem::Version
64
+ version: "0"
65
+ requirements: []
66
+
67
+ rubyforge_project: suggestor
68
+ rubygems_version: 1.5.0
69
+ signing_key:
70
+ specification_version: 3
71
+ summary: Suggestor allows you to get suggestions of related items in your data
72
+ test_files:
73
+ - test/euclidean_test.rb
74
+ - test/pearon_correlation.rb
75
+ - test/suggestor_test.rb
76
+ - test/test.json