suggestor 0.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1 @@
1
+ .DS_Store
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in suggestor.gemspec
4
+ gemspec
@@ -0,0 +1,64 @@
1
+ # Suggestor
2
+ ## Recommendations gem
3
+
4
+ Suggestor is a gem that helps you relate data. For example, given User, Movie and Review classes,
5
+ where each Review belongs to a User and a Movie and has a rating attribute, the gem can use that
6
+ information to correlate the data and return results such as related Movies, similar Users (based on their
7
+ tastes) and so on.
8
+
9
+ ## Usage
10
+
11
+ The gem expects JSON data structured like this:
12
+
13
+ data = '{"1": {"10": 10, "12": 1}, "2": {"11": 5, "12": 4}}'
14
+
15
+ Each top-level key ("1" or "2") corresponds, in this example, to a user id, and maps to that user's related items (movies).
16
+
17
+ In the example, user "1" has seen the movies with ids "10" and "12", giving them ratings of 10 and 1, respectively. The same applies to user "2".
18
+
19
+ After loading the data into the engine:
20
+
21
+ engine = Suggestor::Engine.new
22
+ engine.load_data(data)
23
+
24
+ We can start to get some results.
25
+
26
+
27
+ ### Similar items
28
+
29
+ For example, we can get similar users:
30
+
31
+ engine.similar_items_to("1")
32
+
33
+ Which returns a structure like
34
+
35
+ {"id" => similarity_score, "id2" => similarity_score}
36
+
37
+ Thus, you can load the data and save their similarity scores for later use.
38
+
39
+ That's fine, but what about Mr. Bob, who always rates everything
40
+ higher? Movie "4" may not be that good after all. In that case, Suggestor lets you change the algorithm used:
41
+
42
+ engine.similar_items_to("1", :algorithm => :pearson_correlation)
43
+
44
+ Two algorithms are implemented: Euclidean Distance and Pearson Correlation.
45
+
46
+ Use Euclidean Distance (the default) to compare items and get suggestions based on
47
+ actions that are normalized or not subjective (like user points earned by actions on a web site).
48
+
49
+ Use Pearson Correlation if there's some bias in the data. The algorithm
50
+ takes into account whether a user grades consistently higher or lower, and returns more accurate suggestions than Euclidean Distance in that case.
51
+
52
+ ### Suggested items
53
+
54
+ Most interestingly, the gem lets you get suggestions based on the data.
55
+ For example, which movies should user "2" watch, based on their reviews and the tastes of similar users?
56
+
57
+ engine.recommented_related_items_for("2", :algorithm => :pearson_correlation)
58
+
59
+ As before, the structure returned will be
60
+
61
+ {"id" => similarity_score, "id2" => similarity_score}
62
+
63
+ In this case, though, the keys are movie ids and the values are their recommendation scores. You
64
+ can easily save this data to a DB, since movie ratings tend to stabilize over time and won't change that often.
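The sample `data` in the README is the JSON text that `load_data` hands to `JSON.parse`. A minimal sketch of producing that text from flat review rows, assuming hypothetical `[user_id, movie_id, rating]` triples (these rows are an illustration, not part of the gem); the resulting string is what you would pass to `engine.load_data`.

```ruby
require 'json'

# Hypothetical review rows: [user_id, movie_id, rating].
# In an application these might come from a Review model; here they are literals.
reviews = [
  [1, 10, 10],
  [1, 12, 1],
  [2, 11, 5],
  [2, 12, 4]
]

# Group ratings by user id, then by movie id. Keys are strings because
# JSON object keys are always strings after parsing.
nested = Hash.new { |hash, key| hash[key] = {} }
reviews.each do |user_id, movie_id, rating|
  nested[user_id.to_s][movie_id.to_s] = rating
end

# Serialize to the JSON string format shown in the README.
data = JSON.generate(nested)
puts data
```

Building the nested hash first and serializing once keeps the string well-formed, which avoids the `WrongInputFormat` error a hand-written string can trigger.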
@@ -0,0 +1,2 @@
1
+ require 'bundler'
2
+ Bundler::GemHelper.install_tasks
@@ -0,0 +1,16 @@
1
+ require_relative '../lib/suggestor'
2
+
3
+ engine = Suggestor::Engine.new
4
+
5
+ # I'm using test data of users and their movie ratings
6
+ # Each user (identified by their id) has a hash of movie ids and
7
+ # the rating they gave each movie
8
+ json = File.read("test/test.json")
9
+
10
+ engine.load_data(json)
11
+
12
+ # Let's get some similar users
13
+ puts engine.similar_items_to("2").inspect
14
+
15
+ # So, after knowing them, why not having some recommendations?
16
+ puts engine.recommented_related_items_for("2", algorithm: :euclidean_distance)
@@ -0,0 +1,3 @@
1
+ require_relative 'suggestor/engine'
2
+
3
+
@@ -0,0 +1,44 @@
1
+ require_relative 'recommendation_algorithm'
2
+
3
+ module Suggestor
4
+ module Algorithms
5
+
6
+ # The Euclidean distance compares two elements by placing them on
7
+ # a chart, using each of the related elements they share as an
8
+ # axis.
9
+
10
+ # For example, if we are dealing with users and movies, related
11
+ # by user reviews of each movie, each movie rated by both users
12
+ # is used as an axis (LOTR on one, and The Matrix on the other).
13
+
14
+ # The user ratings are used to position each user on the chart. Thus,
15
+ # if a user rates LOTR as 1 and The Matrix as 5, they are positioned
16
+ # at [1,5].
17
+
18
+ # The closer they are, the more similar their tastes are.
19
+ # More info at:
20
+ # http://en.wikipedia.org/wiki/Euclidean_metric
21
+ # http://en.wikipedia.org/wiki/Distance_correlation
22
+
23
+ class EuclideanDistance
24
+
25
+ include RecommendationAlgorithm
26
+
27
+ def similarity_score_between(first, second)
28
+ return 0.0 if no_shared_items_between?(first, second)
29
+ inverse_of_sum_of_squares_between(first, second)
30
+ end
31
+
32
+ def inverse_of_sum_of_squares_between(first, second)
33
+ 1/(1+sum_squares_of_shared_items_between(first, second))
34
+ end
35
+
36
+ def sum_squares_of_shared_items_between(first, second)
37
+ shared_items_between(first, second).inject(0.0) do |sum, item|
38
+ sum + (values_for(first)[item] - values_for(second)[item])**2
39
+ end
40
+ end
41
+
42
+ end
43
+ end
44
+ end
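The scoring in `EuclideanDistance` can be reproduced outside the gem. A minimal standalone sketch, assuming the nested hash shape of `test/test.json`; like `inverse_of_sum_of_squares_between`, it uses the plain sum of squared differences rather than its square root (the textbook Euclidean distance), so it yields a similarity in (0, 1] rather than a true distance. The function name is hypothetical.

```ruby
# Same shape as test/test.json: user id -> { movie id -> rating }.
ratings = {
  "1" => { "1" => 10, "2" => 3 },
  "2" => { "2" => 3, "5" => 1, "1" => 4, "3" => 6 },
  "3" => { "1" => 4, "4" => 6 }
}

# Similarity = 1 / (1 + sum of squared rating differences) over the
# movies both users rated; 0.0 when they share no movies at all.
def euclidean_score(ratings, first, second)
  a = ratings.fetch(first, {})
  b = ratings.fetch(second, {})
  shared = a.keys & b.keys
  return 0.0 if shared.empty?

  sum_of_squares = shared.sum { |item| (a[item] - b[item])**2 }
  1.0 / (1 + sum_of_squares)
end

puts euclidean_score(ratings, "1", "2") # 1 / (1 + 36)
```

Identical shared ratings give 1.0, and every extra point of disagreement pushes the score toward 0.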
@@ -0,0 +1,105 @@
1
+ require_relative 'recommendation_algorithm'
2
+
3
+ module Suggestor
4
+ module Algorithms
5
+
6
+ # The Pearson Correlation calculates a coefficient
7
+ # between the ratings two elements gave to their shared items.
8
+
9
+ # For example, if we are dealing with users and movies, related
10
+ # by user reviews of each movie, each pair of users
11
+ # defines the axes ("Alvaro" on one, and "Andres" on the other).
12
+
13
+ # The users' movie ratings are used to position movies on the chart.
14
+ # Thus, if "Alvaro" rates LOTR as 1 and "Andres" rates it as 3,
15
+ # LOTR is positioned at [1,3].
16
+
17
+ # A "best-fit line" is drawn through all the items, as close as
18
+ # possible to all of them. If the two users gave the same
19
+ # ratings, it would be a perfect diagonal (score of 1).
20
+
21
+ # The closer the movies are to the line, the more similar their tastes are.
22
+
23
+ # The great thing about Pearson Correlation is that it corrects for
24
+ # bias when evaluating the results. Thus, a user who always rates movies
25
+ # with high scores won't skew the results.
26
+
27
+ # It's probably the best fit for subjective reviews (movie reviews, profile points, etc).
28
+
29
+ # More info at: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
30
+
31
+ class PearsonCorrelation
32
+
33
+ include RecommendationAlgorithm
34
+
35
+ def similarity_score_between(first, second)
36
+ return 0.0 if no_shared_items_between?(first, second)
37
+
38
+ calculate_all_sums_for(first, second)
39
+ numerator = difference_from_total_and_normalize_values
40
+ # The denominator is 0 when either element's shared ratings are all equal
41
+ denominator = square_root_from_differences_of_sums
42
+
43
+ return 0.0 if denominator == 0
44
+
45
+ numerator / denominator
46
+
47
+ end
48
+
49
+ private
50
+
51
+ def calculate_all_sums_for(first,second)
52
+
53
+ shared_items = shared_items_between(first, second)
54
+ @total_related_items = shared_items.size
55
+
56
+ #simplify access
57
+ first_values = values_for(first)
58
+ second_values = values_for(second)
59
+
60
+ @first_values_sum = @second_values_sum = @first_square_values_sum = \
61
+ @second_square_values_sum = @products_sum = 0.0
62
+
63
+ shared_items.each do |item|
64
+
65
+ # Gets the corresponding value for each item on both elements
66
+ # For ex., the rating of the same movie by different users
67
+ first_value = first_values[item]
68
+ second_value = second_values[item]
69
+
70
+ # Will add all the related items values for the first
71
+ # and second item
72
+ # For ex., all movie recommendations ratings
73
+ @first_values_sum += first_value
74
+ @second_values_sum += second_value
75
+
76
+ # Adds the squares of both elements
77
+ @first_square_values_sum += first_value ** 2
78
+ @second_square_values_sum += second_value ** 2
79
+
80
+ # Adds the product of both values
81
+ @products_sum += first_value*second_value
82
+ end
83
+
84
+ end
85
+
86
+ def difference_from_total_and_normalize_values
87
+ product = @first_values_sum * @second_values_sum
88
+ normalized = product / @total_related_items
89
+ @products_sum - normalized
90
+ end
91
+
92
+ def square_root_from_differences_of_sums
93
+
94
+ power_left_result = @first_values_sum **2 /@total_related_items
95
+ equation_left = @first_square_values_sum - power_left_result
96
+
97
+ power_right_result = ( @second_values_sum **2 )/@total_related_items
98
+ equation_right = @second_square_values_sum - power_right_result
99
+ Math.sqrt(equation_left * equation_right)
100
+
101
+ end
102
+
103
+ end
104
+ end
105
+ end
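Mr. Bob's case from the README (someone who rates everything higher) is where Pearson helps. A minimal standalone sketch of the same sum-based formula used by `PearsonCorrelation`, with hypothetical users `alice` and `bob`; clamping tiny negative variances to zero is an addition not present in the gem, guarding `Math.sqrt` against floating-point rounding.

```ruby
# Pearson correlation between two users' ratings of their shared items:
#   r = (sum(xy) - sum(x)*sum(y)/n) /
#       sqrt((sum(x^2) - sum(x)^2/n) * (sum(y^2) - sum(y)^2/n))
def pearson_score(ratings, first, second)
  a = ratings.fetch(first, {})
  b = ratings.fetch(second, {})
  shared = a.keys & b.keys
  return 0.0 if shared.empty?

  n = shared.size.to_f
  xs = shared.map { |item| a[item].to_f }
  ys = shared.map { |item| b[item].to_f }

  numerator = xs.zip(ys).sum { |x, y| x * y } - xs.sum * ys.sum / n
  # Clamp at 0.0 so rounding can never hand Math.sqrt a negative number.
  spread_x = [xs.sum { |x| x**2 } - xs.sum**2 / n, 0.0].max
  spread_y = [ys.sum { |y| y**2 } - ys.sum**2 / n, 0.0].max
  denominator = Math.sqrt(spread_x * spread_y)
  # No variation on one side (e.g. a single shared item): report no correlation.
  return 0.0 if denominator.zero?

  numerator / denominator
end

# Hypothetical users: bob always rates exactly two points higher than alice.
ratings = {
  "alice" => { "lotr" => 2, "matrix" => 3, "alien" => 4 },
  "bob"   => { "lotr" => 4, "matrix" => 5, "alien" => 6 }
}
puts pearson_score(ratings, "alice", "bob")
```

Because only the pattern of ratings matters, the constant two-point offset does not lower the score: alice and bob still correlate perfectly.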
@@ -0,0 +1,108 @@
1
+ module Suggestor
2
+ module Algorithms
3
+ module RecommendationAlgorithm
4
+
5
+ attr_accessor :collection
6
+
7
+ def initialize(collection)
8
+ @collection = collection
9
+ end
10
+
11
+ # returns similar items based on their similarity score
12
+ # for example, similar users based on their movies reviews
13
+ def similar_items_to(main)
14
+
15
+ # just compare against the items that aren't the main one
16
+ compare_to = collection.dup
17
+ compare_to.delete(main)
18
+
19
+ # return results based on their score
20
+ compare_to.keys.inject({}) do |result, other|
21
+ result.merge!({other => similarity_score_between(main,other)})
22
+ end
23
+
24
+ end
25
+
26
+ # returns recommended related items for the main user
27
+ # The most important feature. For example, a user will get
28
+ # movie recommendations based on their past movie reviews
29
+ # and how they compare with others'
30
+ def recommented_related_items_for(main)
31
+
32
+ @similarities = @totals = Hash.new(0)
33
+ @main = main
34
+
35
+ create_similarities_totals
36
+ generate_rankings
37
+
38
+ end
39
+
40
+ def no_shared_items_between?(first,second)
41
+ shared_items_between(first,second).empty?
42
+ end
43
+
44
+ def shared_items_between(first,second)
45
+ return [] unless values_for(first) && values_for(second)
46
+ related_keys_for(first).select do |item|
47
+ related_keys_for(second).include? item
48
+ end
49
+ end
50
+
51
+ private
52
+
53
+ def main_already_has?(related)
54
+ collection[@main].has_key?(related)
55
+ end
56
+
57
+ def values_for(id)
58
+ collection[id.to_s]
59
+ end
60
+
61
+ def related_keys_for(id)
62
+ values_for(id).keys
63
+ end
64
+
65
+ def add_to_totals(other,item,score)
66
+ @totals[item] += collection[other][item]*score
67
+ @similarities[item] += score
68
+ end
69
+
70
+ def generate_rankings
71
+ @rankings = {}
72
+
73
+ @totals.each_pair do |item, total|
74
+ normalized_value = (total / @similarities[item])
75
+ @rankings.merge!( { item => normalized_value} )
76
+ end
77
+
78
+ @rankings
79
+
80
+ end
81
+
82
+ def create_similarities_totals
83
+
84
+ collection.keys.each do |other|
85
+
86
+ # skip the comparison if the other item is the main one,
87
+ # or if their score is 0 or below (nothing in common)
88
+ next if other == @main
89
+ score = similarity_score_between(@main,other)
90
+ next if score <= 0
91
+
92
+ # only consider related items
93
+ # that the main item doesn't already have.
94
+ # For ex., movies a user has already seen
95
+ # won't be suggested to them
96
+ collection[other].keys.each do |item|
97
+
98
+ unless main_already_has?(item)
99
+ add_to_totals(other,item,score)
100
+ end
101
+
102
+ end
103
+ end
104
+ end
105
+
106
+ end
107
+ end
108
+ end
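The comments in `create_similarities_totals` and `generate_rankings` describe a similarity-weighted average: each other user's rating of an unseen item counts in proportion to how similar that user is. A standalone sketch of that idea on the `test/test.json` data; the helper names are hypothetical. The two accumulators must be distinct `Hash` objects: a chained assignment such as `a = b = Hash.new(0)` binds both names to one hash, which makes every ratio come out as 1.0.

```ruby
ratings = {
  "1" => { "1" => 10, "2" => 3 },
  "2" => { "2" => 3, "5" => 1, "1" => 4, "3" => 6 },
  "3" => { "1" => 4, "4" => 6 }
}

# Similarity: 1 / (1 + sum of squared differences) over shared items, 0.0 if none.
euclidean = lambda do |data, a, b|
  shared = data[a].keys & data[b].keys
  squares = shared.sum { |item| (data[a][item] - data[b][item])**2 }
  shared.empty? ? 0.0 : 1.0 / (1 + squares)
end

# score(item) = sum(similarity * rating) / sum(similarity), for items main hasn't rated.
def recommendations_for(data, main, similarity)
  totals = Hash.new(0.0)          # sum of similarity * rating, per item
  similarity_sums = Hash.new(0.0) # sum of similarity, per item (a separate Hash)

  data.each do |other, items|
    next if other == main
    score = similarity.call(data, main, other)
    next if score <= 0

    items.each do |item, rating|
      next if data[main].key?(item) # skip items the main user already rated
      totals[item] += rating * score
      similarity_sums[item] += score
    end
  end

  totals.map { |item, total| [item, total / similarity_sums[item]] }.to_h
end

result = recommendations_for(ratings, "2", euclidean)
puts result.inspect
```

With this data only user "3" has an item user "2" hasn't rated, so the weighted average reduces to user "3"'s own rating of movie "4".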
@@ -0,0 +1,13 @@
1
+ require 'delegate'
2
+
3
+ module Suggestor
4
+
5
+ class Datum < DelegateClass(Hash)
6
+
7
+ def initialize(hash)
8
+ super(hash)
9
+ end
10
+
11
+ end
12
+
13
+ end
@@ -0,0 +1,63 @@
1
+ require 'json'
2
+ require_relative 'algorithms/euclidean_distance'
3
+ require_relative 'algorithms/pearson_correlation'
4
+
5
+ module Suggestor
6
+
7
+ class WrongInputFormat < StandardError; end
8
+
9
+ class Engine
10
+
11
+ attr_accessor :collection
12
+
13
+ def initialize
14
+ @collection = {}
15
+ end
16
+
17
+ def load_data(input)
18
+ add_to_collection(input)
19
+ end
20
+
21
+ def similarity_score_for(first, second, opts={})
22
+ opts[:algorithm] ||= :euclidean_distance
23
+ strategy_for(opts[:algorithm]).similarity_score_between(first, second)
24
+ end
25
+
26
+ def similar_items_to(item, opts={})
27
+ opts[:algorithm] ||= :euclidean_distance
28
+ strategy_for(opts[:algorithm]).similar_items_to(item)
29
+ end
30
+
31
+ def recommented_related_items_for(item, opts={})
32
+ opts[:algorithm] ||= :euclidean_distance
33
+ strategy_for(opts[:algorithm]).recommented_related_items_for(item)
34
+ end
35
+
36
+ private
37
+
38
+ def strategy_for(algorithm)
39
+ constantize(classify(algorithm)).new(collection)
40
+ end
41
+
42
+ # based on Rails' code
43
+ def classify(name)
44
+ name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
45
+ end
46
+
47
+ def constantize(name)
48
+ Suggestor::Algorithms.const_get(name)
49
+ end
50
+
51
+ def add_to_collection(input)
52
+ @collection.merge! parse_from_json(input)
53
+ end
54
+
55
+ def parse_from_json(json)
56
+ JSON.parse(json)
57
+ rescue JSON::ParserError, TypeError => ex
58
+ raise WrongInputFormat, "Wrong Data format: #{ex.message}"
59
+ end
60
+
61
+ end
62
+
63
+ end
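`strategy_for` resolves the `:algorithm` option to a class by name. A simplified sketch of the same lookup, dropping the `/`-to-`::` step that `classify` borrows from Rails; the `Algorithms` module and its empty classes here are hypothetical stand-ins for the gem's real strategies.

```ruby
# Hypothetical stand-ins for the gem's strategy classes.
module Algorithms
  class EuclideanDistance; end
  class PearsonCorrelation; end
end

# :pearson_correlation -> "PearsonCorrelation": upcase the first character
# and every character that follows an underscore, dropping the underscores.
def classify(name)
  name.to_s.gsub(/(?:^|_)(.)/) { Regexp.last_match(1).upcase }
end

# Look the class name up inside the Algorithms namespace.
def strategy_class_for(name)
  Algorithms.const_get(classify(name))
end

puts strategy_class_for(:pearson_correlation)
```

A consequence of this design is that any symbol is accepted as an option, so a typo such as `:pearson` only surfaces as a `NameError` when the strategy is looked up.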
@@ -0,0 +1,3 @@
1
+ module Suggestor
2
+ VERSION = "0.0.3"
3
+ end
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "suggestor/version"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "suggestor"
7
+ s.version = Suggestor::VERSION
8
+ s.platform = Gem::Platform::RUBY
9
+ s.authors = ["Alvaro Pereyra"]
10
+ s.email = ["alvaro@xendacentral.com"]
11
+ s.homepage = ""
12
+ s.summary = %q{Suggestor allows you to get suggestions of related items in your data}
13
+ s.description = %q{Suggestor allows you to get suggestions of related items in your data}
14
+
15
+ s.rubyforge_project = "suggestor"
16
+
17
+ s.files = `git ls-files`.split("\n")
18
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
19
+ s.executables = []
20
+ s.require_paths = ["lib"]
21
+ end
@@ -0,0 +1,30 @@
1
+ require 'minitest/autorun'
2
+ require_relative '../lib/suggestor/algorithms/euclidean_distance'
3
+ require_relative '../lib/suggestor/engine'
4
+
5
+ describe Suggestor::Algorithms::EuclideanDistance do
6
+
7
+ before do
8
+ @data_string = File.read("test/test.json")
9
+ @suggestor = Suggestor::Engine.new
10
+ @suggestor.load_data(@data_string)
11
+ @algorithm = Suggestor::Algorithms::EuclideanDistance.new(@suggestor.collection)
12
+ end
13
+
14
+ describe "when building up recommendations" do
15
+
16
+ it "must return a list of shared items between two people" do
17
+ @algorithm.shared_items_between(1,2).must_be :==, ["1","2"]
18
+ end
19
+
20
+ it "must return 0 as similarity record if two elements have no shared items" do
21
+ @algorithm.similarity_score_between(1,99).must_be :==, 0
22
+ end
23
+
24
+ it "must return 1 as similarity record if two elements have equal related values" do
25
+ puts @algorithm.shared_items_between(1,1).inspect
26
+ @algorithm.similarity_score_between(1,1).must_be :==, 1
27
+ end
28
+
29
+ end
30
+ end
@@ -0,0 +1,34 @@
1
+ require 'minitest/autorun'
2
+ require_relative '../lib/suggestor/algorithms/pearson_correlation'
3
+ require_relative '../lib/suggestor/engine'
4
+
5
+ describe Suggestor::Algorithms::PearsonCorrelation do
6
+
7
+ before do
8
+ @data_string = File.read("test/test.json")
9
+ @suggestor = Suggestor::Engine.new
10
+ @suggestor.load_data(@data_string)
11
+ @algorithm = Suggestor::Algorithms::PearsonCorrelation.new(@suggestor.collection)
12
+ end
13
+
14
+ describe "when building up recommendations" do
15
+
16
+ it "must return a list of shared items between two people" do
17
+ @algorithm.shared_items_between(1,2).must_be :==, ["1","2"]
18
+ end
19
+
20
+ it "must return 0 as similarity record if two elements have no shared items" do
21
+ @algorithm.similarity_score_between(1,4).must_be :==, 0
22
+ end
23
+
24
+ it "must return 1 as similarity record if two elements have equal related values" do
25
+ @algorithm.similarity_score_between(1,1).must_be :==, 1
26
+ end
27
+
28
+ it "must return 0 as similarity record if the second element doesn't exist" do
29
+ @algorithm.similarity_score_between(1,99).must_be :==, 0
30
+ end
31
+
32
+
33
+ end
34
+ end
@@ -0,0 +1,48 @@
1
+ require 'minitest/autorun'
2
+ require_relative '../lib/suggestor'
3
+
4
+ describe Suggestor::Engine do
5
+ before do
6
+ @suggestor = Suggestor::Engine.new
7
+ @data_string = File.read("test/test.json")
8
+ end
9
+
10
+ describe "when loading up the data structure" do
11
+ it "must raise an exception with invalid data" do
12
+ lambda{ @suggestor.load_data("GIBBERISH}") }.must_raise Suggestor::WrongInputFormat
13
+ end
14
+
15
+ it "must return a hash structure if data is ok" do
16
+ @suggestor.load_data(@data_string).must_be_instance_of Hash
17
+ end
18
+
19
+ end
20
+
21
+ describe "when accessing the data after loading it" do
22
+
23
+ before do
24
+ @suggestor.load_data(@data_string)
25
+ end
26
+
27
+ it "must return a similarity score between two elements" do
28
+ @suggestor.similarity_score_for("1","1").must_be :==, 1
29
+ end
30
+
31
+ it "must return similar items from the base one with euclidean distance" do
32
+ expected = {"2"=>0.02702702702702703, "3"=>0.02702702702702703}
33
+ @suggestor.similar_items_to("1").must_be :==, expected
34
+ end
35
+
36
+ it "must return similar items from the base one with pearson correlation" do
37
+ expected = {"1"=>1.0, "3"=>0.0}
38
+ @suggestor.similar_items_to("2",:algorithm => :pearson_correlation).must_be :==, expected
39
+ end
40
+
41
+ it "must return recommended items for the base one with euclidean distance" do
42
+ expected = {"4"=>1.0}
43
+ @suggestor.recommented_related_items_for("2").must_be :==, expected
44
+ end
45
+
46
+ end
47
+
48
+ end
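The expected Euclidean values in `suggestor_test.rb` follow from `test/test.json` and the formula in `inverse_of_sum_of_squares_between`, where $S_{uv}$ is the set of movies rated by both users $u$ and $v$, and $r_{u,i}$ is user $u$'s rating of movie $i$:

```latex
\begin{aligned}
s(u, v) &= \frac{1}{1 + \sum_{i \in S_{uv}} \left(r_{u,i} - r_{v,i}\right)^2} \\
s(1, 2) &= \frac{1}{1 + (10 - 4)^2 + (3 - 3)^2} = \frac{1}{37} \approx 0.027027 \\
s(1, 3) &= \frac{1}{1 + (10 - 4)^2} = \frac{1}{37} \approx 0.027027
\end{aligned}
```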
@@ -0,0 +1,20 @@
1
+ {
2
+ "1":
3
+ {
4
+ "1" : 10,
5
+ "2": 3
6
+ }
7
+ ,
8
+ "2":
9
+ {
10
+ "2": 3,
11
+ "5": 1,
12
+ "1": 4,
13
+ "3": 6
14
+ },
15
+ "3":
16
+ {
17
+ "1": 4,
18
+ "4": 6
19
+ }
20
+ }
metadata ADDED
@@ -0,0 +1,76 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: suggestor
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.0.3
6
+ platform: ruby
7
+ authors:
8
+ - Alvaro Pereyra
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2011-09-19 00:00:00 -05:00
14
+ default_executable:
15
+ dependencies: []
16
+
17
+ description: Suggestor allows you to get suggestions of related items in your data
18
+ email:
19
+ - alvaro@xendacentral.com
20
+ executables: []
21
+
22
+ extensions: []
23
+
24
+ extra_rdoc_files: []
25
+
26
+ files:
27
+ - .gitignore
28
+ - Gemfile
29
+ - README.md
30
+ - Rakefile
31
+ - demos/playing_around.rb
32
+ - lib/suggestor.rb
33
+ - lib/suggestor/algorithms/euclidean_distance.rb
34
+ - lib/suggestor/algorithms/pearson_correlation.rb
35
+ - lib/suggestor/algorithms/recommendation_algorithm.rb
36
+ - lib/suggestor/datum.rb
37
+ - lib/suggestor/engine.rb
38
+ - lib/suggestor/version.rb
39
+ - suggestor.gemspec
40
+ - test/euclidean_test.rb
41
+ - test/pearon_correlation.rb
42
+ - test/suggestor_test.rb
43
+ - test/test.json
44
+ has_rdoc: true
45
+ homepage: ""
46
+ licenses: []
47
+
48
+ post_install_message:
49
+ rdoc_options: []
50
+
51
+ require_paths:
52
+ - lib
53
+ required_ruby_version: !ruby/object:Gem::Requirement
54
+ none: false
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ version: "0"
59
+ required_rubygems_version: !ruby/object:Gem::Requirement
60
+ none: false
61
+ requirements:
62
+ - - ">="
63
+ - !ruby/object:Gem::Version
64
+ version: "0"
65
+ requirements: []
66
+
67
+ rubyforge_project: suggestor
68
+ rubygems_version: 1.5.0
69
+ signing_key:
70
+ specification_version: 3
71
+ summary: Suggestor allows you to get suggestions of related items in your data
72
+ test_files:
73
+ - test/euclidean_test.rb
74
+ - test/pearon_correlation.rb
75
+ - test/suggestor_test.rb
76
+ - test/test.json