suggestor 0.0.3 → 0.0.6
- data/README.md +32 -12
- data/examples/playing_around.rb +42 -0
- data/lib/suggestor.rb +3 -1
- data/lib/suggestor/algorithms/euclidean_distance.rb +9 -10
- data/lib/suggestor/algorithms/pearson_correlation.rb +50 -44
- data/lib/suggestor/algorithms/recommendation_algorithm.rb +94 -47
- data/lib/suggestor/engine.rb +10 -37
- data/lib/suggestor/version.rb +1 -1
- data/suggestor.gemspec +1 -0
- data/test/euclidean_test.rb +8 -10
- data/test/movies.json +1 -0
- data/test/{test.json → numbers.json} +0 -0
- data/test/pearson_correlation.rb +27 -0
- data/test/suggestor_test.rb +15 -20
- metadata +22 -13
- data/demos/playing_around.rb +0 -16
- data/lib/suggestor/datum.rb +0 -13
- data/test/pearon_correlation.rb +0 -34
data/README.md
CHANGED
@@ -10,16 +10,15 @@ tastes) and alike.
|
|
10
10
|
|
11
11
|
The gem needs a data structure like this:
|
12
12
|
|
13
|
-
data = {"
|
13
|
+
data = '{"Alvaro Pereyra Rabanal": {"Primer": 10, "Memento": 9}, "Gustavo Leon": {"The Matrix":8, "Harry Potter": 8}}'
|
14
14
|
|
15
|
-
Each element will
|
15
|
+
Following the example, each element corresponds to a user. Each user has related items (here, movie reviews).
|
16
16
|
|
17
|
-
In the example, the user "
|
17
|
+
In the example, the user "Alvaro Pereyra Rabanal" has seen the movies "Primer" and "Memento", and has given them ratings of 10 and 9, respectively. The same goes for the user "Gustavo Leon".
|
18
18
|
|
19
19
|
After loading the gem with the data:
|
20
20
|
|
21
|
-
engine = Suggestor::Engine.new
|
22
|
-
engine.load_data(data)
|
21
|
+
engine = Suggestor::Engine.new(data)
|
23
22
|
|
24
23
|
We can start to get some results.
|
25
24
|
|
@@ -28,18 +27,29 @@ We can start to get some results.
|
|
28
27
|
|
29
28
|
For example, we can get similar users:
|
30
29
|
|
31
|
-
engine.
|
30
|
+
engine.similar_to("Alvaro Pereyra Rabanal")
|
32
31
|
|
33
32
|
Which will return a structure like:
|
34
33
|
|
35
|
-
|
34
|
+
[["label", similarity_score], ["label", similarity_score]]
|
35
|
+
|
36
|
+
Like:
|
37
|
+
|
38
|
+
[["Eogen Clase", 0.0001649620587264929], ["Daniel Subauste", 0.00011641443538998836], ["4D2Studio Diseno y Animacion", 8.548469823901521e-05], ["Rafael Lanfranco", 6.177033788374823e-05], ["Veronica Zapata Gotelli", 6.074965068950854e-05]]
|
36
39
|
|
37
40
|
Thus, you can load the data and save their similarity scores for later use.
|
38
41
|
|
42
|
+
You can limit the number of results by passing a "size" argument:
|
43
|
+
|
44
|
+
engine.similar_to("Alvaro Pereyra Rabanal", :size => 5)
|
45
|
+
|
39
46
|
Now, that's fine and all, but what about Mr. Bob, who always ranks everything
|
40
47
|
higher? Maybe ID4 is not that good after all. If that happens, Suggestor allows you to change the algorithm used:
|
41
48
|
|
42
|
-
|
49
|
+
algorithm = Suggestor::Algorithms::PearsonCorrelation
|
50
|
+
engine = Suggestor::Engine.new(data, algorithm)
|
51
|
+
|
52
|
+
engine.recommended_to("Alvaro Pereyra Rabanal")
|
43
53
|
|
44
54
|
There are two implemented methods, Euclidean Distance and Pearson Correlation.
|
45
55
|
|
@@ -54,11 +64,21 @@ take in mind if some user grades higher or lower and return more exact suggestio
|
|
54
64
|
Most interestingly, the gem allows you to get suggestions based on the data.
|
55
65
|
For example, which movies should user "2" watch, based on their reviews and the tastes of similar users?
|
56
66
|
|
57
|
-
engine.
|
67
|
+
engine.recommended_to("Alvaro Pereyra Rabanal")
|
58
68
|
|
59
69
|
As before, the structure returned will be
|
60
70
|
|
61
|
-
|
71
|
+
[["label", similarity_score], ["label", similarity_score]]
|
72
|
+
|
73
|
+
But in this case, it will represent movie labels, and how similar they are. You
|
74
|
+
can easily save this data to a DB, since movie ratings tend to stabilize over time and won't change that often.
|
75
|
+
|
76
|
+
### Similar related items
|
77
|
+
|
78
|
+
We can also invert the data that the user has added, enabling us to get
|
79
|
+
similar related items. For example, let's say I'm on a Movie profile and
|
80
|
+
want to check which other movies are similar to it:
|
81
|
+
|
82
|
+
engine.similar_related_to("Batman Begins ", :size => 5)
|
62
83
|
|
63
|
-
|
64
|
-
can easily use this data to save it to a BD, since Movie ratings tend to estabilize on time and won't change that often.
|
84
|
+
Now you can go and build your awesome recommendations web site :)
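To make the README's data shape concrete, here is a minimal sketch that needs only Ruby's standard library, not the gem. It parses the sample JSON into a user => { item => rating } hash and shows the `[["label", score], ...]` result shape limited by a size. The `top_scores` helper and the sample scores are hypothetical illustrations, not part of Suggestor.

```ruby
require 'json'

# Same shape as the README's sample data: user => { item => rating }.
data = '{"Alvaro Pereyra Rabanal": {"Primer": 10, "Memento": 9}, ' \
       '"Gustavo Leon": {"The Matrix": 8, "Harry Potter": 8}}'

collection = JSON.parse(data)

collection.each do |user, ratings|
  puts "#{user} rated #{ratings.keys.join(', ')}"
end

# Hypothetical helper (not part of the gem): turn a { label => score } hash
# into [[label, score], ...] pairs, best score first, capped at `size`.
def top_scores(scores, size)
  scores.sort_by { |_label, score| -score }.first(size)
end

# Made-up scores, only to show the result shape.
scores = { "A" => 0.2, "B" => 0.9, "C" => 0.5 }
p top_scores(scores, 2)
```

Storing results as sorted pairs like this is what makes it easy to cache them for later use, as the README suggests.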
data/examples/playing_around.rb
ADDED
@@ -0,0 +1,42 @@
|
|
1
|
+
require_relative '../lib/suggestor'
|
2
|
+
|
3
|
+
# I'm using test data of Users and their movie recommendations
|
4
|
+
# Each user has a hash of their reviews with the movie and
|
5
|
+
# what they've rated it with
|
6
|
+
json = File.read("test/movies.json")
|
7
|
+
engine = Suggestor::Engine.new(json, Suggestor::Algorithms::EuclideanDistance)
|
8
|
+
|
9
|
+
# Let's get some similar users
|
10
|
+
name = "Alvaro Pereyra Rabanal"
|
11
|
+
puts "Who is similar to #{name}"
|
12
|
+
puts engine.similar_to(name, size: 5).inspect
|
13
|
+
|
14
|
+
puts
|
15
|
+
puts
|
16
|
+
|
17
|
+
# So, now that we know them, why not get some recommendations?
|
18
|
+
puts "Interesting! But I want to see some stuff at the movies, what to watch?"
|
19
|
+
opts = {size: 5}
|
20
|
+
results = engine.recommended_to("Alvaro Pereyra Rabanal", opts)
|
21
|
+
|
22
|
+
puts results.inspect
|
23
|
+
|
24
|
+
puts
|
25
|
+
puts
|
26
|
+
|
27
|
+
# That's good, but let's account for rating bias by using Pearson Correlation:
|
28
|
+
puts "Adjust these results please"
|
29
|
+
engine = Suggestor::Engine.new(json,Suggestor::Algorithms::PearsonCorrelation)
|
30
|
+
|
31
|
+
opts = {size: 5}
|
32
|
+
results = engine.recommended_to("Alvaro Pereyra Rabanal", opts)
|
33
|
+
puts results.inspect
|
34
|
+
|
35
|
+
puts
|
36
|
+
puts
|
37
|
+
|
38
|
+
name = "Batman Begins "
|
39
|
+
puts "Now that was nice. But which others are similar to '#{name}'"
|
40
|
+
opts = {size: 10}
|
41
|
+
results = engine.similar_related_to(name, opts)
|
42
|
+
puts results.inspect
|
data/lib/suggestor.rb
CHANGED
data/lib/suggestor/algorithms/euclidean_distance.rb
CHANGED
@@ -1,5 +1,3 @@
|
|
1
|
-
require_relative 'recommendation_algorithm'
|
2
|
-
|
3
1
|
module Suggestor
|
4
2
|
module Algorithms
|
5
3
|
|
@@ -24,21 +22,22 @@ module Suggestor
|
|
24
22
|
|
25
23
|
include RecommendationAlgorithm
|
26
24
|
|
27
|
-
def
|
28
|
-
return 0.0 if
|
29
|
-
|
25
|
+
def similarity_score(first, second)
|
26
|
+
return 0.0 if nothing_shared?(first, second)
|
27
|
+
inverse_of_squares(first, second)
|
30
28
|
end
|
31
29
|
|
32
|
-
def
|
33
|
-
1/(1+
|
30
|
+
def inverse_of_squares(first, second)
|
31
|
+
1/(1+Math.sqrt(sum_squares(first, second)))
|
34
32
|
end
|
35
33
|
|
36
|
-
def
|
37
|
-
|
38
|
-
sum + (values_for(first)[item] - values_for(second)[item])**2
|
34
|
+
def sum_squares(first, second)
|
35
|
+
shared_items(first, second).inject(0.0) do |sum, item|
|
36
|
+
sum + ( values_for(first)[item] - values_for(second)[item] ) ** 2
|
39
37
|
end
|
40
38
|
end
|
41
39
|
|
42
40
|
end
|
41
|
+
|
43
42
|
end
|
44
43
|
end
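The Euclidean score in this file can be sketched as a standalone function. This is only an illustration: the gem's methods look ratings up in its collection by key, while the hypothetical `euclidean_similarity` below takes the two rating hashes directly. The sample users and ratings are made up.

```ruby
# Hypothetical standalone version of the score: 1 / (1 + Euclidean distance),
# computed only over the items both sides have rated.
def euclidean_similarity(first_ratings, second_ratings)
  shared = first_ratings.keys & second_ratings.keys
  return 0.0 if shared.empty?

  sum_of_squares = shared.sum do |item|
    (first_ratings[item] - second_ratings[item])**2
  end

  1.0 / (1.0 + Math.sqrt(sum_of_squares))
end

alice = { "Primer" => 10, "Memento" => 9 }
bob   = { "Primer" => 7,  "Memento" => 5, "Avatar" => 8 }

puts euclidean_similarity(alice, alice)           # identical ratings: distance 0, score 1.0
puts euclidean_similarity(alice, bob)             # distance sqrt(9 + 16) = 5, score 1 / 6
puts euclidean_similarity(alice, { "Rio" => 9 })  # nothing shared: score 0.0
```

Adding 1 to the distance keeps the score in the range (0, 1] and avoids a division by zero when the ratings are identical.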
data/lib/suggestor/algorithms/pearson_correlation.rb
CHANGED
@@ -1,5 +1,3 @@
|
|
1
|
-
require_relative 'recommendation_algorithm'
|
2
|
-
|
3
1
|
module Suggestor
|
4
2
|
module Algorithms
|
5
3
|
|
@@ -18,86 +16,94 @@ module Suggestor
|
|
18
16
|
# the closest distance to all of them. If the two users have the same
|
19
17
|
# ratings, it would show as a perfect diagonal (score of 1)
|
20
18
|
|
21
|
-
# The closest the movies to the line are, the more similar their tastes
|
19
|
+
# The closer the movies are to the line, the more similar their tastes
|
20
|
+
# are.
|
22
21
|
|
23
22
|
# The great thing about using Pearson Correlation is that it works with
|
24
23
|
# bias when evaluating the results. Thus, a user who always rates movies
|
25
24
|
# with great scores won't impact and mess up the results.
|
26
25
|
|
27
|
-
# It's probably a best fit for subjetive reviews (movies reviews, profile
|
26
|
+
# It's probably the best fit for subjective reviews (movie reviews, profile
|
27
|
+
# points, etc).
|
28
28
|
|
29
|
-
# More info at:
|
29
|
+
# More info at:
|
30
|
+
# http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
|
30
31
|
|
31
32
|
class PearsonCorrelation
|
32
33
|
|
33
34
|
include RecommendationAlgorithm
|
34
35
|
|
35
|
-
def
|
36
|
-
return
|
36
|
+
def similarity_score(first, second)
|
37
|
+
return -1.0 if nothing_shared?(first, second)
|
37
38
|
|
38
|
-
|
39
|
-
numerator = difference_from_total_and_normalize_values
|
40
|
-
# 10.5 / 0.0 /
|
41
|
-
denominator = square_root_from_differences_of_sums
|
39
|
+
process_values(first, second)
|
42
40
|
|
43
|
-
|
41
|
+
numerator = difference_from_values
|
42
|
+
denominator = square_root_from_differences
|
44
43
|
|
44
|
+
return 0.0 if denominator == 0
|
45
45
|
numerator / denominator
|
46
|
-
|
47
46
|
end
|
48
47
|
|
49
48
|
private
|
50
49
|
|
51
|
-
def
|
52
|
-
|
53
|
-
|
54
|
-
@total_related_items = shared_items.size
|
50
|
+
def process_values(first, second)
|
51
|
+
items = shared_items(first, second)
|
52
|
+
@total_related_items = items.size.to_f
|
55
53
|
|
56
|
-
|
57
|
-
first_values = values_for(first)
|
54
|
+
first_values = values_for(first)
|
58
55
|
second_values = values_for(second)
|
59
56
|
|
60
|
-
|
61
|
-
@second_square_values_sum = @products_sum = 0.0
|
57
|
+
create_helper_variables
|
62
58
|
|
63
|
-
|
59
|
+
items.each do |item|
|
64
60
|
|
65
|
-
|
66
|
-
# For ex., the rating of the same movie by different users
|
67
|
-
first_value = first_values[item]
|
61
|
+
first_value = first_values[item]
|
68
62
|
second_value = second_values[item]
|
69
63
|
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
@first_values_sum += first_value
|
74
|
-
@second_values_sum += second_value
|
75
|
-
|
76
|
-
# Adds the squares of both elements
|
77
|
-
@first_square_values_sum += first_value ** 2
|
78
|
-
@second_square_values_sum += second_value ** 2
|
64
|
+
append_values(first_value, second_value)
|
65
|
+
append_squares(first_value, second_value)
|
66
|
+
append_product(first_value, second_value)
|
79
67
|
|
80
|
-
# Adds the product of both values
|
81
|
-
@products_sum += first_value*second_value
|
82
68
|
end
|
69
|
+
end
|
83
70
|
|
71
|
+
def append_values(first_value, second_value)
|
72
|
+
@first_values_sum += first_value
|
73
|
+
@second_values_sum += second_value
|
84
74
|
end
|
85
75
|
|
86
|
-
def
|
76
|
+
def append_squares(first_value, second_value)
|
77
|
+
@first_square_values_sum += ( first_value ** 2 )
|
78
|
+
@second_square_values_sum += ( second_value ** 2 )
|
79
|
+
end
|
80
|
+
|
81
|
+
def append_product(first_value, second_value)
|
82
|
+
@products_sum += first_value * second_value
|
83
|
+
end
|
84
|
+
|
85
|
+
def difference_from_values
|
87
86
|
product = @first_values_sum * @second_values_sum
|
88
87
|
normalized = product / @total_related_items
|
89
88
|
@products_sum - normalized
|
90
89
|
end
|
91
90
|
|
92
|
-
def
|
93
|
-
|
94
|
-
|
95
|
-
|
91
|
+
def square_root_from_differences
|
92
|
+
power_left_result = ( @first_values_sum ** 2 ) / @total_related_items
|
93
|
+
equation_left = @first_square_values_sum - power_left_result
|
94
|
+
|
95
|
+
power_right_result = ( @second_values_sum ** 2 )/ @total_related_items
|
96
|
+
equation_right = @second_square_values_sum - power_right_result
|
96
97
|
|
97
|
-
|
98
|
-
|
99
|
-
Math.sqrt(equation_left * equation_right)
|
98
|
+
Math.sqrt( equation_left * equation_right )
|
99
|
+
end
|
100
100
|
|
101
|
+
def create_helper_variables
|
102
|
+
@first_values_sum = 0.0
|
103
|
+
@second_values_sum = 0.0
|
104
|
+
@first_square_values_sum = 0.0
|
105
|
+
@second_square_values_sum = 0.0
|
106
|
+
@products_sum = 0.0
|
101
107
|
end
|
102
108
|
|
103
109
|
end
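The sums accumulated in this class correspond to the usual computational form of the Pearson coefficient. Below is a compact sketch under the same conventions as the diff (-1.0 when nothing is shared, 0.0 when the denominator is zero). The `pearson_similarity` function and the sample ratings are hypothetical; the sketch takes rating hashes directly rather than reading them from a collection.

```ruby
# Hypothetical standalone Pearson correlation over the shared items.
def pearson_similarity(first_ratings, second_ratings)
  shared = first_ratings.keys & second_ratings.keys
  return -1.0 if shared.empty?

  n  = shared.size.to_f
  xs = shared.map { |item| first_ratings[item].to_f }
  ys = shared.map { |item| second_ratings[item].to_f }

  sum_x        = xs.sum
  sum_y        = ys.sum
  sum_x_sq     = xs.sum { |x| x**2 }
  sum_y_sq     = ys.sum { |y| y**2 }
  sum_products = xs.zip(ys).sum { |x, y| x * y }

  numerator   = sum_products - (sum_x * sum_y) / n
  denominator = Math.sqrt((sum_x_sq - sum_x**2 / n) * (sum_y_sq - sum_y**2 / n))

  return 0.0 if denominator.zero?
  numerator / denominator
end

harsh    = { "a" => 1, "b" => 2,  "c" => 3 }
generous = { "a" => 6, "b" => 8,  "c" => 10 }  # same ordering, much higher scores
opposite = { "a" => 3, "b" => 2,  "c" => 1 }
flat     = { "a" => 5, "b" => 5,  "c" => 5 }   # no variance at all

puts pearson_similarity(harsh, generous)  # perfectly correlated despite the offset
puts pearson_similarity(harsh, opposite)  # perfectly anti-correlated
puts pearson_similarity(harsh, flat)      # zero denominator, reported as 0.0
```

The `harsh` and `generous` users illustrate the point made in the comments: a user who always rates higher still correlates perfectly, whereas a distance-based score would treat them as far apart.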
data/lib/suggestor/algorithms/recommendation_algorithm.rb
CHANGED
@@ -8,50 +8,86 @@ module Suggestor
|
|
8
8
|
@collection = collection
|
9
9
|
end
|
10
10
|
|
11
|
-
#
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
compare_to.delete(main)
|
18
|
-
|
19
|
-
# return results based on their score
|
20
|
-
compare_to.keys.inject({}) do |result, other|
|
21
|
-
result.merge!({other => similarity_score_between(main,other)})
|
22
|
-
end
|
11
|
+
# Ex. Similar users based on their movie reviews
|
12
|
+
def similar_to(main, opts={})
|
13
|
+
opts = default_options.merge(opts)
|
14
|
+
|
15
|
+
collection = remove_self(main)
|
16
|
+
results = order_by_similarity_score(main,collection)
|
23
17
|
|
18
|
+
sort_results(results,opts[:size])
|
24
19
|
end
|
25
20
|
|
26
|
-
#
|
27
|
-
|
28
|
-
|
29
|
-
# and how it compares with others
|
30
|
-
def recommented_related_items_for(main)
|
21
|
+
# Ex. a user will get movie recommendations
|
22
|
+
def recommended_to(main, opts={})
|
23
|
+
opts = default_options.merge(opts)
|
31
24
|
|
32
25
|
@similarities = @totals = Hash.new(0)
|
33
|
-
@main = main
|
34
26
|
|
35
|
-
create_similarities_totals
|
36
|
-
generate_rankings
|
27
|
+
create_similarities_totals(main)
|
28
|
+
results = generate_rankings
|
37
29
|
|
30
|
+
sort_results(results,opts[:size])
|
38
31
|
end
|
39
32
|
|
40
|
-
|
41
|
-
|
33
|
+
# Ex. what other movies are related to a given one
|
34
|
+
def similar_related_to(main, opts={})
|
35
|
+
opts = default_options.merge(opts)
|
36
|
+
|
37
|
+
collection = invert_collection
|
38
|
+
engine = self.class.new(collection)
|
39
|
+
|
40
|
+
engine.similar_to(main,opts)
|
42
41
|
end
|
43
42
|
|
44
|
-
def
|
45
|
-
return [] unless values_for(first) && values_for(second)
|
43
|
+
def shared_items(first, second)
|
44
|
+
return [] unless values_for(first) && values_for(second)
|
45
|
+
|
46
46
|
related_keys_for(first).select do |item|
|
47
47
|
related_keys_for(second).include? item
|
48
48
|
end
|
49
|
-
end
|
49
|
+
end
|
50
50
|
|
51
51
|
private
|
52
52
|
|
53
|
-
def
|
54
|
-
|
53
|
+
def default_options
|
54
|
+
{size: 5}
|
55
|
+
end
|
56
|
+
|
57
|
+
def nothing_shared?(first, second)
|
58
|
+
shared_items(first, second).empty?
|
59
|
+
end
|
60
|
+
|
61
|
+
def remove_self(main)
|
62
|
+
cleaned = collection.dup
|
63
|
+
cleaned.delete(main)
|
64
|
+
cleaned
|
65
|
+
end
|
66
|
+
|
67
|
+
|
68
|
+
# changes { "Cat": {"1": 10, "2":20}, "Dog": {"1":5, "2": 15} }
|
69
|
+
# to { "1": {"Cat": 10, "Dog": 5}, "2": {"Cat": 20, "Dog": 15} }
|
70
|
+
def invert_collection
|
71
|
+
results = {}
|
72
|
+
|
73
|
+
collection.keys.each do |main|
|
74
|
+
collection[main].keys.each do |item|
|
75
|
+
results[item] ||= {}
|
76
|
+
results[item][main] = collection[main][item]
|
77
|
+
end
|
78
|
+
end
|
79
|
+
|
80
|
+
results
|
81
|
+
end
|
82
|
+
|
83
|
+
def order_by_similarity_score(main,collection)
|
84
|
+
result = collection.keys.inject({}) do |res, other|
|
85
|
+
res.merge!({other => similarity_score(main, other)})
|
86
|
+
end
|
87
|
+
end
|
88
|
+
|
89
|
+
def already_has?(main, related)
|
90
|
+
collection[main].has_key?(related)
|
55
91
|
end
|
56
92
|
|
57
93
|
def values_for(id)
|
@@ -62,47 +98,58 @@ module Suggestor
|
|
62
98
|
values_for(id).keys
|
63
99
|
end
|
64
100
|
|
65
|
-
def add_to_totals(other,item,score)
|
66
|
-
@totals[item]
|
101
|
+
def add_to_totals(other, item, score)
|
102
|
+
@totals[item] += collection[other][item]*score
|
67
103
|
@similarities[item] += score
|
68
104
|
end
|
69
105
|
|
70
|
-
def
|
71
|
-
|
106
|
+
def sort_results(results,size=-1)
|
107
|
+
sorted = results.sort{|a,b| a[1] <=> b[1]}.reverse
|
108
|
+
size < 0 ? sorted : sorted.first(size)
|
109
|
+
end
|
72
110
|
|
111
|
+
def generate_rankings
|
112
|
+
rankings = {}
|
113
|
+
|
73
114
|
@totals.each_pair do |item, total|
|
74
|
-
normalized_value = (total / @similarities[item])
|
75
|
-
|
115
|
+
normalized_value = (total / Math.sqrt(@similarities[item]))
|
116
|
+
rankings.merge!( { item => normalized_value} )
|
76
117
|
end
|
77
118
|
|
78
|
-
|
119
|
+
rankings
|
120
|
+
end
|
121
|
+
|
122
|
+
def something_in_common?(score)
|
123
|
+
score > 0
|
124
|
+
end
|
79
125
|
|
126
|
+
def same_item?(main, other)
|
127
|
+
other == main
|
80
128
|
end
|
81
129
|
|
82
|
-
def create_similarities_totals
|
130
|
+
def create_similarities_totals(main)
|
83
131
|
|
84
132
|
collection.keys.each do |other|
|
85
133
|
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
next
|
91
|
-
|
92
|
-
# will compare each the results but only for related items
|
93
|
-
# that the main item doesn't already have
|
94
|
-
# For ex., if they have already saw a movie they won't
|
95
|
-
# get it suggested
|
134
|
+
next if same_item?(main,other)
|
135
|
+
|
136
|
+
score = similarity_score(main, other)
|
137
|
+
|
138
|
+
next unless something_in_common?(score)
|
139
|
+
|
96
140
|
collection[other].keys.each do |item|
|
97
141
|
|
98
|
-
unless
|
99
|
-
add_to_totals(other,item,score)
|
142
|
+
unless already_has?(main, item)
|
143
|
+
add_to_totals(other, item, score)
|
100
144
|
end
|
101
145
|
|
102
146
|
end
|
147
|
+
|
103
148
|
end
|
149
|
+
|
104
150
|
end
|
105
151
|
|
152
|
+
|
106
153
|
end
|
107
154
|
end
|
108
155
|
end
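Two ideas from this module can be sketched in isolation: inverting the collection so items become the top-level keys, and ranking unseen items by a similarity-weighted average. The sketch below is an assumption-labeled illustration, not the gem's code: `invert`, `recommend`, and the `shared_match` lambda are hypothetical names, and the ranking uses the textbook weighted average (total divided by the sum of similarities, kept in two separate hashes), which differs from the normalization shown in the diff.

```ruby
# Hypothetical helper: { main => { item => value } } becomes
# { item => { main => value } }.
def invert(collection)
  collection.each_with_object({}) do |(main, ratings), inverted|
    ratings.each do |item, value|
      (inverted[item] ||= {})[main] = value
    end
  end
end

# Hypothetical ranking: for each item `main` has not rated, average the other
# users' ratings weighted by how similar each user is to `main`.
def recommend(collection, main, similarity, size: 5)
  totals  = Hash.new(0.0)  # sum of rating * similarity per item
  weights = Hash.new(0.0)  # sum of similarity per item

  collection.each do |other, ratings|
    next if other == main

    score = similarity.call(collection[main], ratings)
    next unless score > 0

    ratings.each do |item, value|
      next if collection[main].key?(item)  # skip what main already rated
      totals[item]  += value * score
      weights[item] += score
    end
  end

  rankings = totals.map { |item, total| [item, total / weights[item]] }
  rankings.sort_by { |_item, rank| -rank }.first(size)
end

# Toy similarity for the example: 1.0 if every shared rating matches, else 0.0.
shared_match = lambda do |first, second|
  shared = first.keys & second.keys
  !shared.empty? && shared.all? { |item| first[item] == second[item] } ? 1.0 : 0.0
end

ratings = {
  "Ana" => { "Primer" => 10, "Memento" => 9 },
  "Ben" => { "Primer" => 10, "Memento" => 9, "Avatar" => 8, "Rio" => 7 },
  "Cat" => { "Primer" => 10, "Memento" => 9, "Avatar" => 4 },
  "Dan" => { "Tron" => 7 }
}

p recommend(ratings, "Ana", shared_match)
p invert("Cat" => { "1" => 10, "2" => 20 }, "Dog" => { "1" => 5, "2" => 15 })
```

Once the collection is inverted, the same similarity code that compares users can compare items, which is the idea behind `similar_related_to` building a second instance over the inverted data.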
|
data/lib/suggestor/engine.rb
CHANGED
@@ -1,6 +1,4 @@
|
|
1
1
|
require 'json'
|
2
|
-
require_relative 'algorithms/euclidean_distance'
|
3
|
-
require_relative 'algorithms/pearson_correlation'
|
4
2
|
|
5
3
|
module Suggestor
|
6
4
|
|
@@ -8,50 +6,25 @@ module Suggestor
|
|
8
6
|
|
9
7
|
class Engine
|
10
8
|
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
@collection = {}
|
15
|
-
end
|
16
|
-
|
17
|
-
def load_data(input)
|
18
|
-
add_to_collection(input)
|
9
|
+
def initialize(input, algorithm = Algorithms::EuclideanDistance)
|
10
|
+
@collection = parse_from_json(input)
|
11
|
+
@algorithm = algorithm.new(@collection)
|
19
12
|
end
|
20
|
-
|
21
|
-
def
|
22
|
-
|
23
|
-
strategy_for(opts[:algorithm]).similarity_score_between(first, second)
|
13
|
+
|
14
|
+
def similar_to(item, opts={})
|
15
|
+
@algorithm.similar_to(item, opts)
|
24
16
|
end
|
25
17
|
|
26
|
-
def
|
27
|
-
|
28
|
-
strategy_for(opts[:algorithm]).similar_items_to(item)
|
18
|
+
def recommended_to(item, opts={})
|
19
|
+
@algorithm.recommended_to(item, opts)
|
29
20
|
end
|
30
21
|
|
31
|
-
def
|
32
|
-
|
33
|
-
strategy_for(opts[:algorithm]).recommented_related_items_for(item)
|
22
|
+
def similar_related_to(item, opts={})
|
23
|
+
@algorithm.similar_related_to(item, opts)
|
34
24
|
end
|
35
25
|
|
36
26
|
private
|
37
|
-
|
38
|
-
def strategy_for(algorithm)
|
39
|
-
constantize(classify(algorithm)).new(collection)
|
40
|
-
end
|
41
|
-
|
42
|
-
# based on Rail's code
|
43
|
-
def classify(name)
|
44
|
-
name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
|
45
|
-
end
|
46
|
-
|
47
|
-
def constantize(name)
|
48
|
-
Suggestor::Algorithms.const_get(name)
|
49
|
-
end
|
50
27
|
|
51
|
-
def add_to_collection(input)
|
52
|
-
@collection.merge! parse_from_json(input)
|
53
|
-
end
|
54
|
-
|
55
28
|
def parse_from_json(json)
|
56
29
|
JSON.parse(json)
|
57
30
|
rescue Exception => ex
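The new constructor takes the algorithm class as an argument instead of resolving it from a symbol at call time. A minimal sketch of that pattern follows. `MiniEngine` and `CountShared` are hypothetical stand-ins invented for this example, not the gem's classes. The sketch also merges defaults so that caller-supplied options win.

```ruby
require 'json'

# Hypothetical algorithm: scores others by how many items they share with main.
class CountShared
  def initialize(collection)
    @collection = collection
  end

  def similar_to(main, opts = {})
    opts = { size: 5 }.merge(opts)  # defaults first, caller's options override

    others = @collection.reject { |key, _ratings| key == main }
    scores = others.map do |key, ratings|
      [key, (ratings.keys & @collection[main].keys).size]
    end

    scores.sort_by { |_key, score| -score }.first(opts[:size])
  end
end

# Hypothetical engine: parses the input once and delegates to whichever
# algorithm class it was given.
class MiniEngine
  def initialize(input, algorithm = CountShared)
    @collection = JSON.parse(input)
    @algorithm  = algorithm.new(@collection)
  end

  def similar_to(item, opts = {})
    @algorithm.similar_to(item, opts)
  end
end

json = '{"a": {"x": 1, "y": 2}, "b": {"x": 3}, "c": {"x": 1, "y": 5}, "d": {"z": 1}}'
engine = MiniEngine.new(json)

p engine.similar_to("a", size: 2)
```

Passing the class in keeps the engine free of name-to-constant lookups, and any object that responds to the same methods can be swapped in for testing.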
|
data/lib/suggestor/version.rb
CHANGED
data/suggestor.gemspec
CHANGED
data/test/euclidean_test.rb
CHANGED
@@ -1,29 +1,27 @@
|
|
1
1
|
require 'minitest/autorun'
|
2
|
-
|
3
|
-
require_relative '../lib/suggestor
|
2
|
+
require 'json'
|
3
|
+
require_relative '../lib/suggestor'
|
4
4
|
|
5
5
|
describe Suggestor::Algorithms::EuclideanDistance do
|
6
6
|
|
7
7
|
before do
|
8
|
-
|
9
|
-
|
10
|
-
@
|
11
|
-
@algorithm = Suggestor::Algorithms::EuclideanDistance.new(@suggestor.collection)
|
8
|
+
data_string = File.read("test/numbers.json")
|
9
|
+
data = JSON.parse(data_string)
|
10
|
+
@algorithm = Suggestor::Algorithms::EuclideanDistance.new(data)
|
12
11
|
end
|
13
12
|
|
14
13
|
describe "when building up recommendations" do
|
15
14
|
|
16
15
|
it "must return a list of shared items between two people" do
|
17
|
-
@algorithm.
|
16
|
+
@algorithm.shared_items(1,2).must_be :==, ["1","2"]
|
18
17
|
end
|
19
18
|
|
20
19
|
it "must return 0 as similarity record if two elements have no shared items" do
|
21
|
-
@algorithm.
|
20
|
+
@algorithm.similarity_score(1,99).must_be :==, 0
|
22
21
|
end
|
23
22
|
|
24
23
|
it "must return 1 as similarity record if two elements have equal related values" do
|
25
|
-
|
26
|
-
@algorithm.similarity_score_between(1,1).must_be :==, 1
|
24
|
+
@algorithm.similarity_score(1,1).must_be :==, 1
|
27
25
|
end
|
28
26
|
|
29
27
|
end
|
data/test/movies.json
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
{"Alvaro Pereyra Rabanal":{"Enterrado":90,"La reunion del diablo":20,"Scott Pilgrim vs The World":80,"El avispon verde":20,"Se dice de mi":89,"Un tonto en el amor":75,"El secreto de sus ojos":99,"Wall Street: El dinero nunca duerme":90,"Super 8 ":90,"Kung fu panda 2":92,"La revelacion":70,"Rio":90,"El cisne negro":70,"Tron: El Legado":90,"Invasion del Mundo: Batalla Los Angeles":50,"Peluda venganza":20,"Megamente":90,"Dias de ira":76,"El especialista":20,"u00bfQue paso ayer? 2":20,"El escritor oculto":80,"Red ":33,"Amor a distancia":90,"Resident Evil 4: La Resurreccion":50,"Octubre":55,"Sin limite":65,"Love Actually ":90,"El gran concierto":90,"Perdidos en Tokio":90,"Pulp Fiction ":87,"Cazador de demonios":20,"Loco y estu00fapido amor":65,"Amor por contrato":66,"Contracorriente":90,"Actividad paranormal 2":20,"La Vigilia":15,"El rey leon":90,"Transformers: El lado oscuro de la luna":20,"Cuando Harry conocio a Sally":95,"Machete":90,"Avatar":80,"Soy el nu00famero cuatro":90,"Toy Story 3 ":99,"El discurso del rey ":90,"Noches de encanto":90,"Enredados ":55,"Red Social":80,"La otra familia ":70,"Tesis ":90,"Harry Potter and the Deathly Hallows":76,"El planeta de los simios: Revolucion":80,"Biutiful ":20,"Harry Potter y las Reliquias de la muerte: Parte II":60,"Las Cronicas de Narnia: La travesia del viajero del alba":63,"El regreso de la nana magica":20},"Angel Velasquez":{"Los indestructibles":95},"Rafael Lanfranco":{"El rey leon":20,"Enter the Dragon ":78,"Un hombre solitario":20,"Desconocido ":20,"La historia sin fin":90,"Pi: Fe en el caos":66,"Harry Potter and the Deathly Hallows":86,"Dark City ":89,"Juego de Traiciones":78,"X-Men: Primera generacion":89,"La Aldea":20,"Agora ":90,"Comer, rezar, amar":20,"La duda":89,"Invasion del Mundo: Batalla Los Angeles":86,"El nuevo entrenador":90,"Marea Roja":86,"Sniper":77,"Conoceras al hombre de tus suenos":90,"El peleador":54,"Enterrado":50,"Mary and Max ":90,"Gran Torino ":86,"Senales":20,"Real Steel ":10,"Breakin' 
":94,"Tesis ":66,"Red ":20,"El secreto de sus ojos":75,"El Senor De Los Anillos: Las Dos Torres":50,"Cowboys y Aliens":82,"Mision Imposible":91,"Match Point":64,"Loco por ella":77,"La revelacion":90,"El origen":86,"El Pianista":90,"Lazos de sangre":94,"El Informante":100,"Mas alla de la vida":90,"Carancho ":20,"Harry Potter y las Reliquias de la muerte: Parte II":30,"Jumper ":20,"Apocalypse Now ":92,"Source Code ":90,"Rango ":20,"Hot Fuzz ":43,"La fuente de la vida":91,"The Doubt ":89,"Medianoche en Paris":89,"Sin limite":77,"Rapidos y Furiosos 5":90,"El discurso del rey ":81,"127 horas":85,"Following":90,"Serenity ":60,"Una propuesta atrevida":20,"Temple de Acero":81,"Piratas del Caribe: Navegando aguas misteriosas":84,"Ex, todos tenemos uno":90,"Los ilusionautas":20,"Agente Salt":90,"Star Wars: Episodio IV - Una nueva esperanza":89,"El planeta de los simios: Revolucion":89,"Kick - Ass":90,"Cyrus":90,"El Exterminador 2: El Dia del Juicio Final":87,"Invasion del mundo":82,"Mundo Surreal":92,"Kung Fu Hustle":90,"El mensajero ":90,"Red Social":94,"I saw the devil":73,"El Club de la Pelea":90,"El Senor de los Anillos: El retorno del Rey":66,"Scott Pilgrim vs The World":90,"Terminator II ":87,"Preciosa":90,"Kung fu panda 2":92,"En un rincon del corazon":20,"12 hombres molestos":55,"Thirteen Days ":11,"E.T : El Extraterrestre":90,"Harry Potter y el Prisionero de Azkaban":11,"Way of the Dragon ":90,"Los agentes del destino":75,"True Romance ":75,"Pulp Fiction ":100,"Thor":60,"8 Mile":70,"Super 8 ":90,"The Wolfman ":20,"Los imperdonables":92,"El Protegido":90,"Mi nombre es John Lennon":90,"Hable con ella ":90,"Bastardos sin gloria":80,"The Big Lebowski ":90,"social network":90,"Camino al Oscar":99,"Winnie Pooh ":50,"Beginners":90,"El Fin de Los Tiempos":20,"Celda 211 ":90,"Siempre a tu lado":70,"Tron: El Legado":20,"Que pena tu vida ":20,"Capitan America: El primer vengador":78,"El especialista":20,"Fuego Contra Fuego":78},"4D2Studio Diseno y Animacion":{"Piratas del 
Caribe 3: En El Fin del Mundo":90,"The Adventures of Tintin: The Secret of the Unicorn ":99,"Mundo Surreal":60,"Temple de Acero":99,"Laeon :El Profesional":90,"El secreto de sus ojos":92,"Fallen Art":96,"Kung fu panda 2":95,"Capitan America: El primer vengador":99,"Tron: El Legado":100,"Los indestructibles":9,"Rango ":20,"Megamente":20,"Thor":100,"Dias de ira":90,"Dorothy of Oz ":100,"Pandorum":65,"Los ilusionautas":20,"Siempre a tu lado":90,"El amante":90,"Cazador de demonios":75,"El u00faltimo maestro del aire":9,"Transformers: El lado oscuro de la luna":77,"X-Men: Primera generacion":90,"Los cazafantasmas":89,"Harry Potter and the Deathly Hallows":89,"Comer, rezar, amar":20,"Agora ":90,"Piratas del Caribe: Navegando aguas misteriosas":85},"Daniel Subauste":{"El rey leon":90,"La Masacre de Texas: El Origen":20,"Una loca pelicula de vampiros":20,"Calabozos y Dragones":205,"Piratas del Caribe 3: En El Fin del Mundo":20,"El Vengador":40,"Linterna Verde":60,"Luna Nueva":20,"Juan de los Muertos":100,"Harry Potter and the Deathly Hallows":70,"Avatar":60,"Dragones, destino de fuego":2,"Space Cowboys ":20,"Agora ":95,"Comer, rezar, amar":20,"X-Men: Primera generacion":90,"Piratas en el Callao":20,"Invasion del Mundo: Batalla Los Angeles":38,"El u00faltimo exorcismo":20,"Tesis ":90,"Daejame entrar":90,"Senales":20,"Red ":90,"Cowboys y Aliens":45,"Mision Imposible":55,"Megamind ":100,"Harry Potter y el Caliz de Fuego":75,"Los indestructibles":30,"La invasion":40,"La Sonrisa de Mona Lisa":90,"Pandorum":68,"Zodiaco":91,"Calabozos y Dragones 2 El Poder Mayor":80,"Transformers: El lado oscuro de la luna":60,"Corazon Valiente":90,"El cisne negro":90,"Mas alla de la vida":20,"Me enamorae en Nueva York":20,"Harry Potter y las Reliquias de la muerte: Parte II":65,"Millennium I: Los hombres que no amaban a las mujeres":60,"Un tonto en el amor":90,"Como Agua para Chocolate ":20,"Laeon :El Profesional":90,"La Naranja Mecanica":90,"Horton Hears a Who! 
":90,"Sin limite":85,"Enredados ":90,"El u00faltimo maestro del aire":20,"La chica de mis suenos":90,"La Pasion de Cristo":76,"Temple de Acero":60,"Crepu00fasculo":20,"Perdidos en Tokio":99,"Piratas del Caribe: Navegando aguas misteriosas":80,"Ga'Hoole :La Leyenda De Los Guardianes":90,"Los ilusionautas":20,"Wall Street: El dinero nunca duerme":90,"El planeta de los simios: Revolucion":85,"Kick - Ass":90,"Hannibal Rising ":20,"Planet Terror ":90,"El Codigo Da Vinci":20,"Mundo Surreal":90,"Red Social":60,"Machete":99,"Kung Fu Hustle":90,"Dragones: destino de fuego ":2,"Sanctum ":80,"Scott Pilgrim vs The World":75,"Seven":90,"Triste San Valentin":20,"Megamente":100,"u00bfComo saber si es amor?":20,"Kung fu panda 2":90,"El u00faltimo guerrero Chanka":100,"Thor":60,"Apocalypto ":20,"The Kids Are All Right ":90,"Super 8 ":78,"El Protegido":20,"TRON ":60,"El avispon verde":20,"Los pitufos":75,"El juego del miedo VII 3D":20,"Mongol, el emperador":90,"Tron: El Legado":85,"El Fin de Los Tiempos":20,"The Runaways ":40,"Capitan America: El primer vengador":90,"Millennium I - Los hombres que no amaban a las mujeres":60},"Laura Vanessa M":{"El secreto de sus ojos":99,"Wall Street: El dinero nunca duerme":70,"Atraccion peligrosa":44,"El escritor oculto":86,"La Vigilia":90,"Noches de encanto":90},"Veronica Zapata Gotelli":{"Sin lugar Para los Daebiles":90,"Mundo Surreal":20,"Temple de Acero":70,"Cartas a Julieta":55,"Carancho ":90,"Mary and Max ":90,"The Kids Are All Right ":70,"Wall Street: El dinero nunca duerme":66,"La Naranja Mecanica":90,"Rio":90,"Perros de Reserva":90,"Sin City ":90,"El Truco Final":90,"Lazos de sangre":60,"El cisne negro":90,"La cinta blanca":40,"Los indestructibles":75,"Fargo ":50,"Traffic ":90,"LadyKillers":60,"El juego ":90,"Seven":90,"Psicosis":90,"Crueldad Intolerable":79,"300 ":92,"El escritor oculto":90,"El peleador":80,"Octubre":90,"Al otro lado del corazon":92,"El Resplandor":99,"Source Code ":70,"Love and Other Impossible Pursuits ":90,"Gran 
Torino ":96,"The King's Speech":90,"La vida de los peces ":77,"Incendies ":91,"X-Men: Primera generacion":90,"El Hombre que Nunca Estuvo Alli":90,"La chica de la capa roja":20,"Batman Begins ":90,"Terciopelo Azul ":55,"Cuando Harry conocio a Sally":90,"Triste San Valentin":90,"Buenas Noches y Buena Suerte":82,"Un Hombre Serio":90,"Soy el nu00famero cuatro":20,"The Big Lebowski ":78,"Noches de encanto":20,"Red Social":90,"El discurso del rey ":93,"Ciudadano Kane ":90,"Quaemese despuaes de Leer":90,"Pase libre":90,"Agua para elefantes":20,"Rain Man ":90,"Conoceras al hombre de tus suenos":90,"En un rincon del corazon":20,"Dinner for Schmucks ":20,"Vaertigo":90,"Un cuento chino ":20,"Batman :El Caballero Oscuro":90,"Bastardos sin gloria":90,"Belleza Americana":90,"Una esposa de mentira":20,"Biutiful ":85},"Guillermo Pereyra":{"Paris en la mira":70,"Una loca pelicula de vampiros":81}}
|
File without changes
data/test/pearson_correlation.rb
ADDED
@@ -0,0 +1,27 @@
|
|
1
|
+
require 'minitest/autorun'
|
2
|
+
require_relative '../lib/suggestor'
|
3
|
+
|
4
|
+
describe Suggestor::Algorithms::PearsonCorrelation do
|
5
|
+
|
6
|
+
before do
|
7
|
+
data_string = File.read("test/numbers.json")
|
8
|
+
data = JSON.parse(data_string)
|
9
|
+
@algorithm = Suggestor::Algorithms::PearsonCorrelation.new(data)
|
10
|
+
end
|
11
|
+
|
12
|
+
describe "when building up recommendations" do
|
13
|
+
|
14
|
+
it "must return a list of shared items between two people" do
|
15
|
+
@algorithm.shared_items(1,2).must_be :==, ["1","2"]
|
16
|
+
end
|
17
|
+
|
18
|
+
it "must return 1 as similarity record if two elements have equal related values" do
|
19
|
+
@algorithm.similarity_score(1,1).must_be :==, 1
|
20
|
+
end
|
21
|
+
|
22
|
+
it "must return -1 as similarity record if two elements are totally distant" do
|
23
|
+
@algorithm.similarity_score(1,99).must_be :==, -1
|
24
|
+
end
|
25
|
+
|
26
|
+
end
|
27
|
+
end
|
data/test/suggestor_test.rb
CHANGED
@@ -3,46 +3,41 @@ require_relative '../lib/suggestor'
|
|
3
3
|
|
4
4
|
describe Suggestor::Engine do
|
5
5
|
before do
|
6
|
-
@
|
7
|
-
@data_string = File.read("test/test.json")
|
6
|
+
@data_string = File.read("test/numbers.json")
|
8
7
|
end
|
9
8
|
|
10
9
|
describe "when loading up the data structure" do
|
11
10
|
it "must raise an exception with invalid data" do
|
12
|
-
lambda{
|
11
|
+
lambda{ Suggestor::Engine.new("GIBBERISH") }.must_raise Suggestor::WrongInputFormat
|
13
12
|
end
|
14
|
-
|
15
|
-
it "must return an array structure if data is ok" do
|
16
|
-
@suggestor.load_data(@data_string).must_be_instance_of Hash
|
17
|
-
end
|
18
|
-
|
19
13
|
end
|
20
14
|
|
21
15
|
describe "when accessing the data after loading it" do
|
22
16
|
|
23
17
|
before do
|
24
|
-
@suggestor.
|
25
|
-
end
|
26
|
-
|
27
|
-
it "must return a similarty score between to elements" do
|
28
|
-
@suggestor.similarity_score_for("1","1").must_be :==, 1
|
18
|
+
@suggestor = Suggestor::Engine.new(@data_string)
|
29
19
|
end
|
30
20
|
|
31
21
|
it "must return similar items from the base one with euclidean distance" do
|
32
|
-
expected =
|
33
|
-
@suggestor.
|
22
|
+
expected = [["3", 0.14285714285714285], ["2", 0.14285714285714285]]
|
23
|
+
@suggestor.similar_to("1").must_be :==, expected
|
34
24
|
end
|
35
25
|
|
36
26
|
it "must return similar items from the base one with pearson correlation" do
|
37
|
-
|
38
|
-
|
27
|
+
@suggestor = Suggestor::Engine.new(@data_string,Suggestor::Algorithms::PearsonCorrelation)
|
28
|
+
expected = [["2", 0.0], ["1", 0.0]]
|
29
|
+
@suggestor.similar_to("3").must_be :==, expected
|
39
30
|
end
|
40
31
|
|
41
32
|
it "must return recommended items for the base one with euclidean distance" do
|
42
|
-
expected =
|
43
|
-
@suggestor.
|
33
|
+
expected = [["4", 2.6457513110645903]]
|
34
|
+
@suggestor.recommended_to("2").must_be :==, expected
|
44
35
|
end
|
45
36
|
|
46
|
-
|
37
|
+
it "must return similar related items from one of them" do
|
38
|
+
expected = [["5", 0.3333333333333333], ["3", 0.25], ["1", 0.12389934309929541], ["4", 0.0]]
|
39
|
+
@suggestor.similar_related_to("2").must_be :==, expected
|
40
|
+
end
|
47
41
|
|
42
|
+
end
|
48
43
|
end
|
metadata
CHANGED
@@ -2,7 +2,7 @@
 name: suggestor
 version: !ruby/object:Gem::Version
   prerelease:
-  version: 0.0.
+  version: 0.0.6
 platform: ruby
 authors:
 - Alvaro Pereyra
@@ -10,10 +10,19 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-09-
-
-
-
+date: 2011-09-24 00:00:00 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rake
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+  type: :runtime
+  version_requirements: *id001
 description: Suggestor allows you to get suggestions of related items in your data
 email:
 - alvaro@xendacentral.com
@@ -28,20 +37,19 @@ files:
 - Gemfile
 - README.md
 - Rakefile
--
+- examples/playing_around.rb
 - lib/suggestor.rb
 - lib/suggestor/algorithms/euclidean_distance.rb
 - lib/suggestor/algorithms/pearson_correlation.rb
 - lib/suggestor/algorithms/recommendation_algorithm.rb
-- lib/suggestor/datum.rb
 - lib/suggestor/engine.rb
 - lib/suggestor/version.rb
 - suggestor.gemspec
 - test/euclidean_test.rb
-- test/
+- test/movies.json
+- test/numbers.json
+- test/pearson_correlation.rb
 - test/suggestor_test.rb
-- test/test.json
-has_rdoc: true
 homepage: ""
 licenses: []
 
@@ -65,12 +73,13 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 
 rubyforge_project: suggestor
-rubygems_version: 1.
+rubygems_version: 1.8.10
 signing_key:
 specification_version: 3
 summary: Suggestor allows you to get suggestions of related items in your data
 test_files:
 - test/euclidean_test.rb
-- test/
+- test/movies.json
+- test/numbers.json
+- test/pearson_correlation.rb
 - test/suggestor_test.rb
-- test/test.json
data/demos/playing_around.rb
DELETED
@@ -1,16 +0,0 @@
-require_relative '../lib/suggestor'
-
-engine = Suggestor::Engine.new
-
-# I'm using test data of Users and their movie recommendations
-# Each user (identified by their ids) have a hash of their movies ids and
-# what they've rate them with
-json = File.read("test/test.json")
-
-engine.load_data(json)
-
-# Let's get some similar users
-puts engine.similar_items_to("2").inspect
-
-# So, after knowing them, why not having some recommendations?
-puts engine.recommented_related_items_for("2", algorithm: :euclidean_distance)
data/lib/suggestor/datum.rb
DELETED
data/test/pearon_correlation.rb
DELETED
@@ -1,34 +0,0 @@
-require 'minitest/autorun'
-require_relative '../lib/suggestor/algorithms/pearson_correlation'
-require_relative '../lib/suggestor/engine'
-
-describe Suggestor::Algorithms::PearsonCorrelation do
-
-  before do
-    @data_string = File.read("test/test.json")
-    @suggestor = Suggestor::Engine.new
-    @suggestor.load_data(@data_string)
-    @algorithm = Suggestor::Algorithms::PearsonCorrelation.new(@suggestor.collection)
-  end
-
-  describe "when building up recommendations" do
-
-    it "must return a list of shared items between two people" do
-      @algorithm.shared_items_between(1,2).must_be :==, ["1","2"]
-    end
-
-    it "must return 0 as similarity record if two elements hace no shared items" do
-      @algorithm.similarity_score_between(1,4).must_be :==, 0
-    end
-
-    it "must return 1 as similarity record if two elements have equal related values" do
-      @algorithm.similarity_score_between(1,1).must_be :==, 1
-    end
-
-    it "must return -1 as similarity record if two elements are totally distant" do
-      @algorithm.similarity_score_between(1,99).must_be :==, 0
-    end
-
-
-  end
-end