suggestor 0.0.3 → 0.0.6
- data/README.md +32 -12
- data/examples/playing_around.rb +42 -0
- data/lib/suggestor.rb +3 -1
- data/lib/suggestor/algorithms/euclidean_distance.rb +9 -10
- data/lib/suggestor/algorithms/pearson_correlation.rb +50 -44
- data/lib/suggestor/algorithms/recommendation_algorithm.rb +94 -47
- data/lib/suggestor/engine.rb +10 -37
- data/lib/suggestor/version.rb +1 -1
- data/suggestor.gemspec +1 -0
- data/test/euclidean_test.rb +8 -10
- data/test/movies.json +1 -0
- data/test/{test.json → numbers.json} +0 -0
- data/test/pearson_correlation.rb +27 -0
- data/test/suggestor_test.rb +15 -20
- metadata +22 -13
- data/demos/playing_around.rb +0 -16
- data/lib/suggestor/datum.rb +0 -13
- data/test/pearon_correlation.rb +0 -34
data/README.md
CHANGED
@@ -10,16 +10,15 @@ tastes) and alike.

 The gem needs a data structure like this:

-data = {"
+data = '{"Alvaro Pereyra Rabanal": {"Primer": 10, "Memento": 9}, "Gustavo Leon": {"The Matrix":8, "Harry Potter": 8}}'

-Each element will
+Following the example, each top-level element corresponds to a user and gives access to that user's related items (their movie reviews).

-In the example, the user "
+In the example, the user "Alvaro Pereyra Rabanal" has seen the movies "Primer" and "Memento", giving them ratings of 10 and 9, respectively. The same goes for the user "Gustavo Leon".

 After loading the data into the engine:

-engine = Suggestor::Engine.new
-engine.load_data(data)
+engine = Suggestor::Engine.new(data)

 We can start to get some results.

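If your reviews are stored as flat rows (for example, one database record per review), you need to fold them into the nested `{ user => { item => rating } }` JSON shown above before passing them to `Suggestor::Engine.new`. The sketch below is a minimal example that assumes rows of the form `[user, item, rating]`. `to_suggestor_json` is a hypothetical helper, not part of the gem.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): fold flat [user, item, rating]
# rows into the nested { user => { item => rating } } JSON string that
# Suggestor::Engine.new expects.
def to_suggestor_json(rows)
  nested = Hash.new { |hash, user| hash[user] = {} }
  rows.each { |user, item, rating| nested[user][item] = rating }
  JSON.generate(nested)
end

rows = [
  ["Alvaro Pereyra Rabanal", "Primer", 10],
  ["Alvaro Pereyra Rabanal", "Memento", 9],
  ["Gustavo Leon", "The Matrix", 8],
  ["Gustavo Leon", "Harry Potter", 8]
]

data = to_suggestor_json(rows)
puts data
```

You can then pass the resulting string to `Suggestor::Engine.new(data)`, as the README does.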
@@ -28,18 +27,29 @@ We can start to get some results.

 For example, we can get similar users:

-engine.
+engine.similar_to("Alvaro Pereyra Rabanal")

 Which will return a structure like

-
+[["label", similarity_score], ["label", similarity_score]]
+
+For example:
+
+[["Eogen Clase", 0.0001649620587264929], ["Daniel Subauste", 0.00011641443538998836], ["4D2Studio Diseno y Animacion", 8.548469823901521e-05], ["Rafael Lanfranco", 6.177033788374823e-05], ["Veronica Zapata Gotelli", 6.074965068950854e-05]]

 Thus, you can load the data and save their similarity scores for later use.

+You can limit the number of results by passing a "size" option:
+
+engine.similar_to("Alvaro Pereyra Rabanal", :size => 5)
+
 Now, that's fine and all, but what about Mr. Bob, who always ranks everything
 higher? ID4 may not be that good after all. If that happens, Suggestor allows you to change the algorithm used:

-
+algorithm = Suggestor::Algorithms::PearsonCorrelation
+engine = Suggestor::Engine.new(data, algorithm)
+
+engine.recommended_to("Alvaro Pereyra Rabanal")

 There are two implemented methods, Euclidean Distance and Pearson Correlation.

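Results come back as `[label, score]` pairs, highest score first, truncated to `size` entries. The sketch below shows that ordering and truncation step on a plain Hash. `top_scores` is a hypothetical name. In this diff, the gem's own `sort_results` does the equivalent with `results.sort { |a, b| a[1] <=> b[1] }.reverse` followed by `sorted[0, size]`.

```ruby
# Hypothetical helper: order { label => score } pairs by score, highest
# first, and keep at most `size` of them.
def top_scores(scores, size = 5)
  scores.sort_by { |_label, score| -score }.first(size)
end

scores = { "A" => 0.25, "B" => 0.75, "C" => 0.5 }
p top_scores(scores, 2)   # => [["B", 0.75], ["C", 0.5]]
```

One design note: `first(size)` never returns `nil`, whereas `sorted[0, size]` returns `nil` for a negative length such as the `size=-1` default in `sort_results`.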
@@ -54,11 +64,21 @@ take in mind if some user grades higher or lower and return more exact suggestio
 Most interestingly, the gem allows you to get suggestions based on the data.
 For example, which movies should user "2" watch, based on their reviews and the tastes of similar users?

-engine.
+engine.recommended_to("Alvaro Pereyra Rabanal")

 As before, the structure returned will be

-
+[["label", similarity_score], ["label", similarity_score]]
+
+But in this case, the labels are movies, paired with their recommendation scores. You
+can easily save this data to a DB, since movie ratings tend to stabilize over time and won't change that often.
+
+### Similar related items
+
+We can also invert the data that users have added, which lets us get
+similar related items. For example, let's say I'm on a movie's profile page and
+want to check which other movies are similar to it:
+
+engine.similar_related_to("Batman Begins ", :size => 5)

-
-can easily use this data to save it to a BD, since Movie ratings tend to estabilize on time and won't change that often.
+Now you can go and build your awesome recommendations website :)
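"Similar related items" works by swapping the two levels of the data. Items become the top-level keys and people become the inner keys, so the same similarity computation now compares movies by the ratings they received. The sketch below is a minimal standalone version of that transposition. It uses the Cat/Dog example from the `invert_collection` comment in `recommendation_algorithm.rb`; `invert` is a hypothetical name.

```ruby
# Hypothetical standalone version of the inversion step:
# { person => { item => rating } }  becomes  { item => { person => rating } }
def invert(collection)
  collection.each_with_object({}) do |(person, items), inverted|
    items.each do |item, rating|
      (inverted[item] ||= {})[person] = rating
    end
  end
end

pets = { "Cat" => { "1" => 10, "2" => 20 }, "Dog" => { "1" => 5, "2" => 15 } }
p invert(pets)
```

Inverting twice gives back the original data, which is a handy sanity check when writing tests for this step.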
data/examples/playing_around.rb
ADDED
@@ -0,0 +1,42 @@
+require_relative '../lib/suggestor'
+
+# I'm using test data of users and their movie reviews.
+# Each user has a hash of their reviews, mapping each movie
+# to the rating they gave it.
+json = File.read("test/movies.json")
+engine = Suggestor::Engine.new(json, Suggestor::Algorithms::EuclideanDistance)
+
+# Let's get some similar users
+name = "Alvaro Pereyra Rabanal"
+puts "Who is similar to #{name}"
+puts engine.similar_to(name, size: 5).inspect
+
+puts
+puts
+
+# So, now that we know them, why not get some recommendations?
+puts "Interesting! But I want to see some stuff at the movies, what to watch?"
+opts = {size: 5}
+results = engine.recommended_to("Alvaro Pereyra Rabanal", opts)
+
+puts results.inspect
+
+puts
+puts
+
+# That's good, but let's account for rating bias by using Pearson Correlation:
+puts "Adjust these results, please"
+engine = Suggestor::Engine.new(json, Suggestor::Algorithms::PearsonCorrelation)
+
+opts = {size: 5}
+results = engine.recommended_to("Alvaro Pereyra Rabanal", opts)
+puts results.inspect
+
+puts
+puts
+
+name = "Batman Begins "
+puts "Now that was nice. But which others are similar to '#{name}'"
+opts = {size: 10}
+results = engine.similar_related_to(name, opts)
+puts results.inspect
data/lib/suggestor.rb
CHANGED
data/lib/suggestor/algorithms/euclidean_distance.rb
CHANGED
@@ -1,5 +1,3 @@
-require_relative 'recommendation_algorithm'
-
 module Suggestor
   module Algorithms

@@ -24,21 +22,22 @@ module Suggestor

       include RecommendationAlgorithm

-      def
-        return 0.0 if
-
+      def similarity_score(first, second)
+        return 0.0 if nothing_shared?(first, second)
+        inverse_of_squares(first, second)
       end

-      def
-        1/(1+
+      def inverse_of_squares(first, second)
+        1/(1+Math.sqrt(sum_squares(first, second)))
       end

-      def
-
-          sum + (values_for(first)[item] - values_for(second)[item])**2
+      def sum_squares(first, second)
+        shared_items(first, second).inject(0.0) do |sum, item|
+          sum + ( values_for(first)[item] - values_for(second)[item] ) ** 2
         end
       end

     end
+
   end
 end
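With the helper methods stripped away, the Euclidean score in the hunk above is `1 / (1 + sqrt(sum of squared rating differences))` over the items both people rated, and `0.0` when they share nothing. The sketch below is a minimal standalone version that works on two plain `{ item => rating }` hashes. The gem itself looks ratings up through `values_for` and `shared_items`; `euclidean_similarity` is a hypothetical name.

```ruby
# Minimal sketch of the Euclidean-distance similarity score:
# 1 / (1 + sqrt(sum of squared differences over shared items)).
# Identical ratings give 1.0; nothing in common gives 0.0.
def euclidean_similarity(first, second)
  shared = first.keys & second.keys
  return 0.0 if shared.empty?

  sum_squares = shared.inject(0.0) do |sum, item|
    sum + (first[item] - second[item])**2
  end
  1 / (1 + Math.sqrt(sum_squares))
end

p euclidean_similarity({ "x" => 1 }, { "x" => 4 })                       # => 0.25
p euclidean_similarity({ "x" => 1, "y" => 2 }, { "x" => 1, "y" => 2 })    # => 1.0
p euclidean_similarity({ "x" => 1 }, { "y" => 4 })                        # => 0.0
```

Every rating gap increases the distance. As a result, someone who consistently rates everything higher than you scores as dissimilar, which is the "Mr. Bob" problem the README describes.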
data/lib/suggestor/algorithms/pearson_correlation.rb
CHANGED
@@ -1,5 +1,3 @@
-require_relative 'recommendation_algorithm'
-
 module Suggestor
   module Algorithms

@@ -18,86 +16,94 @@ module Suggestor
     # the closest distance to all of them. If the two users have the same
     # ratings, it would show as a perfect diagonal (score of 1)

-    # The closest the movies to the line are, the more similar their tastes
+    # The closest the movies to the line are, the more similar their tastes
+    # are.

     # The great thing about using Pearson Correlation is that it works with
     # bias to valuating the results. Thus, a user that always rates movies
     # with great scores won't impact and mess up the results.

-    # It's probably a best fit for subjetive reviews (movies reviews, profile
+    # It's probably a best fit for subjetive reviews (movies reviews, profile
+    # points, etc).

-    # More info at:
+    # More info at:
+    # http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

     class PearsonCorrelation

       include RecommendationAlgorithm

-      def
-        return
+      def similarity_score(first, second)
+        return -1.0 if nothing_shared?(first, second)

-
-        numerator = difference_from_total_and_normalize_values
-        # 10.5 / 0.0 /
-        denominator = square_root_from_differences_of_sums
+        process_values(first, second)

-
+        numerator = difference_from_values
+        denominator = square_root_from_differences

+        return 0.0 if denominator == 0
         numerator / denominator
-
       end

       private

-      def
-
-
-        @total_related_items = shared_items.size
+      def process_values(first, second)
+        items = shared_items(first, second)
+        @total_related_items = items.size.to_f

-
-        first_values = values_for(first)
+        first_values = values_for(first)
         second_values = values_for(second)

-
-        @second_square_values_sum = @products_sum = 0.0
+        create_helper_variables

-
+        items.each do |item|

-
-          # For ex., the rating of the same movie by different users
-          first_value = first_values[item]
+          first_value = first_values[item]
           second_value = second_values[item]

-
-
-
-          @first_values_sum += first_value
-          @second_values_sum += second_value
-
-          # Adds the squares of both elements
-          @first_square_values_sum += first_value ** 2
-          @second_square_values_sum += second_value ** 2
+          append_values(first_value, second_value)
+          append_squares(first_value, second_value)
+          append_product(first_value, second_value)

-          # Adds the product of both values
-          @products_sum += first_value*second_value
         end
+      end

+      def append_values(first_value, second_value)
+        @first_values_sum += first_value
+        @second_values_sum += second_value
       end

-      def
+      def append_squares(first_value, second_value)
+        @first_square_values_sum += ( first_value ** 2 )
+        @second_square_values_sum += ( second_value ** 2 )
+      end
+
+      def append_product(first_value, second_value)
+        @products_sum += first_value * second_value
+      end
+
+      def difference_from_values
         product = @first_values_sum * @second_values_sum
         normalized = product / @total_related_items
         @products_sum - normalized
       end

-      def
-
-
-
+      def square_root_from_differences
+        power_left_result = ( @first_values_sum ** 2 ) / @total_related_items
+        equation_left = @first_square_values_sum - power_left_result
+
+        power_right_result = ( @second_values_sum ** 2 )/ @total_related_items
+        equation_right = @second_square_values_sum - power_right_result

-
-
-        Math.sqrt(equation_left * equation_right)
+        Math.sqrt( equation_left * equation_right )
+      end

+      def create_helper_variables
+        @first_values_sum = 0.0
+        @second_values_sum = 0.0
+        @first_square_values_sum = 0.0
+        @second_square_values_sum = 0.0
+        @products_sum = 0.0
       end

     end
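The refactored Pearson class accumulates five running sums over the shared items (of x, y, x squared, y squared, and x times y). It then computes `(Sxy - Sx*Sy/n) / sqrt((Sxx - Sx**2/n) * (Syy - Sy**2/n))`. The sketch below is a minimal standalone version of the same formula on two `{ item => rating }` hashes. It keeps the diff's `-1.0` for "nothing shared" and its `0.0` for a zero denominator. `pearson_similarity` is a hypothetical name. The example also shows why the README suggests this algorithm for a rater like Mr. Bob: shifting every rating by a constant leaves the correlation unchanged.

```ruby
# Minimal sketch of the sum-based Pearson correlation used above.
# Returns -1.0 when nothing is shared (as the diff does) and 0.0 when the
# denominator is zero, e.g. when one side gave every item the same rating.
def pearson_similarity(first, second)
  shared = first.keys & second.keys
  return -1.0 if shared.empty?

  n  = shared.size.to_f
  xs = shared.map { |item| first[item].to_f }
  ys = shared.map { |item| second[item].to_f }

  sum_x  = xs.inject(0.0, :+)
  sum_y  = ys.inject(0.0, :+)
  sum_xx = xs.inject(0.0) { |acc, x| acc + x**2 }
  sum_yy = ys.inject(0.0) { |acc, y| acc + y**2 }
  sum_xy = xs.zip(ys).inject(0.0) { |acc, (x, y)| acc + x * y }

  numerator = sum_xy - (sum_x * sum_y / n)
  spread    = (sum_xx - sum_x**2 / n) * (sum_yy - sum_y**2 / n)
  return 0.0 unless spread > 0   # zero (or rounding-negative) denominator

  numerator / Math.sqrt(spread)
end

me  = { "m1" => 1, "m2" => 2, "m3" => 3 }
bob = { "m1" => 11, "m2" => 12, "m3" => 13 }   # always rates 10 points higher

p pearson_similarity(me, bob)                                  # => 1.0
p pearson_similarity(me, { "m1" => 3, "m2" => 2, "m3" => 1 })  # => -1.0
```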
data/lib/suggestor/algorithms/recommendation_algorithm.rb
CHANGED
@@ -8,50 +8,86 @@ module Suggestor
         @collection = collection
       end

-      #
-
-
-
-
-
-        compare_to.delete(main)
-
-        # return results based on their score
-        compare_to.keys.inject({}) do |result, other|
-          result.merge!({other => similarity_score_between(main,other)})
-        end
+      # Ex. Similar users based on their movies reviews
+      def similar_to(main, opts={})
+        opts.merge!(default_options)
+
+        collection = remove_self(main)
+        results = order_by_similarity_score(main,collection)

+        sort_results(results,opts[:size])
       end

-      #
-
-
-      # and how it compares with others
-      def recommented_related_items_for(main)
+      # Ex. a user will get movie recommendations
+      def recommended_to(main, opts={})
+        opts.merge!(default_options)

         @similarities = @totals = Hash.new(0)
-        @main = main

-        create_similarities_totals
-        generate_rankings
+        create_similarities_totals(main)
+        results = generate_rankings

+        sort_results(results,opts[:size])
       end

-
-
+      # Ex. what other movies are related to a given one
+      def similar_related_to(main, opts={})
+        opts.merge!(default_options)
+
+        collection = invert_collection
+        engine = self.class.new(collection)
+
+        engine.similar_to(main,opts)
       end

-      def
-        return [] unless values_for(first) && values_for(second)
+      def shared_items(first, second)
+        return [] unless values_for(first) && values_for(second)
+
         related_keys_for(first).select do |item|
           related_keys_for(second).include? item
         end
-      end
+      end

       private

-      def
-
+      def default_options
+        {size: 5}
+      end
+
+      def nothing_shared?(first, second)
+        shared_items(first, second).empty?
+      end
+
+      def remove_self(main)
+        cleaned = collection.dup
+        cleaned.delete(main)
+        cleaned
+      end
+
+
+      # changes { "Cat": {"1": 10, "2":20}, "Dog": {"1":5, "2": 15} }
+      # to {"1": {"Cat": 10, "Dog": 5}, "2": {"Cat": 20, "Dog": 15}
+      def invert_collection
+        results = {}
+
+        collection.keys.each do |main|
+          collection[main].keys.each do |item|
+            results[item] ||= {}
+            results[item][main] = collection[main][item]
+          end
+        end
+
+        results
+      end
+
+      def order_by_similarity_score(main,collection)
+        result = collection.keys.inject({}) do |res, other|
+          res.merge!({other => similarity_score(main, other)})
+        end
+      end
+
+      def already_has?(main, related)
+        collection[main].has_key?(related)
       end

       def values_for(id)
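Each public method above starts with `opts.merge!(default_options)`. With `Hash#merge!`, keys from the argument overwrite matching keys in the receiver, and the receiver is modified in place. As written, then, the default `size: 5` replaces any `:size` the caller passed. If the intent is for caller options to win, the usual Ruby idiom is to merge the caller's hash into the defaults instead. Here is a sketch of that idiom; `DEFAULT_OPTIONS` and `with_defaults` are hypothetical names.

```ruby
# Hypothetical names; the defaults mirror default_options in the diff.
DEFAULT_OPTIONS = { size: 5 }.freeze

# Caller-supplied options win; defaults only fill in missing keys.
# Hash#merge returns a new Hash, so the caller's hash is left untouched.
def with_defaults(opts = {})
  DEFAULT_OPTIONS.merge(opts)
end

puts with_defaults({})[:size]          # 5
puts with_defaults(size: 10)[:size]    # 10

# For contrast: merge! lets the argument's keys overwrite the receiver's,
# and it mutates the receiver.
caller_opts = { size: 10 }
caller_opts.merge!(DEFAULT_OPTIONS)
puts caller_opts[:size]                # 5
```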
@@ -62,47 +98,58 @@ module Suggestor
         values_for(id).keys
       end

-      def add_to_totals(other,item,score)
-        @totals[item]
+      def add_to_totals(other, item, score)
+        @totals[item] += collection[other][item]*score
         @similarities[item] += score
       end

-      def
-
+      def sort_results(results,size=-1)
+        sorted = results.sort{|a,b| a[1] <=> b[1]}.reverse
+        sorted[0, size]
+      end

+      def generate_rankings
+        rankings = {}
+
         @totals.each_pair do |item, total|
-          normalized_value = (total / @similarities[item])
-
+          normalized_value = (total / Math.sqrt(@similarities[item]))
+          rankings.merge!( { item => normalized_value} )
         end

-
+        rankings
+      end
+
+      def something_in_common?(score)
+        score > 0
+      end

+      def same_item?(main, other)
+        other == main
       end

-      def create_similarities_totals
+      def create_similarities_totals(main)

         collection.keys.each do |other|

-
-
-
-
-          next
-
-          # will compare each the results but only for related items
-          # that the main item doesn't already have
-          # For ex., if they have already saw a movie they won't
-          # get it suggested
+          next if same_item?(main,other)
+
+          score = similarity_score(main, other)
+
+          next unless something_in_common?(score)
+
           collection[other].keys.each do |item|

-            unless
-              add_to_totals(other,item,score)
+            unless already_has?(main, item)
+              add_to_totals(other, item, score)
             end

           end
+
         end
+
       end

+
     end
   end
 end
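The recommendation path works in three steps:

1. Each other person's ratings are weighted by their similarity to you.
2. Items you have already rated are skipped.
3. Each remaining item's weighted total is normalized.

Below is a minimal standalone sketch of that idea using the common weighted average (total divided by the sum of similarities). It differs from the code in this diff in two deliberate ways. First, the diff divides by `Math.sqrt(@similarities[item])` instead. Second, the diff's `@similarities = @totals = Hash.new(0)` binds both names to one and the same Hash object, so its two running sums are not actually kept apart; the sketch uses two separate hashes. `weighted_recommendations` and its `similarity` argument are hypothetical.

```ruby
# Hypothetical sketch of similarity-weighted recommendations.
#   ratings:    { person => { item => rating } }
#   similarity: any callable returning a score for (main, other)
def weighted_recommendations(ratings, main, similarity)
  totals   = Hash.new(0.0)   # sum of rating * similarity, per item
  sim_sums = Hash.new(0.0)   # sum of similarity, per item (separate Hash)

  ratings.each do |other, items|
    next if other == main
    score = similarity.call(main, other)
    next unless score > 0                   # ignore people with nothing in common

    items.each do |item, rating|
      next if ratings[main].key?(item)      # don't suggest what main already rated
      totals[item]   += rating * score
      sim_sums[item] += score
    end
  end

  totals.map { |item, total| [item, total / sim_sums[item]] }
        .sort_by { |_item, value| -value }
end

ratings = {
  "ann" => { "m1" => 5 },
  "bob" => { "m1" => 5, "m2" => 4 },
  "cy"  => { "m1" => 1, "m2" => 2, "m3" => 3 }
}
# Fixed, made-up similarity scores to keep the example small.
similarity = ->(_main, other) { { "bob" => 1.0, "cy" => 0.5 }.fetch(other, 0.0) }

p weighted_recommendations(ratings, "ann", similarity)
```

Here "m2" ranks above "m3": bob, the more similar person, rated it, and "m1" is skipped because ann has already rated it.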
data/lib/suggestor/engine.rb
CHANGED
@@ -1,6 +1,4 @@
 require 'json'
-require_relative 'algorithms/euclidean_distance'
-require_relative 'algorithms/pearson_correlation'

 module Suggestor

@@ -8,50 +6,25 @@ module Suggestor

   class Engine

-
-
-
-      @collection = {}
-    end
-
-    def load_data(input)
-      add_to_collection(input)
+    def initialize(input, algorithm = Algorithms::EuclideanDistance)
+      @collection = parse_from_json(input)
+      @algorithm = algorithm.new(@collection)
     end
-
-    def
-
-      strategy_for(opts[:algorithm]).similarity_score_between(first, second)
+
+    def similar_to(item, opts={})
+      @algorithm.similar_to(item, opts)
     end

-    def
-
-      strategy_for(opts[:algorithm]).similar_items_to(item)
+    def recommended_to(item, opts={})
+      @algorithm.recommended_to(item, opts)
     end

-    def
-
-      strategy_for(opts[:algorithm]).recommented_related_items_for(item)
+    def similar_related_to(item, opts={})
+      @algorithm.similar_related_to(item, opts)
     end

     private
-
-    def strategy_for(algorithm)
-      constantize(classify(algorithm)).new(collection)
-    end
-
-    # based on Rail's code
-    def classify(name)
-      name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
-    end
-
-    def constantize(name)
-      Suggestor::Algorithms.const_get(name)
-    end

-    def add_to_collection(input)
-      @collection.merge! parse_from_json(input)
-    end
-
     def parse_from_json(json)
       JSON.parse(json)
     rescue Exception => ex
data/lib/suggestor/version.rb
CHANGED
data/suggestor.gemspec
CHANGED
data/test/euclidean_test.rb
CHANGED
@@ -1,29 +1,27 @@
 require 'minitest/autorun'
-
-require_relative '../lib/suggestor
+require 'json'
+require_relative '../lib/suggestor'

 describe Suggestor::Algorithms::EuclideanDistance do

   before do
-
-
-    @
-    @algorithm = Suggestor::Algorithms::EuclideanDistance.new(@suggestor.collection)
+    data_string = File.read("test/numbers.json")
+    data = JSON.parse(data_string)
+    @algorithm = Suggestor::Algorithms::EuclideanDistance.new(data)
   end

   describe "when building up recommendations" do

     it "must return a list of shared items between two people" do
-      @algorithm.
+      @algorithm.shared_items(1,2).must_be :==, ["1","2"]
     end

     it "must return 0 as similarity record if two elements hace no shared items" do
-      @algorithm.
+      @algorithm.similarity_score(1,99).must_be :==, 0
     end

     it "must return 1 as similarity record if two elements have equal related values" do
-
-      @algorithm.similarity_score_between(1,1).must_be :==, 1
+      @algorithm.similarity_score(1,1).must_be :==, 1
     end

   end
data/test/movies.json
ADDED
@@ -0,0 +1 @@
{"Alvaro Pereyra Rabanal":{"Enterrado":90,"La reunion del diablo":20,"Scott Pilgrim vs The World":80,"El avispon verde":20,"Se dice de mi":89,"Un tonto en el amor":75,"El secreto de sus ojos":99,"Wall Street: El dinero nunca duerme":90,"Super 8 ":90,"Kung fu panda 2":92,"La revelacion":70,"Rio":90,"El cisne negro":70,"Tron: El Legado":90,"Invasion del Mundo: Batalla Los Angeles":50,"Peluda venganza":20,"Megamente":90,"Dias de ira":76,"El especialista":20,"u00bfQue paso ayer? 2":20,"El escritor oculto":80,"Red ":33,"Amor a distancia":90,"Resident Evil 4: La Resurreccion":50,"Octubre":55,"Sin limite":65,"Love Actually ":90,"El gran concierto":90,"Perdidos en Tokio":90,"Pulp Fiction ":87,"Cazador de demonios":20,"Loco y estu00fapido amor":65,"Amor por contrato":66,"Contracorriente":90,"Actividad paranormal 2":20,"La Vigilia":15,"El rey leon":90,"Transformers: El lado oscuro de la luna":20,"Cuando Harry conocio a Sally":95,"Machete":90,"Avatar":80,"Soy el nu00famero cuatro":90,"Toy Story 3 ":99,"El discurso del rey ":90,"Noches de encanto":90,"Enredados ":55,"Red Social":80,"La otra familia ":70,"Tesis ":90,"Harry Potter and the Deathly Hallows":76,"El planeta de los simios: Revolucion":80,"Biutiful ":20,"Harry Potter y las Reliquias de la muerte: Parte II":60,"Las Cronicas de Narnia: La travesia del viajero del alba":63,"El regreso de la nana magica":20},"Angel Velasquez":{"Los indestructibles":95},"Rafael Lanfranco":{"El rey leon":20,"Enter the Dragon ":78,"Un hombre solitario":20,"Desconocido ":20,"La historia sin fin":90,"Pi: Fe en el caos":66,"Harry Potter and the Deathly Hallows":86,"Dark City ":89,"Juego de Traiciones":78,"X-Men: Primera generacion":89,"La Aldea":20,"Agora ":90,"Comer, rezar, amar":20,"La duda":89,"Invasion del Mundo: Batalla Los Angeles":86,"El nuevo entrenador":90,"Marea Roja":86,"Sniper":77,"Conoceras al hombre de tus suenos":90,"El peleador":54,"Enterrado":50,"Mary and Max ":90,"Gran Torino ":86,"Senales":20,"Real Steel ":10,"Breakin' 
":94,"Tesis ":66,"Red ":20,"El secreto de sus ojos":75,"El Senor De Los Anillos: Las Dos Torres":50,"Cowboys y Aliens":82,"Mision Imposible":91,"Match Point":64,"Loco por ella":77,"La revelacion":90,"El origen":86,"El Pianista":90,"Lazos de sangre":94,"El Informante":100,"Mas alla de la vida":90,"Carancho ":20,"Harry Potter y las Reliquias de la muerte: Parte II":30,"Jumper ":20,"Apocalypse Now ":92,"Source Code ":90,"Rango ":20,"Hot Fuzz ":43,"La fuente de la vida":91,"The Doubt ":89,"Medianoche en Paris":89,"Sin limite":77,"Rapidos y Furiosos 5":90,"El discurso del rey ":81,"127 horas":85,"Following":90,"Serenity ":60,"Una propuesta atrevida":20,"Temple de Acero":81,"Piratas del Caribe: Navegando aguas misteriosas":84,"Ex, todos tenemos uno":90,"Los ilusionautas":20,"Agente Salt":90,"Star Wars: Episodio IV - Una nueva esperanza":89,"El planeta de los simios: Revolucion":89,"Kick - Ass":90,"Cyrus":90,"El Exterminador 2: El Dia del Juicio Final":87,"Invasion del mundo":82,"Mundo Surreal":92,"Kung Fu Hustle":90,"El mensajero ":90,"Red Social":94,"I saw the devil":73,"El Club de la Pelea":90,"El Senor de los Anillos: El retorno del Rey":66,"Scott Pilgrim vs The World":90,"Terminator II ":87,"Preciosa":90,"Kung fu panda 2":92,"En un rincon del corazon":20,"12 hombres molestos":55,"Thirteen Days ":11,"E.T : El Extraterrestre":90,"Harry Potter y el Prisionero de Azkaban":11,"Way of the Dragon ":90,"Los agentes del destino":75,"True Romance ":75,"Pulp Fiction ":100,"Thor":60,"8 Mile":70,"Super 8 ":90,"The Wolfman ":20,"Los imperdonables":92,"El Protegido":90,"Mi nombre es John Lennon":90,"Hable con ella ":90,"Bastardos sin gloria":80,"The Big Lebowski ":90,"social network":90,"Camino al Oscar":99,"Winnie Pooh ":50,"Beginners":90,"El Fin de Los Tiempos":20,"Celda 211 ":90,"Siempre a tu lado":70,"Tron: El Legado":20,"Que pena tu vida ":20,"Capitan America: El primer vengador":78,"El especialista":20,"Fuego Contra Fuego":78},"4D2Studio Diseno y Animacion":{"Piratas del 
Caribe 3: En El Fin del Mundo":90,"The Adventures of Tintin: The Secret of the Unicorn ":99,"Mundo Surreal":60,"Temple de Acero":99,"Laeon :El Profesional":90,"El secreto de sus ojos":92,"Fallen Art":96,"Kung fu panda 2":95,"Capitan America: El primer vengador":99,"Tron: El Legado":100,"Los indestructibles":9,"Rango ":20,"Megamente":20,"Thor":100,"Dias de ira":90,"Dorothy of Oz ":100,"Pandorum":65,"Los ilusionautas":20,"Siempre a tu lado":90,"El amante":90,"Cazador de demonios":75,"El u00faltimo maestro del aire":9,"Transformers: El lado oscuro de la luna":77,"X-Men: Primera generacion":90,"Los cazafantasmas":89,"Harry Potter and the Deathly Hallows":89,"Comer, rezar, amar":20,"Agora ":90,"Piratas del Caribe: Navegando aguas misteriosas":85},"Daniel Subauste":{"El rey leon":90,"La Masacre de Texas: El Origen":20,"Una loca pelicula de vampiros":20,"Calabozos y Dragones":205,"Piratas del Caribe 3: En El Fin del Mundo":20,"El Vengador":40,"Linterna Verde":60,"Luna Nueva":20,"Juan de los Muertos":100,"Harry Potter and the Deathly Hallows":70,"Avatar":60,"Dragones, destino de fuego":2,"Space Cowboys ":20,"Agora ":95,"Comer, rezar, amar":20,"X-Men: Primera generacion":90,"Piratas en el Callao":20,"Invasion del Mundo: Batalla Los Angeles":38,"El u00faltimo exorcismo":20,"Tesis ":90,"Daejame entrar":90,"Senales":20,"Red ":90,"Cowboys y Aliens":45,"Mision Imposible":55,"Megamind ":100,"Harry Potter y el Caliz de Fuego":75,"Los indestructibles":30,"La invasion":40,"La Sonrisa de Mona Lisa":90,"Pandorum":68,"Zodiaco":91,"Calabozos y Dragones 2 El Poder Mayor":80,"Transformers: El lado oscuro de la luna":60,"Corazon Valiente":90,"El cisne negro":90,"Mas alla de la vida":20,"Me enamorae en Nueva York":20,"Harry Potter y las Reliquias de la muerte: Parte II":65,"Millennium I: Los hombres que no amaban a las mujeres":60,"Un tonto en el amor":90,"Como Agua para Chocolate ":20,"Laeon :El Profesional":90,"La Naranja Mecanica":90,"Horton Hears a Who! 
":90,"Sin limite":85,"Enredados ":90,"El u00faltimo maestro del aire":20,"La chica de mis suenos":90,"La Pasion de Cristo":76,"Temple de Acero":60,"Crepu00fasculo":20,"Perdidos en Tokio":99,"Piratas del Caribe: Navegando aguas misteriosas":80,"Ga'Hoole :La Leyenda De Los Guardianes":90,"Los ilusionautas":20,"Wall Street: El dinero nunca duerme":90,"El planeta de los simios: Revolucion":85,"Kick - Ass":90,"Hannibal Rising ":20,"Planet Terror ":90,"El Codigo Da Vinci":20,"Mundo Surreal":90,"Red Social":60,"Machete":99,"Kung Fu Hustle":90,"Dragones: destino de fuego ":2,"Sanctum ":80,"Scott Pilgrim vs The World":75,"Seven":90,"Triste San Valentin":20,"Megamente":100,"u00bfComo saber si es amor?":20,"Kung fu panda 2":90,"El u00faltimo guerrero Chanka":100,"Thor":60,"Apocalypto ":20,"The Kids Are All Right ":90,"Super 8 ":78,"El Protegido":20,"TRON ":60,"El avispon verde":20,"Los pitufos":75,"El juego del miedo VII 3D":20,"Mongol, el emperador":90,"Tron: El Legado":85,"El Fin de Los Tiempos":20,"The Runaways ":40,"Capitan America: El primer vengador":90,"Millennium I - Los hombres que no amaban a las mujeres":60},"Laura Vanessa M":{"El secreto de sus ojos":99,"Wall Street: El dinero nunca duerme":70,"Atraccion peligrosa":44,"El escritor oculto":86,"La Vigilia":90,"Noches de encanto":90},"Veronica Zapata Gotelli":{"Sin lugar Para los Daebiles":90,"Mundo Surreal":20,"Temple de Acero":70,"Cartas a Julieta":55,"Carancho ":90,"Mary and Max ":90,"The Kids Are All Right ":70,"Wall Street: El dinero nunca duerme":66,"La Naranja Mecanica":90,"Rio":90,"Perros de Reserva":90,"Sin City ":90,"El Truco Final":90,"Lazos de sangre":60,"El cisne negro":90,"La cinta blanca":40,"Los indestructibles":75,"Fargo ":50,"Traffic ":90,"LadyKillers":60,"El juego ":90,"Seven":90,"Psicosis":90,"Crueldad Intolerable":79,"300 ":92,"El escritor oculto":90,"El peleador":80,"Octubre":90,"Al otro lado del corazon":92,"El Resplandor":99,"Source Code ":70,"Love and Other Impossible Pursuits ":90,"Gran 
Torino ":96,"The King's Speech":90,"La vida de los peces ":77,"Incendies ":91,"X-Men: Primera generacion":90,"El Hombre que Nunca Estuvo Alli":90,"La chica de la capa roja":20,"Batman Begins ":90,"Terciopelo Azul ":55,"Cuando Harry conocio a Sally":90,"Triste San Valentin":90,"Buenas Noches y Buena Suerte":82,"Un Hombre Serio":90,"Soy el nu00famero cuatro":20,"The Big Lebowski ":78,"Noches de encanto":20,"Red Social":90,"El discurso del rey ":93,"Ciudadano Kane ":90,"Quaemese despuaes de Leer":90,"Pase libre":90,"Agua para elefantes":20,"Rain Man ":90,"Conoceras al hombre de tus suenos":90,"En un rincon del corazon":20,"Dinner for Schmucks ":20,"Vaertigo":90,"Un cuento chino ":20,"Batman :El Caballero Oscuro":90,"Bastardos sin gloria":90,"Belleza Americana":90,"Una esposa de mentira":20,"Biutiful ":85},"Guillermo Pereyra":{"Paris en la mira":70,"Una loca pelicula de vampiros":81}}
data/test/{test.json → numbers.json}
File without changes
data/test/pearson_correlation.rb
ADDED
@@ -0,0 +1,27 @@
+require 'minitest/autorun'
+require_relative '../lib/suggestor'
+
+describe Suggestor::Algorithms::PearsonCorrelation do
+
+  before do
+    data_string = File.read("test/numbers.json")
+    data = JSON.parse(data_string)
+    @algorithm = Suggestor::Algorithms::PearsonCorrelation.new(data)
+  end
+
+  describe "when building up recommendations" do
+
+    it "must return a list of shared items between two people" do
+      @algorithm.shared_items(1,2).must_be :==, ["1","2"]
+    end
+
+    it "must return 1 as similarity record if two elements have equal related values" do
+      @algorithm.similarity_score(1,1).must_be :==, 1
+    end
+
+    it "must return -1 as similarity record if two elements are totally distant" do
+      @algorithm.similarity_score(1,99).must_be :==, -1
+    end
+
+  end
+end
data/test/suggestor_test.rb
CHANGED
@@ -3,46 +3,41 @@ require_relative '../lib/suggestor'
 
 describe Suggestor::Engine do
   before do
-    @
-    @data_string = File.read("test/test.json")
+    @data_string = File.read("test/numbers.json")
   end
 
   describe "when loading up the data structure" do
     it "must raise an exception with invalid data" do
-      lambda{
+      lambda{ Suggestor::Engine.new("GIBBERISH") }.must_raise Suggestor::WrongInputFormat
     end
-
-    it "must return an array structure if data is ok" do
-      @suggestor.load_data(@data_string).must_be_instance_of Hash
-    end
-
   end
 
   describe "when accesing the data after load_dataing it" do
 
     before do
-      @suggestor.
-    end
-
-    it "must return a similarty score between to elements" do
-      @suggestor.similarity_score_for("1","1").must_be :==, 1
+      @suggestor = Suggestor::Engine.new(@data_string)
     end
 
     it "must return similar items from the base one with euclidean distance" do
-      expected =
-      @suggestor.
+      expected = [["3", 0.14285714285714285], ["2", 0.14285714285714285]]
+      @suggestor.similar_to("1").must_be :==, expected
     end
 
     it "must return similar items from the base one with pearson correlation" do
-
-
+      @suggestor = Suggestor::Engine.new(@data_string,Suggestor::Algorithms::PearsonCorrelation)
+      expected = [["2", 0.0], ["1", 0.0]]
+      @suggestor.similar_to("3").must_be :==, expected
     end
 
     it "must return similar items from the base one with euclidean distance" do
-      expected =
-      @suggestor.
+      expected = [["4", 2.6457513110645903]]
+      @suggestor.recommended_to("2").must_be :==, expected
     end
 
-
+    it "must return similar related items from one of them" do
+      expected = [["5", 0.3333333333333333], ["3", 0.25], ["1", 0.12389934309929541], ["4", 0.0]]
+      @suggestor.similar_related_to("2").must_be :==, expected
+    end
 
+  end
 end
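In 0.0.6 the tests build the engine straight from the JSON string and read rankings back as `[[key, score], ...]` arrays from `similar_to` (Euclidean by default, judging by the test names) and `similar_related_to`. This diff doesn't show `euclidean_distance.rb`, so what follows is only a self-contained sketch of a common Euclidean-based similarity over a hash shaped like the README's data. It is not the gem's implementation. `euclidean_similarity`, `similar_to` and `transpose` are hypothetical plain methods, and treating "related items" as the same ranking run on a transposed hash is an assumption about what `similar_related_to` does.

```ruby
# Hypothetical helpers; the gem's own EuclideanDistance class is not shown in this diff.

# Similarity in (0, 1]: 1 / (1 + sum of squared value differences over the
# items both keys rated). Returns 0.0 when they share no items.
def euclidean_similarity(data, a, b)
  shared = data[a].keys & data[b].keys
  return 0.0 if shared.empty?

  sum_of_squares = shared.sum { |item| (data[a][item] - data[b][item])**2 }
  1.0 / (1 + sum_of_squares)
end

# Rank every other key by similarity, best first, as [[key, score], ...],
# the same shape the tests expect back from similar_to.
def similar_to(data, key)
  (data.keys - [key])
    .map { |other| [other, euclidean_similarity(data, key, other)] }
    .sort_by { |_, score| -score }
end

# Flip {user => {item => rating}} into {item => {user => rating}} so the same
# ranking compares items instead of users (an assumed approach, see above).
def transpose(data)
  data.each_with_object(Hash.new { |h, k| h[k] = {} }) do |(user, ratings), flipped|
    ratings.each { |item, rating| flipped[item][user] = rating }
  end
end

# Made-up sample data for illustration only.
data = {
  "1" => { "a" => 5, "b" => 3 },
  "2" => { "a" => 4, "b" => 3 },
  "3" => { "a" => 1, "c" => 2 }
}

p similar_to(data, "1")
p similar_to(transpose(data), "a")
```

Note that with this formula a key sharing a single, closely matching value (item `c` above) can outrank one that agrees on more items, which is a known trade-off of unweighted Euclidean scores.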
metadata
CHANGED
@@ -2,7 +2,7 @@
 name: suggestor
 version: !ruby/object:Gem::Version
   prerelease:
-  version: 0.0.
+  version: 0.0.6
 platform: ruby
 authors:
 - Alvaro Pereyra
@@ -10,10 +10,19 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-09-
-
-
-
+date: 2011-09-24 00:00:00 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rake
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+  type: :runtime
+  version_requirements: *id001
 description: Suggestor allows you to get suggestions of related items in your data
 email:
 - alvaro@xendacentral.com
@@ -28,20 +37,19 @@ files:
 - Gemfile
 - README.md
 - Rakefile
--
+- examples/playing_around.rb
 - lib/suggestor.rb
 - lib/suggestor/algorithms/euclidean_distance.rb
 - lib/suggestor/algorithms/pearson_correlation.rb
 - lib/suggestor/algorithms/recommendation_algorithm.rb
-- lib/suggestor/datum.rb
 - lib/suggestor/engine.rb
 - lib/suggestor/version.rb
 - suggestor.gemspec
 - test/euclidean_test.rb
-- test/
+- test/movies.json
+- test/numbers.json
+- test/pearson_correlation.rb
 - test/suggestor_test.rb
-- test/test.json
-has_rdoc: true
 homepage: ""
 licenses: []
 
@@ -65,12 +73,13 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 
 rubyforge_project: suggestor
-rubygems_version: 1.
+rubygems_version: 1.8.10
 signing_key:
 specification_version: 3
 summary: Suggestor allows you to get suggestions of related items in your data
 test_files:
 - test/euclidean_test.rb
-- test/
+- test/movies.json
+- test/numbers.json
+- test/pearson_correlation.rb
 - test/suggestor_test.rb
-- test/test.json
data/demos/playing_around.rb
DELETED
@@ -1,16 +0,0 @@
-require_relative '../lib/suggestor'
-
-engine = Suggestor::Engine.new
-
-# I'm using test data of Users and their movie recommendations
-# Each user (identified by their ids) have a hash of their movies ids and
-# what they've rate them with
-json = File.read("test/test.json")
-
-engine.load_data(json)
-
-# Let's get some similar users
-puts engine.similar_items_to("2").inspect
-
-# So, after knowing them, why not having some recommendations?
-puts engine.recommented_related_items_for("2", algorithm: :euclidean_distance)
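The deleted demo ended by asking for recommendations for user "2" (`recommented_related_items_for`), and in 0.0.6 the tests call `recommended_to` instead. This excerpt doesn't show how the engine scores recommendations. The following is a sketch of one common user-based approach: a similarity-weighted average of other users' ratings for the items the target user hasn't rated. `similarity`, `recommendations_for` and the sample ratings are hypothetical and made up for illustration.

```ruby
# Hypothetical helpers, not the gem's recommended_to.

# Euclidean-style similarity: 1 / (1 + sum of squared differences on shared items).
def similarity(data, a, b)
  shared = data[a].keys & data[b].keys
  return 0.0 if shared.empty?

  1.0 / (1 + shared.sum { |item| (data[a][item] - data[b][item])**2 })
end

# Score every item `key` hasn't rated by the similarity-weighted average of
# the other users' ratings for it, best first, as [[item, score], ...].
def recommendations_for(data, key)
  totals = Hash.new(0.0)
  weights = Hash.new(0.0)

  data.each do |other, ratings|
    next if other == key

    sim = similarity(data, key, other)
    next unless sim.positive?

    ratings.each do |item, rating|
      next if data[key].key?(item) # skip items the user already rated
      totals[item] += rating * sim
      weights[item] += sim
    end
  end

  totals.map { |item, total| [item, total / weights[item]] }
        .sort_by { |_, score| -score }
end

# Made-up sample ratings.
ratings = {
  "1" => { "a" => 5, "b" => 3 },
  "2" => { "a" => 5, "b" => 3, "c" => 4 },
  "3" => { "a" => 1, "b" => 1, "c" => 2, "d" => 5 }
}

p recommendations_for(ratings, "1")
```

Dividing by the summed similarity keeps scores on the rating scale, but an item rated only by a weakly similar user (`d` here) can still rank first; requiring a minimum similarity or number of raters is a common guard.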
data/lib/suggestor/datum.rb
DELETED
data/test/pearon_correlation.rb
DELETED
@@ -1,34 +0,0 @@
-require 'minitest/autorun'
-require_relative '../lib/suggestor/algorithms/pearson_correlation'
-require_relative '../lib/suggestor/engine'
-
-describe Suggestor::Algorithms::PearsonCorrelation do
-
-  before do
-    @data_string = File.read("test/test.json")
-    @suggestor = Suggestor::Engine.new
-    @suggestor.load_data(@data_string)
-    @algorithm = Suggestor::Algorithms::PearsonCorrelation.new(@suggestor.collection)
-  end
-
-  describe "when building up recommendations" do
-
-    it "must return a list of shared items between two people" do
-      @algorithm.shared_items_between(1,2).must_be :==, ["1","2"]
-    end
-
-    it "must return 0 as similarity record if two elements hace no shared items" do
-      @algorithm.similarity_score_between(1,4).must_be :==, 0
-    end
-
-    it "must return 1 as similarity record if two elements have equal related values" do
-      @algorithm.similarity_score_between(1,1).must_be :==, 1
-    end
-
-    it "must return -1 as similarity record if two elements are totally distant" do
-      @algorithm.similarity_score_between(1,99).must_be :==, 0
-    end
-
-
-  end
-end
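The removed test spells out what the Pearson scorer was expected to do: compare two elements only on their shared items, return 0 when nothing is shared, and return 1 when the related values are equal. Its last case is titled "-1" but asserts 0, and the new `test/pearson_correlation.rb` expects -1 for the pair `(1, 99)`. The sketch below is a self-contained Ruby version of the standard Pearson correlation over shared items under those rules. `shared_items_between` and `pearson_score` are hypothetical plain methods, not the gem's `PearsonCorrelation` API. Returning 0.0 for an unknown key or for zero variance is a convention chosen here.

```ruby
# Hypothetical helpers, not the gem's PearsonCorrelation class.

# Items both keys have a value for; an unknown key shares nothing.
def shared_items_between(data, a, b)
  data.fetch(a, {}).keys & data.fetch(b, {}).keys
end

# Pearson correlation of a's and b's values over their shared items, in [-1, 1].
# Returns 0.0 when nothing is shared or either side has zero variance.
def pearson_score(data, a, b)
  items = shared_items_between(data, a, b)
  return 0.0 if items.empty?

  xs = items.map { |item| data[a][item].to_f }
  ys = items.map { |item| data[b][item].to_f }
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size

  # Work with deviations from the mean to avoid cancellation in raw sums.
  dx = xs.map { |x| x - mean_x }
  dy = ys.map { |y| y - mean_y }
  covariance = dx.zip(dy).sum { |u, v| u * v }
  spread = Math.sqrt(dx.sum { |u| u * u } * dy.sum { |v| v * v })

  spread.zero? ? 0.0 : covariance / spread
end

# Made-up sample data.
data = {
  "1" => { "a" => 1, "b" => 2, "c" => 3 },
  "2" => { "a" => 2, "b" => 4, "c" => 6 }, # moves with "1"
  "3" => { "a" => 3, "b" => 2, "c" => 1 }, # moves against "1"
  "4" => { "x" => 5 }                      # nothing in common with "1"
}

p shared_items_between(data, "1", "2")
%w[1 2 3 4 99].each { |other| puts "1 vs #{other}: #{pearson_score(data, "1", other)}" }
```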