co2_filter 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: aa78d78af6f6ed5e9dc74d4104f85759dc5299d6
4
+ data.tar.gz: d2b184ffa6deac97b8d0afc8e9020b92fc2263cd
5
+ SHA512:
6
+ metadata.gz: 231065d35af899464b9aad9c690e29591eba7ae846f58b3886b3b5a0ce963fe69b370d2a6df52e37745b8ad6debadb09d10f8162a481a182e138afb596fc7b99
7
+ data.tar.gz: 46ffdb54af8bec7e72366bed2772d214344c65ee8c1a7d7cee23c33953f6aaab0075c86250a28add3c4901bd809ca519fba47d359a0a62ab7d06e51c099fe902
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright 2016 Pivotal
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,104 @@
1
+ = co2_filter
2
+
3
+ Co2Filter is a gem that combines collaborative and content-based filtering. It optionally exposes key points in the logic chain so that you can add your own calculations or store intermediate results for later.
4
+
5
+ Collaborative filtering recommends based on the ratings of other users and their similarity to you.
6
+ Content-based filtering recommends based on the attributes of items and your opinions of them.
7
+
8
+ Both strategies return a predicted rating for each item the user has not yet rated. Each can be used on its own with this gem, or the two can be combined, either by averaging the two predicted ratings or by using the content-boosted collaborative filtering technique.
9
+
10
+ Content-boosted collaborative filtering first applies content-based filtering to fill out a sparse user rating set, then applies collaborative filtering on the resulting dense data.
11
+
12
+ Rating ranges are irrelevant to the algorithm because predictions are computed relative to each user's average rating, so feel free to use whatever scale works best for your app. Be warned, though, that a predicted rating can therefore fall slightly outside the actual range.
13
+
14
+ == Installation
15
+
16
+ Add this line to your application's Gemfile:
17
+
18
+ gem 'co2_filter', git: 'https://github.com/comatose-turtle/co2_filter.git'
19
+
20
+ And then execute:
21
+
22
+ $ bundle
23
+
24
+ == Usage
25
+
26
+ The most basic usage follows this pattern:
27
+ recommended = Co2Filter.filter(current_user: current_user, other_users: other_users, items: items)
28
+
29
+ And the results can be used as follows:
30
+
31
+ most_recommended_item_id = recommended.ids_by_rating.first
32
+ top_20_recommended_items = recommended.ids_by_rating.take(20)
33
+ predicted_user_rating = recommended[most_recommended_item_id]
34
+
35
+ The return type is a simple wrapper for a results hash. You can extract the inner hash if necessary with +to_hash+.
36
+
37
+ This gem is ORM-agnostic and expects you to select your relevant data on your own. The data you pass in should look like:
38
+ current_user = {
39
+ # item_id => rating
40
+ 'item1' => 5,
41
+ 'item2' => 1,
42
+ 'item3' => 3
43
+ # ...
44
+ }
45
+
46
+ other_users = {
47
+ # user_id => { item_id => rating }
48
+ 'user1' => {
49
+ 'item1' => 2,
50
+ 'item2' => 5,
51
+ 'item4' => 2
52
+ },
53
+ 'user2' => {
54
+ 'item1' => 5,
55
+ 'item2' => 1,
56
+ 'item4' => 5,
57
+ 'item5' => 1
58
+ }
59
+ # ...
60
+ }
61
+
62
+ # A set of all items from the dataset
63
+ items = {
64
+ # item_id => { attribute_id => strength }
65
+ }
66
+
67
+ Ids are arbitrary to the algorithm and can be strings as easily as numbers. Ratings and strengths should be numbers of some type, and the range should be consistent across rating types (i.e. item ratings, attribute strengths), but there is no range restriction enforced by the algorithm.
68
+
69
+ Attribute strength refers to a situation where attributes are applied in varying degrees rather than a simple "off" or "on" state. If this does not apply to your app, I suggest setting all strengths to 1.
70
+
71
+ === Using Individual Filters
72
+
73
+ To implement only the collaborative filter, just use:
74
+ Co2Filter::Collaborative.filter(current_user: current_user, other_users: other_users)
75
+
76
+ To implement only the content-based filter, use:
77
+ Co2Filter::ContentBased.filter(user: current_user, items: items)
78
+
79
+ The content-based filtering process consists of two steps:
80
+ 1. Constructing a user profile
81
+ 2. Using the user profile to determine recommendations
82
+
83
+ If you are interested in doing this process piecemeal (for instance, to save the user profile to the database for later use), you can do so:
84
+ user_profile = Co2Filter::ContentBased.ratings_to_profile(user_ratings: current_user, items: items)
85
+ Co2Filter::ContentBased.filter(user: user_profile, items: items)
86
+
87
+ === Content-Boosted Collaborative Filtering
88
+
89
+ Content-boosted collaborative filtering can be used as follows:
90
+ Co2Filter.content_boosted_collaborative_filter(current_user: current_user, other_users: other_users, items: items)
91
+
92
+ This is the most processor-intensive algorithm, but it too can be split up into multiple pieces if you wish:
93
+ boosted_users = Co2Filter::ContentBased.boost_ratings(users: other_users, items: items)
94
+ Co2Filter::Collaborative.filter(current_user: current_user, other_users: boosted_users)
95
+ Note that the second step is simply the basic collaborative filter. To break up +boost_ratings+ even further, call +Co2Filter::ContentBased.filter+ on each user yourself and merge the predictions into that user's ratings. (See the definition of +boost_ratings+.)
96
+
97
+ == Contributing
98
+
99
+ Bug reports and pull requests are welcome on GitHub at https://github.com/comatose-turtle/co2_filter. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the {Contributor Covenant}[http://contributor-covenant.org] code of conduct.
100
+
101
+ == License
102
+
103
+ The gem is available as open source under the terms of the {MIT License}[http://opensource.org/licenses/MIT].
104
+
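The README leaves the +items+ hash empty. As a minimal sketch of one way to build it, assume a hypothetical list of records that each carry an :id and a list of :tags (neither key is defined by the gem). Each tag is treated as an attribute with strength 1, which is what the README suggests when attributes are simply on or off.

```ruby
# Hypothetical source records; the :id and :tags keys are assumptions
# made for this example and are not defined by co2_filter.
records = [
  { id: 'item1', tags: %w[action comedy] },
  { id: 'item2', tags: %w[drama] },
  { id: 'item3', tags: %w[action drama] }
]

# Build item_id => { attribute_id => strength }, using a strength of 1
# for every tag because the tags are simply present or absent.
items = records.each_with_object({}) do |record, result|
  result[record[:id]] = record[:tags].each_with_object({}) do |tag, attributes|
    attributes[tag] = 1
  end
end

puts items.inspect
```

The resulting hash has the shape the README describes for the +items+ argument; graded attributes would use other numeric strengths in place of 1.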
data/Rakefile ADDED
@@ -0,0 +1,23 @@
1
+ require "rspec/core/rake_task"
2
+
3
+ begin
4
+ require 'bundler/setup'
5
+ rescue LoadError
6
+ puts 'You must `gem install bundler` and `bundle install` to run rake tasks'
7
+ end
8
+
9
+ require 'rdoc/task'
10
+
11
+ RDoc::Task.new(:rdoc) do |rdoc|
12
+ rdoc.rdoc_dir = 'rdoc'
13
+ rdoc.main = "README.rdoc"
14
+ rdoc.title = 'Co2Filter'
15
+ rdoc.options << '--line-numbers'
16
+ rdoc.rdoc_files.include('README.rdoc')
17
+ rdoc.rdoc_files.include('lib/**/*.rb')
18
+ end
19
+
20
+ Bundler::GemHelper.install_tasks
21
+
22
+ RSpec::Core::RakeTask.new(:spec)
23
+ task :default => :spec
data/lib/co2_filter/collaborative/results.rb ADDED
@@ -0,0 +1,2 @@
1
+ class Co2Filter::Collaborative::Results < Co2Filter::Results
2
+ end
data/lib/co2_filter/collaborative.rb ADDED
@@ -0,0 +1,133 @@
1
+ module Co2Filter::Collaborative
2
+ autoload :Results, 'co2_filter/collaborative/results'
3
+
4
+ def self.filter(current_user:, other_users:, measure: :hybrid)
5
+ current_user = Co2Filter::RatingSet.new(current_user) unless current_user.is_a? Co2Filter::RatingSet
6
+ if measure == :euclidean
7
+ processed_users = euclidean(current_user: current_user, other_users: other_users, num_nearest: 30)
8
+ elsif measure == :cosine
9
+ processed_users = mean_centered_cosine(current_user: current_user, other_users: other_users, num_nearest: 30)
10
+ else
11
+ eu = euclidean(current_user: current_user, other_users: other_users, num_nearest: 30)
12
+ co = mean_centered_cosine(current_user: current_user, other_users: other_users, num_nearest: 30)
13
+ processed_users = {}
14
+ eu.each do |user_id, user|
15
+ processed_users[user_id] = user.merge(co[user_id] || {}) do |k, val1, val2|
16
+ k == :coefficient ? (val1 + val2) / 2.0 : val1
17
+ end
18
+ end
19
+ end
20
+
21
+ new_items = []
22
+ processed_users.each do |user_id, user|
23
+ new_items = new_items | (user[:ratings].keys - current_user.keys)
24
+ end
25
+
26
+ item_ratings = {}
27
+ new_items.each do |item_id|
28
+ rating_influence_total = 0
29
+ weight_normal = 0
30
+ processed_users.reject do |user_id, user|
31
+ user[:ratings][item_id].nil?
32
+ end.each do |user_id, user|
33
+ rating_influence_total += user[:coefficient] * (user[:ratings][item_id] - user[:mean])
34
+ weight_normal += user[:coefficient].abs
35
+ end
36
+ item_ratings[item_id] = current_user.mean + rating_influence_total / weight_normal if weight_normal > 0
37
+ end
38
+
39
+ Results.new(item_ratings)
40
+ end
41
+
42
+ def self.mean_centered_cosine(current_user:, other_users:, num_nearest:)
43
+ processed = other_users.map do |key, user2|
44
+ user2 = Co2Filter::RatingSet.new(user2) unless user2.is_a? Co2Filter::RatingSet
45
+ [key, single_cosine(current_user, user2)]
46
+ end
47
+ processed.sort_by do |entry|
48
+ -(entry[1][:coefficient].abs)
49
+ end.take(num_nearest).inject({}) do |hash, (key, value)|
50
+ hash[key] = value
51
+ hash
52
+ end
53
+ end
54
+
55
+ def self.single_cosine(user1, user2)
56
+ sum1 = 0
57
+ sum2 = 0
58
+ union = user1.keys | user2.keys
59
+ union.each do |key|
60
+ if user1[key]
61
+ sum1 += user1[key]
62
+ end
63
+ if user2[key]
64
+ sum2 += user2[key]
65
+ end
66
+ end
67
+ mean1 = user1.length == 0 ? 0 : sum1.to_f / user1.length
68
+ mean2 = user2.length == 0 ? 0 : sum2.to_f / user2.length
69
+
70
+ numerator = 0
71
+ denominator1 = 0
72
+ denominator2 = 0
73
+ union.each do |key|
74
+ deviation1 = user1[key] ? user1[key] - mean1 : 0
75
+ deviation2 = user2[key] ? user2[key] - mean2 : 0
76
+
77
+ numerator += deviation1 * deviation2
78
+ denominator1 += deviation1**2
79
+ denominator2 += deviation2**2
80
+ end
81
+ {
82
+ ratings: user2,
83
+ mean: mean2,
84
+ coefficient: (denominator1 * denominator2 == 0 ? 0 : numerator / Math.sqrt(denominator1 * denominator2))
85
+ }
86
+ end
87
+
88
+ def self.euclidean(current_user:, other_users:, num_nearest:, range:0)
89
+ if range == 0
90
+ lowest = nil
91
+ highest = nil
92
+ current_user.each do |k, rating|
93
+ lowest = rating if !lowest || lowest > rating
94
+ highest = rating if !highest || highest < rating
95
+ end
96
+ other_users.each do |k, user|
97
+ user.each do |k, rating|
98
+ lowest = rating if !lowest || lowest > rating
99
+ highest = rating if !highest || highest < rating
100
+ end
101
+ end
102
+ range = highest - lowest
103
+ end
104
+ processed = other_users.map do |key, user2|
105
+ user2 = Co2Filter::RatingSet.new(user2) unless user2.is_a? Co2Filter::RatingSet
106
+ [key, single_euclidean(current_user, user2, range)]
107
+ end
108
+ processed.sort_by do |entry|
109
+ -(entry[1][:coefficient])
110
+ end.take(num_nearest).inject({}) do |hash, (key, value)|
111
+ hash[key] = value
112
+ hash
113
+ end
114
+ end
115
+
116
+ def self.single_euclidean(user1, user2, range)
117
+ numerator = 0
118
+ denominator = 0
119
+ intersect = user1.keys & user2.keys
120
+ intersect.each do |item_id|
121
+ numerator += (user1[item_id] - user2[item_id]) ** 2
122
+ denominator += range ** 2
123
+ end
124
+ relevancy_weight = intersect.size < 50.0 ? intersect.size / 50.0 : 1
125
+ coefficient = denominator == 0 ? 0 : relevancy_weight * (1 - Math.sqrt(1.0 * numerator / denominator))
126
+ user2 = Co2Filter::RatingSet.new(user2) unless user2.is_a? Co2Filter::RatingSet
127
+ {
128
+ ratings: user2.to_hash,
129
+ mean: user2.mean,
130
+ coefficient: coefficient
131
+ }
132
+ end
133
+ end
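The prediction loop in `Collaborative.filter` adds a similarity-weighted average of the neighbors' deviations from their own means to the current user's mean. A standalone sketch of that step follows. `predict_rating` and the sample neighbors are hypothetical names and data used only for illustration; they are not part of the gem. The example also shows how a prediction can land outside the rating scale, as the README warns.

```ruby
# Sketch of the mean-centered prediction step (hypothetical helper).
# neighbors: array of { coefficient:, mean:, ratings: { item_id => rating } }
def predict_rating(current_user_mean, neighbors, item_id)
  influence_total = 0.0
  weight_total = 0.0
  neighbors.each do |neighbor|
    rating = neighbor[:ratings][item_id]
    next if rating.nil? # this neighbor has not rated the item

    influence_total += neighbor[:coefficient] * (rating - neighbor[:mean])
    weight_total += neighbor[:coefficient].abs
  end
  return nil if weight_total.zero? # no usable neighbors for this item

  current_user_mean + influence_total / weight_total
end

neighbors = [
  { coefficient: 1.0, mean: 2.0, ratings: { 'item9' => 4 } },
  { coefficient: 0.5, mean: 3.0, ratings: { 'item9' => 2 } }
]

prediction = predict_rating(4.5, neighbors, 'item9')
puts prediction # 5.5, which is above the top of a 1..5 scale
```

Callers that need predictions inside the scale could clamp the result afterwards; the gem itself does not appear to do so.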
data/lib/co2_filter/content_based/results.rb ADDED
@@ -0,0 +1,2 @@
1
+ class Co2Filter::ContentBased::Results < Co2Filter::Results
2
+ end
data/lib/co2_filter/content_based/user_profile.rb ADDED
@@ -0,0 +1,8 @@
1
+ class Co2Filter::ContentBased::UserProfile < Co2Filter::HashWrapper
2
+ attr_accessor :mean
3
+
4
+ def initialize(data, mean)
5
+ super(data)
6
+ @mean = mean
7
+ end
8
+ end
data/lib/co2_filter/content_based.rb ADDED
@@ -0,0 +1,56 @@
1
+ module Co2Filter::ContentBased
2
+ autoload :Results, 'co2_filter/content_based/results'
3
+ autoload :UserProfile, 'co2_filter/content_based/user_profile'
4
+
5
+ def self.filter(user:, items:)
6
+ if(user.is_a?(UserProfile))
7
+ user_profile = user
8
+ new_items = items
9
+ elsif(user.is_a?(Hash) || user.is_a?(Co2Filter::RatingSet))
10
+ user = Co2Filter::RatingSet.new(user) if user.is_a?(Hash)
11
+ user_profile = ratings_to_profile(user_ratings: user, items: items)
12
+ new_items = items.reject{|item_id, v| user[item_id]}
13
+ end
14
+ results = new_items.inject({}) do |hash, (item_id, item)|
15
+ strength_normalizer = 0
16
+ hash[item_id] = 0
17
+ item.each do |attr_id, strength|
18
+ hash[item_id] += user_profile[attr_id].to_f * strength
19
+ strength_normalizer += strength.abs if user_profile[attr_id]
20
+ end
21
+ hash[item_id] /= strength_normalizer if strength_normalizer != 0
22
+ hash[item_id] += user_profile.mean
23
+ hash
24
+ end
25
+ Results.new(results)
26
+ end
27
+
28
+ def self.ratings_to_profile(user_ratings:, items:)
29
+ user_ratings = Co2Filter::RatingSet.new(user_ratings) unless user_ratings.is_a? Co2Filter::RatingSet
30
+ user_prefs = {}
31
+ strength_normalizers = {}
32
+ user_ratings.each do |item_id, score|
33
+ deviation = score - user_ratings.mean
34
+
35
+ items[item_id].each do |attr_id, strength|
36
+ user_prefs[attr_id] ||= 0
37
+ user_prefs[attr_id] += strength * deviation
38
+ strength_normalizers[attr_id] ||= 0
39
+ strength_normalizers[attr_id] += strength.abs
40
+ end
41
+ end
42
+
43
+ user_prefs.each do |attr_id, score|
44
+ user_prefs[attr_id] /= strength_normalizers[attr_id].to_f
45
+ end
46
+
47
+ UserProfile.new(user_prefs, user_ratings.mean)
48
+ end
49
+
50
+ def self.boost_ratings(users:, items:)
51
+ users.inject({}) do |content_boosted_users, (user_id, ratings)|
52
+ content_boosted_users[user_id] = ratings.merge(filter(user: ratings, items: items))
53
+ content_boosted_users
54
+ end
55
+ end
56
+ end
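The two content-based steps (building a user profile, then scoring items against it) can be sketched without the gem's wrapper classes. `build_profile` and `score_item` are hypothetical helpers written for this note. They follow the same arithmetic as `ratings_to_profile` and `filter` above but are not the gem's API.

```ruby
# Step 1: turn ratings into per-attribute preferences (hypothetical helper).
# Each preference is the strength-weighted average of the user's
# deviations from their own mean rating.
def build_profile(ratings, items)
  mean = ratings.values.sum.to_f / ratings.size
  prefs = Hash.new(0.0)
  norms = Hash.new(0.0)
  ratings.each do |item_id, score|
    deviation = score - mean
    items.fetch(item_id).each do |attr_id, strength|
      prefs[attr_id] += strength * deviation
      norms[attr_id] += strength.abs
    end
  end
  prefs.each_key do |attr_id|
    prefs[attr_id] /= norms[attr_id] unless norms[attr_id].zero?
  end
  { prefs: prefs, mean: mean }
end

# Step 2: predict a rating for one item from the profile.
def score_item(profile, item_attributes)
  total = 0.0
  normalizer = 0.0
  item_attributes.each do |attr_id, strength|
    next unless profile[:prefs].key?(attr_id) # user has no opinion on this attribute

    total += profile[:prefs][attr_id] * strength
    normalizer += strength.abs
  end
  total /= normalizer unless normalizer.zero?
  profile[:mean] + total
end

items = {
  'a' => { 'funny' => 1 },
  'b' => { 'scary' => 1 },
  'c' => { 'funny' => 1, 'scary' => 1 },
  'd' => { 'funny' => 1 }
}
profile = build_profile({ 'a' => 5, 'b' => 1 }, items)
puts score_item(profile, items['c']) # 3.0: the two preferences cancel out
puts score_item(profile, items['d']) # 5.0: matches the liked attribute
```

Because the profile is a plain hash here, it could be serialized and reused later, which is the use case the README mentions for `ratings_to_profile`.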
data/lib/co2_filter/hash_wrapper.rb ADDED
@@ -0,0 +1,17 @@
1
+ class Co2Filter::HashWrapper
2
+ def initialize(data)
3
+ @data = data.to_hash
4
+ end
5
+
6
+ def method_missing(method, *args, &block)
7
+ if [:keys, :values, :length, :size, :"[]", :"[]=", :each, :merge].include?(method)
8
+ @data.send(method, *args, &block)
9
+ else
10
+ super(method, *args, &block)
11
+ end
12
+ end
13
+
14
+ def to_hash
15
+ @data
16
+ end
17
+ end
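Delegating a fixed set of methods through `method_missing` works best when the method name is checked against the whitelist and `respond_to_missing?` is defined alongside it. A minimal sketch follows. `DelegatingWrapper` is a hypothetical stand-in class for illustration, and defining `respond_to_missing?` is a suggestion rather than something the gem does.

```ruby
# Hypothetical stand-in for a hash wrapper that forwards only a
# whitelisted set of methods to the wrapped hash.
class DelegatingWrapper
  DELEGATED = [:keys, :values, :length, :size, :[], :[]=, :each, :merge].freeze

  def initialize(data)
    @data = data.to_hash
  end

  def method_missing(method, *args, &block)
    if DELEGATED.include?(method)
      @data.public_send(method, *args, &block)
    else
      super # raises NoMethodError for anything not whitelisted
    end
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(method, include_private = false)
    DELEGATED.include?(method) || super
  end

  def to_hash
    @data
  end
end

wrapper = DelegatingWrapper.new({ 'item1' => 5, 'item2' => 1 })
puts wrapper.keys.inspect          # ["item1", "item2"]
puts wrapper.respond_to?(:keys)    # true
puts wrapper.respond_to?(:shuffle) # false
```

Ruby's standard library `Forwardable` module with `def_delegators` would be an alternative that avoids `method_missing` altogether.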
data/lib/co2_filter/rating_set.rb ADDED
@@ -0,0 +1,10 @@
1
+ class Co2Filter::RatingSet < Co2Filter::HashWrapper
2
+ def mean
3
+ @mean ||= 1.0 * @data.values.inject(:+) / @data.size
4
+ end
5
+
6
+ def []=(key, val)
7
+ super(key, val)
8
+ @mean = nil
9
+ end
10
+ end
data/lib/co2_filter/results.rb ADDED
@@ -0,0 +1,10 @@
1
+ class Co2Filter::Results < Co2Filter::HashWrapper
2
+ def ids_by_rating
3
+ @ids_by_rating ||=
4
+ @data.sort_by do |id, item_ranking|
5
+ -item_ranking
6
+ end.map do |el|
7
+ el[0]
8
+ end
9
+ end
10
+ end
data/lib/co2_filter/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Co2Filter
2
+ VERSION = "0.0.1"
3
+ end
data/lib/co2_filter.rb ADDED
@@ -0,0 +1,32 @@
1
+ module Co2Filter
2
+ def self.filter(current_user: , other_users: , items: nil, user_profile: nil, content_based_results: nil)
3
+ raise ArgumentError.new("An 'items' or 'content_based_results' argument must be provided.") unless items || content_based_results
4
+ collab = Collaborative.filter(current_user: current_user, other_users: other_users)
5
+
6
+ if content_based_results && content_based_results.is_a?(Results)
7
+ content = content_based_results
8
+ elsif user_profile.is_a? ContentBased::UserProfile
9
+ content = ContentBased.filter(user: user_profile, items: items)
10
+ else
11
+ content = ContentBased.filter(user: current_user, items: items)
12
+ end
13
+
14
+ hybrid = collab.merge(content) do |k, val1, val2|
15
+ (val1 + val2) / 2.0
16
+ end
17
+ Results.new(hybrid)
18
+ end
19
+
20
+ def self.content_boosted_collaborative_filter(current_user:, other_users:, items:)
21
+ content_boosted_users = ContentBased.boost_ratings(users: other_users, items: items)
22
+ results = Collaborative.filter(current_user: current_user, other_users: content_boosted_users)
23
+ Results.new(results)
24
+ end
25
+
26
+ autoload :VERSION, 'co2_filter/version'
27
+ autoload :Collaborative, 'co2_filter/collaborative'
28
+ autoload :ContentBased, 'co2_filter/content_based'
29
+ autoload :Results, 'co2_filter/results'
30
+ autoload :RatingSet, 'co2_filter/rating_set'
31
+ autoload :HashWrapper, 'co2_filter/hash_wrapper'
32
+ end
data/lib/tasks/co2_filter_tasks.rake ADDED
@@ -0,0 +1,4 @@
1
+ # desc "Explaining what the task does"
2
+ # task :co2_filter do
3
+ # # Task goes here
4
+ # end
metadata ADDED
@@ -0,0 +1,101 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: co2_filter
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Tommy Orr
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-02-19 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.11'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.11'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.0'
55
+ description:
56
+ email:
57
+ - torr@pivotal.io
58
+ executables: []
59
+ extensions: []
60
+ extra_rdoc_files: []
61
+ files:
62
+ - MIT-LICENSE
63
+ - README.rdoc
64
+ - Rakefile
65
+ - lib/co2_filter.rb
66
+ - lib/co2_filter/collaborative.rb
67
+ - lib/co2_filter/collaborative/results.rb
68
+ - lib/co2_filter/content_based.rb
69
+ - lib/co2_filter/content_based/results.rb
70
+ - lib/co2_filter/content_based/user_profile.rb
71
+ - lib/co2_filter/hash_wrapper.rb
72
+ - lib/co2_filter/rating_set.rb
73
+ - lib/co2_filter/results.rb
74
+ - lib/co2_filter/version.rb
75
+ - lib/tasks/co2_filter_tasks.rake
76
+ homepage: https://github.com/comatose-turtle/co2_filter
77
+ licenses:
78
+ - MIT
79
+ metadata: {}
80
+ post_install_message:
81
+ rdoc_options: []
82
+ require_paths:
83
+ - lib
84
+ required_ruby_version: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - ">="
87
+ - !ruby/object:Gem::Version
88
+ version: '0'
89
+ required_rubygems_version: !ruby/object:Gem::Requirement
90
+ requirements:
91
+ - - ">="
92
+ - !ruby/object:Gem::Version
93
+ version: '0'
94
+ requirements: []
95
+ rubyforge_project:
96
+ rubygems_version: 2.4.5.1
97
+ signing_key:
98
+ specification_version: 4
99
+ summary: Uses both collaborative and content-based filtering methods to enable a complex,
100
+ hybrid recommendation engine.
101
+ test_files: []