recommendify_whosv 0.5.8

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: c9fe29b27fad9b27ba8c563a24663113ba4b707c
+   data.tar.gz: 0101d43d29fc778e5f53056f1575d6bb43640503
+ SHA512:
+   metadata.gz: 33a5019f1d981f816417c35dd49f9d61e23e88998bcb2b41c79f633a3e2ac8b4204b0160147d19eaf932a9061d784d0cdcb32973b76431d1778fdaca8f40d683
+   data.tar.gz: dd629ee3d08a79e0afb649e09300987d9ee498571ff3d83ef4e73689d07616d6a93fcd125845d0e81bbc9938b0f3471f07c06813ae541e143af31a982bd0a64b
data/Gemfile ADDED
@@ -0,0 +1,9 @@
+ source :rubygems
+
+ gem "redis"
+
+ group :development do
+   gem "rake"
+   gem "rspec"
+   gem "yard"
+ end
data/README.md ADDED
@@ -0,0 +1,154 @@
+ recommendify
+ ============
+
+ _Recommendify is a Ruby/Redis-based recommendation engine_ - The recommendations can be updated/processed incrementally and on multiple hosts. The worker is implemented in plain Ruby and in native C.
+
+ [ ![Build status - Travis-ci](https://secure.travis-ci.org/paulasmuth/recommendify.png) ](http://travis-ci.org/paulasmuth/recommendify)
+
+ ---
+
+ #### usecases
+
+ + __"Users that bought this product also bought..."__ from `user_id--bought-->product_id` pairs
+ + __"Users that viewed this video also viewed..."__ from `user_id--viewed-->video_id` pairs
+ + __"Users that like this venue also like..."__ from `user_id--likes-->venue_id` pairs
+
+
+
+ synopsis
+ --------
+
+ Your input data (the so-called interaction sets) should look like this:
+
+ ```
+ # FORMAT A: user bought products (select buyerid, productid from sales group_by buyerid)
+ [user23] product5 product42 product17
+ [user42] product8 product16 product5
+
+ # FORMAT B: user watched video (this can be transformed to the upper representation with a map/reduce)
+ user3 -> video3
+ user6 -> video19
+ user3 -> video6
+ user1 -> video42
+ ```
+
+ The output data will look like this:
+
+ ```
+ # similar products based on co-occurring buys
+ product5 => product17 (0.78), product8 (0.43), product42 (0.31)
+ product17 => product5 (0.36), product8 (0.21), product42 (0.18)
+
+ # similar videos based on co-occurring views
+ video19 => video3 (0.93), video6 (0.56), video42 (0.34)
+ video42 => video19 (0.32), video3 (0.21), video6 (0.08)
+ ```
+
+ You can add new interaction sets to the processor incrementally, but the similarities for changed items have to be re-processed after new interactions are added. You can either re-process all items (`recommender.process!`) from time to time, or keep track of the updates and only process the changed items (`recommender.process_item!`).
+
+
+ usage
+ -----
+
+ ```ruby
+
+ # Our similarity matrix: we calculate the similarity via co-occurrence
+ # of products in "orders" using the jaccard similarity measure.
+ class MyRecommender < Recommendify::Base
+
+   # store only the top fifty neighbors per item
+   max_neighbors 50
+
+   # define an input data set "order_items". we'll add "order_id->product_id"
+   # pairs to this input and use the jaccard coefficient to retrieve a
+   # "customers that ordered item i1 also ordered item i2" statement and apply
+   # the result to the item<->item similarity matrix with a weight of 5.0
+   input_matrix :order_items,
+     # :native => true,
+     :similarity_func => :jaccard,
+     :weight => 5.0
+
+ end
+
+ recommender = MyRecommender.new
+
+ # add `order_id->product_id` interactions to the order_items input
+ # you can add data incrementally and call recommender.process! to update
+ # the similarity matrix at any time.
+ recommender.order_items.add_set("order1", ["product23", "product65", "productm23"])
+ recommender.order_items.add_set("order2", ["product14", "product23"])
+
+ # Calculate all elements of the similarity matrix
+ recommender.process!
+
+ # ...or calculate a specific row of the similarity matrix (a specific item)
+ # use this to avoid re-processing the whole matrix after incremental updates
+ recommender.process_item!("product65")
+
+ # retrieve similar products to "product23"
+ recommender.for("product23")
+ # => [ <Recommendify::Neighbor item_id:"product65" similarity:0.23>, (...) ]
+
+ # remove "product23" from the similarity matrix and the input matrices. you should
+ # do this if your items 'expire', since it will speed up the calculation
+ recommender.delete_item!("product23")
+ ```
+
+ ### how it works
+
+ Recommendify keeps an incrementally updated `item x item` matrix, the "co-occurrence matrix". This matrix stores the number of times that a combination of two items has appeared in an interaction/preference set. The co-occurrence counts are processed with a jaccard similarity measure to retrieve another `item x item` similarity matrix, which is used to find the N most similar items for each item. This is also called "Item-based Collaborative Filtering with binary ratings" (see Miranda and Alipio [1]).
+
+ 1. Group the input user->item pairs by user-id and store them into interaction sets
+ 2. For each item<->item combination in the interaction set increment the respective element in the co-occurrence matrix
+ 3. For each item<->item combination in the co-occurrence matrix calculate the item<->item similarity
+ 4. For each item store the N most similar items in the respective output set.
+
+
+ ### does it scale?
+
+ The maximum number of entries in the co-occurrence and similarity matrices is k(n) = (n^2 - n)/2, the number of unordered pairs of distinct items, so it grows O(n^2). However, in a real scenario it is very unlikely that all item<->item combinations appear in an interaction set, and we use a sparse matrix which only uses memory for elements with a value > 0. The size of the stored similarity output grows O(n), since only the N most similar items are kept per item.
+
+ ### native/fast worker
+
+ After you have compiled the native worker, you can pass the `:native => true` option to the input_matrix. This speeds up processing by at least 10x.
+
+ ```
+ cd ~/.rvm/gems/ruby-1.9.3-p0/gems/recommendify-0.2.2/
+ bundle exec rake build_native
+ ```
+
+ example
+ -------
+
+ These recommendations were calculated from 2.3 MB of "profile visit" data (taken from www.talentsuche.de). Keep in mind that the recommender uses only visitor->visited data; it __doesn't know the gender__ of a user.
+
+ [ ![Example Results](https://raw.github.com/paulasmuth/recommendify/master/doc/example.png) ](http://falbala.23loc.com/~paul/recommendify_out_1.html)
+
+ full snippet: http://falbala.23loc.com/~paul/recommendify_out_1.html
+
+ Initially processing the 120,047 `visitor_id->profile_id` pairs currently takes around half an hour with the ruby-only implementation and ~130 seconds with the native/C implementation on a single core. It creates a 24.1 MB hashtable in redis (with user rows truncated to at most 100 items). In another real data set with very short user rows (purchase/payment data) it used only 3.4 MB for 90k items with very good results. You can try this for yourself; the complete data and code are in `doc/example.rb` and `doc/example_data.csv`.
+
+
+
+
+ Sources / References
+ --------------------
+
+ [1] Miranda C. and Alipio J. (2008). Incremental collaborative filtering for binary ratings (LIAAD - INESC Porto, University of Porto)
+
+ [2] George Karypis (2000) Evaluation of Item-Based Top-N Recommendation Algorithms (University of Minnesota, Department of Computer Science / Army HPC Research Center)
+
+ [3] Shiwei Z., Junjie W. Hui X. and Guoping X. (2011) Scaling up top-K cosine similarity search (Data & Knowledge Engineering 70)
+
+
+
+ License
+ -------
+
+ Copyright (c) 2011 Paul Asmuth
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to use, copy and modify copies of the Software, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
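The steps listed under "how it works" in the README above can be sketched in plain Ruby. This is a minimal in-memory illustration, not Recommendify's implementation: the helper name `jaccard_neighbors` and its arguments are hypothetical, nothing is stored in Redis, and the sketch assumes the Jaccard coefficient is the co-occurrence count divided by the number of sets that contain either item.

```ruby
# Hypothetical helper, not part of Recommendify: counts co-occurrences in
# memory and derives Jaccard similarities from those counts.
def jaccard_neighbors(interaction_sets, max_neighbors = 50)
  item_counts = Hash.new(0) # item => number of sets containing it
  pair_counts = Hash.new(0) # sorted [item_a, item_b] => number of sets containing both

  # Steps 1 and 2: walk each interaction set and increment the counters.
  interaction_sets.each_value do |items|
    unique = items.uniq.sort
    unique.each { |item| item_counts[item] += 1 }
    unique.combination(2) { |pair| pair_counts[pair] += 1 }
  end

  # Step 3: Jaccard similarity = |A and B| / |A or B| for every observed pair.
  neighbors = Hash.new { |hash, key| hash[key] = [] }
  pair_counts.each do |(a, b), both|
    similarity = both.to_f / (item_counts[a] + item_counts[b] - both)
    neighbors[a] << [b, similarity]
    neighbors[b] << [a, similarity]
  end

  # Step 4: keep only the most similar items per item.
  neighbors.each_with_object({}) do |(item, list), out|
    out[item] = list.sort_by { |_, sim| -sim }.first(max_neighbors)
  end
end

sets = {
  "order1" => %w[product23 product65 product14],
  "order2" => %w[product14 product23]
}
result = jaccard_neighbors(sets)
```

With these two sample orders, `product23` and `product14` appear in exactly the same sets and get a similarity of 1.0, while `product65` scores 0.5 against each of them. The `combination(2)` call also shows where the quadratic growth discussed under "does it scale?" comes from: a set of m items contributes m(m-1)/2 pairs.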
data/Rakefile ADDED
@@ -0,0 +1,18 @@
+ require "rubygems"
+ require "rspec"
+ require 'rspec/core/rake_task'
+ require "yard"
+
+ desc "Run all examples"
+ task RSpec::Core::RakeTask.new('spec')
+
+ task :default => "spec"
+
+ desc "Generate documentation"
+ task YARD::Rake::YardocTask.new
+
+
+ desc "Compile the native client"
+ task :build_native do
+   exec "cd ext && ruby extconf.rb && make"
+ end
data/doc/example.png ADDED
Binary file
data/doc/example.rb ADDED
@@ -0,0 +1,87 @@
+ $: << ::File.expand_path("../../lib", __FILE__)
+ require "recommendify"
+ require "redis"
+
+ # configure redis
+ Recommendify.redis = Redis.new
+
+ # our recommender class
+ class UserRecommender < Recommendify::Base
+
+   #max_neighbors 50
+
+   input_matrix :visits,
+     :similarity_func => :jaccard,
+     :native => true
+
+ end
+
+ recommender = UserRecommender.new
+
+ # load some test data
+ buckets = Hash.new{ |h,k| h[k]=[] }
+ IO.read("example_data.csv").split("\n").each do |l|
+   user_id, item_id = l.split(",")
+   # skip blank lines and lines without a user id
+   next if user_id.nil? || user_id.empty?
+   buckets[user_id] << item_id
+ end
+
+ # add the test data to the recommender
+ buckets.each do |user_id, items|
+   puts "#{user_id} -> #{items.join(",")}"
+   items = items[0..99] # do not add more than 100 items per user
+   recommender.visits.add_set(user_id, items)
+ end
+
+ # process all items (equivalent to recommender.process!)
+ num_items = (all_items = recommender.all_items).size
+ all_items.each_with_index do |item_id, n|
+   puts "processing #{item_id} (#{n + 1}/#{num_items})"
+   recommender.process_item!(item_id)
+ end
+
+
+ # generate an HTML page
+ def item_url(item_id)
+   "http://#{ENV["URL"]}/u.#{item_id.gsub("profile_", "")}.FNORD.html"
+ end
+
+ out_items = []
+ recommender.all_items.shuffle[0..2000].each do |item_id|
+   out_recs = recommender.for(item_id)[0..5].map do |rec|
+     <<-HTML
+       <div class="rec">
+         <span>#{rec.similarity}</span>
+         <img src="#{item_url(rec.item_id)}" />
+       </div>
+     HTML
+   end
+   next if out_recs.length < 5
+   out_items << <<-HTML
+     <div class="item">
+       <img src="#{item_url(item_id)}" class="item" />
+       #{out_recs.join("\n")}
+       <br class="clear" />
+     </div>
+   HTML
+ end
+
+ out = <<-HTML
+   <!DOCTYPE html>
+   <html>
+     <head>
+       <title>foo</title>
+       <style>
+         .item{ padding:50px; border-bottom:1px dotted #333; }
+         .item img.item{ float:left; margin-right:30px; }
+         .item .rec{ float:left; width:130px; overflow:hidden; }
+         .clear{ clear:both; }
+       </style>
+     </head>
+     <body>
+       #{out_items.join("\n")}
+     </body>
+   </html>
+ HTML
+
+ File.open('/tmp/recommendify_out_1.html', "w+"){ |f| f.write(out) }
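The loading loop in `doc/example.rb` above groups `user_id,item_id` lines into per-user buckets and passes at most 100 items per user to the recommender. That step could be isolated as a small standalone function, sketched here under the assumption that the input is comma-separated with no header row. The names `bucket_pairs` and `max_items` are hypothetical and not part of the gem.

```ruby
# Hypothetical helper mirroring the loading loop in doc/example.rb:
# groups "user_id,item_id" lines by user and caps each user's bucket.
def bucket_pairs(lines, max_items = 100)
  buckets = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    user_id, item_id = line.strip.split(",", 2)
    # Skip blank lines and lines missing either field.
    next if user_id.nil? || user_id.empty? || item_id.nil? || item_id.empty?
    # Keep only the first max_items items seen for each user.
    buckets[user_id] << item_id if buckets[user_id].size < max_items
  end
  buckets
end

sample = ["u1,p1", "u1,p2", "", ",p3", "u2,p1"]
buckets = bucket_pairs(sample)
```

Each resulting bucket could then be handed to `recommender.visits.add_set(user_id, items)` as the script does. Capping while reading keeps the same first 100 items per user that the script's `items[0..99]` slice keeps.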