recommendify 0.2.2 → 0.2.3

data/README.md CHANGED
@@ -1,97 +1,102 @@
1
1
  recommendify
2
2
  ============
3
3
 
4
- Incremental and distributed item-based "Collaborative Filtering" for binary ratings with ruby and redis. In a nutshell: You feed in `user -> item` interactions and it spits out similarity vectors between items ("related items"). __scroll down for a demo...__
4
+ _Recommendify is a ruby/redis based recommendation engine_ - The recommendations can be updated/processed incrementally and on multiple hosts. The worker is implemented in plain ruby and native C.
5
5
 
6
6
  [ ![Build status - Travis-ci](https://secure.travis-ci.org/paulasmuth/recommendify.png) ](http://travis-ci.org/paulasmuth/recommendify)
7
7
 
8
+ ---
8
9
 
9
- ### use cases
10
+ #### usecases
10
11
 
11
- + "Users that bought this product also bought...".
12
- + "Users that viewed this video also viewed...".
13
- + "Users that follow this person also follow...".
12
+ + __"Users that bought this product also bought..."__ from `user_id--bought-->product_id` pairs
13
+ + __"Users that viewed this video also viewed..."__ from `user_id--viewed-->video_id` pairs
14
+ + __"Users that like this venue also like..."__ from `user_id--likes-->venue_id` pairs
14
15
 
15
16
 
16
- usage
17
- -----
18
17
 
19
- Your data should look something like this:
18
+ synopsis
19
+ --------
20
+
21
+ Your input data (the so-called interaction-sets) should look like this:
22
+
23
+ ```
24
+ # FORMAT A: user bought products (select buyerid, productid from sales group_by buyerid)
25
+ [user23] product5 product42 product17
26
+ [user42] product8 product16 product5
27
+
28
+ # FORMAT B: user watched video (this can be transformed to the upper representation with a map/reduce)
29
+ user3 -> video3
30
+ user6 -> video19
31
+ user3 -> video6
32
+ user1 -> video42
33
+ ```
34
+
35
+ The output data will look like this:
20
36
 
21
37
  ```
22
- # which items are frequently bought togehter?
23
- [order23] product5 produt42 product17
24
- [order42] product8 produt16 product32
38
+ # similar products based on co-concurrent buys
39
+ product5 => product17 (0.78), product8 (0.43), product42 (0.31)
40
+ product17 => product5 (0.36), product8 (0.21), product42 (0.18)
25
41
 
26
- # which users are frequently watched/followed together?
27
- [user4] user9 user11 user12
28
- [user9] user6 user8 user11
42
+ # similar videos based on co-concurrent views
43
+ video19 => video3 (0.93), video6 (0.56), video42 (0.34)
44
+ video42 => video19 (0.32), video3 (0.21), video6 (0.08)
29
45
  ```
30
46
 
31
- You can add new interaction-sets to the processor incrementally, but the similarity matrix has to be manually re-processed after new interactions were added to any of the applied processors. However, the processing happens on-line and you can keep track of the changed items so you only have to re-calculate the changed rows of the matrix.
47
+ You can add new interaction-sets to the processor incrementally, but the similarities for changed items have to be re-processed after new interactions are added. You can either re-process all items (`recommender.process!`) from time to time or keep track of the updates and only process the changed items (`recommender.process_item!`).
48
+
49
+
50
+ usage
51
+ -----
32
52
 
33
53
  ```ruby
34
54
 
35
55
  # Our similarity matrix, we calculate the similarity via co-concurrence
36
- # of items in "orders" and the co-concurrence of items in user-likes using
37
- # two `item x item` matrices and the jaccard/cosine similarity measure.
56
+ # of products in "orders" using the jaccard similarity measure.
38
57
  class MyRecommender < Recommendify::Base
39
58
 
40
- # store a maximum of fifty neighbors per item
59
+ # store only the top fifty neighbors per item
41
60
  max_neighbors 50
42
61
 
43
- # define an input data set "order_item_s". we'll add "order_id->item_id"
62
+ # define an input data set "order_items". we'll add "order_id->product_id"
44
63
  # pairs to this input and use the jaccard coefficient to retrieve a
45
64
  # "customers that ordered item i1 also ordered item i2" statement and apply
46
65
  # the result to the item<->item similarity matrix with a weight of 5.0
47
- input_matrix :order_items,
48
- :similarity_func => :jaccard,
66
+ input_matrix :order_items,
67
+ # :native => true,
68
+ :similarity_func => :jaccard,
49
69
  :weight => 5.0
50
-
51
- # define an input data set "like_item_s". we'll add "user_id->item_id"
52
- # pairs to this input and use a cosine-based similarity measure to retrieve
53
- # a "users that liked item i1 also liked item i2" statement and apply the
54
- # result to the item<->item similarity matrix with a weight of 1.0
55
- input_matrix :like_items
56
- :similarity_func => :cosine,
57
- :weight => 1.0
58
70
 
59
71
  end
60
72
 
61
73
  recommender = MyRecommender.new
62
74
 
63
- # add `order_id->item_id` interactions to the order_item_sim input
75
+ # add `order_id->product_id` interactions to the order_items input
64
76
  # you can add data incrementally and call recommender.process! to update
65
77
  # the similarity matrix at any time.
66
- recommender.order_items.add_set("order1", ["item23", "item65", "item23"])
67
- recommender.order_items.add_set("order2", ["item14", "item23"])
68
-
69
- # add `user_id->item_id` interactions to the like_time_sim input
70
- recommender.like_items.add_set("user1", ["item23", "item65", "item23"])
71
- recommender.like_items.add_set("user2", ["item14", "item23"])
72
-
78
+ recommender.order_items.add_set("order1", ["product23", "product65", "product23"])
79
+ recommender.order_items.add_set("order2", ["product14", "product23"])
73
80
 
74
81
  # Calculate all elements of the similarity matrix
75
82
  recommender.process!
76
83
 
77
84
  # ...or calculate a specific row of the similarity matrix (a specific item)
78
85
  # use this to avoid re-processing the whole matrix after incremental updates
79
- recommender.process_item!("item65")
86
+ recommender.process_item!("product65")
80
87
 
81
-
82
- # retrieve similar items to "item23"
88
+ # retrieve similar products to "product23"
83
89
  recommender.for("product23")
84
- => [ <Recommendify::Neighbor item_id:"item65" similarity:0.23>, (...) ]
85
-
90
+ => [ <Recommendify::Neighbor item_id:"product65" similarity:0.23>, (...) ]
86
91
 
87
- # remove "item23" from the similarity matrix and the input matrices. you should
92
+ # remove "product23" from the similarity matrix and the input matrices. you should
88
93
  # do this if your items 'expire', since it will speed up the calculation
89
- recommender.delete_item!("item23")
94
+ recommender.delete_item!("product23")
90
95
  ```
91
96
 
92
97
  ### how it works
93
98
 
94
- Recommendify keeps an incrementally updated `item x item` matrix, the "co-concurrency matrix". This matrix stores the number of times that a combination of two items has appeared in an interaction/preferrence set. The co-concurrence counts are processed with a similarity measure to retrieve another `item x item` similarity matrix, which is used to find the N most similar items for each item. This approach was described by Miranda, Alipio et al. [1]
99
+ Recommendify keeps an incrementally updated `item x item` matrix, the "co-concurrency matrix". This matrix stores the number of times that a combination of two items has appeared in an interaction/preference set. The co-concurrence counts are processed with a Jaccard similarity measure to retrieve another `item x item` similarity matrix, which is used to find the N most similar items for each item. This is also called "Item-based Collaborative Filtering with binary ratings" (see Miranda, Alipio et al. [1]).
95
100
 
96
101
  1. Group the input user->item pairs by user-id and store them into interaction sets
97
102
  2. For each item<->item combination in the interaction set increment the respective element in the co-concurrence matrix
@@ -103,7 +108,14 @@ Recommendify keeps an incrementally updated `item x item` matrix, the "co-concur
103
108
 
104
109
  The maximum number of entries in the co-concurrence and similarity matrix is k(n) = (n^2)-(n/2); it grows O(n^2). However, in a real scenario it is very unlikely that all item<->item combinations appear in an interaction set, and we use a sparse matrix which will only use memory for elements with a value > 0. The size of the similarity matrix grows O(n).
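
The co-concurrence counting and the Jaccard step described in this README section can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not recommendify's implementation: `jaccard_similarities` is a hypothetical helper, all counts live in in-memory hashes rather than redis, and the score is the textbook Jaccard coefficient (sets containing both items divided by sets containing either item).

```ruby
# Hypothetical helper (not part of recommendify): computes Jaccard scores
# for every item pair that shares at least one interaction set.
def jaccard_similarities(interaction_sets)
  item_counts = Hash.new(0) # number of sets each item appears in
  pair_counts = Hash.new(0) # sparse co-occurrence counts, keyed by [a, b]

  interaction_sets.each_value do |items|
    unique = items.uniq.sort # binary ratings: duplicates count once
    unique.each { |item| item_counts[item] += 1 }
    unique.combination(2) { |a, b| pair_counts[[a, b]] += 1 }
  end

  pair_counts.each_with_object({}) do |((a, b), both), sims|
    either = item_counts[a] + item_counts[b] - both
    sims[[a, b]] = both.to_f / either
  end
end

sets = {
  "order1" => ["product23", "product65", "product23"],
  "order2" => ["product14", "product23"]
}

# Each pair shares one order out of two orders containing either item,
# so both pairs score 0.5.
jaccard_similarities(sets).each do |(a, b), score|
  puts "#{a} <-> #{b}: #{score}"
end
```

Only pairs that actually co-occur get a key in `pair_counts`, which mirrors the sparse-matrix argument: memory is spent only on elements with a value > 0.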
105
110
 
111
+ ### native/fast worker
106
112
 
113
+ After you have compiled the native worker, you can pass the `:native => true` option to the input_matrix. This speeds up processing by at least 10x.
114
+
115
+ ```
116
+ cd ~/.rvm/gems/ruby-1.9.3-p0/gems/recommendify-0.2.2/
117
+ bundle exec rake build_native
118
+ ```
107
119
 
108
120
  example
109
121
  -------
@@ -145,15 +157,8 @@ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLI
145
157
 
146
158
  ### todo
147
159
 
148
- + rake benchmark CLASS=MySimilarityMatrix
149
- + optimize JaccardInputMatrix
150
160
  + implement CosineInputMatrix
151
- + implement NativeJaccardInputMatrix (C)
152
- + implement NativeCosineInputMatrix (C)
153
- + todo: remove item (remove from all matrices)
154
- + redis prefix issue
155
161
  + forbid ':' and '|' in item_ids
156
162
  + recommendify::base no key part issue
157
- + optimize sparsematrix memory usage (somehow)
158
163
  + make max_row length configurable
159
- + option: only add items where co-concurreny/appearnce-count > n
164
+
@@ -44,6 +44,7 @@ class Recommendify::SimilarityMatrix
44
44
  # use activesupport's orderedhash?
45
45
  def retrieve_item(item_id)
46
46
  data = Recommendify.redis.hget(redis_key, item_id)
47
+ return {} if data.nil?
47
48
  Hash[data.split("|").map{ |i| (k,s=i.split(":")) && [k,s.to_f] }]
48
49
  end
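
The effect of the added `nil` guard can be reproduced without redis. The sketch below assumes the row format implied by the `split` calls in this hunk (`item_id:score` pairs joined by `|`); `parse_row` is a hypothetical stand-alone helper, not a method of `Recommendify::SimilarityMatrix`.

```ruby
# Hypothetical helper mirroring the parsing step of retrieve_item.
# `data` is the serialized row, or nil when no row is stored for the item.
def parse_row(data)
  return {} if data.nil? # without this guard, nil.split raises NoMethodError

  data.split("|").each_with_object({}) do |pair, row|
    item_id, score = pair.split(":")
    row[item_id] = score.to_f
  end
end

p parse_row("product65:0.23|product14:0.5")
p parse_row(nil)
```

Because `:` and `|` act as separators in this format, an item id containing either character would be parsed incorrectly, which is consistent with the "forbid ':' and '|' in item_ids" entry in the todo list.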
49
50
 
data/recommendify.gemspec CHANGED
@@ -3,7 +3,7 @@ $:.push File.expand_path("../lib", __FILE__)
3
3
 
4
4
  Gem::Specification.new do |s|
5
5
  s.name = "recommendify"
6
- s.version = "0.2.2"
6
+ s.version = "0.2.3"
7
7
  s.date = Date.today.to_s
8
8
  s.platform = Gem::Platform::RUBY
9
9
  s.authors = ["Paul Asmuth"]
data/spec/base_spec.rb CHANGED
@@ -133,6 +133,11 @@ describe Recommendify::Base do
133
133
  sm.similarity_matrix.should_receive(:[]).with("fnorditem").and_return({:fooitem => 0.4, :baritem => 1.5})
134
134
  sm.for("fnorditem").length.should == 2
135
135
  end
136
+
137
+ it "should not throw exception for non existing items" do
138
+ sm = Recommendify::Base.new
139
+ sm.for("not_existing_item").length.should == 0
140
+ end
136
141
 
137
142
  it "should retrieve the n-most similar neighbors as Recommendify::Neighbor objects" do
138
143
  sm = Recommendify::Base.new
metadata CHANGED
@@ -1,47 +1,51 @@
1
- --- !ruby/object:Gem::Specification
1
+ --- !ruby/object:Gem::Specification
2
2
  name: recommendify
3
- version: !ruby/object:Gem::Version
4
- version: 0.2.2
3
+ version: !ruby/object:Gem::Version
5
4
  prerelease:
5
+ version: 0.2.3
6
6
  platform: ruby
7
- authors:
7
+ authors:
8
8
  - Paul Asmuth
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-02-12 00:00:00.000000000 Z
13
- dependencies:
14
- - !ruby/object:Gem::Dependency
12
+
13
+ date: 2012-02-25 00:00:00 +01:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
15
17
  name: redis
16
- requirement: &71976830 !ruby/object:Gem::Requirement
18
+ prerelease: false
19
+ requirement: &id001 !ruby/object:Gem::Requirement
17
20
  none: false
18
- requirements:
19
- - - ! '>='
20
- - !ruby/object:Gem::Version
21
+ requirements:
22
+ - - ">="
23
+ - !ruby/object:Gem::Version
21
24
  version: 2.2.2
22
25
  type: :runtime
23
- prerelease: false
24
- version_requirements: *71976830
25
- - !ruby/object:Gem::Dependency
26
+ version_requirements: *id001
27
+ - !ruby/object:Gem::Dependency
26
28
  name: rspec
27
- requirement: &71976450 !ruby/object:Gem::Requirement
29
+ prerelease: false
30
+ requirement: &id002 !ruby/object:Gem::Requirement
28
31
  none: false
29
- requirements:
32
+ requirements:
30
33
  - - ~>
31
- - !ruby/object:Gem::Version
34
+ - !ruby/object:Gem::Version
32
35
  version: 2.8.0
33
36
  type: :development
34
- prerelease: false
35
- version_requirements: *71976450
37
+ version_requirements: *id002
36
38
  description: Distributed item-based "Collaborative Filtering" with ruby and redis
37
- email:
39
+ email:
38
40
  - paul@paulasmuth.com
39
41
  executables: []
42
+
40
43
  extensions: []
44
+
41
45
  extra_rdoc_files: []
42
- files:
46
+
47
+ files:
43
48
  - Gemfile
44
- - Gemfile.lock
45
49
  - README.md
46
50
  - Rakefile
47
51
  - doc/example.png
@@ -76,32 +80,35 @@ files:
76
80
  - src/recommendify.c
77
81
  - src/sort.c
78
82
  - src/version.h
83
+ has_rdoc: true
79
84
  homepage: http://github.com/paulasmuth/recommendify
80
- licenses:
85
+ licenses:
81
86
  - MIT
82
87
  post_install_message:
83
88
  rdoc_options: []
84
- require_paths:
89
+
90
+ require_paths:
85
91
  - lib
86
- required_ruby_version: !ruby/object:Gem::Requirement
92
+ required_ruby_version: !ruby/object:Gem::Requirement
87
93
  none: false
88
- requirements:
89
- - - ! '>='
90
- - !ruby/object:Gem::Version
91
- version: '0'
92
- required_rubygems_version: !ruby/object:Gem::Requirement
94
+ requirements:
95
+ - - ">="
96
+ - !ruby/object:Gem::Version
97
+ version: "0"
98
+ required_rubygems_version: !ruby/object:Gem::Requirement
93
99
  none: false
94
- requirements:
95
- - - ! '>='
96
- - !ruby/object:Gem::Version
97
- version: '0'
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: "0"
98
104
  requirements: []
105
+
99
106
  rubyforge_project:
100
- rubygems_version: 1.8.6
107
+ rubygems_version: 1.6.2
101
108
  signing_key:
102
109
  specification_version: 3
103
110
  summary: Distributed item-based "Collaborative Filtering" with ruby and redis
104
- test_files:
111
+ test_files:
105
112
  - spec/base_spec.rb
106
113
  - spec/cc_matrix_shared.rb
107
114
  - spec/cosine_input_matrix_spec.rb
@@ -112,4 +119,3 @@ test_files:
112
119
  - spec/similarity_matrix_spec.rb
113
120
  - spec/sparse_matrix_spec.rb
114
121
  - spec/spec_helper.rb
115
- has_rdoc:
data/Gemfile.lock DELETED
@@ -1,24 +0,0 @@
1
- GEM
2
- remote: http://rubygems.org/
3
- specs:
4
- diff-lcs (1.1.3)
5
- rake (0.9.2.2)
6
- redis (2.2.2)
7
- rspec (2.8.0)
8
- rspec-core (~> 2.8.0)
9
- rspec-expectations (~> 2.8.0)
10
- rspec-mocks (~> 2.8.0)
11
- rspec-core (2.8.0)
12
- rspec-expectations (2.8.0)
13
- diff-lcs (~> 1.1.2)
14
- rspec-mocks (2.8.0)
15
- yard (0.7.4)
16
-
17
- PLATFORMS
18
- ruby
19
-
20
- DEPENDENCIES
21
- rake
22
- redis
23
- rspec
24
- yard