RubyGems - cmfrec - Versions diffs - 0.1.0 - Mend

cmfrec 0.1.0

Files changed (12)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 2e9f45e0c3826b90788782ac8a0838476fc7849e75d91f2d5949b08b382a28c3
+  data.tar.gz: 2f44b988001bf2b23e3c5938b4c5e7b9089e8943d86aa42c86e72be0ff558ea1
+SHA512:
+  metadata.gz: 592ef4363bc016da1a35a958f99dd13a0fc7e900ecbe77b3ba95fc4b4cbbe151397924ff90aab2e326ac3369ac07c0380c76b1d3a2ac71b390664118de0f8611
+  data.tar.gz: 27d3e1ce80e88af9e8b062837f68329b10718372be15c08d4cd2e056619d1cd9e5ed906e6070798c32993ccea2f617b706af5e6afb7cba4f3cfc70ea22393ba6
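
`checksums.yaml` records SHA256 and SHA512 digests for the two main members of the `.gem` archive, `metadata.gz` and `data.tar.gz`. The following is a minimal sketch of recomputing and comparing them with Ruby's standard `digest` library. It assumes the members were first extracted from the `.gem`, which is a plain tar archive (for example with `tar -xf cmfrec-0.1.0.gem`), and that `checksums.yaml.gz` was gunzipped. `verify_member` is a hypothetical helper and is not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: true when `data` (the raw bytes of one .gem member)
# matches both digests recorded for `member` in a parsed checksums.yaml hash.
def verify_member(checksums, member, data)
  Digest::SHA256.hexdigest(data) == checksums.dig("SHA256", member) &&
    Digest::SHA512.hexdigest(data) == checksums.dig("SHA512", member)
end

# Example usage (assumes the members were extracted as described above):
#   require "yaml"
#   checksums = YAML.safe_load(File.read("checksums.yaml"))
#   verify_member(checksums, "data.tar.gz", File.binread("data.tar.gz"))
```

Both digests must match. If a member is missing from the YAML, the comparison fails instead of passing silently.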

data/CHANGELOG.md ADDED

@@ -0,0 +1,3 @@
+## 0.1.0 (2020-11-27)
+- First release

data/LICENSE.txt ADDED

@@ -0,0 +1,24 @@
+MIT License
+Copyright (c) 2020 David Cortes
+Copyright (c) 2020 Andrew Kane
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,191 @@
+# cmfrec
+:fire: Recommendations for Ruby, powered by [cmfrec](https://github.com/david-cortes/cmfrec)
+- Supports side information :tada:
+- Works with explicit and implicit feedback
+- Uses high-performance matrix factorization
+Not available for Windows yet
+[![Build Status](https://github.com/ankane/cmfrec/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec/actions)
+## Installation
+Add this line to your application’s Gemfile:
+```ruby
+gem 'cmfrec'
+```
+## Getting Started
+Create a recommender
+```ruby
+recommender = Cmfrec::Recommender.new
+```
+If users rate items directly, this is known as explicit feedback. Fit the recommender with:
+```ruby
+recommender.fit([
+  {user_id: 1, item_id: 1, rating: 5},
+  {user_id: 2, item_id: 1, rating: 3}
+])
+```
+> IDs can be integers, strings, or any other data type
+If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating, or use a value like number of purchases, number of page views, or time spent on page:
+```ruby
+recommender.fit([
+  {user_id: 1, item_id: 1, value: 1},
+  {user_id: 2, item_id: 1, value: 1}
+])
+```
+> Use `value` instead of `rating` for implicit feedback
+Get recommendations - “users like you also liked”
+```ruby
+recommender.user_recs(user_id)
+```
+Get recommendations for a new user
+```ruby
+recommender.new_user_recs([
+  {item_id: 1, value: 5},
+  {item_id: 2, value: 3}
+])
+```
+Use the `count` option to specify the number of recommendations (default is 5)
+```ruby
+recommender.user_recs(user_id, count: 3)
+```
+Get predicted ratings for specific items
+```ruby
+recommender.user_recs(user_id, item_ids: [1, 2, 3])
+```
+## Side Information
+Add side information about users, items, or both
+```ruby
+user_info = [
+  {user_id: 1, a: 1, b: 1},
+  {user_id: 2, a: 1, b: 1},
+]
+item_info = [
+  {item_id: 1, c: 1, d: 1},
+  {item_id: 2, c: 1, d: 1},
+]
+recommender.fit(ratings, user_info: user_info, item_info: item_info)
+```
+Get recommendations for a new user with ratings and side information
+```ruby
+ratings = [
+  {item_id: 1, rating: 5},
+  {item_id: 2, rating: 3}
+]
+recommender.new_user_recs(ratings, user_info: {a: 1, b: 1})
+```
+Get recommendations with only side information
+```ruby
+recommender.new_user_recs([], user_info: {a: 1, b: 1})
+```
+## Options
+Specify the number of factors and epochs
+```ruby
+Cmfrec::Recommender.new(factors: 8, epochs: 20)
+```
+If recommendations look off, try changing `factors`. The default is 8, but 3 could be good for some applications and 300 good for others.
+### Explicit Feedback
+Add implicit features
+```ruby
+Cmfrec::Recommender.new(add_implicit_features: true)
+```
+Disable bias
+```ruby
+Cmfrec::Recommender.new(user_bias: false, item_bias: false)
+```
+## Data
+Data can be an array of hashes
+```ruby
+[{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]
+```
+Or a Rover data frame
+```ruby
+Rover.read_csv("ratings.csv")
+```
+## Reference
+Get the global mean
+```ruby
+recommender.global_mean
+```
+Get the factors
+```ruby
+recommender.user_factors
+recommender.item_factors
+```
+Get the bias
+```ruby
+recommender.user_bias
+recommender.item_bias
+```
+## History
+View the [changelog](https://github.com/ankane/cmfrec/blob/master/CHANGELOG.md)
+## Contributing
+Everyone is encouraged to help improve this project. Here are a few ways you can help:
+- [Report bugs](https://github.com/ankane/cmfrec/issues)
+- Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec/pulls)
+- Write, clarify, or fix documentation
+- Suggest or add new features
+To get started with development:
+```sh
+git clone https://github.com/ankane/cmfrec.git
+cd cmfrec
+bundle install
+bundle exec rake vendor:all
+bundle exec rake test
+```
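
The Reference section exposes `global_mean`, `user_bias`/`item_bias` and `user_factors`/`item_factors`. Consider the explicit-feedback model with `k_user`, `k_item` and `k_main` left at the 0 defaults set in `recommender.rb`. In that case a predicted rating is assumed to be the global mean, plus both biases, plus the dot product of the user's and item's factor rows. The sketch below is plain Ruby written under that assumption. `predict_rating` is hypothetical and is not the gem's API, and it is not guaranteed to match `predict_multiple` bit for bit. For implicit feedback, `fit` sets `global_mean` to 0 and allocates no biases, so the same sketch reduces to the dot product.

```ruby
# Hypothetical helper, not part of the cmfrec gem. Assumes k_user = k_item =
# k_main = 0, so both factor rows have exactly `factors` entries. A disabled
# bias is passed as 0.0.
def predict_rating(global_mean:, user_row:, item_row:, user_bias: 0.0, item_bias: 0.0)
  raise ArgumentError, "factor rows differ in length" unless user_row.size == item_row.size
  dot = user_row.zip(item_row).sum { |u, i| u * i }
  global_mean + user_bias + item_bias + dot
end

# With a fitted recommender, u and i are internal row indices, not IDs.
# The 0.1.0 gem keeps its ID-to-row maps private.
#   predict_rating(
#     global_mean: recommender.global_mean,
#     user_row: recommender.user_factors[u],
#     item_row: recommender.item_factors[i],
#     user_bias: recommender.user_bias ? recommender.user_bias[u] : 0.0,
#     item_bias: recommender.item_bias ? recommender.item_bias[i] : 0.0
#   )
```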

data/lib/cmfrec.rb ADDED

@@ -0,0 +1,28 @@
+# stdlib
+require "etc"
+require "fiddle/import"
+# modules
+require "cmfrec/recommender"
+require "cmfrec/version"
+module Cmfrec
+  class Error < StandardError; end
+  class << self
+    attr_accessor :ffi_lib
+  end
+  lib_name =
+    if Gem.win_platform?
+      "cmfrec.dll"
+    elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
+      "libcmfrec.dylib"
+    else
+      "libcmfrec.so"
+    end
+  vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
+  self.ffi_lib = [vendor_lib]
+  # friendlier error message
+  autoload :FFI, "cmfrec/ffi"
+end
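
`lib/cmfrec.rb` chooses the vendored shared library by platform and resolves its path relative to `lib/`. The sketch below pulls that branching out into pure functions, so it can be exercised without loading any native code. `shared_lib_name` and `vendored_lib_path` are hypothetical names. According to the metadata, 0.1.0 ships only `libcmfrec.dylib` and `libcmfrec.so`, so the Windows branch names a file that is not included. This matches the README's "Not available for Windows yet".

```ruby
require "rbconfig"

# Hypothetical extraction of the platform check in lib/cmfrec.rb.
# `windows` mirrors Gem.win_platform?; `host_os` mirrors RbConfig::CONFIG["host_os"].
def shared_lib_name(host_os: RbConfig::CONFIG["host_os"], windows: Gem.win_platform?)
  if windows
    "cmfrec.dll"
  elsif host_os =~ /darwin/i
    "libcmfrec.dylib"
  else
    "libcmfrec.so"
  end
end

# Resolve ../vendor/<name> against a lib/ directory, as cmfrec.rb does with __dir__.
def vendored_lib_path(lib_dir, **opts)
  File.expand_path("../vendor/#{shared_lib_name(**opts)}", lib_dir)
end
```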

data/lib/cmfrec/ffi.rb ADDED

@@ -0,0 +1,26 @@
+module Cmfrec
+  module FFI
+    extend Fiddle::Importer
+    libs = Cmfrec.ffi_lib.dup
+    begin
+      dlload Fiddle.dlopen(libs.shift)
+    rescue Fiddle::DLError => e
+      retry if libs.any?
+      raise e
+    end
+    typealias "bool", "char"
+    # determined by CMakeLists.txt
+    typealias "int_t", "int"
+    typealias "real_t", "double"
+    extern "int_t fit_collective_explicit_als(real_t *biasA, real_t *biasB, real_t *A, real_t *B, real_t *C, real_t *D, real_t *Ai, real_t *Bi, bool add_implicit_features, bool reset_values, int_t seed, real_t *glob_mean, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t *Xfull, real_t *weight, bool user_bias, bool item_bias, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_X, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t w_implicit, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, bool include_all_X, real_t *B_plus_bias, real_t *precomputedBtB, real_t *precomputedTransBtBinvBt, real_t *precomputedBeTBeChol, real_t *precomputedBiTBi, real_t *precomputedTransCtCinvCt, real_t *precomputedCtCw)"
+    extern "int_t fit_collective_implicit_als(real_t *A, real_t *B, real_t *C, real_t *D, bool reset_values, int_t seed, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t *w_main_multiplier, real_t alpha, bool adjust_weight, bool apply_log_transf, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, real_t *precomputedBtB, real_t *precomputedBeTBe, real_t *precomputedBeTBeChol)"
+    extern "int_t factors_collective_explicit_single(real_t *a_vec, real_t *a_bias, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, real_t *u_bin_vec, int_t pbin, bool NA_as_zero_U, bool NA_as_zero_X, bool nonneg, real_t *C, real_t *Cb, real_t glob_mean, real_t *biasB, real_t *U_colmeans, real_t *Xa, int_t X_col[], size_t nnz, real_t *Xa_dense, int_t n, real_t *weight, real_t *B, real_t *Bi, bool add_implicit_features, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t *lam_unique, real_t w_main, real_t w_user, real_t w_implicit, int_t n_max, bool include_all_X, real_t *TransBtBinvBt, real_t *BtB, real_t *BeTBeChol, real_t *BiTBi, real_t *CtCw, real_t *TransCtCinvCt, real_t *B_plus_bias)"
+    extern "int_t factors_collective_implicit_single(real_t *a_vec, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, bool NA_as_zero_U, bool nonneg, real_t *U_colmeans, real_t *B, int_t n, real_t *C, real_t *Xa, int_t X_col[], size_t nnz, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t alpha, real_t w_main, real_t w_user, real_t w_main_multiplier, bool apply_log_transf, real_t *BeTBe, real_t *BtB, real_t *BeTBeChol)"
+    extern "void predict_multiple(real_t *restrict A, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasA, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_main, int_t m, int_t n, int_t predA[], int_t predB[], size_t nnz, real_t *restrict outp, int_t nthreads)"
+    extern "int_t predict_X_old_collective_explicit(int_t row[], int_t col[], real_t *restrict predicted, size_t n_predict, real_t *restrict A, real_t *restrict biasA, real_t *restrict B, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_user, int_t k_item, int_t k_main, int_t m, int_t n_max, int_t nthreads)"
+    extern "int_t topN(real_t *restrict a_vec, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasB, real_t glob_mean, real_t biasA, int_t k, int_t k_main, int_t *restrict include_ix, int_t n_include, int_t *restrict exclude_ix, int_t n_exclude, int_t *restrict outp_ix, real_t *restrict outp_score, int_t n_top, int_t n, int_t nthreads)"
+  end
+end
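
`ffi.rb` tries each path in `Cmfrec.ffi_lib` in turn. After a `Fiddle::DLError`, `retry` re-runs the `begin` block with the next candidate. Once the list is exhausted, the last error is re-raised. Because `bool` is aliased to `char`, `recommender.rb` also converts Ruby `true`/`false` to `1`/`0` before each call (`fiddle_args`). The following is a pure-Ruby sketch of both patterns. The loader is injected as a block standing in for `Fiddle.dlopen`, so no native library is needed. `load_first`, `to_c_args` and `LoadFailed` are hypothetical names.

```ruby
# Hypothetical stand-in for Fiddle::DLError so the sketch runs without native code.
class LoadFailed < StandardError; end

# Try each candidate in order, as ffi.rb does: on failure, `retry` re-runs the
# begin block with the next candidate; once none remain, re-raise the last error.
def load_first(candidates)
  libs = candidates.dup
  begin
    yield libs.shift
  rescue LoadFailed => e
    retry if libs.any?
    raise e
  end
end

# Same conversion as Recommender#fiddle_args: booleans become 1/0 because the
# C `bool` parameters are declared as `char`; other values pass through unchanged.
def to_c_args(args)
  args.map { |v| v == true || v == false ? (v ? 1 : 0) : v }
end
```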

data/lib/cmfrec/recommender.rb ADDED

@@ -0,0 +1,548 @@
+module Cmfrec
+  class Recommender
+    attr_reader :global_mean
+    def initialize(factors: 8, epochs: 10, verbose: true, user_bias: true, item_bias: true, add_implicit_features: false)
+      set_params(
+        k: factors,
+        niter: epochs,
+        verbose: verbose,
+        user_bias: user_bias,
+        item_bias: item_bias,
+        add_implicit_features: add_implicit_features
+      )
+    end
+    def fit(train_set, user_info: nil, item_info: nil)
+      train_set = to_dataset(train_set)
+      @implicit = !train_set.any? { |v| v[:rating] }
+      unless @implicit
+        ratings = train_set.map { |o| o[:rating] }
+        check_ratings(ratings)
+      end
+      check_training_set(train_set)
+      create_maps(train_set)
+      x_row = []
+      x_col = []
+      x_val = []
+      value_key = @implicit ? :value : :rating
+      train_set.each do |v|
+        x_row << @user_map[v[:user_id]]
+        x_col << @item_map[v[:item_id]]
+        x_val << (v[value_key] || 1)
+      end
+      @m = @user_map.size
+      @n = @item_map.size
+      nnz = train_set.size
+      x_row = int_ptr(x_row)
+      x_col = int_ptr(x_col)
+      x = real_ptr(x_val)
+      x_full = nil
+      weight = nil
+      lam_unique = nil
+      uu = nil
+      ii = nil
+      @user_info_map = {}
+      u_row, u_col, u_sp, nnz_u, @m_u, p_ = process_info(user_info, @user_map, @user_info_map, :user_id)
+      @item_info_map = {}
+      i_row, i_col, i_sp, nnz_i, @n_i, q = process_info(item_info, @item_map, @item_info_map, :item_id)
+      @precompute_for_predictions = false
+      # initialize w/ normal distribution
+      reset_values = true
+      @a = Fiddle::Pointer.malloc([@m, @m_u].max * (@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+      @b = Fiddle::Pointer.malloc([@n, @n_i].max * (@k_item + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+      @c = p_ > 0 ? Fiddle::Pointer.malloc(p_ * (@k_user + @k) * Fiddle::SIZEOF_DOUBLE) : nil
+      @d = q > 0 ? Fiddle::Pointer.malloc(q * (@k_item + @k) * Fiddle::SIZEOF_DOUBLE) : nil
+      @bias_a = nil
+      @bias_b = nil
+      u_colmeans = Fiddle::Pointer.malloc(p_ * Fiddle::SIZEOF_DOUBLE)
+      i_colmeans = Fiddle::Pointer.malloc(q * Fiddle::SIZEOF_DOUBLE)
+      if @implicit
+        @w_main_multiplier = 1.0
+        @alpha = 1.0
+        @adjust_weight = false # downweight?
+        @apply_log_transf = false
+        # different defaults
+        @lambda_ = 1e0
+        @w_user = 10
+        @w_item = 10
+        @finalize_chol = false
+        args = [
+          @a, @b,
+          @c, @d,
+          reset_values, @random_state,
+          u_colmeans, i_colmeans,
+          @m, @n, @k,
+          x_row, x_col, x, nnz,
+          @lambda_, lam_unique,
+          uu, @m_u, p_,
+          ii, @n_i, q,
+          u_row, u_col, u_sp, nnz_u,
+          i_row, i_col, i_sp, nnz_i,
+          @na_as_zero_user, @na_as_zero_item,
+          @k_main, @k_user, @k_item,
+          @w_main, @w_user, @w_item, real_ptr([@w_main_multiplier]),
+          @alpha, @adjust_weight, @apply_log_transf,
+          @niter, @nthreads, @verbose, @handle_interrupt,
+          @use_cg, @max_cg_steps, @finalize_chol,
+          @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
+          @precompute_for_predictions,
+          nil, #precomputedBtB,
+          nil, #precomputedBeTBe,
+          nil  #precomputedBeTBeChol
+        ]
+        check_status FFI.fit_collective_implicit_als(*fiddle_args(args))
+        @global_mean = 0
+      else
+        @bias_a = Fiddle::Pointer.malloc([@m, @m_u].max * Fiddle::SIZEOF_DOUBLE) if @user_bias
+        @bias_b = Fiddle::Pointer.malloc([@n, @n_i].max * Fiddle::SIZEOF_DOUBLE) if @item_bias
+        if @add_implicit_features
+          @ai = Fiddle::Pointer.malloc([@m, @m_u].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+          @bi = Fiddle::Pointer.malloc([@n, @n_i].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+        else
+          @ai = nil
+          @bi = nil
+        end
+        glob_mean = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
+        args = [
+          @bias_a, @bias_b,
+          @a, @b,
+          @c, @d,
+          @ai, @bi,
+          @add_implicit_features,
+          reset_values, @random_state,
+          glob_mean,
+          u_colmeans, i_colmeans,
+          @m, @n, @k,
+          x_row, x_col, x, nnz,
+          x_full,
+          weight,
+          @user_bias, @item_bias,
+          @lambda_, lam_unique,
+          uu, @m_u, p_,
+          ii, @n_i, q,
+          u_row, u_col, u_sp, nnz_u,
+          i_row, i_col, i_sp, nnz_i,
+          @na_as_zero, @na_as_zero_user, @na_as_zero_item,
+          @k_main, @k_user, @k_item,
+          @w_main, @w_user, @w_item, @w_implicit,
+          @niter, @nthreads, @verbose, @handle_interrupt,
+          @use_cg, @max_cg_steps, @finalize_chol,
+          @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
+          @precompute_for_predictions,
+          @include_all_x,
+          nil, #B_plus_bias,
+          nil, #precomputedBtB,
+          nil, #precomputedTransBtBinvBt,
+          nil, #precomputedBeTBeChol,
+          nil, #precomputedBiTBi,
+          nil, #precomputedTransCtCinvCt,
+          nil  #precomputedCtCw
+        ]
+        check_status FFI.fit_collective_explicit_als(*fiddle_args(args))
+        @global_mean = real_array(glob_mean).first
+      end
+      @u_colmeans = real_array(u_colmeans)
+      @i_colmeans = real_array(i_colmeans)
+      @u_colmeans_ptr = u_colmeans
+      self
+    end
+    def user_recs(user_id, count: 5, item_ids: nil)
+      check_fit
+      user = @user_map[user_id]
+      if user
+        if item_ids
+          # remove missing ids
+          item_ids = item_ids.select { |v| @item_map[v] }
+          pred_a = int_ptr([@user_map[user_id]] * item_ids.size)
+          pred_b = int_ptr(item_ids.map { |v| @item_map[v] })
+          nnz = item_ids.size
+          outp = Fiddle::Pointer.malloc(nnz * Fiddle::SIZEOF_DOUBLE)
+          FFI.predict_multiple(
+            @a, @k_user,
+            @b, @k_item,
+            @bias_a, @bias_b,
+            @global_mean,
+            @k, @k_main,
+            @m, @n,
+            pred_a, pred_b, nnz,
+            outp,
+            @nthreads
+          )
+          scores = real_array(outp)
+          item_ids.zip(scores).map do |item_id, score|
+            {item_id: item_id, score: score}
+          end
+        else
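+          # each row of @a holds k_user + k + k_main factors (see the malloc in fit)
+          width = @k_user + @k + @k_main
+          a_vec = @a[user * width * Fiddle::SIZEOF_DOUBLE, width * Fiddle::SIZEOF_DOUBLE]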
+          a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
+          top_n(a_vec: a_vec, a_bias: a_bias, count: count)
+        end
+      else
+        # no items if user is unknown
+        # TODO maybe most popular items
+        []
+      end
+    end
+    # TODO add item_ids
+    def new_user_recs(data, count: 5, user_info: nil)
+      check_fit
+      a_vec, a_bias = factors_warm(data, user_info: user_info)
+      top_n(a_vec: a_vec, a_bias: a_bias, count: count)
+    end
+    def user_factors
+      read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main)
+    end
+    def item_factors
+      read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main)
+    end
+    def user_bias
+      read_bias(@bias_a) if @bias_a
+    end
+    def item_bias
+      read_bias(@bias_b) if @bias_b
+    end
+    private
+    def set_params(
+      k: 40, lambda_: 1e+1, method: "als", use_cg: true, user_bias: true,
+      item_bias: true, add_implicit_features: false,
+      k_user: 0, k_item: 0, k_main: 0,
+      w_main: 1.0, w_user: 1.0, w_item: 1.0, w_implicit: 0.5,
+      maxiter: 800, niter: 10, parallelize: "separate", corr_pairs: 4,
+      max_cg_steps: 3, finalize_chol: true,
+      na_as_zero: false, na_as_zero_user: false, na_as_zero_item: false,
+      nonneg: false, nonneg_c: false, nonneg_d: false, max_cd_steps: 100,
+      precompute_for_predictions: true, include_all_x: true,
+      use_float: false,
+      random_state: 1, verbose: true, print_every: 10,
+      handle_interrupt: true, produce_dicts: false,
+      copy_data: true, nthreads: -1
+    )
+      @k = k
+      @k_user = k_user
+      @k_item = k_item
+      @k_main = k_main
+      @lambda_ = lambda_
+      @w_main = w_main
+      @w_user = w_user
+      @w_item = w_item
+      @w_implicit = w_implicit
+      @user_bias = !!user_bias
+      @item_bias = !!item_bias
+      @method = method
+      @add_implicit_features = !!add_implicit_features
+      @use_cg = !!use_cg
+      @max_cg_steps = max_cg_steps.to_i
+      @max_cd_steps = max_cd_steps.to_i
+      @finalize_chol = !!finalize_chol
+      @maxiter = maxiter
+      @niter = niter
+      @parallelize = parallelize
+      @na_as_zero = !!na_as_zero
+      @na_as_zero_user = !!na_as_zero_user
+      @na_as_zero_item = !!na_as_zero_item
+      @nonneg = !!nonneg
+      @nonneg_c = !!nonneg_c
+      @nonneg_d = !!nonneg_d
+      @precompute_for_predictions = !!precompute_for_predictions
+      @include_all_x = !!include_all_x
+      @use_float = !!use_float
+      @verbose = !!verbose
+      @print_every = print_every
+      @corr_pairs = corr_pairs
+      @random_state = random_state.to_i
+      @produce_dicts = !!produce_dicts
+      @handle_interrupt = !!handle_interrupt
+      @copy_data = !!copy_data
+      nthreads = Etc.nprocessors if nthreads < 0
+      @nthreads = nthreads
+    end
+    def create_maps(train_set)
+      user_ids = train_set.map { |v| v[:user_id] }.uniq
+      item_ids = train_set.map { |v| v[:item_id] }.uniq
+      # check before sorting, since sorting nil alongside other IDs raises
+      raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
+      raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
+      user_ids.sort!
+      item_ids.sort!
+      @user_map = user_ids.zip(user_ids.size.times).to_h
+      @item_map = item_ids.zip(item_ids.size.times).to_h
+    end
+    def check_ratings(ratings)
+      unless ratings.all? { |r| !r.nil? }
+        raise ArgumentError, "Missing ratings"
+      end
+      unless ratings.all? { |r| r.is_a?(Numeric) }
+        raise ArgumentError, "Ratings must be numeric"
+      end
+    end
+    def check_training_set(train_set)
+      raise ArgumentError, "No training data" if train_set.empty?
+    end
+    def check_fit
+      raise "Not fit" unless defined?(@implicit)
+    end
+    def to_dataset(dataset)
+      if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
+        # convert keys to symbols
+        dataset = dataset.dup
+        dataset.keys.each do |k, v|
+          dataset[k.to_sym] ||= dataset.delete(k)
+        end
+        dataset.to_a
+      elsif defined?(Daru::DataFrame) && dataset.is_a?(Daru::DataFrame)
+        # convert keys to symbols
+        dataset = dataset.dup
+        new_names = dataset.vectors.to_a.map { |k| [k, k.to_sym] }.to_h
+        dataset.rename_vectors!(new_names)
+        dataset.to_a[0]
+      else
+        dataset
+      end
+    end
+    def read_factors(ptr, d1, d2)
+      arr = []
+      offset = 0
+      width = d2 * Fiddle::SIZEOF_DOUBLE
+      d1.times do |i|
+        arr << ptr[offset, width].unpack("d*")
+        offset += width
+      end
+      arr
+    end
+    def read_bias(ptr)
+      real_array(ptr)
+    end
+    def top_n(a_vec:, a_bias:, count:)
+      include_ix = nil
+      n_include = 0
+      exclude_ix = nil
+      n_exclude = 0
+      outp_ix = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_INT)
+      outp_score = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_DOUBLE)
+      check_status FFI.topN(
+        a_vec, @k_user,
+        @b, @k_item,
+        @bias_b, @global_mean, a_bias,
+        @k, @k_main,
+        include_ix, n_include,
+        exclude_ix, n_exclude,
+        outp_ix, outp_score,
+        count, @n,
+        @nthreads
+      )
+      imap = @item_map.map(&:reverse).to_h
+      item_ids = int_array(outp_ix).map { |v| imap[v] }
+      scores = real_array(outp_score)
+      item_ids.zip(scores).map do |item_id, score|
+        {item_id: item_id, score: score}
+      end
+    end
+    def factors_warm(data, user_info: nil)
+      data = to_dataset(data)
+      user_info = to_dataset(user_info) if user_info
+      nnz = data.size
+      a_vec = Fiddle::Pointer.malloc((@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+      bias_a = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
+      u_vec_sp = []
+      u_vec_x_col = []
+      if user_info
+        user_info.each do |k, v|
+          next if k == :user_id
+          uc = @user_info_map[k]
+          raise "Bad key: #{k}" unless uc
+          u_vec_x_col << uc
+          u_vec_sp << v
+        end
+      end
+      p_ = @user_info_map.size
+      nnz_u_vec = u_vec_sp.size
+      u_vec_x_col = int_ptr(u_vec_x_col)
+      u_vec_sp = real_ptr(u_vec_sp)
+      u_vec = nil
+      u_bin_vec = nil
+      pbin = 0
+      weight = nil
+      lam_unique = nil
+      n_max = @n
+      if data.any?
+        if @implicit
+          ratings = data.map { |d| d[:value] || 1 }
+        else
+          ratings = data.map { |d| d[:rating] }
+          check_ratings(ratings)
+        end
+        xa = real_ptr(ratings)
+        # map item IDs to the internal column indices assigned during fit
+        x_col = int_ptr(data.map { |d| @item_map[d[:item_id]] })
+      else
+        xa = nil
+        x_col = nil
+      end
+      xa_dense = nil
+      if @implicit
+        args = [
+          a_vec,
+          u_vec, p_,
+          u_vec_sp, u_vec_x_col, nnz_u_vec,
+          @na_as_zero_user,
+          @nonneg,
+          @u_colmeans_ptr,
+          @b, @n, @c,
+          xa, x_col, nnz,
+          @k, @k_user, @k_item, @k_main,
+          @lambda_, @alpha,
+          @w_main, @w_user, @w_main_multiplier,
+          @apply_log_transf,
+          nil, #BeTBe,
+          nil, #BtB
+          nil  #BeTBeChol
+        ]
+        check_status FFI.factors_collective_implicit_single(*fiddle_args(args))
+      else
+        cb = nil
+        args = [
+          a_vec, bias_a,
+          u_vec, p_,
+          u_vec_sp, u_vec_x_col, nnz_u_vec,
+          u_bin_vec, pbin,
+          @na_as_zero_user, @na_as_zero,
+          @nonneg,
+          @c, cb,
+          @global_mean, @bias_b, @u_colmeans_ptr,
+          xa, x_col, nnz, xa_dense,
+          @n, weight, @b, @bi,
+          @add_implicit_features,
+          @k, @k_user, @k_item, @k_main,
+          @lambda_, lam_unique,
+          @w_main, @w_user, @w_implicit,
+          n_max,
+          @include_all_x,
+          nil, #TransBtBinvBt,
+          nil, #BtB,
+          nil, #BeTBeChol,
+          nil, #BiTBi,
+          nil, #CtCw,
+          nil, #TransCtCinvCt,
+          nil  #B_plus_bias
+        ]
+        check_status FFI.factors_collective_explicit_single(*fiddle_args(args))
+      end
+      [a_vec, real_array(bias_a).first]
+    end
+    # convert boolean to int
+    def fiddle_args(args)
+      args.map { |v| v == true || v == false ? (v ? 1 : 0) : v }
+    end
+    def check_status(ret_val)
+      case ret_val
+      when 0
+        # success
+      when 1
+        raise "Could not allocate sufficient memory"
+      else
+        raise "Bad status: #{ret_val}"
+      end
+    end
+    def process_info(info, map, info_map, key)
+      return [nil, nil, nil, 0, 0, 0] unless info
+      info = to_dataset(info)
+      row = []
+      col = []
+      val = []
+      info.each do |ri|
+        rk = ri[key]
+        raise ArgumentError, "Missing #{key}" unless rk
+        r = (map[rk] ||= map.size)
+        ri.each do |k, v|
+          next if k == key
+          row << r
+          col << (info_map[k] ||= info_map.size)
+          val << v
+        end
+      end
+      [int_ptr(row), int_ptr(col), real_ptr(val), val.size, map.size, info_map.size]
+    end
+    def int_ptr(v)
+      v.pack("i*")
+    end
+    def real_ptr(v)
+      v.pack("d*")
+    end
+    def int_array(ptr)
+      ptr.to_s(ptr.size).unpack("i*")
+    end
+    def real_array(ptr)
+      ptr.to_s(ptr.size).unpack("d*")
+    end
+  end
+end
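
`fit` converts the hash rows into coordinate (COO) triplets in three steps:

1. `create_maps` gives each distinct user ID and item ID a 0-based index, in sorted order.
2. The training rows become parallel row, column and value arrays.
3. `int_ptr`/`real_ptr` pack those arrays with `Array#pack("i*")` and `pack("d*")` into native-endian C `int` and `double` buffers, which Fiddle passes as pointers.

The following is a self-contained sketch of that pipeline. `build_coo` is a hypothetical name. Because of the sort, it assumes the IDs in each column share one mutually comparable type, as the gem does.

```ruby
# Hypothetical sketch of the ID-mapping and COO steps in Recommender#fit.
# Assumes every row has :user_id and :item_id of one mutually comparable type.
def build_coo(rows, value_key: :rating)
  user_ids = rows.map { |r| r[:user_id] }.uniq
  item_ids = rows.map { |r| r[:item_id] }.uniq
  # check for nil before sorting, since sort raises on mixed nil/other arrays
  raise ArgumentError, "Missing user_id" if user_ids.include?(nil)
  raise ArgumentError, "Missing item_id" if item_ids.include?(nil)
  user_map = user_ids.sort.each_with_index.to_h
  item_map = item_ids.sort.each_with_index.to_h

  x_row = rows.map { |r| user_map[r[:user_id]] }
  x_col = rows.map { |r| item_map[r[:item_id]] }
  # a missing value defaults to 1, as fit does for implicit feedback
  x_val = rows.map { |r| r[value_key] || 1 }

  {
    user_map: user_map,
    item_map: item_map,
    # native-endian C int and double buffers, like int_ptr / real_ptr
    x_row: x_row.pack("i*"),
    x_col: x_col.pack("i*"),
    x_val: x_val.pack("d*")
  }
end
```

Unpacking with `unpack("i*")` or `unpack("d*")` reverses the packing. `int_array` and `real_array` do the same with the contents of the output pointers.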

data/lib/cmfrec/version.rb ADDED

@@ -0,0 +1,3 @@
+module Cmfrec
+  VERSION = "0.1.0"
+end

data/vendor/LICENSE.txt ADDED

@@ -0,0 +1,74 @@
+MIT License
+Copyright (c) 2020 David Cortes
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+IN THE SOFTWARE.
+---
+ANSI C implementation of vector operations.
+Copyright (c) 2007-2010 Naoaki Okazaki
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+---
+C library of Limited memory BFGS (L-BFGS).
+Copyright (c) 1990, Jorge Nocedal
+Copyright (c) 2007-2010 Naoaki Okazaki
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/vendor/libcmfrec.dylib ADDED

Binary file

data/vendor/libcmfrec.so ADDED

Binary file

metadata ADDED

@@ -0,0 +1,52 @@
+--- !ruby/object:Gem::Specification
+name: cmfrec
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Andrew Kane
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2020-11-28 00:00:00.000000000 Z
+dependencies: []
+description:
+email: andrew@chartkick.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE.txt
+- README.md
+- lib/cmfrec.rb
+- lib/cmfrec/ffi.rb
+- lib/cmfrec/recommender.rb
+- lib/cmfrec/version.rb
+- vendor/LICENSE.txt
+- vendor/libcmfrec.dylib
+- vendor/libcmfrec.so
+homepage: https://github.com/ankane/cmfrec
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.5'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.1.4
+signing_key:
+specification_version: 4
+summary: Recommendations for Ruby using collective matrix factorization
+test_files: []
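
The metadata pins `required_ruby_version` to `>= 2.5`. RubyGems evaluates constraints like this with `Gem::Requirement` and `Gem::Version`. The short sketch below checks a Ruby version string against the same constraint. `supported_ruby?` is a hypothetical helper name.

```ruby
require "rubygems"

# Hypothetical helper: checks a Ruby version string against the gemspec's
# `required_ruby_version` constraint (">= 2.5" for cmfrec 0.1.0).
def supported_ruby?(ruby_version = RUBY_VERSION, constraint = ">= 2.5")
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(ruby_version))
end
```

`Gem::Version` compares numerically segment by segment, so `2.5.0` satisfies `>= 2.5` and `2.4.10` does not.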