RubyGems - cmfrec - Versions diffs - 0.1.0 - Mend

cmfrec 0.1.0

Files changed (12)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 2e9f45e0c3826b90788782ac8a0838476fc7849e75d91f2d5949b08b382a28c3
+  data.tar.gz: 2f44b988001bf2b23e3c5938b4c5e7b9089e8943d86aa42c86e72be0ff558ea1
+SHA512:
+  metadata.gz: 592ef4363bc016da1a35a958f99dd13a0fc7e900ecbe77b3ba95fc4b4cbbe151397924ff90aab2e326ac3369ac07c0380c76b1d3a2ac71b390664118de0f8611
+  data.tar.gz: 27d3e1ce80e88af9e8b062837f68329b10718372be15c08d4cd2e056619d1cd9e5ed906e6070798c32993ccea2f617b706af5e6afb7cba4f3cfc70ea22393ba6
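
The digests listed above can be compared against the `metadata.gz` and `data.tar.gz` members of a downloaded `.gem` archive. The following is a minimal sketch of that comparison, assuming the YAML and the member bytes have already been extracted; `checksum_ok?` is a hypothetical helper and is not part of RubyGems or cmfrec.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares raw bytes against one entry of a parsed
# checksums.yaml structure ({"SHA256" => {"data.tar.gz" => "<hex>"}}).
def checksum_ok?(checksums, algorithm, file_name, bytes)
  expected = checksums.fetch(algorithm).fetch(file_name)
  actual =
    case algorithm
    when "SHA256" then Digest::SHA256.hexdigest(bytes)
    when "SHA512" then Digest::SHA512.hexdigest(bytes)
    else raise ArgumentError, "Unsupported algorithm: #{algorithm}"
    end
  actual == expected.to_s
end

# Usage with made-up bytes standing in for a real archive member
bytes = "example bytes"
checksums = YAML.safe_load(<<~DOC)
  SHA256:
    data.tar.gz: "#{Digest::SHA256.hexdigest(bytes)}"
DOC
verified = checksum_ok?(checksums, "SHA256", "data.tar.gz", bytes)
tampered = checksum_ok?(checksums, "SHA256", "data.tar.gz", bytes + "x")
```

In this sketch `verified` is true and `tampered` is false, since any change to the bytes changes the digest.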

data/CHANGELOG.md ADDED

@@ -0,0 +1,3 @@
+## 0.1.0 (2020-11-27)
+- First release

data/LICENSE.txt ADDED

@@ -0,0 +1,24 @@
+MIT License
+Copyright (c) 2020 David Cortes
+Copyright (c) 2020 Andrew Kane
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,191 @@
+# cmfrec
+:fire: Recommendations for Ruby, powered by [cmfrec](https://github.com/david-cortes/cmfrec)
+- Supports side information :tada:
+- Works with explicit and implicit feedback
+- Uses high-performance matrix factorization
+Not available for Windows yet
+[![Build Status](https://github.com/ankane/cmfrec/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec/actions)
+## Installation
+Add this line to your application’s Gemfile:
+```ruby
+gem 'cmfrec'
+```
+## Getting Started
+Create a recommender
+```ruby
+recommender = Cmfrec::Recommender.new
+```
+If users rate items directly, this is known as explicit feedback. Fit the recommender with:
+```ruby
+recommender.fit([
+  {user_id: 1, item_id: 1, rating: 5},
+  {user_id: 2, item_id: 1, rating: 3}
+])
+```
+> IDs can be integers, strings, or any other data type
+If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating, or use a value like number of purchases, number of page views, or time spent on page:
+```ruby
+recommender.fit([
+  {user_id: 1, item_id: 1, value: 1},
+  {user_id: 2, item_id: 1, value: 1}
+])
+```
+> Use `value` instead of rating for implicit feedback
+Get recommendations - “users like you also liked”
+```ruby
+recommender.user_recs(user_id)
+```
+Get recommendations for a new user
+```ruby
+recommender.new_user_recs([
+  {item_id: 1, value: 5},
+  {item_id: 2, value: 3}
+])
+```
+Use the `count` option to specify the number of recommendations (default is 5)
+```ruby
+recommender.user_recs(user_id, count: 3)
+```
+Get predicted ratings for specific items
+```ruby
+recommender.user_recs(user_id, item_ids: [1, 2, 3])
+```
+## Side Information
+Add side information about users, items, or both
+```ruby
+user_info = [
+  {user_id: 1, a: 1, b: 1},
+  {user_id: 2, a: 1, b: 1},
+]
+item_info = [
+  {item_id: 1, c: 1, d: 1},
+  {item_id: 2, c: 1, d: 1},
+]
+recommender.fit(ratings, user_info: user_info, item_info: item_info)
+```
+Get recommendations for a new user with ratings and side information
+```ruby
+ratings = [
+  {item_id: 1, rating: 5},
+  {item_id: 2, rating: 3}
+]
+recommender.new_user_recs(ratings, user_info: {a: 1, b: 1})
+```
+Get recommendations with only side information
+```ruby
+recommender.new_user_recs([], user_info: {a: 1, b: 1})
+```
+## Options
+Specify the number of factors and epochs
+```ruby
+Cmfrec::Recommender.new(factors: 8, epochs: 20)
+```
+If recommendations look off, try changing `factors`. The default is 8, but 3 could be good for some applications and 300 good for others.
+### Explicit Feedback
+Add implicit features
+```ruby
+Cmfrec::Recommender.new(add_implicit_features: true)
+```
+Disable bias
+```ruby
+Cmfrec::Recommender.new(user_bias: false, item_bias: false)
+```
+## Data
+Data can be an array of hashes
+```ruby
+[{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]
+```
+Or a Rover data frame
+```ruby
+Rover.read_csv("ratings.csv")
+```
+## Reference
+Get the global mean
+```ruby
+recommender.global_mean
+```
+Get the factors
+```ruby
+recommender.user_factors
+recommender.item_factors
+```
+Get the bias
+```ruby
+recommender.user_bias
+recommender.item_bias
+```
+## History
+View the [changelog](https://github.com/ankane/cmfrec/blob/master/CHANGELOG.md)
+## Contributing
+Everyone is encouraged to help improve this project. Here are a few ways you can help:
+- [Report bugs](https://github.com/ankane/cmfrec/issues)
+- Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec/pulls)
+- Write, clarify, or fix documentation
+- Suggest or add new features
+To get started with development:
+```sh
+git clone https://github.com/ankane/cmfrec.git
+cd cmfrec
+bundle install
+bundle exec rake vendor:all
+bundle exec rake test
+```
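
The README suggests using a value such as the number of purchases for implicit feedback. One way to build such rows from raw events is sketched below, assuming each event is a hash with `user_id` and `item_id`; `implicit_rows` and the sample events are hypothetical and not part of the gem.

```ruby
# Hypothetical helper: collapse repeated (user, item) events into one row
# whose :value is the number of occurrences.
def implicit_rows(events)
  events
    .group_by { |e| [e[:user_id], e[:item_id]] }
    .map do |(user_id, item_id), group|
      {user_id: user_id, item_id: item_id, value: group.size}
    end
end

# Made-up purchase events for illustration
events = [
  {user_id: 1, item_id: "book"},
  {user_id: 1, item_id: "book"},
  {user_id: 2, item_id: "book"},
  {user_id: 1, item_id: "pen"}
]
rows = implicit_rows(events)
```

The resulting array has the `{user_id:, item_id:, value:}` shape that the README passes to `recommender.fit` for implicit feedback.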

data/lib/cmfrec.rb ADDED

@@ -0,0 +1,28 @@
+# stdlib
+require "etc"
+require "fiddle/import"
+# modules
+require "cmfrec/recommender"
+require "cmfrec/version"
+module Cmfrec
+  class Error < StandardError; end
+  class << self
+    attr_accessor :ffi_lib
+  end
+  lib_name =
+    if Gem.win_platform?
+      "cmfrec.dll"
+    elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
+      "libcmfrec.dylib"
+    else
+      "libcmfrec.so"
+    end
+  vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
+  self.ffi_lib = [vendor_lib]
+  # friendlier error message
+  autoload :FFI, "cmfrec/ffi"
+end
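
The platform branch in `lib/cmfrec.rb` picks a shared library file name per operating system. A standalone sketch of the same idea follows, assuming the decision is made from the `host_os` string alone; `shared_lib_name` and the Windows pattern are illustrative assumptions (the file itself uses `Gem.win_platform?`).

```ruby
require "rbconfig"

# Hypothetical helper: choose a shared library file name from a host OS string.
# The Windows pattern is an assumption made for this sketch.
def shared_lib_name(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /mswin|mingw|cygwin/i
    "cmfrec.dll"
  when /darwin/i
    "libcmfrec.dylib"
  else
    "libcmfrec.so"
  end
end

mac_name = shared_lib_name("darwin20")
linux_name = shared_lib_name("linux-gnu")
windows_name = shared_lib_name("mingw32")
```

Checking the Windows patterns before `darwin` matters in a sketch like this because the string "darwin" contains "win". Note that the file list of this release contains only the `.dylib` and `.so` builds, which matches the README's statement that Windows is not yet supported.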

data/lib/cmfrec/ffi.rb ADDED

@@ -0,0 +1,26 @@
+module Cmfrec
+  module FFI
+    extend Fiddle::Importer
+    libs = Cmfrec.ffi_lib.dup
+    begin
+      dlload Fiddle.dlopen(libs.shift)
+    rescue Fiddle::DLError => e
+      retry if libs.any?
+      raise e
+    end
+    typealias "bool", "char"
+    # determined by CMakeLists.txt
+    typealias "int_t", "int"
+    typealias "real_t", "double"
+    extern "int_t fit_collective_explicit_als(real_t *biasA, real_t *biasB, real_t *A, real_t *B, real_t *C, real_t *D, real_t *Ai, real_t *Bi, bool add_implicit_features, bool reset_values, int_t seed, real_t *glob_mean, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t *Xfull, real_t *weight, bool user_bias, bool item_bias, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_X, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t w_implicit, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, bool include_all_X, real_t *B_plus_bias, real_t *precomputedBtB, real_t *precomputedTransBtBinvBt, real_t *precomputedBeTBeChol, real_t *precomputedBiTBi, real_t *precomputedTransCtCinvCt, real_t *precomputedCtCw)"
+    extern "int_t fit_collective_implicit_als(real_t *A, real_t *B, real_t *C, real_t *D, bool reset_values, int_t seed, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t *w_main_multiplier, real_t alpha, bool adjust_weight, bool apply_log_transf, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, real_t *precomputedBtB, real_t *precomputedBeTBe, real_t *precomputedBeTBeChol)"
+    extern "int_t factors_collective_explicit_single(real_t *a_vec, real_t *a_bias, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, real_t *u_bin_vec, int_t pbin, bool NA_as_zero_U, bool NA_as_zero_X, bool nonneg, real_t *C, real_t *Cb, real_t glob_mean, real_t *biasB, real_t *U_colmeans, real_t *Xa, int_t X_col[], size_t nnz, real_t *Xa_dense, int_t n, real_t *weight, real_t *B, real_t *Bi, bool add_implicit_features, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t *lam_unique, real_t w_main, real_t w_user, real_t w_implicit, int_t n_max, bool include_all_X, real_t *TransBtBinvBt, real_t *BtB, real_t *BeTBeChol, real_t *BiTBi, real_t *CtCw, real_t *TransCtCinvCt, real_t *B_plus_bias)"
+    extern "int_t factors_collective_implicit_single(real_t *a_vec, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, bool NA_as_zero_U, bool nonneg, real_t *U_colmeans, real_t *B, int_t n, real_t *C, real_t *Xa, int_t X_col[], size_t nnz, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t alpha, real_t w_main, real_t w_user, real_t w_main_multiplier, bool apply_log_transf, real_t *BeTBe, real_t *BtB, real_t *BeTBeChol)"
+    extern "void predict_multiple(real_t *restrict A, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasA, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_main, int_t m, int_t n, int_t predA[], int_t predB[], size_t nnz, real_t *restrict outp, int_t nthreads)"
+    extern "int_t predict_X_old_collective_explicit(int_t row[], int_t col[], real_t *restrict predicted, size_t n_predict, real_t *restrict A, real_t *restrict biasA, real_t *restrict B, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_user, int_t k_item, int_t k_main, int_t m, int_t n_max, int_t nthreads)"
+    extern "int_t topN(real_t *restrict a_vec, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasB, real_t glob_mean, real_t biasA, int_t k, int_t k_main, int_t *restrict include_ix, int_t n_include, int_t *restrict exclude_ix, int_t n_exclude, int_t *restrict outp_ix, real_t *restrict outp_score, int_t n_top, int_t n, int_t nthreads)"
+  end
+end
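
`lib/cmfrec/ffi.rb` walks the `Cmfrec.ffi_lib` candidates with `begin`/`retry` until one opens. The same fallback can be written as a plain loop, sketched here under the assumption that every failure surfaces as `Fiddle::DLError`; `open_first_library` and its `opener:` parameter are hypothetical and exist only to make the behavior easy to exercise.

```ruby
require "fiddle"

# Hypothetical helper: try each candidate path in order and return the first
# handle that opens. Re-raises the last error if none of them load.
def open_first_library(paths, opener: Fiddle.method(:dlopen))
  last_error = nil
  paths.each do |path|
    begin
      return opener.call(path)
    rescue Fiddle::DLError => e
      last_error = e
    end
  end
  raise(last_error || Fiddle::DLError.new("no library paths given"))
end

# Usage with a stand-in opener so the sketch runs without any native library
attempts = []
fake_opener = lambda do |path|
  attempts << path
  raise Fiddle::DLError, "cannot open #{path}" unless path == "second.so"
  "handle for #{path}"
end
handle = open_first_library(["first.so", "second.so"], opener: fake_opener)
```

Passing a custom path list corresponds to assigning `Cmfrec.ffi_lib` before the `FFI` module is first referenced, since the module is autoloaded.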

data/lib/cmfrec/recommender.rb ADDED

@@ -0,0 +1,548 @@
+module Cmfrec
+  class Recommender
+    attr_reader :global_mean
+    def initialize(factors: 8, epochs: 10, verbose: true, user_bias: true, item_bias: true, add_implicit_features: false)
+      set_params(
+        k: factors,
+        niter: epochs,
+        verbose: verbose,
+        user_bias: user_bias,
+        item_bias: item_bias,
+        add_implicit_features: add_implicit_features
+      )
+    end
+    def fit(train_set, user_info: nil, item_info: nil)
+      train_set = to_dataset(train_set)
+      @implicit = !train_set.any? { |v| v[:rating] }
+      unless @implicit
+        ratings = train_set.map { |o| o[:rating] }
+        check_ratings(ratings)
+      end
+      check_training_set(train_set)
+      create_maps(train_set)
+      x_row = []
+      x_col = []
+      x_val = []
+      value_key = @implicit ? :value : :rating
+      train_set.each do |v|
+        x_row << @user_map[v[:user_id]]
+        x_col << @item_map[v[:item_id]]
+        x_val << (v[value_key] || 1)
+      end
+      @m = @user_map.size
+      @n = @item_map.size
+      nnz = train_set.size
+      x_row = int_ptr(x_row)
+      x_col = int_ptr(x_col)
+      x = real_ptr(x_val)
+      x_full = nil
+      weight = nil
+      lam_unique = nil
+      uu = nil
+      ii = nil
+      @user_info_map = {}
+      u_row, u_col, u_sp, nnz_u, @m_u, p_ = process_info(user_info, @user_map, @user_info_map, :user_id)
+      @item_info_map = {}
+      i_row, i_col, i_sp, nnz_i, @n_i, q = process_info(item_info, @item_map, @item_info_map, :item_id)
+      @precompute_for_predictions = false
+      # initialize w/ normal distribution
+      reset_values = true
+      @a = Fiddle::Pointer.malloc([@m, @m_u].max * (@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+      @b = Fiddle::Pointer.malloc([@n, @n_i].max * (@k_item + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+      @c = p_ > 0 ? Fiddle::Pointer.malloc(p_ * (@k_user + @k) * Fiddle::SIZEOF_DOUBLE) : nil
+      @d = q > 0 ? Fiddle::Pointer.malloc(q * (@k_item + @k) * Fiddle::SIZEOF_DOUBLE) : nil
+      @bias_a = nil
+      @bias_b = nil
+      u_colmeans = Fiddle::Pointer.malloc(p_ * Fiddle::SIZEOF_DOUBLE)
+      i_colmeans = Fiddle::Pointer.malloc(q * Fiddle::SIZEOF_DOUBLE)
+      if @implicit
+        @w_main_multiplier = 1.0
+        @alpha = 1.0
+        @adjust_weight = false # downweight?
+        @apply_log_transf = false
+        # different defaults
+        @lambda_ = 1e0
+        @w_user = 10
+        @w_item = 10
+        @finalize_chol = false
+        args = [
+          @a, @b,
+          @c, @d,
+          reset_values, @random_state,
+          u_colmeans, i_colmeans,
+          @m, @n, @k,
+          x_row, x_col, x, nnz,
+          @lambda_, lam_unique,
+          uu, @m_u, p_,
+          ii, @n_i, q,
+          u_row, u_col, u_sp, nnz_u,
+          i_row, i_col, i_sp, nnz_i,
+          @na_as_zero_user, @na_as_zero_item,
+          @k_main, @k_user, @k_item,
+          @w_main, @w_user, @w_item, real_ptr([@w_main_multiplier]),
+          @alpha, @adjust_weight, @apply_log_transf,
+          @niter, @nthreads, @verbose, @handle_interrupt,
+          @use_cg, @max_cg_steps, @finalize_chol,
+          @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
+          @precompute_for_predictions,
+          nil, #precomputedBtB,
+          nil, #precomputedBeTBe,
+          nil  #precomputedBeTBeChol
+        ]
+        check_status FFI.fit_collective_implicit_als(*fiddle_args(args))
+        @global_mean = 0
+      else
+        @bias_a = Fiddle::Pointer.malloc([@m, @m_u].max * Fiddle::SIZEOF_DOUBLE) if @user_bias
+        @bias_b = Fiddle::Pointer.malloc([@n, @n_i].max * Fiddle::SIZEOF_DOUBLE) if @item_bias
+        if @add_implicit_features
+          @ai = Fiddle::Pointer.malloc([@m, @m_u].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+          @bi = Fiddle::Pointer.malloc([@n, @n_i].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+        else
+          @ai = nil
+          @bi = nil
+        end
+        glob_mean = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
+        args = [
+          @bias_a, @bias_b,
+          @a, @b,
+          @c, @d,
+          @ai, @bi,
+          @add_implicit_features,
+          reset_values, @random_state,
+          glob_mean,
+          u_colmeans, i_colmeans,
+          @m, @n, @k,
+          x_row, x_col, x, nnz,
+          x_full,
+          weight,
+          @user_bias, @item_bias,
+          @lambda_, lam_unique,
+          uu, @m_u, p_,
+          ii, @n_i, q,
+          u_row, u_col, u_sp, nnz_u,
+          i_row, i_col, i_sp, nnz_i,
+          @na_as_zero, @na_as_zero_user, @na_as_zero_item,
+          @k_main, @k_user, @k_item,
+          @w_main, @w_user, @w_item, @w_implicit,
+          @niter, @nthreads, @verbose, @handle_interrupt,
+          @use_cg, @max_cg_steps, @finalize_chol,
+          @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
+          @precompute_for_predictions,
+          @include_all_x,
+          nil, #B_plus_bias,
+          nil, #precomputedBtB,
+          nil, #precomputedTransBtBinvBt,
+          nil, #precomputedBeTBeChol,
+          nil, #precomputedBiTBi,
+          nil, #precomputedTransCtCinvCt,
+          nil  #precomputedCtCw
+        ]
+        check_status FFI.fit_collective_explicit_als(*fiddle_args(args))
+        @global_mean = real_array(glob_mean).first
+      end
+      @u_colmeans = real_array(u_colmeans)
+      @i_colmeans = real_array(i_colmeans)
+      @u_colmeans_ptr = u_colmeans
+      self
+    end
+    def user_recs(user_id, count: 5, item_ids: nil)
+      check_fit
+      user = @user_map[user_id]
+      if user
+        if item_ids
+          # remove missing ids
+          item_ids = item_ids.select { |v| @item_map[v] }
+          pred_a = int_ptr([@user_map[user_id]] * item_ids.size)
+          pred_b = int_ptr(item_ids.map { |v| @item_map[v] })
+          nnz = item_ids.size
+          outp = Fiddle::Pointer.malloc(nnz * Fiddle::SIZEOF_DOUBLE)
+          FFI.predict_multiple(
+            @a, @k_user,
+            @b, @k_item,
+            @bias_a, @bias_b,
+            @global_mean,
+            @k, @k_main,
+            @m, @n,
+            pred_a, pred_b, nnz,
+            outp,
+            @nthreads
+          )
+          scores = real_array(outp)
+          item_ids.zip(scores).map do |item_id, score|
+            {item_id: item_id, score: score}
+          end
+        else
+          a_vec = @a[user * @k * Fiddle::SIZEOF_DOUBLE, @k * Fiddle::SIZEOF_DOUBLE]
+          a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
+          top_n(a_vec: a_vec, a_bias: a_bias, count: count)
+        end
+      else
+        # no items if user is unknown
+        # TODO maybe most popular items
+        []
+      end
+    end
+    # TODO add item_ids
+    def new_user_recs(data, count: 5, user_info: nil)
+      check_fit
+      a_vec, a_bias = factors_warm(data, user_info: user_info)
+      top_n(a_vec: a_vec, a_bias: a_bias, count: count)
+    end
+    def user_factors
+      read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main)
+    end
+    def item_factors
+      read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main)
+    end
+    def user_bias
+      read_bias(@bias_a) if @bias_a
+    end
+    def item_bias
+      read_bias(@bias_b) if @bias_b
+    end
+    private
+    def set_params(
+      k: 40, lambda_: 1e+1, method: "als", use_cg: true, user_bias: true,
+      item_bias: true, add_implicit_features: false,
+      k_user: 0, k_item: 0, k_main: 0,
+      w_main: 1.0, w_user: 1.0, w_item: 1.0, w_implicit: 0.5,
+      maxiter: 800, niter: 10, parallelize: "separate", corr_pairs: 4,
+      max_cg_steps: 3, finalize_chol: true,
+      na_as_zero: false, na_as_zero_user: false, na_as_zero_item: false,
+      nonneg: false, nonneg_c: false, nonneg_d: false, max_cd_steps: 100,
+      precompute_for_predictions: true, include_all_x: true,
+      use_float: false,
+      random_state: 1, verbose: true, print_every: 10,
+      handle_interrupt: true, produce_dicts: false,
+      copy_data: true, nthreads: -1
+    )
+      @k = k
+      @k_user = k_user
+      @k_item = k_item
+      @k_main = k_main
+      @lambda_ = lambda_
+      @w_main = w_main
+      @w_user = w_user
+      @w_item = w_item
+      @w_implicit = w_implicit
+      @user_bias = !!user_bias
+      @item_bias = !!item_bias
+      @method = method
+      @add_implicit_features = !!add_implicit_features
+      @use_cg = !!use_cg
+      @max_cg_steps = max_cg_steps.to_i
+      @max_cd_steps = max_cd_steps.to_i
+      @finalize_chol = !!finalize_chol
+      @maxiter = maxiter
+      @niter = niter
+      @parallelize = parallelize
+      @na_as_zero = !!na_as_zero
+      @na_as_zero_user = !!na_as_zero_user
+      @na_as_zero_item = !!na_as_zero_item
+      @nonneg = !!nonneg
+      @nonneg_c = !!nonneg_c
+      @nonneg_d = !!nonneg_d
+      @precompute_for_predictions = !!precompute_for_predictions
+      @include_all_x = true
+      @use_float = !!use_float
+      @verbose = !!verbose
+      @print_every = print_every
+      @corr_pairs = corr_pairs
+      @random_state = random_state.to_i
+      @produce_dicts = !!produce_dicts
+      @handle_interrupt = !!handle_interrupt
+      @copy_data = !!copy_data
+      nthreads = Etc.nprocessors if nthreads < 0
+      @nthreads = nthreads
+    end
+    def create_maps(train_set)
+      user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
+      item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
+      raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
+      raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
+      @user_map = user_ids.zip(user_ids.size.times).to_h
+      @item_map = item_ids.zip(item_ids.size.times).to_h
+    end
+    def check_ratings(ratings)
+      unless ratings.all? { |r| !r.nil? }
+        raise ArgumentError, "Missing ratings"
+      end
+      unless ratings.all? { |r| r.is_a?(Numeric) }
+        raise ArgumentError, "Ratings must be numeric"
+      end
+    end
+    def check_training_set(train_set)
+      raise ArgumentError, "No training data" if train_set.empty?
+    end
+    def check_fit
+      raise "Not fit" unless defined?(@implicit)
+    end
+    def to_dataset(dataset)
+      if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
+        # convert keys to symbols
+        dataset = dataset.dup
+        dataset.keys.each do |k, v|
+          dataset[k.to_sym] ||= dataset.delete(k)
+        end
+        dataset.to_a
+      elsif defined?(Daru::DataFrame) && dataset.is_a?(Daru::DataFrame)
+        # convert keys to symbols
+        dataset = dataset.dup
+        new_names = dataset.vectors.to_a.map { |k| [k, k.to_sym] }.to_h
+        dataset.rename_vectors!(new_names)
+        dataset.to_a[0]
+      else
+        dataset
+      end
+    end
+    def read_factors(ptr, d1, d2)
+      arr = []
+      offset = 0
+      width = d2 * Fiddle::SIZEOF_DOUBLE
+      d1.times do |i|
+        arr << ptr[offset, width].unpack("d*")
+        offset += width
+      end
+      arr
+    end
+    def read_bias(ptr)
+      real_array(ptr)
+    end
+    def top_n(a_vec:, a_bias:, count:)
+      include_ix = nil
+      n_include = 0
+      exclude_ix = nil
+      n_exclude = 0
+      outp_ix = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_INT)
+      outp_score = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_DOUBLE)
+      check_status FFI.topN(
+        a_vec, @k_user,
+        @b, @k_item,
+        @bias_b, @global_mean, a_bias,
+        @k, @k_main,
+        include_ix, n_include,
+        exclude_ix, n_exclude,
+        outp_ix, outp_score,
+        count, @n,
+        @nthreads
+      )
+      imap = @item_map.map(&:reverse).to_h
+      item_ids = int_array(outp_ix).map { |v| imap[v] }
+      scores = real_array(outp_score)
+      item_ids.zip(scores).map do |item_id, score|
+        {item_id: item_id, score: score}
+      end
+    end
+    def factors_warm(data, user_info: nil)
+      data = to_dataset(data)
+      user_info = to_dataset(user_info) if user_info
+      nnz = data.size
+      a_vec = Fiddle::Pointer.malloc((@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
+      bias_a = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
+      u_vec_sp = []
+      u_vec_x_col = []
+      if user_info
+        user_info.each do |k, v|
+          next if k == :user_id
+          uc = @user_info_map[k]
+          raise "Bad key: #{k}" unless uc
+          u_vec_x_col << uc
+          u_vec_sp << v
+        end
+      end
+      p_ = @user_info_map.size
+      nnz_u_vec = u_vec_sp.size
+      u_vec_x_col = int_ptr(u_vec_x_col)
+      u_vec_sp = real_ptr(u_vec_sp)
+      u_vec = nil
+      u_bin_vec = nil
+      pbin = 0
+      weight = nil
+      lam_unique = nil
+      n_max = @n
+      if data.any?
+        if @implicit
+          ratings = data.map { |d| d[:value] || 1 }
+        else
+          ratings = data.map { |d| d[:rating] }
+          check_ratings(ratings)
+        end
+        xa = real_ptr(ratings)
+        x_col = int_ptr(data.map { |d| d[:item_id] })
+      else
+        xa = nil
+        x_col = nil
+      end
+      xa_dense = nil
+      if @implicit
+        args = [
+          a_vec,
+          u_vec, p_,
+          u_vec_sp, u_vec_x_col, nnz_u_vec,
+          @na_as_zero_user,
+          @nonneg,
+          @u_colmeans_ptr,
+          @b, @n, @c,
+          xa, x_col, nnz,
+          @k, @k_user, @k_item, @k_main,
+          @lambda_, @alpha,
+          @w_main, @w_user, @w_main_multiplier,
+          @apply_log_transf,
+          nil, #BeTBe,
+          nil, #BtB
+          nil  #BeTBeChol
+        ]
+        check_status FFI.factors_collective_implicit_single(*fiddle_args(args))
+      else
+        cb = nil
+        args = [
+          a_vec, bias_a,
+          u_vec, p_,
+          u_vec_sp, u_vec_x_col, nnz_u_vec,
+          u_bin_vec, pbin,
+          @na_as_zero_user, @na_as_zero,
+          @nonneg,
+          @c, cb,
+          @global_mean, @bias_b, @u_colmeans_ptr,
+          xa, x_col, nnz, xa_dense,
+          @n, weight, @b, @bi,
+          @add_implicit_features,
+          @k, @k_user, @k_item, @k_main,
+          @lambda_, lam_unique,
+          @w_main, @w_user, @w_implicit,
+          n_max,
+          @include_all_x,
+          nil, #TransBtBinvBt,
+          nil, #BtB,
+          nil, #BeTBeChol,
+          nil, #BiTBi,
+          nil, #CtCw,
+          nil, #TransCtCinvCt,
+          nil  #B_plus_bias
+        ]
+        check_status FFI.factors_collective_explicit_single(*fiddle_args(args))
+      end
+      [a_vec, real_array(bias_a).first]
+    end
+    # convert boolean to int
+    def fiddle_args(args)
+      args.map { |v| v == true || v == false ? (v ? 1 : 0) : v }
+    end
+    def check_status(ret_val)
+      case ret_val
+      when 0
+        # success
+      when 1
+        raise "Could not allocate sufficient memory"
+      else
+        raise "Bad status: #{ret_val}"
+      end
+    end
+    def process_info(info, map, info_map, key)
+      return [nil, nil, nil, 0, 0, 0] unless info
+      info = to_dataset(info)
+      row = []
+      col = []
+      val = []
+      info.each do |ri|
+        rk = ri[key]
+        raise ArgumentError, "Missing #{key}" unless rk
+        r = (map[rk] ||= map.size)
+        ri.each do |k, v|
+          next if k == key
+          row << r
+          col << (info_map[k] ||= info_map.size)
+          val << v
+        end
+      end
+      [int_ptr(row), int_ptr(col), real_ptr(val), val.size, map.size, info_map.size]
+    end
+    def int_ptr(v)
+      v.pack("i*")
+    end
+    def real_ptr(v)
+      v.pack("d*")
+    end
+    def int_array(ptr)
+      ptr.to_s(ptr.size).unpack("i*")
+    end
+    def real_array(ptr)
+      ptr.to_s(ptr.size).unpack("d*")
+    end
+  end
+end
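
Before calling into the native library, `fit` maps arbitrary user and item IDs to contiguous 0-based indices and packs the resulting row, column, and value arrays into native buffers. The sketch below illustrates that preparation step in isolation; `index_map` and `to_coo` are hypothetical helpers written for this example, not methods of `Cmfrec::Recommender`.

```ruby
# Hypothetical helper: map sorted unique IDs to 0-based indices.
def index_map(ids)
  ids.uniq.sort.each_with_index.to_h
end

# Hypothetical helper: convert rating hashes into coordinate-format arrays.
def to_coo(ratings)
  user_map = index_map(ratings.map { |r| r[:user_id] })
  item_map = index_map(ratings.map { |r| r[:item_id] })
  {
    user_map: user_map,
    item_map: item_map,
    rows: ratings.map { |r| user_map.fetch(r[:user_id]) },
    cols: ratings.map { |r| item_map.fetch(r[:item_id]) },
    vals: ratings.map { |r| r[:rating].to_f }
  }
end

coo = to_coo([
  {user_id: "bob", item_id: 20, rating: 3},
  {user_id: "alice", item_id: 10, rating: 5},
  {user_id: "alice", item_id: 20, rating: 4}
])

# Pack into native-endian buffers: "i*" for C ints, "d*" for C doubles
row_buf = coo[:rows].pack("i*")
col_buf = coo[:cols].pack("i*")
val_buf = coo[:vals].pack("d*")

# Unpacking with the same directive recovers the original arrays
round_trip = val_buf.unpack("d*")
```

Because the IDs are sorted to build the maps, all user IDs (and all item IDs) need to be mutually comparable in a sketch like this; mixing integers and strings in one column would make `sort` raise.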

data/lib/cmfrec/version.rb ADDED

@@ -0,0 +1,3 @@
+module Cmfrec
+  VERSION = "0.1.0"
+end

data/vendor/LICENSE.txt ADDED

@@ -0,0 +1,74 @@
+MIT License
+Copyright (c) 2020 David Cortes
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+IN THE SOFTWARE.
+---
+ANSI C implementation of vector operations.
+Copyright (c) 2007-2010 Naoaki Okazaki
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+---
+C library of Limited memory BFGS (L-BFGS).
+Copyright (c) 1990, Jorge Nocedal
+Copyright (c) 2007-2010 Naoaki Okazaki
+All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/vendor/libcmfrec.dylib ADDED

Binary file

data/vendor/libcmfrec.so ADDED

Binary file

metadata ADDED

@@ -0,0 +1,52 @@
+--- !ruby/object:Gem::Specification
+name: cmfrec
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Andrew Kane
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2020-11-28 00:00:00.000000000 Z
+dependencies: []
+description:
+email: andrew@chartkick.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE.txt
+- README.md
+- lib/cmfrec.rb
+- lib/cmfrec/ffi.rb
+- lib/cmfrec/recommender.rb
+- lib/cmfrec/version.rb
+- vendor/LICENSE.txt
+- vendor/libcmfrec.dylib
+- vendor/libcmfrec.so
+homepage: https://github.com/ankane/cmfrec
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.5'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.1.4
+signing_key:
+specification_version: 4
+summary: Recommendations for Ruby using collective matrix factorization
+test_files: []