RubyGems - xlearn - Versions diffs - 0.1.0 → 0.1.1 - Mend

xlearn 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: cbc492d4f4cb0de9c53cac0820251fc1f747de836348280dfa9d1b7e6475f745
-  data.tar.gz: d5f1fcbbb10b96714c38fd9c0c924c98fb01dccf1872734e1084a31ad855503c
+  metadata.gz: 32722c5c1623a0f680dba1ca05e37105247a0163600da6ae2a5739173aa94066
+  data.tar.gz: 87938813a982897dcedcabd351292618d15faad53b69ae33f749b8dea7cf6bff
 SHA512:
-  metadata.gz: '048e5915264ba2749e00a91b4c59a773efcd553cf7c764e56df0107e6dd08edcb7bcb86350181b217d095351ea79f47ba6daa84407069fcd9c445683ecdc22a8'
-  data.tar.gz: 46831787724f8ec1d4063859445a0a9e6aefd2a38b4267032eb11ffcadf145ebbd37ab3babcf56599cf31f3f4143126b1663baf409099926860ce88c591adb75
+  metadata.gz: fdc680d913d7ca6100da8102d1b1fc173c9e354e99935eb0eff0e04a23f7cc1e63f5caf547a2c0afc503fb4521e519691cb866d2fabde504f1fa75e0de11238c
+  data.tar.gz: 85cd2ad8a5b6f984a5af7024481fc58a7602e47e16b91f013bb6297f335d8bf61273e266346056cb09f5310381e2ae72919c6ce27b1d7b702d5a29bfc53a5197
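
The `checksums.yaml` entries above are digests of the two archives packed inside the `.gem` file, which is itself a tar archive containing `metadata.gz` and `data.tar.gz`. The following minimal sketch re-checks the SHA256 values with RubyGems' `Gem::Package::TarReader`. The helper name `verify_gem_checksums` is hypothetical:

```ruby
require "digest"
require "rubygems/package"

# Hypothetical helper: reads a .gem (a tar archive) from `gem_io`, hashes each
# entry named in `expected` with SHA256, and returns the names whose digest
# is missing or differs from the recorded hex string.
def verify_gem_checksums(gem_io, expected)
  actual = {}
  Gem::Package::TarReader.new(gem_io).each do |entry|
    next unless expected.key?(entry.full_name)
    # Entry#read returns nil for an empty entry, hence to_s
    actual[entry.full_name] = Digest::SHA256.hexdigest(entry.read.to_s)
  end
  expected.keys.reject { |name| actual[name] == expected[name] }
end
```

Open a downloaded `xlearn-0.1.1.gem` in binary mode and check it against the 0.1.1 `data.tar.gz` SHA256 value. The result should be an empty array if the file is intact.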

data/CHANGELOG.md CHANGED

@@ -1,3 +1,11 @@
+## 0.1.1
+- Added `cv` method
+- Added `partial_fit` method
+- Added `save_txt` method
+- Added `bias_term`, `linear_term`, and `latent_factors` methods
+- Added support for Daru and Numo
 ## 0.1.0
 - First release

data/README.md CHANGED

@@ -10,6 +10,8 @@ Supports:
 - Factorization machines
 - Field-aware factorization machines
+[![Build Status](https://travis-ci.org/ankane/xlearn.svg?branch=master)](https://travis-ci.org/ankane/xlearn)
 ## Installation
 First, [install xLearn](https://xlearn-doc.readthedocs.io/en/latest/install/index.html). On Mac, copy `build/lib/libxlearn_api.dylib` to `/usr/local/lib`.
@@ -22,8 +24,6 @@ gem 'xlearn'
 ## Getting Started
-This library is modeled after the [Python Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html). Some methods are missing at the moment. PRs welcome!
 Prep your data
 ```ruby
@@ -58,41 +58,136 @@ Load the model from a file
 model.load_model("model.bin")
 ```
+Save a text version of the model
+```ruby
+model.save_txt("model.txt")
+```
+Pass a validation set
+```ruby
+model.fit(x_train, y_train, eval_set: [x_val, y_val])
+```
+Train online
+```ruby
+model.partial_fit(x_train, y_train)
+```
+Get the bias term, linear term, and latent factors
+```ruby
+model.bias_term
+model.linear_term
+model.latent_factors # fm and ffm only
+```
 ## Parameters
 Specify parameters
 ```ruby
-model = XLearn::FM.new(k: 20, epoch: 50)
+model = XLearn::Linear.new(k: 20, epoch: 50)
 ```
 Supports the same parameters as [Python](https://xlearn-doc.readthedocs.io/en/latest/all_api/index.html)
-## Validation
+## Cross-Validation
-Pass a validation set when fitting
+Cross-validation
 ```ruby
-model.fit(x_train, y_train, eval_set: [x_val, y_val])
+model.cv(x, y)
+```
+Specify the number of folds
+```ruby
+model.cv(x, y, folds: 5)
+```
+## Data
+Data can be an array of arrays
+```ruby
+[[1, 2, 3], [4, 5, 6]]
+```
+Or a Daru data frame
+```ruby
+Daru::DataFrame.from_csv("houses.csv")
+```
+Or a Numo NArray
+```ruby
+Numo::DFloat.new(3, 2).seq
 ```
 ## Performance
-For performance, you can read data directly from files
+For large datasets, read data directly from files
 ```ruby
 model.fit("train.txt", eval_set: "validate.txt")
 model.predict("test.txt")
+model.cv("train.txt")
+```
+For linear models and factorization machines, use CSV:
+```txt
+label,value_1,value_2,...,value_n
+```
+Or the `libsvm` format (better for sparse data):
+```txt
+label index_1:value_1 index_2:value_2 ... index_n:value_n
+```
+> You can also use commas instead of spaces for separators
+For field-aware factorization machines, use the `libffm` format:
+```txt
+label field_1:index_1:value_1 field_2:index_2:value_2 ...
 ```
-[These formats](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html#choose-machine-learning-algorithm) are supported
+> You can also use commas instead of spaces for separators
 You can also write predictions directly to a file
 ```ruby
-model.predict("test.txt", out_file: "predictions.txt")
+model.predict("test.txt", out_path: "predictions.txt")
+```
+## xLearn Installation
+There’s an experimental branch that includes xLearn with the gem for easiest installation.
+```ruby
+gem 'xlearn', github: 'ankane/xlearn', branch: 'vendor', submodules: true
 ```
+Please file an issue if it doesn’t work for you.
+You can also specify the path to xLearn in an initializer:
+```ruby
+XLearn.ffi_lib << "/path/to/xlearn/lib/libxlearn_api.so"
+```
+> Use `libxlearn_api.dylib` for Mac and `xlearn_api.dll` for Windows
+## Credits
+This library is modeled after xLearn’s [Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html).
 ## History
 View the [changelog](https://github.com/ankane/xlearn/blob/master/CHANGELOG.md)
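
The file formats in the updated README are plain text, with one example per line. The following minimal sketch produces such lines from dense Ruby arrays and drops zero entries. `to_libsvm_line`, `to_libffm_line`, the 0-based feature indices, and the column-to-field mapping are all assumptions for illustration and are not part of the gem:

```ruby
# Hypothetical helper: one libsvm line, "label index:value ...", zeros omitted.
# Feature indices are 0-based here (an assumption; shift if needed).
def to_libsvm_line(label, row)
  pairs = row.each_with_index.reject { |value, _| value.zero? }
             .map { |value, index| "#{index}:#{value}" }
  ([label] + pairs).join(" ")
end

# Hypothetical helper: one libffm line, "label field:index:value ...".
# `fields` maps each column position to a field number (an assumption).
def to_libffm_line(label, row, fields)
  triples = row.each_with_index.reject { |value, _| value.zero? }
               .map { |value, index| "#{fields[index]}:#{index}:#{value}" }
  ([label] + triples).join(" ")
end

puts to_libsvm_line(1, [0, 2.5, 0, 3]) # prints "1 1:2.5 3:3"
```

You can write lines built this way to a file and pass its path to `model.fit`, as the README's Performance section does with `train.txt`.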

data/lib/xlearn/dmatrix.rb CHANGED

@@ -5,14 +5,30 @@ module XLearn
     def initialize(data, label: nil)
       @handle = ::FFI::MemoryPointer.new(:pointer)
-      nrow = data.count
-      ncol = data.first.count
+      if matrix?(data)
+        nrow = data.row_count
+        ncol = data.column_count
+        flat_data = data.to_a.flatten
+      elsif daru?(data)
+        nrow, ncol = data.shape
+        flat_data = data.map_rows(&:to_a).flatten
+      elsif narray?(data)
+        nrow, ncol = data.shape
+        # TODO convert to SFloat and pass pointer
+        # for better performance
+        flat_data = data.flatten.to_a
+      else
+        nrow = data.count
+        ncol = data.first.count
+        flat_data = data.flatten
+      end
-      c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
-      c_data.put_array_of_float(0, data.flatten)
+      c_data = ::FFI::MemoryPointer.new(:float, flat_data.size)
+      c_data.put_array_of_float(0, flat_data)
       if label
-        c_label = ::FFI::MemoryPointer.new(:float, nrow)
+        label = label.to_a
+        c_label = ::FFI::MemoryPointer.new(:float, label.size)
         c_label.put_array_of_float(0, label)
       end
@@ -31,5 +47,19 @@ module XLearn
       # must use proc instead of stabby lambda
       proc { FFI.XlearnDataFree(pointer) }
     end
+    private
+    def matrix?(data)
+      defined?(Matrix) && data.is_a?(Matrix)
+    end
+    def daru?(data)
+      defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
+    end
+    def narray?(data)
+      defined?(Numo::NArray) && data.is_a?(Numo::NArray)
+    end
   end
 end
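
Every branch of the new `DMatrix#initialize` ends with the same three values: a row count, a column count, and a flat, row-major list of floats that is copied into an FFI `:float` buffer. The following minimal sketch covers the plain array-of-arrays branch. The helper name `flatten_rows` is hypothetical. The ragged-row check is an addition that the gem does not perform; the gem sizes the buffer from `flat_data.size` instead:

```ruby
# Hypothetical helper mirroring DMatrix's array-of-arrays branch:
# returns [nrow, ncol, flat_data] with rows laid out one after another.
def flatten_rows(data)
  nrow = data.count
  ncol = data.first.count
  # Extra check (not in the gem): every row must have ncol entries,
  # otherwise flat_data.size would not equal nrow * ncol.
  unless data.all? { |row| row.count == ncol }
    raise ArgumentError, "all rows must have #{ncol} columns"
  end
  [nrow, ncol, data.flatten.map(&:to_f)]
end

p flatten_rows([[1, 2, 3], [4, 5, 6]]) # prints [2, 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
```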

data/lib/xlearn/ffm.rb CHANGED

@@ -4,5 +4,24 @@ module XLearn
       @model_type = "ffm"
       super
     end
+    # shape is [i, j, k]
+    # for v_{i}_{j}
+    def latent_factors
+      factor = []
+      current = -1
+      read_txt do |line|
+        if line.start_with?("v_")
+          parts = line.split(": ")
+          i = parts.first.split("_")[1].to_i
+          if i != current
+            factor << []
+            current = i
+          end
+          factor.last << parts.last.split(" ").map(&:to_f)
+        end
+      end
+      factor
+    end
   end
 end
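
`FFM#latent_factors` scans the text model for `v_{i}_{j}` lines and starts a new group whenever `i` changes. The result is a nested array indexed `[i][j][k]`. The following standalone sketch applies the same parsing to an in-memory string. `parse_ffm_factors` is a hypothetical name, and the sample text was written to match the line shape the code expects rather than copied from real xLearn output:

```ruby
# Parses FFM latent factors from text-model lines of the form
# "v_<i>_<j>: <k floats>", grouping consecutive lines by feature i.
def parse_ffm_factors(lines)
  factor = []
  current = -1
  lines.each do |line|
    next unless line.start_with?("v_")
    name, values = line.split(": ")
    i = name.split("_")[1].to_i
    if i != current
      factor << []
      current = i
    end
    factor.last << values.split(" ").map(&:to_f)
  end
  factor
end

sample = <<~TXT
  bias: 0.5
  i_0: 0.1
  v_0_0: 0.1 0.2
  v_0_1: 0.3 0.4
  v_1_0: 0.5 0.6
  v_1_1: 0.7 0.8
TXT
p parse_ffm_factors(sample.lines) # prints [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
```

Like the gem's version, this sketch assumes that all lines for one feature `i` appear consecutively.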

data/lib/xlearn/fm.rb CHANGED

@@ -4,5 +4,17 @@ module XLearn
       @model_type = "fm"
       super
     end
+    # shape is [i, k]
+    # for v_{i}
+    def latent_factors
+      factor = []
+      read_txt do |line|
+        if line.start_with?("v_")
+          factor << line.split(": ").last.split(" ").map(&:to_f)
+        end
+      end
+      factor
+    end
   end
 end
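
Once `bias_term`, `linear_term`, and `latent_factors` are available, they can be plugged into the standard second-order factorization machine equation. The following minimal sketch applies that equation to one dense row. The helper name `fm_score` is hypothetical. It uses the textbook formula and ignores any input normalization or output transform (such as a sigmoid) that xLearn may apply, so it is not expected to reproduce `model.predict` exactly:

```ruby
# Textbook second-order FM score for one dense row x:
#   w0 + sum_i w_i * x_i + sum_{i<j} <v_i, v_j> * x_i * x_j
# The pairwise term uses the O(k*n) identity
#   0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]
# `factors` is assumed to be non-empty, one k-length array per feature.
def fm_score(bias, linear, factors, x)
  score = bias + linear.zip(x).sum { |w, xi| w * xi }
  k = factors.first.size
  k.times do |f|
    s = 0.0
    s2 = 0.0
    factors.zip(x).each do |v, xi|
      s += v[f] * xi
      s2 += (v[f] * xi)**2
    end
    score += 0.5 * (s * s - s2)
  end
  score
end
```

The identity avoids looping over every feature pair, which keeps the cost linear in the number of features for a fixed `k`.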

data/lib/xlearn/model.rb CHANGED

@@ -20,14 +20,14 @@ module XLearn
     end
     def fit(x, y = nil, eval_set: nil)
-      if x.is_a?(String)
-        check_call FFI.XLearnSetTrain(@handle, x)
-        check_call FFI.XLearnSetBool(@handle, "from_file", true)
-      else
-        train_set = DMatrix.new(x, label: y)
-        check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
-        check_call FFI.XLearnSetBool(@handle, "from_file", false)
-      end
+      @model_path = nil
+      partial_fit(x, y, eval_set: eval_set)
+    end
+    def partial_fit(x, y = nil, eval_set: nil)
+      check_call FFI.XLearnSetPreModel(@handle, @model_path || "")
+      set_train_set(x, y)
       if eval_set
         if eval_set.is_a?(String)
@@ -38,9 +38,12 @@ module XLearn
         end
       end
-      # TODO unlink in finalizer
-      @model_file = Tempfile.new("xlearn")
+      @txt_file ||= create_tempfile
+      check_call FFI.XLearnSetTXTModel(@handle, @txt_file.path)
+      @model_file ||= create_tempfile
       check_call FFI.XLearnFit(@handle, @model_file.path)
+      @model_path = @model_file.path
     end
     def predict(x, out_path: nil)
@@ -63,24 +66,72 @@ module XLearn
       end
     end
+    def cv(x, y = nil, folds: nil)
+      set_params(fold: folds) if folds
+      set_train_set(x, y)
+      check_call FFI.XLearnCV(@handle)
+    end
     def save_model(path)
       raise Error, "Not trained" unless @model_file
       FileUtils.cp(@model_file.path, path)
     end
+    def save_txt(path)
+      raise Error, "Not trained" unless @txt_file
+      FileUtils.cp(@txt_file.path, path)
+    end
     def load_model(path)
-      @model_file ||= Tempfile.new("xlearn")
+      @model_file ||= create_tempfile
       # TODO ensure tempfile is still cleaned up
       FileUtils.cp(path, @model_file.path)
     end
+    def bias_term
+      read_txt do |line|
+        return line.split(":").last.to_f if line.start_with?("bias:")
+      end
+    end
+    def linear_term
+      term = []
+      read_txt do |line|
+        if line.start_with?("i_")
+          term << line.split(":").last.to_f
+        elsif line.start_with?("v_")
+          break
+        end
+      end
+      term
+    end
     def self.finalize(pointer)
       # must use proc instead of stabby lambda
       proc { FFI.XLearnHandleFree(pointer) }
     end
+    def self.finalize_file(file)
+      # must use proc instead of stabby lambda
+      proc do
+        file.close
+        file.unlink
+      end
+    end
     private
+    def set_train_set(x, y)
+      if x.is_a?(String)
+        check_call FFI.XLearnSetTrain(@handle, x)
+        check_call FFI.XLearnSetBool(@handle, "from_file", true)
+      else
+        train_set = DMatrix.new(x, label: y)
+        check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
+        check_call FFI.XLearnSetBool(@handle, "from_file", false)
+      end
+    end
     def set_params(params)
       params.each do |k, v|
         k = k.to_s
@@ -100,5 +151,19 @@ module XLearn
         check_call ret
       end
     end
+    def read_txt
+      if @txt_file
+        File.foreach(@txt_file.path) do |line|
+          yield line
+        end
+      end
+    end
+    def create_tempfile
+      file = Tempfile.new("xlearn")
+      ObjectSpace.define_finalizer(self, self.class.finalize_file(file))
+      file
+    end
   end
 end
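
`bias_term` and `linear_term` read the same text model as `latent_factors`, keying on the `bias:` and `i_` prefixes, and `linear_term` stops at the first `v_` line. The following standalone sketch implements those rules plus the FM factor parsing. It runs on an in-memory string instead of the gem's tempfile. The helper names and sample text are hypothetical:

```ruby
# Mirrors Model#bias_term: value after "bias:", or nil if absent.
def parse_bias(lines)
  lines.each do |line|
    return line.split(":").last.to_f if line.start_with?("bias:")
  end
  nil
end

# Mirrors Model#linear_term: one weight per "i_<n>:" line.
def parse_linear(lines)
  term = []
  lines.each do |line|
    if line.start_with?("i_")
      term << line.split(":").last.to_f
    elsif line.start_with?("v_")
      break # latent factors follow the linear weights
    end
  end
  term
end

# Mirrors FM#latent_factors: one "v_<i>: <k floats>" line per feature.
def parse_fm_factors(lines)
  lines.select { |line| line.start_with?("v_") }
       .map { |line| line.split(": ").last.split(" ").map(&:to_f) }
end

sample = <<~TXT
  bias: 0.5
  i_0: 0.1
  i_1: -0.2
  v_0: 0.3 0.4
  v_1: 0.5 0.6
TXT
lines = sample.lines
p parse_bias(lines)       # prints 0.5
p parse_linear(lines)     # prints [0.1, -0.2]
p parse_fm_factors(lines) # prints [[0.3, 0.4], [0.5, 0.6]]
```

`parse_bias` returns `nil` when there is no `bias:` line. This matches `bias_term`, because `read_txt` then finishes without an early `return`.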

data/lib/xlearn/version.rb CHANGED

@@ -1,3 +1,3 @@
 module XLearn
-  VERSION = "0.1.0"
+  VERSION = "0.1.1"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: xlearn
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.1
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-10-12 00:00:00.000000000 Z
+date: 2019-10-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -66,6 +66,34 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '5'
+- !ruby/object:Gem::Dependency
+  name: daru
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: numo-narray
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description:
 email: andrew@chartkick.com
 executables: []