xlearn 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: cbc492d4f4cb0de9c53cac0820251fc1f747de836348280dfa9d1b7e6475f745
4
+ data.tar.gz: d5f1fcbbb10b96714c38fd9c0c924c98fb01dccf1872734e1084a31ad855503c
5
+ SHA512:
6
+ metadata.gz: '048e5915264ba2749e00a91b4c59a773efcd553cf7c764e56df0107e6dd08edcb7bcb86350181b217d095351ea79f47ba6daa84407069fcd9c445683ecdc22a8'
7
+ data.tar.gz: 46831787724f8ec1d4063859445a0a9e6aefd2a38b4267032eb11ffcadf145ebbd37ab3babcf56599cf31f3f4143126b1663baf409099926860ce88c591adb75
data/CHANGELOG.md ADDED
@@ -0,0 +1,3 @@
1
+ ## 0.1.0
2
+
3
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2019 Andrew Kane
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,107 @@
1
+ # xLearn
2
+
3
+ [xLearn](https://github.com/aksnzhy/xlearn) - the high performance machine learning library - for Ruby
4
+
5
+ :fire: Uses the C API for blazing performance
6
+
7
+ Supports:
8
+
9
+ - Linear models
10
+ - Factorization machines
11
+ - Field-aware factorization machines
12
+
13
+ ## Installation
14
+
15
+ First, [install xLearn](https://xlearn-doc.readthedocs.io/en/latest/install/index.html). On Mac, copy `build/lib/libxlearn_api.dylib` to `/usr/local/lib`.
16
+
17
+ Add this line to your application’s Gemfile:
18
+
19
+ ```ruby
20
+ gem 'xlearn'
21
+ ```
22
+
23
+ ## Getting Started
24
+
25
+ This library is modeled after the [Python Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html). Some methods are missing at the moment. PRs welcome!
26
+
27
+ Prep your data
28
+
29
+ ```ruby
30
+ x = [[1, 2], [3, 4], [5, 6], [7, 8]]
31
+ y = [1, 2, 3, 4]
32
+ ```
33
+
34
+ Train a model
35
+
36
+ ```ruby
37
+ model = XLearn::Linear.new(task: "reg")
38
+ model.fit(x, y)
39
+ ```
40
+
41
+ Use `XLearn::FM` for factorization machines and `XLearn::FFM` for field-aware factorization machines
42
+
43
+ Make predictions
44
+
45
+ ```ruby
46
+ model.predict(x)
47
+ ```
48
+
49
+ Save the model to a file
50
+
51
+ ```ruby
52
+ model.save_model("model.bin")
53
+ ```
54
+
55
+ Load the model from a file
56
+
57
+ ```ruby
58
+ model.load_model("model.bin")
59
+ ```
60
+
61
+ ## Parameters
62
+
63
+ Specify parameters
64
+
65
+ ```ruby
66
+ model = XLearn::FM.new(k: 20, epoch: 50)
67
+ ```
68
+
69
+ Supports the same parameters as [Python](https://xlearn-doc.readthedocs.io/en/latest/all_api/index.html)
70
+
71
+ ## Validation
72
+
73
+ Pass a validation set when fitting
74
+
75
+ ```ruby
76
+ model.fit(x_train, y_train, eval_set: [x_val, y_val])
77
+ ```
78
+
79
+ ## Performance
80
+
81
+ For performance, you can read data directly from files
82
+
83
+ ```ruby
84
+ model.fit("train.txt", eval_set: "validate.txt")
85
+ model.predict("test.txt")
86
+ ```
87
+
88
+ [These formats](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html#choose-machine-learning-algorithm) are supported
89
+
90
+ You can also write predictions directly to a file
91
+
92
+ ```ruby
93
+ model.predict("test.txt", out_file: "predictions.txt")
94
+ ```
95
+
96
+ ## History
97
+
98
+ View the [changelog](https://github.com/ankane/xlearn/blob/master/CHANGELOG.md)
99
+
100
+ ## Contributing
101
+
102
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
103
+
104
+ - [Report bugs](https://github.com/ankane/xlearn/issues)
105
+ - Fix bugs and [submit pull requests](https://github.com/ankane/xlearn/pulls)
106
+ - Write, clarify, or fix documentation
107
+ - Suggest or add new features
data/lib/xlearn.rb ADDED
@@ -0,0 +1,28 @@
1
+ # dependencies
2
+ require "ffi"
3
+
4
+ # stdlib
5
+ require "csv"
6
+ require "fileutils"
7
+ require "tempfile"
8
+
9
+ # modules
10
+ require "xlearn/utils"
11
+ require "xlearn/dmatrix"
12
+ require "xlearn/model"
13
+ require "xlearn/ffm"
14
+ require "xlearn/fm"
15
+ require "xlearn/linear"
16
+ require "xlearn/version"
17
+
18
+ module XLearn
19
+ class Error < StandardError; end
20
+
21
+ class << self
22
+ attr_accessor :ffi_lib
23
+ end
24
+ self.ffi_lib = ["xlearn_api"]
25
+
26
+ # friendlier error message
27
+ autoload :FFI, "xlearn/ffi"
28
+ end
@@ -0,0 +1,35 @@
1
+ module XLearn
2
+ class DMatrix
3
+ include Utils
4
+
5
+ def initialize(data, label: nil)
6
+ @handle = ::FFI::MemoryPointer.new(:pointer)
7
+
8
+ nrow = data.count
9
+ ncol = data.first.count
10
+
11
+ c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
12
+ c_data.put_array_of_float(0, data.flatten)
13
+
14
+ if label
15
+ c_label = ::FFI::MemoryPointer.new(:float, nrow)
16
+ c_label.put_array_of_float(0, label)
17
+ end
18
+
19
+ # TODO support this
20
+ field_map = nil
21
+
22
+ check_call FFI.XlearnCreateDataFromMat(c_data, nrow, ncol, c_label, field_map, @handle)
23
+ ObjectSpace.define_finalizer(self, self.class.finalize(@handle))
24
+ end
25
+
26
+ def to_ptr
27
+ @handle
28
+ end
29
+
30
+ def self.finalize(pointer)
31
+ # must use proc instead of stabby lambda
32
+ proc { FFI.XlearnDataFree(pointer) }
33
+ end
34
+ end
35
+ end
data/lib/xlearn/ffi.rb ADDED
@@ -0,0 +1,39 @@
1
+ module XLearn
2
+ module FFI
3
+ extend ::FFI::Library
4
+
5
+ begin
6
+ ffi_lib XLearn.ffi_lib
7
+ rescue LoadError => e
8
+ raise e if ENV["XLEARN_DEBUG"]
9
+ raise LoadError, "Could not find xLearn"
10
+ end
11
+
12
+ # https://github.com/aksnzhy/xlearn/blob/master/src/c_api/c_api.h
13
+ # keep same order
14
+
15
+ attach_function :XLearnHello, %i[], :int
16
+ attach_function :XLearnCreate, %i[string pointer], :int
17
+ attach_function :XlearnCreateDataFromMat, %i[pointer uint32 uint32 pointer pointer pointer], :int
18
+ attach_function :XlearnDataFree, %i[pointer], :int
19
+ attach_function :XLearnHandleFree, %i[pointer], :int
20
+ attach_function :XLearnShow, %i[pointer], :int
21
+ attach_function :XLearnSetTrain, %i[pointer string], :int
22
+ attach_function :XLearnSetTest, %i[pointer string], :int
23
+ attach_function :XLearnSetPreModel, %i[pointer string], :int
24
+ attach_function :XLearnSetValidate, %i[pointer string], :int
25
+ attach_function :XLearnSetTXTModel, %i[pointer string], :int
26
+ attach_function :XLearnFit, %i[pointer string], :int
27
+ attach_function :XLearnCV, %i[pointer], :int
28
+ attach_function :XLearnPredictForMat, %i[pointer string pointer pointer], :int
29
+ attach_function :XLearnPredictForFile, %i[pointer string string], :int
30
+ attach_function :XLearnSetDMatrix, %i[pointer string pointer], :int
31
+ attach_function :XLearnSetStr, %i[pointer string string], :int
32
+ attach_function :XLearnSetInt, %i[pointer string int], :int
33
+ attach_function :XLearnSetFloat, %i[pointer string float], :int
34
+ attach_function :XLearnSetBool, %i[pointer string bool], :int
35
+
36
+ # errors
37
+ attach_function :XLearnGetLastError, %i[], :string
38
+ end
39
+ end
data/lib/xlearn/ffm.rb ADDED
@@ -0,0 +1,8 @@
1
+ module XLearn
2
+ class FFM < Model
3
+ def initialize(**options)
4
+ @model_type = "ffm"
5
+ super
6
+ end
7
+ end
8
+ end
data/lib/xlearn/fm.rb ADDED
@@ -0,0 +1,8 @@
1
+ module XLearn
2
+ class FM < Model
3
+ def initialize(**options)
4
+ @model_type = "fm"
5
+ super
6
+ end
7
+ end
8
+ end
@@ -0,0 +1,8 @@
1
+ module XLearn
2
+ class Linear < Model
3
+ def initialize(**options)
4
+ @model_type = "linear"
5
+ super
6
+ end
7
+ end
8
+ end
@@ -0,0 +1,104 @@
1
+ module XLearn
2
+ class Model
3
+ include Utils
4
+
5
+ def initialize(**options)
6
+ @handle = ::FFI::MemoryPointer.new(:pointer)
7
+ check_call FFI.XLearnCreate(@model_type, @handle)
8
+ ObjectSpace.define_finalizer(self, self.class.finalize(@handle))
9
+
10
+ options = {
11
+ task: "binary",
12
+ quiet: true
13
+ }.merge(options)
14
+
15
+ if options[:task] == "binary" && !options.key?(:sigmoid)
16
+ options[:sigmoid] = true
17
+ end
18
+
19
+ set_params(options)
20
+ end
21
+
22
+ def fit(x, y = nil, eval_set: nil)
23
+ if x.is_a?(String)
24
+ check_call FFI.XLearnSetTrain(@handle, x)
25
+ check_call FFI.XLearnSetBool(@handle, "from_file", true)
26
+ else
27
+ train_set = DMatrix.new(x, label: y)
28
+ check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
29
+ check_call FFI.XLearnSetBool(@handle, "from_file", false)
30
+ end
31
+
32
+ if eval_set
33
+ if eval_set.is_a?(String)
34
+ check_call FFI.XLearnSetValidate(@handle, eval_set)
35
+ else
36
+ valid_set = DMatrix.new(x, label: y)
37
+ check_call FFI.XLearnSetDMatrix(@handle, "validate", valid_set)
38
+ end
39
+ end
40
+
41
+ # TODO unlink in finalizer
42
+ @model_file = Tempfile.new("xlearn")
43
+ check_call FFI.XLearnFit(@handle, @model_file.path)
44
+ end
45
+
46
+ def predict(x, out_path: nil)
47
+ if x.is_a?(String)
48
+ check_call FFI.XLearnSetTest(@handle, x)
49
+ check_call FFI.XLearnSetBool(@handle, "from_file", true)
50
+ else
51
+ test_set = DMatrix.new(x)
52
+ check_call FFI.XLearnSetDMatrix(@handle, "test", test_set)
53
+ check_call FFI.XLearnSetBool(@handle, "from_file", false)
54
+ end
55
+
56
+ if out_path
57
+ check_call FFI.XLearnPredictForFile(@handle, @model_file.path, out_path)
58
+ else
59
+ length = ::FFI::MemoryPointer.new(:uint64)
60
+ out_arr = ::FFI::MemoryPointer.new(:pointer)
61
+ check_call FFI.XLearnPredictForMat(@handle, @model_file.path, length, out_arr)
62
+ out_arr.read_pointer.read_array_of_float(length.read_uint64)
63
+ end
64
+ end
65
+
66
+ def save_model(path)
67
+ raise Error, "Not trained" unless @model_file
68
+ FileUtils.cp(@model_file.path, path)
69
+ end
70
+
71
+ def load_model(path)
72
+ @model_file ||= Tempfile.new("xlearn")
73
+ # TODO ensure tempfile is still cleaned up
74
+ FileUtils.cp(path, @model_file.path)
75
+ end
76
+
77
+ def self.finalize(pointer)
78
+ # must use proc instead of stabby lambda
79
+ proc { FFI.XLearnHandleFree(pointer) }
80
+ end
81
+
82
+ private
83
+
84
+ def set_params(params)
85
+ params.each do |k, v|
86
+ k = k.to_s
87
+ ret =
88
+ case k
89
+ when "task", "metric", "opt", "log"
90
+ FFI.XLearnSetStr(@handle, k, v)
91
+ when "lr", "lambda", "init", "alpha", "beta", "lambda_1", "lambda_2"
92
+ FFI.XLearnSetFloat(@handle, k, v)
93
+ when "k", "epoch", "fold", "nthread", "block_size", "stop_window", "seed"
94
+ FFI.XLearnSetInt(@handle, k, v)
95
+ when "quiet", "on_disk", "bin_out", "norm", "lock_free", "early_stop", "sign", "sigmoid"
96
+ FFI.XLearnSetBool(@handle, k, v)
97
+ else
98
+ raise ArgumentError, "Invalid parameter: #{k}"
99
+ end
100
+ check_call ret
101
+ end
102
+ end
103
+ end
104
+ end
@@ -0,0 +1,9 @@
1
+ module XLearn
2
+ module Utils
3
+ private
4
+
5
+ def check_call(ret)
6
+ raise Error, FFI.XLearnGetLastError if ret != 0
7
+ end
8
+ end
9
+ end
@@ -0,0 +1,3 @@
1
+ module XLearn
2
+ VERSION = "0.1.0"
3
+ end
metadata ADDED
@@ -0,0 +1,110 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: xlearn
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Andrew Kane
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2019-10-12 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: ffi
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: minitest
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '5'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '5'
69
+ description:
70
+ email: andrew@chartkick.com
71
+ executables: []
72
+ extensions: []
73
+ extra_rdoc_files: []
74
+ files:
75
+ - CHANGELOG.md
76
+ - LICENSE.txt
77
+ - README.md
78
+ - lib/xlearn.rb
79
+ - lib/xlearn/dmatrix.rb
80
+ - lib/xlearn/ffi.rb
81
+ - lib/xlearn/ffm.rb
82
+ - lib/xlearn/fm.rb
83
+ - lib/xlearn/linear.rb
84
+ - lib/xlearn/model.rb
85
+ - lib/xlearn/utils.rb
86
+ - lib/xlearn/version.rb
87
+ homepage: https://github.com/ankane/xlearn
88
+ licenses:
89
+ - MIT
90
+ metadata: {}
91
+ post_install_message:
92
+ rdoc_options: []
93
+ require_paths:
94
+ - lib
95
+ required_ruby_version: !ruby/object:Gem::Requirement
96
+ requirements:
97
+ - - ">="
98
+ - !ruby/object:Gem::Version
99
+ version: '2.4'
100
+ required_rubygems_version: !ruby/object:Gem::Requirement
101
+ requirements:
102
+ - - ">="
103
+ - !ruby/object:Gem::Version
104
+ version: '0'
105
+ requirements: []
106
+ rubygems_version: 3.0.3
107
+ signing_key:
108
+ specification_version: 4
109
+ summary: xLearn - the high performance machine learning library - for Ruby
110
+ test_files: []