libmf 0.1.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 85fc60af42649286b87cf23130c0efafd0d8951423d31b187d13097b2418e7d1
-   data.tar.gz: ab568af8e036b6d38fcc604746eeda4fa29e2e6e7541e6af53b77f9979e9fe82
+   metadata.gz: 1d7744d5436815acc2609c8c39566f5704513248b5fe0b7ddc1cc4fbd28c81e9
+   data.tar.gz: 0bcd219cc9360b0e9daefe418e2a14fc46969ab174db5ea93717c837e309c697
  SHA512:
-   metadata.gz: efcbffd9ed9e6f66911a63e74d694da77d798d0fde04cb1490c1ee4eaf8d9e1e93b1af13ab8e23a4a858b74503d28a616ad6499a590f4a1568df2a9dcb65d85f
-   data.tar.gz: 671b306cad36c2ea5da6633de6703892e19fbb5774cf22a0d458ba3d967ab50dc5ae56ed271771e4ed1a0483346d8d9f34ec5db9c8424d16f369e81c5f467857
+   metadata.gz: 49d31c33bea909e1a3a0a8cdebc6cce76b2cf19debc11420d76ad2016763961f376df56bdbba16e6a60e72a8014a2b44547045b2385f44b6c96f48f3f6148643
+   data.tar.gz: 3f264d87a4cc727634ac8b2cb5dc87a56a44c3fba7d77093c83ca3623437be6024c04b1fe154275fd3a14a6736bf6703ea2fae9b984570b5c4629af3b5c710f3
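`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). A `.gem` is a plain tar archive, so you can extract those members with `tar -xf` and check them. Below is a minimal sketch that uses Ruby's standard `digest` library. The helper names `file_digests` and `digest_matches?` are hypothetical, and a temporary file stands in for a real archive.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compute the same two digests checksums.yaml stores.
def file_digests(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: compare one digest against an expected hex string.
def digest_matches?(path, algorithm, expected)
  file_digests(path).fetch(algorithm) == expected.downcase
end

# Usage with a throwaway file standing in for an extracted data.tar.gz
Tempfile.create("checksum-demo") do |f|
  f.write("hello")
  f.flush
  digests = file_digests(f.path)
  puts digests["SHA256"]
  puts digest_matches?(f.path, "SHA256", digests["SHA256"])
end
```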
data/CHANGELOG.md CHANGED
@@ -1,3 +1,27 @@
- ## 0.1.0
+ ## 0.2.1 (2020-12-28)
+
+ - Added ARM shared library for Mac
+
+ ## 0.2.0 (2020-03-26)
+
+ - Changed to BSD 3-Clause license to match LIBMF
+ - Added support for reading data directly from files
+ - Added `format: :numo` option to `p_factors` and `q_factors`
+ - Improved performance of loading data by 5x
+
+ ## 0.1.3 (2019-11-07)
+
+ - Made parameter names more Ruby-like
+ - No need to set `do_nmf` with generalized KL-divergence
+
+ ## 0.1.2 (2019-11-06)
+
+ - Fixed bug in `p_factors` and `q_factors` methods
+
+ ## 0.1.1 (2019-11-05)
+
+ - Fixed errors on Linux and Windows
+
+ ## 0.1.0 (2019-11-04)
 
  - First release
data/LICENSE.txt CHANGED
@@ -1,22 +1,30 @@
- Copyright (c) 2019 Andrew Kane
+ BSD 3-Clause License
 
- MIT License
+ Copyright (c) 2014-2015, The LIBMF Project
+ Copyright (c) 2019-2020, Andrew Kane
+ All rights reserved.
 
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
 
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
+ 1. Redistributions of source code must retain the above copyright notice, this
+    list of conditions and the following disclaimer.
 
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its
+    contributors may be used to endorse or promote products derived from
+    this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md CHANGED
@@ -2,7 +2,9 @@
 
  [LIBMF](https://github.com/cjlin1/libmf) - large-scale sparse matrix factorization - for Ruby
 
- :fire: Uses the C API for blazing performance
+ Check out [Disco](https://github.com/ankane/disco) for higher-level collaborative filtering
+
+ [![Build Status](https://github.com/ankane/libmf/workflows/build/badge.svg?branch=master)](https://github.com/ankane/libmf/actions)
 
  ## Installation
 
@@ -37,14 +39,19 @@ Make predictions
  model.predict(row_index, column_index)
  ```
 
- Get the bias and latent factors
+ Get the latent factors (these approximate the training matrix)
 
  ```ruby
- model.bias
  model.p_factors
  model.q_factors
  ```
 
+ Get the bias (average of all elements in the training matrix)
+
+ ```ruby
+ model.bias
+ ```
+
  Save the model to a file
 
  ```ruby
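The README now describes `model.bias` as the average of all elements in the training matrix. If "elements" means the observed `[row_index, column_index, value]` entries passed to `fit`, you can sanity-check the value in plain Ruby. `training_mean` is a hypothetical helper, not part of the gem. Because the model stores its values as floats, any comparison against `model.bias` should allow for a small floating-point difference.

```ruby
# Hypothetical helper: mean of the observed values in
# [row_index, column_index, value] training triples.
def training_mean(data)
  raise ArgumentError, "No data" if data.empty?

  data.sum { |row| row[2].to_f } / data.size
end

data = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]]
puts training_mean(data) # (5.0 + 3.5 + 4.0) / 3
# After model.fit(data), model.bias should be close to this value
# if the README's description holds for the chosen loss function.
```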
@@ -63,48 +70,87 @@ Pass a validation set
  model.fit(data, eval_set: eval_set)
  ```
 
+ ## Cross-Validation
+
+ Perform cross-validation
+
+ ```ruby
+ model.cv(data)
+ ```
+
+ Specify the number of folds
+
+ ```ruby
+ model.cv(data, folds: 5)
+ ```
+
  ## Parameters
 
- Pass parameters
+ Pass parameters - default values below
 
  ```ruby
- model = Libmf::Model.new(k: 20, nr_iters: 50)
- ```
-
- Supports the same parameters as LIBMF
-
- ```text
- variable    meaning                                    default
- ================================================================
- fun         loss function                              0
- k           number of latent factors                   8
- nr_threads  number of threads used                     12
- nr_bins     number of bins                             25
- nr_iters    number of iterations                       20
- lambda_p1   coefficient of L1-norm regularization on P 0
- lambda_p2   coefficient of L2-norm regularization on P 0.1
- lambda_q1   coefficient of L1-norm regularization on Q 0
- lambda_q2   coefficient of L2-norm regularization on Q 0.1
- eta         learning rate                              0.1
- alpha       importance of negative entries             0.1
- c           desired value of negative entries          0.0001
- do_nmf      perform non-negative MF (NMF)              false
- quiet       no outputs to stdout                       false
- copy_data   copy data in training procedure            true
+ Libmf::Model.new(
+   loss: 0,             # loss function
+   factors: 8,          # number of latent factors
+   threads: 12,         # number of threads used
+   bins: 25,            # number of bins
+   iterations: 20,      # number of iterations
+   lambda_p1: 0,        # coefficient of L1-norm regularization on P
+   lambda_p2: 0.1,      # coefficient of L2-norm regularization on P
+   lambda_q1: 0,        # coefficient of L1-norm regularization on Q
+   lambda_q2: 0.1,      # coefficient of L2-norm regularization on Q
+   learning_rate: 0.1,  # learning rate
+   alpha: 0.1,          # importance of negative entries
+   c: 0.0001,           # desired value of negative entries
+   nmf: false,          # perform non-negative MF (NMF)
+   quiet: false         # no outputs to stdout
+ )
  ```
 
- ## Cross-Validation
+ ### Loss Functions
 
- Perform cross-validation
+ For real-valued matrix factorization
+
+ - 0 - squared error (L2-norm)
+ - 1 - absolute error (L1-norm)
+ - 2 - generalized KL-divergence
+
+ For binary matrix factorization
+
+ - 5 - logarithmic error
+ - 6 - squared hinge loss
+ - 7 - hinge loss
+
+ For one-class matrix factorization
+
+ - 10 - row-oriented pair-wise logarithmic loss
+ - 11 - column-oriented pair-wise logarithmic loss
+ - 12 - squared error (L2-norm)
+
+ ## Performance
+
+ For performance, read data directly from files
 
  ```ruby
- model.cv(data)
+ model.fit("train.txt", eval_set: "validate.txt")
+ model.cv("train.txt")
  ```
 
- Specify the number of folds
+ Data should be in the format `row_index column_index value`:
+
+ ```txt
+ 0 0 5.0
+ 0 2 3.5
+ 1 1 4.0
+ ```
+
+ ## Numo
+
+ Get latent factors as Numo arrays
 
  ```ruby
- model.cv(data, folds: 5)
+ model.p_factors(format: :numo)
+ model.q_factors(format: :numo)
  ```
 
  ## Resources
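The new Performance section lets `fit` and `cv` read whitespace-separated `row_index column_index value` lines directly from disk. Below is a minimal sketch for writing such a file from in-memory triples and reading it back. `write_triples` and `read_triples` are hypothetical helper names. The sketch assumes one entry per line, with integer indices and a numeric value, as in the README example.

```ruby
require "tempfile"

# Hypothetical helper: write [row_index, column_index, value] triples,
# one space-separated entry per line, as the README's format describes.
def write_triples(path, data)
  File.open(path, "w") do |f|
    data.each { |row, col, value| f.puts("#{row} #{col} #{value}") }
  end
end

# Hypothetical helper: parse the same format back, skipping blank lines.
def read_triples(path)
  lines = File.readlines(path).reject { |line| line.strip.empty? }
  lines.map do |line|
    row, col, value = line.split
    [Integer(row), Integer(col), Float(value)]
  end
end

data = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]]
Tempfile.create(["train", ".txt"]) do |f|
  write_triples(f.path, data)
  print File.read(f.path) # same three lines as the README example
  p read_triples(f.path) == data
end
```

In the 0.2.1 code shown in this diff, `create_problem` expands a string argument with `File.expand_path` and passes it to LIBMF's `mf_read_problem`. A file written this way should therefore be usable as `model.fit(path)`.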
@@ -123,3 +169,13 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Fix bugs and [submit pull requests](https://github.com/ankane/libmf/pulls)
  - Write, clarify, or fix documentation
  - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone --recursive https://github.com/ankane/libmf.git
+ cd libmf
+ bundle install
+ bundle exec rake vendor:all
+ bundle exec rake test
+ ```
data/lib/libmf.rb CHANGED
@@ -11,15 +11,18 @@ module Libmf
    class << self
      attr_accessor :ffi_lib
    end
-   self.ffi_lib = ["mf"]
-
-   lib_path =
-     if ::FFI::Platform.windows?
-       "../vendor/windows/mf.dll"
+   lib_name =
+     if Gem.win_platform?
+       "mf.dll"
+     elsif RbConfig::CONFIG["arch"] =~ /arm64-darwin/i
+       "libmf.arm64.dylib"
+     elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
+       "libmf.dylib"
      else
-       "libmf.bundle"
+       "libmf.so"
      end
-   self.ffi_lib << File.expand_path(lib_path, __dir__)
+   vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
+   self.ffi_lib = [vendor_lib]
 
    # friendlier error message
    autoload :FFI, "libmf/ffi"
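The new loader picks a bundled shared library based on the platform, instead of searching for a system `mf` library. The same decision can be written as a pure function, which makes the branch order easy to test. The `arm64-darwin` check must come before the generic `darwin` check, because an Apple Silicon Mac matches both. `libmf_lib_name` and its keyword arguments are hypothetical. In the gem itself, the inputs come directly from `Gem.win_platform?` and `RbConfig::CONFIG`.

```ruby
require "rbconfig"

# Hypothetical pure-function version of the platform check in lib/libmf.rb.
# Order matters: an Apple Silicon Mac also matches the generic darwin check.
def libmf_lib_name(windows:, arch:, host_os:)
  if windows
    "mf.dll"
  elsif arch =~ /arm64-darwin/i
    "libmf.arm64.dylib"
  elsif host_os =~ /darwin/i
    "libmf.dylib"
  else
    "libmf.so"
  end
end

# Feed in the same inputs the gem reads at load time
puts libmf_lib_name(
  windows: Gem.win_platform?,
  arch: RbConfig::CONFIG["arch"],
  host_os: RbConfig::CONFIG["host_os"]
)
```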
data/lib/libmf/ffi.rb CHANGED
@@ -2,12 +2,7 @@ module Libmf
    module FFI
      extend ::FFI::Library
 
-     begin
-       ffi_lib Libmf.ffi_lib
-     rescue LoadError => e
-       raise e if ENV["LIBMF_DEBUG"]
-       raise LoadError, "Could not find LIBMF"
-     end
+     ffi_lib Libmf.ffi_lib
 
      class Node < ::FFI::Struct
        layout :u, :int,
@@ -51,6 +46,7 @@ module Libmf
      end
 
      attach_function :mf_get_default_param, [], Parameter.by_value
+     attach_function :mf_read_problem, [:string], Problem.by_value
      attach_function :mf_save_model, [Model.by_ref, :string], :int
      attach_function :mf_load_model, [:string], Model.by_ref
      attach_function :mf_destroy_model, [Model.by_ref], :void
data/lib/libmf/model.rb CHANGED
@@ -51,16 +51,27 @@ module Libmf
        model[:b]
      end
 
-     def p_factors
-       reshape(model[:p].read_array_of_float(factors * rows), [rows, factors])
+     def p_factors(format: nil)
+       _factors(model[:p], rows, format)
      end
 
-     def q_factors
-       reshape(model[:q].read_array_of_float(factors * columns), [columns, factors])
+     def q_factors(format: nil)
+       _factors(model[:q], columns, format)
      end
 
      private
 
+     def _factors(ptr, n, format)
+       case format
+       when :numo
+         Numo::SFloat.from_string(ptr.read_bytes(n * factors * 4)).reshape(n, factors)
+       when nil
+         ptr.read_array_of_float(n * factors).each_slice(factors).to_a
+       else
+         raise ArgumentError, "Invalid format"
+       end
+     end
+
      def model
        raise Error, "Not fit" unless @model
        @model
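In 0.1.0, the `reshape` helper placed element `i` of the flat float array in row `i % rows`. In 0.2.1, `_factors` instead takes consecutive groups of `factors` values with `each_slice`. The new code reads the buffer as row-major, meaning each row's `factors` values are stored together. Under that assumption, the two approaches give different results even on a tiny example. The helper names below are hypothetical stand-ins for the two versions.

```ruby
# Hypothetical stand-in for 0.2.1's _factors (nil format): consecutive groups
# of `factors` values form one row.
def row_major(flat, factors)
  flat.each_slice(factors).to_a
end

# Hypothetical stand-in for 0.1.0's reshape: element i goes to row i % rows.
def old_reshape(flat, rows)
  new_arr = rows.times.map { [] }
  flat.each_with_index { |v, i| new_arr[i % rows] << v }
  new_arr
end

# A 2 x 3 factor matrix stored row by row
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p row_major(flat, 3)   # => [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
p old_reshape(flat, 2) # => [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```

The `:numo` branch reads the same memory as raw bytes, `n * factors * 4` of them since each value is a 4-byte float, and reshapes the result to `(n, factors)`. It therefore uses the same row-major reading as `each_slice`.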
@@ -68,45 +79,60 @@ module Libmf
 
      def param
        param = FFI.mf_get_default_param
+       options = @options.dup
        # silence insufficient blocks warning with default params
-       options = {nr_bins: 25}.merge(@options)
+       options[:bins] ||= 25 unless options[:nr_bins]
+       options[:copy_data] = false unless options.key?(:copy_data)
+       options_map = {
+         :loss => :fun,
+         :factors => :k,
+         :threads => :nr_threads,
+         :bins => :nr_bins,
+         :iterations => :nr_iters,
+         :learning_rate => :eta,
+         :nmf => :do_nmf
+       }
        options.each do |k, v|
+         k = options_map[k] if options_map[k]
          param[k] = v
        end
+       # do_nmf must be true for generalized KL-divergence
+       param[:do_nmf] = true if param[:fun] == 2
        param
      end
 
      def create_problem(data)
+       if data.is_a?(String)
+         # need to expand path so it's absolute
+         return FFI.mf_read_problem(File.expand_path(data))
+       end
+
        raise Error, "No data" if data.empty?
 
-       nodes = []
-       r = ::FFI::MemoryPointer.new(FFI::Node, data.size)
-       data.each_with_index do |row, i|
-         n = FFI::Node.new(r[i])
-         n[:u] = row[0]
-         n[:v] = row[1]
-         n[:r] = row[2]
-         nodes << n
+       # TODO do in C for better performance
+       # can use FIX2INT() and RFLOAT_VALUE() instead of pack
+       buffer = String.new
+       data.each do |row|
+         row[0, 2].pack("i*".freeze, buffer: buffer)
+         row[2, 1].pack("f".freeze, buffer: buffer)
        end
 
-       m = nodes.map { |n| n[:u] }.max + 1
-       n = nodes.map { |n| n[:v] }.max + 1
+       r = ::FFI::MemoryPointer.new(FFI::Node, data.size)
+       r.write_bytes(buffer)
+
+       # double check size is what we expect
+       # FFI will throw an error above if too long
+       raise Error, "Bad buffer size" if r.size != buffer.bytesize
+
+       m = data.max_by { |r| r[0] }[0] + 1
+       n = data.max_by { |r| r[1] }[1] + 1
 
        prob = FFI::Problem.new
        prob[:m] = m
        prob[:n] = n
-       prob[:nnz] = nodes.size
+       prob[:nnz] = data.size
        prob[:r] = r
        prob
      end
-
-     def reshape(arr, dims)
-       rows = dims.first
-       new_arr = rows.times.map { [] }
-       arr.each_with_index do |v, i|
-         new_arr[i % rows] << v
-       end
-       new_arr
-     end
    end
  end
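In 0.1.0, `create_problem` allocated and filled one `FFI::Node` per row. In 0.2.1, it packs every entry into a single binary string and copies that string into the `MemoryPointer` in one call. The sketch below reproduces that packing outside FFI. It assumes the C node struct is two `int`s followed by one `float` with no padding, which is 12 bytes with the usual 4-byte `int` and `float`. If the struct size and the packed size ever disagree, the diff's `Bad buffer size` check raises. `pack_nodes` and `dimensions` are hypothetical names.

```ruby
# Hypothetical helper mirroring 0.2.1's create_problem packing: per entry,
# two native ints (u, v) followed by one native single-precision float (r).
def pack_nodes(data)
  buffer = String.new # binary (ASCII-8BIT) string
  data.each do |row|
    row[0, 2].pack("i*", buffer: buffer) # row and column index
    row[2, 1].pack("f", buffer: buffer)  # value as single-precision float
  end
  buffer
end

# Matrix dimensions: one more than the largest zero-based index, as in the diff.
def dimensions(data)
  m = data.max_by { |entry| entry[0] }[0] + 1
  n = data.max_by { |entry| entry[1] }[1] + 1
  [m, n]
end

data = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]]
buffer = pack_nodes(data)
puts buffer.bytesize # 12 bytes per entry where int and float are 4 bytes each
p buffer.unpack("iif" * data.size)
p dimensions(data) # => [2, 3]
```

Building one string and calling `write_bytes` once avoids creating a Ruby `FFI::Node` wrapper object per entry. That object allocation is presumably where most of the old loading time went.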
data/lib/libmf/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Libmf
-   VERSION = "0.1.0"
+   VERSION = "0.2.1"
  end
File without changes
Binary file
Binary file
Binary file
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: libmf
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.1
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-11-06 00:00:00.000000000 Z
+ date: 2020-12-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: ffi
@@ -24,103 +24,31 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
- - !ruby/object:Gem::Dependency
-   name: bundler
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: rake
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: minitest
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '5'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '5'
- - !ruby/object:Gem::Dependency
-   name: rake-compiler
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- description:
+ description:
  email: andrew@chartkick.com
  executables: []
- extensions:
- - ext/libmf/extconf.rb
+ extensions: []
  extra_rdoc_files: []
  files:
  - CHANGELOG.md
  - LICENSE.txt
  - README.md
- - ext/libmf/extconf.rb
- - lib/libmf.bundle
  - lib/libmf.rb
  - lib/libmf/ffi.rb
  - lib/libmf/model.rb
  - lib/libmf/version.rb
- - vendor/libmf/COPYRIGHT
- - vendor/libmf/Makefile
- - vendor/libmf/Makefile.win
- - vendor/libmf/README
- - vendor/libmf/demo/all_one_matrix.te.txt
- - vendor/libmf/demo/all_one_matrix.tr.txt
- - vendor/libmf/demo/binary_matrix.te.txt
- - vendor/libmf/demo/binary_matrix.tr.txt
- - vendor/libmf/demo/demo.bat
- - vendor/libmf/demo/demo.sh
- - vendor/libmf/demo/real_matrix.te.txt
- - vendor/libmf/demo/real_matrix.tr.txt
- - vendor/libmf/mf-predict.cpp
- - vendor/libmf/mf-train.cpp
- - vendor/libmf/mf.cpp
- - vendor/libmf/mf.def
- - vendor/libmf/mf.h
- - vendor/libmf/windows/mf-predict.exe
- - vendor/libmf/windows/mf-train.exe
- - vendor/libmf/windows/mf.dll
+ - vendor/COPYRIGHT
+ - vendor/demo/real_matrix.te.txt
+ - vendor/demo/real_matrix.tr.txt
+ - vendor/libmf.arm64.dylib
+ - vendor/libmf.dylib
+ - vendor/libmf.so
+ - vendor/mf.dll
  homepage: https://github.com/ankane/libmf
  licenses:
- - MIT
+ - BSD-3-Clause
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -135,8 +63,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.0.3
- signing_key:
+ rubygems_version: 3.2.3
+ signing_key:
  specification_version: 4
- summary: LIBMF - large-scale sparse matrix factorization - for Ruby
+ summary: Large-scale sparse matrix factorization for Ruby
  test_files: []