libmf 0.1.0 → 0.2.1

This diff shows the changes between two publicly released versions of the package, as they appear in their public registry. It is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 85fc60af42649286b87cf23130c0efafd0d8951423d31b187d13097b2418e7d1
4
- data.tar.gz: ab568af8e036b6d38fcc604746eeda4fa29e2e6e7541e6af53b77f9979e9fe82
3
+ metadata.gz: 1d7744d5436815acc2609c8c39566f5704513248b5fe0b7ddc1cc4fbd28c81e9
4
+ data.tar.gz: 0bcd219cc9360b0e9daefe418e2a14fc46969ab174db5ea93717c837e309c697
5
5
  SHA512:
6
- metadata.gz: efcbffd9ed9e6f66911a63e74d694da77d798d0fde04cb1490c1ee4eaf8d9e1e93b1af13ab8e23a4a858b74503d28a616ad6499a590f4a1568df2a9dcb65d85f
7
- data.tar.gz: 671b306cad36c2ea5da6633de6703892e19fbb5774cf22a0d458ba3d967ab50dc5ae56ed271771e4ed1a0483346d8d9f34ec5db9c8424d16f369e81c5f467857
6
+ metadata.gz: 49d31c33bea909e1a3a0a8cdebc6cce76b2cf19debc11420d76ad2016763961f376df56bdbba16e6a60e72a8014a2b44547045b2385f44b6c96f48f3f6148643
7
+ data.tar.gz: 3f264d87a4cc727634ac8b2cb5dc87a56a44c3fba7d77093c83ca3623437be6024c04b1fe154275fd3a14a6736bf6703ea2fae9b984570b5c4629af3b5c710f3
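The values in `checksums.yaml` are hex digests of the two archives inside the `.gem` file. As a minimal sketch of how such a digest could be checked, the helper below compares a SHA256 hex digest against an expected string. The name `sha256_matches?` is hypothetical and not part of RubyGems or this gem; only `Digest::SHA256.hexdigest` is a standard library call.

```ruby
require "digest"

# Hypothetical helper: true when the SHA256 hex digest of `bytes`
# equals `expected_hex` (case-insensitive comparison).
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Well-known SHA256 digest of the empty string, used here as a stand-in
# for the contents of a real archive.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

puts sha256_matches?("", EMPTY_SHA256)
puts sha256_matches?("tampered", EMPTY_SHA256)
```

For a real check, the bytes would come from something like `File.binread("data.tar.gz")` after unpacking the gem, and the expected value from the `SHA256` section above.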
data/CHANGELOG.md CHANGED
@@ -1,3 +1,27 @@
1
- ## 0.1.0
1
+ ## 0.2.1 (2020-12-28)
2
+
3
+ - Added ARM shared library for Mac
4
+
5
+ ## 0.2.0 (2020-03-26)
6
+
7
+ - Changed to BSD 3-Clause license to match LIBMF
8
+ - Added support for reading data directly from files
9
+ - Added `format: :numo` option to `p_factors` and `q_factors`
10
+ - Improved performance of loading data by 5x
11
+
12
+ ## 0.1.3 (2019-11-07)
13
+
14
+ - Made parameter names more Ruby-like
15
+ - No need to set `do_nmf` with generalized KL-divergence
16
+
17
+ ## 0.1.2 (2019-11-06)
18
+
19
+ - Fixed bug in `p_factors` and `q_factors` methods
20
+
21
+ ## 0.1.1 (2019-11-05)
22
+
23
+ - Fixed errors on Linux and Windows
24
+
25
+ ## 0.1.0 (2019-11-04)
2
26
 
3
27
  - First release
data/LICENSE.txt CHANGED
@@ -1,22 +1,30 @@
1
- Copyright (c) 2019 Andrew Kane
1
+ BSD 3-Clause License
2
2
 
3
- MIT License
3
+ Copyright (c) 2014-2015, The LIBMF Project
4
+ Copyright (c) 2019-2020, Andrew Kane
5
+ All rights reserved.
4
6
 
5
- Permission is hereby granted, free of charge, to any person obtaining
6
- a copy of this software and associated documentation files (the
7
- "Software"), to deal in the Software without restriction, including
8
- without limitation the rights to use, copy, modify, merge, publish,
9
- distribute, sublicense, and/or sell copies of the Software, and to
10
- permit persons to whom the Software is furnished to do so, subject to
11
- the following conditions:
7
+ Redistribution and use in source and binary forms, with or without
8
+ modification, are permitted provided that the following conditions are met:
12
9
 
13
- The above copyright notice and this permission notice shall be
14
- included in all copies or substantial portions of the Software.
10
+ 1. Redistributions of source code must retain the above copyright notice, this
11
+ list of conditions and the following disclaimer.
15
12
 
16
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
13
+ 2. Redistributions in binary form must reproduce the above copyright notice,
14
+ this list of conditions and the following disclaimer in the documentation
15
+ and/or other materials provided with the distribution.
16
+
17
+ 3. Neither the name of the copyright holder nor the names of its
18
+ contributors may be used to endorse or promote products derived from
19
+ this software without specific prior written permission.
20
+
21
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
22
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
24
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
25
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
27
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
28
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
29
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
30
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md CHANGED
@@ -2,7 +2,9 @@
2
2
 
3
3
  [LIBMF](https://github.com/cjlin1/libmf) - large-scale sparse matrix factorization - for Ruby
4
4
 
5
- :fire: Uses the C API for blazing performance
5
+ Check out [Disco](https://github.com/ankane/disco) for higher-level collaborative filtering
6
+
7
+ [![Build Status](https://github.com/ankane/libmf/workflows/build/badge.svg?branch=master)](https://github.com/ankane/libmf/actions)
6
8
 
7
9
  ## Installation
8
10
 
@@ -37,14 +39,19 @@ Make predictions
37
39
  model.predict(row_index, column_index)
38
40
  ```
39
41
 
40
- Get the bias and latent factors
42
+ Get the latent factors (these approximate the training matrix)
41
43
 
42
44
  ```ruby
43
- model.bias
44
45
  model.p_factors
45
46
  model.q_factors
46
47
  ```
47
48
 
49
+ Get the bias (average of all elements in the training matrix)
50
+
51
+ ```ruby
52
+ model.bias
53
+ ```
54
+
48
55
  Save the model to a file
49
56
 
50
57
  ```ruby
@@ -63,48 +70,87 @@ Pass a validation set
63
70
  model.fit(data, eval_set: eval_set)
64
71
  ```
65
72
 
73
+ ## Cross-Validation
74
+
75
+ Perform cross-validation
76
+
77
+ ```ruby
78
+ model.cv(data)
79
+ ```
80
+
81
+ Specify the number of folds
82
+
83
+ ```ruby
84
+ model.cv(data, folds: 5)
85
+ ```
86
+
66
87
  ## Parameters
67
88
 
68
- Pass parameters
89
+ Pass parameters - default values below
69
90
 
70
91
  ```ruby
71
- model = Libmf::Model.new(k: 20, nr_iters: 50)
72
- ```
73
-
74
- Supports the same parameters as LIBMF
75
-
76
- ```text
77
- variable meaning default
78
- ================================================================
79
- fun loss function 0
80
- k number of latent factors 8
81
- nr_threads number of threads used 12
82
- nr_bins number of bins 25
83
- nr_iters number of iterations 20
84
- lambda_p1 coefficient of L1-norm regularization on P 0
85
- lambda_p2 coefficient of L2-norm regularization on P 0.1
86
- lambda_q1 coefficient of L1-norm regularization on Q 0
87
- lambda_q2 coefficient of L2-norm regularization on Q 0.1
88
- eta learning rate 0.1
89
- alpha importance of negative entries 0.1
90
- c desired value of negative entries 0.0001
91
- do_nmf perform non-negative MF (NMF) false
92
- quiet no outputs to stdout false
93
- copy_data copy data in training procedure true
92
+ Libmf::Model.new(
93
+ loss: 0, # loss function
94
+ factors: 8, # number of latent factors
95
+ threads: 12, # number of threads used
96
+ bins: 25, # number of bins
97
+ iterations: 20, # number of iterations
98
+ lambda_p1: 0, # coefficient of L1-norm regularization on P
99
+ lambda_p2: 0.1, # coefficient of L2-norm regularization on P
100
+ lambda_q1: 0, # coefficient of L1-norm regularization on Q
101
+ lambda_q2: 0.1, # coefficient of L2-norm regularization on Q
102
+ learning_rate: 0.1, # learning rate
103
+ alpha: 0.1, # importance of negative entries
104
+ c: 0.0001, # desired value of negative entries
105
+ nmf: false, # perform non-negative MF (NMF)
106
+ quiet: false # no outputs to stdout
107
+ )
94
108
  ```
95
109
 
96
- ## Cross-Validation
110
+ ### Loss Functions
97
111
 
98
- Perform cross-validation
112
+ For real-valued matrix factorization
113
+
114
+ - 0 - squared error (L2-norm)
115
+ - 1 - absolute error (L1-norm)
116
+ - 2 - generalized KL-divergence
117
+
118
+ For binary matrix factorization
119
+
120
+ - 5 - logarithmic error
121
+ - 6 - squared hinge loss
122
+ - 7 - hinge loss
123
+
124
+ For one-class matrix factorization
125
+
126
+ - 10 - row-oriented pair-wise logarithmic loss
127
+ - 11 - column-oriented pair-wise logarithmic loss
128
+ - 12 - squared error (L2-norm)
129
+
130
+ ## Performance
131
+
132
+ For performance, read data directly from files
99
133
 
100
134
  ```ruby
101
- model.cv(data)
135
+ model.fit("train.txt", eval_set: "validate.txt")
136
+ model.cv("train.txt")
102
137
  ```
103
138
 
104
- Specify the number of folds
139
+ Data should be in the format `row_index column_index value`:
140
+
141
+ ```txt
142
+ 0 0 5.0
143
+ 0 2 3.5
144
+ 1 1 4.0
145
+ ```
146
+
147
+ ## Numo
148
+
149
+ Get latent factors as Numo arrays
105
150
 
106
151
  ```ruby
107
- model.cv(data, folds: 5)
152
+ model.p_factors(format: :numo)
153
+ model.q_factors(format: :numo)
108
154
  ```
109
155
 
110
156
  ## Resources
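The README hunk above renames the constructor parameters (`k` becomes `factors`, `nr_iters` becomes `iterations`, and so on). The sketch below shows that translation as a plain hash lookup, using the same key pairs that appear in the `options_map` hash in the `model.rb` hunk of this diff. The helper name `translate_options` is hypothetical, and it works on an ordinary Hash rather than on the FFI parameter struct the gem actually fills in.

```ruby
# Ruby-style option names mapped to the LIBMF parameter names,
# copied from the options_map hash in this diff.
OPTIONS_MAP = {
  loss: :fun,
  factors: :k,
  threads: :nr_threads,
  bins: :nr_bins,
  iterations: :nr_iters,
  learning_rate: :eta,
  nmf: :do_nmf
}.freeze

# Hypothetical helper: returns a new hash keyed by LIBMF parameter names.
# Keys without a mapping (e.g. :lambda_p2) pass through unchanged.
def translate_options(options)
  out = options.each_with_object({}) do |(key, value), acc|
    acc[OPTIONS_MAP.fetch(key, key)] = value
  end
  # Mirrors the rule in the diff: generalized KL-divergence (fun == 2)
  # forces non-negative matrix factorization.
  out[:do_nmf] = true if out[:fun] == 2
  out
end

p translate_options({factors: 20, iterations: 50, lambda_p2: 0.1})
p translate_options({loss: 2})
```

Because unmapped keys pass through, old-style names such as `nr_iters` still reach the parameter struct under this scheme.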
@@ -123,3 +169,13 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
123
169
  - Fix bugs and [submit pull requests](https://github.com/ankane/libmf/pulls)
124
170
  - Write, clarify, or fix documentation
125
171
  - Suggest or add new features
172
+
173
+ To get started with development:
174
+
175
+ ```sh
176
+ git clone --recursive https://github.com/ankane/libmf.git
177
+ cd libmf
178
+ bundle install
179
+ bundle exec rake vendor:all
180
+ bundle exec rake test
181
+ ```
data/lib/libmf.rb CHANGED
@@ -11,15 +11,18 @@ module Libmf
11
11
  class << self
12
12
  attr_accessor :ffi_lib
13
13
  end
14
- self.ffi_lib = ["mf"]
15
-
16
- lib_path =
17
- if ::FFI::Platform.windows?
18
- "../vendor/windows/mf.dll"
14
+ lib_name =
15
+ if Gem.win_platform?
16
+ "mf.dll"
17
+ elsif RbConfig::CONFIG["arch"] =~ /arm64-darwin/i
18
+ "libmf.arm64.dylib"
19
+ elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
20
+ "libmf.dylib"
19
21
  else
20
- "libmf.bundle"
22
+ "libmf.so"
21
23
  end
22
- self.ffi_lib << File.expand_path(lib_path, __dir__)
24
+ vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
25
+ self.ffi_lib = [vendor_lib]
23
26
 
24
27
  # friendlier error message
25
28
  autoload :FFI, "libmf/ffi"
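The hunk above replaces the single `libmf.bundle` path with a per-platform file name. To make the branch order easier to follow, here is a minimal sketch of the same decision as a pure function. The helper `vendored_lib_name` and its keyword arguments are hypothetical; the gem itself reads `Gem.win_platform?` and `RbConfig::CONFIG` directly.

```ruby
# Hypothetical helper: picks the vendored library file name from
# platform strings, in the same order as the branches in the diff.
def vendored_lib_name(arch:, host_os:, windows: false)
  if windows
    "mf.dll"
  elsif arch =~ /arm64-darwin/i
    # Checked before the generic darwin branch, otherwise Apple Silicon
    # would fall through to the non-ARM dylib.
    "libmf.arm64.dylib"
  elsif host_os =~ /darwin/i
    "libmf.dylib"
  else
    "libmf.so"
  end
end

puts vendored_lib_name(arch: "arm64-darwin20", host_os: "darwin20")
puts vendored_lib_name(arch: "x86_64-darwin19", host_os: "darwin19")
puts vendored_lib_name(arch: "x86_64-linux", host_os: "linux-gnu")
puts vendored_lib_name(arch: "x64-mingw32", host_os: "mingw32", windows: true)
```

The example platform strings are illustrative values, not an exhaustive list of what `RbConfig` reports.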
data/lib/libmf/ffi.rb CHANGED
@@ -2,12 +2,7 @@ module Libmf
2
2
  module FFI
3
3
  extend ::FFI::Library
4
4
 
5
- begin
6
- ffi_lib Libmf.ffi_lib
7
- rescue LoadError => e
8
- raise e if ENV["LIBMF_DEBUG"]
9
- raise LoadError, "Could not find LIBMF"
10
- end
5
+ ffi_lib Libmf.ffi_lib
11
6
 
12
7
  class Node < ::FFI::Struct
13
8
  layout :u, :int,
@@ -51,6 +46,7 @@ module Libmf
51
46
  end
52
47
 
53
48
  attach_function :mf_get_default_param, [], Parameter.by_value
49
+ attach_function :mf_read_problem, [:string], Problem.by_value
54
50
  attach_function :mf_save_model, [Model.by_ref, :string], :int
55
51
  attach_function :mf_load_model, [:string], Model.by_ref
56
52
  attach_function :mf_destroy_model, [Model.by_ref], :void
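The newly attached `mf_read_problem` takes a file path, and the README hunk earlier in this diff documents the expected format as `row_index column_index value`, one entry per line. As a rough sketch, the hypothetical helper `format_triples` below turns in-memory triples into that text format; it is not part of the gem.

```ruby
# Hypothetical helper: serializes [row_index, column_index, value]
# triples as space-separated lines, one triple per line.
def format_triples(data)
  data.map { |row, col, value| "#{row} #{col} #{value}\n" }.join
end

data = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]]
text = format_triples(data)
puts text

# Assuming the gem is installed, the text could be written to disk and
# the path passed in place of the array:
#   File.write("train.txt", text)
#   model.fit("train.txt")
```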
data/lib/libmf/model.rb CHANGED
@@ -51,16 +51,27 @@ module Libmf
51
51
  model[:b]
52
52
  end
53
53
 
54
- def p_factors
55
- reshape(model[:p].read_array_of_float(factors * rows), [rows, factors])
54
+ def p_factors(format: nil)
55
+ _factors(model[:p], rows, format)
56
56
  end
57
57
 
58
- def q_factors
59
- reshape(model[:q].read_array_of_float(factors * columns), [columns, factors])
58
+ def q_factors(format: nil)
59
+ _factors(model[:q], columns, format)
60
60
  end
61
61
 
62
62
  private
63
63
 
64
+ def _factors(ptr, n, format)
65
+ case format
66
+ when :numo
67
+ Numo::SFloat.from_string(ptr.read_bytes(n * factors * 4)).reshape(n, factors)
68
+ when nil
69
+ ptr.read_array_of_float(n * factors).each_slice(factors).to_a
70
+ else
71
+ raise ArgumentError, "Invalid format"
72
+ end
73
+ end
74
+
64
75
  def model
65
76
  raise Error, "Not fit" unless @model
66
77
  @model
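The hunk above swaps the removed `reshape` helper for `each_slice(factors)`. The sketch below runs both approaches on the same flat array to show how they differ. It assumes the factor buffer is laid out row by row (each row's factors contiguous), which is what `each_slice(factors)` implies. The method names `old_reshape` and `slice_rows` are hypothetical.

```ruby
# Logic of the removed `reshape` helper: value i goes to row (i % rows).
def old_reshape(flat, rows)
  result = rows.times.map { [] }
  flat.each_with_index { |v, i| result[i % rows] << v }
  result
end

# Logic of the new code path: consecutive runs of `factors` values form a row.
def slice_rows(flat, factors)
  flat.each_slice(factors).to_a
end

# 3 rows x 2 factors, assumed row-major: row 0 is [1.0, 1.1], and so on.
flat = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1]

p old_reshape(flat, 3)  # => [[1.0, 2.1], [1.1, 3.0], [2.0, 3.1]]
p slice_rows(flat, 2)   # => [[1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
```

Under the row-major assumption, the old helper mixes values from different rows, which is consistent with the 0.1.2 changelog entry about a bug in `p_factors` and `q_factors`.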
@@ -68,45 +79,60 @@ module Libmf
68
79
 
69
80
  def param
70
81
  param = FFI.mf_get_default_param
82
+ options = @options.dup
71
83
  # silence insufficient blocks warning with default params
72
- options = {nr_bins: 25}.merge(@options)
84
+ options[:bins] ||= 25 unless options[:nr_bins]
85
+ options[:copy_data] = false unless options.key?(:copy_data)
86
+ options_map = {
87
+ :loss => :fun,
88
+ :factors => :k,
89
+ :threads => :nr_threads,
90
+ :bins => :nr_bins,
91
+ :iterations => :nr_iters,
92
+ :learning_rate => :eta,
93
+ :nmf => :do_nmf
94
+ }
73
95
  options.each do |k, v|
96
+ k = options_map[k] if options_map[k]
74
97
  param[k] = v
75
98
  end
99
+ # do_nmf must be true for generalized KL-divergence
100
+ param[:do_nmf] = true if param[:fun] == 2
76
101
  param
77
102
  end
78
103
 
79
104
  def create_problem(data)
105
+ if data.is_a?(String)
106
+ # need to expand path so it's absolute
107
+ return FFI.mf_read_problem(File.expand_path(data))
108
+ end
109
+
80
110
  raise Error, "No data" if data.empty?
81
111
 
82
- nodes = []
83
- r = ::FFI::MemoryPointer.new(FFI::Node, data.size)
84
- data.each_with_index do |row, i|
85
- n = FFI::Node.new(r[i])
86
- n[:u] = row[0]
87
- n[:v] = row[1]
88
- n[:r] = row[2]
89
- nodes << n
112
+ # TODO do in C for better performance
113
+ # can use FIX2INT() and RFLOAT_VALUE() instead of pack
114
+ buffer = String.new
115
+ data.each do |row|
116
+ row[0, 2].pack("i*".freeze, buffer: buffer)
117
+ row[2, 1].pack("f".freeze, buffer: buffer)
90
118
  end
91
119
 
92
- m = nodes.map { |n| n[:u] }.max + 1
93
- n = nodes.map { |n| n[:v] }.max + 1
120
+ r = ::FFI::MemoryPointer.new(FFI::Node, data.size)
121
+ r.write_bytes(buffer)
122
+
123
+ # double check size is what we expect
124
+ # FFI will throw an error above if too long
125
+ raise Error, "Bad buffer size" if r.size != buffer.bytesize
126
+
127
+ m = data.max_by { |r| r[0] }[0] + 1
128
+ n = data.max_by { |r| r[1] }[1] + 1
94
129
 
95
130
  prob = FFI::Problem.new
96
131
  prob[:m] = m
97
132
  prob[:n] = n
98
- prob[:nnz] = nodes.size
133
+ prob[:nnz] = data.size
99
134
  prob[:r] = r
100
135
  prob
101
136
  end
102
-
103
- def reshape(arr, dims)
104
- rows = dims.first
105
- new_arr = rows.times.map { [] }
106
- arr.each_with_index do |v, i|
107
- new_arr[i % rows] << v
108
- end
109
- new_arr
110
- end
111
137
  end
112
138
  end
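The rewritten `create_problem` builds one binary string with `Array#pack` instead of allocating an FFI struct per entry. The sketch below reproduces only the packing step, without FFI, and decodes the buffer again to check the layout: two native ints followed by one single-precision float per entry. The names `pack_nodes`, `unpack_nodes`, and `NODE_SIZE` are hypothetical, and the sketch assumes the node struct has no padding beyond those three fields.

```ruby
# Size of one packed entry on this platform (two native ints + one float).
NODE_SIZE = [0, 0].pack("i*").bytesize + [0.0].pack("f").bytesize

# Packs [row_index, column_index, value] triples into one binary string,
# appending to a shared buffer as the diff does.
def pack_nodes(data)
  buffer = String.new
  data.each do |row|
    row[0, 2].pack("i*", buffer: buffer) # row and column indexes
    row[2, 1].pack("f", buffer: buffer)  # value as a 32-bit float
  end
  buffer
end

# Inverse of pack_nodes, used here only to verify the layout.
def unpack_nodes(buffer, count)
  buffer.unpack("iif" * count).each_slice(3).to_a
end

data = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]]
buffer = pack_nodes(data)

puts buffer.bytesize == data.size * NODE_SIZE
p unpack_nodes(buffer, data.size)

# Matrix dimensions, computed as in the diff: largest index plus one.
m = data.max_by { |r| r[0] }[0] + 1
n = data.max_by { |r| r[1] }[1] + 1
p [m, n]
```

The sample values round-trip exactly because they are representable as 32-bit floats; arbitrary values would lose precision in the `"f"` pack.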
data/lib/libmf/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Libmf
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.1"
3
3
  end
File without changes
Binary file
Binary file
Binary file
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: libmf
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-11-06 00:00:00.000000000 Z
11
+ date: 2020-12-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ffi
@@ -24,103 +24,31 @@ dependencies:
24
24
  - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
- - !ruby/object:Gem::Dependency
28
- name: bundler
29
- requirement: !ruby/object:Gem::Requirement
30
- requirements:
31
- - - ">="
32
- - !ruby/object:Gem::Version
33
- version: '0'
34
- type: :development
35
- prerelease: false
36
- version_requirements: !ruby/object:Gem::Requirement
37
- requirements:
38
- - - ">="
39
- - !ruby/object:Gem::Version
40
- version: '0'
41
- - !ruby/object:Gem::Dependency
42
- name: rake
43
- requirement: !ruby/object:Gem::Requirement
44
- requirements:
45
- - - ">="
46
- - !ruby/object:Gem::Version
47
- version: '0'
48
- type: :development
49
- prerelease: false
50
- version_requirements: !ruby/object:Gem::Requirement
51
- requirements:
52
- - - ">="
53
- - !ruby/object:Gem::Version
54
- version: '0'
55
- - !ruby/object:Gem::Dependency
56
- name: minitest
57
- requirement: !ruby/object:Gem::Requirement
58
- requirements:
59
- - - ">="
60
- - !ruby/object:Gem::Version
61
- version: '5'
62
- type: :development
63
- prerelease: false
64
- version_requirements: !ruby/object:Gem::Requirement
65
- requirements:
66
- - - ">="
67
- - !ruby/object:Gem::Version
68
- version: '5'
69
- - !ruby/object:Gem::Dependency
70
- name: rake-compiler
71
- requirement: !ruby/object:Gem::Requirement
72
- requirements:
73
- - - ">="
74
- - !ruby/object:Gem::Version
75
- version: '0'
76
- type: :development
77
- prerelease: false
78
- version_requirements: !ruby/object:Gem::Requirement
79
- requirements:
80
- - - ">="
81
- - !ruby/object:Gem::Version
82
- version: '0'
83
- description:
27
+ description:
84
28
  email: andrew@chartkick.com
85
29
  executables: []
86
- extensions:
87
- - ext/libmf/extconf.rb
30
+ extensions: []
88
31
  extra_rdoc_files: []
89
32
  files:
90
33
  - CHANGELOG.md
91
34
  - LICENSE.txt
92
35
  - README.md
93
- - ext/libmf/extconf.rb
94
- - lib/libmf.bundle
95
36
  - lib/libmf.rb
96
37
  - lib/libmf/ffi.rb
97
38
  - lib/libmf/model.rb
98
39
  - lib/libmf/version.rb
99
- - vendor/libmf/COPYRIGHT
100
- - vendor/libmf/Makefile
101
- - vendor/libmf/Makefile.win
102
- - vendor/libmf/README
103
- - vendor/libmf/demo/all_one_matrix.te.txt
104
- - vendor/libmf/demo/all_one_matrix.tr.txt
105
- - vendor/libmf/demo/binary_matrix.te.txt
106
- - vendor/libmf/demo/binary_matrix.tr.txt
107
- - vendor/libmf/demo/demo.bat
108
- - vendor/libmf/demo/demo.sh
109
- - vendor/libmf/demo/real_matrix.te.txt
110
- - vendor/libmf/demo/real_matrix.tr.txt
111
- - vendor/libmf/mf-predict.cpp
112
- - vendor/libmf/mf-train.cpp
113
- - vendor/libmf/mf.cpp
114
- - vendor/libmf/mf.def
115
- - vendor/libmf/mf.h
116
- - vendor/libmf/windows/mf-predict.exe
117
- - vendor/libmf/windows/mf-train.exe
118
- - vendor/libmf/windows/mf.dll
40
+ - vendor/COPYRIGHT
41
+ - vendor/demo/real_matrix.te.txt
42
+ - vendor/demo/real_matrix.tr.txt
43
+ - vendor/libmf.arm64.dylib
44
+ - vendor/libmf.dylib
45
+ - vendor/libmf.so
46
+ - vendor/mf.dll
119
47
  homepage: https://github.com/ankane/libmf
120
48
  licenses:
121
- - MIT
49
+ - BSD-3-Clause
122
50
  metadata: {}
123
- post_install_message:
51
+ post_install_message:
124
52
  rdoc_options: []
125
53
  require_paths:
126
54
  - lib
@@ -135,8 +63,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
135
63
  - !ruby/object:Gem::Version
136
64
  version: '0'
137
65
  requirements: []
138
- rubygems_version: 3.0.3
139
- signing_key:
66
+ rubygems_version: 3.2.3
67
+ signing_key:
140
68
  specification_version: 4
141
- summary: LIBMF - large-scale sparse matrix factorization - for Ruby
69
+ summary: Large-scale sparse matrix factorization for Ruby
142
70
  test_files: []