xlearn 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cbc492d4f4cb0de9c53cac0820251fc1f747de836348280dfa9d1b7e6475f745
- data.tar.gz: d5f1fcbbb10b96714c38fd9c0c924c98fb01dccf1872734e1084a31ad855503c
+ metadata.gz: 32722c5c1623a0f680dba1ca05e37105247a0163600da6ae2a5739173aa94066
+ data.tar.gz: 87938813a982897dcedcabd351292618d15faad53b69ae33f749b8dea7cf6bff
  SHA512:
- metadata.gz: '048e5915264ba2749e00a91b4c59a773efcd553cf7c764e56df0107e6dd08edcb7bcb86350181b217d095351ea79f47ba6daa84407069fcd9c445683ecdc22a8'
- data.tar.gz: 46831787724f8ec1d4063859445a0a9e6aefd2a38b4267032eb11ffcadf145ebbd37ab3babcf56599cf31f3f4143126b1663baf409099926860ce88c591adb75
+ metadata.gz: fdc680d913d7ca6100da8102d1b1fc173c9e354e99935eb0eff0e04a23f7cc1e63f5caf547a2c0afc503fb4521e519691cb866d2fabde504f1fa75e0de11238c
+ data.tar.gz: 85cd2ad8a5b6f984a5af7024481fc58a7602e47e16b91f013bb6297f335d8bf61273e266346056cb09f5310381e2ae72919c6ce27b1d7b702d5a29bfc53a5197
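The `checksums.yaml` entries are digests of the two archives packed inside the `.gem` file (a `.gem` is a plain tar archive containing `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`). Below is a minimal sketch that checks an extracted archive against these values using Ruby's standard `digest` library. The `digests_match?` helper and the `EXPECTED` constant are hypothetical names for illustration and are not part of the gem.

```ruby
require "digest"

# Expected digests for data.tar.gz in xlearn 0.1.1 (copied from checksums.yaml above)
EXPECTED = {
  sha256: "87938813a982897dcedcabd351292618d15faad53b69ae33f749b8dea7cf6bff",
  sha512: "85cd2ad8a5b6f984a5af7024481fc58a7602e47e16b91f013bb6297f335d8bf61273e266346056cb09f5310381e2ae72919c6ce27b1d7b702d5a29bfc53a5197"
}.freeze

# Returns true only when both the SHA256 and SHA512 hex digests of `path` match.
def digests_match?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Run against an extracted archive, e.g. after `tar -xf xlearn-0.1.1.gem`
if File.exist?("data.tar.gz")
  puts digests_match?("data.tar.gz", **EXPECTED) ? "OK" : "MISMATCH"
end
```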
@@ -1,3 +1,11 @@
+ ## 0.1.1
+
+ - Added `cv` method
+ - Added `partial_fit` method
+ - Added `save_txt` method
+ - Added `bias_term`, `linear_term`, and `latent_factors` methods
+ - Added support for Daru and Numo
+
  ## 0.1.0
 
  - First release
data/README.md CHANGED
@@ -10,6 +10,8 @@ Supports:
  - Factorization machines
  - Field-aware factorization machines
 
+ [![Build Status](https://travis-ci.org/ankane/xlearn.svg?branch=master)](https://travis-ci.org/ankane/xlearn)
+
  ## Installation
 
  First, [install xLearn](https://xlearn-doc.readthedocs.io/en/latest/install/index.html). On Mac, copy `build/lib/libxlearn_api.dylib` to `/usr/local/lib`.
@@ -22,8 +24,6 @@ gem 'xlearn'
 
  ## Getting Started
 
- This library is modeled after the [Python Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html). Some methods are missing at the moment. PRs welcome!
-
  Prep your data
 
  ```ruby
@@ -58,41 +58,136 @@ Load the model from a file
  model.load_model("model.bin")
  ```
 
+ Save a text version of the model
+
+ ```ruby
+ model.save_txt("model.txt")
+ ```
+
+ Pass a validation set
+
+ ```ruby
+ model.fit(x_train, y_train, eval_set: [x_val, y_val])
+ ```
+
+ Train online
+
+ ```ruby
+ model.partial_fit(x_train, y_train)
+ ```
+
+ Get the bias term, linear term, and latent factors
+
+ ```ruby
+ model.bias_term
+ model.linear_term
+ model.latent_factors # fm and ffm only
+ ```
+
  ## Parameters
 
  Specify parameters
 
  ```ruby
- model = XLearn::FM.new(k: 20, epoch: 50)
+ model = XLearn::Linear.new(k: 20, epoch: 50)
  ```
 
  Supports the same parameters as [Python](https://xlearn-doc.readthedocs.io/en/latest/all_api/index.html)
 
- ## Validation
+ ## Cross-Validation
 
- Pass a validation set when fitting
+ Cross-validation
 
  ```ruby
- model.fit(x_train, y_train, eval_set: [x_val, y_val])
+ model.cv(x, y)
+ ```
+
+ Specify the number of folds
+
+ ```ruby
+ model.cv(x, y, folds: 5)
+ ```
+
+ ## Data
+
+ Data can be an array of arrays
+
+ ```ruby
+ [[1, 2, 3], [4, 5, 6]]
+ ```
+
+ Or a Daru data frame
+
+ ```ruby
+ Daru::DataFrame.from_csv("houses.csv")
+ ```
+
+ Or a Numo NArray
+
+ ```ruby
+ Numo::DFloat.new(3, 2).seq
  ```
 
  ## Performance
 
- For performance, you can read data directly from files
+ For large datasets, read data directly from files
 
  ```ruby
  model.fit("train.txt", eval_set: "validate.txt")
  model.predict("test.txt")
+ model.cv("train.txt")
+ ```
+
+ For linear models and factorization machines, use CSV:
+
+ ```txt
+ label,value_1,value_2,...,value_n
+ ```
+
+ Or the `libsvm` format (better for sparse data):
+
+ ```txt
+ label index_1:value_1 index_2:value_2 ... index_n:value_n
+ ```
+
+ > You can also use commas instead of spaces for separators
+
+ For field-aware factorization machines, use the `libffm` format:
+
+ ```txt
+ label field_1:index_1:value_1 field_2:index_2:value_2 ...
  ```
 
- [These formats](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html#choose-machine-learning-algorithm) are supported
+ > You can also use commas instead of spaces for separators
 
  You can also write predictions directly to a file
 
  ```ruby
- model.predict("test.txt", out_file: "predictions.txt")
+ model.predict("test.txt", out_path: "predictions.txt")
+ ```
+
+ ## xLearn Installation
+
+ There’s an experimental branch that includes xLearn with the gem for easiest installation.
+
+ ```ruby
+ gem 'xlearn', github: 'ankane/xlearn', branch: 'vendor', submodules: true
  ```
 
+ Please file an issue if it doesn’t work for you.
+
+ You can also specify the path to xLearn in an initializer:
+
+ ```ruby
+ XLearn.ffi_lib << "/path/to/xlearn/lib/libxlearn_api.so"
+ ```
+
+ > Use `libxlearn_api.dylib` for Mac and `xlearn_api.dll` for Windows
+
+ ## Credits
+
+ This library is modeled after xLearn’s [Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html).
+
  ## History
 
  View the [changelog](https://github.com/ankane/xlearn/blob/master/CHANGELOG.md)
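The README lists three plain-text formats for reading data directly from files. Below is a minimal sketch that serializes a labeled row into the CSV, `libsvm`, and `libffm` layouts using only core Ruby. The helper names, the use of 0-based feature indices, and the choice to drop zero values in the sparse formats are all assumptions for illustration. Check xLearn's documentation for the indexing convention it actually expects.

```ruby
# Hypothetical helpers that turn one labeled row into the text formats
# listed in the README. Feature indices are 0-based here (an assumption).

# CSV: label,value_1,value_2,...,value_n (every column is written)
def to_csv_line(label, values)
  [label, *values].join(",")
end

# libsvm: label index:value ... (zero values are dropped, which is what
# keeps the format compact for sparse data)
def to_libsvm_line(label, values)
  features = values.each_with_index
                   .reject { |value, _| value.zero? }
                   .map { |value, index| "#{index}:#{value}" }
  [label, *features].join(" ")
end

# libffm: label field:index:value ... where fields[i] is the field of column i
def to_libffm_line(label, values, fields)
  features = values.each_with_index
                   .reject { |value, _| value.zero? }
                   .map { |value, index| "#{fields[index]}:#{index}:#{value}" }
  [label, *features].join(" ")
end

puts to_csv_line(1, [0.5, 0, 2])               # 1,0.5,0,2
puts to_libsvm_line(1, [0.5, 0, 2])            # 1 0:0.5 2:2
puts to_libffm_line(1, [0.5, 0, 2], [0, 0, 1]) # 1 0:0:0.5 1:2:2
```

Each line can then be written to a file such as `train.txt` and passed to `fit`, `predict`, or `cv` as a path.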
@@ -5,14 +5,30 @@ module XLearn
  def initialize(data, label: nil)
  @handle = ::FFI::MemoryPointer.new(:pointer)
 
- nrow = data.count
- ncol = data.first.count
+ if matrix?(data)
+ nrow = data.row_count
+ ncol = data.column_count
+ flat_data = data.to_a.flatten
+ elsif daru?(data)
+ nrow, ncol = data.shape
+ flat_data = data.map_rows(&:to_a).flatten
+ elsif narray?(data)
+ nrow, ncol = data.shape
+ # TODO convert to SFloat and pass pointer
+ # for better performance
+ flat_data = data.flatten.to_a
+ else
+ nrow = data.count
+ ncol = data.first.count
+ flat_data = data.flatten
+ end
 
- c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
- c_data.put_array_of_float(0, data.flatten)
+ c_data = ::FFI::MemoryPointer.new(:float, flat_data.size)
+ c_data.put_array_of_float(0, flat_data)
 
  if label
- c_label = ::FFI::MemoryPointer.new(:float, nrow)
+ label = label.to_a
+ c_label = ::FFI::MemoryPointer.new(:float, label.size)
  c_label.put_array_of_float(0, label)
  end
 
@@ -31,5 +47,19 @@ module XLearn
  # must use proc instead of stabby lambda
  proc { FFI.XlearnDataFree(pointer) }
  end
+
+ private
+
+ def matrix?(data)
+ defined?(Matrix) && data.is_a?(Matrix)
+ end
+
+ def daru?(data)
+ defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
+ end
+
+ def narray?(data)
+ defined?(Numo::NArray) && data.is_a?(Numo::NArray)
+ end
  end
  end
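The new `DMatrix` constructor reduces every supported input to a row count, a column count, and a single row-major flat array, then copies that array into C memory with `put_array_of_float`. Below is a standalone sketch of that normalization for plain arrays and the standard library's `Matrix`. The Daru and Numo branches and the FFI copy are left out so the sketch runs without those gems. `flatten_rows` is a hypothetical name, and the ragged-row check is an extra safeguard that does not appear in the gem.

```ruby
require "matrix"

# Hypothetical standalone version of the normalization in DMatrix#initialize:
# returns [nrow, ncol, flat_data] with values laid out row by row.
def flatten_rows(data)
  if defined?(Matrix) && data.is_a?(Matrix)
    nrow = data.row_count
    ncol = data.column_count
    flat_data = data.to_a.flatten
  else
    nrow = data.count
    ncol = data.first.count
    # Extra check not present in the gem: a dense nrow x ncol matrix must hold
    # exactly nrow * ncol values, so reject ragged rows up front.
    unless data.all? { |row| row.count == ncol }
      raise ArgumentError, "all rows must have #{ncol} columns"
    end
    flat_data = data.flatten
  end
  [nrow, ncol, flat_data]
end

p flatten_rows([[1, 2, 3], [4, 5, 6]]) # [2, 3, [1, 2, 3, 4, 5, 6]]
p flatten_rows(Matrix[[1, 2], [3, 4]]) # [2, 2, [1, 2, 3, 4]]
```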
@@ -4,5 +4,24 @@ module XLearn
  @model_type = "ffm"
  super
  end
+
+ # shape is [i, j, k]
+ # for v_{i}_{j}
+ def latent_factors
+ factor = []
+ current = -1
+ read_txt do |line|
+ if line.start_with?("v_")
+ parts = line.split(": ")
+ i = parts.first.split("_")[1].to_i
+ if i != current
+ factor << []
+ current = i
+ end
+ factor.last << parts.last.split(" ").map(&:to_f)
+ end
+ end
+ factor
+ end
  end
  end
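`FFM#latent_factors` scans the text model for lines starting with `v_` and reads the feature index `i` from a name like `v_3_1`. It starts a new group each time `i` changes, which produces a nested array indexed as `[i][j][k]`. The sketch below reproduces that grouping on an in-memory string. The sample lines are invented and only follow the `v_{i}_{j}: ...` shape described in the gem's comments. As in the gem, `bias` and `i_` lines are skipped, and lines for the same feature are assumed to be consecutive.

```ruby
# Groups FFM latent vectors v_{i}_{j} into factor[i][j] (an array of k floats),
# following the same prefix checks and splitting as FFM#latent_factors.
def ffm_latent_factors(text)
  factor = []
  current = nil
  text.each_line do |line|
    next unless line.start_with?("v_")

    name, values = line.split(": ")
    i = name.split("_")[1].to_i
    # A new feature index starts a new group of field vectors
    if i != current
      factor << []
      current = i
    end
    factor.last << values.split(" ").map(&:to_f)
  end
  factor
end

sample = <<~TXT
  bias: 0.1
  i_0: 0.2
  i_1: -0.3
  v_0_0: 0.1 0.2
  v_0_1: 0.3 0.4
  v_1_0: 0.5 0.6
  v_1_1: 0.7 0.8
TXT

p ffm_latent_factors(sample)
# [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
```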
@@ -4,5 +4,17 @@ module XLearn
  @model_type = "fm"
  super
  end
+
+ # shape is [i, k]
+ # for v_{i}
+ def latent_factors
+ factor = []
+ read_txt do |line|
+ if line.start_with?("v_")
+ factor << line.split(": ").last.split(" ").map(&:to_f)
+ end
+ end
+ factor
+ end
  end
  end
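Together with `bias_term` and `linear_term` in the model class, `FM#latent_factors` reads three kinds of lines from the text model that `save_txt` copies out: a `bias:` line, `i_{i}:` lines holding the linear weights, and `v_{i}:` lines holding one latent vector per feature. Below is a standalone sketch that parses all three from a string in a single pass. `parse_fm_txt` is a hypothetical helper, and the sample text is invented to mirror the prefixes the gem checks for. Unlike the gem's `linear_term`, it does not rely on every `i_` line appearing before the first `v_` line.

```ruby
# Hypothetical parser for an FM text model: returns the bias, the linear
# weights, and the latent factors (shape [i, k]) using the gem's line prefixes.
def parse_fm_txt(text)
  bias = nil
  linear = []
  latent = []
  text.each_line do |line|
    if line.start_with?("bias:")
      bias = line.split(":").last.to_f
    elsif line.start_with?("i_")
      linear << line.split(":").last.to_f
    elsif line.start_with?("v_")
      latent << line.split(": ").last.split(" ").map(&:to_f)
    end
  end
  { bias: bias, linear: linear, latent: latent }
end

fm = parse_fm_txt(<<~TXT)
  bias: 0.5
  i_0: 0.25
  i_1: -1.5
  v_0: 0.1 0.2 0.3
  v_1: 0.4 0.5 0.6
TXT

p fm[:bias]   # 0.5
p fm[:linear] # [0.25, -1.5]
p fm[:latent] # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```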
@@ -20,14 +20,14 @@ module XLearn
  end
 
  def fit(x, y = nil, eval_set: nil)
- if x.is_a?(String)
- check_call FFI.XLearnSetTrain(@handle, x)
- check_call FFI.XLearnSetBool(@handle, "from_file", true)
- else
- train_set = DMatrix.new(x, label: y)
- check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
- check_call FFI.XLearnSetBool(@handle, "from_file", false)
- end
+ @model_path = nil
+ partial_fit(x, y, eval_set: eval_set)
+ end
+
+ def partial_fit(x, y = nil, eval_set: nil)
+ check_call FFI.XLearnSetPreModel(@handle, @model_path || "")
+
+ set_train_set(x, y)
 
  if eval_set
  if eval_set.is_a?(String)
@@ -38,9 +38,12 @@ module XLearn
  end
  end
 
- # TODO unlink in finalizer
- @model_file = Tempfile.new("xlearn")
+ @txt_file ||= create_tempfile
+ check_call FFI.XLearnSetTXTModel(@handle, @txt_file.path)
+
+ @model_file ||= create_tempfile
  check_call FFI.XLearnFit(@handle, @model_file.path)
+ @model_path = @model_file.path
  end
 
  def predict(x, out_path: nil)
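In the new code, `fit` clears `@model_path` and delegates to `partial_fit`. `partial_fit` passes the previous model file, if there is one, to `XLearnSetPreModel`, so training resumes from it. Below, the same reset-then-continue pattern is sketched with a toy least-squares model trained by plain stochastic gradient descent. `ToyLinear` and its learning-rate and epoch settings are illustrative only and say nothing about xLearn's actual solver.

```ruby
# Toy model mirroring the fit/partial_fit split: fit forgets previous
# state, partial_fit continues from whatever weights already exist.
class ToyLinear
  attr_reader :weights

  def initialize(learning_rate: 0.1, epochs: 5)
    @learning_rate = learning_rate
    @epochs = epochs
    @weights = nil
  end

  # Start from scratch (analogous to fit setting @model_path = nil).
  def fit(x, y)
    @weights = nil
    partial_fit(x, y)
  end

  # Continue from existing weights (analogous to passing the pre-model path).
  def partial_fit(x, y)
    @weights ||= Array.new(x.first.size, 0.0)
    @epochs.times do
      x.zip(y).each do |row, target|
        # Squared-error gradient step for a single example
        error = row.zip(@weights).sum { |xi, wi| xi * wi } - target
        @weights = @weights.each_with_index.map { |w, j| w - @learning_rate * error * row[j] }
      end
    end
    self
  end
end

x = [[1.0], [2.0]]
y = [2.0, 4.0]
toy = ToyLinear.new
toy.fit(x, y)         # 5 epochs starting from zero weights
toy.partial_fit(x, y) # 5 more epochs, continuing from the previous weights
```

Calling `fit` again would discard the progress and give the same weights as the first `fit`. Repeated `partial_fit` calls keep moving the weight toward 2.0.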
@@ -63,24 +66,72 @@ module XLearn
  end
  end
 
+ def cv(x, y = nil, folds: nil)
+ set_params(fold: folds) if folds
+ set_train_set(x, y)
+ check_call FFI.XLearnCV(@handle)
+ end
+
  def save_model(path)
  raise Error, "Not trained" unless @model_file
  FileUtils.cp(@model_file.path, path)
  end
 
+ def save_txt(path)
+ raise Error, "Not trained" unless @txt_file
+ FileUtils.cp(@txt_file.path, path)
+ end
+
  def load_model(path)
- @model_file ||= Tempfile.new("xlearn")
+ @model_file ||= create_tempfile
  # TODO ensure tempfile is still cleaned up
  FileUtils.cp(path, @model_file.path)
  end
 
+ def bias_term
+ read_txt do |line|
+ return line.split(":").last.to_f if line.start_with?("bias:")
+ end
+ end
+
+ def linear_term
+ term = []
+ read_txt do |line|
+ if line.start_with?("i_")
+ term << line.split(":").last.to_f
+ elsif line.start_with?("v_")
+ break
+ end
+ end
+ term
+ end
+
  def self.finalize(pointer)
  # must use proc instead of stabby lambda
  proc { FFI.XLearnHandleFree(pointer) }
  end
 
+ def self.finalize_file(file)
+ # must use proc instead of stabby lambda
+ proc do
+ file.close
+ file.unlink
+ end
+ end
+
  private
 
+ def set_train_set(x, y)
+ if x.is_a?(String)
+ check_call FFI.XLearnSetTrain(@handle, x)
+ check_call FFI.XLearnSetBool(@handle, "from_file", true)
+ else
+ train_set = DMatrix.new(x, label: y)
+ check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
+ check_call FFI.XLearnSetBool(@handle, "from_file", false)
+ end
+ end
+
  def set_params(params)
  params.each do |k, v|
  k = k.to_s
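`cv` forwards `folds` to xLearn as the `fold` parameter and then calls `XLearnCV`, so the library splits the training data itself. The idea behind that split is sketched below as a generic k-fold partition, in which every example lands in exactly one validation fold and is used for training in all the others. This is a textbook contiguous split without shuffling. It is not a description of how xLearn partitions data internally, and `k_fold_indices` is a hypothetical helper.

```ruby
# Splits indices 0...n into `folds` contiguous validation folds and returns
# [train_indices, validation_indices] pairs. Generic k-fold, no shuffling.
def k_fold_indices(n, folds)
  raise ArgumentError, "folds must be between 2 and n" unless folds.between?(2, n)

  indices = (0...n).to_a
  start = 0
  Array.new(folds) do |f|
    # Spread the remainder over the first folds so sizes differ by at most one
    size = n / folds + (f < n % folds ? 1 : 0)
    validation = indices[start, size]
    start += size
    [indices - validation, validation]
  end
end

p k_fold_indices(5, 2)
# [[[3, 4], [0, 1, 2]], [[0, 1, 2], [3, 4]]]
```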
@@ -100,5 +151,19 @@ module XLearn
  check_call ret
  end
  end
+
+ def read_txt
+ if @txt_file
+ File.foreach(@txt_file.path) do |line|
+ yield line
+ end
+ end
+ end
+
+ def create_tempfile
+ file = Tempfile.new("xlearn")
+ ObjectSpace.define_finalizer(self, self.class.finalize_file(file))
+ file
+ end
  end
  end
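`create_tempfile` registers a finalizer built by the class method `finalize_file`, so the proc captures only the `Tempfile` and not the model instance. This is a general Ruby rule: a finalizer closure that references the object it is attached to keeps that object reachable, so the object is never collected. The "proc instead of stabby lambda" comments have a separate reason. Ruby calls a finalizer with the object id as an argument, and a plain proc ignores that extra argument, whereas a zero-arity lambda would raise `ArgumentError`. Below is a minimal standalone version of the pattern using a hypothetical `TempOwner` class.

```ruby
require "tempfile"

class TempOwner
  attr_reader :file

  def initialize
    @file = Tempfile.new("example")
    # Build the finalizer in a class method so the proc does not close over
    # self; otherwise this instance would stay reachable and never be finalized.
    ObjectSpace.define_finalizer(self, self.class.cleanup(@file))
  end

  # Ruby passes the object id to finalizers; a proc tolerates the extra
  # argument, unlike a zero-arity lambda.
  def self.cleanup(file)
    proc do
      file.close
      file.unlink
    end
  end
end

owner = TempOwner.new
puts File.exist?(owner.file.path) # true until the owner is garbage collected
```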
@@ -1,3 +1,3 @@
  module XLearn
- VERSION = "0.1.0"
+ VERSION = "0.1.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: xlearn
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.1.1
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-10-12 00:00:00.000000000 Z
+ date: 2019-10-14 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ffi
@@ -66,6 +66,34 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '5'
+ - !ruby/object:Gem::Dependency
+ name: daru
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: numo-narray
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description:
  email: andrew@chartkick.com
  executables: []