xlearn 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +104 -9
- data/lib/xlearn/dmatrix.rb +35 -5
- data/lib/xlearn/ffm.rb +19 -0
- data/lib/xlearn/fm.rb +12 -0
- data/lib/xlearn/model.rb +76 -11
- data/lib/xlearn/version.rb +1 -1
- metadata +30 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 32722c5c1623a0f680dba1ca05e37105247a0163600da6ae2a5739173aa94066
+  data.tar.gz: 87938813a982897dcedcabd351292618d15faad53b69ae33f749b8dea7cf6bff
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fdc680d913d7ca6100da8102d1b1fc173c9e354e99935eb0eff0e04a23f7cc1e63f5caf547a2c0afc503fb4521e519691cb866d2fabde504f1fa75e0de11238c
+  data.tar.gz: 85cd2ad8a5b6f984a5af7024481fc58a7602e47e16b91f013bb6297f335d8bf61273e266346056cb09f5310381e2ae72919c6ce27b1d7b702d5a29bfc53a5197
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -10,6 +10,8 @@ Supports:
 - Factorization machines
 - Field-aware factorization machines
 
+[](https://travis-ci.org/ankane/xlearn)
+
 ## Installation
 
 First, [install xLearn](https://xlearn-doc.readthedocs.io/en/latest/install/index.html). On Mac, copy `build/lib/libxlearn_api.dylib` to `/usr/local/lib`.
@@ -22,8 +24,6 @@ gem 'xlearn'
 
 ## Getting Started
 
-This library is modeled after the [Python Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html). Some methods are missing at the moment. PRs welcome!
-
 Prep your data
 
 ```ruby
@@ -58,41 +58,136 @@ Load the model from a file
 model.load_model("model.bin")
 ```
 
+Save a text version of the model
+
+```ruby
+model.save_txt("model.txt")
+```
+
+Pass a validation set
+
+```ruby
+model.fit(x_train, y_train, eval_set: [x_val, y_val])
+```
+
+Train online
+
+```ruby
+model.partial_fit(x_train, y_train)
+```
+
+Get the bias term, linear term, and latent factors
+
+```ruby
+model.bias_term
+model.linear_term
+model.latent_factors # fm and ffm only
+```
+
 ## Parameters
 
 Specify parameters
 
 ```ruby
-model = XLearn::
+model = XLearn::Linear.new(k: 20, epoch: 50)
 ```
 
 Supports the same parameters as [Python](https://xlearn-doc.readthedocs.io/en/latest/all_api/index.html)
 
-## Validation
+## Cross-Validation
 
-
+Cross-validation
 
 ```ruby
-model.
+model.cv(x, y)
+```
+
+Specify the number of folds
+
+```ruby
+model.cv(x, y, folds: 5)
+```
+
+## Data
+
+Data can be an array of arrays
+
+```ruby
+[[1, 2, 3], [4, 5, 6]]
+```
+
+Or a Daru data frame
+
+```ruby
+Daru::DataFrame.from_csv("houses.csv")
+```
+
+Or a Numo NArray
+
+```ruby
+Numo::DFloat.new(3, 2).seq
 ```
 
 ## Performance
 
-For
+For large datasets, read data directly from files
 
 ```ruby
 model.fit("train.txt", eval_set: "validate.txt")
 model.predict("test.txt")
+model.cv("train.txt")
+```
+
+For linear models and factorization machines, use CSV:
+
+```txt
+label,value_1,value_2,...,value_n
+```
+
+Or the `libsvm` format (better for sparse data):
+
+```txt
+label index_1:value_1 index_2:value_2 ... index_n:value_n
+```
+
+> You can also use commas instead of spaces for separators
+
+For field-aware factorization machines, use the `libffm` format:
+
+```txt
+label field_1:index_1:value_1 field_2:index_2:value_2 ...
 ```
 
-
+> You can also use commas instead of spaces for separators
 
 You can also write predictions directly to a file
 
 ```ruby
-model.predict("test.txt",
+model.predict("test.txt", out_path: "predictions.txt")
+```
+
+## xLearn Installation
+
+There’s an experimental branch that includes xLearn with the gem for easiest installation.
+
+```ruby
+gem 'xlearn', github: 'ankane/xlearn', branch: 'vendor', submodules: true
 ```
 
+Please file an issue if it doesn’t work for you.
+
+You can also specify the path to xLearn in an initializer:
+
+```ruby
+XLearn.ffi_lib << "/path/to/xlearn/lib/libxlearn_api.so"
+```
+
+> Use `libxlearn_api.dylib` for Mac and `xlearn_api.dll` for Windows
+
+## Credits
+
+This library is modeled after xLearn’s [Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html).
+
 ## History
 
 View the [changelog](https://github.com/ankane/xlearn/blob/master/CHANGELOG.md)
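The `libsvm` layout described in the README hunk above can be produced from in-memory rows with a few lines of Ruby. This is only a sketch: `to_libsvm` is a hypothetical helper that is not part of the gem, and the zero-based feature indices and the choice to skip zero values are assumptions made for illustration.

```ruby
# Hypothetical helper (not part of the xlearn gem): renders dense rows
# as libsvm lines of the form "label index:value index:value ...".
# Assumes zero-based feature indices and omits zero values, since the
# format is meant for sparse data.
def to_libsvm(rows, labels)
  rows.zip(labels).map do |row, label|
    features = row.each_with_index
                  .reject { |value, _index| value.zero? }
                  .map { |value, index| "#{index}:#{value}" }
    [label, *features].join(" ")
  end
end

lines = to_libsvm([[1.5, 0, 3], [0, 2, 0]], [1, 0])
puts lines
```

Writing `lines` to a file with `File.write("train.txt", lines.join("\n"))` would give something that could be passed to `model.fit("train.txt")`, assuming the indices match what the model expects.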
data/lib/xlearn/dmatrix.rb
CHANGED
@@ -5,14 +5,30 @@ module XLearn
     def initialize(data, label: nil)
       @handle = ::FFI::MemoryPointer.new(:pointer)
 
-
-
+      if matrix?(data)
+        nrow = data.row_count
+        ncol = data.column_count
+        flat_data = data.to_a.flatten
+      elsif daru?(data)
+        nrow, ncol = data.shape
+        flat_data = data.map_rows(&:to_a).flatten
+      elsif narray?(data)
+        nrow, ncol = data.shape
+        # TODO convert to SFloat and pass pointer
+        # for better performance
+        flat_data = data.flatten.to_a
+      else
+        nrow = data.count
+        ncol = data.first.count
+        flat_data = data.flatten
+      end
 
-      c_data = ::FFI::MemoryPointer.new(:float,
-      c_data.put_array_of_float(0,
+      c_data = ::FFI::MemoryPointer.new(:float, flat_data.size)
+      c_data.put_array_of_float(0, flat_data)
 
       if label
-
+        label = label.to_a
+        c_label = ::FFI::MemoryPointer.new(:float, label.size)
         c_label.put_array_of_float(0, label)
       end
 
@@ -31,5 +47,19 @@ module XLearn
       # must use proc instead of stabby lambda
       proc { FFI.XlearnDataFree(pointer) }
     end
+
+    private
+
+    def matrix?(data)
+      defined?(Matrix) && data.is_a?(Matrix)
+    end
+
+    def daru?(data)
+      defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
+    end
+
+    def narray?(data)
+      defined?(Numo::NArray) && data.is_a?(Numo::NArray)
+    end
   end
 end
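The branching added to `DMatrix#initialize` reduces every supported input to a row count, a column count, and one flat row-major array. A minimal sketch of that idea using only the standard library follows; `shape_and_flatten` is a hypothetical name, the Daru and Numo branches are left out, and `Array#pack` stands in for the FFI buffer purely as an analogy.

```ruby
require "matrix"

# Illustrative stand-in for the branching in DMatrix#initialize:
# returns [nrow, ncol, flat_data] for a stdlib Matrix or an array of arrays.
# `shape_and_flatten` is a hypothetical name, not a method of the gem.
def shape_and_flatten(data)
  if defined?(Matrix) && data.is_a?(Matrix)
    [data.row_count, data.column_count, data.to_a.flatten]
  else
    [data.count, data.first.count, data.flatten]
  end
end

rows = [[1, 2, 3], [4, 5, 6]]
from_array = shape_and_flatten(rows)
from_matrix = shape_and_flatten(Matrix[*rows])

# Both inputs yield the same shape and the same row-major layout.
p from_array
p from_matrix

# Rough stdlib analog of filling a C float buffer: 4 bytes per value.
buffer = from_array.last.pack("f*")
p buffer.bytesize
```

The `defined?(Matrix) &&` guard is what lets the gem treat Matrix, Daru, and Numo as optional: when the constant is not loaded, the check short-circuits instead of raising a `NameError`.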
data/lib/xlearn/ffm.rb
CHANGED
@@ -4,5 +4,24 @@ module XLearn
       @model_type = "ffm"
       super
     end
+
+    # shape is [i, j, k]
+    # for v_{i}_{j}
+    def latent_factors
+      factor = []
+      current = -1
+      read_txt do |line|
+        if line.start_with?("v_")
+          parts = line.split(": ")
+          i = parts.first.split("_")[1].to_i
+          if i != current
+            factor << []
+            current = i
+          end
+          factor.last << parts.last.split(" ").map(&:to_f)
+        end
+      end
+      factor
+    end
   end
 end
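The grouping logic in `latent_factors` above can be tried without the gem by feeding it text directly. In this sketch the sample model lines are an assumption inferred from the parsing code (a `v_{i}_{j}: ` prefix followed by space-separated values), and `ffm_latent_factors` is a hypothetical standalone version of the method.

```ruby
# Sample text-model lines. The exact layout is an assumption inferred from
# the parsing code above (a "v_{i}_{j}: " prefix, then space-separated values).
FFM_MODEL_TXT = <<~TXT
  bias: 0.5
  i_0: 0.1
  i_1: 0.2
  v_0_0: 0.1 0.2
  v_0_1: 0.3 0.4
  v_1_0: 0.5 0.6
  v_1_1: 0.7 0.8
TXT

# Groups latent vectors by feature index i, giving shape [i, j, k].
# Relies on lines for the same i being adjacent, as the original does.
def ffm_latent_factors(text)
  factor = []
  current = -1
  text.each_line do |line|
    next unless line.start_with?("v_")
    parts = line.split(": ")
    i = parts.first.split("_")[1].to_i
    if i != current
      factor << []
      current = i
    end
    factor.last << parts.last.split(" ").map(&:to_f)
  end
  factor
end

factors = ffm_latent_factors(FFM_MODEL_TXT)
p factors
```

Because a new group is opened only when `i` changes, the result is correct only if the file lists all `v_{i}_*` lines together, which the method appears to assume about the text output.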
data/lib/xlearn/fm.rb
CHANGED
@@ -4,5 +4,17 @@ module XLearn
       @model_type = "fm"
       super
     end
+
+    # shape is [i, k]
+    # for v_{i}
+    def latent_factors
+      factor = []
+      read_txt do |line|
+        if line.start_with?("v_")
+          factor << line.split(": ").last.split(" ").map(&:to_f)
+        end
+      end
+      factor
+    end
   end
 end
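For the plain FM case, the bias, linear weights, and latent factors all come from the same text file, so they can be read in one pass. The sketch below is not how the gem does it (it exposes three separate methods); `parse_fm_model` is a hypothetical helper, and the sample layout with `bias:`, `i_{n}:` and `v_{n}:` prefixes is an assumption inferred from the parsing code in this diff.

```ruby
# Sample text model. The layout is an assumption inferred from the parsing
# code in this diff: "bias:", "i_{n}:" and "v_{n}:" prefixes.
FM_MODEL_TXT = <<~TXT
  bias: 0.25
  i_0: 0.1
  i_1: -0.2
  v_0: 0.5 1.5
  v_1: 2.5 3.5
TXT

# Hypothetical single-pass reader (the gem exposes three separate methods).
def parse_fm_model(text)
  model = { bias: nil, linear: [], factors: [] }
  text.each_line do |line|
    value = line.split(":").last
    if line.start_with?("bias:")
      model[:bias] = value.to_f
    elsif line.start_with?("i_")
      model[:linear] << value.to_f
    elsif line.start_with?("v_")
      # One row per feature, so the result has shape [i, k].
      model[:factors] << value.split(" ").map(&:to_f)
    end
  end
  model
end

model = parse_fm_model(FM_MODEL_TXT)
p model
```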
data/lib/xlearn/model.rb
CHANGED
@@ -20,14 +20,14 @@ module XLearn
     end
 
     def fit(x, y = nil, eval_set: nil)
-
-
-
-
-
-
-
-
+      @model_path = nil
+      partial_fit(x, y, eval_set: eval_set)
+    end
+
+    def partial_fit(x, y = nil, eval_set: nil)
+      check_call FFI.XLearnSetPreModel(@handle, @model_path || "")
+
+      set_train_set(x, y)
 
       if eval_set
         if eval_set.is_a?(String)
@@ -38,9 +38,12 @@ module XLearn
         end
       end
 
-
-      @
+      @txt_file ||= create_tempfile
+      check_call FFI.XLearnSetTXTModel(@handle, @txt_file.path)
+
+      @model_file ||= create_tempfile
       check_call FFI.XLearnFit(@handle, @model_file.path)
+      @model_path = @model_file.path
     end
 
     def predict(x, out_path: nil)
@@ -63,24 +66,72 @@ module XLearn
       end
     end
 
+    def cv(x, y = nil, folds: nil)
+      set_params(fold: folds) if folds
+      set_train_set(x, y)
+      check_call FFI.XLearnCV(@handle)
+    end
+
     def save_model(path)
       raise Error, "Not trained" unless @model_file
       FileUtils.cp(@model_file.path, path)
     end
 
+    def save_txt(path)
+      raise Error, "Not trained" unless @txt_file
+      FileUtils.cp(@txt_file.path, path)
+    end
+
     def load_model(path)
-      @model_file ||=
+      @model_file ||= create_tempfile
       # TODO ensure tempfile is still cleaned up
       FileUtils.cp(path, @model_file.path)
     end
 
+    def bias_term
+      read_txt do |line|
+        return line.split(":").last.to_f if line.start_with?("bias:")
+      end
+    end
+
+    def linear_term
+      term = []
+      read_txt do |line|
+        if line.start_with?("i_")
+          term << line.split(":").last.to_f
+        elsif line.start_with?("v_")
+          break
+        end
+      end
+      term
+    end
+
     def self.finalize(pointer)
       # must use proc instead of stabby lambda
       proc { FFI.XLearnHandleFree(pointer) }
     end
 
+    def self.finalize_file(file)
+      # must use proc instead of stabby lambda
+      proc do
+        file.close
+        file.unlink
+      end
+    end
+
     private
 
+    def set_train_set(x, y)
+      if x.is_a?(String)
+        check_call FFI.XLearnSetTrain(@handle, x)
+        check_call FFI.XLearnSetBool(@handle, "from_file", true)
+      else
+        train_set = DMatrix.new(x, label: y)
+        check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
+        check_call FFI.XLearnSetBool(@handle, "from_file", false)
+      end
+    end
+
     def set_params(params)
       params.each do |k, v|
         k = k.to_s
@@ -100,5 +151,19 @@ module XLearn
         check_call ret
       end
     end
+
+    def read_txt
+      if @txt_file
+        File.foreach(@txt_file.path) do |line|
+          yield line
+        end
+      end
+    end
+
+    def create_tempfile
+      file = Tempfile.new("xlearn")
+      ObjectSpace.define_finalizer(self, self.class.finalize_file(file))
+      file
+    end
   end
 end
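The `create_tempfile` and `finalize_file` pair above follows a common Ruby cleanup pattern, which can be sketched in isolation. `ScratchFile` is a hypothetical class for illustration only. The comment about procs reflects a likely reason for the "must use proc instead of stabby lambda" note: finalizers are called with the object id, and a zero-parameter lambda rejects extra arguments while a proc ignores them.

```ruby
require "tempfile"

# Hypothetical class illustrating the tempfile cleanup pattern.
class ScratchFile
  attr_reader :file

  def initialize
    @file = Tempfile.new("scratch")
    # Build the finalizer in a class method so the proc does not capture
    # the instance as `self`, which would keep it from being collected.
    ObjectSpace.define_finalizer(self, self.class.finalize_file(@file))
  end

  def self.finalize_file(file)
    # Finalizers receive the object id as an argument. A proc ignores it;
    # a zero-parameter lambda would raise ArgumentError instead.
    proc do
      file.close
      file.unlink
    end
  end
end

scratch = ScratchFile.new
path = scratch.file.path
cleanup = ScratchFile.finalize_file(scratch.file)
cleanup.call(scratch.object_id) # simulate the call the GC would make
puts File.exist?(path)
```

Calling the proc by hand here only demonstrates what the finalizer does; in normal use the garbage collector decides when it runs, so the file may outlive the last reference for a while.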
data/lib/xlearn/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: xlearn
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.1
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-10-
+date: 2019-10-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -66,6 +66,34 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '5'
+- !ruby/object:Gem::Dependency
+  name: daru
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: numo-narray
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description:
 email: andrew@chartkick.com
 executables: []