libffm 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +3 -0
- data/LICENSE.txt +32 -0
- data/README.md +91 -0
- data/ext/libffm/ext.cpp +110 -0
- data/ext/libffm/extconf.rb +21 -0
- data/lib/libffm.rb +13 -0
- data/lib/libffm/model.rb +45 -0
- data/lib/libffm/version.rb +3 -0
- data/vendor/libffm/COPYRIGHT +31 -0
- data/vendor/libffm/Makefile +26 -0
- data/vendor/libffm/Makefile.win +26 -0
- data/vendor/libffm/README +294 -0
- data/vendor/libffm/ffm-predict.cpp +105 -0
- data/vendor/libffm/ffm-train.cpp +173 -0
- data/vendor/libffm/ffm.cpp +699 -0
- data/vendor/libffm/ffm.h +51 -0
- data/vendor/libffm/timer.cpp +31 -0
- data/vendor/libffm/timer.h +14 -0
- metadata +74 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: a207ec6c4b7eee7b8649dac4d6cd1a5b1213431c8b2de64c20ed080b5311adcb
  data.tar.gz: 3d4e37041abe6ef1c25509130e8472029fb47fa52ae004b249c6cb430b80485e
SHA512:
  metadata.gz: d4a0fee6252c9e6bd999404be493c8a5037f73b71ab579954de8366f5aead6ee19cfc568f3bee021a29cd42448daa34ce85d77d8203e3ce8d840c07ad90f5790
  data.tar.gz: 4ec88cd2cd3959d7f5d579ccdf6b2d7861066c5d0f347f6fd1dd8cb297cd34a89076c49b28537f302b6ac594565837057925b494a001cc6b7d1f4d6230077d6d
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,32 @@

Copyright (c) 2017 The LIBFFM Project.
Copyright (c) 2020 Andrew Kane.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither name of copyright holders nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md
ADDED
@@ -0,0 +1,91 @@
# LIBFFM

[LIBFFM](https://github.com/ycjuan/libffm) - field-aware factorization machines - for Ruby

[![Build Status](https://github.com/ankane/libffm/workflows/build/badge.svg?branch=master)](https://github.com/ankane/libffm/actions)

## Installation

Add this line to your application’s Gemfile:

```ruby
gem 'libffm'
```

## Getting Started

Prep your data in LIBFFM format

```txt
0 0:0:1 1:1:1
1 0:2:1 1:3:1
```
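One hedged way to produce lines like these from raw categorical data (a sketch only; the row layout and dictionaries below are illustrative, not part of the gem):

```ruby
# Build field and feature dictionaries from raw categorical rows,
# then emit lines in LIBFFM format: <label> <field>:<feature>:<value> ...
rows = [
  {label: 0, "Advertiser" => "Nike", "Publisher" => "CNN"},
  {label: 1, "Advertiser" => "ESPN", "Publisher" => "BBC"}
]

fields = Hash.new { |h, k| h[k] = h.size }    # field name -> field id
features = Hash.new { |h, k| h[k] = h.size }  # "field-value" -> feature id

lines = rows.map do |row|
  nodes = row.reject { |k, _| k == :label }.map do |field, value|
    # categorical features get value 1
    "#{fields[field]}:#{features["#{field}-#{value}"]}:1"
  end
  "#{row[:label]} #{nodes.join(" ")}"
end

puts lines
# 0 0:0:1 1:1:1
# 1 0:2:1 1:3:1
```

Writing `lines` to a file then gives a training set the model can consume.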

Create a model

```ruby
model = Libffm::Model.new
model.fit("train.txt")
```

Make predictions

```ruby
model.predict("test.txt")
```

Save the model to a file

```ruby
model.save_model("model.bin")
```

Load the model from a file

```ruby
model.load_model("model.bin")
```

Pass a validation set

```ruby
model.fit("train.txt", eval_set: "validation.txt")
```

## Parameters

Pass parameters - default values below

```ruby
Libffm::Model.new(
  eta: 0.2,            # learning rate
  lambda: 0.00002,     # regularization parameter
  nr_iters: 15,        # number of iterations
  k: 4,                # number of latent factors
  normalization: true, # use instance-wise normalization
  auto_stop: false     # stop at the iteration that achieves the best validation loss
)
```

## History

View the [changelog](https://github.com/ankane/libffm/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/libffm/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/libffm/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone --recursive https://github.com/ankane/libffm.git
cd libffm
bundle install
bundle exec rake compile
bundle exec rake test
```
data/ext/libffm/ext.cpp
ADDED
@@ -0,0 +1,110 @@
// stdlib
#include <iostream>

// ffm
#include <ffm.h>

// rice
#include <rice/Array.hpp>
#include <rice/Class.hpp>
#include <rice/Module.hpp>
#include <rice/Object.hpp>

using Rice::Array;
using Rice::Class;
using Rice::Module;
using Rice::Object;
using Rice::String;
using Rice::define_module;
using Rice::define_module_under;
using Rice::define_class_under;

extern "C"
void Init_ext()
{
  Module rb_mLibffm = define_module("Libffm");

  Module rb_mExt = define_module_under(rb_mLibffm, "Ext");
  define_class_under<ffm::ffm_model>(rb_mExt, "Model");

  rb_mExt
    .define_singleton_method(
      "release_model",
      *[](ffm::ffm_model& model) {
        model.release();
      })
    .define_singleton_method(
      "fit",
      *[](std::string tr_path, std::string va_path, std::string tmp_prefix, ffm::ffm_float eta, ffm::ffm_float lambda, ffm::ffm_int nr_iters, ffm::ffm_int k, bool normalization, bool auto_stop) {
        // quiet
        ffm::cout.setstate(ffm::ios_base::badbit);

        std::string tr_bin_path = tmp_prefix + "train.bin";
        ffm::ffm_read_problem_to_disk(tr_path, tr_bin_path);

        std::string va_bin_path = "";
        if (va_path.size() > 0) {
          va_bin_path = tmp_prefix + "validation.bin";
          ffm::ffm_read_problem_to_disk(va_path, va_bin_path);
        }

        ffm::ffm_parameter param;
        param.eta = eta;
        param.lambda = lambda;
        param.nr_iters = nr_iters;
        param.k = k;
        param.normalization = normalization;
        param.auto_stop = auto_stop;

        return ffm::ffm_train_on_disk(tr_bin_path, va_bin_path, param);
      })
    .define_singleton_method(
      "predict",
      *[](ffm::ffm_model& model, std::string test_path) {
        int const kMaxLineSize = 1000000;

        FILE *f_in = fopen(test_path.c_str(), "r");
        char line[kMaxLineSize];

        ffm::vector<ffm::ffm_node> x;
        ffm::ffm_int i = 0;

        Array ret;
        for(; fgets(line, kMaxLineSize, f_in) != nullptr; i++) {
          x.clear();
          strtok(line, " \t");

          while(true) {
            char *field_char = strtok(nullptr, ":");
            char *idx_char = strtok(nullptr, ":");
            char *value_char = strtok(nullptr, " \t");
            if(field_char == nullptr || *field_char == '\n')
              break;

            ffm::ffm_node N;
            N.f = atoi(field_char);
            N.j = atoi(idx_char);
            N.v = atof(value_char);

            x.push_back(N);
          }

          ffm::ffm_float y_bar = ffm::ffm_predict(x.data(), x.data()+x.size(), model);
          ret.push(y_bar);
        }

        fclose(f_in);

        return ret;
      })
    .define_singleton_method(
      "save_model",
      *[](ffm::ffm_model& model, std::string path) {
        ffm::ffm_save_model(model, path);
      })
    .define_singleton_method(
      "load_model",
      *[](std::string path) {
        return ffm::ffm_load_model(path);
      });
}
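The predict binding above tokenizes each line as a leading label followed by `field:index:value` triples. A rough Ruby equivalent of that parsing, for illustration (`parse_ffm_line` is not part of the gem):

```ruby
# Parse one LIBFFM-format line the way the C++ predict loop does:
# skip the leading label token, then read field:index:value triples.
def parse_ffm_line(line)
  label, *tokens = line.split
  nodes = tokens.map do |t|
    f, j, v = t.split(":")
    {f: f.to_i, j: j.to_i, v: v.to_f}
  end
  [label.to_i, nodes]
end

label, nodes = parse_ffm_line("1 0:2:1 1:3:0.5")
# label == 1
# nodes == [{f: 0, j: 2, v: 1.0}, {f: 1, j: 3, v: 0.5}]
```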
data/ext/libffm/extconf.rb
ADDED
@@ -0,0 +1,21 @@
require "mkmf-rice"

$CXXFLAGS += " -std=c++11 -DUSESSE"

apple_clang = RbConfig::CONFIG["CC_VERSION_MESSAGE"] =~ /apple clang/i

# check omp first
if have_library("omp") || have_library("gomp")
  $CXXFLAGS += " -DUSEOMP"
  $CXXFLAGS += " -Xclang" if apple_clang
  $CXXFLAGS += " -fopenmp"
end

ext = File.expand_path(".", __dir__)
libffm = File.expand_path("../../vendor/libffm", __dir__)

$srcs = Dir["#{ext}/*.cpp", "#{libffm}/{ffm,timer}.cpp"]
$INCFLAGS += " -I#{libffm}"
$VPATH << libffm

create_makefile("libffm/ext")
data/lib/libffm.rb
ADDED
data/lib/libffm/model.rb
ADDED
@@ -0,0 +1,45 @@
module Libffm
  class Model
    def initialize(eta: 0.2, lambda: 0.00002, nr_iters: 15, k: 4, normalization: true, auto_stop: false)
      @eta = eta
      @lambda = lambda
      @nr_iters = nr_iters
      @k = k
      @normalization = normalization
      @auto_stop = auto_stop

      @model = nil
    end

    def fit(data, eval_set: nil)
      Dir.mktmpdir do |dir|
        @model = Ext.fit(data, eval_set || "", File.join(dir, ""), @eta, @lambda, @nr_iters, @k, @normalization, @auto_stop)
        add_finalizer(@model)
      end
    end

    def predict(data)
      raise "Not fit" unless @model
      Ext.predict(@model, data)
    end

    def save_model(path)
      Ext.save_model(@model, path)
    end

    def load_model(path)
      @model = Ext.load_model(path)
      add_finalizer(@model)
    end

    private

    def add_finalizer(model)
      ObjectSpace.define_finalizer(model, self.class.finalize_model(model))
    end

    def self.finalize_model(model)
      # release the native model via the Ext.release_model binding defined in ext.cpp
      proc { Ext.release_model(model) }
    end
  end
end
data/vendor/libffm/COPYRIGHT
ADDED
@@ -0,0 +1,31 @@

Copyright (c) 2017 The LIBFFM Project.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither name of copyright holders nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/vendor/libffm/Makefile
ADDED
@@ -0,0 +1,26 @@
CXX = g++
CXXFLAGS = -Wall -O3 -std=c++0x -march=native

# comment the following flag if you do not want to use SSE instructions
DFLAG += -DUSESSE

# comment the following flags if you do not want to use OpenMP
DFLAG += -DUSEOMP
CXXFLAGS += -fopenmp

all: ffm-train ffm-predict

ffm-train: ffm-train.cpp ffm.o timer.o
	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^

ffm-predict: ffm-predict.cpp ffm.o timer.o
	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^

ffm.o: ffm.cpp ffm.h timer.o
	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<

timer.o: timer.cpp timer.h
	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<

clean:
	rm -f ffm-train ffm-predict ffm.o timer.o
data/vendor/libffm/Makefile.win
ADDED
@@ -0,0 +1,26 @@
CXX = cl.exe
CFLAGS = /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp

TARGET = windows

all: $(TARGET) $(TARGET)\ffm-train.exe $(TARGET)\ffm-predict.exe

$(TARGET)\ffm-predict.exe: ffm.h ffm-predict.cpp ffm.obj timer.obj
	$(CXX) $(CFLAGS) ffm-predict.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-predict.exe

$(TARGET)\ffm-train.exe: ffm.h ffm-train.cpp ffm.obj timer.obj
	$(CXX) $(CFLAGS) ffm-train.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-train.exe

ffm.obj: ffm.cpp ffm.h
	$(CXX) $(CFLAGS) -c ffm.cpp

timer.obj: timer.cpp timer.h
	$(CXX) $(CFLAGS) -c timer.cpp

.PHONY: $(TARGET)
$(TARGET):
	-mkdir $(TARGET)

clean:
	-erase /Q *.obj *.exe $(TARGET)\.
	-rd $(TARGET)
data/vendor/libffm/README
ADDED
@@ -0,0 +1,294 @@
Table of Contents
=================

- What is LIBFFM
- Overfitting and Early Stopping
- Installation
- Data Format
- Command Line Usage
- Examples
- OpenMP and SSE
- Building Windows Binaries
- FAQ


What is LIBFFM
==============

LIBFFM is a library for field-aware factorization machines (FFM).

The field-aware factorization machine is an effective model for CTR prediction. It has been used to win the top-3 positions
of the following competitions:

* Criteo: https://www.kaggle.com/c/criteo-display-ad-challenge

* Avazu: https://www.kaggle.com/c/avazu-ctr-prediction

* Outbrain: https://www.kaggle.com/c/outbrain-click-prediction

* RecSys 2015: http://dl.acm.org/citation.cfm?id=2813511&dl=ACM&coll=DL&CFID=941880276&CFTOKEN=60022934

You can find more information about FFM in the following paper / slides:

* http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf

* http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf

* https://arxiv.org/abs/1701.04099


Overfitting and Early Stopping
==============================

FFM is prone to overfitting, and the solution we have so far is early stopping. See how FFM behaves on a certain data
set:

> ffm-train -p va.ffm -l 0.00002 tr.ffm
iter   tr_logloss   va_logloss
   1      0.49738      0.48776
   2      0.47383      0.47995
   3      0.46366      0.47480
   4      0.45561      0.47231
   5      0.44810      0.47034
   6      0.44037      0.47003
   7      0.43239      0.46952
   8      0.42362      0.46999
   9      0.41394      0.47088
  10      0.40326      0.47228
  11      0.39156      0.47435
  12      0.37886      0.47683
  13      0.36522      0.47975
  14      0.35079      0.48321
  15      0.33578      0.48703


We see the best validation loss is achieved at the 7th iteration. If we keep training, then overfitting begins. It is worth
noting that increasing the regularization parameter does not help:

> ffm-train -p va.ffm -l 0.0002 -t 50 -s 12 tr.ffm
iter   tr_logloss   va_logloss
   1      0.50532      0.49905
   2      0.48782      0.49242
   3      0.48136      0.48748
...
  29      0.42183      0.47014
...
  48      0.37071      0.47333
  49      0.36767      0.47374
  50      0.36472      0.47404


To avoid overfitting, we recommend always providing a validation set with option `-p.' You can use option `--auto-stop' to
stop at the iteration that reaches the best validation loss:

> ffm-train -p va.ffm -l 0.00002 --auto-stop tr.ffm
iter   tr_logloss   va_logloss
   1      0.49738      0.48776
   2      0.47383      0.47995
   3      0.46366      0.47480
   4      0.45561      0.47231
   5      0.44810      0.47034
   6      0.44037      0.47003
   7      0.43239      0.46952
   8      0.42362      0.46999
Auto-stop. Use model at 7th iteration.


Installation
============

Requirement: It requires a C++11 compatible compiler. We also use OpenMP to provide multi-threading. If OpenMP is not
available on your platform, please refer to section `OpenMP and SSE.'

- Unix-like systems:
  Type `make' in the command line.

- Windows:
  See `Building Windows Binaries' to compile.



Data Format
===========

The data format of LIBFFM is:

<label> <field1>:<feature1>:<value1> <field2>:<feature2>:<value2> ...
.
.
.

`field' and `feature' should be non-negative integers. See an example `bigdata.tr.txt.'

It is important to understand the difference between `field' and `feature'. For example, if we have raw data like this:

Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC

Here, we have

* 2 fields: Advertiser and Publisher

* 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC

Usually you will need to build two dictionaries, one for fields and one for features, like this:

DictField[Advertiser] -> 0
DictField[Publisher]  -> 1

DictFeature[Advertiser-Nike] -> 0
DictFeature[Publisher-CNN]   -> 1
DictFeature[Advertiser-ESPN] -> 2
DictFeature[Publisher-BBC]   -> 3

Then, you can generate FFM format data:

0 0:0:1 1:1:1
1 0:2:1 1:3:1

Note that because these features are categorical, the values here are all ones.


Command Line Usage
==================

- `ffm-train'

  usage: ffm-train [options] training_set_file [model_file]

  options:
  -l <lambda>: set regularization parameter (default 0.00002)
  -k <factor>: set number of latent factors (default 4)
  -t <iteration>: set number of iterations (default 15)
  -r <eta>: set learning rate (default 0.2)
  -s <nr_threads>: set number of threads (default 1)
  -p <path>: set path to the validation set
  --quiet: quiet mode (no output)
  --no-norm: disable instance-wise normalization
  --auto-stop: stop at the iteration that achieves the best validation loss (must be used with -p)

  By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use
  `--no-norm' to disable this function.
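The instance-wise normalization described above can be sketched as dividing each instance's values by its Euclidean norm. This is an interpretation for illustration only; LIBFFM computes the scaling internally and its exact form may differ:

```ruby
# Sketch of instance-wise normalization: scale the values of one instance
# so the resulting vector has unit 2-norm. (Assumption: this mirrors the
# behavior controlled by --no-norm; it is not LIBFFM's actual code.)
def normalize_instance(values)
  norm = Math.sqrt(values.sum { |v| v * v })
  values.map { |v| v / norm }
end

normalize_instance([3.0, 4.0])
# => [0.6, 0.8]
```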

  A binary file `training_set_file.bin' will be generated to store the data in binary format.

  Because FFM usually needs early stopping for better test performance, we provide an option `--auto-stop' to stop at
  the iteration that achieves the best validation loss. Note that you need to provide a validation set with `-p' when
  you use this option.


- `ffm-predict'

  usage: ffm-predict test_file model_file output_file



Examples
========

Download the toy data from:

zip: https://drive.google.com/open?id=1HZX7zSQJy26hY4_PxSlOWz4x7O-tbQjt

tar.gz: https://drive.google.com/open?id=12-EczjiYGyJRQLH5ARy1MXRFbCvkgfPx

This dataset is a 1% subsample of Criteo's challenge data.

> tar -xzf libffm_toy.tar.gz

or

> unzip libffm_toy.zip


> ./ffm-train -p libffm_toy/criteo.va.r100.gbdt0.ffm libffm_toy/criteo.tr.r100.gbdt0.ffm model

train a model using the default parameters


> ./ffm-predict libffm_toy/criteo.va.r100.gbdt0.ffm model output

do prediction


> ./ffm-train -l 0.0001 -k 15 -t 30 -r 0.05 -s 4 --auto-stop -p libffm_toy/criteo.va.r100.gbdt0.ffm libffm_toy/criteo.tr.r100.gbdt0.ffm model

train a model using the following parameters:

    regularization cost = 0.0001
    latent factors = 15
    iterations = 30
    learning rate = 0.05
    threads = 4
    let it auto-stop


OpenMP and SSE
==============

We use OpenMP to do parallelization. If OpenMP is not available on your
platform, then please comment out the following lines in Makefile.

    DFLAG += -DUSEOMP
    CXXFLAGS += -fopenmp

Note: Please run `make clean all' if these flags are changed.

We use SSE instructions to perform fast computation. If you do not want to use them, comment out the following line:

    DFLAG += -DUSESSE

Then, run `make clean all'



Building Windows Binaries
=========================

The Windows part is maintained by a different maintainer, so it may not always support the latest version.

The latest version it supports is: v1.21

To build the binaries via the command-line tools of Visual C++, use the following steps:

1. Open a DOS command box (or Developer Command Prompt for Visual Studio) and go to the LIBFFM directory. If the environment
variables of VC++ have not been set, type

"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"

You may have to modify the above command according to which version of VC++ you have or where it is installed.

2. Type

nmake -f Makefile.win clean all


FAQ
===

Q: Why do I have the same model size when k = 1 and k = 4?

A: This is because we use SSE instructions. In order to use SSE, the memory needs to be aligned. So even if you assign k =
1, we still fill in some dummy zeros from k = 2 to 4.
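The padding described in this answer amounts to rounding k up to the SSE lane width, which can be sketched as follows (illustrative only; the actual alignment constant lives in ffm.cpp):

```ruby
# Round the number of latent factors up to a multiple of 4, the SSE lane
# width assumed here, so k = 1..4 all occupy the same aligned storage.
def k_aligned(k)
  ((k + 3) / 4) * 4  # integer division rounds up to the next multiple of 4
end

[1, 2, 3, 4, 5].map { |k| k_aligned(k) }
# => [4, 4, 4, 4, 8]
```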

Q: Why is the logloss slightly different on the same data when I run the program two or more times with multi-threading?

A: When there is more than one thread, the program becomes non-deterministic. To make it deterministic, use only one thread.


Contributors
============

Yuchin Juan, Wei-Sheng Chin, and Yong Zhuang

For questions, comments, feature requests, or bug reports, please send your email to:

Yuchin Juan (juan.yuchin@gmail.com)

For Windows-related questions, please send your email to:

Wei-Sheng Chin (image.chin@gmail.com)