libffm 0.1.0

@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: a207ec6c4b7eee7b8649dac4d6cd1a5b1213431c8b2de64c20ed080b5311adcb
+   data.tar.gz: 3d4e37041abe6ef1c25509130e8472029fb47fa52ae004b249c6cb430b80485e
+ SHA512:
+   metadata.gz: d4a0fee6252c9e6bd999404be493c8a5037f73b71ab579954de8366f5aead6ee19cfc568f3bee021a29cd42448daa34ce85d77d8203e3ce8d840c07ad90f5790
+   data.tar.gz: 4ec88cd2cd3959d7f5d579ccdf6b2d7861066c5d0f347f6fd1dd8cb297cd34a89076c49b28537f302b6ac594565837057925b494a001cc6b7d1f4d6230077d6d
@@ -0,0 +1,3 @@
+ ## 0.1.0 (2020-11-28)
+
+ - First release
@@ -0,0 +1,32 @@
+
+ Copyright (c) 2017 The LIBFFM Project.
+ Copyright (c) 2020 Andrew Kane.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ 3. Neither name of copyright holders nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,91 @@
+ # LIBFFM
+
+ [LIBFFM](https://github.com/ycjuan/libffm) - field-aware factorization machines - for Ruby
+
+ [![Build Status](https://github.com/ankane/libffm/workflows/build/badge.svg?branch=master)](https://github.com/ankane/libffm/actions)
+
+ ## Installation
+
+ Add this line to your application’s Gemfile:
+
+ ```ruby
+ gem 'libffm'
+ ```
+
+ ## Getting Started
+
+ Prep your data in LIBFFM format (each entry is `field:feature:value`)
+
+ ```txt
+ 0 0:0:1 1:1:1
+ 1 0:2:1 1:3:1
+ ```
+
+ Create a model
+
+ ```ruby
+ model = Libffm::Model.new
+ model.fit("train.txt")
+ ```
+
+ Make predictions
+
+ ```ruby
+ model.predict("test.txt")
+ ```
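+
+ Predictions come back as an array with one score per line of the test file (a minimal sketch, assuming the `test.txt` above):
+
+ ```ruby
+ scores = model.predict("test.txt")
+ scores.each_with_index { |score, i| puts "row #{i}: #{score}" }
+ ```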
+
+ Save the model to a file
+
+ ```ruby
+ model.save_model("model.bin")
+ ```
+
+ Load the model from a file
+
+ ```ruby
+ model.load_model("model.bin")
+ ```
+
+ Pass a validation set
+
+ ```ruby
+ model.fit("train.txt", eval_set: "validation.txt")
+ ```
+
+ ## Parameters
+
+ Pass parameters - default values below
+
+ ```ruby
+ Libffm::Model.new(
+   eta: 0.2,            # learning rate
+   lambda: 0.00002,     # regularization parameter
+   nr_iters: 15,        # number of iterations
+   k: 4,                # number of latent factors
+   normalization: true, # use instance-wise normalization
+   auto_stop: false     # stop at the iteration that achieves the best validation loss
+ )
+ ```
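+
+ Since early stopping needs a validation set to measure against, `auto_stop` is meant to be combined with `eval_set` (a sketch using the files from above):
+
+ ```ruby
+ model = Libffm::Model.new(nr_iters: 30, auto_stop: true)
+ model.fit("train.txt", eval_set: "validation.txt")
+ ```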
+
+ ## History
+
+ View the [changelog](https://github.com/ankane/libffm/blob/master/CHANGELOG.md)
+
+ ## Contributing
+
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
+
+ - [Report bugs](https://github.com/ankane/libffm/issues)
+ - Fix bugs and [submit pull requests](https://github.com/ankane/libffm/pulls)
+ - Write, clarify, or fix documentation
+ - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone --recursive https://github.com/ankane/libffm.git
+ cd libffm
+ bundle install
+ bundle exec rake compile
+ bundle exec rake test
+ ```
@@ -0,0 +1,110 @@
+ // stdlib
+ #include <iostream>
+
+ // ffm
+ #include <ffm.h>
+
+ // rice
+ #include <rice/Array.hpp>
+ #include <rice/Class.hpp>
+ #include <rice/Module.hpp>
+ #include <rice/Object.hpp>
+
+ using Rice::Array;
+ using Rice::Class;
+ using Rice::Module;
+ using Rice::Object;
+ using Rice::String;
+ using Rice::define_module;
+ using Rice::define_module_under;
+ using Rice::define_class_under;
+
+ extern "C"
+ void Init_ext()
+ {
+   Module rb_mLibffm = define_module("Libffm");
+
+   Module rb_mExt = define_module_under(rb_mLibffm, "Ext");
+   define_class_under<ffm::ffm_model>(rb_mExt, "Model");
+
+   rb_mExt
+     .define_singleton_method(
+       "release_model",
+       *[](ffm::ffm_model& model) {
+         model.release();
+       })
+     .define_singleton_method(
+       "fit",
+       *[](std::string tr_path, std::string va_path, std::string tmp_prefix, ffm::ffm_float eta, ffm::ffm_float lambda, ffm::ffm_int nr_iters, ffm::ffm_int k, bool normalization, bool auto_stop) {
+         // quiet
+         ffm::cout.setstate(ffm::ios_base::badbit);
+
+         // convert the text data to LIBFFM's on-disk binary format
+         std::string tr_bin_path = tmp_prefix + "train.bin";
+         ffm::ffm_read_problem_to_disk(tr_path, tr_bin_path);
+
+         std::string va_bin_path = "";
+         if (va_path.size() > 0) {
+           va_bin_path = tmp_prefix + "validation.bin";
+           ffm::ffm_read_problem_to_disk(va_path, va_bin_path);
+         }
+
+         ffm::ffm_parameter param;
+         param.eta = eta;
+         param.lambda = lambda;
+         param.nr_iters = nr_iters;
+         param.k = k;
+         param.normalization = normalization;
+         param.auto_stop = auto_stop;
+
+         return ffm::ffm_train_on_disk(tr_bin_path, va_bin_path, param);
+       })
+     .define_singleton_method(
+       "predict",
+       *[](ffm::ffm_model& model, std::string test_path) {
+         int const kMaxLineSize = 1000000;
+
+         FILE *f_in = fopen(test_path.c_str(), "r");
+         char line[kMaxLineSize];
+
+         ffm::vector<ffm::ffm_node> x;
+         ffm::ffm_int i = 0;
+
+         Array ret;
+         for(; fgets(line, kMaxLineSize, f_in) != nullptr; i++) {
+           x.clear();
+           strtok(line, " \t"); // skip the label token
+
+           // parse the remaining field:feature:value tokens
+           while(true) {
+             char *field_char = strtok(nullptr,":");
+             char *idx_char = strtok(nullptr,":");
+             char *value_char = strtok(nullptr," \t");
+             if(field_char == nullptr || *field_char == '\n')
+               break;
+
+             ffm::ffm_node N;
+             N.f = atoi(field_char);
+             N.j = atoi(idx_char);
+             N.v = atof(value_char);
+
+             x.push_back(N);
+           }
+
+           ffm::ffm_float y_bar = ffm::ffm_predict(x.data(), x.data()+x.size(), model);
+           ret.push(y_bar);
+         }
+
+         fclose(f_in);
+
+         return ret;
+       })
+     .define_singleton_method(
+       "save_model",
+       *[](ffm::ffm_model& model, std::string path) {
+         ffm::ffm_save_model(model, path);
+       })
+     .define_singleton_method(
+       "load_model",
+       *[](std::string path) {
+         return ffm::ffm_load_model(path);
+       });
+ }
@@ -0,0 +1,21 @@
+ require "mkmf-rice"
+
+ $CXXFLAGS += " -std=c++11 -DUSESSE"
+
+ apple_clang = RbConfig::CONFIG["CC_VERSION_MESSAGE"] =~ /apple clang/i
+
+ # check omp first
+ if have_library("omp") || have_library("gomp")
+   $CXXFLAGS += " -DUSEOMP"
+   $CXXFLAGS += " -Xclang" if apple_clang
+   $CXXFLAGS += " -fopenmp"
+ end
+
+ ext = File.expand_path(".", __dir__)
+ libffm = File.expand_path("../../vendor/libffm", __dir__)
+
+ $srcs = Dir["#{ext}/*.cpp", "#{libffm}/{ffm,timer}.cpp"]
+ $INCFLAGS += " -I#{libffm}"
+ $VPATH << libffm
+
+ create_makefile("libffm/ext")
@@ -0,0 +1,13 @@
+ # ext
+ require "libffm/ext"
+
+ # stdlib
+ require "tmpdir"
+
+ # modules
+ require "libffm/model"
+ require "libffm/version"
+
+ module Libffm
+   class Error < StandardError; end
+ end
@@ -0,0 +1,45 @@
+ module Libffm
+   class Model
+     def initialize(eta: 0.2, lambda: 0.00002, nr_iters: 15, k: 4, normalization: true, auto_stop: false)
+       @eta = eta
+       @lambda = lambda
+       @nr_iters = nr_iters
+       @k = k
+       @normalization = normalization
+       @auto_stop = auto_stop
+
+       @model = nil
+     end
+
+     def fit(data, eval_set: nil)
+       Dir.mktmpdir do |dir|
+         @model = Ext.fit(data, eval_set || "", File.join(dir, ""), @eta, @lambda, @nr_iters, @k, @normalization, @auto_stop)
+         add_finalizer(@model)
+       end
+     end
+
+     def predict(data)
+       raise "Not fit" unless @model
+       Ext.predict(@model, data)
+     end
+
+     def save_model(path)
+       Ext.save_model(@model, path)
+     end
+
+     def load_model(path)
+       @model = Ext.load_model(path)
+       add_finalizer(@model)
+     end
+
+     private
+
+     def add_finalizer(model)
+       ObjectSpace.define_finalizer(model, self.class.finalize_model(model))
+     end
+
+     def self.finalize_model(model)
+       proc { Ext.release_model(model) }
+     end
+   end
+ end
@@ -0,0 +1,3 @@
+ module Libffm
+   VERSION = "0.1.0"
+ end
@@ -0,0 +1,31 @@
+
+ Copyright (c) 2017 The LIBFFM Project.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ 3. Neither name of copyright holders nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,26 @@
+ CXX = g++
+ CXXFLAGS = -Wall -O3 -std=c++0x -march=native
+
+ # comment out the following flag if you do not want to use SSE instructions
+ DFLAG += -DUSESSE
+
+ # comment out the following flags if you do not want to use OpenMP
+ DFLAG += -DUSEOMP
+ CXXFLAGS += -fopenmp
+
+ all: ffm-train ffm-predict
+
+ ffm-train: ffm-train.cpp ffm.o timer.o
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^
+
+ ffm-predict: ffm-predict.cpp ffm.o timer.o
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^
+
+ ffm.o: ffm.cpp ffm.h timer.o
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<
+
+ timer.o: timer.cpp timer.h
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<
+
+ clean:
+ 	rm -f ffm-train ffm-predict ffm.o timer.o
@@ -0,0 +1,26 @@
+ CXX = cl.exe
+ CFLAGS = /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp
+
+ TARGET = windows
+
+ all: $(TARGET) $(TARGET)\ffm-train.exe $(TARGET)\ffm-predict.exe
+
+ $(TARGET)\ffm-predict.exe: ffm.h ffm-predict.cpp ffm.obj timer.obj
+ 	$(CXX) $(CFLAGS) ffm-predict.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-predict.exe
+
+ $(TARGET)\ffm-train.exe: ffm.h ffm-train.cpp ffm.obj timer.obj
+ 	$(CXX) $(CFLAGS) ffm-train.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-train.exe
+
+ ffm.obj: ffm.cpp ffm.h
+ 	$(CXX) $(CFLAGS) -c ffm.cpp
+
+ timer.obj: timer.cpp timer.h
+ 	$(CXX) $(CFLAGS) -c timer.cpp
+
+ .PHONY: $(TARGET)
+ $(TARGET):
+ 	-mkdir $(TARGET)
+
+ clean:
+ 	-erase /Q *.obj *.exe $(TARGET)\.
+ 	-rd $(TARGET)
@@ -0,0 +1,294 @@
+ Table of Contents
+ =================
+
+ - What is LIBFFM
+ - Overfitting and Early Stopping
+ - Installation
+ - Data Format
+ - Command Line Usage
+ - Examples
+ - OpenMP and SSE
+ - Building Windows Binaries
+ - FAQ
+
+
+ What is LIBFFM
+ ==============
+
+ LIBFFM is a library for field-aware factorization machines (FFM).
+
+ The field-aware factorization machine is an effective model for CTR prediction. It has been used to win top-3 positions
+ in the following competitions:
+
+ * Criteo: https://www.kaggle.com/c/criteo-display-ad-challenge
+
+ * Avazu: https://www.kaggle.com/c/avazu-ctr-prediction
+
+ * Outbrain: https://www.kaggle.com/c/outbrain-click-prediction
+
+ * RecSys 2015: http://dl.acm.org/citation.cfm?id=2813511&dl=ACM&coll=DL&CFID=941880276&CFTOKEN=60022934
+
+ You can find more information about FFM in the following paper and slides:
+
+ * http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf
+
+ * http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf
+
+ * https://arxiv.org/abs/1701.04099
+
+
+ Overfitting and Early Stopping
+ ==============================
+
+ FFM is prone to overfitting, and the solution we have so far is early stopping. See how FFM behaves on a certain data
+ set:
+
+ > ffm-train -p va.ffm -l 0.00002 tr.ffm
+ iter   tr_logloss   va_logloss
+    1      0.49738      0.48776
+    2      0.47383      0.47995
+    3      0.46366      0.47480
+    4      0.45561      0.47231
+    5      0.44810      0.47034
+    6      0.44037      0.47003
+    7      0.43239      0.46952
+    8      0.42362      0.46999
+    9      0.41394      0.47088
+   10      0.40326      0.47228
+   11      0.39156      0.47435
+   12      0.37886      0.47683
+   13      0.36522      0.47975
+   14      0.35079      0.48321
+   15      0.33578      0.48703
+
+ We see that the best validation loss is achieved at the 7th iteration. If we keep training, overfitting begins. It is
+ worth noting that increasing the regularization parameter does not help:
+
+ > ffm-train -p va.ffm -l 0.0002 -t 50 -s 12 tr.ffm
+ iter   tr_logloss   va_logloss
+    1      0.50532      0.49905
+    2      0.48782      0.49242
+    3      0.48136      0.48748
+  ...
+   29      0.42183      0.47014
+  ...
+   48      0.37071      0.47333
+   49      0.36767      0.47374
+   50      0.36472      0.47404
+
+ To avoid overfitting, we recommend always providing a validation set with the option `-p.' You can use the option
+ `--auto-stop' to stop at the iteration that reaches the best validation loss:
+
+ > ffm-train -p va.ffm -l 0.00002 --auto-stop tr.ffm
+ iter   tr_logloss   va_logloss
+    1      0.49738      0.48776
+    2      0.47383      0.47995
+    3      0.46366      0.47480
+    4      0.45561      0.47231
+    5      0.44810      0.47034
+    6      0.44037      0.47003
+    7      0.43239      0.46952
+    8      0.42362      0.46999
+ Auto-stop. Use model at 7th iteration.
+
+
+ Installation
+ ============
+
+ Requirement: a C++11 compatible compiler. We also use OpenMP to provide multi-threading. If OpenMP is not
+ available on your platform, please refer to the section `OpenMP and SSE.'
+
+ - Unix-like systems:
+   Type `make' on the command line.
+
+ - Windows:
+   See `Building Windows Binaries' to compile.
+
+
+
+ Data Format
+ ===========
+
+ The data format of LIBFFM is:
+
+ <label> <field1>:<feature1>:<value1> <field2>:<feature2>:<value2> ...
+ .
+ .
+ .
+
+ `field' and `feature' should be non-negative integers. See the example `bigdata.tr.txt.'
+
+ It is important to understand the difference between `field' and `feature'. For example, if we have raw data like this:
+
+ Click  Advertiser  Publisher
+ =====  ==========  =========
+ 0      Nike        CNN
+ 1      ESPN        BBC
+
+ Here, we have
+
+ * 2 fields: Advertiser and Publisher
+
+ * 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC
+
+ Usually you will need to build two dictionaries, one for fields and one for features, like this:
+
+ DictField[Advertiser] -> 0
+ DictField[Publisher]  -> 1
+
+ DictFeature[Advertiser-Nike] -> 0
+ DictFeature[Publisher-CNN]   -> 1
+ DictFeature[Advertiser-ESPN] -> 2
+ DictFeature[Publisher-BBC]   -> 3
+
+ Then, you can generate FFM format data:
+
+ 0 0:0:1 1:1:1
+ 1 0:2:1 1:3:1
+
+ Note that because these features are categorical, the values here are all ones.
+
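+ A minimal Ruby sketch of this encoding (the dictionaries and rows are the ones above; the helper code itself is
+ illustrative, not part of LIBFFM):
+
+     fields = Hash.new { |h, k| h[k] = h.size }     # DictField
+     features = Hash.new { |h, k| h[k] = h.size }   # DictFeature
+
+     rows = [
+       ["0", {"Advertiser" => "Nike", "Publisher" => "CNN"}],
+       ["1", {"Advertiser" => "ESPN", "Publisher" => "BBC"}]
+     ]
+
+     rows.each do |label, row|
+       tokens = row.map { |field, value| "#{fields[field]}:#{features["#{field}-#{value}"]}:1" }
+       puts(([label] + tokens).join(" "))
+     end
+     # 0 0:0:1 1:1:1
+     # 1 0:2:1 1:3:1
+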
+
+ Command Line Usage
+ ==================
+
+ - `ffm-train'
+
+   usage: ffm-train [options] training_set_file [model_file]
+
+   options:
+   -l <lambda>: set regularization parameter (default 0.00002)
+   -k <factor>: set number of latent factors (default 4)
+   -t <iteration>: set number of iterations (default 15)
+   -r <eta>: set learning rate (default 0.2)
+   -s <nr_threads>: set number of threads (default 1)
+   -p <path>: set path to the validation set
+   --quiet: quiet mode (no output)
+   --no-norm: disable instance-wise normalization
+   --auto-stop: stop at the iteration that achieves the best validation loss (must be used with -p)
+
+ By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use
+ `--no-norm' to disable this function.
+
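+ For example, an instance with values (1, 1, 1) has 2-norm sqrt(3), so each value is effectively scaled by 1/sqrt(3).
+ A sketch of the arithmetic in Ruby (illustrative only, not LIBFFM code):
+
+     values = [1.0, 1.0, 1.0]
+     scale = 1.0 / Math.sqrt(values.sum { |v| v * v })  # 1/sqrt(3)
+     normalized = values.map { |v| v * scale }          # 2-norm is now 1
+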
+ A binary file `training_set_file.bin' will be generated to store the data in binary format.
+
+ Because FFM usually needs early stopping for better test performance, we provide the option `--auto-stop' to stop at
+ the iteration that achieves the best validation loss. Note that you need to provide a validation set with `-p' when
+ you use this option.
+
+
+ - `ffm-predict'
+
+   usage: ffm-predict test_file model_file output_file
+
+
+
+ Examples
+ ========
+
+ Download the toy data from:
+
+ zip: https://drive.google.com/open?id=1HZX7zSQJy26hY4_PxSlOWz4x7O-tbQjt
+
+ tar.gz: https://drive.google.com/open?id=12-EczjiYGyJRQLH5ARy1MXRFbCvkgfPx
+
+ This dataset is a 1% subsample from the Criteo challenge.
+
+ > tar -xzf libffm_toy.tar.gz
+
+ or
+
+ > unzip libffm_toy.zip
+
+
+ > ./ffm-train -p libffm_toy/criteo.va.r100.gbdt0.ffm libffm_toy/criteo.tr.r100.gbdt0.ffm model
+
+ train a model using the default parameters
+
+
+ > ./ffm-predict libffm_toy/criteo.va.r100.gbdt0.ffm model output
+
+ do prediction
+
+
+ > ./ffm-train -l 0.0001 -k 15 -t 30 -r 0.05 -s 4 --auto-stop -p libffm_toy/criteo.va.r100.gbdt0.ffm libffm_toy/criteo.tr.r100.gbdt0.ffm model
+
+ train a model using the following parameters:
+
+ regularization cost = 0.0001
+ latent factors = 15
+ iterations = 30
+ learning rate = 0.05
+ threads = 4
+ let it auto-stop
+
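+ For reference, the same run expressed through the Ruby gem that vendors this library (a sketch; note the gem wrapper
+ does not expose the `-s' threads option):
+
+     model = Libffm::Model.new(lambda: 0.0001, k: 15, nr_iters: 30, eta: 0.05, auto_stop: true)
+     model.fit("libffm_toy/criteo.tr.r100.gbdt0.ffm",
+               eval_set: "libffm_toy/criteo.va.r100.gbdt0.ffm")
+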
+
+ OpenMP and SSE
+ ==============
+
+ We use OpenMP to do parallelization. If OpenMP is not available on your
+ platform, then please comment out the following lines in the Makefile:
+
+ DFLAG += -DUSEOMP
+ CXXFLAGS += -fopenmp
+
+ Note: Please run `make clean all' if these flags are changed.
+
+ We use SSE instructions to perform fast computation. If you do not want to use them, comment out the following line:
+
+ DFLAG += -DUSESSE
+
+ Then, run `make clean all'
+
+
+
+ Building Windows Binaries
+ =========================
+
+ The Windows part is maintained by a different maintainer, so it may not always support the latest version.
+
+ The latest version it supports is: v1.21
+
+ To build the binaries via the command-line tools of Visual C++, use the following steps:
+
+ 1. Open a DOS command box (or Developer Command Prompt for Visual Studio) and go to the LIBFFM directory. If the
+ environment variables of VC++ have not been set, type
+
+ "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"
+
+ You may have to modify the above command according to which version of VC++ you have and
+ where it is installed.
+
+ 2. Type
+
+ nmake -f Makefile.win clean all
+
+
+ FAQ
+ ===
+
+ Q: Why do I have the same model size when k = 1 and k = 4?
+
+ A: This is because we use SSE instructions. In order to use SSE, the memory needs to be aligned. So even if you assign
+ k = 1, we still fill dummy zeros from k = 2 to 4.
+
+
+ Q: Why is the logloss slightly different on the same data when I run the program two or more times with multi-threading?
+
+ A: When there is more than one thread, the program becomes non-deterministic. To make it deterministic, use only one
+ thread.
+
+
+ Contributors
+ ============
+
+ Yuchin Juan, Wei-Sheng Chin, and Yong Zhuang
+
+ For questions, comments, feature requests, or bug reports, please send your email to:
+
+ Yuchin Juan (juan.yuchin@gmail.com)
+
+ For Windows related questions, please send your email to:
+
+ Wei-Sheng Chin (image.chin@gmail.com)