libffm 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: a207ec6c4b7eee7b8649dac4d6cd1a5b1213431c8b2de64c20ed080b5311adcb
+   data.tar.gz: 3d4e37041abe6ef1c25509130e8472029fb47fa52ae004b249c6cb430b80485e
+ SHA512:
+   metadata.gz: d4a0fee6252c9e6bd999404be493c8a5037f73b71ab579954de8366f5aead6ee19cfc568f3bee021a29cd42448daa34ce85d77d8203e3ce8d840c07ad90f5790
+   data.tar.gz: 4ec88cd2cd3959d7f5d579ccdf6b2d7861066c5d0f347f6fd1dd8cb297cd34a89076c49b28537f302b6ac594565837057925b494a001cc6b7d1f4d6230077d6d
@@ -0,0 +1,3 @@
+ ## 0.1.0 (2020-11-28)
+
+ - First release
@@ -0,0 +1,32 @@
+
+ Copyright (c) 2017 The LIBFFM Project.
+ Copyright (c) 2020 Andrew Kane.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ 3. Neither name of copyright holders nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,91 @@
+ # LIBFFM
+
+ [LIBFFM](https://github.com/ycjuan/libffm) - field-aware factorization machines - for Ruby
+
+ [![Build Status](https://github.com/ankane/libffm/workflows/build/badge.svg?branch=master)](https://github.com/ankane/libffm/actions)
+
+ ## Installation
+
+ Add this line to your application’s Gemfile:
+
+ ```ruby
+ gem 'libffm'
+ ```
+
+ ## Getting Started
+
+ Prep your data in LIBFFM format
+
+ ```txt
+ 0 0:0:1 1:1:1
+ 1 0:2:1 1:3:1
+ ```
+
+ Create a model
+
+ ```ruby
+ model = Libffm::Model.new
+ model.fit("train.txt")
+ ```
+
+ Make predictions
+
+ ```ruby
+ model.predict("test.txt")
+ ```
+
+ Save the model to a file
+
+ ```ruby
+ model.save_model("model.bin")
+ ```
+
+ Load the model from a file
+
+ ```ruby
+ model.load_model("model.bin")
+ ```
+
+ Pass a validation set
+
+ ```ruby
+ model.fit("train.txt", eval_set: "validation.txt")
+ ```
+
+ ## Parameters
+
+ Pass parameters - default values below
+
+ ```ruby
+ Libffm::Model.new(
+   eta: 0.2,             # learning rate
+   lambda: 0.00002,      # regularization parameter
+   nr_iters: 15,         # number of iterations
+   k: 4,                 # number of latent factors
+   normalization: true,  # use instance-wise normalization
+   auto_stop: false      # stop at the iteration that achieves the best validation loss
+ )
+ ```
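+
+ For example, the parameters can be combined with a validation set and early stopping (a minimal sketch, reusing the `train.txt` and `validation.txt` files from above):
+
+ ```ruby
+ model = Libffm::Model.new(nr_iters: 30, auto_stop: true)
+ model.fit("train.txt", eval_set: "validation.txt")
+ ```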
+
+ ## History
+
+ View the [changelog](https://github.com/ankane/libffm/blob/master/CHANGELOG.md)
+
+ ## Contributing
+
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
+
+ - [Report bugs](https://github.com/ankane/libffm/issues)
+ - Fix bugs and [submit pull requests](https://github.com/ankane/libffm/pulls)
+ - Write, clarify, or fix documentation
+ - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone --recursive https://github.com/ankane/libffm.git
+ cd libffm
+ bundle install
+ bundle exec rake compile
+ bundle exec rake test
+ ```
@@ -0,0 +1,110 @@
+ // stdlib
+ #include <iostream>
+
+ // ffm
+ #include <ffm.h>
+
+ // rice
+ #include <rice/Array.hpp>
+ #include <rice/Class.hpp>
+ #include <rice/Module.hpp>
+ #include <rice/Object.hpp>
+
+ using Rice::Array;
+ using Rice::Class;
+ using Rice::Module;
+ using Rice::Object;
+ using Rice::String;
+ using Rice::define_module;
+ using Rice::define_module_under;
+ using Rice::define_class_under;
+
+ extern "C"
+ void Init_ext()
+ {
+   Module rb_mLibffm = define_module("Libffm");
+
+   Module rb_mExt = define_module_under(rb_mLibffm, "Ext");
+   define_class_under<ffm::ffm_model>(rb_mExt, "Model");
+
+   rb_mExt
+     .define_singleton_method(
+       "release_model",
+       *[](ffm::ffm_model& model) {
+         model.release();
+       })
+     .define_singleton_method(
+       "fit",
+       *[](std::string tr_path, std::string va_path, std::string tmp_prefix, ffm::ffm_float eta, ffm::ffm_float lambda, ffm::ffm_int nr_iters, ffm::ffm_int k, bool normalization, bool auto_stop) {
+         // quiet
+         ffm::cout.setstate(ffm::ios_base::badbit);
+
+         std::string tr_bin_path = tmp_prefix + "train.bin";
+         ffm::ffm_read_problem_to_disk(tr_path, tr_bin_path);
+
+         std::string va_bin_path = "";
+         if (va_path.size() > 0) {
+           va_bin_path = tmp_prefix + "validation.bin";
+           ffm::ffm_read_problem_to_disk(va_path, va_bin_path);
+         }
+
+         ffm::ffm_parameter param;
+         param.eta = eta;
+         param.lambda = lambda;
+         param.nr_iters = nr_iters;
+         param.k = k;
+         param.normalization = normalization;
+         param.auto_stop = auto_stop;
+
+         return ffm::ffm_train_on_disk(tr_bin_path, va_bin_path, param);
+       })
+     .define_singleton_method(
+       "predict",
+       *[](ffm::ffm_model& model, std::string test_path) {
+         int const kMaxLineSize = 1000000;
+
+         FILE *f_in = fopen(test_path.c_str(), "r");
+         char line[kMaxLineSize];
+
+         ffm::vector<ffm::ffm_node> x;
+         ffm::ffm_int i = 0;
+
+         Array ret;
+         for(; fgets(line, kMaxLineSize, f_in) != nullptr; i++) {
+           x.clear();
+           strtok(line, " \t");
+
+           while(true) {
+             char *field_char = strtok(nullptr,":");
+             char *idx_char = strtok(nullptr,":");
+             char *value_char = strtok(nullptr," \t");
+             if(field_char == nullptr || *field_char == '\n')
+               break;
+
+             ffm::ffm_node N;
+             N.f = atoi(field_char);
+             N.j = atoi(idx_char);
+             N.v = atof(value_char);
+
+             x.push_back(N);
+           }
+
+           ffm::ffm_float y_bar = ffm::ffm_predict(x.data(), x.data()+x.size(), model);
+           ret.push(y_bar);
+         }
+
+         fclose(f_in);
+
+         return ret;
+       })
+     .define_singleton_method(
+       "save_model",
+       *[](ffm::ffm_model& model, std::string path) {
+         ffm::ffm_save_model(model, path);
+       })
+     .define_singleton_method(
+       "load_model",
+       *[](std::string path) {
+         return ffm::ffm_load_model(path);
+       });
+ }
@@ -0,0 +1,21 @@
+ require "mkmf-rice"
+
+ $CXXFLAGS += " -std=c++11 -DUSESSE"
+
+ apple_clang = RbConfig::CONFIG["CC_VERSION_MESSAGE"] =~ /apple clang/i
+
+ # check omp first
+ if have_library("omp") || have_library("gomp")
+   $CXXFLAGS += " -DUSEOMP"
+   $CXXFLAGS += " -Xclang" if apple_clang
+   $CXXFLAGS += " -fopenmp"
+ end
+
+ ext = File.expand_path(".", __dir__)
+ libffm = File.expand_path("../../vendor/libffm", __dir__)
+
+ $srcs = Dir["#{ext}/*.cpp", "#{libffm}/{ffm,timer}.cpp"]
+ $INCFLAGS += " -I#{libffm}"
+ $VPATH << libffm
+
+ create_makefile("libffm/ext")
@@ -0,0 +1,13 @@
+ # ext
+ require "libffm/ext"
+
+ # stdlib
+ require "tmpdir"
+
+ # modules
+ require "libffm/model"
+ require "libffm/version"
+
+ module Libffm
+   class Error < StandardError; end
+ end
@@ -0,0 +1,45 @@
+ module Libffm
+   class Model
+     def initialize(eta: 0.2, lambda: 0.00002, nr_iters: 15, k: 4, normalization: true, auto_stop: false)
+       @eta = eta
+       @lambda = lambda
+       @nr_iters = nr_iters
+       @k = k
+       @normalization = normalization
+       @auto_stop = auto_stop
+
+       @model = nil
+     end
+
+     def fit(data, eval_set: nil)
+       Dir.mktmpdir do |dir|
+         @model = Ext.fit(data, eval_set || "", File.join(dir, ""), @eta, @lambda, @nr_iters, @k, @normalization, @auto_stop)
+         add_finalizer(@model)
+       end
+     end
+
+     def predict(data)
+       raise "Not fit" unless @model
+       Ext.predict(@model, data)
+     end
+
+     def save_model(path)
+       Ext.save_model(@model, path)
+     end
+
+     def load_model(path)
+       @model = Ext.load_model(path)
+       add_finalizer(@model)
+     end
+
+     private
+
+     def add_finalizer(model)
+       ObjectSpace.define_finalizer(model, self.class.finalize_model(model))
+     end
+
+     def self.finalize_model(model)
+       # release the native model when the Ruby object is garbage collected
+       proc { Ext.release_model(model) }
+     end
+   end
+ end
@@ -0,0 +1,3 @@
+ module Libffm
+   VERSION = "0.1.0"
+ end
@@ -0,0 +1,31 @@
+
+ Copyright (c) 2017 The LIBFFM Project.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ 3. Neither name of copyright holders nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,26 @@
+ CXX = g++
+ CXXFLAGS = -Wall -O3 -std=c++0x -march=native
+
+ # comment the following flags if you do not want to use SSE instructions
+ DFLAG += -DUSESSE
+
+ # comment the following flags if you do not want to use OpenMP
+ DFLAG += -DUSEOMP
+ CXXFLAGS += -fopenmp
+
+ all: ffm-train ffm-predict
+
+ ffm-train: ffm-train.cpp ffm.o timer.o
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^
+
+ ffm-predict: ffm-predict.cpp ffm.o timer.o
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^
+
+ ffm.o: ffm.cpp ffm.h timer.o
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<
+
+ timer.o: timer.cpp timer.h
+ 	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<
+
+ clean:
+ 	rm -f ffm-train ffm-predict ffm.o timer.o
@@ -0,0 +1,26 @@
+ CXX = cl.exe
+ CFLAGS = /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp
+
+ TARGET = windows
+
+ all: $(TARGET) $(TARGET)\ffm-train.exe $(TARGET)\ffm-predict.exe
+
+ $(TARGET)\ffm-predict.exe: ffm.h ffm-predict.cpp ffm.obj timer.obj
+ 	$(CXX) $(CFLAGS) ffm-predict.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-predict.exe
+
+ $(TARGET)\ffm-train.exe: ffm.h ffm-train.cpp ffm.obj timer.obj
+ 	$(CXX) $(CFLAGS) ffm-train.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-train.exe
+
+ ffm.obj: ffm.cpp ffm.h
+ 	$(CXX) $(CFLAGS) -c ffm.cpp
+
+ timer.obj: timer.cpp timer.h
+ 	$(CXX) $(CFLAGS) -c timer.cpp
+
+ .PHONY: $(TARGET)
+ $(TARGET):
+ 	-mkdir $(TARGET)
+
+ clean:
+ 	-erase /Q *.obj *.exe $(TARGET)\.
+ 	-rd $(TARGET)
@@ -0,0 +1,294 @@
+ Table of Contents
+ =================
+
+ - What is LIBFFM
+ - Overfitting and Early Stopping
+ - Installation
+ - Data Format
+ - Command Line Usage
+ - Examples
+ - OpenMP and SSE
+ - Building Windows Binaries
+ - FAQ
+
+
+ What is LIBFFM
+ ==============
+
+ LIBFFM is a library for field-aware factorization machines (FFM).
+
+ Field-aware factorization machine is an effective model for CTR prediction. It has been used to win top-3 positions
+ in the following competitions:
+
+ * Criteo: https://www.kaggle.com/c/criteo-display-ad-challenge
+
+ * Avazu: https://www.kaggle.com/c/avazu-ctr-prediction
+
+ * Outbrain: https://www.kaggle.com/c/outbrain-click-prediction
+
+ * RecSys 2015: http://dl.acm.org/citation.cfm?id=2813511&dl=ACM&coll=DL&CFID=941880276&CFTOKEN=60022934
+
+ You can find more information about FFM in the following paper / slides:
+
+ * http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf
+
+ * http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf
+
+ * https://arxiv.org/abs/1701.04099
+
+
+ Overfitting and Early Stopping
+ ==============================
+
+ FFM is prone to overfitting, and the solution we have so far is early stopping. See how FFM behaves on a certain data
+ set:
+
+ > ffm-train -p va.ffm -l 0.00002 tr.ffm
+ iter  tr_logloss  va_logloss
+    1     0.49738     0.48776
+    2     0.47383     0.47995
+    3     0.46366     0.47480
+    4     0.45561     0.47231
+    5     0.44810     0.47034
+    6     0.44037     0.47003
+    7     0.43239     0.46952
+    8     0.42362     0.46999
+    9     0.41394     0.47088
+   10     0.40326     0.47228
+   11     0.39156     0.47435
+   12     0.37886     0.47683
+   13     0.36522     0.47975
+   14     0.35079     0.48321
+   15     0.33578     0.48703
+
+
+ We see the best validation loss is achieved at the 7th iteration. If we keep training, overfitting begins. It is worth
+ noting that increasing the regularization parameter does not help:
+
+ > ffm-train -p va.ffm -l 0.0002 -t 50 -s 12 tr.ffm
+ iter  tr_logloss  va_logloss
+    1     0.50532     0.49905
+    2     0.48782     0.49242
+    3     0.48136     0.48748
+  ...
+   29     0.42183     0.47014
+  ...
+   48     0.37071     0.47333
+   49     0.36767     0.47374
+   50     0.36472     0.47404
+
+
+ To avoid overfitting, we recommend always providing a validation set with the option `-p.' You can use the option
+ `--auto-stop' to stop at the iteration that reaches the best validation loss:
+
+ > ffm-train -p va.ffm -l 0.00002 --auto-stop tr.ffm
+ iter  tr_logloss  va_logloss
+    1     0.49738     0.48776
+    2     0.47383     0.47995
+    3     0.46366     0.47480
+    4     0.45561     0.47231
+    5     0.44810     0.47034
+    6     0.44037     0.47003
+    7     0.43239     0.46952
+    8     0.42362     0.46999
+ Auto-stop. Use model at 7th iteration.
+
+
+ Installation
+ ============
+
+ Requirement: It requires a C++11 compatible compiler. We also use OpenMP to provide multi-threading. If OpenMP is not
+ available on your platform, please refer to the section `OpenMP and SSE.'
+
+ - Unix-like systems:
+   Type `make' in the command line.
+
+ - Windows:
+   See `Building Windows Binaries' to compile.
+
+
+
+ Data Format
+ ===========
+
+ The data format of LIBFFM is:
+
+ <label> <field1>:<feature1>:<value1> <field2>:<feature2>:<value2> ...
+ .
+ .
+ .
+
+ `field' and `feature' should be non-negative integers. See an example `bigdata.tr.txt.'
+
+ It is important to understand the difference between `field' and `feature'. For example, if we have raw data like this:
+
+ Click  Advertiser  Publisher
+ =====  ==========  =========
+ 0      Nike        CNN
+ 1      ESPN        BBC
+
+ Here, we have
+
+ * 2 fields: Advertiser and Publisher
+
+ * 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC
+
+ Usually you will need to build two dictionaries, one for fields and one for features, like this:
+
+ DictField[Advertiser] -> 0
+ DictField[Publisher]  -> 1
+
+ DictFeature[Advertiser-Nike] -> 0
+ DictFeature[Publisher-CNN]   -> 1
+ DictFeature[Advertiser-ESPN] -> 2
+ DictFeature[Publisher-BBC]   -> 3
+
+ Then, you can generate FFM format data:
+
+ 0 0:0:1 1:1:1
+ 1 0:2:1 1:3:1
+
+ Note that because these features are categorical, the values here are all ones.
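+
+ For illustration only (this sketch is not part of LIBFFM, and the row data is invented for the example above), the two
+ dictionaries can be built on the fly in Ruby and the two lines generated like this:
+
+     rows = [
+       {click: 0, "Advertiser" => "Nike", "Publisher" => "CNN"},
+       {click: 1, "Advertiser" => "ESPN", "Publisher" => "BBC"}
+     ]
+
+     # ids are assigned in order of first appearance
+     fields = Hash.new { |h, k| h[k] = h.size }
+     features = Hash.new { |h, k| h[k] = h.size }
+
+     rows.each do |row|
+       nodes = row.reject { |k, _| k == :click }.map do |field, value|
+         "#{fields[field]}:#{features["#{field}-#{value}"]}:1"
+       end
+       puts "#{row[:click]} #{nodes.join(" ")}"
+     end
+
+     # prints:
+     #   0 0:0:1 1:1:1
+     #   1 0:2:1 1:3:1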
+
+
+ Command Line Usage
+ ==================
+
+ - `ffm-train'
+
+   usage: ffm-train [options] training_set_file [model_file]
+
+   options:
+   -l <lambda>: set regularization parameter (default 0.00002)
+   -k <factor>: set number of latent factors (default 4)
+   -t <iteration>: set number of iterations (default 15)
+   -r <eta>: set learning rate (default 0.2)
+   -s <nr_threads>: set number of threads (default 1)
+   -p <path>: set path to the validation set
+   --quiet: quiet mode (no output)
+   --no-norm: disable instance-wise normalization
+   --auto-stop: stop at the iteration that achieves the best validation loss (must be used with -p)
+
+   By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use
+   `--no-norm' to disable this function.
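+
+   For example (an illustrative instance, not from the original documentation): an instance with values 0.3 and 0.4 has
+   2-norm sqrt(0.3^2 + 0.4^2) = 0.5, so it is treated as if each value were divided by 0.5, giving 0.6 and 0.8, whose
+   2-norm is 1.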
+
+   A binary file `training_set_file.bin' will be generated to store the data in binary format.
+
+   Because FFM usually needs early stopping for better test performance, we provide the option `--auto-stop' to stop at
+   the iteration that achieves the best validation loss. Note that you need to provide a validation set with `-p' when
+   you use this option.
+
+
+ - `ffm-predict'
+
+   usage: ffm-predict test_file model_file output_file
+
+
+
+ Examples
+ ========
+
+ Download the toy data from:
+
+ zip: https://drive.google.com/open?id=1HZX7zSQJy26hY4_PxSlOWz4x7O-tbQjt
+
+ tar.gz: https://drive.google.com/open?id=12-EczjiYGyJRQLH5ARy1MXRFbCvkgfPx
+
+ This dataset is a 1% subsample from Criteo's challenge.
+
+ > tar -xzf libffm_toy.tar.gz
+
+ or
+
+ > unzip libffm_toy.zip
+
+
+ > ./ffm-train -p libffm_toy/criteo.va.r100.gbdt0.ffm libffm_toy/criteo.tr.r100.gbdt0.ffm model
+
+ train a model using the default parameters
+
+
+ > ./ffm-predict libffm_toy/criteo.va.r100.gbdt0.ffm model output
+
+ do prediction
+
+
+ > ./ffm-train -l 0.0001 -k 15 -t 30 -r 0.05 -s 4 --auto-stop -p libffm_toy/criteo.va.r100.gbdt0.ffm libffm_toy/criteo.tr.r100.gbdt0.ffm model
+
+ train a model using the following parameters:
+
+   regularization cost = 0.0001
+   latent factors = 15
+   iterations = 30
+   learning rate = 0.05
+   threads = 4
+   let it auto-stop
+
+
+ OpenMP and SSE
+ ==============
+
+ We use OpenMP to do parallelization. If OpenMP is not available on your
+ platform, then please comment out the following lines in Makefile.
+
+   DFLAG += -DUSEOMP
+   CXXFLAGS += -fopenmp
+
+ Note: Please run `make clean all' if these flags are changed.
+
+ We use SSE instructions to perform fast computation. If you do not want to use them, comment out the following line:
+
+   DFLAG += -DUSESSE
+
+ Then, run `make clean all'.
+
+
+
+ Building Windows Binaries
+ =========================
+
+ The Windows part is maintained by a different maintainer, so it may not always support the latest version.
+
+ The latest version it supports is: v1.21
+
+ To build the binaries via the command-line tools of Visual C++, use the following steps:
+
+ 1. Open a DOS command box (or Developer Command Prompt for Visual Studio) and go to the LIBFFM directory. If the
+    environment variables of VC++ have not been set, type
+
+    "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"
+
+    You may have to modify the above command according to which version of VC++ you have and where it is installed.
+
+ 2. Type
+
+    nmake -f Makefile.win clean all
+
+
+ FAQ
+ ===
+
+ Q: Why do I have the same model size when k = 1 and k = 4?
+
+ A: This is because we use SSE instructions. In order to use SSE, the memory needs to be aligned. So even if you assign
+ k = 1, we still fill in dummy zeros from k = 2 to 4.
+
+
+ Q: Why is the logloss slightly different on the same data when I run the program two or more times with multi-threading?
+
+ A: When there is more than one thread, the program becomes non-deterministic. To make it deterministic, use only one thread.
+
+
+ Contributors
+ ============
+
+ Yuchin Juan, Wei-Sheng Chin, and Yong Zhuang
+
+ For questions, comments, feature requests, or bug reports, please send your email to:
+
+ Yuchin Juan (juan.yuchin@gmail.com)
+
+ For Windows related questions, please send your email to:
+
+ Wei-Sheng Chin (image.chin@gmail.com)