xlearn 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: cbc492d4f4cb0de9c53cac0820251fc1f747de836348280dfa9d1b7e6475f745
-   data.tar.gz: d5f1fcbbb10b96714c38fd9c0c924c98fb01dccf1872734e1084a31ad855503c
+   metadata.gz: 32722c5c1623a0f680dba1ca05e37105247a0163600da6ae2a5739173aa94066
+   data.tar.gz: 87938813a982897dcedcabd351292618d15faad53b69ae33f749b8dea7cf6bff
  SHA512:
-   metadata.gz: '048e5915264ba2749e00a91b4c59a773efcd553cf7c764e56df0107e6dd08edcb7bcb86350181b217d095351ea79f47ba6daa84407069fcd9c445683ecdc22a8'
-   data.tar.gz: 46831787724f8ec1d4063859445a0a9e6aefd2a38b4267032eb11ffcadf145ebbd37ab3babcf56599cf31f3f4143126b1663baf409099926860ce88c591adb75
+   metadata.gz: fdc680d913d7ca6100da8102d1b1fc173c9e354e99935eb0eff0e04a23f7cc1e63f5caf547a2c0afc503fb4521e519691cb866d2fabde504f1fa75e0de11238c
+   data.tar.gz: 85cd2ad8a5b6f984a5af7024481fc58a7602e47e16b91f013bb6297f335d8bf61273e266346056cb09f5310381e2ae72919c6ce27b1d7b702d5a29bfc53a5197
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ## 0.1.1
+
+ - Added `cv` method
+ - Added `partial_fit` method
+ - Added `save_txt` method
+ - Added `bias_term`, `linear_term`, and `latent_factors` methods
+ - Added support for Daru and Numo
+
  ## 0.1.0

  - First release
data/README.md CHANGED
@@ -10,6 +10,8 @@ Supports:
  - Factorization machines
  - Field-aware factorization machines

+ [![Build Status](https://travis-ci.org/ankane/xlearn.svg?branch=master)](https://travis-ci.org/ankane/xlearn)
+
  ## Installation

  First, [install xLearn](https://xlearn-doc.readthedocs.io/en/latest/install/index.html). On Mac, copy `build/lib/libxlearn_api.dylib` to `/usr/local/lib`.
@@ -22,8 +24,6 @@ gem 'xlearn'

  ## Getting Started

- This library is modeled after the [Python Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html). Some methods are missing at the moment. PRs welcome!
-
  Prep your data

  ```ruby
@@ -58,41 +58,136 @@ Load the model from a file
  model.load_model("model.bin")
  ```

+ Save a text version of the model
+
+ ```ruby
+ model.save_txt("model.txt")
+ ```
+
+ Pass a validation set
+
+ ```ruby
+ model.fit(x_train, y_train, eval_set: [x_val, y_val])
+ ```
+
+ Train online
+
+ ```ruby
+ model.partial_fit(x_train, y_train)
+ ```
+
+ Get the bias term, linear term, and latent factors
+
+ ```ruby
+ model.bias_term
+ model.linear_term
+ model.latent_factors # fm and ffm only
+ ```
+
  ## Parameters

  Specify parameters

  ```ruby
- model = XLearn::FM.new(k: 20, epoch: 50)
+ model = XLearn::Linear.new(k: 20, epoch: 50)
  ```

  Supports the same parameters as [Python](https://xlearn-doc.readthedocs.io/en/latest/all_api/index.html)

- ## Validation
+ ## Cross-Validation

- Pass a validation set when fitting
+ Cross-validation

  ```ruby
- model.fit(x_train, y_train, eval_set: [x_val, y_val])
+ model.cv(x, y)
+ ```
+
+ Specify the number of folds
+
+ ```ruby
+ model.cv(x, y, folds: 5)
+ ```
+
+ ## Data
+
+ Data can be an array of arrays
+
+ ```ruby
+ [[1, 2, 3], [4, 5, 6]]
+ ```
+
+ Or a Daru data frame
+
+ ```ruby
+ Daru::DataFrame.from_csv("houses.csv")
+ ```
+
+ Or a Numo NArray
+
+ ```ruby
+ Numo::DFloat.new(3, 2).seq
  ```

  ## Performance

- For performance, you can read data directly from files
+ For large datasets, read data directly from files

  ```ruby
  model.fit("train.txt", eval_set: "validate.txt")
  model.predict("test.txt")
+ model.cv("train.txt")
+ ```
+
+ For linear models and factorization machines, use CSV:
+
+ ```txt
+ label,value_1,value_2,...,value_n
+ ```
+
+ Or the `libsvm` format (better for sparse data):
+
+ ```txt
+ label index_1:value_1 index_2:value_2 ... index_n:value_n
+ ```
+
+ > You can also use commas instead of spaces for separators
+
+ For field-aware factorization machines, use the `libffm` format:
+
+ ```txt
+ label field_1:index_1:value_1 field_2:index_2:value_2 ...
  ```

- [These formats](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html#choose-machine-learning-algorithm) are supported
+ > You can also use commas instead of spaces for separators

  You can also write predictions directly to a file

  ```ruby
- model.predict("test.txt", out_file: "predictions.txt")
+ model.predict("test.txt", out_path: "predictions.txt")
+ ```
+
+ ## xLearn Installation
+
+ There’s an experimental branch that includes xLearn with the gem for easiest installation.
+
+ ```ruby
+ gem 'xlearn', github: 'ankane/xlearn', branch: 'vendor', submodules: true
  ```

+ Please file an issue if it doesn’t work for you.
+
+ You can also specify the path to xLearn in an initializer:
+
+ ```ruby
+ XLearn.ffi_lib << "/path/to/xlearn/lib/libxlearn_api.so"
+ ```
+
+ > Use `libxlearn_api.dylib` for Mac and `xlearn_api.dll` for Windows
+
+ ## Credits
+
+ This library is modeled after xLearn’s [Scikit-learn API](https://xlearn-doc.readthedocs.io/en/latest/python_api/index.html).
+
  ## History

  View the [changelog](https://github.com/ankane/xlearn/blob/master/CHANGELOG.md)
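
Note on the `libsvm` format added to the README in this hunk: the sketch below shows one way dense rows could be written in the `label index:value ...` layout. The helper name `to_libsvm_line` is hypothetical (it is not part of the gem), and zero-based indices are an assumption, since the README does not state the index base.

```ruby
# Hypothetical helper, not part of the xlearn gem: formats one dense row as a
# libsvm line, skipping zero values so sparse rows stay short.
# Assumption: indices are zero-based (the README does not say).
def to_libsvm_line(label, row)
  pairs = row.each_with_index
             .reject { |value, _index| value.zero? }
             .map { |value, index| "#{index}:#{value}" }
  ([label] + pairs).join(" ")
end

rows = [[1.0, 0.0, 3.5], [0.0, 2.0, 0.0]]
labels = [1, 0]

lines = labels.zip(rows).map { |label, row| to_libsvm_line(label, row) }
puts lines
# 1 0:1.0 2:3.5
# 0 1:2.0
```

The resulting lines could be written to a file and passed to `model.fit("train.txt")` as shown in the README.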
data/lib/xlearn/dmatrix.rb CHANGED
@@ -5,14 +5,30 @@ module XLearn
      def initialize(data, label: nil)
        @handle = ::FFI::MemoryPointer.new(:pointer)

-       nrow = data.count
-       ncol = data.first.count
+       if matrix?(data)
+         nrow = data.row_count
+         ncol = data.column_count
+         flat_data = data.to_a.flatten
+       elsif daru?(data)
+         nrow, ncol = data.shape
+         flat_data = data.map_rows(&:to_a).flatten
+       elsif narray?(data)
+         nrow, ncol = data.shape
+         # TODO convert to SFloat and pass pointer
+         # for better performance
+         flat_data = data.flatten.to_a
+       else
+         nrow = data.count
+         ncol = data.first.count
+         flat_data = data.flatten
+       end

-       c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
-       c_data.put_array_of_float(0, data.flatten)
+       c_data = ::FFI::MemoryPointer.new(:float, flat_data.size)
+       c_data.put_array_of_float(0, flat_data)

        if label
-         c_label = ::FFI::MemoryPointer.new(:float, nrow)
+         label = label.to_a
+         c_label = ::FFI::MemoryPointer.new(:float, label.size)
          c_label.put_array_of_float(0, label)
        end

@@ -31,5 +47,19 @@ module XLearn
        # must use proc instead of stabby lambda
        proc { FFI.XlearnDataFree(pointer) }
      end
+
+     private
+
+     def matrix?(data)
+       defined?(Matrix) && data.is_a?(Matrix)
+     end
+
+     def daru?(data)
+       defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
+     end
+
+     def narray?(data)
+       defined?(Numo::NArray) && data.is_a?(Numo::NArray)
+     end
    end
  end
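
Note on the constructor change above: every branch reduces the input to a row count, a column count, and a flat row-major list of values that is then copied into a float buffer. A minimal sketch of the plain-Array branch follows. The helper name `dense_layout` is hypothetical, and `Array#pack("f*")` is used here only as a stand-in for FFI's `put_array_of_float` so the example runs without the `ffi` gem.

```ruby
# Hypothetical helper mirroring the plain-Array branch of the constructor:
# returns the row count, the column count, and the values in row-major order.
def dense_layout(rows)
  [rows.count, rows.first.count, rows.flatten]
end

nrow, ncol, flat = dense_layout([[1, 2, 3], [4, 5, 6]])

# Stand-in for the FFI copy: each value becomes one 4-byte
# single-precision float in native byte order.
buffer = flat.map(&:to_f).pack("f*")

puts nrow            # 2
puts ncol            # 3
puts flat.inspect    # [1, 2, 3, 4, 5, 6]
puts buffer.bytesize # 24
```

Sizing the buffer from `flat_data.size`, as the new code does, keeps the allocation consistent with what is actually copied, whichever container the data came from.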
data/lib/xlearn/ffm.rb CHANGED
@@ -4,5 +4,24 @@ module XLearn
        @model_type = "ffm"
        super
      end
+
+     # shape is [i, j, k]
+     # for v_{i}_{j}
+     def latent_factors
+       factor = []
+       current = -1
+       read_txt do |line|
+         if line.start_with?("v_")
+           parts = line.split(": ")
+           i = parts.first.split("_")[1].to_i
+           if i != current
+             factor << []
+             current = i
+           end
+           factor.last << parts.last.split(" ").map(&:to_f)
+         end
+       end
+       factor
+     end
    end
  end
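
Note on the FFM `latent_factors` method above: it groups `v_{i}_{j}` lines by the feature index `i`, which only works if all lines for one feature are contiguous in the text model. The sketch below runs the same grouping over in-memory lines. The `name: values` line layout is inferred from the parsing code, and the numbers are made up for illustration.

```ruby
# Illustrative input only: the layout is inferred from the parser above
# and the values are invented.
lines = [
  "bias: 0.5",
  "i_0: 0.1",
  "i_1: 0.2",
  "v_0_0: 0.01 0.02",
  "v_0_1: 0.03 0.04",
  "v_1_0: 0.05 0.06",
  "v_1_1: 0.07 0.08"
]

factor = []
current = -1
lines.each do |line|
  next unless line.start_with?("v_")

  name, values = line.split(": ")
  i = name.split("_")[1].to_i
  # Start a new group whenever the feature index changes.
  if i != current
    factor << []
    current = i
  end
  factor.last << values.split(" ").map(&:to_f)
end

puts factor.size         # 2 features
puts factor[0].size      # 2 fields per feature
puts factor[1][0].inspect # [0.05, 0.06]
```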
data/lib/xlearn/fm.rb CHANGED
@@ -4,5 +4,17 @@ module XLearn
        @model_type = "fm"
        super
      end
+
+     # shape is [i, k]
+     # for v_{i}
+     def latent_factors
+       factor = []
+       read_txt do |line|
+         if line.start_with?("v_")
+           factor << line.split(": ").last.split(" ").map(&:to_f)
+         end
+       end
+       factor
+     end
    end
  end
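
Note on the text-model parsing: the FM `latent_factors` method above reads `v_` lines, while `bias_term` and `linear_term` elsewhere in this diff read the `bias:` and `i_` lines of the same file. The sketch below applies the three rules in a single pass over in-memory lines. The sample lines are invented, and their layout is only inferred from the prefixes the parsers check.

```ruby
# Illustrative input only: layout inferred from the parsers in this diff,
# values invented.
lines = [
  "bias: 0.5",
  "i_0: 0.1",
  "i_1: -0.2",
  "v_0: 0.01 0.02",
  "v_1: 0.03 0.04"
]

bias = nil
linear = []
latent = []

lines.each do |line|
  if line.start_with?("bias:")
    bias = line.split(":").last.to_f    # to_f ignores the leading space
  elsif line.start_with?("i_")
    linear << line.split(":").last.to_f
  elsif line.start_with?("v_")
    latent << line.split(": ").last.split(" ").map(&:to_f)
  end
end

puts bias           # 0.5
puts linear.inspect # [0.1, -0.2]
puts latent.inspect # [[0.01, 0.02], [0.03, 0.04]]
```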
data/lib/xlearn/model.rb CHANGED
@@ -20,14 +20,14 @@ module XLearn
      end

      def fit(x, y = nil, eval_set: nil)
-       if x.is_a?(String)
-         check_call FFI.XLearnSetTrain(@handle, x)
-         check_call FFI.XLearnSetBool(@handle, "from_file", true)
-       else
-         train_set = DMatrix.new(x, label: y)
-         check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
-         check_call FFI.XLearnSetBool(@handle, "from_file", false)
-       end
+       @model_path = nil
+       partial_fit(x, y, eval_set: eval_set)
+     end
+
+     def partial_fit(x, y = nil, eval_set: nil)
+       check_call FFI.XLearnSetPreModel(@handle, @model_path || "")
+
+       set_train_set(x, y)

        if eval_set
          if eval_set.is_a?(String)
@@ -38,9 +38,12 @@ module XLearn
          end
        end

-       # TODO unlink in finalizer
-       @model_file = Tempfile.new("xlearn")
+       @txt_file ||= create_tempfile
+       check_call FFI.XLearnSetTXTModel(@handle, @txt_file.path)
+
+       @model_file ||= create_tempfile
        check_call FFI.XLearnFit(@handle, @model_file.path)
+       @model_path = @model_file.path
      end

      def predict(x, out_path: nil)
@@ -63,24 +66,72 @@ module XLearn
        end
      end

+     def cv(x, y = nil, folds: nil)
+       set_params(fold: folds) if folds
+       set_train_set(x, y)
+       check_call FFI.XLearnCV(@handle)
+     end
+
      def save_model(path)
        raise Error, "Not trained" unless @model_file
        FileUtils.cp(@model_file.path, path)
      end

+     def save_txt(path)
+       raise Error, "Not trained" unless @txt_file
+       FileUtils.cp(@txt_file.path, path)
+     end
+
      def load_model(path)
-       @model_file ||= Tempfile.new("xlearn")
+       @model_file ||= create_tempfile
        # TODO ensure tempfile is still cleaned up
        FileUtils.cp(path, @model_file.path)
      end

+     def bias_term
+       read_txt do |line|
+         return line.split(":").last.to_f if line.start_with?("bias:")
+       end
+     end
+
+     def linear_term
+       term = []
+       read_txt do |line|
+         if line.start_with?("i_")
+           term << line.split(":").last.to_f
+         elsif line.start_with?("v_")
+           break
+         end
+       end
+       term
+     end
+
      def self.finalize(pointer)
        # must use proc instead of stabby lambda
        proc { FFI.XLearnHandleFree(pointer) }
      end

+     def self.finalize_file(file)
+       # must use proc instead of stabby lambda
+       proc do
+         file.close
+         file.unlink
+       end
+     end
+
      private

+     def set_train_set(x, y)
+       if x.is_a?(String)
+         check_call FFI.XLearnSetTrain(@handle, x)
+         check_call FFI.XLearnSetBool(@handle, "from_file", true)
+       else
+         train_set = DMatrix.new(x, label: y)
+         check_call FFI.XLearnSetDMatrix(@handle, "train", train_set)
+         check_call FFI.XLearnSetBool(@handle, "from_file", false)
+       end
+     end
+
      def set_params(params)
        params.each do |k, v|
          k = k.to_s
@@ -100,5 +151,19 @@ module XLearn
          check_call ret
        end
      end
+
+     def read_txt
+       if @txt_file
+         File.foreach(@txt_file.path) do |line|
+           yield line
+         end
+       end
+     end
+
+     def create_tempfile
+       file = Tempfile.new("xlearn")
+       ObjectSpace.define_finalizer(self, self.class.finalize_file(file))
+       file
+     end
    end
  end
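
Note on `create_tempfile` and `finalize_file` above: the cleanup proc is built in a class method so that it closes over the file only and not over the instance, which would otherwise keep the object from being collected. Ruby calls finalizers with the object id as an argument, which a `proc` tolerates and a zero-argument stabby lambda does not. A minimal standalone sketch of the pattern follows. The class name `Holder` is hypothetical, and the proc is invoked by hand because garbage-collection timing is not deterministic.

```ruby
require "tempfile"

# Hypothetical class illustrating the finalizer pattern from the diff.
class Holder
  attr_reader :file

  def initialize
    @file = Tempfile.new("example")
    # The proc comes from a class method, so it does not capture self.
    ObjectSpace.define_finalizer(self, self.class.finalize_file(@file))
  end

  def self.finalize_file(file)
    # A proc ignores the object id that Ruby passes to finalizers.
    proc do
      file.close
      file.unlink
    end
  end
end

holder = Holder.new
path = holder.file.path
existed_before = File.exist?(path)

# Invoke the cleanup by hand, passing an id the way the GC would.
Holder.finalize_file(holder.file).call(holder.object_id)

puts existed_before   # true
puts File.exist?(path) # false
```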
data/lib/xlearn/version.rb CHANGED
@@ -1,3 +1,3 @@
  module XLearn
-   VERSION = "0.1.0"
+   VERSION = "0.1.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: xlearn
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.1.1
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-10-12 00:00:00.000000000 Z
+ date: 2019-10-14 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: ffi
@@ -66,6 +66,34 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '5'
+ - !ruby/object:Gem::Dependency
+   name: daru
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: numo-narray
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  description:
  email: andrew@chartkick.com
  executables: []