thundersvm 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 9f3ccc0ace653a21c742dd36bdf2cf33d941fd7c157b98ea5e453524677dc3da
4
+ data.tar.gz: 6a35aad608e187dac40f4cb2cc798d8fe6b17bb7617e570861f5a044bfd30f5b
5
+ SHA512:
6
+ metadata.gz: ca1f18fce237bb032f64e33c36636ff379fb6a267dabcb83f30700a9bfc60c3fd8a1511afbf17cfe5c009bb1c584d2fd463dbba6a35bad7f9a4d5bc9f8b6cbeb
7
+ data.tar.gz: 775d599e2e0334fe4ae24aa6ae7789087f0c16cc8e9a786fb5b344b784a62b6f509db882ce70dfbecdf0d76e4ed562a8b140972f5c536349c899660cebd78a1a
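The SHA256 and SHA512 entries above are digests of the two archives packed inside the `.gem` file, which is itself a plain tar archive. Below is a minimal sketch for checking an extracted archive against one of these values using only Ruby's standard `digest` library. `checksum_matches?` is a hypothetical helper, not part of RubyGems or this gem.

```ruby
require "digest/sha2"

# Compare a file's hex digest with an expected value such as the
# entries in checksums.yaml. `checksum_matches?` is a hypothetical helper.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest =
    case algorithm
    when :sha256 then Digest::SHA256.file(path)
    when :sha512 then Digest::SHA512.file(path)
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  digest.hexdigest == expected_hex.downcase
end
```

For example, after `gem fetch thundersvm -v 0.1.0` and `tar -xf thundersvm-0.1.0.gem`, calling `checksum_matches?("data.tar.gz", "6a35aad608e187dac40f4cb2cc798d8fe6b17bb7617e570861f5a044bfd30f5b")` should return `true` if the file is the one the registry published.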
data/CHANGELOG.md ADDED
@@ -0,0 +1,3 @@
1
+ ## 0.1.0 (2019-11-24)
2
+
3
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2019 Andrew Kane
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,141 @@
1
+ # ThunderSVM
2
+
3
+ [ThunderSVM](https://github.com/Xtra-Computing/thundersvm) - high-performance parallel SVMs - for Ruby
4
+
5
+ :fire: Uses GPUs and multi-core CPUs for blazing performance
6
+
7
+ For a great intro on support vector machines, check out [this video](https://www.youtube.com/watch?v=efR1C6CvhmE).
8
+
9
+ ## Installation
10
+
11
+ First, [install ThunderSVM](https://github.com/Xtra-Computing/thundersvm/blob/master/docs/how-to.md#install-thundersvm). Add this line to your application’s Gemfile:
12
+
13
+ ```ruby
14
+ gem 'thundersvm'
15
+ ```
16
+
17
+ ## Getting Started
18
+
19
+ Prep your data
20
+
21
+ ```ruby
22
+ x = [[1, 2], [3, 4], [5, 6], [7, 8]]
23
+ y = [1, 2, 3, 4]
24
+ ```
25
+
26
+ Train a model
27
+
28
+ ```ruby
29
+ model = ThunderSVM::Regressor.new
30
+ model.fit(x, y)
31
+ ```
32
+
33
+ Use `ThunderSVM::Classifier` for classification and `ThunderSVM::Model` for other models, such as one-class SVMs.
34
+
35
+ Make predictions
36
+
37
+ ```ruby
38
+ model.predict(x)
39
+ ```
40
+
41
+ Save the model to a file
42
+
43
+ ```ruby
44
+ model.save_model("model.txt")
45
+ ```
46
+
47
+ Load the model from a file
48
+
49
+ ```ruby
50
+ model = ThunderSVM.load_model("model.txt")
51
+ ```
52
+
53
+ Get support vectors
54
+
55
+ ```ruby
56
+ model.support_vectors
57
+ ```
58
+
59
+ ## Cross-Validation
60
+
61
+ Perform cross-validation
62
+
63
+ ```ruby
64
+ model.cv(x, y)
65
+ ```
66
+
67
+ Specify the number of folds
68
+
69
+ ```ruby
70
+ model.cv(x, y, folds: 5)
71
+ ```
72
+
73
+ ## Parameters
74
+
75
+ Defaults shown below
76
+
77
+ ```ruby
78
+ ThunderSVM::Model.new(
79
+ svm_type: :c_svc, # set type of SVM (c_svc, nu_svc, one_class, epsilon_svr, nu_svr)
80
+ kernel: :rbf, # set type of kernel function (linear, polynomial, rbf, sigmoid)
81
+ degree: 3, # set degree in kernel function
82
+ gamma: nil, # set gamma in kernel function
83
+ coef0: 0, # set coef0 in kernel function
84
+ c: 1, # set the parameter C of C-SVC, epsilon-SVR, and nu-SVR
85
+ nu: 0.5, # set the parameter nu of nu-SVC, one-class SVM, and nu-SVR
86
+ epsilon: 0.1, # set the epsilon in loss function of epsilon-SVR
87
+ max_memory: 8192, # constrain the maximum memory size (MB) that thundersvm uses
88
+ tolerance: 0.001, # set tolerance of termination criterion
89
+ probability: false, # whether to train an SVC or SVR model for probability estimates
90
+ gpu: 0, # specify which gpu to use
91
+ cores: nil, # set the number of cpu cores to use (defaults to all)
92
+ verbose: nil # verbose mode (quiet unless set; on by default for cross-validation)
93
+ )
94
+ ```
95
+
96
+ ## Data
97
+
98
+ Data can be a Ruby array
99
+
100
+ ```ruby
101
+ [[1, 2], [3, 4], [5, 6], [7, 8]]
102
+ ```
103
+
104
+ Or a Numo array
105
+
106
+ ```ruby
107
+ Numo::DFloat.cast([[1, 2], [3, 4], [5, 6], [7, 8]])
108
+ ```
109
+
110
+ Or the path to a file in `libsvm` format (better for sparse data)
111
+
112
+ ```ruby
113
+ model.fit("train.txt")
114
+ model.predict("test.txt")
115
+ ```
116
+
117
+ ## Resources
118
+
119
+ - [ThunderSVM: A Fast SVM Library on GPUs and CPUs](https://github.com/Xtra-Computing/thundersvm/blob/master/thundersvm-full.pdf)
120
+
121
+ ## History
122
+
123
+ View the [changelog](https://github.com/ankane/thundersvm/blob/master/CHANGELOG.md)
124
+
125
+ ## Contributing
126
+
127
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
128
+
129
+ - [Report bugs](https://github.com/ankane/thundersvm/issues)
130
+ - Fix bugs and [submit pull requests](https://github.com/ankane/thundersvm/pulls)
131
+ - Write, clarify, or fix documentation
132
+ - Suggest or add new features
133
+
134
+ To get started with development:
135
+
136
+ ```sh
137
+ git clone https://github.com/ankane/thundersvm.git
138
+ cd thundersvm
139
+ bundle install
140
+ bundle exec rake test
141
+ ```
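The README's Data section says a `libsvm`-format file is better for sparse data. That format stores each example as a label followed by 1-based `index:value` pairs, and zero-valued features can simply be left out, whereas `create_dataset` in `model.rb` writes every feature of an array. A minimal sketch of a writer that drops zeros; `libsvm_line` and `write_libsvm` are hypothetical helpers, not part of the gem.

```ruby
# Format one labeled example as a LIBSVM line: the label, then 1-based
# index:value pairs, skipping zero-valued features to keep sparse rows small.
def libsvm_line(label, features)
  pairs = features.each_with_index
                  .reject { |value, _index| value.zero? }
                  .map { |value, index| "#{index + 1}:#{value.to_f}" }
  ([label] + pairs).join(" ")
end

# Write rows and labels to a LIBSVM-format file that can be passed by path.
def write_libsvm(path, x, y)
  File.open(path, "w") do |file|
    x.zip(y).each { |row, label| file.puts(libsvm_line(label, row)) }
  end
end
```

The resulting path can then be passed to `model.fit("train.txt")` as in the README. Note that the array path in `create_dataset` formats labels with `yi.to_i`, which truncates fractional regression targets; a file written this way keeps the label as given.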
data/lib/thundersvm.rb ADDED
@@ -0,0 +1,28 @@
1
+ # stdlib
2
+ require "fiddle/import"
3
+ require "fileutils"
4
+ require "tempfile"
5
+
6
+ # modules
7
+ require "thundersvm/model"
8
+ require "thundersvm/classifier"
9
+ require "thundersvm/regressor"
10
+ require "thundersvm/version"
11
+
12
+ module ThunderSVM
13
+ class Error < StandardError; end
14
+
15
+ class << self
16
+ attr_accessor :ffi_lib
17
+ end
18
+ self.ffi_lib = ["libthundersvm.so", "libthundersvm.dylib", "thundersvm.dll"]
19
+
20
+ # friendlier error message
21
+ autoload :FFI, "thundersvm/ffi"
22
+
23
+ def self.load_model(path)
24
+ model = Model.new
25
+ model.load_model(path)
26
+ model
27
+ end
28
+ end
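`ThunderSVM.ffi_lib` lists candidate library names, and the `FFI` module (`ffi.rb`) tries them in order with `dlload`, retrying on `Fiddle::DLError` until one loads. Because `FFI` is autoloaded, the list can be overridden before the first training or prediction call, for example `ThunderSVM.ffi_lib = ["/path/to/libthundersvm.so"]` (the path is a placeholder). The retry logic can be sketched in plain Ruby with a stand-in loader in place of `dlload`, so it runs without the native library; `load_first` is a hypothetical name.

```ruby
# Try each candidate in order and return the first name the loader accepts.
# `loader` stands in for Fiddle's dlload and must raise when loading fails.
def load_first(candidates, loader)
  remaining = candidates.dup
  begin
    name = remaining.shift
    loader.call(name)
    name
  rescue StandardError
    retry if remaining.any? # move on to the next candidate
    raise LoadError, "Could not load any of: #{candidates.join(', ')}"
  end
end
```

`LoadError` is not a `StandardError`, so the final `raise` escapes the `rescue` clause instead of looping.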
data/lib/thundersvm/classifier.rb ADDED
@@ -0,0 +1,7 @@
1
+ module ThunderSVM
2
+ class Classifier < Model
3
+ def initialize(svm_type: :c_svc, **options)
4
+ super(svm_type: svm_type, **options)
5
+ end
6
+ end
7
+ end
data/lib/thundersvm/ffi.rb ADDED
@@ -0,0 +1,19 @@
1
+ module ThunderSVM
2
+ module FFI
3
+ extend Fiddle::Importer
4
+
5
+ libs = ThunderSVM.ffi_lib.dup
6
+ begin
7
+ dlload libs.shift
8
+ rescue Fiddle::DLError => e
9
+ retry if libs.any?
10
+ raise e if ENV["THUNDERSVM_DEBUG"]
11
+ raise LoadError, "Could not find ThunderSVM"
12
+ end
13
+
14
+ extern "void thundersvm_train(int argc, char **argv)"
15
+ extern "void thundersvm_train_after_parse(char **option, int len, char *file_name)"
16
+ extern "void thundersvm_predict(int argc, char **argv)"
17
+ extern "void thundersvm_predict_after_parse(char *model_file_name, char *output_file_name, char **option, int len)"
18
+ end
19
+ end
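The `extern` declarations take a C-style `(int argc, char **argv)` pair, so `Model#str_ptr` in `model.rb` packs the Ruby argument list into an array of pointers. In that method each slot points into a temporary string created by `Fiddle::Pointer[...]`, and nothing in Ruby keeps that string referenced afterwards, so in principle the garbage collector may reclaim it before the C call reads it; the pointer array is also allocated without a free function. A minimal sketch of a variant that copies each argument into its own buffer, returns the buffers so the caller can hold them for the duration of the call, and lets Ruby free the memory. `build_argv` and `read_argv` are hypothetical helpers; `read_argv` exists only to check the layout.

```ruby
require "fiddle"

# Build a NULL-terminated C `char **` from Ruby values. Each argument is
# copied into its own buffer; the buffers are returned so the caller can
# keep them referenced while C code uses the pointers.
def build_argv(args)
  slot = Fiddle::SIZEOF_VOIDP
  buffers = args.map do |arg|
    bytes = arg.to_s
    buf = Fiddle::Pointer.malloc(bytes.bytesize + 1, Fiddle::RUBY_FREE)
    buf[0, bytes.bytesize] = bytes
    buf[bytes.bytesize] = 0 # NUL terminator
    buf
  end
  argv = Fiddle::Pointer.malloc(slot * (args.size + 1), Fiddle::RUBY_FREE)
  buffers.each_with_index do |buf, i|
    argv[i * slot, slot] = [buf.to_i].pack("J") # store the buffer's address
  end
  argv[args.size * slot, slot] = [0].pack("J") # argv[argc] = NULL, as in C
  [argv, buffers]
end

# Read argc strings back out of a char ** (used here to check the layout).
def read_argv(argv, argc)
  slot = Fiddle::SIZEOF_VOIDP
  Array.new(argc) do |i|
    address = argv[i * slot, slot].unpack1("J")
    Fiddle::Pointer.new(address).to_s
  end
end
```

A call would then look like `argv, buffers = build_argv(args)` followed by `FFI.thundersvm_train(args.size, argv)`, with `buffers` still in scope until the call returns.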
data/lib/thundersvm/model.rb ADDED
@@ -0,0 +1,191 @@
1
+ module ThunderSVM
2
+ class Model
3
+ def initialize(svm_type: :c_svc, kernel: :rbf, degree: 3, gamma: nil, coef0: 0,
4
+ c: 1, nu: 0.5, epsilon: 0.1, max_memory: 8192, tolerance: 0.001,
5
+ probability: false, gpu: 0, cores: nil, verbose: nil)
6
+
7
+ @svm_type = svm_type.to_sym
8
+ @kernel = kernel.to_sym
9
+ @degree = degree
10
+ @gamma = gamma
11
+ @coef0 = coef0
12
+ @c = c
13
+ @nu = nu
14
+ @epsilon = epsilon
15
+ @max_memory = max_memory
16
+ @tolerance = tolerance
17
+ @probability = probability
18
+ @gpu = gpu
19
+ @cores = cores
20
+ @verbose = verbose
21
+ end
22
+
23
+ def fit(x, y = nil)
24
+ train(x, y)
25
+ end
26
+
27
+ def cv(x, y = nil, folds: 5)
28
+ train(x, y, folds: folds)
29
+ end
30
+
31
+ def predict(x)
32
+ dataset_file = create_dataset(x)
33
+ out_file = create_tempfile
34
+ argv = ["thundersvm-predict", dataset_file.path, @model_file.path, out_file.path]
35
+ FFI.thundersvm_predict(argv.size, str_ptr(argv))
36
+ func = [:c_svc, :nu_svc].include?(@svm_type) ? :to_i : :to_f
37
+ out_file.each_line.map(&func)
38
+ end
39
+
40
+ def save_model(path)
41
+ raise Error, "Not trained" unless @model_file
42
+ FileUtils.cp(@model_file.path, path)
43
+ nil
44
+ end
45
+
46
+ def load_model(path)
47
+ @model_file ||= create_tempfile
48
+ # TODO ensure tempfile is still cleaned up
49
+ FileUtils.cp(path, @model_file.path)
50
+ @svm_type = read_header["svm_type"].to_sym
51
+ @kernel = read_header["kernel_type"].to_sym
52
+ nil
53
+ end
54
+
55
+ def support_vectors
56
+ vectors = []
57
+ sv = false
58
+ read_txt do |line|
59
+ if sv
60
+ index = line.index("1:")
61
+ vectors << line[index..-1].split(" ").map { |v| v.split(":").last.to_f }
62
+ elsif line.start_with?("SV")
63
+ sv = true
64
+ end
65
+ end
66
+ vectors
67
+ end
68
+
69
+ def dual_coef
70
+ vectors = []
71
+ sv = false
72
+ read_txt do |line|
73
+ if sv
74
+ index = line.index("1:")
75
+ line[0...index].split(" ").map(&:to_f).each_with_index do |v, i|
76
+ (vectors[i] ||= []) << v
77
+ end
78
+ elsif line.start_with?("SV")
79
+ sv = true
80
+ end
81
+ end
82
+ vectors
83
+ end
84
+
85
+ def self.finalize_file(file)
86
+ # must use proc instead of stabby lambda
87
+ proc do
88
+ file.close
89
+ file.unlink
90
+ end
91
+ end
92
+
93
+ private
94
+
95
+ def train(x, y = nil, folds: nil)
96
+ dataset_file = create_dataset(x, y)
97
+ @model_file ||= create_tempfile
98
+
99
+ svm_types = {
100
+ c_svc: 0,
101
+ nu_svc: 1,
102
+ one_class: 2,
103
+ epsilon_svr: 3,
104
+ nu_svr: 4
105
+ }
106
+ s = svm_types[@svm_type]
107
+ raise Error, "Unknown SVM type: #{@svm_type}" unless s
108
+
109
+ kernels = {
110
+ linear: 0,
111
+ polynomial: 1,
112
+ rbf: 2,
113
+ sigmoid: 3
114
+ }
115
+ t = kernels[@kernel]
116
+ raise Error, "Unknown kernel: #{@kernel}" unless t
117
+
118
+ verbose = @verbose
119
+ verbose = true if folds && verbose.nil?
120
+
121
+ argv = ["thundersvm-train"]
122
+ argv += ["-s", s]
123
+ argv += ["-t", t]
124
+ argv += ["-d", @degree.to_i] if @degree
125
+ argv += ["-g", @gamma.to_f] if @gamma
126
+ argv += ["-r", @coef0.to_f] if @coef0
127
+ argv += ["-c", @c.to_f] if @c
128
+ argv += ["-n", @nu.to_f] if @nu
129
+ argv += ["-p", @epsilon.to_f] if @epsilon
130
+ argv += ["-m", @max_memory.to_i] if @max_memory
131
+ argv += ["-e", @tolerance.to_f] if @tolerance
132
+ argv += ["-b", @probability ? 1 : 0] if @probability
133
+ argv += ["-v", folds.to_i] if folds
134
+ argv += ["-u", @gpu.to_i] if @gpu
135
+ argv += ["-o", @cores.to_i] if @cores
136
+ argv << "-q" unless verbose
137
+ argv += [dataset_file.path, @model_file.path]
138
+
139
+ FFI.thundersvm_train(argv.size, str_ptr(argv))
140
+ nil
141
+ end
142
+
143
+ def create_dataset(x, y = nil)
144
+ if x.is_a?(String)
145
+ raise ArgumentError, "Cannot pass y with file" if y
146
+ File.open(x)
147
+ else
148
+ contents = String.new("")
149
+ y ||= [0] * x.size
150
+ x.to_a.zip(y.to_a).each do |xi, yi|
151
+ contents << "#{yi.to_i} #{xi.map.with_index { |v, i| "#{i + 1}:#{v.to_f}" }.join(" ")}\n"
152
+ end
153
+ dataset = create_tempfile
154
+ dataset.write(contents)
155
+ dataset.close
156
+ dataset
157
+ end
158
+ end
159
+
160
+ def str_ptr(arr)
161
+ ptr = Fiddle::Pointer.malloc(Fiddle::SIZEOF_VOIDP * arr.size)
162
+ arr.each_with_index do |v, i|
163
+ ptr[i * Fiddle::SIZEOF_VOIDP, Fiddle::SIZEOF_VOIDP] = Fiddle::Pointer["#{v}\x00"].ref
164
+ end
165
+ ptr
166
+ end
167
+
168
+ def create_tempfile
169
+ file = Tempfile.new("thundersvm")
170
+ ObjectSpace.define_finalizer(self, self.class.finalize_file(file))
171
+ file
172
+ end
173
+
174
+ def read_header
175
+ model = {}
176
+ read_txt do |line|
177
+ break if line.start_with?("SV")
178
+ k, v = line.split(" ", 2)
179
+ model[k] = v.strip
180
+ end
181
+ model
182
+ end
183
+
184
+ def read_txt
185
+ @model_file.rewind
186
+ @model_file.each_line do |line|
187
+ yield line
188
+ end
189
+ end
190
+ end
191
+ end
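`read_header`, `support_vectors`, and `dual_coef` each rescan the saved model text: `key value` header lines up to a line starting with `SV`, then one line per support vector holding its dual coefficient(s) followed by `index:value` pairs. Those methods locate the features with `line.index("1:")`, which assumes feature 1 appears on every line. A single-pass sketch that instead splits each line by whether a token contains a colon. `parse_model` is a hypothetical helper, it returns features as sparse `{index => value}` hashes rather than the gem's dense arrays, and `SAMPLE_MODEL` is an invented LIBSVM-style model, not real ThunderSVM output.

```ruby
# Parse LIBSVM-style model text into [header, support_vectors, dual_coef].
def parse_model(text)
  header = {}
  support_vectors = []
  dual_coef = []
  in_sv = false

  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    unless in_sv
      if line.start_with?("SV")
        in_sv = true # every following line is a support vector
      else
        key, value = line.split(" ", 2)
        header[key] = value.to_s.strip
      end
      next
    end

    # Tokens without ":" are dual coefficients; the rest are index:value pairs.
    coefs, features = line.split(" ").partition { |token| !token.include?(":") }
    coefs.each_with_index { |coef, i| (dual_coef[i] ||= []) << coef.to_f }
    support_vectors << features.map { |token|
      index, value = token.split(":")
      [index.to_i, value.to_f]
    }.to_h
  end

  [header, support_vectors, dual_coef]
end

# Invented two-class model; the second vector omits feature 1.
SAMPLE_MODEL = <<~MODEL
  svm_type c_svc
  kernel_type rbf
  gamma 0.5
  nr_class 2
  total_sv 2
  rho 0.25
  label 1 2
  nr_sv 1 1
  SV
  1 1:1.0 2:2.0
  -1 2:4.0
MODEL
```

For `SAMPLE_MODEL`, the support vectors come back as `[{1 => 1.0, 2 => 2.0}, {2 => 4.0}]` and `dual_coef` as `[[1.0, -1.0]]`, transposed per coefficient column the same way the gem's `dual_coef` collects them.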
data/lib/thundersvm/regressor.rb ADDED
@@ -0,0 +1,7 @@
1
+ module ThunderSVM
2
+ class Regressor < Model
3
+ def initialize(svm_type: :epsilon_svr, **options)
4
+ super(svm_type: svm_type, **options)
5
+ end
6
+ end
7
+ end
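`Classifier` and `Regressor` differ from `Model` only in the default `svm_type`; every other keyword is forwarded to `Model#initialize`. Because `Model` declares its keywords explicitly, a misspelled option still raises `ArgumentError` instead of being silently ignored. A self-contained sketch of the same pattern, using hypothetical stand-in classes `BaseModel` and `RegressorLike` with only two options:

```ruby
# Stand-in for Model: explicit keywords, so unknown ones raise ArgumentError.
class BaseModel
  attr_reader :svm_type, :c

  def initialize(svm_type: :c_svc, c: 1)
    @svm_type = svm_type.to_sym
    @c = c
  end
end

# Stand-in for Regressor: only the svm_type default changes.
class RegressorLike < BaseModel
  def initialize(svm_type: :epsilon_svr, **options)
    super(svm_type: svm_type, **options)
  end
end
```

With the real gem, the same forwarding means `ThunderSVM::Regressor.new(svm_type: :nu_svr)` trains a nu-SVR rather than the default epsilon-SVR.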
data/lib/thundersvm/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module ThunderSVM
2
+ VERSION = "0.1.0"
3
+ end
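`Model#create_tempfile` in `model.rb` registers a cleanup finalizer built by the class method `finalize_file`, whose comment says it must use a proc rather than a stabby lambda. A likely reason: Ruby calls finalizers with the object's id as an argument, which a zero-argument lambda rejects, and building the proc in a class method keeps it from capturing the model itself (a finalizer that references its own object prevents that object from ever being collected). A self-contained sketch of the pattern with a hypothetical `ScratchFileOwner` class:

```ruby
require "tempfile"

class ScratchFileOwner
  attr_reader :file

  # Built in a class method so the proc closes over `file` only, never over
  # the owning instance, which must stay collectable.
  def self.cleanup(file)
    # Finalizers receive the object id; a proc ignores the extra argument,
    # whereas a zero-argument lambda would raise ArgumentError.
    proc do
      file.close
      file.unlink
    end
  end

  def initialize
    @file = Tempfile.new("scratch")
    ObjectSpace.define_finalizer(self, self.class.cleanup(@file))
  end
end
```

Calling `ScratchFileOwner.cleanup(owner.file).call(owner.object_id)` by hand mimics what the garbage collector does, and running it a second time at exit is harmless because `Tempfile#unlink` skips files it has already removed.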
metadata ADDED
@@ -0,0 +1,107 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: thundersvm
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Andrew Kane
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2019-11-25 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: minitest
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '5'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '5'
55
+ - !ruby/object:Gem::Dependency
56
+ name: numo-narray
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ description:
70
+ email: andrew@chartkick.com
71
+ executables: []
72
+ extensions: []
73
+ extra_rdoc_files: []
74
+ files:
75
+ - CHANGELOG.md
76
+ - LICENSE.txt
77
+ - README.md
78
+ - lib/thundersvm.rb
79
+ - lib/thundersvm/classifier.rb
80
+ - lib/thundersvm/ffi.rb
81
+ - lib/thundersvm/model.rb
82
+ - lib/thundersvm/regressor.rb
83
+ - lib/thundersvm/version.rb
84
+ homepage: https://github.com/ankane/thundersvm
85
+ licenses:
86
+ - MIT
87
+ metadata: {}
88
+ post_install_message:
89
+ rdoc_options: []
90
+ require_paths:
91
+ - lib
92
+ required_ruby_version: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '2.4'
97
+ required_rubygems_version: !ruby/object:Gem::Requirement
98
+ requirements:
99
+ - - ">="
100
+ - !ruby/object:Gem::Version
101
+ version: '0'
102
+ requirements: []
103
+ rubygems_version: 3.0.6
104
+ signing_key:
105
+ specification_version: 4
106
+ summary: ThunderSVM - high-performance parallel SVMs - for Ruby
107
+ test_files: []