libsvmloader 0.1.3 → 0.2.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: a02d403d7d0960462edd290a33ac4b41c3aea2db
4
- data.tar.gz: fd5fa99a1dd15d76c60c2dcf5ef2361f92fc5e65
2
+ SHA256:
3
+ metadata.gz: e887dc00c4c665e3bf78221650a6740440cf9d07549c01bc47ccfb6940825ddd
4
+ data.tar.gz: bff658751c419564ccbac3896307a2a1799d46ef7370fd5528e641f35073089b
5
5
  SHA512:
6
- metadata.gz: 44400787b97fd5933c36c23b83458ff9628d38829f7e12479aec87425adf6a88312bebc769ad7f89dd1f1fcade3fe785772e97dbb39d54ba075c6a7410dd4404
7
- data.tar.gz: d1d336f24796652fdf203dc773853403cc980ccfc4f3c02e7e7da476e669be2cdbeb5c4bf076c72ad7ce1160dff4fb74a3dbff966e399efd4d2fd32767eb6738
6
+ metadata.gz: ab4062fbd21fdddb5b139e6fef498234e04588e858f172042265f2f2ef33f9f06e60f147e52f56a022a12a2e10d527f3d0c6c64d815651a32cd6741ef488dadd
7
+ data.tar.gz: 8ca0ffb36321df79bb6fd018d2e45d7a6c600371d852ef70ffad644eebcb7273dfb011f34596c3492cd9a1c42b13516ff8b8341df6d2d996e0d660613d3c809a
data/.coveralls.yml ADDED
@@ -0,0 +1 @@
1
+ service_name: travis-ci
data/.gitignore CHANGED
@@ -8,5 +8,14 @@
8
8
  /spec/reports/
9
9
  /tmp/
10
10
 
11
+
11
12
  # rspec failure tracking
12
13
  .rspec_status
14
+
15
+ *.swp
16
+ .DS_Store
17
+ .ruby-version
18
+ /spec/dump_cmp.t
19
+ /spec/dump_dbl.t
20
+ /spec/dump_int.t
21
+ /spec/dump_zb.t
data/.rubocop.yml CHANGED
@@ -6,3 +6,9 @@ Metrics/ModuleLength:
6
6
 
7
7
  Metrics/ClassLength:
8
8
  Max: 200
9
+
10
+ Metrics/MethodLength:
11
+ Max: 40
12
+
13
+ Style/FormatStringToken:
14
+ Enabled: false
data/.travis.yml CHANGED
@@ -1,5 +1,10 @@
1
1
  sudo: false
2
+ os: linux
3
+ dist: trusty
2
4
  language: ruby
3
5
  rvm:
4
- - 2.4.1
6
+ - 2.2
7
+ - 2.3
8
+ - 2.4
9
+ - 2.5
5
10
  before_install: gem install bundler -v 1.15.3
data/HISTORY.md CHANGED
@@ -1,3 +1,9 @@
1
+ # 0.2.0
2
+ ## Breaking changes
3
+
4
+ LibSVMLoader has been modified to return the samples and labels of dataset as Ruby Array.
5
+ Thus, LibSVMLoader does not require NMatrix.
6
+
1
7
  # 0.1.3
2
8
  - Changed the visibility of protected methods to the private.
3
9
  - Fixed the description in the gemspec file.
data/README.md CHANGED
@@ -1,6 +1,7 @@
1
1
  # LibSVMLoader
2
2
 
3
3
  [![Build Status](https://travis-ci.org/yoshoku/LibSVMLoader.svg?branch=master)](https://travis-ci.org/yoshoku/LibSVMLoader)
4
+ [![Coverage Status](https://coveralls.io/repos/github/yoshoku/LibSVMLoader/badge.svg?branch=master)](https://coveralls.io/github/yoshoku/LibSVMLoader?branch=master)
4
5
  [![Gem Version](https://badge.fury.io/rb/libsvmloader.svg)](https://badge.fury.io/rb/libsvmloader)
5
6
  [![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/yoshoku/LibSVMLoader/blob/master/LICENSE.txt)
6
7
 
@@ -32,10 +33,38 @@ samples, labels = LibSVMLoader.load_libsvm_file('foo.t')
32
33
  LibSVMLoader.dump_libsvm_file(samples, labels, 'bar.t')
33
34
 
34
35
  # for regression task
35
- samples, target_variables = LibSVMLoader.load_libsvm_file('foo.t', label_dtype: :float64)
36
+ samples, target_variables = LibSVMLoader.load_libsvm_file('foo.t', label_dtype: 'float')
36
37
  LibSVMLoader.dump_libsvm_file(samples, target_variables, 'bar.t')
37
38
  ```
38
39
 
40
+ When using with Numo::NArray:
41
+
42
+ ```ruby
43
+ require 'libsvmloader'
44
+ require 'numo/narray'
45
+
46
+ samples, labels = LibSVMLoader.load_libsvm_file('foo.t')
47
+
48
+ samples_na = Numo::NArray[*samples]
49
+ labels_na = Numo::NArray[*labels]
50
+
51
+ LibSVMLoader.dump_libsvm_file(samples_na.to_a, labels_na.to_a, 'bar.t')
52
+ ```
53
+
54
+ When using with NMatrix:
55
+
56
+ ```ruby
57
+ require 'libsvmloader'
58
+ require 'nmatrix/nmatrix'
59
+
60
+ samples, labels = LibSVMLoader.load_libsvm_file('foo.t')
61
+
62
+ samples_nm = N[*samples]
63
+ labels_nm = N[*labels]
64
+
65
+ LibSVMLoader.dump_libsvm_file(samples_nm.to_a, labels_nm.to_a, 'bar.t')
66
+ ```
67
+
39
68
  ## Development
40
69
 
41
70
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
data/lib/libsvmloader.rb CHANGED
@@ -1,92 +1,112 @@
1
+ # frozen_string_literal: true
1
2
 
2
3
  require 'libsvmloader/version'
3
- require 'nmatrix/nmatrix'
4
+ require 'csv'
4
5
 
5
6
  # LibSVMLoader loads (and dumps) dataset file with the libsvm file format.
6
7
  class LibSVMLoader
7
8
  class << self
8
- # Load a dataset with the libsvm file format into NMatrix.
9
+ # Load a dataset with the libsvm file format.
9
10
  #
10
- # @param filename [String] A path to a dataset file.
11
+ # @param filename [String] Path to a dataset file.
11
12
  # @param zero_based [Boolean] Whether the column index starts from 0 (true) or 1 (false).
12
- # @param stype [Symbol] The strorage type of the nmatrix consisting of feature vectors.
13
- # @param label_dtype [Symbol] The data type of the NMatrix consisting of labels or target values.
14
- # @param value_dtype [Symbol] The data type of the NMatrix consisting of feature vectors.
13
+ # @param label_dtype [String] Data type of labels or target values ('int', 'float', 'complex').
14
+ # @param value_dtype [String] Data type of feature vectors ('int', 'float', 'complex').
15
15
  #
16
- # @return [Array<NMatrix>]
16
+ # @return [Array<Array>]
17
17
  # Returns array containing the (n_samples x n_features) matrix for feature vectors
18
- # and (n_samples x 1) matrix for labels or target values.
19
- def load_libsvm_file(filename, zero_based: false, stype: :yale, label_dtype: :int32, value_dtype: :float64)
20
- ftvecs = []
18
+ # and (n_samples) vector for labels or target values.
19
+ def load_libsvm_file(filename, zero_based: false, label_dtype: 'int', value_dtype: 'float')
21
20
  labels = []
22
- n_features = 0
23
- File.read(filename).split("\n").each do |line|
24
- label, ftvec, max_idx = parse_libsvm_line(line, zero_based)
21
+ ftvecs = []
22
+ maxids = []
23
+ label_class = parse_dtype(label_dtype)
24
+ value_class = parse_dtype(value_dtype)
25
+ CSV.foreach(filename, col_sep: "\s", headers: false) do |row|
26
+ label, ftvec, maxid = parse_libsvm_row(row, zero_based, label_class, value_class)
25
27
  labels.push(label)
26
28
  ftvecs.push(ftvec)
27
- n_features = [n_features, max_idx].max
29
+ maxids.push(maxid)
28
30
  end
29
- [convert_to_nmatrix(ftvecs, n_features, value_dtype, stype),
30
- NMatrix.new([labels.size, 1], labels, dtype: label_dtype)]
31
+ [convert_to_matrix(ftvecs, maxids.max + 1, value_class), labels]
31
32
  end
32
33
 
33
34
  # Dump the dataset with the libsvm file format.
34
35
  #
35
- # @param data [NMatrix] (n_samples x n_features) matrix consisting of feature vectors.
36
- # @param labels [NMatrix] (n_samples x 1) matrix consisting of labels or target values.
37
- # @param filename [String] A path to the output libsvm file.
36
+ # @param data [Array] (n_samples x n_features) matrix consisting of feature vectors.
37
+ # @param labels [Array] (n_samples) vector consisting of labels or target values.
38
+ # @param filename [String] Path to the output libsvm file.
38
39
  # @param zero_based [Boolean] Whether the column index starts from 0 (true) or 1 (false).
39
40
  def dump_libsvm_file(data, labels, filename, zero_based: false)
40
- n_samples = [data.rows, labels.rows].min
41
- label_type = detect_dtype(labels)
42
- value_type = detect_dtype(data)
41
+ n_samples = [data.size, labels.size].min
42
+ label_format = detect_format(labels.first)
43
+ value_format = detect_format(data.flatten.first)
43
44
  File.open(filename, 'w') do |file|
44
- n_samples.times do |n|
45
- file.puts(dump_libsvm_line(labels[n], data.row(n),
46
- label_type, value_type, zero_based))
47
- end
45
+ n_samples.times { |n| file.puts(dump_libsvm_line(labels[n], data[n], label_format, value_format, zero_based)) }
48
46
  end
49
47
  end
50
48
 
51
49
  private
52
50
 
53
- def parse_libsvm_line(line, zero_based)
54
- tokens = line.split
55
- label = tokens.shift.to_f
56
- ftvec = tokens.map do |el|
51
+ def parse_libsvm_row(row, zero_based, label_type, value_type)
52
+ label = convert_type(row.shift, label_type)
53
+ ftvec = row.map do |el|
57
54
  idx, val = el.split(':')
58
- idx = idx.to_i - (zero_based == false ? 1 : 0)
59
- [idx, val.to_f]
55
+ [idx.to_i - (zero_based == false ? 1 : 0), convert_type(val, value_type)]
60
56
  end
61
- max_idx = ftvec.map { |el| el[0] }.max
62
- max_idx ||= 0
57
+ max_idx = ftvec.map { |idx, _val| idx }.max || 0
63
58
  [label, ftvec, max_idx]
64
59
  end
65
60
 
66
- def convert_to_nmatrix(data, n_features, value_dtype, stype)
67
- n_samples = data.size
68
- mat = NMatrix.zeros([n_samples, n_features + 1],
69
- dtype: value_dtype, stype: stype)
70
- data.each_with_index do |ftvec, n|
71
- ftvec.each do |el|
72
- mat[n, el[0]] = el[1]
73
- end
61
+ def parse_dtype(dtype)
62
+ case dtype.to_s
63
+ when /^(int)/i
64
+ :int
65
+ when /^(float)/i
66
+ :float
67
+ when /^(complex)/i
68
+ :complex
69
+ else
70
+ :string
74
71
  end
75
- mat
76
72
  end
77
73
 
78
- def detect_dtype(data)
74
+ def convert_type(value, dtype)
75
+ case dtype
76
+ when :int
77
+ value.to_i
78
+ when :float
79
+ value.to_f
80
+ when :complex
81
+ value.to_c
82
+ else
83
+ value
84
+ end
85
+ end
86
+
87
+ def convert_to_matrix(data, n_features, value_type)
88
+ z = convert_type(0, value_type)
89
+ data.map do |ft|
90
+ vec = Array.new(n_features) { z }
91
+ ft.each { |idx, val| vec[idx] = val }
92
+ vec
93
+ end
94
+ end
95
+
96
+ def detect_format(data)
79
97
  type = '%s'
80
- type = '%d' if %i[int8 int16 int32 int64].include?(data.dtype)
81
- type = '%.10g' if %i[float32 float64].include?(data.dtype)
98
+ type = '%d' if data.is_a?(Integer)
99
+ type = '%.10g' if data.is_a?(Float)
82
100
  type
83
101
  end
84
102
 
85
- def dump_libsvm_line(label, ftvec, label_type, value_type, zero_based)
86
- line = format(label_type.to_s, label)
87
- ftvec.to_a.each_with_index do |val, n|
88
- idx = n + (zero_based == false ? 1 : 0)
89
- line += format(" %d:#{value_type}", idx, val) if val != 0.0
103
+ def dump_libsvm_line(label, ftvec, label_format, value_format, zero_based)
104
+ line = format(label_format, label)
105
+ ftvec.each_with_index do |val, n|
106
+ unless val.zero?
107
+ idx = n + (zero_based == false ? 1 : 0)
108
+ line += format(" %d:#{value_format}", idx, val)
109
+ end
90
110
  end
91
111
  line
92
112
  end
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  class LibSVMLoader
2
- VERSION = '0.1.3'.freeze
4
+ VERSION = '0.2.0'.freeze
3
5
  end
data/libsvmloader.gemspec CHANGED
@@ -14,7 +14,7 @@ LibSVMLoader loads (and dumps) dataset file with the libsvm file format.
14
14
  MSG
15
15
  spec.description = <<MSG
16
16
  LibSVMLoader is a class that loads (and dumps) dataset file with the libsvm file format.
17
- The sample matrix for feature vectors and target vector for labels are represented by the NMatrix format.
17
+ The feature vectors and labels of dataset are represented by Ruby Array.
18
18
  MSG
19
19
  spec.homepage = 'https://github.com/yoshoku/libsvmloader'
20
20
  spec.license = 'MIT'
@@ -28,9 +28,8 @@ MSG
28
28
 
29
29
  spec.required_ruby_version = '>= 2.1'
30
30
 
31
- spec.add_runtime_dependency 'nmatrix', '~> 0.2.3'
32
-
33
31
  spec.add_development_dependency 'bundler', '~> 1.15'
32
+ spec.add_development_dependency 'coveralls', '~> 0.8'
34
33
  spec.add_development_dependency 'rake', '~> 10.0'
35
34
  spec.add_development_dependency 'rspec', '~> 3.0'
36
35
  end
metadata CHANGED
@@ -1,43 +1,43 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: libsvmloader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.3
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - yoshoku
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2018-01-07 00:00:00.000000000 Z
11
+ date: 2018-08-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: nmatrix
14
+ name: bundler
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: 0.2.3
20
- type: :runtime
19
+ version: '1.15'
20
+ type: :development
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: 0.2.3
26
+ version: '1.15'
27
27
  - !ruby/object:Gem::Dependency
28
- name: bundler
28
+ name: coveralls
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
31
  - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '1.15'
33
+ version: '0.8'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '1.15'
40
+ version: '0.8'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: rake
43
43
  requirement: !ruby/object:Gem::Requirement
@@ -68,7 +68,7 @@ dependencies:
68
68
  version: '3.0'
69
69
  description: |
70
70
  LibSVMLoader is a class that loads (and dumps) dataset file with the libsvm file format.
71
- The sample matrix for feature vectors and target vector for labels are represented by the NMatrix format.
71
+ The feature vectors and labels of dataset are represented by Ruby Array.
72
72
  email:
73
73
  - yoshoku@outlook.com
74
74
  executables:
@@ -76,6 +76,7 @@ executables:
76
76
  extensions: []
77
77
  extra_rdoc_files: []
78
78
  files:
79
+ - ".coveralls.yml"
79
80
  - ".gitignore"
80
81
  - ".rspec"
81
82
  - ".rubocop.yml"
@@ -112,7 +113,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
112
113
  version: '0'
113
114
  requirements: []
114
115
  rubyforge_project:
115
- rubygems_version: 2.4.5.4
116
+ rubygems_version: 2.7.6
116
117
  signing_key:
117
118
  specification_version: 4
118
119
  summary: LibSVMLoader loads (and dumps) dataset file with the libsvm file format.