diarize-ruby 0.3.0

data/README.md ADDED
@@ -0,0 +1,109 @@
1
+ # diarize-ruby
2
+
3
+ This library provides an easy-to-use toolkit for speaker
4
+ segmentation (diarization) and identification from audio.
5
+
6
+ This library is being used within the BBC R&D World Service
7
+ archive prototype.
8
+
9
+ See http://worldservice.prototyping.bbc.co.uk/programmes/X0403940 for
10
+ an example.
11
+
12
+
13
+ ## Speaker diarization
14
+
15
+ This library gives access to the algorithm developed by the LIUM
16
+ for the ESTER 2 evaluation campaign and described in [Meigner2010].
17
+
18
+ It wraps a binary JAR file compiled from
19
+ http://lium3.univ-lemans.fr/diarization/doku.php/welcome.
20
+
21
+
22
+ ## Speaker identification
23
+
24
+ This library also implements an algorithm for speaker identification
25
+ based on the comparison of normalised speaker models, which can be
26
+ accessed through the Speaker#match method.
27
+
28
+ This algorithm builds on top of the LIUM toolkit and uses the following
29
+ techniques:
30
+
31
+ * "M-Norm" normalisation of speaker models [Ben2003]
32
+ * The symmetric Kullback-Leibler divergence approximation described in [Do2003]
33
+ * The detection score specified in [Ben2005]
34
+
35
+ It also includes support for speaker supervectors [Campbell2006], which
36
+ can be used in combination with our ruby-lsh library for fast speaker
37
+ identification.
38
+
39
+
40
+ ## Example use
41
+
42
+ $ gem install diarize-ruby
43
+ $ irb
44
+ > require 'diarize'
45
+ > audio = Diarize::Audio.new URI('http://example.com/file.wav')
46
+ > audio = Diarize::Audio.new URI.join('file:///', '/Users/juergen/work/ruby/diarize-ruby/test/data/will-and-juergen.wav')
47
+ > audio.analyze!
48
+ > audio.segments
49
+ > audio.speakers
50
+ > audio.to_rdf
51
+ > speakers = audio.speakers
52
+ > speakers.first.gender
53
+ > speakers.first.model.mean_log_likelihood
54
+ > speakers.first.model.components.size
55
+ > audio.segments_by_speaker(speakers.first)[0].play
56
+ > audio.segments_by_speaker(speakers.first)[1].play
57
+ > ...
58
+ > speakers |= other_speakers
59
+ > Diarize::Speaker.match(speakers)
60
+
61
+
62
+ ## Running tests
63
+
64
+ $ rake
65
+
66
+
67
+ ## References
68
+
69
+ [Meigner2010] S. Meignier and T. Merlin, "LIUM SpkDiarization:
70
+ An Open Source Toolkit For Diarization" in Proc. CMU SPUD Workshop,
71
+ March 2010, Dallas (Texas, USA)
72
+
73
+ [Ben2003] M. Ben and F. Bimbot, "D-MAP: A Distance-Normalized MAP
74
+ Estimation of Speaker Models for Automatic Speaker Verification",
75
+ Proceedings of ICASSP, 2003
76
+
77
+ [Do2003] M. N. Do, "Fast Approximation of Kullback-Leibler Distance
78
+ for Dependence Trees and Hidden Markov Models",
79
+ IEEE Signal Processing Letters, April 2003
80
+
81
+ [Ben2005] M. Ben, G. Gravier and F. Bimbot, "A model space
82
+ framework for efficient speaker detection",
83
+ Proceedings of INTERSPEECH, 2005
84
+
85
+ [Campbell2006] W. M. Campbell, D. E. Sturim and D. A. Reynolds,
86
+ "Support vector machines using GMM supervectors for speaker verification",
87
+ IEEE Signal Processing Letters, 2006, 13, 308-311
88
+
89
+
90
+ ## Licensing terms and authorship
91
+
92
+ See 'LICENSE' and 'AUTHORS' files.
93
+
94
+ All code here, except where otherwise indicated, is licensed under
95
+ the GNU Affero General Public License version 3. This license includes
96
+ many restrictions. If this causes a problem, please contact us.
97
+ See "AUTHORS" for contact details.
98
+
99
+ This library includes a binary JAR file from the LIUM project, whose code
100
+ is licensed under the GNU General Public License version 2. See
101
+ http://lium3.univ-lemans.fr/diarization/doku.php/licence for more
102
+ information.
103
+
104
+
105
+ ## Developer Resources
106
+
107
+ * [Connecting Ruby to Java and vice versa](http://nofail.de/2010/04/ruby-in-java-java-in-ruby-jruby-or-ruby-java-bridge/)
108
+ * [LIUM scripts](https://github.com/StevenLOL/LIUM/blob/master/ilp_diarization2.sh)
109
+ * [Speaker Identification for the whole World Service Archive](http://www.bbc.co.uk/rd/blog/2014-01-speaker-identification-for-the-whole-world-service-archive)
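The README's speaker identification section relies on a symmetric Kullback-Leibler divergence between speaker models. As a rough illustration of the underlying quantity only, the sketch below computes the closed-form symmetric KL divergence between two Gaussians with diagonal covariance. It is not the [Do2003] approximation for mixture models and not the gem's own implementation; the function name `symmetric_kl` and its argument layout are assumptions made for this example.

```ruby
# Symmetric KL divergence, KL(p||q) + KL(q||p), between two Gaussians with
# diagonal covariance. Each argument is an array with one entry per dimension.
# Hypothetical helper for illustration; not part of diarize-ruby.
def symmetric_kl(mean_p, var_p, mean_q, var_q)
  mean_p.each_index.sum do |i|
    vp = var_p[i].to_f
    vq = var_q[i].to_f
    diff_sq = (mean_p[i] - mean_q[i])**2
    # The log-variance terms of the two directed divergences cancel out,
    # which leaves only variance ratios and a scaled squared mean distance.
    0.5 * (vp / vq + vq / vp - 2.0 + diff_sq * (1.0 / vp + 1.0 / vq))
  end
end
```

The result is zero for identical distributions and grows as the means move apart or the variances diverge, which is what makes it usable as a distance-like score between models.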
data/Rakefile ADDED
@@ -0,0 +1,11 @@
1
+ require 'rake/testtask'
2
+
3
+ task :default => [:test]
4
+
5
+ desc "Run tests"
6
+ Rake::TestTask.new do |t|
7
+ t.libs << "lib"
8
+ t.libs << "test"
9
+ t.test_files = FileList['test/*_test.rb']
10
+ t.verbose = true
11
+ end
@@ -0,0 +1,31 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'diarize/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "diarize-ruby"
8
+ spec.version = Diarize::VERSION
9
+ spec.date = "2016-07-09"
10
+ spec.authors = ['Yves Raimond', 'Juergen Fesslmeier']
11
+ spec.summary = "Speaker Diarization for Ruby"
12
+ spec.email = ["jfesslmeier@gmail.com"]
13
+ spec.homepage = "https://github.com/chinshr/diarize-ruby"
14
+ spec.description = "A library for Ruby wrapping the LIUM Speaker Diarization and including a few extra tools"
15
+ spec.has_rdoc = false
16
+ spec.license = "GNU Affero General Public License version 3"
17
+
18
+ spec.files = `git ls-files -z`.split("\x0")
19
+ spec.bindir = 'bin'
20
+ spec.executables = []
21
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
22
+ spec.require_paths = ["lib"]
23
+
24
+ spec.add_development_dependency "test-unit", "~> 3.0"
25
+ spec.add_development_dependency "mocha", "~> 1.1"
26
+ spec.add_development_dependency "webmock", "~> 2.1"
27
+
28
+ spec.add_dependency "rjb", "~> 1.5"
29
+ spec.add_dependency "to-rdf", "~> 0"
30
+ spec.add_dependency "jblas-ruby", "~> 1.1"
31
+ end
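The gemspec above declares its dependencies with pessimistic (`~>`) version constraints. As a small sketch of what those constraints admit, the snippet below checks candidate versions with RubyGems' `Gem::Requirement` and `Gem::Version` classes. The helper name `allowed?` is an assumption introduced for this example.

```ruby
require 'rubygems'

# Returns true when `version` satisfies the RubyGems constraint string.
# Hypothetical helper for illustration.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allowed?('~> 1.5', '1.6.0')  # rjb: a later 1.x release
puts allowed?('~> 1.5', '2.0.0')  # rjb: the next major release
puts allowed?('~> 0', '0.4.0')    # to-rdf: any 0.x release
```

Under these constraints, `"~> 1.5"` accepts 1.5 and later 1.x releases but not 2.0, while `"~> 0"` accepts any 0.x release.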
data/lib/diarize.rb ADDED
@@ -0,0 +1,117 @@
1
+ require "rjb"
2
+
3
+ RJB_LOAD_PATH = [File.join(File.expand_path(File.dirname(__FILE__)), 'diarize', 'LIUM_SpkDiarization-4.2.jar')].join(File::PATH_SEPARATOR)
4
+ RJB_OPTIONS = ['-Xms16m', '-Xmx1024m']
5
+
6
+ Rjb::load(RJB_LOAD_PATH, RJB_OPTIONS)
7
+
8
+ require "matrix"
9
+ require "diarize/version"
10
+ require "diarize/lium"
11
+ require "diarize/audio"
12
+ require "diarize/segment"
13
+ require "diarize/segmentation"
14
+ require "diarize/audio_player"
15
+ require "diarize/super_vector"
16
+
17
+ # Extensions to the {Ruby-Java Bridge}[http://rjb.rubyforge.org/] module that
18
+ # add a generic Java object wrapper class.
19
+ module Rjb
20
+ # A generic wrapper for a Java object loaded via the Ruby Java Bridge. The
21
+ # wrapper class handles initialization and stringification, and passes other
22
+ # method calls down to the underlying Java object. Objects returned by the
23
+ # underlying Java object are converted to the appropriate Ruby object.
24
+ #
25
+ # This object is enumerable, yielding items in the order defined by the Java
26
+ # object's iterator.
27
+ class JavaObjectWrapper
28
+ include Enumerable
29
+
30
+ # The underlying Java object.
31
+ attr_reader :java_object
32
+
33
+ # Initialize with a Java object <em>obj</em>. If <em>obj</em> is a
34
+ # String, assume it is a Java class name and instantiate it. Otherwise,
35
+ # treat <em>obj</em> as an instance of a Java object.
36
+ def initialize(obj, *args)
37
+ @java_object = obj.class == String ?
38
+ Rjb::import(obj).send(:new, *args) : obj
39
+ end
40
+
41
+ # Enumerate all the items in the object using its iterator. If the object
42
+ # has no iterator, this function yields nothing.
43
+ def each
44
+ if @java_object.getClass.getMethods.any? {|m| m.getName == "iterator"}
45
+ i = @java_object.iterator
46
+ while i.hasNext
47
+ yield wrap_java_object(i.next)
48
+ end
49
+ end
50
+ end # each
51
+
52
+ # Reflect unhandled method calls to the underlying Java object.
53
+ def method_missing(m, *args)
54
+ wrap_java_object(@java_object.send(m, *args))
55
+ end
56
+
57
+ # Convert a value returned by a call to the underlying Java object to the
58
+ # appropriate Ruby object as follows:
59
+ # * RJB objects are placed inside a generic JavaObjectWrapper wrapper.
60
+ # * <tt>java.util.ArrayList</tt> objects are converted to Ruby Arrays.
61
+ # * <tt>java.util.HashSet</tt> objects are converted to Ruby Sets
62
+ # * Other objects are left unchanged.
63
+ #
64
+ # This function is applied recursively to items in collection objects such
65
+ # as sets and arrays.
66
+ def wrap_java_object(object)
67
+ if object.kind_of?(Array)
68
+ object.collect {|item| wrap_java_object(item)}
69
+ # Ruby-Java Bridge Java objects all have a _classname member which tells
70
+ # the name of their Java class.
71
+ elsif object.respond_to?(:_classname)
72
+ case object._classname
73
+ when /java\.util\.ArrayList/
74
+ # Convert java.util.ArrayList objects to Ruby arrays.
75
+ array_list = []
76
+ object.size.times do
77
+ |i| array_list << wrap_java_object(object.get(i))
78
+ end
79
+ array_list
80
+ when /java\.util\.HashSet/
81
+ # Convert java.util.HashSet objects to Ruby sets.
82
+ set = Set.new
83
+ i = object.iterator
84
+ while i.hasNext
85
+ set << wrap_java_object(i.next)
86
+ end
87
+ set
88
+ else
89
+ # Pass other RJB objects off to a handler.
90
+ wrap_rjb_object(object)
91
+ end # case
92
+ else
93
+ # Return non-RJB objects unchanged.
94
+ object
95
+ end # if
96
+ end # wrap_java_object
97
+
98
+ # By default, all RJB classes other than <tt>java.util.ArrayList</tt> and
99
+ # <tt>java.util.HashSet</tt> go in a generic wrapper. Derived classes may
100
+ # change this behavior.
101
+ def wrap_rjb_object(object)
102
+ JavaObjectWrapper.new(object)
103
+ end
104
+
105
+ # Show the classname of the underlying Java object.
106
+ def inspect
107
+ "<#{@java_object._classname}>"
108
+ end
109
+
110
+ # Use the underlying Java object's stringification.
111
+ def to_s
112
+ toString
113
+ end
114
+
115
+ protected :wrap_java_object, :wrap_rjb_object
116
+ end # JavaObjectWrapper
117
+ end # Rjb
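The wrapper above combines two ideas: forwarding unknown calls through `method_missing`, and exposing a Java-style iterator through `Enumerable`. The sketch below shows the same pattern in plain Ruby, without RJB, so it can be run on its own. `FakeJavaList`, `FakeIterator` and `ObjectWrapper` are stand-ins invented for this example; real RJB proxies may not answer `respond_to?` the same way, so treat this as an outline of the technique rather than a drop-in replacement.

```ruby
# Stand-in for a Java collection exposing size/get/iterator (hypothetical).
class FakeJavaList
  def initialize(items)
    @items = items
  end

  def size
    @items.size
  end

  def get(index)
    @items[index]
  end

  def iterator
    FakeIterator.new(@items)
  end
end

# Stand-in for java.util.Iterator (hypothetical).
class FakeIterator
  def initialize(items)
    @items = items
    @pos = 0
  end

  def hasNext
    @pos < @items.size
  end

  def next
    item = @items[@pos]
    @pos += 1
    item
  end
end

# Delegating wrapper: enumerable via the target's iterator, everything else
# forwarded to the wrapped object.
class ObjectWrapper
  include Enumerable

  attr_reader :target

  def initialize(target)
    @target = target
  end

  # Yield each item from the target's iterator; yield nothing if it has none.
  def each
    return enum_for(:each) unless block_given?
    return unless @target.respond_to?(:iterator)

    iter = @target.iterator
    yield iter.next while iter.hasNext
  end

  # Forward calls the wrapper does not define to the wrapped object.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end
```

Defining `respond_to_missing?` alongside `method_missing` is the main design difference from the wrapper in the diff: it lets callers probe the wrapper with `respond_to?` and keeps unknown method names raising `NoMethodError`.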
@@ -0,0 +1,196 @@
1
+ require File.join(File.expand_path(File.dirname(__FILE__)), 'lium')
2
+ require File.join(File.expand_path(File.dirname(__FILE__)), 'segmentation')
3
+ require File.join(File.expand_path(File.dirname(__FILE__)), 'speaker')
4
+
5
+ require 'rubygems'
6
+ require 'to_rdf'
7
+ require 'uri'
8
+ require 'open-uri'
9
+ require 'digest/md5'
10
+
11
+ module Diarize
12
+
13
+ class Audio
14
+
15
+ attr_reader :path, :file, :uri
16
+
17
+ def initialize(url_or_uri)
18
+ @uri = url_or_uri.is_a?(String) ? URI(url_or_uri) : url_or_uri
19
+ if uri.scheme == 'file'
20
+ # Local file
21
+ @path = uri.path
22
+ else
23
+ # Remote file, we get it locally
24
+ @path = '/tmp/' + Digest::MD5.hexdigest(uri.to_s)
25
+ File.open(@path, "wb") {|f| f << uri.open.read }
26
+ end
27
+
28
+ if !File.exist?(@path)
29
+ raise "Unable to locate: #{@path}. Check that the file is available at #{uri.inspect}."
30
+ end
31
+
32
+ @file = File.new @path
33
+ end
34
+
35
+ def analyze!(train_speaker_models = true)
36
+ # parameter = fr.lium.spkDiarization.parameter.Parameter.new
37
+ parameter = Rjb::import('fr.lium.spkDiarization.parameter.Parameter').new
38
+ parameter.show = show
39
+ # 12 MFCC + Energy
40
+ # 1: static coefficients are present in the file
41
+ # 1: energy coefficient is present in the file
42
+ # 0: delta coefficients are not present in the file
43
+ # 0: delta energy coefficient is not present in the file
44
+ # 0: delta delta coefficients are not present in the file
45
+ # 0: delta delta energy coefficient is not present in the file
46
+ # 13: total size of a feature vector in the mfcc file
47
+ # 0:0:0:0: no feature normalization
48
+ parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:1:0:0:0:0,13,0:0:0:0')
49
+ #parameter.parameterDiarization.cEClustering = true # We use CE clustering by default
50
+ parameter.parameterInputFeature.setFeatureMask(@path)
51
+ @clusters = ester2(parameter)
52
+ @segments = Segmentation.from_clusters(self, @clusters)
53
+ train_speaker_gmms if train_speaker_models
54
+ end
55
+
56
+ def clean!
57
+ return if @uri.scheme == 'file' # Don't delete local file if initialised from local URI
58
+ File.delete(@path)
59
+ end
60
+
61
+ def segments
62
+ raise Exception.new('You need to run analyze! before being able to access the analysis results') unless @segments
63
+ @segments
64
+ end
65
+
66
+ def speakers
67
+ return @speakers if @speakers
68
+ @speakers = segments.map { |segment| segment.speaker }.uniq
69
+ end
70
+
71
+ def segments_by_speaker(speaker)
72
+ segments.select { |segment| segment.speaker == speaker }
73
+ end
74
+
75
+ def duration_by_speaker(speaker)
76
+ return unless speaker
77
+ segments = segments_by_speaker(speaker)
78
+ duration = 0.0
79
+ segments.each { |segment| duration += segment.duration }
80
+ duration
81
+ end
82
+
83
+ def top_speakers
84
+ speakers.sort {|s1, s2| duration_by_speaker(s1) <=> duration_by_speaker(s2)}.reverse
85
+ end
86
+
87
+ include ToRdf
88
+
89
+ def namespaces
90
+ super.merge 'ws' => 'http://wsarchive.prototype0.net/ontology/', 'mo' => 'http://purl.org/ontology/mo/'
91
+ end
92
+
93
+ def uri
94
+ @uri
95
+ end
96
+
97
+ def uri=(uri)
98
+ @uri = uri
99
+ end
100
+
101
+ def base_uri
102
+ # Remove the fragment if there is one
103
+ base = uri.clone
104
+ base.fragment = nil
105
+ base
106
+ end
107
+
108
+ def type_uri
109
+ @type_uri || 'mo:AudioFile'
110
+ end
111
+
112
+ def type_uri=(type_uri)
113
+ @type_uri = type_uri
114
+ end
115
+
116
+ def rdf_mapping
117
+ { 'ws:segment' => segments }
118
+ end
119
+
120
+ def show
121
+ # The LIUM show name will be the file name, without extension or directory
122
+ File.expand_path(@path).split('/')[-1].split('.')[0]
123
+ end
124
+
125
+ protected
126
+
127
+ def train_speaker_gmms
128
+ segments # Making sure we have pre-computed segments and clusters
129
+ # Would be nice to reuse GMMs computed as part of the segmentation process
130
+ # but not sure how to access them without changing the Java API
131
+
132
+ # Start by copying models from the universal background model, one per speaker, using MTrainInit
133
+ # parameter = fr.lium.spkDiarization.parameter.Parameter.new
134
+ parameter = Rjb::import("fr.lium.spkDiarization.parameter.Parameter").new
135
+ parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:3:2:0:0:0,13,1:1:300:4')
136
+ parameter.parameterInputFeature.setFeatureMask(@path)
137
+ parameter.parameterInitializationEM.setModelInitMethod('copy')
138
+ parameter.parameterModelSetInputFile.setMask(File.join(File.expand_path(File.dirname(__FILE__)), 'ubm.gmm'))
139
+ # features = fr.lium.spkDiarization.lib.MainTools.readFeatureSet(parameter, @clusters)
140
+ features = Rjb::import("fr.lium.spkDiarization.lib.MainTools").readFeatureSet(parameter, @clusters.java_object)
141
+ # init_vect = java.util.ArrayList.new(@clusters.cluster_get_size)
142
+ init_vect = Rjb::JavaObjectWrapper.new("java.util.ArrayList", @clusters.java_object.cluster_get_size)
143
+ # fr.lium.spkDiarization.programs.MTrainInit.make(features, @clusters, init_vect, parameter)
144
+ Rjb::import("fr.lium.spkDiarization.programs.MTrainInit").make(features, @clusters.java_object, init_vect.java_object, parameter)
145
+
146
+ # Adapt models to individual speakers detected in the audio, using MTrainMap
147
+ # parameter = fr.lium.spkDiarization.parameter.Parameter.new
148
+ parameter = Rjb::import("fr.lium.spkDiarization.parameter.Parameter").new
149
+ parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:3:2:0:0:0,13,1:1:300:4')
150
+ parameter.parameterInputFeature.setFeatureMask(@path)
151
+ parameter.parameterEM.setEMControl('1,5,0.01')
152
+ parameter.parameterVarianceControl.setVarianceControl('0.01,10.0')
153
+ parameter.show = show
154
+ features.setCurrentShow(parameter.show)
155
+ # gmm_vect = java.util.ArrayList.new
156
+ gmm_vect = Rjb::JavaObjectWrapper.new("java.util.ArrayList")
157
+ # fr.lium.spkDiarization.programs.MTrainMAP.make(features, @clusters, init_vect, gmm_vect, parameter)
158
+ Rjb::import("fr.lium.spkDiarization.programs.MTrainMAP").make(features, @clusters.java_object, init_vect.java_object, gmm_vect.java_object, parameter)
159
+
160
+ # Populating the speakers with their GMMs
161
+ gmm_vect.each_with_index do |speaker_model, i|
162
+ speakers[i].model = speaker_model
163
+ end
164
+ end
165
+
166
+ def ester2(parameter)
167
+ # diarization = fr.lium.spkDiarization.system.Diarization.new
168
+ diarization = Rjb::import('fr.lium.spkDiarization.system.Diarization').new
169
+ parameterDiarization = parameter.parameterDiarization
170
+ # clusterSet = diarization.initialize__method(parameter)
171
+ clusterSet = diarization.initialize(parameter)
172
+ # featureSet = fr.lium.spkDiarization.system.Diarization.load_feature(parameter, clusterSet, parameter.parameterInputFeature.getFeaturesDescString())
173
+ featureSet = Rjb::import('fr.lium.spkDiarization.system.Diarization').load_feature(parameter, clusterSet, parameter.parameterInputFeature.getFeaturesDescString())
174
+ featureSet.setCurrentShow(parameter.show)
175
+ nbFeatures = featureSet.getNumberOfFeatures
176
+ clusterSet.getFirstCluster().firstSegment().setLength(nbFeatures) unless parameter.parameterDiarization.isLoadInputSegmentation
177
+ clustersSegInit = diarization.sanityCheck(clusterSet, featureSet, parameter)
178
+ clustersSeg = diarization.segmentation("GLR", "FULL", clustersSegInit, featureSet, parameter)
179
+ clustersLClust = diarization.clusteringLinear(parameterDiarization.getThreshold("l"), clustersSeg, featureSet, parameter)
180
+ clustersHClust = diarization.clustering(parameterDiarization.getThreshold("h"), clustersLClust, featureSet, parameter)
181
+ clustersDClust = diarization.decode(8, parameterDiarization.getThreshold("d"), clustersHClust, featureSet, parameter)
182
+ clustersSplitClust = diarization.speech("10,10,50", clusterSet, clustersSegInit, clustersDClust, featureSet, parameter)
183
+ clusters = diarization.gender(clusterSet, clustersSplitClust, featureSet, parameter)
184
+ if parameter.parameterDiarization.isCEClustering
185
+ # If true, the program computes the NCLR/CE clustering at the end.
186
+ # The diarization error rate is minimized.
187
+ # If this option is not set, the program stops right after the detection of the gender
188
+ # and the resulting segmentation is sufficient for a transcription system.
189
+ clusters = diarization.speakerClustering(parameterDiarization.getThreshold("c"), "ce", clusterSet, clusters, featureSet, parameter)
190
+ end
191
+ Rjb::JavaObjectWrapper.new(clusters)
192
+ end
193
+
194
+ end
195
+
196
+ end
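`duration_by_speaker` and `top_speakers` above rescan the segment list for every speaker during the sort. A possible alternative, sketched below under the assumption that a segment only needs a `speaker` and a `duration`, is to total the durations in a single pass and then rank the speakers. The `Segment` struct and both function names are illustrative stand-ins, not the gem's classes.

```ruby
# Minimal stand-in for a diarization segment (hypothetical).
Segment = Struct.new(:speaker, :duration)

# Total speaking time per speaker, computed in one pass over the segments.
def total_durations(segments)
  totals = Hash.new(0.0)
  segments.each { |segment| totals[segment.speaker] += segment.duration }
  totals
end

# Speakers ordered from most to least total speaking time.
def rank_speakers(segments)
  totals = total_durations(segments)
  totals.keys.sort_by { |speaker| -totals[speaker] }
end
```

For example, with segments of 2.5 s and 3.0 s for one speaker and 4.0 s for another, the first speaker ranks higher because the totals are 5.5 s against 4.0 s.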