diarize-ruby 0.3.0

data/README.md ADDED
@@ -0,0 +1,109 @@
1
+ # diarize-ruby
2
+
3
+ This library provides an easy-to-use toolkit for speaker
4
+ segmentation (diarization) and identification from audio.
5
+
6
+ This library is being used within the BBC R&D World Service
7
+ archive prototype.
8
+
9
+ See http://worldservice.prototyping.bbc.co.uk/programmes/X0403940 for
10
+ an example.
11
+
12
+
13
+ ## Speaker diarization
14
+
15
+ This library gives access to the algorithm developed by the LIUM
16
+ for the ESTER 2 evaluation campaign and described in [Meigner2010].
17
+
18
+ It wraps a binary JAR file compiled from
19
+ http://lium3.univ-lemans.fr/diarization/doku.php/welcome.
20
+
21
+
22
+ ## Speaker identification
23
+
24
+ This library also implements an algorithm for speaker identification
25
+ based on the comparison of normalised speaker models, which can be
26
+ accessed through the Speaker#match method.
27
+
28
+ This algorithm builds on top of the LIUM toolkit and uses the following
29
+ techniques:
30
+
31
+ * "M-Norm" normalisation of speaker models [Ben2003]
32
+ * The symmetric Kullback-Leibler divergence approximation described in [Do2003]
33
+ * The detection score specified in [Ben2005]
34
+
35
+ It also includes support for speaker supervectors [Campbell2006], which
36
+ can be used in combination with our ruby-lsh library for fast speaker
37
+ identification.
38
+
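To make the idea of comparing speaker models concrete, here is a minimal sketch of a symmetric Kullback-Leibler divergence between two Gaussians with diagonal covariance. It is only an illustration under simplifying assumptions: it uses the closed form for single Gaussians, not the [Do2003] approximation for mixtures, and it is not the code behind `Speaker#match`. The names `DiagonalGaussian`, `kl_divergence` and `symmetric_kl` are hypothetical.

```ruby
# Illustrative only: closed-form KL divergence for single Gaussians with
# diagonal covariance. Not the [Do2003] mixture approximation and not the
# library's implementation. All names here are hypothetical.
DiagonalGaussian = Struct.new(:means, :variances)

# KL(from || to), summed over independent dimensions.
def kl_divergence(from, to)
  from.means.each_index.sum do |i|
    diff = from.means[i] - to.means[i]
    0.5 * (Math.log(to.variances[i] / from.variances[i]) +
           (from.variances[i] + diff * diff) / to.variances[i] - 1.0)
  end
end

# Symmetrised divergence: the same value whichever model comes first.
def symmetric_kl(first, second)
  kl_divergence(first, second) + kl_divergence(second, first)
end

a = DiagonalGaussian.new([0.0, 0.0], [1.0, 1.0])
b = DiagonalGaussian.new([1.0, 0.0], [1.0, 1.0])
puts symmetric_kl(a, a)
puts symmetric_kl(a, b)
```

A smaller value means the two models are closer. A real system would compare full mixture models after normalisation, as the techniques listed above describe.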
39
+
40
+ ## Example use
41
+
42
+ $ gem install diarize-ruby
43
+ $ irb
44
+ > require 'diarize'
45
+ > audio = Diarize::Audio.new URI('http://example.com/file.wav')
46
+ > audio = Diarize::Audio.new URI.join('file:///', '/Users/juergen/work/ruby/diarize-ruby/test/data/will-and-juergen.wav')
47
+ > audio.analyze!
48
+ > audio.segments
49
+ > audio.speakers
50
+ > audio.to_rdf
51
+ > speakers = audio.speakers
52
+ > speakers.first.gender
53
+ > speakers.first.model.mean_log_likelihood
54
+ > speakers.first.model.components.size
55
+ > audio.segments_by_speaker(speakers.first)[0].play
56
+ > audio.segments_by_speaker(speakers.first)[1].play
57
+ > ...
58
+ > speakers |= other_speakers
59
+ > Diarize::Speaker.match(speakers)
60
+
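The example above passes a `file://` URI for local audio. A small helper can build such a URI from a relative or absolute path. This is a sketch, assuming POSIX-style paths without spaces or other characters that need URI escaping; `file_uri` is a hypothetical helper and not part of diarize-ruby.

```ruby
require 'uri'

# Hypothetical helper (not part of diarize-ruby): turn a local path into a
# file:// URI. Assumes a POSIX-style path that needs no URI escaping.
def file_uri(path)
  URI.join('file:///', File.expand_path(path))
end

uri = file_uri('test/data/will-and-juergen.wav')
puts uri.scheme # the scheme a local-file check would look at
puts uri.path   # the absolute path on disk
```

The resulting object could then be handed to `Diarize::Audio.new` in place of the hard-coded `URI.join` call shown in the session above.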
61
+
62
+ ## Running tests
63
+
64
+ $ rake
65
+
66
+
67
+ ## References
68
+
69
+ [Meigner2010] S. Meignier and T. Merlin, "LIUM SpkDiarization:
70
+ An Open Source Toolkit For Diarization" in Proc. CMU SPUD Workshop,
71
+ March 2010, Dallas (Texas, USA)
72
+
73
+ [Ben2003] M. Ben and F. Bimbot, "D-MAP: A Distance-Normalized MAP
74
+ Estimation of Speaker Models for Automatic Speaker Verification",
75
+ Proceedings of ICASSP, 2003
76
+
77
+ [Do2003] M. N. Do, "Fast Approximation of Kullback-Leibler Distance
78
+ for Dependence Trees and Hidden Markov Models",
79
+ IEEE Signal Processing Letters, April 2003
80
+
81
+ [Ben2005] M. Ben, G. Gravier and F. Bimbot, "A model space
82
+ framework for efficient speaker detection",
83
+ Proceedings of INTERSPEECH, 2005
84
+
85
+ [Campbell2006] W. M. Campbell, D. E. Sturim and D. A. Reynolds,
86
+ "Support vector machines using GMM supervectors for speaker verification",
87
+ IEEE Signal Processing Letters, 2006, 13, 308-311
88
+
89
+
90
+ ## Licensing terms and authorship
91
+
92
+ See 'LICENSE' and 'AUTHORS' files.
93
+
94
+ All code here, except where otherwise indicated, is licensed under
95
+ the GNU Affero General Public License version 3. This license includes
96
+ many restrictions. If this causes a problem, please contact us.
97
+ See "AUTHORS" for contact details.
98
+
99
+ This library includes a binary JAR file from the LIUM project, whose code
100
+ is licensed under the GNU General Public License version 2. See
101
+ http://lium3.univ-lemans.fr/diarization/doku.php/licence for more
102
+ information.
103
+
104
+
105
+ ## Developer Resources
106
+
107
+ * [Connecting Ruby to Java and vice versa](http://nofail.de/2010/04/ruby-in-java-java-in-ruby-jruby-or-ruby-java-bridge/)
108
+ * [LIUM scripts](https://github.com/StevenLOL/LIUM/blob/master/ilp_diarization2.sh)
109
+ * [Speaker Identification for the whole World Service Archive](http://www.bbc.co.uk/rd/blog/2014-01-speaker-identification-for-the-whole-world-service-archive)
data/Rakefile ADDED
@@ -0,0 +1,11 @@
1
+ require 'rake/testtask'
2
+
3
+ task :default => [:test]
4
+
5
+ desc "Run tests"
6
+ Rake::TestTask.new do |t|
7
+ t.libs << "lib"
8
+ t.libs << "test"
9
+ t.test_files = FileList['test/*_test.rb']
10
+ t.verbose = true
11
+ end
@@ -0,0 +1,31 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'diarize/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "diarize-ruby"
8
+ spec.version = Diarize::VERSION
9
+ spec.date = "2016-07-09"
10
+ spec.authors = ['Yves Raimond', 'Juergen Fesslmeier']
11
+ spec.summary = "Speaker Diarization for Ruby"
12
+ spec.email = ["jfesslmeier@gmail.com"]
13
+ spec.homepage = "https://github.com/chinshr/diarize-ruby"
14
+ spec.description = "A library for Ruby wrapping the LIUM Speaker Diarization and including a few extra tools"
15
+ spec.has_rdoc = false
16
+ spec.license = "GNU Affero General Public License version 3"
17
+
18
+ spec.files = `git ls-files -z`.split("\x0")
19
+ spec.bindir = 'bin'
20
+ spec.executables = []
21
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
22
+ spec.require_paths = ["lib"]
23
+
24
+ spec.add_development_dependency "test-unit", "~> 3.0"
25
+ spec.add_development_dependency "mocha", "~> 1.1"
26
+ spec.add_development_dependency "webmock", "~> 2.1"
27
+
28
+ spec.add_dependency "rjb", "~> 1.5"
29
+ spec.add_dependency "to-rdf", "~> 0"
30
+ spec.add_dependency "jblas-ruby", "~> 1.1"
31
+ end
data/lib/diarize.rb ADDED
@@ -0,0 +1,117 @@
1
+ require "rjb"
2
+
3
+ RJB_LOAD_PATH = [File.join(File.expand_path(File.dirname(__FILE__)), 'diarize', 'LIUM_SpkDiarization-4.2.jar')].join(File::PATH_SEPARATOR)
4
+ RJB_OPTIONS = ['-Xms16m', '-Xmx1024m']
5
+
6
+ Rjb::load(RJB_LOAD_PATH, RJB_OPTIONS)
7
+
8
+ require "set"
+ require "matrix"
9
+ require "diarize/version"
10
+ require "diarize/lium"
11
+ require "diarize/audio"
12
+ require "diarize/segment"
13
+ require "diarize/segmentation"
14
+ require "diarize/audio_player"
15
+ require "diarize/super_vector"
16
+
17
+ # Extensions to the {Ruby-Java Bridge}[http://rjb.rubyforge.org/] module that
18
+ # add a generic Java object wrapper class.
19
+ module Rjb
20
+ # A generic wrapper for a Java object loaded via the Ruby Java Bridge. The
21
+ # wrapper class handles initialization and stringification, and passes other
22
+ # method calls down to the underlying Java object. Objects returned by the
23
+ # underlying Java object are converted to the appropriate Ruby object.
24
+ #
25
+ # This object is enumerable, yielding items in the order defined by the Java
26
+ # object's iterator.
27
+ class JavaObjectWrapper
28
+ include Enumerable
29
+
30
+ # The underlying Java object.
31
+ attr_reader :java_object
32
+
33
+ # Initialize with a Java object <em>obj</em>. If <em>obj</em> is a
34
+ # String, assume it is a Java class name and instantiate it. Otherwise,
35
+ # treat <em>obj</em> as an instance of a Java object.
36
+ def initialize(obj, *args)
37
+ @java_object = obj.class == String ?
38
+ Rjb::import(obj).send(:new, *args) : obj
39
+ end
40
+
41
+ # Enumerate all the items in the object using its iterator. If the object
42
+ # has no iterator, this function yields nothing.
43
+ def each
44
+ if @java_object.getClass.getMethods.any? {|m| m.getName == "iterator"}
45
+ i = @java_object.iterator
46
+ while i.hasNext
47
+ yield wrap_java_object(i.next)
48
+ end
49
+ end
50
+ end # each
51
+
52
+ # Reflect unhandled method calls to the underlying Java object.
53
+ def method_missing(m, *args)
54
+ wrap_java_object(@java_object.send(m, *args))
55
+ end
56
+
57
+ # Convert a value returned by a call to the underlying Java object to the
58
+ # appropriate Ruby object as follows:
59
+ # * RJB objects are placed inside a generic JavaObjectWrapper wrapper.
60
+ # * <tt>java.util.ArrayList</tt> objects are converted to Ruby Arrays.
61
+ # * <tt>java.util.HashSet</tt> objects are converted to Ruby Sets
62
+ # * Other objects are left unchanged.
63
+ #
64
+ # This function is applied recursively to items in collection objects such
65
+ # as sets and arrays.
66
+ def wrap_java_object(object)
67
+ if object.kind_of?(Array)
68
+ object.collect {|item| wrap_java_object(item)}
69
+ # Ruby-Java Bridge Java objects all have a _classname member which tells
70
+ # the name of their Java class.
71
+ elsif object.respond_to?(:_classname)
72
+ case object._classname
73
+ when /java\.util\.ArrayList/
74
+ # Convert java.util.ArrayList objects to Ruby arrays.
75
+ array_list = []
76
+ object.size.times do |i|
77
+ array_list << wrap_java_object(object.get(i))
78
+ end
79
+ array_list
80
+ when /java\.util\.HashSet/
81
+ # Convert java.util.HashSet objects to Ruby sets.
82
+ set = Set.new
83
+ i = object.iterator
84
+ while i.hasNext
85
+ set << wrap_java_object(i.next)
86
+ end
87
+ set
88
+ else
89
+ # Pass other RJB objects off to a handler.
90
+ wrap_rjb_object(object)
91
+ end # case
92
+ else
93
+ # Return non-RJB objects unchanged.
94
+ object
95
+ end # if
96
+ end # wrap_java_object
97
+
98
+ # By default, all RJB classes other than <tt>java.util.ArrayList</tt> and
99
+ # <tt>java.util.HashSet</tt> go in a generic wrapper. Derived classes may
100
+ # change this behavior.
101
+ def wrap_rjb_object(object)
102
+ JavaObjectWrapper.new(object)
103
+ end
104
+
105
+ # Show the classname of the underlying Java object.
106
+ def inspect
107
+ "<#{@java_object._classname}>"
108
+ end
109
+
110
+ # Use the underlying Java object's stringification.
111
+ def to_s
112
+ toString
113
+ end
114
+
115
+ protected :wrap_java_object, :wrap_rjb_object
116
+ end # JavaObjectWrapper
117
+ end # Rjb
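The wrapper above forwards unknown method calls to the underlying object through `method_missing`. The sketch below shows the same delegation pattern in isolation, with a plain Ruby array standing in for the Java object so it runs without rjb. `DelegatingWrapper` is a hypothetical name, and unlike `JavaObjectWrapper` it does not convert return values. It also defines `respond_to_missing?`, which the original class does not, so that `respond_to?` agrees with what the wrapper can actually answer.

```ruby
# Sketch of the delegation pattern only. DelegatingWrapper is hypothetical
# and wraps a plain Ruby object instead of an rjb proxy.
class DelegatingWrapper
  attr_reader :target

  def initialize(target)
    @target = target
  end

  # Forward unknown calls to the wrapped object when it can answer them;
  # otherwise fall back to the default NoMethodError.
  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)
    @target.public_send(name, *args, &block)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

wrapped = DelegatingWrapper.new([3, 1, 2])
puts wrapped.sort.inspect
puts wrapped.respond_to?(:size)
```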
@@ -0,0 +1,196 @@
1
+ require File.join(File.expand_path(File.dirname(__FILE__)), 'lium')
2
+ require File.join(File.expand_path(File.dirname(__FILE__)), 'segmentation')
3
+ require File.join(File.expand_path(File.dirname(__FILE__)), 'speaker')
4
+
5
+ require 'rubygems'
6
+ require 'to_rdf'
7
+ require 'uri'
8
+ require 'open-uri'
9
+ require 'digest/md5'
10
+
11
+ module Diarize
12
+
13
+ class Audio
14
+
15
+ attr_reader :path, :file, :uri
16
+
17
+ def initialize(url_or_uri)
18
+ @uri = url_or_uri.is_a?(String) ? URI(url_or_uri) : url_or_uri
19
+ if uri.scheme == 'file'
20
+ # Local file
21
+ @path = uri.path
22
+ else
23
+ # Remote file, we get it locally
24
+ @path = '/tmp/' + Digest::MD5.hexdigest(uri.to_s)
25
+ File.open(@path, "wb") {|f| f << uri.open(&:read) }
26
+ end
27
+
28
+ if !File.exist?(@path)
29
+ raise "Unable to locate: #{@path}. Check that the file is available at #{uri.inspect}."
30
+ end
31
+
32
+ @file = File.new @path
33
+ end
34
+
35
+ def analyze!(train_speaker_models = true)
36
+ # parameter = fr.lium.spkDiarization.parameter.Parameter.new
37
+ parameter = Rjb::import('fr.lium.spkDiarization.parameter.Parameter').new
38
+ parameter.show = show
39
+ # 12 MFCC + Energy
40
+ # 1: static coefficients are present in the file
41
+ # 1: energy coefficient is present in the file
42
+ # 0: delta coefficients are not present in the file
43
+ # 0: delta energy coefficient is not present in the file
44
+ # 0: delta delta coefficients are not present in the file
45
+ # 0: delta delta energy coefficient is not present in the file
46
+ # 13: total size of a feature vector in the mfcc file
47
+ # 0:0:0: no feature normalization
48
+ parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:1:0:0:0:0,13,0:0:0:0')
49
+ #parameter.parameterDiarization.cEClustering = true # We use CE clustering by default
50
+ parameter.parameterInputFeature.setFeatureMask(@path)
51
+ @clusters = ester2(parameter)
52
+ @segments = Segmentation.from_clusters(self, @clusters)
53
+ train_speaker_gmms if train_speaker_models
54
+ end
55
+
56
+ def clean!
57
+ return if @uri.scheme == 'file' # Don't delete local file if initialised from local URI
58
+ File.delete(@path)
59
+ end
60
+
61
+ def segments
62
+ raise Exception.new('You need to run analyze! before being able to access the analysis results') unless @segments
63
+ @segments
64
+ end
65
+
66
+ def speakers
67
+ return @speakers if @speakers
68
+ @speakers = segments.map { |segment| segment.speaker }.uniq
69
+ end
70
+
71
+ def segments_by_speaker(speaker)
72
+ segments.select { |segment| segment.speaker == speaker }
73
+ end
74
+
75
+ def duration_by_speaker(speaker)
76
+ return unless speaker
77
+ segments = segments_by_speaker(speaker)
78
+ duration = 0.0
79
+ segments.each { |segment| duration += segment.duration }
80
+ duration
81
+ end
82
+
83
+ def top_speakers
84
+ speakers.sort {|s1, s2| duration_by_speaker(s1) <=> duration_by_speaker(s2)}.reverse
85
+ end
86
+
87
+ include ToRdf
88
+
89
+ def namespaces
90
+ super.merge 'ws' => 'http://wsarchive.prototype0.net/ontology/', 'mo' => 'http://purl.org/ontology/mo/'
91
+ end
92
+
93
+ def uri
94
+ @uri
95
+ end
96
+
97
+ def uri=(uri)
98
+ @uri = uri
99
+ end
100
+
101
+ def base_uri
102
+ # Remove the fragment if there is one
103
+ base = uri.clone
104
+ base.fragment = nil
105
+ base
106
+ end
107
+
108
+ def type_uri
109
+ @type_uri || 'mo:AudioFile'
110
+ end
111
+
112
+ def type_uri=(type_uri)
113
+ @type_uri = type_uri
114
+ end
115
+
116
+ def rdf_mapping
117
+ { 'ws:segment' => segments }
118
+ end
119
+
120
+ def show
121
+ # The LIUM show name will be the file name, without extension or directory
122
+ File.expand_path(@path).split('/')[-1].split('.')[0]
123
+ end
124
+
125
+ protected
126
+
127
+ def train_speaker_gmms
128
+ segments # Making sure we have pre-computed segments and clusters
129
+ # Would be nice to reuse GMMs computed as part of the segmentation process
130
+ # but not sure how to access them without changing the Java API
131
+
132
+ # Start by copying models from the universal background model, one per speaker, using MTrainInit
133
+ # parameter = fr.lium.spkDiarization.parameter.Parameter.new
134
+ parameter = Rjb::import("fr.lium.spkDiarization.parameter.Parameter").new
135
+ parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:3:2:0:0:0,13,1:1:300:4')
136
+ parameter.parameterInputFeature.setFeatureMask(@path)
137
+ parameter.parameterInitializationEM.setModelInitMethod('copy')
138
+ parameter.parameterModelSetInputFile.setMask(File.join(File.expand_path(File.dirname(__FILE__)), 'ubm.gmm'))
139
+ # features = fr.lium.spkDiarization.lib.MainTools.readFeatureSet(parameter, @clusters)
140
+ features = Rjb::import("fr.lium.spkDiarization.lib.MainTools").readFeatureSet(parameter, @clusters.java_object)
141
+ # init_vect = java.util.ArrayList.new(@clusters.cluster_get_size)
142
+ init_vect = Rjb::JavaObjectWrapper.new("java.util.ArrayList", @clusters.java_object.cluster_get_size)
143
+ # fr.lium.spkDiarization.programs.MTrainInit.make(features, @clusters, init_vect, parameter)
144
+ Rjb::import("fr.lium.spkDiarization.programs.MTrainInit").make(features, @clusters.java_object, init_vect.java_object, parameter)
145
+
146
+ # Adapt models to individual speakers detected in the audio, using MTrainMap
147
+ # parameter = fr.lium.spkDiarization.parameter.Parameter.new
148
+ parameter = Rjb::import("fr.lium.spkDiarization.parameter.Parameter").new
149
+ parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:3:2:0:0:0,13,1:1:300:4')
150
+ parameter.parameterInputFeature.setFeatureMask(@path)
151
+ parameter.parameterEM.setEMControl('1,5,0.01')
152
+ parameter.parameterVarianceControl.setVarianceControl('0.01,10.0')
153
+ parameter.show = show
154
+ features.setCurrentShow(parameter.show)
155
+ # gmm_vect = java.util.ArrayList.new
156
+ gmm_vect = Rjb::JavaObjectWrapper.new("java.util.ArrayList")
157
+ # fr.lium.spkDiarization.programs.MTrainMAP.make(features, @clusters, init_vect, gmm_vect, parameter)
158
+ Rjb::import("fr.lium.spkDiarization.programs.MTrainMAP").make(features, @clusters.java_object, init_vect.java_object, gmm_vect.java_object, parameter)
159
+
160
+ # Populating the speakers with their GMMs
161
+ gmm_vect.each_with_index do |speaker_model, i|
162
+ speakers[i].model = speaker_model
163
+ end
164
+ end
165
+
166
+ def ester2(parameter)
167
+ # diarization = fr.lium.spkDiarization.system.Diarization.new
168
+ diarization = Rjb::import('fr.lium.spkDiarization.system.Diarization').new
169
+ parameterDiarization = parameter.parameterDiarization
170
+ # clusterSet = diarization.initialize__method(parameter)
171
+ clusterSet = diarization.initialize(parameter)
172
+ # featureSet = fr.lium.spkDiarization.system.Diarization.load_feature(parameter, clusterSet, parameter.parameterInputFeature.getFeaturesDescString())
173
+ featureSet = Rjb::import('fr.lium.spkDiarization.system.Diarization').load_feature(parameter, clusterSet, parameter.parameterInputFeature.getFeaturesDescString())
174
+ featureSet.setCurrentShow(parameter.show)
175
+ nbFeatures = featureSet.getNumberOfFeatures
176
+ clusterSet.getFirstCluster().firstSegment().setLength(nbFeatures) unless parameter.parameterDiarization.isLoadInputSegmentation
177
+ clustersSegInit = diarization.sanityCheck(clusterSet, featureSet, parameter)
178
+ clustersSeg = diarization.segmentation("GLR", "FULL", clustersSegInit, featureSet, parameter)
179
+ clustersLClust = diarization.clusteringLinear(parameterDiarization.getThreshold("l"), clustersSeg, featureSet, parameter)
180
+ clustersHClust = diarization.clustering(parameterDiarization.getThreshold("h"), clustersLClust, featureSet, parameter)
181
+ clustersDClust = diarization.decode(8, parameterDiarization.getThreshold("d"), clustersHClust, featureSet, parameter)
182
+ clustersSplitClust = diarization.speech("10,10,50", clusterSet, clustersSegInit, clustersDClust, featureSet, parameter)
183
+ clusters = diarization.gender(clusterSet, clustersSplitClust, featureSet, parameter)
184
+ if parameter.parameterDiarization.isCEClustering
185
+ # If true, the program computes the NCLR/CE clustering at the end.
186
+ # The diarization error rate is minimized.
187
+ # If this option is not set, the program stops right after the detection of the gender
188
+ # and the resulting segmentation is sufficient for a transcription system.
189
+ clusters = diarization.speakerClustering(parameterDiarization.getThreshold("c"), "ce", clusterSet, clusters, featureSet, parameter)
190
+ end
191
+ Rjb::JavaObjectWrapper.new(clusters)
192
+ end
193
+
194
+ end
195
+
196
+ end
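The `duration_by_speaker` and `top_speakers` methods above rank speakers by total talk time. The following standalone sketch shows that aggregation on plain data, without the Java toolkit. `Segment` here is a hypothetical `Struct` with only a speaker label and a duration in seconds, not the library's `Diarize::Segment`, and the totals are computed in a single pass rather than once per speaker.

```ruby
# Hypothetical stand-in for a diarization segment: a speaker label and a
# duration in seconds.
Segment = Struct.new(:speaker, :duration)

# Sum segment durations per speaker in one pass.
def duration_by_speaker(segments)
  segments.each_with_object(Hash.new(0.0)) do |segment, totals|
    totals[segment.speaker] += segment.duration
  end
end

# Speakers ordered from most to least total talk time.
def top_speakers(segments)
  duration_by_speaker(segments).sort_by { |_speaker, total| -total }.map(&:first)
end

segments = [
  Segment.new('S0', 4.0),
  Segment.new('S1', 10.5),
  Segment.new('S0', 2.5)
]
puts top_speakers(segments).inspect
```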