RubyGems - diarize-ruby - Versions diffs - 0.3.0 - Mend

diarize-ruby 0.3.0

Files changed (31)

checksums.yaml +7 -0
data/.gitignore +26 -0
data/.ruby-gemset +1 -0
data/.ruby-version +1 -0
data/AUTHORS +12 -0
data/Gemfile +4 -0
data/LICENSE +678 -0
data/README.md +109 -0
data/Rakefile +11 -0
data/diarize-ruby.gemspec +31 -0
data/lib/diarize.rb +117 -0
data/lib/diarize/LIUM_SpkDiarization-4.2.jar +0 -0
data/lib/diarize/audio.rb +196 -0
data/lib/diarize/audio_player.rb +24 -0
data/lib/diarize/lium.rb +5 -0
data/lib/diarize/segment.rb +58 -0
data/lib/diarize/segmentation.rb +37 -0
data/lib/diarize/speaker.rb +174 -0
data/lib/diarize/super_vector.rb +77 -0
data/lib/diarize/ubm.gmm +0 -0
data/lib/diarize/version.rb +3 -0
data/test/audio_test.rb +107 -0
data/test/data/foo.wav +0 -0
data/test/data/speaker1.gmm +0 -0
data/test/data/will-and-juergen.wav +0 -0
data/test/segment_test.rb +29 -0
data/test/segmentation_test.rb +39 -0
data/test/speaker_test.rb +101 -0
data/test/super_vector_test.rb +24 -0
data/test/test_helper.rb +23 -0
metadata +168 -0

data/README.md ADDED

@@ -0,0 +1,109 @@
+# diarize-ruby
+This library provides an easy-to-use toolkit for speaker
+segmentation (diarization) and identification from audio.
+This library is being used within the BBC R&D World Service
+archive prototype.
+See http://worldservice.prototyping.bbc.co.uk/programmes/X0403940 for
+an example.
+## Speaker diarization
+This library gives access to the algorithm developed by the LIUM
+for the ESTER 2 evaluation campaign and described in [Meigner2010].
+It wraps a binary JAR file compiled from
+http://lium3.univ-lemans.fr/diarization/doku.php/welcome.
+## Speaker identification
+This library also implements an algorithm for speaker identification
+based on the comparison of normalised speaker models, which can be
+accessed through the Speaker#match method.
+This algorithm builds on top of the LIUM toolkit and uses the following
+techniques:
+ * "M-Norm" normalisation of speaker models [Ben2003]
+ * The symmetric Kullback-Leibler divergence approximation described in [Do2003]
+ * The detection score specified in [Ben2005]
+It also includes support for speaker supervectors [Campbell2006], which
+can be used in combination with our ruby-lsh library for fast speaker
+identification.
+## Example use
+    $ gem install diarize-ruby
+    $ irb
+    > require 'diarize'
+    > audio = Diarize::Audio.new URI('http://example.com/file.wav')
+    > audio = Diarize::Audio.new URI.join('file:///', '/Users/juergen/work/ruby/diarize-ruby/test/data/will-and-juergen.wav')
+    > audio.analyze!
+    > audio.segments
+    > audio.speakers
+    > audio.to_rdf
+    > speakers = audio.speakers
+    > speakers.first.gender
+    > speakers.first.model.mean_log_likelihood
+    > speakers.first.model.components.size
+    > audio.segments_by_speaker(speakers.first)[0].play
+    > audio.segments_by_speaker(speakers.first)[1].play
+    > ...
+    > speakers |= other_speakers
+    > Diarize::Speaker.match(speakers)
+## Running tests
+    $ rake
+## References
+[Meigner2010] S. Meignier and T. Merlin, "LIUM SpkDiarization:
+An Open Source Toolkit For Diarization" in Proc. CMU SPUD Workshop,
+March 2010, Dallas (Texas, USA)
+[Ben2003] M. Ben and F. Bimbot, "D-MAP: A Distance-Normalized MAP
+Estimation of Speaker Models for Automatic Speaker Verification",
+Proceedings of ICASSP, 2003
+[Do2003] M. N. Do, "Fast Approximation of Kullback-Leibler Distance
+for Dependence Trees and Hidden Markov Models",
+IEEE Signal Processing Letters, April 2003
+[Ben2005] M. Ben, G. Gravier and F. Bimbot, "A model space
+framework for efficient speaker detection",
+Proceedings of INTERSPEECH, 2005
+[Campbell2006] W. M. Campbell, D. E. Sturim and D. A. Reynolds,
+"Support vector machines using GMM supervectors for speaker verification",
+IEEE Signal Processing Letters, 2006, 13, 308-311
+## Licensing terms and authorship
+See 'LICENSE' and 'AUTHORS' files.
+All code here, except where otherwise indicated, is licensed under
+the GNU Affero General Public License version 3. This license includes
+many restrictions. If this causes a problem, please contact us.
+See "AUTHORS" for contact details.
+This library includes a binary JAR file from the LIUM project, whose code
+is licensed under the GNU General Public License version 2. See
+http://lium3.univ-lemans.fr/diarization/doku.php/licence for more
+information.
+## Developer Resources
+* [Connecting Ruby to Java and vice versa](http://nofail.de/2010/04/ruby-in-java-java-in-ruby-jruby-or-ruby-java-bridge/)
+* [LIUM scripts](https://github.com/StevenLOL/LIUM/blob/master/ilp_diarization2.sh)
+* [Speaker Identification for the whole World Service Archive](http://www.bbc.co.uk/rd/blog/2014-01-speaker-identification-for-the-whole-world-service-archive)
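
The speaker identification section of the README above names a symmetric Kullback-Leibler divergence as the basis for comparing speaker models. As a rough illustration of that underlying quantity only, the sketch below computes the closed-form symmetric KL divergence between two single Gaussians with diagonal covariance. It is not the gem's implementation: the [Do2003] approximation for mixtures, the M-Norm normalisation and the [Ben2005] detection score are not reproduced, and `DiagonalGaussian` and `symmetric_kl` are hypothetical names introduced here.

```ruby
# Hypothetical stand-in for a single Gaussian component:
# per-dimension means and variances (diagonal covariance).
DiagonalGaussian = Struct.new(:means, :variances)

# Closed-form symmetric KL divergence, KL(a||b) + KL(b||a), for two
# diagonal-covariance Gaussians of the same dimension.
def symmetric_kl(lhs, rhs)
  raise ArgumentError, 'dimension mismatch' unless lhs.means.size == rhs.means.size

  total = 0.0
  lhs.means.each_index do |d|
    diff_sq = (lhs.means[d] - rhs.means[d])**2
    vl = lhs.variances[d].to_f
    vr = rhs.variances[d].to_f
    # Variance-ratio term plus mean-shift term for this dimension.
    total += vl / vr + vr / vl - 2.0 + diff_sq * (1.0 / vl + 1.0 / vr)
  end
  0.5 * total
end

a = DiagonalGaussian.new([0.0, 1.0], [1.0, 2.0])
b = DiagonalGaussian.new([0.5, 1.0], [1.0, 0.5])

puts symmetric_kl(a, a)                        # identical models: zero divergence
puts symmetric_kl(a, b) == symmetric_kl(b, a)  # symmetric by construction
```

A smaller value means the two models are closer, which is why a symmetric divergence is a convenient building block for a matching score.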

data/Rakefile ADDED

@@ -0,0 +1,11 @@
+require 'rake/testtask'
+task :default => [:test]
+desc "Run tests"
+Rake::TestTask.new do |t|
+  t.libs << "lib"
+  t.libs << "test"
+  t.test_files = FileList['test/*_test.rb']
+  t.verbose = true
+end

data/diarize-ruby.gemspec ADDED

@@ -0,0 +1,31 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'diarize/version'
+Gem::Specification.new do |spec|
+  spec.name          = "diarize-ruby"
+  spec.version       = Diarize::VERSION
+  spec.date          = "2016-07-09"
+  spec.authors       = ['Yves Raimond', 'Juergen Fesslmeier']
+  spec.summary       = "Speaker Diarization for Ruby"
+  spec.email         = ["jfesslmeier@gmail.com"]
+  spec.homepage      = "https://github.com/chinshr/diarize-ruby"
+  spec.description   = "A library for Ruby wrapping the LIUM Speaker Diarization and including a few extra tools"
+  spec.has_rdoc      = false
+  spec.license       = "GNU Affero General Public License version 3"
+  spec.files         = `git ls-files -z`.split("\x0")
+  spec.bindir        = 'bin'
+  spec.executables   = []
+  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+  spec.add_development_dependency "test-unit", "~> 3.0"
+  spec.add_development_dependency "mocha", "~> 1.1"
+  spec.add_development_dependency "webmock", "~> 2.1"
+  spec.add_dependency "rjb", "~> 1.5"
+  spec.add_dependency "to-rdf", "~> 0"
+  spec.add_dependency "jblas-ruby", "~> 1.1"
+end
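
The gemspec above pins its dependencies with the pessimistic operator `~>`. As a small sketch of what those constraints admit, the snippet below checks a few candidate versions with RubyGems' own `Gem::Requirement`. The candidate version numbers are made up for illustration and `allowed?` is a hypothetical helper, not part of the gem.

```ruby
require 'rubygems' # provides Gem::Requirement and Gem::Version

# Runtime constraints copied from the gemspec above.
CONSTRAINTS = {
  'rjb'        => '~> 1.5',
  'to-rdf'     => '~> 0',
  'jblas-ruby' => '~> 1.1'
}.freeze

# True when +version+ satisfies the requirement string +constraint+.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# Illustrative candidate versions (not necessarily real releases).
puts allowed?(CONSTRAINTS['rjb'], '1.5.4')    # within >= 1.5, < 2
puts allowed?(CONSTRAINTS['rjb'], '2.0.0')    # next major version is excluded
puts allowed?(CONSTRAINTS['to-rdf'], '0.9.2') # "~> 0" admits any 0.x release
puts allowed?(CONSTRAINTS['to-rdf'], '1.0.0') # but not 1.0 or later
```

Note that `~> 0` is a very loose bound: it accepts every release before 1.0, so a breaking 0.x release of `to-rdf` would still be picked up.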

data/lib/diarize.rb ADDED

@@ -0,0 +1,117 @@
+require "rjb"
+RJB_LOAD_PATH = [File.join(File.expand_path(File.dirname(__FILE__)), 'diarize', 'LIUM_SpkDiarization-4.2.jar')].join(File::PATH_SEPARATOR)
+RJB_OPTIONS   = ['-Xms16m', '-Xmx1024m']
+Rjb::load(RJB_LOAD_PATH, RJB_OPTIONS)
+require "matrix"
+require "diarize/version"
+require "diarize/lium"
+require "diarize/audio"
+require "diarize/segment"
+require "diarize/segmentation"
+require "diarize/audio_player"
+require "diarize/super_vector"
+# Extensions to the {Ruby-Java Bridge}[http://rjb.rubyforge.org/] module that
+# add a generic Java object wrapper class.
+module Rjb
+  # A generic wrapper for a Java object loaded via the Ruby Java Bridge.  The
+  # wrapper class handles initialization and stringification, and passes other
+  # method calls down to the underlying Java object.  Objects returned by the
+  # underlying Java object are converted to the appropriate Ruby object.
+  #
+  # This object is enumerable, yielding items in the order defined by the Java
+  # object's iterator.
+  class JavaObjectWrapper
+    include Enumerable
+    # The underlying Java object.
+    attr_reader :java_object
+    # Initialize with a Java object <em>obj</em>.  If <em>obj</em> is a
+    # String, assume it is a Java class name and instantiate it.  Otherwise,
+    # treat <em>obj</em> as an instance of a Java object.
+    def initialize(obj, *args)
+      @java_object = obj.class == String ?
+      Rjb::import(obj).send(:new, *args) : obj
+    end
+    # Enumerate all the items in the object using its iterator.  If the object
+    # has no iterator, this function yields nothing.
+    def each
+      if @java_object.getClass.getMethods.any? {|m| m.getName == "iterator"}
+        i = @java_object.iterator
+        while i.hasNext
+          yield wrap_java_object(i.next)
+        end
+      end
+    end # each
+    # Reflect unhandled method calls to the underlying Java object.
+    def method_missing(m, *args)
+      wrap_java_object(@java_object.send(m, *args))
+    end
+    # Convert a value returned by a call to the underlying Java object to the
+    # appropriate Ruby object as follows:
+    # * RJB objects are placed inside a generic JavaObjectWrapper wrapper.
+    # * <tt>java.util.ArrayList</tt> objects are converted to Ruby Arrays.
+    # * <tt>java.util.HashSet</tt> objects are converted to Ruby Sets.
+    # * Other objects are left unchanged.
+    #
+    # This function is applied recursively to items in collection objects such
+    # as sets and arrays.
+    def wrap_java_object(object)
+      if object.kind_of?(Array)
+        object.collect {|item| wrap_java_object(item)}
+      # Ruby-Java Bridge Java objects all have a _classname member which tells
+      # the name of their Java class.
+      elsif object.respond_to?(:_classname)
+        case object._classname
+        when /java\.util\.ArrayList/
+          # Convert java.util.ArrayList objects to Ruby arrays.
+          array_list = []
+          object.size.times do
+            |i| array_list << wrap_java_object(object.get(i))
+          end
+          array_list
+        when /java\.util\.HashSet/
+          # Convert java.util.HashSet objects to Ruby sets.
+          set = Set.new
+          i = object.iterator
+          while i.hasNext
+            set << wrap_java_object(i.next)
+          end
+          set
+        else
+          # Pass other RJB objects off to a handler.
+          wrap_rjb_object(object)
+        end # case
+      else
+        # Return non-RJB objects unchanged.
+        object
+      end # if
+    end # wrap_java_object
+    # By default, all RJB classes other than <tt>java.util.ArrayList</tt> and
+    # <tt>java.util.HashSet</tt> go in a generic wrapper.  Derived classes may
+    # change this behavior.
+    def wrap_rjb_object(object)
+      JavaObjectWrapper.new(object)
+    end
+    # Show the classname of the underlying Java object.
+    def inspect
+      "<#{@java_object._classname}>"
+    end
+    # Use the underlying Java object's stringification.
+    def to_s
+      toString
+    end
+    protected :wrap_java_object, :wrap_rjb_object
+  end # JavaObjectWrapper
+end # Rjb
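
The `JavaObjectWrapper` above combines two ideas: forwarding unknown calls through `method_missing`, and exposing a Java-style iterator (`hasNext` / `next`) as a Ruby `Enumerable`. The sketch below shows the same pattern in pure Ruby so it runs without a JVM. `FakeJavaList`, `FakeIterator` and `DelegatingWrapper` are hypothetical stand-ins, not classes from the gem or from RJB. Unlike the original, the sketch also defines `respond_to_missing?`, which is the usual companion to `method_missing`.

```ruby
# Hypothetical stand-in for a java.util.Iterator.
class FakeIterator
  def initialize(items)
    @items = items
    @index = 0
  end

  def hasNext
    @index < @items.size
  end

  def next
    item = @items[@index]
    @index += 1
    item
  end
end

# Hypothetical stand-in for a Java collection exposing iterator() and size().
class FakeJavaList
  def initialize(items)
    @items = items
  end

  def iterator
    FakeIterator.new(@items)
  end

  def size
    @items.size
  end
end

# Wraps a target object, enumerates it through its iterator and forwards
# every other call to it.
class DelegatingWrapper
  include Enumerable

  def initialize(target)
    @target = target
  end

  # Yields nothing when the target has no iterator, as in the original.
  def each
    return enum_for(:each) unless block_given?
    return unless @target.respond_to?(:iterator)

    i = @target.iterator
    yield i.next while i.hasNext
  end

  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

wrapped = DelegatingWrapper.new(FakeJavaList.new(%w[S0 S1 S2]))
p wrapped.map(&:downcase) # Enumerable methods come from #each
p wrapped.size            # forwarded to the wrapped object
```

Because `Enumerable` is built on `each`, methods such as `each_with_index` (used in `train_speaker_gmms` further down) become available on the wrapper for free.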

data/lib/diarize/LIUM_SpkDiarization-4.2.jar ADDED

Binary file

data/lib/diarize/audio.rb ADDED

@@ -0,0 +1,196 @@
+require File.join(File.expand_path(File.dirname(__FILE__)), 'lium')
+require File.join(File.expand_path(File.dirname(__FILE__)), 'segmentation')
+require File.join(File.expand_path(File.dirname(__FILE__)), 'speaker')
+require 'rubygems'
+require 'to_rdf'
+require 'uri'
+require 'open-uri'
+require 'digest/md5'
+module Diarize
+  class Audio
+    attr_reader :path, :file, :uri
+    def initialize(url_or_uri)
+      @uri = url_or_uri.is_a?(String) ? URI(url_or_uri) : url_or_uri
+      if uri.scheme == 'file'
+        # Local file
+        @path = uri.path
+      else
+        # Remote file: download it to a local path derived from the URI
+        @path = '/tmp/' + Digest::MD5.hexdigest(uri.to_s)
+        File.open(@path, "wb") {|f| f << uri.open.read }
+      end
+      if !File.exist?(@path)
+        raise "Unable to locate: #{@path}.  Check that the file is available at #{uri.inspect}."
+      end
+      @file = File.new @path
+    end
+    def analyze!(train_speaker_models = true)
+      # parameter = fr.lium.spkDiarization.parameter.Parameter.new
+      parameter = Rjb::import('fr.lium.spkDiarization.parameter.Parameter').new
+      parameter.show = show
+      # 12 MFCC + Energy
+      # 1: static coefficients are present in the file
+      # 1: energy coefficient is present in the file
+      # 0: delta coefficients are not present in the file
+      # 0: delta energy coefficient is not present in the file
+      # 0: delta delta coefficients are not present in the file
+      # 0: delta delta energy coefficient is not present in the file
+      # 13: total size of a feature vector in the mfcc file
+      # 0:0:0:0: no feature normalization
+      parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:1:0:0:0:0,13,0:0:0:0')
+      #parameter.parameterDiarization.cEClustering = true # We use CE clustering by default
+      parameter.parameterInputFeature.setFeatureMask(@path)
+      @clusters = ester2(parameter)
+      @segments = Segmentation.from_clusters(self, @clusters)
+      train_speaker_gmms if train_speaker_models
+    end
+    def clean!
+      return if @uri.scheme == 'file' # Don't delete local file if initialised from local URI
+      File.delete(@path)
+    end
+    def segments
+      raise Exception.new('You need to run analyze! before being able to access the analysis results') unless @segments
+      @segments
+    end
+    def speakers
+      return @speakers if @speakers
+      @speakers = segments.map { |segment| segment.speaker }.uniq
+    end
+    def segments_by_speaker(speaker)
+      segments.select { |segment| segment.speaker == speaker }
+    end
+    def duration_by_speaker(speaker)
+      return unless speaker
+      segments = segments_by_speaker(speaker)
+      duration = 0.0
+      segments.each { |segment| duration += segment.duration }
+      duration
+    end
+    def top_speakers
+      speakers.sort {|s1, s2| duration_by_speaker(s1) <=> duration_by_speaker(s2)}.reverse
+    end
+    include ToRdf
+    def namespaces
+      super.merge 'ws' => 'http://wsarchive.prototype0.net/ontology/', 'mo' => 'http://purl.org/ontology/mo/'
+    end
+    def uri
+      @uri
+    end
+    def uri=(uri)
+      @uri = uri
+    end
+    def base_uri
+      # Remove the fragment if there is one
+      base = uri.clone
+      base.fragment = nil
+      base
+    end
+    def type_uri
+      @type_uri || 'mo:AudioFile'
+    end
+    def type_uri=(type_uri)
+      @type_uri = type_uri
+    end
+    def rdf_mapping
+      { 'ws:segment' => segments }
+    end
+    def show
+      # The LIUM show name will be the file name, without extension or directory
+      File.expand_path(@path).split('/')[-1].split('.')[0]
+    end
+    protected
+    def train_speaker_gmms
+      segments # Making sure we have pre-computed segments and clusters
+      # Would be nice to reuse GMMs computed as part of the segmentation process
+      # but not sure how to access them without changing the Java API
+      # Start by copying models from the universal background model, one per speaker, using MTrainInit
+      # parameter = fr.lium.spkDiarization.parameter.Parameter.new
+      parameter = Rjb::import("fr.lium.spkDiarization.parameter.Parameter").new
+      parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:3:2:0:0:0,13,1:1:300:4')
+      parameter.parameterInputFeature.setFeatureMask(@path)
+      parameter.parameterInitializationEM.setModelInitMethod('copy')
+      parameter.parameterModelSetInputFile.setMask(File.join(File.expand_path(File.dirname(__FILE__)), 'ubm.gmm'))
+      # features = fr.lium.spkDiarization.lib.MainTools.readFeatureSet(parameter, @clusters)
+      features = Rjb::import("fr.lium.spkDiarization.lib.MainTools").readFeatureSet(parameter, @clusters.java_object)
+      # init_vect = java.util.ArrayList.new(@clusters.cluster_get_size)
+      init_vect = Rjb::JavaObjectWrapper.new("java.util.ArrayList", @clusters.java_object.cluster_get_size)
+      # fr.lium.spkDiarization.programs.MTrainInit.make(features, @clusters, init_vect, parameter)
+      Rjb::import("fr.lium.spkDiarization.programs.MTrainInit").make(features, @clusters.java_object, init_vect.java_object, parameter)
+      # Adapt models to individual speakers detected in the audio, using MTrainMap
+      # parameter = fr.lium.spkDiarization.parameter.Parameter.new
+      parameter = Rjb::import("fr.lium.spkDiarization.parameter.Parameter").new
+      parameter.parameterInputFeature.setFeaturesDescription('audio2sphinx,1:3:2:0:0:0,13,1:1:300:4')
+      parameter.parameterInputFeature.setFeatureMask(@path)
+      parameter.parameterEM.setEMControl('1,5,0.01')
+      parameter.parameterVarianceControl.setVarianceControl('0.01,10.0')
+      parameter.show = show
+      features.setCurrentShow(parameter.show)
+      # gmm_vect = java.util.ArrayList.new
+      gmm_vect = Rjb::JavaObjectWrapper.new("java.util.ArrayList")
+      # fr.lium.spkDiarization.programs.MTrainMAP.make(features, @clusters, init_vect, gmm_vect, parameter)
+      Rjb::import("fr.lium.spkDiarization.programs.MTrainMAP").make(features, @clusters.java_object, init_vect.java_object, gmm_vect.java_object, parameter)
+      # Populating the speakers with their GMMs
+      gmm_vect.each_with_index do |speaker_model, i|
+        speakers[i].model = speaker_model
+      end
+    end
+    def ester2(parameter)
+      # diarization = fr.lium.spkDiarization.system.Diarization.new
+      diarization = Rjb::import('fr.lium.spkDiarization.system.Diarization').new
+      parameterDiarization = parameter.parameterDiarization
+      # clusterSet = diarization.initialize__method(parameter)
+      clusterSet = diarization.initialize(parameter)
+      # featureSet = fr.lium.spkDiarization.system.Diarization.load_feature(parameter, clusterSet, parameter.parameterInputFeature.getFeaturesDescString())
+      featureSet = Rjb::import('fr.lium.spkDiarization.system.Diarization').load_feature(parameter, clusterSet, parameter.parameterInputFeature.getFeaturesDescString())
+      featureSet.setCurrentShow(parameter.show)
+      nbFeatures = featureSet.getNumberOfFeatures
+      clusterSet.getFirstCluster().firstSegment().setLength(nbFeatures) unless parameter.parameterDiarization.isLoadInputSegmentation
+      clustersSegInit = diarization.sanityCheck(clusterSet, featureSet, parameter)
+      clustersSeg = diarization.segmentation("GLR", "FULL", clustersSegInit, featureSet, parameter)
+      clustersLClust = diarization.clusteringLinear(parameterDiarization.getThreshold("l"), clustersSeg, featureSet, parameter)
+      clustersHClust = diarization.clustering(parameterDiarization.getThreshold("h"), clustersLClust, featureSet, parameter)
+      clustersDClust = diarization.decode(8, parameterDiarization.getThreshold("d"), clustersHClust, featureSet, parameter)
+      clustersSplitClust = diarization.speech("10,10,50", clusterSet, clustersSegInit, clustersDClust, featureSet, parameter)
+      clusters = diarization.gender(clusterSet, clustersSplitClust, featureSet, parameter)
+      if parameter.parameterDiarization.isCEClustering
+        # If true, the program computes the NCLR/CE clustering at the end.
+        # The diarization error rate is minimized.
+        # If this option is not set, the program stops right after the detection of the gender
+        # and the resulting segmentation is sufficient for a transcription system.
+        clusters = diarization.speakerClustering(parameterDiarization.getThreshold("c"), "ce", clusterSet, clusters, featureSet, parameter)
+      end
+      Rjb::JavaObjectWrapper.new(clusters)
+    end
+  end
+end
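
Two small pieces of path handling in `Diarize::Audio` above are easy to miss: remote files are cached under a name that is the MD5 hex digest of their URI, and the LIUM "show" name is the file name up to its first dot. The sketch below reproduces both with the standard library only. `cache_path_for` and `show_name` are hypothetical helper names, and using `Dir.tmpdir` instead of the hard-coded `/tmp/` is an assumption made here for portability.

```ruby
require 'digest/md5'
require 'tmpdir'
require 'uri'

# Local cache path for a remote file: MD5 hex digest of the URI, placed in a
# temp directory (assumption: Dir.tmpdir; the class above hard-codes /tmp/).
def cache_path_for(uri, dir = Dir.tmpdir)
  File.join(dir, Digest::MD5.hexdigest(uri.to_s))
end

# Show name in the style of Audio#show: the file name, without directory,
# truncated at its first dot.
def show_name(path)
  File.basename(path).split('.').first
end

uri = URI('http://example.com/file.wav')
puts cache_path_for(uri)                     # same URI always maps to the same path
puts show_name('/data/will-and-juergen.wav') # local file: name without extension
puts show_name('/data/interview.part1.wav')  # everything after the first dot is dropped
puts show_name(cache_path_for(uri))          # cached file: the digest itself
```

One consequence is that a downloaded file has no extension, so its show name is the 32-character digest rather than anything derived from the original file name.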