pocketsphinx-ruby 0.0.1 → 0.0.2
- checksums.yaml +4 -4
- data/README.md +50 -1
- data/examples/decode_audio_file.rb +11 -0
- data/examples/record_audio_file.rb +1 -1
- data/lib/pocketsphinx.rb +2 -0
- data/lib/pocketsphinx/api/pocketsphinx.rb +10 -6
- data/lib/pocketsphinx/audio_file.rb +32 -0
- data/lib/pocketsphinx/audio_file_speech_recognizer.rb +12 -0
- data/lib/pocketsphinx/configuration.rb +34 -9
- data/lib/pocketsphinx/configuration/setting_definition.rb +13 -7
- data/lib/pocketsphinx/decoder.rb +36 -0
- data/lib/pocketsphinx/live_speech_recognizer.rb +2 -40
- data/lib/pocketsphinx/microphone.rb +20 -4
- data/lib/pocketsphinx/speech_recognizer.rb +77 -3
- data/lib/pocketsphinx/version.rb +1 -1
- data/spec/assets/audio/goforward.raw +0 -0
- data/spec/configuration_spec.rb +33 -0
- data/spec/decoder_spec.rb +16 -0
- data/spec/speech_recognizer_spec.rb +23 -0
- metadata +9 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 440699d34e0585b3670bd4bfa91e6e9a87b2331f
+  data.tar.gz: 8a697fa2d7e491e4eccfb47fe678a6acfcf695a7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7d913ab82f397056b9b90bb5f7d4fb6609a618a367a361c3936afd4e850caf908614bb16a174cae26160241cc8424eb7abae4404595b1d27e6a90bc0e431f2cf
+  data.tar.gz: ab6b8b36f3b9ef07f0086cca1e28b49b146116b15a3acf7e06f0fc80d50fc133a2653e05f437344c2f8532284c4e8fa8205151bca3de379f4a4f354048accdf5
data/README.md
CHANGED
@@ -6,7 +6,7 @@
 
 This gem provides Ruby [FFI](https://github.com/ffi/ffi) bindings for [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx), a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. Pocketsphinx is part of the [CMU Sphinx](http://cmusphinx.sourceforge.net/) Open Source Toolkit For Speech Recognition.
 
-
+Pocketsphinx's [SWIG](http://www.swig.org/) interface was initially considered for this gem, but dropped in favor of FFI for many of the reasons outlined [here](https://github.com/ffi/ffi/wiki/Why-use-FFI); most importantly ease of maintenance and JRuby support.
 
 The goal of this project is to make it as easy as possible for the Ruby community to experiment with speech recognition. Please do contribute fixes and enhancements.
 
@@ -62,6 +62,41 @@ Pocketsphinx::LiveSpeechRecognizer.new.recognize do |speech|
 end
 ```
 
+The `AudioFileSpeechRecognizer` decodes directly from an audio file by coordinating interactions between an `AudioFile` and `Decoder`.
+
+```ruby
+recognizer = Pocketsphinx::AudioFileSpeechRecognizer.new
+
+recognizer.recognize('spec/assets/audio/goforward.raw') do |speech|
+  puts speech # => "go forward ten years"
+end
+```
+
+These two classes split speech into utterances by detecting silence between them. By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` configuration settings.
+
+
+## Configuration
+
+All of Pocketsphinx's decoding settings are managed by the `Configuration` class, which can be passed into the high-level speech recognizers:
+
+```ruby
+configuration = Pocketsphinx::Configuration.default
+configuration.details('vad_threshold')
+# => {
+#   :name => "vad_threshold",
+#   :type => :float,
+#   :default => 2.0,
+#   :value => 2.0,
+#   :info => "Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level."
+# }
+
+configuration['vad_threshold'] = 4
+
+Pocketsphinx::LiveSpeechRecognizer.new(configuration)
+```
+
+You can find the output of `configuration.details` [here](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Default-Pocketsphinx-Configuration) for more information on the various different settings.
+
 
 ## Microphone
 
@@ -86,6 +121,20 @@ File.open("test.raw", "wb") do |file|
 end
 ```
 
+To open this audio file take a look at [this wiki page](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Importing-raw-PCM-audio-with-Audacity).
+
+
+## Decoder
+
+The `Decoder` class uses Pocketsphinx's libpocketsphinx to decode audio data into text. For example to decode a single utterance:
+
+```ruby
+decoder = Decoder.new(Configuration.default)
+decoder.decode 'spec/assets/audio/goforward.raw'
+
+puts decoder.hypothesis # => "go forward ten years"
+```
+
 
 ## Contributing
 
data/examples/decode_audio_file.rb
ADDED
@@ -0,0 +1,11 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "pocketsphinx-ruby"
+
+include Pocketsphinx
+
+decoder = Decoder.new(Configuration.default)
+decoder.decode 'spec/assets/audio/goforward.raw'
+
+puts decoder.hypothesis # => "go forward ten years"
data/examples/record_audio_file.rb
CHANGED
@@ -16,7 +16,7 @@ microphone = Microphone.new
 File.open("test_write.raw", "wb") do |file|
   microphone.record do
     FFI::MemoryPointer.new(:int16, MAX_SAMPLES) do |buffer|
-      (RECORDING_LENGTH / RECORDING_INTERVAL).times do
+      (RECORDING_LENGTH / RECORDING_INTERVAL).to_i.times do
         sample_count = microphone.read_audio(buffer, MAX_SAMPLES)
 
         # sample_count * 2 since this is length in bytes
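Note on the `.to_i.times` change above: the values of `RECORDING_LENGTH` and `RECORDING_INTERVAL` are not visible in this hunk, but if either is a Float the quotient is a Float, which has no `#times` method. The sketch below uses assumed values (5 seconds, 0.1 second interval) and a hypothetical helper name to show the behavior, including the truncation that `to_i` introduces.

```ruby
# Number of read iterations for a recording; to_i makes the quotient usable with #times.
# iteration_count is a hypothetical helper, not part of the gem.
def iteration_count(recording_length, recording_interval)
  (recording_length / recording_interval).to_i
end

# Assumed values; the example script's real constants are not shown in this hunk.
quotient = 5 / 0.1
puts quotient.class                # Float
puts quotient.respond_to?(:times)  # false
puts iteration_count(5, 0.1)       # 50

# to_i truncates, so a quotient that lands just below a whole number drops an iteration.
puts 0.3 / 0.1                     # 2.9999999999999996
puts iteration_count(0.3, 0.1)     # 2
puts((0.3 / 0.1).round)            # 3
```

If exact iteration counts matter, rounding before converting may be the safer choice; whether that applies here depends on the constants the script actually uses.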
data/lib/pocketsphinx.rb
CHANGED
@@ -6,10 +6,12 @@ require "pocketsphinx/api/sphinxad"
 require "pocketsphinx/api/pocketsphinx"
 
 require "pocketsphinx/configuration"
+require "pocketsphinx/audio_file"
 require "pocketsphinx/microphone"
 require "pocketsphinx/decoder"
 require "pocketsphinx/speech_recognizer"
 require "pocketsphinx/live_speech_recognizer"
+require "pocketsphinx/audio_file_speech_recognizer"
 
 module Pocketsphinx
 
data/lib/pocketsphinx/api/pocketsphinx.rb
CHANGED
@@ -4,14 +4,18 @@ module Pocketsphinx
       extend FFI::Library
       ffi_lib "libpocketsphinx"
 
-
+      typedef :pointer, :decoder
+      typedef :pointer, :configuration
+
+      attach_function :ps_init, [:configuration], :decoder
       attach_function :ps_default_search_args, [:pointer], :void
       attach_function :ps_args, [], :pointer
-      attach_function :
-      attach_function :
-      attach_function :
-      attach_function :
-      attach_function :
+      attach_function :ps_decode_raw, [:decoder, :pointer, :string, :long], :int
+      attach_function :ps_process_raw, [:decoder, :pointer, :size_t, :int, :int], :int
+      attach_function :ps_start_utt, [:decoder, :string], :int
+      attach_function :ps_end_utt, [:decoder], :int
+      attach_function :ps_get_in_speech, [:decoder], :uint8
+      attach_function :ps_get_hyp, [:decoder, :pointer, :pointer], :string
     end
   end
 end
data/lib/pocketsphinx/audio_file.rb
ADDED
@@ -0,0 +1,32 @@
+module Pocketsphinx
+  # Implements Recordable interface (#record and #read_audio)
+  class AudioFile < Struct.new(:file_path)
+    def record
+      File.open(file_path, 'rb') do |file|
+        self.file = file
+        yield
+        self.file = nil
+      end
+    end
+
+    # Read next block of audio samples from file; up to max samples into buffer.
+    #
+    # @param [FFI::Pointer] buffer 16bit buffer of at least max_samples in size
+    # @params [Fixnum] max_samples The maximum number of samples to read from the audio file
+    # @return [Fixnum] Samples actually read; nil if EOF
+    def read_audio(buffer, max_samples = 4096)
+      if file.nil?
+        raise "Can't read audio: use AudioFile#record to open the file first"
+      end
+
+      if data = file.read(max_samples * 2)
+        buffer.write_string(data)
+        data.length / 2
+      end
+    end
+
+    private
+
+    attr_accessor :file
+  end
+end
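Note on `AudioFile#read_audio` above: each sample is 16 bits, so the method reads `max_samples * 2` bytes and reports `data.length / 2` samples, returning `nil` once the file is exhausted. A minimal standalone sketch of that read pattern follows. `SketchBuffer`, `read_samples`, and `sample_counts` are hypothetical names, the buffer is a plain Ruby stand-in rather than an `FFI::Pointer`, and little-endian samples are an assumption.

```ruby
require 'stringio'

# Hypothetical stand-in for an FFI::Pointer; only #write_string is needed here.
class SketchBuffer
  attr_reader :bytes

  def write_string(data)
    @bytes = data
  end
end

# Same shape as AudioFile#read_audio: returns the sample count, or nil at EOF.
def read_samples(io, buffer, max_samples)
  if (data = io.read(max_samples * 2)) # 16-bit samples are 2 bytes each
    buffer.write_string(data)
    data.length / 2
  end
end

# Drain an IO and collect the per-read sample counts.
def sample_counts(io, max_samples)
  buffer = SketchBuffer.new
  counts = []
  while (count = read_samples(io, buffer, max_samples))
    counts << count
  end
  counts
end

five_samples = [1, 2, 3, 4, 5].pack('s<*') # little-endian int16 (assumed)
p sample_counts(StringIO.new(five_samples), 2) # [2, 2, 1]
```

The final read returns fewer samples than requested, which is why callers need the returned count rather than `max_samples`.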
data/lib/pocketsphinx/audio_file_speech_recognizer.rb
ADDED
@@ -0,0 +1,12 @@
+module Pocketsphinx
+  # High-level class for live speech recognition from a raw audio file.
+  class AudioFileSpeechRecognizer < SpeechRecognizer
+    def recognize(file_path, max_samples = 4096)
+      self.recordable = AudioFile.new(file_path)
+
+      super(max_samples) do |speech|
+        yield speech if block_given?
+      end
+    end
+  end
+end
data/lib/pocketsphinx/configuration.rb
CHANGED
@@ -3,6 +3,7 @@ require 'pocketsphinx/configuration/setting_definition'
 module Pocketsphinx
   class Configuration
     attr_reader :ps_config
+    attr_reader :setting_definitions
 
     private_class_method :new
 
@@ -22,12 +23,33 @@ module Pocketsphinx
       new(API::Pocketsphinx.ps_args)
     end
 
-    def
-
-
+    def setting_names
+      setting_definitions.keys.sort
+    end
+
+    # Get details for one or all configuration settings
+    #
+    # @param [String] name Name of setting to get details for. Gets details for all settings if nil.
+    def details(name = nil)
+      details = [name || setting_names].flatten.map do |name|
+        definition = find_definition(name)
+
+        {
+          name: name,
+          type: definition.type,
+          default: definition.default,
+          required: definition.required?,
+          value: self[name],
+          info: definition.doc
+        }
       end
 
-
+      name ? details.first : details
+    end
+
+    # Get a configuration setting
+    def [](name)
+      case find_definition(name).type
       when :integer
         API::Sphinxbase.cmd_ln_int_r(@ps_config, "-#{name}")
       when :float
@@ -41,12 +63,9 @@ module Pocketsphinx
       end
     end
 
+    # Set a configuration setting with type checking
     def []=(name, value)
-
-        raise "Configuration setting '#{name}' does not exist"
-      end
-
-      case definition.type
+      case find_definition(name).type
       when :integer
         raise "Configuration setting '#{name}' must be a Fixnum" unless value.respond_to?(:to_i)
         API::Sphinxbase.cmd_ln_set_int_r(@ps_config, "-#{name}", value.to_i)
@@ -61,5 +80,11 @@ module Pocketsphinx
         raise NotImplementedException
       end
     end
+
+    private
+
+    def find_definition(name)
+      setting_definitions[name] or raise "Configuration setting '#{name}' does not exist"
+    end
   end
 end
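Note on `Configuration#details` above: the method wraps either the single requested name or the full sorted name list in one array, maps over it, and then unwraps the result when a single name was given. The sketch below reproduces that "one or all" pattern over a plain Hash. `DetailsSketch` and its `SETTINGS` data are made up for illustration and are not real Pocketsphinx output.

```ruby
# Sketch of the "one or all" lookup pattern, using a plain Hash instead of Pocketsphinx.
module DetailsSketch
  # Illustration data only.
  SETTINGS = { 'vad_threshold' => 2.0, 'agc' => 'none' }

  def self.details(name = nil)
    # Wrap a single name, or the sorted list of all names, so both cases share one code path.
    rows = [name || SETTINGS.keys.sort].flatten.map do |key|
      SETTINGS.key?(key) or raise "Configuration setting '#{key}' does not exist"
      { name: key, value: SETTINGS[key] }
    end

    # A single name returns one Hash; no name returns an Array of Hashes.
    name ? rows.first : rows
  end
end

p DetailsSketch.details('agc')
p DetailsSketch.details.map { |row| row[:name] } # ["agc", "vad_threshold"]
```

The sketch names its block parameter `key` rather than reusing `name`; the block parameter in the hunk shadows the method argument, which works but produces a warning when Ruby runs with `-w`.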
data/lib/pocketsphinx/configuration/setting_definition.rb
CHANGED
@@ -1,19 +1,25 @@
 module Pocketsphinx
   class Configuration
-    class SettingDefinition
+    class SettingDefinition < Struct.new(:name, :type_code, :deflt, :doc)
       TYPES = [:integer, :float, :string, :boolean, :string_list]
 
-      def initialize(name, type_code, default, doc)
-        @name, @type_code, @default, @doc = name, type_code, default, doc
-      end
-
       def type
         # Remove the required bit if it exists and find type from log2 of code
-        TYPES[Math.log2(
+        TYPES[Math.log2(type_code - type_code%2) - 1]
+      end
+
+      # Convert string defaults from pocketsphinx to Ruby types
+      def default
+        case type
+        when :integer then deflt.to_i
+        when :float then deflt.to_f
+        when :boolean then deflt == 'yes'
+        else deflt
+        end
       end
 
       def required?
-
+        type_code % 2 == 1
       end
 
       # Build setting definitions from pocketsphinx argument definitions
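Note on `SettingDefinition#type` and `#required?` above: both treat `type_code` as a bit field whose lowest bit marks a required argument and whose remaining bit selects the type. The sketch below works through that arithmetic. The flag values are an assumption based on sphinxbase's `cmd_ln.h` and do not appear in this diff; `TypeCodeSketch` is a hypothetical module.

```ruby
# Standalone sketch of the type-code decoding.
module TypeCodeSketch
  # Assumed flag values (sphinxbase cmd_ln.h); not part of this diff.
  ARG_REQUIRED    = 1 << 0
  ARG_INTEGER     = 1 << 1
  ARG_FLOATING    = 1 << 2
  ARG_STRING      = 1 << 3
  ARG_BOOLEAN     = 1 << 4
  ARG_STRING_LIST = 1 << 5

  TYPES = [:integer, :float, :string, :boolean, :string_list]

  # Clear the lowest ("required") bit, then map the remaining power of two to an index.
  def self.decode_type(type_code)
    TYPES[Math.log2(type_code - type_code % 2) - 1]
  end

  # The lowest bit is set only for required arguments.
  def self.required?(type_code)
    type_code % 2 == 1
  end
end

p TypeCodeSketch.decode_type(TypeCodeSketch::ARG_FLOATING)                               # :float
p TypeCodeSketch.decode_type(TypeCodeSketch::ARG_INTEGER | TypeCodeSketch::ARG_REQUIRED) # :integer
p TypeCodeSketch.required?(TypeCodeSketch::ARG_INTEGER | TypeCodeSketch::ARG_REQUIRED)   # true
```

`Math.log2` returns a Float, and `Array#[]` converts it to an Integer index, so the lookup works without an explicit `to_i`.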
data/lib/pocketsphinx/decoder.rb
CHANGED
@@ -10,6 +10,42 @@ module Pocketsphinx
       @ps_decoder = ps_api.ps_init(configuration.ps_config)
     end
 
+    # Decode a raw audio stream as a single utterance, opening a file if path given
+    #
+    # See #decode_raw
+    #
+    # @param [IO] audio_path_or_file The raw audio stream or file path to decode as a single utterance
+    # @param [Fixnum] max_samples The maximum samples to process from the stream on each iteration
+    def decode(audio_path_or_file, max_samples = 2048)
+      case audio_path_or_file
+      when String
+        File.open(audio_path_or_file, 'rb') { |f| decode_raw(f, max_samples) }
+      else
+        decode_raw(audio_path_or_file, max_samples)
+      end
+    end
+
+    # Decode a raw audio stream as a single utterance.
+    #
+    # No headers are recognized in this files. The configuration parameters samprate
+    # and input_endian are used to determine the sampling rate and endianness of the stream,
+    # respectively. Audio is always assumed to be 16-bit signed PCM.
+    #
+    # @param [IO] audio_file The raw audio stream to decode as a single utterance
+    # @param [Fixnum] max_samples The maximum samples to process from the stream on each iteration
+    def decode_raw(audio_file, max_samples = 2048)
+      start_utterance
+
+      FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
+        while data = audio_file.read(max_samples * 2)
+          buffer.write_string(data)
+          process_raw(buffer, data.length / 2)
+        end
+      end
+
+      end_utterance
+    end
+
     # Decode raw audio data.
     #
     # @param [Boolean] no_search If non-zero, perform feature extraction but don't do any
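Note on `Decoder#decode` above: a String argument is treated as a path and opened in binary mode, while anything else is assumed to respond to `#read` and is consumed in chunks of `max_samples * 2` bytes. The sketch below shows that dispatch with no Pocketsphinx dependency. `each_chunk` and `chunks_from` are hypothetical helpers, and unpacking as little-endian int16 is an assumption (the real decoder takes endianness from `input_endian`).

```ruby
require 'stringio'
require 'tempfile'

# Yield successive chunks of 16-bit samples from a file path or an IO-like object.
def each_chunk(path_or_io, max_samples = 2048, &block)
  case path_or_io
  when String
    # A String is treated as a path: open it and recurse with the file handle.
    File.open(path_or_io, 'rb') { |f| each_chunk(f, max_samples, &block) }
  else
    while (data = path_or_io.read(max_samples * 2)) # 2 bytes per sample
      block.call(data.unpack('s<*'))                # little-endian int16 (assumed)
    end
  end
end

# Collect all chunks into an Array for inspection.
def chunks_from(path_or_io, max_samples)
  chunks = []
  each_chunk(path_or_io, max_samples) { |samples| chunks << samples }
  chunks
end

raw = (1..5).to_a.pack('s<*')
p chunks_from(StringIO.new(raw), 2) # [[1, 2], [3, 4], [5]]

Tempfile.create('decode_sketch') do |file|
  file.binmode
  file.write(raw)
  file.flush
  p chunks_from(file.path, 2)       # [[1, 2], [3, 4], [5]]
end
```

Both entry points produce the same chunks, which mirrors the two spec examples for `#decode` (stream and file path).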
data/lib/pocketsphinx/live_speech_recognizer.rb
CHANGED
@@ -3,46 +3,8 @@ module Pocketsphinx
   #
   # Modeled on the LiveSpeechRecognizer from Sphinx4.
   class LiveSpeechRecognizer < SpeechRecognizer
-
-
-    def microphone
-      @microphone ||= Microphone.new
-    end
-
-    # Recognize utterances and yield hypotheses in infinite loop
-    #
-    # @param [Float]
-    def recognize(recording_interval = 0.1, max_samples = 4096)
-      decoder.start_utterance
-
-      microphone.record do
-        FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
-          loop do
-            if decoder.in_speech?
-              process_audio(buffer, max_samples, recording_interval) while decoder.in_speech?
-              yield get_hypothesis
-            else
-              process_audio(buffer, max_samples, recording_interval)
-            end
-          end
-        end
-      end
-    end
-
-    private
-
-    def process_audio(buffer, max_samples, delay)
-      sample_count = microphone.read_audio(buffer, max_samples)
-      decoder.process_raw(buffer, sample_count)
-      sleep delay
-    end
-
-    # Called on speech -> silence transition
-    def get_hypothesis
-      decoder.end_utterance
-      decoder.hypothesis.tap do
-        decoder.start_utterance
-      end
+    def recordable
+      @recordable ||= Microphone.new
     end
   end
 end
data/lib/pocketsphinx/microphone.rb
CHANGED
@@ -1,10 +1,13 @@
 module Pocketsphinx
-  # Provides non-blocking audio recording using libsphinxad
+  # Provides non-blocking live audio recording using libsphinxad
+  #
+  # Implements Recordable interface (#record and #read_audio)
   class Microphone
     Error = Class.new(StandardError)
 
     attr_reader :ps_audio_device
     attr_writer :ps_api
+    attr_reader :sample_rate
 
     # Opens an audio device for recording
     #
@@ -14,8 +17,9 @@ module Pocketsphinx
     # @param [String] default_device The device name
     # @param [Object] ps_api A SphinxAD API implementation to use, API::SphinxAD if not provided
     def initialize(sample_rate = 16000, default_device = nil, ps_api = nil)
+      @sample_rate = sample_rate
       @ps_api = ps_api
-      @ps_audio_device = ps_api.ad_open_dev(default_device, sample_rate)
+      @ps_audio_device = self.ps_api.ad_open_dev(default_device, sample_rate)
 
       # Ensure that audio device is closed when object is garbage collected
       ObjectSpace.define_finalizer(self, self.class.finalize(ps_api, @ps_audio_device))
@@ -46,10 +50,22 @@ module Pocketsphinx
     # Read next block of audio samples while recording; read upto max samples into buf.
     #
     # @param [FFI::Pointer] buffer 16bit buffer of at least max_samples in size
-    # @
+    # @params [Fixnum] max_samples The maximum number of samples to read from the audio device
+    # @return [Fixnum] Samples actually read (could be 0 since non-blocking); nil if not
     # recording and no more samples remaining to be read from most recent recording.
     def read_audio(buffer, max_samples = 4096)
-      ps_api.ad_read(@ps_audio_device, buffer, max_samples)
+      samples = ps_api.ad_read(@ps_audio_device, buffer, max_samples)
+      samples if samples >= 0
+    end
+
+    # A Recordable may specify an audio reading delay
+    #
+    # In the case of the Microphone, because we are doing non-blocking reads,
+    # we specify a delay which should fill half of the max buffer size
+    #
+    # @param [Fixnum] max_samples The maximum samples we tried to read from the audio device
+    def read_audio_delay(max_samples = 4096)
+      max_samples / (2 * sample_rate)
     end
 
     def close_device
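Note on `read_audio_delay` above: the comment describes a delay long enough to fill half the buffer, but with the Integer defaults shown (`max_samples = 4096`, `sample_rate = 16000`) Ruby performs integer division, so the expression evaluates to 0. Whether that is intended cannot be determined from this diff. The sketch below compares the expression as written with a hypothetical Float variant; both method names are made up for illustration.

```ruby
# Delay as written in the hunk: Integer / Integer truncates toward zero.
def delay_integer(max_samples = 4096, sample_rate = 16000)
  max_samples / (2 * sample_rate)
end

# Hypothetical variant: seconds needed to record half a buffer of samples.
def delay_float(max_samples = 4096, sample_rate = 16000)
  max_samples.to_f / (2 * sample_rate)
end

p delay_integer # 0
p delay_float   # 0.128
```

With the Integer result, `sleep` returns immediately, so the read loop would poll the device without pausing.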
data/lib/pocketsphinx/speech_recognizer.rb
CHANGED
@@ -1,9 +1,83 @@
 module Pocketsphinx
+  # Reads audio data from a recordable interface and decodes it into utterances
+  #
+  # Essentially orchestrates interaction between Recordable and Decoder, and detects new utterances.
   class SpeechRecognizer
-
+    # Recordable interface must implement #record and #read_audio
+    attr_writer :recordable
+    attr_writer :decoder
 
-    def initialize(configuration= nil)
-      @
+    def initialize(configuration = nil)
+      @configuration = configuration
+    end
+
+    def recordable
+      @recordable or raise "A SpeechRecognizer must have a recordable interface"
+    end
+
+    def decoder
+      @decoder ||= Decoder.new(configuration)
+    end
+
+    def configuration
+      @configuration ||= Configuration.default
+    end
+
+    # Recognize utterances and yield hypotheses in infinite loop
+    #
+    # Splits speech into utterances by detecting silence between them.
+    # By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be
+    # configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` settings.
+    #
+    # @param [Fixnum] max_samples Number of samples to process at a time
+    def recognize(max_samples = 4096)
+      decoder.start_utterance
+
+      recordable.record do
+        FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
+          loop do
+            if in_speech?
+              while decoder.in_speech?
+                process_audio(buffer, max_samples) or break
+              end
+
+              yield get_hypothesis
+            else
+              process_audio(buffer, max_samples) or break
+            end
+          end
+        end
+      end
+    end
+
+    def in_speech?
+      # Use Pocketsphinx's implementation by default
+      decoder.in_speech?
+    end
+
+    private
+
+    def process_audio(buffer, max_samples)
+      sample_count = recordable.read_audio(buffer, max_samples)
+
+      if sample_count
+        decoder.process_raw(buffer, sample_count)
+
+        # Check for a delay for example in case of non-blocking live audio
+        if recordable.respond_to?(:read_audio_delay)
+          sleep recordable.read_audio_delay(max_samples)
+        end
+      end
+
+      sample_count
+    end
+
+    # Called on speech -> silence transition
+    def get_hypothesis
+      decoder.end_utterance
+      decoder.hypothesis.tap do
+        decoder.start_utterance
+      end
     end
   end
 end
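Note on `SpeechRecognizer#recognize` above: the loop keeps feeding audio to the decoder, and each speech-to-silence transition ends the current utterance, yields its hypothesis, and starts a new one. The sketch below imitates that control flow with a toy decoder. `StubDecoder` and `recognize_sketch` are hypothetical, "speech" is simply any non-zero sample, and the stub resets its speech flag in `start_utterance`, which is an assumption about stub behavior and not a claim about Pocketsphinx.

```ruby
# Toy decoder: "in speech" while the most recent sample is non-zero (not real VAD).
class StubDecoder
  def initialize
    @heard = []
    @in_speech = false
  end

  def start_utterance
    @heard = []
    @in_speech = false
  end

  def end_utterance; end

  def process(sample)
    @in_speech = !sample.zero?
    @heard << sample if @in_speech
  end

  def in_speech?
    @in_speech
  end

  def hypothesis
    @heard.join(' ')
  end
end

# Mirrors the recognize loop: yields one hypothesis per speech -> silence transition.
def recognize_sketch(samples, decoder)
  queue = samples.dup
  # Feed one sample to the decoder; returns nil once the input is exhausted.
  feed = lambda do
    sample = queue.shift
    decoder.process(sample) if sample
    sample
  end

  decoder.start_utterance
  loop do
    if decoder.in_speech?
      while decoder.in_speech?
        feed.call or break
      end
      decoder.end_utterance
      yield decoder.hypothesis
      decoder.start_utterance
    else
      feed.call or break
    end
  end
end

recognize_sketch([0, 7, 8, 0, 0, 9, 0], StubDecoder.new) { |text| puts text }
# prints "7 8" then "9"
```

Returning `nil` from the feed step plays the role of `read_audio` returning `nil` at end of input, which is what lets the file-based recognizer leave the otherwise infinite loop.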
data/lib/pocketsphinx/version.rb
CHANGED
Binary file
|
data/spec/configuration_spec.rb
CHANGED
@@ -44,4 +44,37 @@ describe Configuration do
   it 'raises exceptions when a setting is unknown' do
     expect { subject['unknown'] = true }.to raise_exception "Configuration setting 'unknown' does not exist"
   end
+
+  describe '#setting_names' do
+    it 'contains the names of all possible system settings' do
+      expect(subject.setting_names.count).to eq(117)
+    end
+  end
+
+  describe '#details' do
+    it 'gives details for a single setting' do
+      expect(subject.details 'vad_threshold').to eq({
+        name: "vad_threshold",
+        type: :float,
+        default: 2.0,
+        required: false,
+        value: 2.0,
+        info: "Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level."
+      })
+    end
+
+    it 'gives details for all settings when no name is specified' do
+      details = subject.details
+
+      expect(details.count).to eq(117)
+      expect(details.first).to eq({
+        name: "agc",
+        type: :string,
+        default: "none",
+        required: false,
+        value: "none",
+        info: "Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')"
+      })
+    end
+  end
 end
data/spec/decoder_spec.rb
CHANGED
@@ -9,6 +9,22 @@ describe Decoder do
     @decoder = Decoder.new(Configuration.default)
   end
 
+  # Full integration test
+  describe '#decode' do
+    it 'correctly decodes the speech in goforward.raw' do
+      subject.decode File.open('spec/assets/audio/goforward.raw', 'rb')
+
+      # With the default configuration (no specific grammar), pocketsphinx doesn't actually
+      # get this quite right, but nonetheless this is the expected output
+      expect(subject.hypothesis).to eq("go forward ten years")
+    end
+
+    it 'accepts a file path as well as a stream' do
+      subject.decode 'spec/assets/audio/goforward.raw'
+      expect(subject.hypothesis).to eq("go forward ten years")
+    end
+  end
+
   describe '#process_raw' do
     it 'calls libpocketsphinx' do
       FFI::MemoryPointer.new(:int16, 4096) do |buffer|
data/spec/speech_recognizer_spec.rb
ADDED
@@ -0,0 +1,23 @@
+require 'spec_helper'
+
+describe SpeechRecognizer do
+  let(:recordable) { AudioFile.new('spec/assets/audio/goforward.raw') }
+
+  subject do
+    SpeechRecognizer.new.tap do |speech_recognizer|
+      speech_recognizer.recordable = recordable
+      speech_recognizer.decoder = @decoder
+    end
+  end
+
+  # Share decoder across all examples for speed
+  before :all do
+    @decoder = Decoder.new(Configuration.default)
+  end
+
+  describe '#recognize' do
+    it 'should decode speech in raw audio' do
+      expect { |b| subject.recognize(4096, &b) }.to yield_with_args("go forward ten years")
+    end
+  end
+end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pocketsphinx-ruby
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 platform: ruby
 authors:
 - Howard Wilson
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-10-
+date: 2014-10-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -94,6 +94,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- examples/decode_audio_file.rb
 - examples/pocketsphinx_continuous.rb
 - examples/record_audio_file.rb
 - lib/pocketsphinx-ruby.rb
@@ -101,6 +102,8 @@ files:
 - lib/pocketsphinx/api/pocketsphinx.rb
 - lib/pocketsphinx/api/sphinxad.rb
 - lib/pocketsphinx/api/sphinxbase.rb
+- lib/pocketsphinx/audio_file.rb
+- lib/pocketsphinx/audio_file_speech_recognizer.rb
 - lib/pocketsphinx/configuration.rb
 - lib/pocketsphinx/configuration/setting_definition.rb
 - lib/pocketsphinx/decoder.rb
@@ -109,10 +112,12 @@ files:
 - lib/pocketsphinx/speech_recognizer.rb
 - lib/pocketsphinx/version.rb
 - pocketsphinx-ruby.gemspec
+- spec/assets/audio/goforward.raw
 - spec/configuration_spec.rb
 - spec/decoder_spec.rb
 - spec/microphone_spec.rb
 - spec/spec_helper.rb
+- spec/speech_recognizer_spec.rb
 homepage: https://github.com/watsonbox/pocketsphinx-ruby
 licenses:
 - MIT
@@ -138,7 +143,9 @@ signing_key:
 specification_version: 4
 summary: Ruby FFI pocketsphinx bindings
 test_files:
+- spec/assets/audio/goforward.raw
 - spec/configuration_spec.rb
 - spec/decoder_spec.rb
 - spec/microphone_spec.rb
 - spec/spec_helper.rb
+- spec/speech_recognizer_spec.rb