pocketsphinx-ruby 0.0.1 → 0.0.2
- checksums.yaml +4 -4
- data/README.md +50 -1
- data/examples/decode_audio_file.rb +11 -0
- data/examples/record_audio_file.rb +1 -1
- data/lib/pocketsphinx.rb +2 -0
- data/lib/pocketsphinx/api/pocketsphinx.rb +10 -6
- data/lib/pocketsphinx/audio_file.rb +32 -0
- data/lib/pocketsphinx/audio_file_speech_recognizer.rb +12 -0
- data/lib/pocketsphinx/configuration.rb +34 -9
- data/lib/pocketsphinx/configuration/setting_definition.rb +13 -7
- data/lib/pocketsphinx/decoder.rb +36 -0
- data/lib/pocketsphinx/live_speech_recognizer.rb +2 -40
- data/lib/pocketsphinx/microphone.rb +20 -4
- data/lib/pocketsphinx/speech_recognizer.rb +77 -3
- data/lib/pocketsphinx/version.rb +1 -1
- data/spec/assets/audio/goforward.raw +0 -0
- data/spec/configuration_spec.rb +33 -0
- data/spec/decoder_spec.rb +16 -0
- data/spec/speech_recognizer_spec.rb +23 -0
- metadata +9 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 440699d34e0585b3670bd4bfa91e6e9a87b2331f
+  data.tar.gz: 8a697fa2d7e491e4eccfb47fe678a6acfcf695a7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7d913ab82f397056b9b90bb5f7d4fb6609a618a367a361c3936afd4e850caf908614bb16a174cae26160241cc8424eb7abae4404595b1d27e6a90bc0e431f2cf
+  data.tar.gz: ab6b8b36f3b9ef07f0086cca1e28b49b146116b15a3acf7e06f0fc80d50fc133a2653e05f437344c2f8532284c4e8fa8205151bca3de379f4a4f354048accdf5
data/README.md
CHANGED
@@ -6,7 +6,7 @@
 
 This gem provides Ruby [FFI](https://github.com/ffi/ffi) bindings for [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx), a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. Pocketsphinx is part of the [CMU Sphinx](http://cmusphinx.sourceforge.net/) Open Source Toolkit For Speech Recognition.
 
-
+Pocketsphinx's [SWIG](http://www.swig.org/) interface was initially considered for this gem, but dropped in favor of FFI for many of the reasons outlined [here](https://github.com/ffi/ffi/wiki/Why-use-FFI); most importantly ease of maintenance and JRuby support.
 
 The goal of this project is to make it as easy as possible for the Ruby community to experiment with speech recognition. Please do contribute fixes and enhancements.
 
@@ -62,6 +62,41 @@ Pocketsphinx::LiveSpeechRecognizer.new.recognize do |speech|
 end
 ```
 
+The `AudioFileSpeechRecognizer` decodes directly from an audio file by coordinating interactions between an `AudioFile` and `Decoder`.
+
+```ruby
+recognizer = Pocketsphinx::AudioFileSpeechRecognizer.new
+
+recognizer.recognize('spec/assets/audio/goforward.raw') do |speech|
+  puts speech # => "go forward ten years"
+end
+```
+
+These two classes split speech into utterances by detecting silence between them. By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` configuration settings.
+
+
+## Configuration
+
+All of Pocketsphinx's decoding settings are managed by the `Configuration` class, which can be passed into the high-level speech recognizers:
+
+```ruby
+configuration = Pocketsphinx::Configuration.default
+configuration.details('vad_threshold')
+# => {
+# :name => "vad_threshold",
+# :type => :float,
+# :default => 2.0,
+# :value => 2.0,
+# :info => "Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level."
+# }
+
+configuration['vad_threshold'] = 4
+
+Pocketsphinx::LiveSpeechRecognizer.new(configuration)
+```
+
+You can find the output of `configuration.details` [here](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Default-Pocketsphinx-Configuration) for more information on the various different settings.
+
 
 ## Microphone
 
@@ -86,6 +121,20 @@ File.open("test.raw", "wb") do |file|
 end
 ```
 
+To open this audio file take a look at [this wiki page](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Importing-raw-PCM-audio-with-Audacity).
+
+
+## Decoder
+
+The `Decoder` class uses Pocketsphinx's libpocketsphinx to decode audio data into text. For example to decode a single utterance:
+
+```ruby
+decoder = Decoder.new(Configuration.default)
+decoder.decode 'spec/assets/audio/goforward.raw'
+
+puts decoder.hypothesis # => "go forward ten years"
+```
+
 
 ## Contributing
 
data/examples/decode_audio_file.rb
ADDED
@@ -0,0 +1,11 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "pocketsphinx-ruby"
+
+include Pocketsphinx
+
+decoder = Decoder.new(Configuration.default)
+decoder.decode 'spec/assets/audio/goforward.raw'
+
+puts decoder.hypothesis # => "go forward ten years"
data/examples/record_audio_file.rb
CHANGED
@@ -16,7 +16,7 @@ microphone = Microphone.new
 File.open("test_write.raw", "wb") do |file|
   microphone.record do
     FFI::MemoryPointer.new(:int16, MAX_SAMPLES) do |buffer|
-      (RECORDING_LENGTH / RECORDING_INTERVAL).times do
+      (RECORDING_LENGTH / RECORDING_INTERVAL).to_i.times do
         sample_count = microphone.read_audio(buffer, MAX_SAMPLES)
 
         # sample_count * 2 since this is length in bytes
data/lib/pocketsphinx.rb
CHANGED
@@ -6,10 +6,12 @@ require "pocketsphinx/api/sphinxad"
 require "pocketsphinx/api/pocketsphinx"
 
 require "pocketsphinx/configuration"
+require "pocketsphinx/audio_file"
 require "pocketsphinx/microphone"
 require "pocketsphinx/decoder"
 require "pocketsphinx/speech_recognizer"
 require "pocketsphinx/live_speech_recognizer"
+require "pocketsphinx/audio_file_speech_recognizer"
 
 module Pocketsphinx
 
data/lib/pocketsphinx/api/pocketsphinx.rb
CHANGED
@@ -4,14 +4,18 @@ module Pocketsphinx
       extend FFI::Library
       ffi_lib "libpocketsphinx"
 
-
+      typedef :pointer, :decoder
+      typedef :pointer, :configuration
+
+      attach_function :ps_init, [:configuration], :decoder
       attach_function :ps_default_search_args, [:pointer], :void
       attach_function :ps_args, [], :pointer
-      attach_function :
-      attach_function :
-      attach_function :
-      attach_function :
-      attach_function :
+      attach_function :ps_decode_raw, [:decoder, :pointer, :string, :long], :int
+      attach_function :ps_process_raw, [:decoder, :pointer, :size_t, :int, :int], :int
+      attach_function :ps_start_utt, [:decoder, :string], :int
+      attach_function :ps_end_utt, [:decoder], :int
+      attach_function :ps_get_in_speech, [:decoder], :uint8
+      attach_function :ps_get_hyp, [:decoder, :pointer, :pointer], :string
     end
   end
 end
data/lib/pocketsphinx/audio_file.rb
ADDED
@@ -0,0 +1,32 @@
+module Pocketsphinx
+  # Implements Recordable interface (#record and #read_audio)
+  class AudioFile < Struct.new(:file_path)
+    def record
+      File.open(file_path, 'rb') do |file|
+        self.file = file
+        yield
+        self.file = nil
+      end
+    end
+
+    # Read next block of audio samples from file; up to max samples into buffer.
+    #
+    # @param [FFI::Pointer] buffer 16bit buffer of at least max_samples in size
+    # @params [Fixnum] max_samples The maximum number of samples to read from the audio file
+    # @return [Fixnum] Samples actually read; nil if EOF
+    def read_audio(buffer, max_samples = 4096)
+      if file.nil?
+        raise "Can't read audio: use AudioFile#record to open the file first"
+      end
+
+      if data = file.read(max_samples * 2)
+        buffer.write_string(data)
+        data.length / 2
+      end
+    end
+
+    private
+
+    attr_accessor :file
+  end
+end
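The Recordable contract that `AudioFile` implements (`#record` opens the source for the duration of a block, `#read_audio` returns a sample count or `nil` at end of input) can be illustrated without Pocketsphinx or FFI. The following is only a sketch under stated assumptions: `InMemoryRecordable` and `FakeBuffer` are hypothetical names that do not exist in the gem, and `FakeBuffer` merely stands in for an FFI pointer by responding to `write_string`.

```ruby
require "stringio"

# Hypothetical stand-in for an FFI buffer; only #write_string is needed here.
class FakeBuffer
  attr_reader :bytes

  def write_string(data)
    @bytes = data
  end
end

# Hypothetical Recordable that serves raw 16-bit PCM from a String.
class InMemoryRecordable
  def initialize(pcm_bytes)
    @pcm_bytes = pcm_bytes
  end

  # Makes the audio source available for the duration of the block.
  def record
    @io = StringIO.new(@pcm_bytes)
    yield
  ensure
    @io = nil
  end

  # Returns the number of 16-bit samples read, or nil at end of input.
  def read_audio(buffer, max_samples = 4096)
    raise "Can't read audio: call #record first" if @io.nil?

    data = @io.read(max_samples * 2) # two bytes per sample
    return nil if data.nil?

    buffer.write_string(data)
    data.bytesize / 2
  end
end

recordable = InMemoryRecordable.new([1, 2, 3, 4, 5].pack("s<*"))
buffer = FakeBuffer.new
recordable.record do
  while (count = recordable.read_audio(buffer, 2))
    puts "read #{count} sample(s)"
  end
end
```

Any object with these two methods could, in principle, be assigned to a recognizer's `recordable` writer; whether the gem accepts a given custom source in practice is not confirmed by this diff.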
data/lib/pocketsphinx/audio_file_speech_recognizer.rb
ADDED
@@ -0,0 +1,12 @@
+module Pocketsphinx
+  # High-level class for live speech recognition from a raw audio file.
+  class AudioFileSpeechRecognizer < SpeechRecognizer
+    def recognize(file_path, max_samples = 4096)
+      self.recordable = AudioFile.new(file_path)
+
+      super(max_samples) do |speech|
+        yield speech if block_given?
+      end
+    end
+  end
+end
data/lib/pocketsphinx/configuration.rb
CHANGED
@@ -3,6 +3,7 @@ require 'pocketsphinx/configuration/setting_definition'
 module Pocketsphinx
   class Configuration
     attr_reader :ps_config
+    attr_reader :setting_definitions
 
     private_class_method :new
 
@@ -22,12 +23,33 @@ module Pocketsphinx
       new(API::Pocketsphinx.ps_args)
     end
 
-    def
-
-
+    def setting_names
+      setting_definitions.keys.sort
+    end
+
+    # Get details for one or all configuration settings
+    #
+    # @param [String] name Name of setting to get details for. Gets details for all settings if nil.
+    def details(name = nil)
+      details = [name || setting_names].flatten.map do |name|
+        definition = find_definition(name)
+
+        {
+          name: name,
+          type: definition.type,
+          default: definition.default,
+          required: definition.required?,
+          value: self[name],
+          info: definition.doc
+        }
       end
 
-
+      name ? details.first : details
+    end
+
+    # Get a configuration setting
+    def [](name)
+      case find_definition(name).type
       when :integer
         API::Sphinxbase.cmd_ln_int_r(@ps_config, "-#{name}")
       when :float
@@ -41,12 +63,9 @@ module Pocketsphinx
       end
     end
 
+    # Set a configuration setting with type checking
     def []=(name, value)
-
-        raise "Configuration setting '#{name}' does not exist"
-      end
-
-      case definition.type
+      case find_definition(name).type
       when :integer
         raise "Configuration setting '#{name}' must be a Fixnum" unless value.respond_to?(:to_i)
         API::Sphinxbase.cmd_ln_set_int_r(@ps_config, "-#{name}", value.to_i)
@@ -61,5 +80,11 @@ module Pocketsphinx
         raise NotImplementedException
       end
     end
+
+    private
+
+    def find_definition(name)
+      setting_definitions[name] or raise "Configuration setting '#{name}' does not exist"
+    end
   end
 end
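The lookup pattern introduced in this file (a single `find_definition` that raises for unknown names, shared by `[]`, `[]=` and `details`) can be tried out in isolation. This is a minimal sketch, assuming an in-memory table instead of the real Sphinxbase-backed storage; `ToyConfiguration` and `Definition` are hypothetical names, and the type coercion shown is a simplification rather than the gem's behaviour.

```ruby
# Hypothetical, simplified definition record (the gem derives these from pocketsphinx).
Definition = Struct.new(:name, :type, :default, :doc)

# Hypothetical in-memory configuration mirroring the shape of the methods in the diff.
class ToyConfiguration
  def initialize(definitions)
    @definitions = definitions.map { |d| [d.name, d] }.to_h
    @values = {}
  end

  def setting_names
    @definitions.keys.sort
  end

  # Get a setting, falling back to its default.
  def [](name)
    definition = find_definition(name)
    @values.fetch(name, definition.default)
  end

  # Set a setting, coercing the value according to the declared type.
  def []=(name, value)
    definition = find_definition(name)
    @values[name] = case definition.type
                    when :integer then Integer(value)
                    when :float then Float(value)
                    else value
                    end
  end

  # Details for one setting (Hash) or for all settings (Array of Hashes).
  def details(name = nil)
    all = [name || setting_names].flatten.map do |setting|
      definition = find_definition(setting)
      { name: setting, type: definition.type, default: definition.default,
        value: self[setting], info: definition.doc }
    end

    name ? all.first : all
  end

  private

  def find_definition(name)
    @definitions[name] or raise "Configuration setting '#{name}' does not exist"
  end
end

config = ToyConfiguration.new([
  Definition.new("vad_threshold", :float, 2.0, "Threshold between noise and silence"),
  Definition.new("agc", :string, "none", "Automatic gain control")
])
config["vad_threshold"] = 4
p config.details("vad_threshold")
```

Routing every access through one private finder keeps the "does not exist" error message in a single place, which appears to be the motivation for the refactoring shown above.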
data/lib/pocketsphinx/configuration/setting_definition.rb
CHANGED
@@ -1,19 +1,25 @@
 module Pocketsphinx
   class Configuration
-    class SettingDefinition
+    class SettingDefinition < Struct.new(:name, :type_code, :deflt, :doc)
       TYPES = [:integer, :float, :string, :boolean, :string_list]
 
-      def initialize(name, type_code, default, doc)
-        @name, @type_code, @default, @doc = name, type_code, default, doc
-      end
-
       def type
         # Remove the required bit if it exists and find type from log2 of code
-        TYPES[Math.log2(
+        TYPES[Math.log2(type_code - type_code%2) - 1]
+      end
+
+      # Convert string defaults from pocketsphinx to Ruby types
+      def default
+        case type
+        when :integer then deflt.to_i
+        when :float then deflt.to_f
+        when :boolean then deflt == 'yes'
+        else deflt
+        end
       end
 
       def required?
-
+        type_code % 2 == 1
       end
 
       # Build setting definitions from pocketsphinx argument definitions
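The arithmetic in `type` and `required?` is easier to follow with concrete numbers. The sketch below assumes, as the comment in `type` suggests, that the lowest bit of the type code is a "required" flag and that each type occupies one higher bit (2 for integer, 4 for float, 8 for string, and so on); those numeric values are an assumption about the underlying C constants, and `setting_type` and `setting_required?` are hypothetical free-standing helpers, not methods of the gem.

```ruby
TYPES = [:integer, :float, :string, :boolean, :string_list]

# Clear the lowest bit (the "required" flag), then map the remaining
# power of two onto TYPES: 2 -> index 0, 4 -> index 1, 8 -> index 2, ...
def setting_type(type_code)
  TYPES[Math.log2(type_code - type_code % 2).round - 1]
end

# An odd code means the "required" bit is set.
def setting_required?(type_code)
  type_code % 2 == 1
end

[2, 4, 9, 16, 33].each do |code|
  puts "#{code}: #{setting_type(code)} (required: #{setting_required?(code)})"
end
```

The explicit `.round` is a small deviation from the diff, which passes the Float from `Math.log2` straight to `Array#[]`; both forms select the same element for exact powers of two.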
data/lib/pocketsphinx/decoder.rb
CHANGED
@@ -10,6 +10,42 @@ module Pocketsphinx
       @ps_decoder = ps_api.ps_init(configuration.ps_config)
     end
 
+    # Decode a raw audio stream as a single utterance, opening a file if path given
+    #
+    # See #decode_raw
+    #
+    # @param [IO] audio_path_or_file The raw audio stream or file path to decode as a single utterance
+    # @param [Fixnum] max_samples The maximum samples to process from the stream on each iteration
+    def decode(audio_path_or_file, max_samples = 2048)
+      case audio_path_or_file
+      when String
+        File.open(audio_path_or_file, 'rb') { |f| decode_raw(f, max_samples) }
+      else
+        decode_raw(audio_path_or_file, max_samples)
+      end
+    end
+
+    # Decode a raw audio stream as a single utterance.
+    #
+    # No headers are recognized in this files. The configuration parameters samprate
+    # and input_endian are used to determine the sampling rate and endianness of the stream,
+    # respectively. Audio is always assumed to be 16-bit signed PCM.
+    #
+    # @param [IO] audio_file The raw audio stream to decode as a single utterance
+    # @param [Fixnum] max_samples The maximum samples to process from the stream on each iteration
+    def decode_raw(audio_file, max_samples = 2048)
+      start_utterance
+
+      FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
+        while data = audio_file.read(max_samples * 2)
+          buffer.write_string(data)
+          process_raw(buffer, data.length / 2)
+        end
+      end
+
+      end_utterance
+    end
+
     # Decode raw audio data.
     #
     # @param [Boolean] no_search If non-zero, perform feature extraction but don't do any
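`decode_raw` treats its input as headerless 16-bit signed PCM and reads it in blocks of `max_samples * 2` bytes. To make that byte-to-sample relationship concrete, here is a small sketch that walks a stream in the same block size and unpacks each block into integers. It assumes little-endian input (the `"s<"` directive); `each_sample_chunk` is a hypothetical helper and performs no speech decoding.

```ruby
require "stringio"

# Yields arrays of signed 16-bit little-endian samples, at most
# max_samples per chunk, until the stream is exhausted.
def each_sample_chunk(io, max_samples = 2048)
  return enum_for(:each_sample_chunk, io, max_samples) unless block_given?

  while (data = io.read(max_samples * 2)) # two bytes per sample
    yield data.unpack("s<*")
  end
end

pcm = StringIO.new([100, -200, 300].pack("s<*"))
each_sample_chunk(pcm, 2) { |samples| p samples }
```

In the gem the raw bytes are written into an FFI buffer and handed to `process_raw` rather than unpacked in Ruby; the unpacking here only serves to show what each block contains.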
data/lib/pocketsphinx/live_speech_recognizer.rb
CHANGED
@@ -3,46 +3,8 @@ module Pocketsphinx
   #
   # Modeled on the LiveSpeechRecognizer from Sphinx4.
   class LiveSpeechRecognizer < SpeechRecognizer
-
-
-    def microphone
-      @microphone ||= Microphone.new
-    end
-
-    # Recognize utterances and yield hypotheses in infinite loop
-    #
-    # @param [Float]
-    def recognize(recording_interval = 0.1, max_samples = 4096)
-      decoder.start_utterance
-
-      microphone.record do
-        FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
-          loop do
-            if decoder.in_speech?
-              process_audio(buffer, max_samples, recording_interval) while decoder.in_speech?
-              yield get_hypothesis
-            else
-              process_audio(buffer, max_samples, recording_interval)
-            end
-          end
-        end
-      end
-    end
-
-    private
-
-    def process_audio(buffer, max_samples, delay)
-      sample_count = microphone.read_audio(buffer, max_samples)
-      decoder.process_raw(buffer, sample_count)
-      sleep delay
-    end
-
-    # Called on speech -> silence transition
-    def get_hypothesis
-      decoder.end_utterance
-      decoder.hypothesis.tap do
-        decoder.start_utterance
-      end
+    def recordable
+      @recordable ||= Microphone.new
     end
   end
 end
data/lib/pocketsphinx/microphone.rb
CHANGED
@@ -1,10 +1,13 @@
 module Pocketsphinx
-  # Provides non-blocking audio recording using libsphinxad
+  # Provides non-blocking live audio recording using libsphinxad
+  #
+  # Implements Recordable interface (#record and #read_audio)
   class Microphone
     Error = Class.new(StandardError)
 
     attr_reader :ps_audio_device
     attr_writer :ps_api
+    attr_reader :sample_rate
 
     # Opens an audio device for recording
     #
@@ -14,8 +17,9 @@ module Pocketsphinx
     # @param [String] default_device The device name
     # @param [Object] ps_api A SphinxAD API implementation to use, API::SphinxAD if not provided
     def initialize(sample_rate = 16000, default_device = nil, ps_api = nil)
+      @sample_rate = sample_rate
       @ps_api = ps_api
-      @ps_audio_device = ps_api.ad_open_dev(default_device, sample_rate)
+      @ps_audio_device = self.ps_api.ad_open_dev(default_device, sample_rate)
 
       # Ensure that audio device is closed when object is garbage collected
       ObjectSpace.define_finalizer(self, self.class.finalize(ps_api, @ps_audio_device))
@@ -46,10 +50,22 @@ module Pocketsphinx
     # Read next block of audio samples while recording; read upto max samples into buf.
     #
     # @param [FFI::Pointer] buffer 16bit buffer of at least max_samples in size
-    # @
+    # @params [Fixnum] max_samples The maximum number of samples to read from the audio device
+    # @return [Fixnum] Samples actually read (could be 0 since non-blocking); nil if not
     #   recording and no more samples remaining to be read from most recent recording.
     def read_audio(buffer, max_samples = 4096)
-      ps_api.ad_read(@ps_audio_device, buffer, max_samples)
+      samples = ps_api.ad_read(@ps_audio_device, buffer, max_samples)
+      samples if samples >= 0
+    end
+
+    # A Recordable may specify an audio reading delay
+    #
+    # In the case of the Microphone, because we are doing non-blocking reads,
+    # we specify a delay which should fill half of the max buffer size
+    #
+    # @param [Fixnum] max_samples The maximum samples we tried to read from the audio device
+    def read_audio_delay(max_samples = 4096)
+      max_samples / (2 * sample_rate)
     end
 
     def close_device
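The delay described in the comment (the time needed to fill half of the buffer) works out to `max_samples / (2 * sample_rate)` seconds. One thing worth checking when reading this hunk: with Integer operands, Ruby's `/` floors the quotient, so the defaults of 4096 samples at 16000 Hz evaluate to 0 rather than a fraction of a second. The following sketch contrasts the two; `read_audio_delay_seconds` is a hypothetical helper using float division, not the gem's method.

```ruby
# Hypothetical helper: seconds needed to record half of a max_samples buffer.
def read_audio_delay_seconds(max_samples = 4096, sample_rate = 16_000)
  max_samples.to_f / (2 * sample_rate)
end

integer_result = 4096 / (2 * 16_000)    # Integer#/ floors the quotient
float_result = read_audio_delay_seconds # fractional seconds

puts "integer division: #{integer_result}"
puts "float division:   #{float_result}"
```

Whether a zero-length sleep between non-blocking reads matters in practice depends on the audio backend, which this diff does not show.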
data/lib/pocketsphinx/speech_recognizer.rb
CHANGED
@@ -1,9 +1,83 @@
 module Pocketsphinx
+  # Reads audio data from a recordable interface and decodes it into utterances
+  #
+  # Essentially orchestrates interaction between Recordable and Decoder, and detects new utterances.
   class SpeechRecognizer
-
+    # Recordable interface must implement #record and #read_audio
+    attr_writer :recordable
+    attr_writer :decoder
 
-    def initialize(configuration= nil)
-      @
+    def initialize(configuration = nil)
+      @configuration = configuration
+    end
+
+    def recordable
+      @recordable or raise "A SpeechRecognizer must have a recordable interface"
+    end
+
+    def decoder
+      @decoder ||= Decoder.new(configuration)
+    end
+
+    def configuration
+      @configuration ||= Configuration.default
+    end
+
+    # Recognize utterances and yield hypotheses in infinite loop
+    #
+    # Splits speech into utterances by detecting silence between them.
+    # By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be
+    # configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` settings.
+    #
+    # @param [Fixnum] max_samples Number of samples to process at a time
+    def recognize(max_samples = 4096)
+      decoder.start_utterance
+
+      recordable.record do
+        FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
+          loop do
+            if in_speech?
+              while decoder.in_speech?
+                process_audio(buffer, max_samples) or break
+              end
+
+              yield get_hypothesis
+            else
+              process_audio(buffer, max_samples) or break
+            end
+          end
+        end
+      end
+    end
+
+    def in_speech?
+      # Use Pocketsphinx's implementation by default
+      decoder.in_speech?
+    end
+
+    private
+
+    def process_audio(buffer, max_samples)
+      sample_count = recordable.read_audio(buffer, max_samples)
+
+      if sample_count
+        decoder.process_raw(buffer, sample_count)
+
+        # Check for a delay for example in case of non-blocking live audio
+        if recordable.respond_to?(:read_audio_delay)
+          sleep recordable.read_audio_delay(max_samples)
+        end
+      end
+
+      sample_count
+    end
+
+    # Called on speech -> silence transition
+    def get_hypothesis
+      decoder.end_utterance
+      decoder.hypothesis.tap do
+        decoder.start_utterance
+      end
     end
   end
 end
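The control flow of `recognize` (consume audio while silent, keep consuming while in speech, emit a hypothesis on the speech-to-silence transition, stop when the source runs dry) can be exercised with toy objects. This is a sketch under several assumptions: `ToyDecoder`, `process_next` and `recognize_chunks` are hypothetical names, "speech" is simulated as any chunk with a non-zero sample, and the toy decoder clears its in-speech flag in `start_utterance`, which may differ from how libpocketsphinx behaves.

```ruby
# Hypothetical stand-in for Decoder: a chunk counts as speech when it
# contains any non-zero sample; the "hypothesis" is the list of chunk sums.
class ToyDecoder
  def initialize
    @in_speech = false
    @sums = []
  end

  def start_utterance
    @in_speech = false
    @sums = []
  end

  def end_utterance; end

  def process_raw(chunk)
    @in_speech = chunk.any? { |sample| sample != 0 }
    @sums << chunk.sum if @in_speech
  end

  def in_speech?
    @in_speech
  end

  def hypothesis
    @sums.join(" ")
  end
end

# Feeds the next chunk to the decoder; returns nil once the input is exhausted.
def process_next(queue, decoder)
  chunk = queue.shift
  decoder.process_raw(chunk) if chunk
  chunk
end

# Same loop shape as SpeechRecognizer#recognize, minus FFI and the Recordable.
def recognize_chunks(chunks, decoder)
  queue = chunks.dup
  decoder.start_utterance

  loop do
    if decoder.in_speech?
      # Keep consuming until silence (or end of input) closes the utterance.
      while decoder.in_speech?
        process_next(queue, decoder) or break
      end

      decoder.end_utterance
      yield decoder.hypothesis
      decoder.start_utterance
    else
      process_next(queue, decoder) or break
    end
  end
end

chunks = [[0, 0], [1, 2], [3, 0], [0, 0], [5, 5], [0, 0]]
recognize_chunks(chunks, ToyDecoder.new) { |hypothesis| puts hypothesis }
```

The `or break` idiom is what lets a finite source such as an audio file terminate the otherwise infinite loop, since `read_audio` returns `nil` at end of input.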
data/lib/pocketsphinx/version.rb
CHANGED
Binary file
|
data/spec/configuration_spec.rb
CHANGED
@@ -44,4 +44,37 @@ describe Configuration do
   it 'raises exceptions when a setting is unknown' do
     expect { subject['unknown'] = true }.to raise_exception "Configuration setting 'unknown' does not exist"
   end
+
+  describe '#setting_names' do
+    it 'contains the names of all possible system settings' do
+      expect(subject.setting_names.count).to eq(117)
+    end
+  end
+
+  describe '#details' do
+    it 'gives details for a single setting' do
+      expect(subject.details 'vad_threshold').to eq({
+        name: "vad_threshold",
+        type: :float,
+        default: 2.0,
+        required: false,
+        value: 2.0,
+        info: "Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level."
+      })
+    end
+
+    it 'gives details for all settings when no name is specified' do
+      details = subject.details
+
+      expect(details.count).to eq(117)
+      expect(details.first).to eq({
+        name: "agc",
+        type: :string,
+        default: "none",
+        required: false,
+        value: "none",
+        info: "Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')"
+      })
+    end
+  end
 end
data/spec/decoder_spec.rb
CHANGED
@@ -9,6 +9,22 @@ describe Decoder do
     @decoder = Decoder.new(Configuration.default)
   end
 
+  # Full integration test
+  describe '#decode' do
+    it 'correctly decodes the speech in goforward.raw' do
+      subject.decode File.open('spec/assets/audio/goforward.raw', 'rb')
+
+      # With the default configuration (no specific grammar), pocketsphinx doesn't actually
+      # get this quite right, but nonetheless this is the expected output
+      expect(subject.hypothesis).to eq("go forward ten years")
+    end
+
+    it 'accepts a file path as well as a stream' do
+      subject.decode 'spec/assets/audio/goforward.raw'
+      expect(subject.hypothesis).to eq("go forward ten years")
+    end
+  end
+
   describe '#process_raw' do
     it 'calls libpocketsphinx' do
       FFI::MemoryPointer.new(:int16, 4096) do |buffer|
data/spec/speech_recognizer_spec.rb
ADDED
@@ -0,0 +1,23 @@
+require 'spec_helper'
+
+describe SpeechRecognizer do
+  let(:recordable) { AudioFile.new('spec/assets/audio/goforward.raw') }
+
+  subject do
+    SpeechRecognizer.new.tap do |speech_recognizer|
+      speech_recognizer.recordable = recordable
+      speech_recognizer.decoder = @decoder
+    end
+  end
+
+  # Share decoder across all examples for speed
+  before :all do
+    @decoder = Decoder.new(Configuration.default)
+  end
+
+  describe '#recognize' do
+    it 'should decode speech in raw audio' do
+      expect { |b| subject.recognize(4096, &b) }.to yield_with_args("go forward ten years")
+    end
+  end
+end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pocketsphinx-ruby
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 platform: ruby
 authors:
 - Howard Wilson
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-10-
+date: 2014-10-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -94,6 +94,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- examples/decode_audio_file.rb
 - examples/pocketsphinx_continuous.rb
 - examples/record_audio_file.rb
 - lib/pocketsphinx-ruby.rb
@@ -101,6 +102,8 @@ files:
 - lib/pocketsphinx/api/pocketsphinx.rb
 - lib/pocketsphinx/api/sphinxad.rb
 - lib/pocketsphinx/api/sphinxbase.rb
+- lib/pocketsphinx/audio_file.rb
+- lib/pocketsphinx/audio_file_speech_recognizer.rb
 - lib/pocketsphinx/configuration.rb
 - lib/pocketsphinx/configuration/setting_definition.rb
 - lib/pocketsphinx/decoder.rb
@@ -109,10 +112,12 @@ files:
 - lib/pocketsphinx/speech_recognizer.rb
 - lib/pocketsphinx/version.rb
 - pocketsphinx-ruby.gemspec
+- spec/assets/audio/goforward.raw
 - spec/configuration_spec.rb
 - spec/decoder_spec.rb
 - spec/microphone_spec.rb
 - spec/spec_helper.rb
+- spec/speech_recognizer_spec.rb
 homepage: https://github.com/watsonbox/pocketsphinx-ruby
 licenses:
 - MIT
@@ -138,7 +143,9 @@ signing_key:
 specification_version: 4
 summary: Ruby FFI pocketsphinx bindings
 test_files:
+- spec/assets/audio/goforward.raw
 - spec/configuration_spec.rb
 - spec/decoder_spec.rb
 - spec/microphone_spec.rb
 - spec/spec_helper.rb
+- spec/speech_recognizer_spec.rb