pocketsphinx-ruby 0.0.2 → 0.0.3
- checksums.yaml +4 -4
- data/README.md +17 -4
- data/examples/keyword_spotter.rb +21 -0
- data/lib/pocketsphinx.rb +8 -1
- data/lib/pocketsphinx/api/pocketsphinx.rb +1 -0
- data/lib/pocketsphinx/configuration/base.rb +95 -0
- data/lib/pocketsphinx/configuration/default.rb +17 -0
- data/lib/pocketsphinx/configuration/keyword_spotting.rb +37 -0
- data/lib/pocketsphinx/configuration/setting_definition.rb +1 -1
- data/lib/pocketsphinx/decoder.rb +23 -10
- data/lib/pocketsphinx/microphone.rb +1 -1
- data/lib/pocketsphinx/speech_recognizer.rb +23 -1
- data/lib/pocketsphinx/version.rb +1 -1
- data/spec/configuration_spec.rb +39 -4
- data/spec/decoder_spec.rb +36 -16
- data/spec/integration/decoder_spec.rb +28 -0
- data/spec/integration/speech_recognizer_spec.rb +23 -0
- data/spec/microphone_spec.rb +6 -0
- data/spec/speech_recognizer_spec.rb +30 -13
- metadata +10 -3
- data/lib/pocketsphinx/configuration.rb +0 -90
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3bf38b30cbc9fd5c2375d2c330238d1ad429e44c
+  data.tar.gz: 82aecda7d2c95378f15f47cc897b13419b83ce00
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2bb177461a17173815f299d3b17807014b80a22c3fae818569a4d29355e33dd506d3e1d4d195f8fa84ee09073014212fadd17a00e3f088a82d4278ccceea936a
+  data.tar.gz: 1ba9d9b74c999a05091e870b0fe2aee6c45df623b4d84a7364268eab1e4cf4e41dcf11b7206e90f5e3afd1d9ee292e503f8b4a6712b859e151e611944fe867a4
data/README.md
CHANGED
@@ -3,6 +3,7 @@
 [](https://travis-ci.org/watsonbox/pocketsphinx-ruby)
 [](https://codeclimate.com/github/watsonbox/pocketsphinx-ruby)
 [](https://coveralls.io/r/watsonbox/pocketsphinx-ruby)
+[](http://www.rubydoc.info/gems/pocketsphinx-ruby/frames)
 
 This gem provides Ruby [FFI](https://github.com/ffi/ffi) bindings for [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx), a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. Pocketsphinx is part of the [CMU Sphinx](http://cmusphinx.sourceforge.net/) Open Source Toolkit For Speech Recognition.
 
@@ -50,7 +51,7 @@ Or install it yourself as:
     $ gem install pocketsphinx-ruby
 
 
-##
+## Usage
 
 The `LiveSpeechRecognizer` is modeled on the same class in [Sphinx4](http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4). It uses the `Microphone` and `Decoder` classes internally to provide a simple, high-level recognition interface:
 
@@ -75,7 +76,7 @@ end
 These two classes split speech into utterances by detecting silence between them. By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` configuration settings.
 
 
-
+### Configuration
 
 All of Pocketsphinx's decoding settings are managed by the `Configuration` class, which can be passed into the high-level speech recognizers:
 
@@ -98,7 +99,7 @@ Pocketsphinx::LiveSpeechRecognizer.new(configuration)
 You can find the output of `configuration.details` [here](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Default-Pocketsphinx-Configuration) for more information on the various different settings.
 
 
-
+### Microphone
 
 The `Microphone` class uses Pocketsphinx's libsphinxad to record audio for speech recognition. For desktop applications this should normally be 16bit/16kHz raw PCM audio, so these are the default settings. The exact audio backend depends on [what was selected](https://github.com/cmusphinx/sphinxbase/blob/master/configure.in#L138) when libsphinxad was built. On OSX, OpenAL is [now supported](https://github.com/cmusphinx/sphinxbase/commit/5cc55c4721273681200e1f754ff0798ac073b950) and should work just fine.
 
@@ -124,7 +125,7 @@ end
 To open this audio file take a look at [this wiki page](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Importing-raw-PCM-audio-with-Audacity).
 
 
-
+### Decoder
 
 The `Decoder` class uses Pocketsphinx's libpocketsphinx to decode audio data into text. For example to decode a single utterance:
 
@@ -136,6 +137,18 @@ puts decoder.hypothesis # => "go forward ten years"
 ```
 
 
+### Keyword Spotting
+
+Keyword spotting is another feature that is not in the current stable (0.8) releases of Pocketsphinx, having been [merged into trunk](https://github.com/cmusphinx/pocketsphinx/commit/f562f9356cc7f1ade4941ebdde0c377642a023e3) early in 2014. It can be useful for detecting an activation keyword in a command and control application, while ignoring all other speech. Set up a recognizer as follows:
+
+```ruby
+configuration = Configuration::KeywordSpotting.new('Okay computer')
+recognizer = LiveSpeechRecognizer.new(configuration)
+```
+
+The `KeywordSpotting` configuration accepts a second argument for adjusting the sensitivity of the keyword detection. Note that this is just a wrapper which sets the `keyphrase` and `kws_threshold` settings on the default configuration.
+
+
 ## Contributing
 
 1. Fork it ( https://github.com/[my-github-username]/pocketsphinx-ruby/fork )
data/examples/keyword_spotter.rb
ADDED
@@ -0,0 +1,21 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "pocketsphinx-ruby"
+
+include Pocketsphinx
+
+configuration = Configuration::KeywordSpotting.new('hello computer')
+recognizer = LiveSpeechRecognizer.new(configuration)
+
+recognizer.recognize do |speech|
+  if configuration.keyword == 'hello computer'
+    configuration.keyword = 'goodbye computer'
+  else
+    configuration.keyword = 'hello computer'
+  end
+
+  recognizer.reconfigure
+
+  puts "You said '#{speech}'. Keyword is now '#{configuration.keyword}'"
+end
data/lib/pocketsphinx.rb
CHANGED
@@ -1,11 +1,18 @@
 require 'ffi'
 
 require "pocketsphinx/version"
+
+# Pocketsphinx FFI API
 require "pocketsphinx/api/sphinxbase"
 require "pocketsphinx/api/sphinxad"
 require "pocketsphinx/api/pocketsphinx"
 
-
+# Configuration
+require 'pocketsphinx/configuration/setting_definition'
+require "pocketsphinx/configuration/base"
+require "pocketsphinx/configuration/default"
+require "pocketsphinx/configuration/keyword_spotting"
+
 require "pocketsphinx/audio_file"
 require "pocketsphinx/microphone"
 require "pocketsphinx/decoder"
data/lib/pocketsphinx/api/pocketsphinx.rb
CHANGED
@@ -8,6 +8,7 @@ module Pocketsphinx
       typedef :pointer, :configuration
 
       attach_function :ps_init, [:configuration], :decoder
+      attach_function :ps_reinit, [:decoder, :configuration], :int
       attach_function :ps_default_search_args, [:pointer], :void
       attach_function :ps_args, [], :pointer
       attach_function :ps_decode_raw, [:decoder, :pointer, :string, :long], :int
data/lib/pocketsphinx/configuration/base.rb
ADDED
@@ -0,0 +1,95 @@
+module Pocketsphinx
+  module Configuration
+    class Base
+      attr_reader :ps_config
+      attr_reader :setting_definitions
+
+      def initialize
+        @ps_arg_defs = API::Pocketsphinx.ps_args
+        @setting_definitions = SettingDefinition.from_arg_defs(@ps_arg_defs)
+
+        # Sets default settings based on definitions
+        @ps_config = API::Sphinxbase.cmd_ln_parse_r(nil, @ps_arg_defs, 0, nil, 1)
+      end
+
+      def setting_names
+        setting_definitions.keys.sort
+      end
+
+      # Get details for one or all configuration settings
+      #
+      # @param [String] name Name of setting to get details for. Gets details for all settings if nil.
+      def details(name = nil)
+        details = [name || setting_names].flatten.map do |name|
+          definition = find_definition(name)
+
+          {
+            name: name,
+            type: definition.type,
+            default: definition.default,
+            required: definition.required?,
+            value: self[name],
+            info: definition.doc
+          }
+        end
+
+        name ? details.first : details
+      end
+
+      # Get a configuration setting
+      def [](name)
+        case find_definition(name).type
+        when :integer
+          API::Sphinxbase.cmd_ln_int_r(ps_config, "-#{name}")
+        when :float
+          API::Sphinxbase.cmd_ln_float_r(ps_config, "-#{name}")
+        when :string
+          API::Sphinxbase.cmd_ln_str_r(ps_config, "-#{name}")
+        when :boolean
+          API::Sphinxbase.cmd_ln_int_r(ps_config, "-#{name}") != 0
+        when :string_list
+          raise NotImplementedException
+        end
+      end
+
+      # Set a configuration setting with type checking
+      def []=(name, value)
+        check_type(name, type = find_definition(name).type, value)
+
+        case type
+        when :integer
+          API::Sphinxbase.cmd_ln_set_int_r(ps_config, "-#{name}", value.to_i)
+        when :float
+          API::Sphinxbase.cmd_ln_set_float_r(ps_config, "-#{name}", value.to_f)
+        when :string
+          API::Sphinxbase.cmd_ln_set_str_r(ps_config, "-#{name}", (value.to_s if value))
+        when :boolean
+          API::Sphinxbase.cmd_ln_set_int_r(ps_config, "-#{name}", value ? 1 : 0)
+        when :string_list
+          raise NotImplementedException
+        end
+      end
+
+      private
+
+      def find_definition(name)
+        setting_definitions[name] or raise "Configuration setting '#{name}' does not exist"
+      end
+
+      def check_type(name, expected_type, value)
+        conversion_method = case expected_type
+          when :integer then :to_i
+          when :float then :to_f
+        end
+
+        if conversion_method && !value.respond_to?(conversion_method)
+          raise "Configuration setting '#{name}' must be of type #{expected_type.to_s.capitalize}"
+        end
+
+        if value.nil? && expected_type != :string
+          raise "Only string settings can be set to nil"
+        end
+      end
+    end
+  end
+end
|
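The two guards in `check_type` can be exercised without libpocketsphinx. The following is a minimal sketch, assuming hypothetical stand-alone helpers (`check_setting_type` and `type_error_for` are illustrative names, not part of the gem) that mirror the logic added in `base.rb`:

```ruby
# Hypothetical stand-alone mirror of Configuration::Base#check_type.
# It raises if the value cannot be converted to the expected numeric type,
# or if nil is given for anything other than a string setting.
def check_setting_type(name, expected_type, value)
  conversion_method = case expected_type
    when :integer then :to_i
    when :float then :to_f
  end

  if conversion_method && !value.respond_to?(conversion_method)
    raise "Configuration setting '#{name}' must be of type #{expected_type.to_s.capitalize}"
  end

  if value.nil? && expected_type != :string
    raise "Only string settings can be set to nil"
  end

  true
end

# Return the error message for an assignment, or nil if it is accepted.
def type_error_for(name, expected_type, value)
  check_setting_type(name, expected_type, value)
  nil
rescue RuntimeError => e
  e.message
end

puts type_error_for('frate', :integer, true).inspect
puts type_error_for('frate', :integer, nil).inspect
puts type_error_for('warp_type', :string, nil).inspect
```

Because the first guard only checks `respond_to?`, a value such as `"50"` passes for an integer setting, while `true` does not. `nil` responds to both `to_i` and `to_f`, which is why the separate nil guard is needed.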
data/lib/pocketsphinx/configuration/default.rb
ADDED
@@ -0,0 +1,17 @@
+module Pocketsphinx
+  module Configuration
+    class Default < Base
+      def initialize
+        super
+
+        # Sets default grammar and language model if they are not set explicitly and
+        # are present in the default search path.
+        API::Pocketsphinx.ps_default_search_args(@ps_config)
+      end
+    end
+
+    def self.default
+      Default.new
+    end
+  end
+end
data/lib/pocketsphinx/configuration/keyword_spotting.rb
ADDED
@@ -0,0 +1,37 @@
+module Pocketsphinx
+  module Configuration
+    class KeywordSpotting < Default
+      attr_reader :kws_threshold
+
+      def initialize(keyword, threshold = nil)
+        super()
+
+        self['lm'] = nil
+        self.keyword = keyword
+        self.kws_threshold = threshold if threshold
+      end
+
+      def keyword
+        self['keyphrase']
+      end
+
+      def keyword=(value)
+        self['keyphrase'] = sanitize_keyword value
+      end
+
+      def kws_threshold
+        self['kws_threshold']
+      end
+
+      def kws_threshold=(value)
+        self['kws_threshold'] = value
+      end
+
+      private
+
+      def sanitize_keyword(keyword)
+        keyword.downcase
+      end
+    end
+  end
+end
|
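To see what the wrapper does with its two arguments without loading libpocketsphinx, here is a rough sketch. `FakeKeywordSpotting` is a hypothetical Hash-backed stand-in, and the initial `'default_lm'` and `1.0` values are placeholders rather than real Pocketsphinx defaults:

```ruby
# Hypothetical Hash-backed stand-in for Configuration::KeywordSpotting.
# The real class stores settings through libpocketsphinx; this one only
# mirrors the wrapper logic shown in the diff.
class FakeKeywordSpotting
  def initialize(keyword, threshold = nil)
    # Placeholder starting values, not real Pocketsphinx defaults
    @settings = { 'lm' => 'default_lm', 'keyphrase' => nil, 'kws_threshold' => 1.0 }

    self['lm'] = nil                             # keyword spotting uses no language model
    self.keyword = keyword                       # stored lowercase
    self.kws_threshold = threshold if threshold  # only overridden when given
  end

  def [](name)
    @settings.fetch(name)
  end

  def []=(name, value)
    @settings[name] = value
  end

  def keyword
    self['keyphrase']
  end

  def keyword=(value)
    self['keyphrase'] = value.downcase
  end

  def kws_threshold
    self['kws_threshold']
  end

  def kws_threshold=(value)
    self['kws_threshold'] = value
  end
end

spotter = FakeKeywordSpotting.new('Okay computer', 24)
puts spotter.keyword
puts spotter['lm'].inspect
puts spotter.kws_threshold
```

Omitting the second argument leaves whatever threshold the underlying configuration already holds, which matches the `if threshold` guard in the added file.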
data/lib/pocketsphinx/decoder.rb
CHANGED
@@ -1,13 +1,22 @@
 module Pocketsphinx
-  class Decoder
+  class Decoder < Struct.new(:configuration)
     Error = Class.new(StandardError)
 
-    attr_reader :ps_decoder
     attr_writer :ps_api
 
-
-
-
+    # Reinitialize the decoder with updated configuration.
+    #
+    # This function allows you to switch the acoustic model, dictionary, or other configuration
+    # without creating an entirely new decoding object.
+    #
+    # @param [Configuration] configuration An optional new configuration to use. If this is
+    #   nil, the previous configuration will be reloaded, with any changes applied.
+    def reconfigure(configuration = nil)
+      self.configuration = configuration if configuration
+
+      ps_api.ps_reinit(ps_decoder, self.configuration.ps_config).tap do |result|
+        raise Error, "Decoder#reconfigure failed with error code #{result}" if result < 0
+      end
     end
 
     # Decode a raw audio stream as a single utterance, opening a file if path given
@@ -55,7 +64,7 @@ module Pocketsphinx
     # worth of data. This may allow the recognizer to produce more accurate results.
     # @return Number of frames of data searched
     def process_raw(buffer, size, no_search = false, full_utt = false)
-      ps_api.ps_process_raw(
+      ps_api.ps_process_raw(ps_decoder, buffer, size, no_search ? 1 : 0, full_utt ? 1 : 0).tap do |result|
         raise Error, "Decoder#process_raw failed with error code #{result}" if result < 0
       end
     end
@@ -68,21 +77,21 @@ module Pocketsphinx
     #
     # @param [String] name String uniquely identifying this utterance. If nil, one will be created.
     def start_utterance(name = nil)
-      ps_api.ps_start_utt(
+      ps_api.ps_start_utt(ps_decoder, name).tap do |result|
         raise Error, "Decoder#start_utterance failed with error code #{result}" if result < 0
       end
     end
 
     # End utterance processing
     def end_utterance
-      ps_api.ps_end_utt(
+      ps_api.ps_end_utt(ps_decoder).tap do |result|
         raise Error, "Decoder#end_utterance failed with error code #{result}" if result < 0
       end
     end
 
     # Checks if the last feed audio buffer contained speech
     def in_speech?
-      ps_api.ps_get_in_speech(
+      ps_api.ps_get_in_speech(ps_decoder) != 0
     end
 
     # Get hypothesis string and path score.
@@ -90,11 +99,15 @@ module Pocketsphinx
     # @return [String] Hypothesis string
     # @todo Expand to return path score and utterance ID
     def hypothesis
-      ps_api.ps_get_hyp(
+      ps_api.ps_get_hyp(ps_decoder, nil, nil)
     end
 
     def ps_api
       @ps_api || API::Pocketsphinx
     end
+
+    def ps_decoder
+      @ps_decoder ||= ps_api.ps_init(configuration.ps_config)
+    end
   end
 end
|
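The decoder changes combine two idioms: a lazily created handle (`@ps_decoder ||= ...`) and a `tap` block that checks a C-style return code before passing it through. The sketch below illustrates both with a hypothetical `FakeApi` in place of the FFI bindings; `SketchDecoder`, `handle`, and the symbols used as configurations are made up for the example:

```ruby
# Hypothetical fake of the FFI layer: counts init calls and returns a canned code.
class FakeApi
  attr_reader :init_calls

  def initialize(reinit_result)
    @reinit_result = reinit_result
    @init_calls = 0
  end

  def ps_init(_config)
    @init_calls += 1
    :decoder_handle
  end

  def ps_reinit(_decoder, _config)
    @reinit_result
  end
end

class SketchDecoder < Struct.new(:configuration)
  Error = Class.new(StandardError)
  attr_writer :api

  def reconfigure(configuration = nil)
    # The argument shadows the Struct accessor, hence the explicit self
    self.configuration = configuration if configuration

    # tap yields the result for the check, then returns it unchanged
    @api.ps_reinit(handle, self.configuration).tap do |result|
      raise Error, "reconfigure failed with error code #{result}" if result < 0
    end
  end

  # Created on first use and cached afterwards
  def handle
    @handle ||= @api.ps_init(configuration)
  end
end

api = FakeApi.new(0)
decoder = SketchDecoder.new(:config_a)
decoder.api = api
puts decoder.reconfigure(:config_b)
puts decoder.configuration.inspect
```

Deferring initialization this way lets a test double be assigned before the first native call is made, which appears to be how the updated `decoder_spec.rb` uses `ps_api=`.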
data/lib/pocketsphinx/microphone.rb
CHANGED
@@ -65,7 +65,7 @@ module Pocketsphinx
     #
     # @param [Fixnum] max_samples The maximum samples we tried to read from the audio device
     def read_audio_delay(max_samples = 4096)
-      max_samples / (2 * sample_rate)
+      max_samples.to_f / (2 * sample_rate)
     end
 
     def close_device
|
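The one-line change to `read_audio_delay` matters because Ruby's `/` truncates when both operands are integers. A small sketch comparing the old and new expressions, using hypothetical stand-alone method names:

```ruby
# Old expression: integer division truncates toward zero
def delay_integer(max_samples, sample_rate)
  max_samples / (2 * sample_rate)
end

# New expression: converting to Float first keeps the fractional seconds
def delay_float(max_samples, sample_rate)
  max_samples.to_f / (2 * sample_rate)
end

puts delay_integer(4096, 16_000)
puts delay_float(4096, 16_000)
```

With the defaults of 4096 samples at 16kHz the old version yields 0, while the new one yields 0.128, the value asserted in the updated `microphone_spec.rb`.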
data/lib/pocketsphinx/speech_recognizer.rb
CHANGED
@@ -6,6 +6,7 @@ module Pocketsphinx
     # Recordable interface must implement #record and #read_audio
     attr_writer :recordable
     attr_writer :decoder
+    attr_writer :configuration
 
     def initialize(configuration = nil)
       @configuration = configuration
@@ -23,6 +24,19 @@ module Pocketsphinx
       @configuration ||= Configuration.default
     end
 
+    # Reinitialize the decoder with updated configuration.
+    #
+    # See Decoder#reconfigure
+    #
+    # @param [Configuration] configuration An optional new configuration to use. If this is
+    #   nil, the previous configuration will be reloaded, with any changes applied.
+    def reconfigure(configuration = nil)
+      self.configuration = configuration if configuration
+
+      decoder.reconfigure(configuration)
+      decoder.start_utterance if recognizing?
+    end
+
     # Recognize utterances and yield hypotheses in infinite loop
     #
     # Splits speech into utterances by detecting silence between them.
@@ -32,6 +46,7 @@ module Pocketsphinx
     # @param [Fixnum] max_samples Number of samples to process at a time
     def recognize(max_samples = 4096)
       decoder.start_utterance
+      @recognizing = true
 
       recordable.record do
         FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
@@ -41,13 +56,16 @@ module Pocketsphinx
                 process_audio(buffer, max_samples) or break
               end
 
-
+              hypothesis = get_hypothesis
+              yield hypothesis if hypothesis
             else
               process_audio(buffer, max_samples) or break
             end
           end
         end
       end
+    ensure
+      @recognizing = false
     end
 
     def in_speech?
@@ -55,6 +73,10 @@ module Pocketsphinx
       decoder.in_speech?
     end
 
+    def recognizing?
+      @recognizing == true
+    end
+
     private
 
     def process_audio(buffer, max_samples)
|
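The `@recognizing` flag is set at the start of `recognize` and cleared in an `ensure` clause, so it is reset however the loop ends. A minimal sketch of that pattern, with a hypothetical `SketchRecognizer` that yields once instead of looping over audio:

```ruby
class SketchRecognizer
  def recognize
    @recognizing = true
    yield self
  ensure
    # Runs on normal return, on break out of the block, and on exceptions
    @recognizing = false
  end

  # Compares against true so an unset (nil) flag reads as false
  def recognizing?
    @recognizing == true
  end
end

recognizer = SketchRecognizer.new
seen_inside = nil
recognizer.recognize { |r| seen_inside = r.recognizing? }
puts seen_inside
puts recognizer.recognizing?
```

This is what allows `reconfigure` to decide whether an utterance needs restarting: the flag is only true while a `recognize` call is in progress.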
data/lib/pocketsphinx/version.rb
CHANGED
data/spec/configuration_spec.rb
CHANGED
@@ -4,7 +4,7 @@ describe Configuration do
   subject { Pocketsphinx::Configuration.default }
 
   it "provides a default pocketsphinx configuration" do
-    expect(subject).to be_a(Pocketsphinx::Configuration)
+    expect(subject).to be_a(Pocketsphinx::Configuration::Default)
   end
 
   it "supports integer settings" do
@@ -13,6 +13,8 @@ describe Configuration do
 
     subject['frate'] = 50
     expect(subject['frate']).to eq(50)
+
+    expect { subject['frate'] = nil }.to raise_exception "Only string settings can be set to nil"
   end
 
   it "supports float settings" do
@@ -21,24 +23,31 @@ describe Configuration do
 
     subject['samprate'] = 8000
     expect(subject['samprate']).to eq(8000)
+
+    expect { subject['samprate'] = nil }.to raise_exception "Only string settings can be set to nil"
   end
 
-  it "supports
+  it "supports string settings" do
     expect(subject['warp_type']).to eq('inverse_linear')
 
     subject['warp_type'] = 'different_type'
     expect(subject['warp_type']).to eq('different_type')
+
+    subject['warp_type'] = nil
+    expect(subject['warp_type']).to eq(nil)
   end
 
-  it "supports
+  it "supports boolean settings" do
     expect(subject['smoothspec']).to eq(false)
 
     subject['smoothspec'] = true
     expect(subject['smoothspec']).to eq(true)
+
+    expect { subject['smoothspec'] = nil }.to raise_exception "Only string settings can be set to nil"
   end
 
   it 'raises exceptions when setting with incorrectly typed values' do
-    expect { subject['frate'] = true }.to raise_exception "Configuration setting 'frate' must be
+    expect { subject['frate'] = true }.to raise_exception "Configuration setting 'frate' must be of type Integer"
   end
 
   it 'raises exceptions when a setting is unknown' do
@@ -77,4 +86,30 @@ describe Configuration do
       })
     end
   end
+
+  context 'keyword spotting configuration' do
+    subject { Configuration::KeywordSpotting.new('Okay computer') }
+
+    it 'sets the lowercase keyphrase' do
+      expect(subject['keyphrase']).to eq('okay computer')
+    end
+
+    it 'uses no language model' do
+      expect(subject['lm']).to be_nil
+    end
+
+    it 'exposes the keyphrase setting as #keyword' do
+      subject.keyword = 'Hello computer'
+
+      expect(subject.keyword).to eq('hello computer')
+      expect(subject['keyphrase']).to eq('hello computer')
+    end
+
+    it 'exposes the kws_threshold setting as #kws_threshold' do
+      subject.kws_threshold = 24
+
+      expect(subject.kws_threshold).to eq(24)
+      expect(subject['kws_threshold']).to eq(24)
+    end
+  end
 end
data/spec/decoder_spec.rb
CHANGED
@@ -1,27 +1,47 @@
 require 'spec_helper'
 
 describe Decoder do
-  subject {
-  let(:ps_api) {
-
-
-
-
+  subject { Decoder.new(configuration) }
+  let(:ps_api) { subject.ps_api }
+  let(:ps_decoder) { double }
+  let(:configuration) { Configuration.default }
+
+  before do
+    subject.ps_api = double
+    allow(ps_api).to receive(:ps_init).and_return(ps_decoder)
   end
 
-  #
-
-
-
+  describe '#reconfigure' do
+    it 'calls libpocketsphinx' do
+      expect(ps_api)
+        .to receive(:ps_reinit)
+        .with(subject.ps_decoder, configuration.ps_config)
+        .and_return(0)
 
-
-      # get this quite right, but nonetheless this is the expected output
-      expect(subject.hypothesis).to eq("go forward ten years")
+      subject.reconfigure
     end
 
-    it '
-
-
+    it 'sets a new configuration if one is passed' do
+      new_config = Struct.new(:ps_config).new(:ps_config)
+
+      expect(ps_api)
+        .to receive(:ps_reinit)
+        .with(subject.ps_decoder, new_config.ps_config)
+        .and_return(0)
+
+      subject.reconfigure(new_config)
+
+      expect(subject.configuration).to be(new_config)
+    end
+
+    it 'raises an exception on error' do
+      expect(ps_api)
+        .to receive(:ps_reinit)
+        .with(subject.ps_decoder, configuration.ps_config)
+        .and_return(-1)
+
+      expect { subject.reconfigure }
+        .to raise_exception "Decoder#reconfigure failed with error code -1"
     end
   end
 
data/spec/integration/decoder_spec.rb
ADDED
@@ -0,0 +1,28 @@
+require 'spec_helper'
+
+describe Decoder do
+  subject { @decoder }
+  let(:configuration) { @configuration }
+
+  # Share decoder across all examples for speed
+  before :all do
+    @configuration = Configuration.default
+    @decoder = Decoder.new(@configuration)
+  end
+
+  describe '#decode' do
+    it 'correctly decodes the speech in goforward.raw' do
+      @decoder.ps_api = nil
+      subject.decode File.open('spec/assets/audio/goforward.raw', 'rb')
+
+      # With the default configuration (no specific grammar), pocketsphinx doesn't actually
+      # get this quite right, but nonetheless this is the expected output
+      expect(subject.hypothesis).to eq("go forward ten years")
+    end
+
+    it 'accepts a file path as well as a stream' do
+      subject.decode 'spec/assets/audio/goforward.raw'
+      expect(subject.hypothesis).to eq("go forward ten years")
+    end
+  end
+end
data/spec/integration/speech_recognizer_spec.rb
ADDED
@@ -0,0 +1,23 @@
+require 'spec_helper'
+
+describe SpeechRecognizer do
+  let(:recordable) { AudioFile.new('spec/assets/audio/goforward.raw') }
+
+  subject do
+    SpeechRecognizer.new.tap do |speech_recognizer|
+      speech_recognizer.recordable = recordable
+      speech_recognizer.decoder = @decoder
+    end
+  end
+
+  # Share decoder across all examples for speed
+  before :all do
+    @decoder = Decoder.new(Configuration.default)
+  end
+
+  describe '#recognize' do
+    it 'should decode speech in raw audio' do
+      expect { |b| subject.recognize(4096, &b) }.to yield_with_args("go forward ten years")
+    end
+  end
+end
data/spec/microphone_spec.rb
CHANGED
@@ -79,6 +79,12 @@ describe Microphone do
     end
   end
 
+  describe '#read_audio_delay' do
+    it 'should be 0.128 seconds for a max_samples of 4096 and sample rate of 16kHz' do
+      expect(subject.read_audio_delay(4096)).to eq(0.128)
+    end
+  end
+
   describe '#close_device' do
     it 'calls libsphinxad' do
       expect(ps_api)
data/spec/speech_recognizer_spec.rb
CHANGED
@@ -1,23 +1,40 @@
 require 'spec_helper'
 
 describe SpeechRecognizer do
-  let(:
+  let(:configuration) { double }
+  let(:recordable) { double }
+  let(:decoder) { double }
+  subject { SpeechRecognizer.new(configuration) }
 
-
-
-
-      speech_recognizer.decoder = @decoder
-    end
+  before do
+    subject.decoder = decoder
+    subject.recordable = recordable
   end
 
-  #
-
-
-
+  describe '#reconfigure' do
+    before do
+      allow(decoder).to receive(:reconfigure)
+      allow(decoder).to receive(:start_utterance)
+    end
+
+    it 'saves the configuration if one is given' do
+      subject.reconfigure(:new_configuration)
+      expect(subject.configuration).to eq(:new_configuration)
+    end
+
+    it 'reconfigures the decoder' do
+      expect(decoder).to receive(:reconfigure).with(nil).ordered
+      expect(decoder).to receive(:reconfigure).with(:new_configuration).ordered
+
+      subject.reconfigure
+      subject.reconfigure(:new_configuration)
+    end
+
+    it 'restarts an utterance if recognition was interrupted' do
+      expect(subject).to receive(:recognizing?).and_return(true)
+      expect(decoder).to receive(:start_utterance)
 
-
-    it 'should decode speech in raw audio' do
-      expect { |b| subject.recognize(4096, &b) }.to yield_with_args("go forward ten years")
+      subject.reconfigure
     end
   end
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pocketsphinx-ruby
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.3
 platform: ruby
 authors:
 - Howard Wilson
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-10-
+date: 2014-10-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -95,6 +95,7 @@ files:
 - README.md
 - Rakefile
 - examples/decode_audio_file.rb
+- examples/keyword_spotter.rb
 - examples/pocketsphinx_continuous.rb
 - examples/record_audio_file.rb
 - lib/pocketsphinx-ruby.rb
@@ -104,7 +105,9 @@ files:
 - lib/pocketsphinx/api/sphinxbase.rb
 - lib/pocketsphinx/audio_file.rb
 - lib/pocketsphinx/audio_file_speech_recognizer.rb
-- lib/pocketsphinx/configuration.rb
+- lib/pocketsphinx/configuration/base.rb
+- lib/pocketsphinx/configuration/default.rb
+- lib/pocketsphinx/configuration/keyword_spotting.rb
 - lib/pocketsphinx/configuration/setting_definition.rb
 - lib/pocketsphinx/decoder.rb
 - lib/pocketsphinx/live_speech_recognizer.rb
@@ -115,6 +118,8 @@ files:
 - spec/assets/audio/goforward.raw
 - spec/configuration_spec.rb
 - spec/decoder_spec.rb
+- spec/integration/decoder_spec.rb
+- spec/integration/speech_recognizer_spec.rb
 - spec/microphone_spec.rb
 - spec/spec_helper.rb
 - spec/speech_recognizer_spec.rb
@@ -146,6 +151,8 @@ test_files:
 - spec/assets/audio/goforward.raw
 - spec/configuration_spec.rb
 - spec/decoder_spec.rb
+- spec/integration/decoder_spec.rb
+- spec/integration/speech_recognizer_spec.rb
 - spec/microphone_spec.rb
 - spec/spec_helper.rb
 - spec/speech_recognizer_spec.rb
data/lib/pocketsphinx/configuration.rb
REMOVED
@@ -1,90 +0,0 @@
-require 'pocketsphinx/configuration/setting_definition'
-
-module Pocketsphinx
-  class Configuration
-    attr_reader :ps_config
-    attr_reader :setting_definitions
-
-    private_class_method :new
-
-    def initialize(ps_arg_defs)
-      @ps_arg_defs = ps_arg_defs
-      @setting_definitions = SettingDefinition.from_arg_defs(ps_arg_defs)
-
-      # Sets default settings based on definitions
-      @ps_config = API::Sphinxbase.cmd_ln_parse_r(nil, ps_arg_defs, 0, nil, 1)
-
-      # Sets default grammar and language model if they are not set explicitly and
-      # are present in the default search path.
-      API::Pocketsphinx.ps_default_search_args(@ps_config)
-    end
-
-    def self.default
-      new(API::Pocketsphinx.ps_args)
-    end
-
-    def setting_names
-      setting_definitions.keys.sort
-    end
-
-    # Get details for one or all configuration settings
-    #
-    # @param [String] name Name of setting to get details for. Gets details for all settings if nil.
-    def details(name = nil)
-      details = [name || setting_names].flatten.map do |name|
-        definition = find_definition(name)
-
-        {
-          name: name,
-          type: definition.type,
-          default: definition.default,
-          required: definition.required?,
-          value: self[name],
-          info: definition.doc
-        }
-      end
-
-      name ? details.first : details
-    end
-
-    # Get a configuration setting
-    def [](name)
-      case find_definition(name).type
-      when :integer
-        API::Sphinxbase.cmd_ln_int_r(@ps_config, "-#{name}")
-      when :float
-        API::Sphinxbase.cmd_ln_float_r(@ps_config, "-#{name}")
-      when :string
-        API::Sphinxbase.cmd_ln_str_r(@ps_config, "-#{name}")
-      when :boolean
-        API::Sphinxbase.cmd_ln_int_r(@ps_config, "-#{name}") != 0
-      when :string_list
-        raise NotImplementedException
-      end
-    end
-
-    # Set a configuration setting with type checking
-    def []=(name, value)
-      case find_definition(name).type
-      when :integer
-        raise "Configuration setting '#{name}' must be a Fixnum" unless value.respond_to?(:to_i)
-        API::Sphinxbase.cmd_ln_set_int_r(@ps_config, "-#{name}", value.to_i)
-      when :float
-        raise "Configuration setting '#{name}' must be a Float" unless value.respond_to?(:to_i)
-        API::Sphinxbase.cmd_ln_set_float_r(@ps_config, "-#{name}", value.to_f)
-      when :string
-        API::Sphinxbase.cmd_ln_set_str_r(@ps_config, "-#{name}", value.to_s)
-      when :boolean
-        API::Sphinxbase.cmd_ln_set_int_r(@ps_config, "-#{name}", value ? 1 : 0)
-      when :string_list
-        raise NotImplementedException
-      end
-    end
-
-    private
-
-    def find_definition(name)
-      setting_definitions[name] or raise "Configuration setting '#{name}' does not exist"
-    end
-  end
-end