pocketsphinx-ruby 0.0.2 → 0.0.3
- checksums.yaml +4 -4
- data/README.md +17 -4
- data/examples/keyword_spotter.rb +21 -0
- data/lib/pocketsphinx.rb +8 -1
- data/lib/pocketsphinx/api/pocketsphinx.rb +1 -0
- data/lib/pocketsphinx/configuration/base.rb +95 -0
- data/lib/pocketsphinx/configuration/default.rb +17 -0
- data/lib/pocketsphinx/configuration/keyword_spotting.rb +37 -0
- data/lib/pocketsphinx/configuration/setting_definition.rb +1 -1
- data/lib/pocketsphinx/decoder.rb +23 -10
- data/lib/pocketsphinx/microphone.rb +1 -1
- data/lib/pocketsphinx/speech_recognizer.rb +23 -1
- data/lib/pocketsphinx/version.rb +1 -1
- data/spec/configuration_spec.rb +39 -4
- data/spec/decoder_spec.rb +36 -16
- data/spec/integration/decoder_spec.rb +28 -0
- data/spec/integration/speech_recognizer_spec.rb +23 -0
- data/spec/microphone_spec.rb +6 -0
- data/spec/speech_recognizer_spec.rb +30 -13
- metadata +10 -3
- data/lib/pocketsphinx/configuration.rb +0 -90
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3bf38b30cbc9fd5c2375d2c330238d1ad429e44c
|
4
|
+
data.tar.gz: 82aecda7d2c95378f15f47cc897b13419b83ce00
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 2bb177461a17173815f299d3b17807014b80a22c3fae818569a4d29355e33dd506d3e1d4d195f8fa84ee09073014212fadd17a00e3f088a82d4278ccceea936a
|
7
|
+
data.tar.gz: 1ba9d9b74c999a05091e870b0fe2aee6c45df623b4d84a7364268eab1e4cf4e41dcf11b7206e90f5e3afd1d9ee292e503f8b4a6712b859e151e611944fe867a4
|
data/README.md
CHANGED
@@ -3,6 +3,7 @@
|
|
3
3
|
[![Build Status](http://img.shields.io/travis/watsonbox/pocketsphinx-ruby.svg?style=flat)](https://travis-ci.org/watsonbox/pocketsphinx-ruby)
|
4
4
|
[![Code Climate](http://img.shields.io/codeclimate/github/watsonbox/pocketsphinx-ruby/badges/gpa.svg?style=flat)](https://codeclimate.com/github/watsonbox/pocketsphinx-ruby)
|
5
5
|
[![Coverage Status](https://img.shields.io/coveralls/watsonbox/pocketsphinx-ruby.svg?style=flat)](https://coveralls.io/r/watsonbox/pocketsphinx-ruby)
|
6
|
+
[![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg?style=flat)](http://www.rubydoc.info/gems/pocketsphinx-ruby/frames)
|
6
7
|
|
7
8
|
This gem provides Ruby [FFI](https://github.com/ffi/ffi) bindings for [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx), a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. Pocketsphinx is part of the [CMU Sphinx](http://cmusphinx.sourceforge.net/) Open Source Toolkit For Speech Recognition.
|
8
9
|
|
@@ -50,7 +51,7 @@ Or install it yourself as:
|
|
50
51
|
$ gem install pocketsphinx-ruby
|
51
52
|
|
52
53
|
|
53
|
-
##
|
54
|
+
## Usage
|
54
55
|
|
55
56
|
The `LiveSpeechRecognizer` is modeled on the same class in [Sphinx4](http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4). It uses the `Microphone` and `Decoder` classes internally to provide a simple, high-level recognition interface:
|
56
57
|
|
@@ -75,7 +76,7 @@ end
|
|
75
76
|
These two classes split speech into utterances by detecting silence between them. By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` configuration settings.
|
76
77
|
|
77
78
|
|
78
|
-
|
79
|
+
### Configuration
|
79
80
|
|
80
81
|
All of Pocketsphinx's decoding settings are managed by the `Configuration` class, which can be passed into the high-level speech recognizers:
|
81
82
|
|
@@ -98,7 +99,7 @@ Pocketsphinx::LiveSpeechRecognizer.new(configuration)
|
|
98
99
|
You can find the output of `configuration.details` [here](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Default-Pocketsphinx-Configuration) for more information on the various different settings.
|
99
100
|
|
100
101
|
|
101
|
-
|
102
|
+
### Microphone
|
102
103
|
|
103
104
|
The `Microphone` class uses Pocketsphinx's libsphinxad to record audio for speech recognition. For desktop applications this should normally be 16bit/16kHz raw PCM audio, so these are the default settings. The exact audio backend depends on [what was selected](https://github.com/cmusphinx/sphinxbase/blob/master/configure.in#L138) when libsphinxad was built. On OSX, OpenAL is [now supported](https://github.com/cmusphinx/sphinxbase/commit/5cc55c4721273681200e1f754ff0798ac073b950) and should work just fine.
|
104
105
|
|
@@ -124,7 +125,7 @@ end
|
|
124
125
|
To open this audio file take a look at [this wiki page](https://github.com/watsonbox/pocketsphinx-ruby/wiki/Importing-raw-PCM-audio-with-Audacity).
|
125
126
|
|
126
127
|
|
127
|
-
|
128
|
+
### Decoder
|
128
129
|
|
129
130
|
The `Decoder` class uses Pocketsphinx's libpocketsphinx to decode audio data into text. For example to decode a single utterance:
|
130
131
|
|
@@ -136,6 +137,18 @@ puts decoder.hypothesis # => "go forward ten years"
|
|
136
137
|
```
|
137
138
|
|
138
139
|
|
140
|
+
### Keyword Spotting
|
141
|
+
|
142
|
+
Keyword spotting is another feature that is not in the current stable (0.8) releases of Pocketsphinx, having been [merged into trunk](https://github.com/cmusphinx/pocketsphinx/commit/f562f9356cc7f1ade4941ebdde0c377642a023e3) early in 2014. It can be useful for detecting an activation keyword in a command and control application, while ignoring all other speech. Set up a recognizer as follows:
|
143
|
+
|
144
|
+
```ruby
|
145
|
+
configuration = Configuration::KeywordSpotting.new('Okay computer')
|
146
|
+
recognizer = LiveSpeechRecognizer.new(configuration)
|
147
|
+
```
|
148
|
+
|
149
|
+
The `KeywordSpotting` configuration accepts a second argument for adjusting the sensitivity of the keyword detection. Note that this is just a wrapper which sets the `keyphrase` and `kws_threshold` settings on the default configuration.
|
150
|
+
|
151
|
+
|
139
152
|
## Contributing
|
140
153
|
|
141
154
|
1. Fork it ( https://github.com/[my-github-username]/pocketsphinx-ruby/fork )
|
data/examples/keyword_spotter.rb
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require "bundler/setup"
|
4
|
+
require "pocketsphinx-ruby"
|
5
|
+
|
6
|
+
include Pocketsphinx
|
7
|
+
|
8
|
+
configuration = Configuration::KeywordSpotting.new('hello computer')
|
9
|
+
recognizer = LiveSpeechRecognizer.new(configuration)
|
10
|
+
|
11
|
+
recognizer.recognize do |speech|
|
12
|
+
if configuration.keyword == 'hello computer'
|
13
|
+
configuration.keyword = 'goodbye computer'
|
14
|
+
else
|
15
|
+
configuration.keyword = 'hello computer'
|
16
|
+
end
|
17
|
+
|
18
|
+
recognizer.reconfigure
|
19
|
+
|
20
|
+
puts "You said '#{speech}'. Keyword is now '#{configuration.keyword}'"
|
21
|
+
end
|
data/lib/pocketsphinx.rb
CHANGED
@@ -1,11 +1,18 @@
|
|
1
1
|
require 'ffi'
|
2
2
|
|
3
3
|
require "pocketsphinx/version"
|
4
|
+
|
5
|
+
# Pocketsphinx FFI API
|
4
6
|
require "pocketsphinx/api/sphinxbase"
|
5
7
|
require "pocketsphinx/api/sphinxad"
|
6
8
|
require "pocketsphinx/api/pocketsphinx"
|
7
9
|
|
8
|
-
|
10
|
+
# Configuration
|
11
|
+
require 'pocketsphinx/configuration/setting_definition'
|
12
|
+
require "pocketsphinx/configuration/base"
|
13
|
+
require "pocketsphinx/configuration/default"
|
14
|
+
require "pocketsphinx/configuration/keyword_spotting"
|
15
|
+
|
9
16
|
require "pocketsphinx/audio_file"
|
10
17
|
require "pocketsphinx/microphone"
|
11
18
|
require "pocketsphinx/decoder"
|
data/lib/pocketsphinx/api/pocketsphinx.rb
CHANGED
@@ -8,6 +8,7 @@ module Pocketsphinx
|
|
8
8
|
typedef :pointer, :configuration
|
9
9
|
|
10
10
|
attach_function :ps_init, [:configuration], :decoder
|
11
|
+
attach_function :ps_reinit, [:decoder, :configuration], :int
|
11
12
|
attach_function :ps_default_search_args, [:pointer], :void
|
12
13
|
attach_function :ps_args, [], :pointer
|
13
14
|
attach_function :ps_decode_raw, [:decoder, :pointer, :string, :long], :int
|
data/lib/pocketsphinx/configuration/base.rb
ADDED
@@ -0,0 +1,95 @@
|
|
1
|
+
module Pocketsphinx
|
2
|
+
module Configuration
|
3
|
+
class Base
|
4
|
+
attr_reader :ps_config
|
5
|
+
attr_reader :setting_definitions
|
6
|
+
|
7
|
+
def initialize
|
8
|
+
@ps_arg_defs = API::Pocketsphinx.ps_args
|
9
|
+
@setting_definitions = SettingDefinition.from_arg_defs(@ps_arg_defs)
|
10
|
+
|
11
|
+
# Sets default settings based on definitions
|
12
|
+
@ps_config = API::Sphinxbase.cmd_ln_parse_r(nil, @ps_arg_defs, 0, nil, 1)
|
13
|
+
end
|
14
|
+
|
15
|
+
def setting_names
|
16
|
+
setting_definitions.keys.sort
|
17
|
+
end
|
18
|
+
|
19
|
+
# Get details for one or all configuration settings
|
20
|
+
#
|
21
|
+
# @param [String] name Name of setting to get details for. Gets details for all settings if nil.
|
22
|
+
def details(name = nil)
|
23
|
+
details = [name || setting_names].flatten.map do |name|
|
24
|
+
definition = find_definition(name)
|
25
|
+
|
26
|
+
{
|
27
|
+
name: name,
|
28
|
+
type: definition.type,
|
29
|
+
default: definition.default,
|
30
|
+
required: definition.required?,
|
31
|
+
value: self[name],
|
32
|
+
info: definition.doc
|
33
|
+
}
|
34
|
+
end
|
35
|
+
|
36
|
+
name ? details.first : details
|
37
|
+
end
|
38
|
+
|
39
|
+
# Get a configuration setting
|
40
|
+
def [](name)
|
41
|
+
case find_definition(name).type
|
42
|
+
when :integer
|
43
|
+
API::Sphinxbase.cmd_ln_int_r(ps_config, "-#{name}")
|
44
|
+
when :float
|
45
|
+
API::Sphinxbase.cmd_ln_float_r(ps_config, "-#{name}")
|
46
|
+
when :string
|
47
|
+
API::Sphinxbase.cmd_ln_str_r(ps_config, "-#{name}")
|
48
|
+
when :boolean
|
49
|
+
API::Sphinxbase.cmd_ln_int_r(ps_config, "-#{name}") != 0
|
50
|
+
when :string_list
|
51
|
+
raise NotImplementedException
|
52
|
+
end
|
53
|
+
end
|
54
|
+
|
55
|
+
# Set a configuration setting with type checking
|
56
|
+
def []=(name, value)
|
57
|
+
check_type(name, type = find_definition(name).type, value)
|
58
|
+
|
59
|
+
case type
|
60
|
+
when :integer
|
61
|
+
API::Sphinxbase.cmd_ln_set_int_r(ps_config, "-#{name}", value.to_i)
|
62
|
+
when :float
|
63
|
+
API::Sphinxbase.cmd_ln_set_float_r(ps_config, "-#{name}", value.to_f)
|
64
|
+
when :string
|
65
|
+
API::Sphinxbase.cmd_ln_set_str_r(ps_config, "-#{name}", (value.to_s if value))
|
66
|
+
when :boolean
|
67
|
+
API::Sphinxbase.cmd_ln_set_int_r(ps_config, "-#{name}", value ? 1 : 0)
|
68
|
+
when :string_list
|
69
|
+
raise NotImplementedException
|
70
|
+
end
|
71
|
+
end
|
72
|
+
|
73
|
+
private
|
74
|
+
|
75
|
+
def find_definition(name)
|
76
|
+
setting_definitions[name] or raise "Configuration setting '#{name}' does not exist"
|
77
|
+
end
|
78
|
+
|
79
|
+
def check_type(name, expected_type, value)
|
80
|
+
conversion_method = case expected_type
|
81
|
+
when :integer then :to_i
|
82
|
+
when :float then :to_f
|
83
|
+
end
|
84
|
+
|
85
|
+
if conversion_method && !value.respond_to?(conversion_method)
|
86
|
+
raise "Configuration setting '#{name}' must be of type #{expected_type.to_s.capitalize}"
|
87
|
+
end
|
88
|
+
|
89
|
+
if value.nil? && expected_type != :string
|
90
|
+
raise "Only string settings can be set to nil"
|
91
|
+
end
|
92
|
+
end
|
93
|
+
end
|
94
|
+
end
|
95
|
+
end
|
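The nil guard in `check_type` is needed because the `respond_to?` test alone cannot reject `nil`: in Ruby, `nil` responds to both `to_i` and `to_f`, while `true` responds to neither. A minimal standalone sketch of that logic, assuming only core Ruby; `check_type` here is a free-standing copy for illustration and `type_error_for` is a hypothetical helper, neither is part of the gem's public API:

```ruby
# Free-standing copy of the type-checking rules, for illustration only.
def check_type(name, expected_type, value)
  conversion_method = case expected_type
                      when :integer then :to_i
                      when :float then :to_f
                      end

  if conversion_method && !value.respond_to?(conversion_method)
    raise "Configuration setting '#{name}' must be of type #{expected_type.to_s.capitalize}"
  end

  # nil responds to to_i and to_f, so it passes the check above and needs its own guard.
  if value.nil? && expected_type != :string
    raise "Only string settings can be set to nil"
  end
end

# Hypothetical helper: returns the error message, or nil when the value is accepted.
def type_error_for(name, type, value)
  check_type(name, type, value)
  nil
rescue RuntimeError => e
  e.message
end

puts type_error_for('frate', :integer, 50).inspect       # accepted
puts type_error_for('frate', :integer, true).inspect     # rejected: wrong type
puts type_error_for('frate', :integer, nil).inspect      # rejected: nil on a non-string
puts type_error_for('warp_type', :string, nil).inspect   # accepted: strings may be nil
```

The two rejections correspond to the messages asserted in `spec/configuration_spec.rb`.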
data/lib/pocketsphinx/configuration/default.rb
ADDED
@@ -0,0 +1,17 @@
|
|
1
|
+
module Pocketsphinx
|
2
|
+
module Configuration
|
3
|
+
class Default < Base
|
4
|
+
def initialize
|
5
|
+
super
|
6
|
+
|
7
|
+
# Sets default grammar and language model if they are not set explicitly and
|
8
|
+
# are present in the default search path.
|
9
|
+
API::Pocketsphinx.ps_default_search_args(@ps_config)
|
10
|
+
end
|
11
|
+
end
|
12
|
+
|
13
|
+
def self.default
|
14
|
+
Default.new
|
15
|
+
end
|
16
|
+
end
|
17
|
+
end
|
data/lib/pocketsphinx/configuration/keyword_spotting.rb
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
module Pocketsphinx
|
2
|
+
module Configuration
|
3
|
+
class KeywordSpotting < Default
|
4
|
+
attr_reader :kws_threshold
|
5
|
+
|
6
|
+
def initialize(keyword, threshold = nil)
|
7
|
+
super()
|
8
|
+
|
9
|
+
self['lm'] = nil
|
10
|
+
self.keyword = keyword
|
11
|
+
self.kws_threshold = threshold if threshold
|
12
|
+
end
|
13
|
+
|
14
|
+
def keyword
|
15
|
+
self['keyphrase']
|
16
|
+
end
|
17
|
+
|
18
|
+
def keyword=(value)
|
19
|
+
self['keyphrase'] = sanitize_keyword value
|
20
|
+
end
|
21
|
+
|
22
|
+
def kws_threshold
|
23
|
+
self['kws_threshold']
|
24
|
+
end
|
25
|
+
|
26
|
+
def kws_threshold=(value)
|
27
|
+
self['kws_threshold'] = value
|
28
|
+
end
|
29
|
+
|
30
|
+
private
|
31
|
+
|
32
|
+
def sanitize_keyword(keyword)
|
33
|
+
keyword.downcase
|
34
|
+
end
|
35
|
+
end
|
36
|
+
end
|
37
|
+
end
|
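`KeywordSpotting` is a thin wrapper: it clears the language model, lowercases the keyphrase, and optionally sets a threshold. The sketch below mirrors that behaviour without loading libpocketsphinx. `FakeKeywordSpotting` is a hypothetical stand-in that keeps settings in a Hash instead of the FFI-backed store, and the initial values and the `1e-20` threshold are placeholders rather than library defaults:

```ruby
# Hypothetical stand-in: a Hash replaces the FFI-backed settings store.
class FakeKeywordSpotting
  def initialize(keyword, threshold = nil)
    # Placeholder starting values, not the real Pocketsphinx defaults.
    @settings = { 'lm' => 'default.lm', 'keyphrase' => nil, 'kws_threshold' => 1.0 }

    self['lm'] = nil                            # keyword spotting uses no language model
    self.keyword = keyword
    self.kws_threshold = threshold if threshold # second argument is optional
  end

  def [](name)
    @settings.fetch(name)
  end

  def []=(name, value)
    @settings[name] = value
  end

  def keyword
    self['keyphrase']
  end

  def keyword=(value)
    self['keyphrase'] = value.downcase # same sanitising step as the real class
  end

  def kws_threshold
    self['kws_threshold']
  end

  def kws_threshold=(value)
    self['kws_threshold'] = value
  end
end

config = FakeKeywordSpotting.new('Okay computer', 1e-20)
puts config.keyword        # okay computer
puts config['lm'].inspect  # nil
```

With the real gem the equivalent call would be `Configuration::KeywordSpotting.new('Okay computer', threshold)`; a suitable threshold depends on the keyphrase and has to be tuned by experiment.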
data/lib/pocketsphinx/decoder.rb
CHANGED
@@ -1,13 +1,22 @@
|
|
1
1
|
module Pocketsphinx
|
2
|
-
class Decoder
|
2
|
+
class Decoder < Struct.new(:configuration)
|
3
3
|
Error = Class.new(StandardError)
|
4
4
|
|
5
|
-
attr_reader :ps_decoder
|
6
5
|
attr_writer :ps_api
|
7
6
|
|
8
|
-
|
9
|
-
|
10
|
-
|
7
|
+
# Reinitialize the decoder with updated configuration.
|
8
|
+
#
|
9
|
+
# This function allows you to switch the acoustic model, dictionary, or other configuration
|
10
|
+
# without creating an entirely new decoding object.
|
11
|
+
#
|
12
|
+
# @param [Configuration] configuration An optional new configuration to use. If this is
|
13
|
+
# nil, the previous configuration will be reloaded, with any changes applied.
|
14
|
+
def reconfigure(configuration = nil)
|
15
|
+
self.configuration = configuration if configuration
|
16
|
+
|
17
|
+
ps_api.ps_reinit(ps_decoder, self.configuration.ps_config).tap do |result|
|
18
|
+
raise Error, "Decoder#reconfigure failed with error code #{result}" if result < 0
|
19
|
+
end
|
11
20
|
end
|
12
21
|
|
13
22
|
# Decode a raw audio stream as a single utterance, opening a file if path given
|
@@ -55,7 +64,7 @@ module Pocketsphinx
|
|
55
64
|
# worth of data. This may allow the recognizer to produce more accurate results.
|
56
65
|
# @return Number of frames of data searched
|
57
66
|
def process_raw(buffer, size, no_search = false, full_utt = false)
|
58
|
-
ps_api.ps_process_raw(
|
67
|
+
ps_api.ps_process_raw(ps_decoder, buffer, size, no_search ? 1 : 0, full_utt ? 1 : 0).tap do |result|
|
59
68
|
raise Error, "Decoder#process_raw failed with error code #{result}" if result < 0
|
60
69
|
end
|
61
70
|
end
|
@@ -68,21 +77,21 @@ module Pocketsphinx
|
|
68
77
|
#
|
69
78
|
# @param [String] name String uniquely identifying this utterance. If nil, one will be created.
|
70
79
|
def start_utterance(name = nil)
|
71
|
-
ps_api.ps_start_utt(
|
80
|
+
ps_api.ps_start_utt(ps_decoder, name).tap do |result|
|
72
81
|
raise Error, "Decoder#start_utterance failed with error code #{result}" if result < 0
|
73
82
|
end
|
74
83
|
end
|
75
84
|
|
76
85
|
# End utterance processing
|
77
86
|
def end_utterance
|
78
|
-
ps_api.ps_end_utt(
|
87
|
+
ps_api.ps_end_utt(ps_decoder).tap do |result|
|
79
88
|
raise Error, "Decoder#end_utterance failed with error code #{result}" if result < 0
|
80
89
|
end
|
81
90
|
end
|
82
91
|
|
83
92
|
# Checks if the last feed audio buffer contained speech
|
84
93
|
def in_speech?
|
85
|
-
ps_api.ps_get_in_speech(
|
94
|
+
ps_api.ps_get_in_speech(ps_decoder) != 0
|
86
95
|
end
|
87
96
|
|
88
97
|
# Get hypothesis string and path score.
|
@@ -90,11 +99,15 @@ module Pocketsphinx
|
|
90
99
|
# @return [String] Hypothesis string
|
91
100
|
# @todo Expand to return path score and utterance ID
|
92
101
|
def hypothesis
|
93
|
-
ps_api.ps_get_hyp(
|
102
|
+
ps_api.ps_get_hyp(ps_decoder, nil, nil)
|
94
103
|
end
|
95
104
|
|
96
105
|
def ps_api
|
97
106
|
@ps_api || API::Pocketsphinx
|
98
107
|
end
|
108
|
+
|
109
|
+
def ps_decoder
|
110
|
+
@ps_decoder ||= ps_api.ps_init(configuration.ps_config)
|
111
|
+
end
|
99
112
|
end
|
100
113
|
end
|
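Two things change in `Decoder` here: the native decoder is now created lazily on first use through `ps_decoder`, and `reconfigure` reuses that handle via `ps_reinit` instead of building a new one. A self-contained sketch of the pattern follows. `FakeApi`, `FakeConfig`, and `SketchDecoder` are hypothetical names used only for illustration; the fake records calls rather than touching libpocketsphinx:

```ruby
# Hypothetical fake of the FFI layer; it records calls instead of decoding audio.
class FakeApi
  attr_reader :init_calls, :reinit_calls

  def initialize
    @init_calls = 0
    @reinit_calls = []
  end

  def ps_init(_ps_config)
    @init_calls += 1
    :decoder_handle
  end

  def ps_reinit(_decoder, ps_config)
    @reinit_calls << ps_config
    0 # a negative value would signal failure
  end
end

FakeConfig = Struct.new(:ps_config)

class SketchDecoder < Struct.new(:configuration)
  Error = Class.new(StandardError)

  attr_accessor :ps_api

  def reconfigure(configuration = nil)
    # The parameter shadows the Struct accessor, hence the explicit self.
    self.configuration = configuration if configuration

    ps_api.ps_reinit(ps_decoder, self.configuration.ps_config).tap do |result|
      raise Error, "reconfigure failed with error code #{result}" if result < 0
    end
  end

  # Created once, on first use, then memoised.
  def ps_decoder
    @ps_decoder ||= ps_api.ps_init(configuration.ps_config)
  end
end

api = FakeApi.new
decoder = SketchDecoder.new(FakeConfig.new(:first))
decoder.ps_api = api

decoder.reconfigure                          # reloads the current configuration
decoder.reconfigure(FakeConfig.new(:second)) # swaps in a new one

puts api.init_calls            # 1
puts api.reinit_calls.inspect  # [:first, :second]
```

The native handle is initialised only once even though the configuration changes, which is the point of routing updates through `ps_reinit`.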
data/lib/pocketsphinx/microphone.rb
CHANGED
@@ -65,7 +65,7 @@ module Pocketsphinx
|
|
65
65
|
#
|
66
66
|
# @param [Fixnum] max_samples The maximum samples we tried to read from the audio device
|
67
67
|
def read_audio_delay(max_samples = 4096)
|
68
|
-
max_samples / (2 * sample_rate)
|
68
|
+
max_samples.to_f / (2 * sample_rate)
|
69
69
|
end
|
70
70
|
|
71
71
|
def close_device
|
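The one-line change to `read_audio_delay` matters because dividing two Ruby integers truncates toward zero, so the old expression returned `0` for the default arguments. A short check of the arithmetic, assuming the default 16 kHz sample rate mentioned in the README:

```ruby
sample_rate = 16_000 # default sample rate, in Hz
max_samples = 4096   # default buffer size, in samples

integer_delay = max_samples / (2 * sample_rate)      # old behaviour: Integer / Integer
float_delay   = max_samples.to_f / (2 * sample_rate) # new behaviour: Float / Integer

puts integer_delay # 0
puts float_delay   # 0.128
```

The second value matches the expectation added in `spec/microphone_spec.rb`.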
data/lib/pocketsphinx/speech_recognizer.rb
CHANGED
@@ -6,6 +6,7 @@ module Pocketsphinx
|
|
6
6
|
# Recordable interface must implement #record and #read_audio
|
7
7
|
attr_writer :recordable
|
8
8
|
attr_writer :decoder
|
9
|
+
attr_writer :configuration
|
9
10
|
|
10
11
|
def initialize(configuration = nil)
|
11
12
|
@configuration = configuration
|
@@ -23,6 +24,19 @@ module Pocketsphinx
|
|
23
24
|
@configuration ||= Configuration.default
|
24
25
|
end
|
25
26
|
|
27
|
+
# Reinitialize the decoder with updated configuration.
|
28
|
+
#
|
29
|
+
# See Decoder#reconfigure
|
30
|
+
#
|
31
|
+
# @param [Configuration] configuration An optional new configuration to use. If this is
|
32
|
+
# nil, the previous configuration will be reloaded, with any changes applied.
|
33
|
+
def reconfigure(configuration = nil)
|
34
|
+
self.configuration = configuration if configuration
|
35
|
+
|
36
|
+
decoder.reconfigure(configuration)
|
37
|
+
decoder.start_utterance if recognizing?
|
38
|
+
end
|
39
|
+
|
26
40
|
# Recognize utterances and yield hypotheses in infinite loop
|
27
41
|
#
|
28
42
|
# Splits speech into utterances by detecting silence between them.
|
@@ -32,6 +46,7 @@ module Pocketsphinx
|
|
32
46
|
# @param [Fixnum] max_samples Number of samples to process at a time
|
33
47
|
def recognize(max_samples = 4096)
|
34
48
|
decoder.start_utterance
|
49
|
+
@recognizing = true
|
35
50
|
|
36
51
|
recordable.record do
|
37
52
|
FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
|
@@ -41,13 +56,16 @@ module Pocketsphinx
|
|
41
56
|
process_audio(buffer, max_samples) or break
|
42
57
|
end
|
43
58
|
|
44
|
-
|
59
|
+
hypothesis = get_hypothesis
|
60
|
+
yield hypothesis if hypothesis
|
45
61
|
else
|
46
62
|
process_audio(buffer, max_samples) or break
|
47
63
|
end
|
48
64
|
end
|
49
65
|
end
|
50
66
|
end
|
67
|
+
ensure
|
68
|
+
@recognizing = false
|
51
69
|
end
|
52
70
|
|
53
71
|
def in_speech?
|
@@ -55,6 +73,10 @@ module Pocketsphinx
|
|
55
73
|
decoder.in_speech?
|
56
74
|
end
|
57
75
|
|
76
|
+
def recognizing?
|
77
|
+
@recognizing == true
|
78
|
+
end
|
79
|
+
|
58
80
|
private
|
59
81
|
|
60
82
|
def process_audio(buffer, max_samples)
|
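`SpeechRecognizer#reconfigure` only restarts an utterance when recognition is in progress, and it learns this from the `@recognizing` flag that `recognize` sets on entry and clears in an `ensure` clause. A minimal sketch of that flag handling, with the hypothetical `SketchRecognizer` standing in for the real class and the audio loop reduced to a single `yield`:

```ruby
# Hypothetical reduction of SpeechRecognizer: only the flag handling is kept.
class SketchRecognizer
  def recognize
    @recognizing = true
    yield self
  ensure
    # Runs on normal return, on break, and when the block raises.
    @recognizing = false
  end

  def recognizing?
    @recognizing == true
  end
end

recognizer = SketchRecognizer.new
states = []

states << recognizer.recognizing?                      # before: false
recognizer.recognize { |r| states << r.recognizing? }  # inside: true
states << recognizer.recognizing?                      # after: false

begin
  recognizer.recognize { raise 'audio device failed' }
rescue RuntimeError
  states << recognizer.recognizing?                    # after an error: false
end

puts states.inspect # [false, true, false, false]
```

Comparing against `true` rather than returning the instance variable means `recognizing?` answers `false`, not `nil`, before `recognize` has ever been called.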
data/lib/pocketsphinx/version.rb
CHANGED
data/spec/configuration_spec.rb
CHANGED
@@ -4,7 +4,7 @@ describe Configuration do
|
|
4
4
|
subject { Pocketsphinx::Configuration.default }
|
5
5
|
|
6
6
|
it "provides a default pocketsphinx configuration" do
|
7
|
-
expect(subject).to be_a(Pocketsphinx::Configuration)
|
7
|
+
expect(subject).to be_a(Pocketsphinx::Configuration::Default)
|
8
8
|
end
|
9
9
|
|
10
10
|
it "supports integer settings" do
|
@@ -13,6 +13,8 @@ describe Configuration do
|
|
13
13
|
|
14
14
|
subject['frate'] = 50
|
15
15
|
expect(subject['frate']).to eq(50)
|
16
|
+
|
17
|
+
expect { subject['frate'] = nil }.to raise_exception "Only string settings can be set to nil"
|
16
18
|
end
|
17
19
|
|
18
20
|
it "supports float settings" do
|
@@ -21,24 +23,31 @@ describe Configuration do
|
|
21
23
|
|
22
24
|
subject['samprate'] = 8000
|
23
25
|
expect(subject['samprate']).to eq(8000)
|
26
|
+
|
27
|
+
expect { subject['samprate'] = nil }.to raise_exception "Only string settings can be set to nil"
|
24
28
|
end
|
25
29
|
|
26
|
-
it "supports
|
30
|
+
it "supports string settings" do
|
27
31
|
expect(subject['warp_type']).to eq('inverse_linear')
|
28
32
|
|
29
33
|
subject['warp_type'] = 'different_type'
|
30
34
|
expect(subject['warp_type']).to eq('different_type')
|
35
|
+
|
36
|
+
subject['warp_type'] = nil
|
37
|
+
expect(subject['warp_type']).to eq(nil)
|
31
38
|
end
|
32
39
|
|
33
|
-
it "supports
|
40
|
+
it "supports boolean settings" do
|
34
41
|
expect(subject['smoothspec']).to eq(false)
|
35
42
|
|
36
43
|
subject['smoothspec'] = true
|
37
44
|
expect(subject['smoothspec']).to eq(true)
|
45
|
+
|
46
|
+
expect { subject['smoothspec'] = nil }.to raise_exception "Only string settings can be set to nil"
|
38
47
|
end
|
39
48
|
|
40
49
|
it 'raises exceptions when setting with incorrectly typed values' do
|
41
|
-
expect { subject['frate'] = true }.to raise_exception "Configuration setting 'frate' must be
|
50
|
+
expect { subject['frate'] = true }.to raise_exception "Configuration setting 'frate' must be of type Integer"
|
42
51
|
end
|
43
52
|
|
44
53
|
it 'raises exceptions when a setting is unknown' do
|
@@ -77,4 +86,30 @@ describe Configuration do
|
|
77
86
|
})
|
78
87
|
end
|
79
88
|
end
|
89
|
+
|
90
|
+
context 'keyword spotting configuration' do
|
91
|
+
subject { Configuration::KeywordSpotting.new('Okay computer') }
|
92
|
+
|
93
|
+
it 'sets the lowercase keyphrase' do
|
94
|
+
expect(subject['keyphrase']).to eq('okay computer')
|
95
|
+
end
|
96
|
+
|
97
|
+
it 'uses no language model' do
|
98
|
+
expect(subject['lm']).to be_nil
|
99
|
+
end
|
100
|
+
|
101
|
+
it 'exposes the keyphrase setting as #keyword' do
|
102
|
+
subject.keyword = 'Hello computer'
|
103
|
+
|
104
|
+
expect(subject.keyword).to eq('hello computer')
|
105
|
+
expect(subject['keyphrase']).to eq('hello computer')
|
106
|
+
end
|
107
|
+
|
108
|
+
it 'exposes the kws_threshold setting as #kws_threshold' do
|
109
|
+
subject.kws_threshold = 24
|
110
|
+
|
111
|
+
expect(subject.kws_threshold).to eq(24)
|
112
|
+
expect(subject['kws_threshold']).to eq(24)
|
113
|
+
end
|
114
|
+
end
|
80
115
|
end
|
data/spec/decoder_spec.rb
CHANGED
@@ -1,27 +1,47 @@
|
|
1
1
|
require 'spec_helper'
|
2
2
|
|
3
3
|
describe Decoder do
|
4
|
-
subject {
|
5
|
-
let(:ps_api) {
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
4
|
+
subject { Decoder.new(configuration) }
|
5
|
+
let(:ps_api) { subject.ps_api }
|
6
|
+
let(:ps_decoder) { double }
|
7
|
+
let(:configuration) { Configuration.default }
|
8
|
+
|
9
|
+
before do
|
10
|
+
subject.ps_api = double
|
11
|
+
allow(ps_api).to receive(:ps_init).and_return(ps_decoder)
|
10
12
|
end
|
11
13
|
|
12
|
-
#
|
13
|
-
|
14
|
-
|
15
|
-
|
14
|
+
describe '#reconfigure' do
|
15
|
+
it 'calls libpocketsphinx' do
|
16
|
+
expect(ps_api)
|
17
|
+
.to receive(:ps_reinit)
|
18
|
+
.with(subject.ps_decoder, configuration.ps_config)
|
19
|
+
.and_return(0)
|
16
20
|
|
17
|
-
|
18
|
-
# get this quite right, but nonetheless this is the expected output
|
19
|
-
expect(subject.hypothesis).to eq("go forward ten years")
|
21
|
+
subject.reconfigure
|
20
22
|
end
|
21
23
|
|
22
|
-
it '
|
23
|
-
|
24
|
-
|
24
|
+
it 'sets a new configuration if one is passed' do
|
25
|
+
new_config = Struct.new(:ps_config).new(:ps_config)
|
26
|
+
|
27
|
+
expect(ps_api)
|
28
|
+
.to receive(:ps_reinit)
|
29
|
+
.with(subject.ps_decoder, new_config.ps_config)
|
30
|
+
.and_return(0)
|
31
|
+
|
32
|
+
subject.reconfigure(new_config)
|
33
|
+
|
34
|
+
expect(subject.configuration).to be(new_config)
|
35
|
+
end
|
36
|
+
|
37
|
+
it 'raises an exception on error' do
|
38
|
+
expect(ps_api)
|
39
|
+
.to receive(:ps_reinit)
|
40
|
+
.with(subject.ps_decoder, configuration.ps_config)
|
41
|
+
.and_return(-1)
|
42
|
+
|
43
|
+
expect { subject.reconfigure }
|
44
|
+
.to raise_exception "Decoder#reconfigure failed with error code -1"
|
25
45
|
end
|
26
46
|
end
|
27
47
|
|
data/spec/integration/decoder_spec.rb
ADDED
@@ -0,0 +1,28 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
|
3
|
+
describe Decoder do
|
4
|
+
subject { @decoder }
|
5
|
+
let(:configuration) { @configuration }
|
6
|
+
|
7
|
+
# Share decoder across all examples for speed
|
8
|
+
before :all do
|
9
|
+
@configuration = Configuration.default
|
10
|
+
@decoder = Decoder.new(@configuration)
|
11
|
+
end
|
12
|
+
|
13
|
+
describe '#decode' do
|
14
|
+
it 'correctly decodes the speech in goforward.raw' do
|
15
|
+
@decoder.ps_api = nil
|
16
|
+
subject.decode File.open('spec/assets/audio/goforward.raw', 'rb')
|
17
|
+
|
18
|
+
# With the default configuration (no specific grammar), pocketsphinx doesn't actually
|
19
|
+
# get this quite right, but nonetheless this is the expected output
|
20
|
+
expect(subject.hypothesis).to eq("go forward ten years")
|
21
|
+
end
|
22
|
+
|
23
|
+
it 'accepts a file path as well as a stream' do
|
24
|
+
subject.decode 'spec/assets/audio/goforward.raw'
|
25
|
+
expect(subject.hypothesis).to eq("go forward ten years")
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
data/spec/integration/speech_recognizer_spec.rb
ADDED
@@ -0,0 +1,23 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
|
3
|
+
describe SpeechRecognizer do
|
4
|
+
let(:recordable) { AudioFile.new('spec/assets/audio/goforward.raw') }
|
5
|
+
|
6
|
+
subject do
|
7
|
+
SpeechRecognizer.new.tap do |speech_recognizer|
|
8
|
+
speech_recognizer.recordable = recordable
|
9
|
+
speech_recognizer.decoder = @decoder
|
10
|
+
end
|
11
|
+
end
|
12
|
+
|
13
|
+
# Share decoder across all examples for speed
|
14
|
+
before :all do
|
15
|
+
@decoder = Decoder.new(Configuration.default)
|
16
|
+
end
|
17
|
+
|
18
|
+
describe '#recognize' do
|
19
|
+
it 'should decode speech in raw audio' do
|
20
|
+
expect { |b| subject.recognize(4096, &b) }.to yield_with_args("go forward ten years")
|
21
|
+
end
|
22
|
+
end
|
23
|
+
end
|
data/spec/microphone_spec.rb
CHANGED
@@ -79,6 +79,12 @@ describe Microphone do
|
|
79
79
|
end
|
80
80
|
end
|
81
81
|
|
82
|
+
describe '#read_audio_delay' do
|
83
|
+
it 'should be 0.128 seconds for a max_samples of 4096 and sample rate of 16kHz' do
|
84
|
+
expect(subject.read_audio_delay(4096)).to eq(0.128)
|
85
|
+
end
|
86
|
+
end
|
87
|
+
|
82
88
|
describe '#close_device' do
|
83
89
|
it 'calls libsphinxad' do
|
84
90
|
expect(ps_api)
|
data/spec/speech_recognizer_spec.rb
CHANGED
@@ -1,23 +1,40 @@
|
|
1
1
|
require 'spec_helper'
|
2
2
|
|
3
3
|
describe SpeechRecognizer do
|
4
|
-
let(:
|
4
|
+
let(:configuration) { double }
|
5
|
+
let(:recordable) { double }
|
6
|
+
let(:decoder) { double }
|
7
|
+
subject { SpeechRecognizer.new(configuration) }
|
5
8
|
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
speech_recognizer.decoder = @decoder
|
10
|
-
end
|
9
|
+
before do
|
10
|
+
subject.decoder = decoder
|
11
|
+
subject.recordable = recordable
|
11
12
|
end
|
12
13
|
|
13
|
-
#
|
14
|
-
|
15
|
-
|
16
|
-
|
14
|
+
describe '#reconfigure' do
|
15
|
+
before do
|
16
|
+
allow(decoder).to receive(:reconfigure)
|
17
|
+
allow(decoder).to receive(:start_utterance)
|
18
|
+
end
|
19
|
+
|
20
|
+
it 'saves the configuration if one is given' do
|
21
|
+
subject.reconfigure(:new_configuration)
|
22
|
+
expect(subject.configuration).to eq(:new_configuration)
|
23
|
+
end
|
24
|
+
|
25
|
+
it 'reconfigures the decoder' do
|
26
|
+
expect(decoder).to receive(:reconfigure).with(nil).ordered
|
27
|
+
expect(decoder).to receive(:reconfigure).with(:new_configuration).ordered
|
28
|
+
|
29
|
+
subject.reconfigure
|
30
|
+
subject.reconfigure(:new_configuration)
|
31
|
+
end
|
32
|
+
|
33
|
+
it 'restarts an utterance if recognition was interrupted' do
|
34
|
+
expect(subject).to receive(:recognizing?).and_return(true)
|
35
|
+
expect(decoder).to receive(:start_utterance)
|
17
36
|
|
18
|
-
|
19
|
-
it 'should decode speech in raw audio' do
|
20
|
-
expect { |b| subject.recognize(4096, &b) }.to yield_with_args("go forward ten years")
|
37
|
+
subject.reconfigure
|
21
38
|
end
|
22
39
|
end
|
23
40
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: pocketsphinx-ruby
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Howard Wilson
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2014-10-
|
11
|
+
date: 2014-10-21 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: ffi
|
@@ -95,6 +95,7 @@ files:
|
|
95
95
|
- README.md
|
96
96
|
- Rakefile
|
97
97
|
- examples/decode_audio_file.rb
|
98
|
+
- examples/keyword_spotter.rb
|
98
99
|
- examples/pocketsphinx_continuous.rb
|
99
100
|
- examples/record_audio_file.rb
|
100
101
|
- lib/pocketsphinx-ruby.rb
|
@@ -104,7 +105,9 @@ files:
|
|
104
105
|
- lib/pocketsphinx/api/sphinxbase.rb
|
105
106
|
- lib/pocketsphinx/audio_file.rb
|
106
107
|
- lib/pocketsphinx/audio_file_speech_recognizer.rb
|
107
|
-
- lib/pocketsphinx/configuration.rb
|
108
|
+
- lib/pocketsphinx/configuration/base.rb
|
109
|
+
- lib/pocketsphinx/configuration/default.rb
|
110
|
+
- lib/pocketsphinx/configuration/keyword_spotting.rb
|
108
111
|
- lib/pocketsphinx/configuration/setting_definition.rb
|
109
112
|
- lib/pocketsphinx/decoder.rb
|
110
113
|
- lib/pocketsphinx/live_speech_recognizer.rb
|
@@ -115,6 +118,8 @@ files:
|
|
115
118
|
- spec/assets/audio/goforward.raw
|
116
119
|
- spec/configuration_spec.rb
|
117
120
|
- spec/decoder_spec.rb
|
121
|
+
- spec/integration/decoder_spec.rb
|
122
|
+
- spec/integration/speech_recognizer_spec.rb
|
118
123
|
- spec/microphone_spec.rb
|
119
124
|
- spec/spec_helper.rb
|
120
125
|
- spec/speech_recognizer_spec.rb
|
@@ -146,6 +151,8 @@ test_files:
|
|
146
151
|
- spec/assets/audio/goforward.raw
|
147
152
|
- spec/configuration_spec.rb
|
148
153
|
- spec/decoder_spec.rb
|
154
|
+
- spec/integration/decoder_spec.rb
|
155
|
+
- spec/integration/speech_recognizer_spec.rb
|
149
156
|
- spec/microphone_spec.rb
|
150
157
|
- spec/spec_helper.rb
|
151
158
|
- spec/speech_recognizer_spec.rb
|
data/lib/pocketsphinx/configuration.rb
REMOVED
@@ -1,90 +0,0 @@
|
|
1
|
-
require 'pocketsphinx/configuration/setting_definition'
|
2
|
-
|
3
|
-
module Pocketsphinx
|
4
|
-
class Configuration
|
5
|
-
attr_reader :ps_config
|
6
|
-
attr_reader :setting_definitions
|
7
|
-
|
8
|
-
private_class_method :new
|
9
|
-
|
10
|
-
def initialize(ps_arg_defs)
|
11
|
-
@ps_arg_defs = ps_arg_defs
|
12
|
-
@setting_definitions = SettingDefinition.from_arg_defs(ps_arg_defs)
|
13
|
-
|
14
|
-
# Sets default settings based on definitions
|
15
|
-
@ps_config = API::Sphinxbase.cmd_ln_parse_r(nil, ps_arg_defs, 0, nil, 1)
|
16
|
-
|
17
|
-
# Sets default grammar and language model if they are not set explicitly and
|
18
|
-
# are present in the default search path.
|
19
|
-
API::Pocketsphinx.ps_default_search_args(@ps_config)
|
20
|
-
end
|
21
|
-
|
22
|
-
def self.default
|
23
|
-
new(API::Pocketsphinx.ps_args)
|
24
|
-
end
|
25
|
-
|
26
|
-
def setting_names
|
27
|
-
setting_definitions.keys.sort
|
28
|
-
end
|
29
|
-
|
30
|
-
# Get details for one or all configuration settings
|
31
|
-
#
|
32
|
-
# @param [String] name Name of setting to get details for. Gets details for all settings if nil.
|
33
|
-
def details(name = nil)
|
34
|
-
details = [name || setting_names].flatten.map do |name|
|
35
|
-
definition = find_definition(name)
|
36
|
-
|
37
|
-
{
|
38
|
-
name: name,
|
39
|
-
type: definition.type,
|
40
|
-
default: definition.default,
|
41
|
-
required: definition.required?,
|
42
|
-
value: self[name],
|
43
|
-
info: definition.doc
|
44
|
-
}
|
45
|
-
end
|
46
|
-
|
47
|
-
name ? details.first : details
|
48
|
-
end
|
49
|
-
|
50
|
-
# Get a configuration setting
|
51
|
-
def [](name)
|
52
|
-
case find_definition(name).type
|
53
|
-
when :integer
|
54
|
-
API::Sphinxbase.cmd_ln_int_r(@ps_config, "-#{name}")
|
55
|
-
when :float
|
56
|
-
API::Sphinxbase.cmd_ln_float_r(@ps_config, "-#{name}")
|
57
|
-
when :string
|
58
|
-
API::Sphinxbase.cmd_ln_str_r(@ps_config, "-#{name}")
|
59
|
-
when :boolean
|
60
|
-
API::Sphinxbase.cmd_ln_int_r(@ps_config, "-#{name}") != 0
|
61
|
-
when :string_list
|
62
|
-
raise NotImplementedException
|
63
|
-
end
|
64
|
-
end
|
65
|
-
|
66
|
-
# Set a configuration setting with type checking
|
67
|
-
def []=(name, value)
|
68
|
-
case find_definition(name).type
|
69
|
-
when :integer
|
70
|
-
raise "Configuration setting '#{name}' must be a Fixnum" unless value.respond_to?(:to_i)
|
71
|
-
API::Sphinxbase.cmd_ln_set_int_r(@ps_config, "-#{name}", value.to_i)
|
72
|
-
when :float
|
73
|
-
raise "Configuration setting '#{name}' must be a Float" unless value.respond_to?(:to_i)
|
74
|
-
API::Sphinxbase.cmd_ln_set_float_r(@ps_config, "-#{name}", value.to_f)
|
75
|
-
when :string
|
76
|
-
API::Sphinxbase.cmd_ln_set_str_r(@ps_config, "-#{name}", value.to_s)
|
77
|
-
when :boolean
|
78
|
-
API::Sphinxbase.cmd_ln_set_int_r(@ps_config, "-#{name}", value ? 1 : 0)
|
79
|
-
when :string_list
|
80
|
-
raise NotImplementedException
|
81
|
-
end
|
82
|
-
end
|
83
|
-
|
84
|
-
private
|
85
|
-
|
86
|
-
def find_definition(name)
|
87
|
-
setting_definitions[name] or raise "Configuration setting '#{name}' does not exist"
|
88
|
-
end
|
89
|
-
end
|
90
|
-
end
|