pronounce 0.0.1
- data/README.md +19 -0
- data/data/beep/ACKNOWLEDGEMENTS +36 -0
- data/data/beep/ANNOUNCE-1.0 +27 -0
- data/data/beep/README +39 -0
- data/data/beep/addparan +22 -0
- data/data/beep/beep-1.0 +257070 -0
- data/data/beep/case.txt +166944 -0
- data/data/beep/lexicode.doc +47 -0
- data/data/beep/phoncode.doc +48 -0
- data/data/beep/phone45.tab +45 -0
- data/data/beep/sayTimit.doc +130 -0
- data/data/beep/sayTimit.pl +174 -0
- data/data/cmudict/00README_FIRST.txt +36 -0
- data/data/cmudict/README.developer +50 -0
- data/data/cmudict/README.old +79 -0
- data/data/cmudict/README.weide +67 -0
- data/data/cmudict/cmudict.0.6d +129511 -0
- data/data/cmudict/cmudict.0.7a +133369 -0
- data/data/cmudict/cmudict.0.7a.phones +39 -0
- data/data/cmudict/cmudict.0.7a.symbols +84 -0
- data/data/cmudict/scripts/CompileDictionary.sh +36 -0
- data/data/cmudict/scripts/README.txt +27 -0
- data/data/cmudict/scripts/make_baseform.pl +172 -0
- data/data/cmudict/scripts/sort_cmudict.pl +141 -0
- data/data/cmudict/scripts/test_cmudict.pl +166 -0
- data/data/cmudict/scripts/test_dict.pl +119 -0
- data/data/cmudict/sphinxdict/README.txt +19 -0
- data/data/cmudict/sphinxdict/SphinxPhones_40 +40 -0
- data/data/cmudict/sphinxdict/cmudict.0.7a_SPHINX_40 +133012 -0
- data/data/cmudict/sphinxdict/cmudict_SPHINX_40 +133012 -0
- data/lib/pronounce.rb +33 -0
- metadata +104 -0
@@ -0,0 +1,47 @@
This file documents the code used to represent the word forms in the lexicon.

The valid characters in this field are:

Characters  Notes                                                Example
[A-Z]       All words are presently in upper case                AARDVARK
.           Abbreviations are represented with a trailing '.'    A.
-           Hyphenated words are separated by '-'                A-BOMB
_           Separator for multiple words with one lexical entry  A_PRIORI
'                                                                ACCOUNTANTS'
\'                                                               CAF\'E
\`                                                               CR\`ECHE
\^                                                               AR\^ETE


Also there are these one-off words:

!EXCLAMATION-POINT
"DOUBLE-QUOTE
%PERCENT
&AMPERSAND
&EM
...
&UN

'EM
...
'UN
(LEFT-PAREN
)RIGHT-PAREN
+PLUS
,COMMA
--DASH
-HYPHEN
-SHIRE
...ELLIPSIS
.PERIOD
.POINT
/SLASH
:COLON
;SEMI-COLON
;SEMI-COLON
;SEMICOLON
?QUESTION-MARK
...
{LEFT-BRACE
}RIGHT-BRACE
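The character table above can be turned into a quick check on word-form
entries. The sketch below is one possible reading of that table, not part of
the package: the name is_regular_word_form and the exact regular expression
are assumptions made for illustration. It accepts upper-case letters, '.',
'-', '_', the apostrophe, and a backslash followed by one of the three accent
marks. One-off entries that start with other punctuation, such as %PERCENT,
are deliberately not matched.

```python
import re

# Assumed reading of the table: [A-Z], '.', '-', '_', the apostrophe,
# or a backslash followed by an accent mark (' ` ^).
WORD_FORM = re.compile(r"(?:[A-Z.\-_']|\\['`^])+")


def is_regular_word_form(entry):
    """Return True if entry uses only the characters listed in the table."""
    return WORD_FORM.fullmatch(entry) is not None


if __name__ == "__main__":
    for entry in ["AARDVARK", "A-BOMB", "CAF\\'E", "%PERCENT"]:
        print(entry, is_regular_word_form(entry))
```

A stricter check could also enforce where each character may appear (for
example, '.' only at the end of an abbreviation); the table does not spell
that out, so it is left open here.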
@@ -0,0 +1,48 @@
ARPAbet  MRPA  Edin.  Alvey  Example  Relative frequency
p        p     p      p      put      3.1%
b        b     b      b      but      2.3%
t        t     t      t      ten      6.8%
d        d     d      d      den      4.1%
k        k     k      k      can      4.7%
m        m     m      m      man      3.1%
n        n     n      n      not      6.5%
l        l     l      l      like     5.5%
r        r     r      r      run      5.4%
f        f     f      f      full     1.8%
v        v     v      v      very     1.2%
s        s     s      s      some     6.6%
z        z     z      z      zeal     3.6%
hh       h     h      h      hat      0.8%
w        w     w      w      went     0.9%
g        g     g      g      game     1.3%
ch       ch    ch     tS     chain    0.5%
jh       jh    j      dZ     Jane     0.8%
ng       ng    ng     9      long     1.6%
th       th    th     T      thin     0.3%
dh       dh    dh     D      then     12.2%
sh       sh    sh     S      ship     1.2%
zh       zh    zh     Z      measure  0.1%
y        y     y      j      yes      0.8%
iy       ii    ee     i      bean     1.4%
aa       aa    ar     A      barn     0.9%
ao       oo    aw     O      born     1.0%
uw       uu    uu     u      boon     1.0%
er       @@    er     3      burn     0.7%
ih       i     i      I      pit      10.0%
eh       e     e      e      pet      2.4%
ae       a     aa     &      pat      2.5%
ah       uh    u      V      putt     1.5%
oh       o     o      0      pot      1.6%
uh       u     oo     U      good     0.4%
ax       @     a      @      about    7.2%
ey       ei    ai     eI     bay      2.0%
ay       ai    ie     aI     buy      1.6%
oy       oi    oi     oI     boy      0.2%
ow       ou    oa     @U     no       1.5%
aw       au    ou     aU     now      0.4%
ia       i@    eer    I@     peer     0.7%
ea       e@    air    e@     pair     0.2%
ua       u@    oor    U@     poor     0.2%

See also the files available from the Oxford Text Archive by FTP from
sable.ox.ac.uk in directories /pub/ota/public/dicts/710 and 1054.
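The table above is effectively a lookup from ARPAbet symbols to three other
notations. A minimal sketch of using it that way follows. Only a handful of
rows are copied in, and the names PHONE_CODES, COLUMNS and convert are
hypothetical, chosen for this illustration rather than taken from the package.

```python
# A few rows copied from the table: ARPAbet -> (MRPA, Edinburgh, Alvey).
PHONE_CODES = {
    "ch": ("ch", "ch", "tS"),
    "jh": ("jh", "j", "dZ"),
    "n": ("n", "n", "n"),
    "iy": ("ii", "ee", "i"),
    "ae": ("a", "aa", "&"),
    "ax": ("@", "a", "@"),
    "ey": ("ei", "ai", "eI"),
}

# Column positions inside each tuple (hypothetical names).
COLUMNS = {"mrpa": 0, "edinburgh": 1, "alvey": 2}


def convert(phones, target):
    """Convert a space-separated ARPAbet string to the target notation."""
    column = COLUMNS[target]
    return [PHONE_CODES[p][column] for p in phones.split()]


if __name__ == "__main__":
    print(convert("ch ey n", "alvey"))      # "chain"
    print(convert("jh ey n", "edinburgh"))  # "Jane"
```

An unknown symbol or target raises KeyError, which is usually what is wanted
when checking a dictionary for stray codes.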
@@ -0,0 +1,45 @@
sil
aa
ae
ah
ao
aw
ax
ay
b
ch
d
dh
ea
eh
er
ey
f
g
hh
ia
ih
iy
jh
k
l
m
n
ng
oh
ow
oy
p
r
s
sh
t
th
ua
uh
uw
v
w
y
z
zh
@@ -0,0 +1,130 @@
'sayTimit' is a utility to check/develop transcription dictionaries in the
"beep" notation, using 'say' from rsynth to check the transcription audibly.
To save time, 'say' is invoked as the destination of a UNIX pipe, and info is
fed to it under certain commands. No attempt is made to make use of the stress
markers (this is due to rsynth, not the utility...).

Commands are given to say the current transcription, say arbitrary text, look
up the current word in a dictionary &c.

--------------------------------------------------------------------------------
instructions.
--------------------------------------------------------------------------------


Basic instructions
------------------

usage> sayTimit <dict file> [<start place>]

The dict file is assumed to be in a format like:

LACKLUSTER   l ae1 k l ah2 s t ax axr   # comment.
LAKEFIELD    l ey1 k f iy2 l d

ie, the word, the timit transcription, and an optional comment.

"start place" is a word which should be in the file; this just allows you
to start work in the middle of a file, if something went wrong on a previous
run &c.

Before running, you need to determine how 'say' (from rsynth) is to be run.
Just change the line that looks like:

> open(SAY, "| <say-command>"); # crank up 'say'.

eg, "say +h -g 0.2" sets the gain (volume) to 20% and sends the sound to the
headphone port. Additionally, the port and gain are controllable on SUNs with
the 'gaintool' program which is part of the SunOS SOUND demonstration package
(have a look in /usr/demo/SOUND on your machine).


Using it.
---------

eg on the line for
LACKLUSTER   l ae1 k l ah2 s t ax axr   # comment.

the following is shown: the current line and the translation into ota phonemes.

>> LACKLUSTER   l ae1 k l ah2 s t ax axr
== l'&kl,Vst@R

The control is a loop which only exits on 'n'.

* hitting return will 'say' the transcription (in ota format)
* hitting n <+cr> goes to the next line in the dict.
* s <text> sends the text verbatim to 'say' - eg you might want to listen to
  a similar word: s duster
  or try a different transcription: s [lVst3]

* t <text> makes the current line 'text' - eg you can edit the dict file in a
  separate window, and paste in a version to try.

* p retrieves the previous version of the current line (ie, before you used 't')
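
The dict file format described above (word, transcription, optional '#'
comment) is simple to split apart. The following is a sketch only:
parse_dict_line is a hypothetical helper, and it assumes that '#' never
appears inside the word or the transcription.

```python
def parse_dict_line(line):
    """Split a dictionary line into (word, phones, comment).

    Returns None for a line with no word on it. The comment is None when
    the line has no '#' part or the comment text is empty.
    """
    body, _, comment = line.partition("#")
    fields = body.split()
    if not fields:
        return None
    return fields[0], fields[1:], (comment.strip() or None)


if __name__ == "__main__":
    print(parse_dict_line("LACKLUSTER l ae1 k l ah2 s t ax axr # comment."))
    print(parse_dict_line("LAKEFIELD l ey1 k f iy2 l d"))
```

Returning the phones as a list keeps the stress digits attached to their
vowels (ae1, ah2), so a later step can decide how to treat them.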
@@ -0,0 +1,174 @@
#! /usr/local/bin/perl

# sayTimit

# Paul Callaghan, may1994.
# University of Durham.

# instructions elsewhere (in file sayTimit.doc)

# Map timit phone symbols (and stress digits) to the ota notation
# that is sent to 'say'.
%timit2ota = (
    "b",    "b",
    "d",    "d",
    "g",    "g",
    "p",    "p",
    "t",    "t",
    "k",    "k",
    "dx",   "d",
    "q",    "t",

    "jh",   "dZ",
    "ch",   "tS",

    "s",    "s",
    "sh",   "S",
    "z",    "z",
    "zh",   "Z",
    "f",    "f",
    "th",   "T",
    "v",    "v",
    "dh",   "D",

    "m",    "m",
    "n",    "n",
    "ng",   "N",
    "em",   "m",
    "en",   "n",
    "eng",  "N",
    "nx",   "n",

    "l",    "l",
    "r",    "r",
    "w",    "w",
    "y",    "j",
    "hh",   "h",
    "hv",   "h",
    "el",   "l",

    "iy",   "i",
    "ih",   "I",
    "eh",   "e",

    "ea",   "e@",   # eg bare, or air.
    "ey",   "eI",
    "ae",   "&",
    "aa",   "A",
    "aw",   "aU",
    "ay",   "aI",
    "ah",   "V",

    "oh",   "0",
    "oy",   "oI",
    "ow",   "@U",
    "uh",   "U",
    "uw",   "u",
    "ux",   "u",

    "er",   "3",
    "ax",   "@",
    "ix",   "I",
    "axr",  "R",
    "ax-h", "e",    # forgotten what this is!
    # non-timit symbols for RP Vowels
    "ia",   "I@",   # as in 'beer'
    "ao",   "O",    # as in 'cord'
    "ua",   "U@",   # as in 'tour'

    "epi",  " ",    # epenthetic silence
    "sil",  " ",    # silence
    "pau",  " ",    # pause

    "1",    "'",    # primary stress
    "2",    ","     # secondary stress.
);

################################################################################
# crank up 'say'.

# "say:sound <params>" needs to be changed to your installation of
# rsynth, with desired parameters.

open(SAY, "| say:sound +h -g 0.2") || die "Couldn't start SAY: $!\n";

# then make it flush after every write/print operation. The method is to
# set it temporarily as the default output channel, then get it to flush,
# then reset the old default channel. Surprisingly (for some ppl), SAY will
# STILL flush as required.

$oldofh = select(SAY);      # make it default
$| = 1;                     # flush after each IO op.
select($oldofh);            # and reset.


# open input file.

open(INPUT, $ARGV[0]) || die "Couldn't open pronunciations file $ARGV[0].\n";
shift;

# startfrom word?
$startfrom = $ARGV[0];

if ($startfrom ne "") {
    # skip forward to the line whose first field is the start word.
    do {
        $_ = <INPUT>;
        # stop at end of file instead of looping forever.
        die "Start word $startfrom not found.\n" unless defined $_;
        @tmp = split;
    } until ($tmp[0] eq $startfrom);
    print "startfrom " . $_ . "\n";

} else {
    $_ = <INPUT>;           # first line expected
}


# main loop.

START:
do {
    $orig = $_;
    s/#.*//;                # kill comment
    s/[.]//g;               # kill '.'

    # change stress notation: OTA seems to require
    # marks BEFORE the syllable, not AFTER the
    # 'nuclear'(?) vowel.
    s/([a-z]*)1/1 $1/g;
    s/([a-z]*)2/2 $1/g;

    @tmp = split;

    # translate every symbol after the word itself.
    $phons = "";
    foreach $p (@tmp[1..$#tmp]) {
        if ($timit2ota{$p} eq "") {
            print "\nERROR: unknown symbol " . $p;
        } else {
            $phons .= $timit2ota{$p};
        }
    }
    $tmp[0] =~ tr/A-Z/a-z/;     # current word to lower case.

    $ig = 0;
    do {
        print STDOUT ">> " . $orig;
        print STDOUT "== " . $phons . "\n";
        # write result to SAY (skipped once after an 's' command)
        unless ($ig) { print SAY "[" . $phons . "]\n"; }
        $ig = 0;
        $cmd = <STDIN>;
        exit unless defined $cmd;   # end of input: stop instead of looping.
        chomp $cmd;
        if ($cmd =~ /^s/) {
            # 's' alone says the current word; 's <text>' says the text.
            $cmd =~ s/^s//;
            if ($cmd eq "") {
                print SAY $tmp[0] . "\n";
            } else {
                print SAY $cmd . "\n";
            }
            $ig = 1;
        }
        # 't <text>' replaces the current line; 'p' restores the old one.
        if ($cmd =~ /^t /) { $old = $_; $cmd =~ s/^t //; $_ = $cmd; goto START; }
        if ($cmd =~ /^p/) { $_ = $old; goto START; }

    } until ($cmd eq "n");
} while (<INPUT>);

close(INPUT);
close(SAY);
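The core of the script above is the rewrite from a timit transcription to the
ota string sent to 'say': stress digits are moved in front of their vowel, and
each symbol is then looked up in the table. A minimal Python sketch of that
step follows. It is an illustration, not a port of the whole script:
TIMIT_TO_OTA copies only the entries needed for the LACKLUSTER example, the
function name timit_to_ota is hypothetical, and unlike the Perl (which prints
an error and carries on) it raises on an unknown symbol.

```python
import re

# Subset of the script's %timit2ota table, enough for the example below.
TIMIT_TO_OTA = {
    "l": "l", "k": "k", "s": "s", "t": "t",
    "ae": "&", "ah": "V", "ax": "@", "axr": "R",
    "1": "'",   # primary stress
    "2": ",",   # secondary stress
}


def timit_to_ota(transcription):
    """Convert a space-separated timit transcription to an ota string."""
    # Move each stress digit in front of its vowel: "ae1" -> "1 ae".
    moved = re.sub(r"([a-z]*)([12])", r"\2 \1", transcription)
    symbols = moved.split()
    unknown = [s for s in symbols if s not in TIMIT_TO_OTA]
    if unknown:
        raise KeyError("unknown symbol(s): " + " ".join(unknown))
    return "".join(TIMIT_TO_OTA[s] for s in symbols)


if __name__ == "__main__":
    print(timit_to_ota("l ae1 k l ah2 s t ax axr"))
```

For the LACKLUSTER transcription this yields the same string as the script's
own documentation shows, l'&kl,Vst@R.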
@@ -0,0 +1,36 @@

CMUdict
-------

CMUdict (the Carnegie Mellon Pronouncing Dictionary) is a free
pronouncing dictionary of English, suitable for use in speech
technology, and is maintained by the Speech Group in the School of
Computer Science at Carnegie Mellon University.

The Carnegie Mellon Speech Group does not guarantee the accuracy of
this dictionary, nor its suitability for any specific purpose. In
fact, we expect a number of errors, omissions and inconsistencies to
remain in the dictionary. We intend to continually update the
dictionary by correcting existing entries and by adding new ones. From
time to time a new major version will be released.

We welcome input from users: Please send email to Alex Rudnicky
(air+cmudict@cs.cmu.edu).

The Carnegie Mellon Pronouncing Dictionary, in its current and
previous versions, is Copyright (C) 1993-2008 by Carnegie Mellon
University. Use of this dictionary for any research or commercial
purpose is completely unrestricted. If you make use of or
redistribute this material we request that you acknowledge its
origin in your descriptions.

If you add words to or correct words in your version of this
dictionary, we would appreciate it if you could send these additions
and corrections to us (air+cmudict@cs.cmu.edu) for consideration in a
subsequent version. All submissions will be reviewed and approved by
the current maintainer, Alex Rudnicky at Carnegie Mellon.

------------------------------------------------------------------
The current version of cmudict is cmudict.0.7a
[First released October 29, 2007]
