pronounce 0.0.1
- data/README.md +19 -0
- data/data/beep/ACKNOWLEDGEMENTS +36 -0
- data/data/beep/ANNOUNCE-1.0 +27 -0
- data/data/beep/README +39 -0
- data/data/beep/addparan +22 -0
- data/data/beep/beep-1.0 +257070 -0
- data/data/beep/case.txt +166944 -0
- data/data/beep/lexicode.doc +47 -0
- data/data/beep/phoncode.doc +48 -0
- data/data/beep/phone45.tab +45 -0
- data/data/beep/sayTimit.doc +130 -0
- data/data/beep/sayTimit.pl +174 -0
- data/data/cmudict/00README_FIRST.txt +36 -0
- data/data/cmudict/README.developer +50 -0
- data/data/cmudict/README.old +79 -0
- data/data/cmudict/README.weide +67 -0
- data/data/cmudict/cmudict.0.6d +129511 -0
- data/data/cmudict/cmudict.0.7a +133369 -0
- data/data/cmudict/cmudict.0.7a.phones +39 -0
- data/data/cmudict/cmudict.0.7a.symbols +84 -0
- data/data/cmudict/scripts/CompileDictionary.sh +36 -0
- data/data/cmudict/scripts/README.txt +27 -0
- data/data/cmudict/scripts/make_baseform.pl +172 -0
- data/data/cmudict/scripts/sort_cmudict.pl +141 -0
- data/data/cmudict/scripts/test_cmudict.pl +166 -0
- data/data/cmudict/scripts/test_dict.pl +119 -0
- data/data/cmudict/sphinxdict/README.txt +19 -0
- data/data/cmudict/sphinxdict/SphinxPhones_40 +40 -0
- data/data/cmudict/sphinxdict/cmudict.0.7a_SPHINX_40 +133012 -0
- data/data/cmudict/sphinxdict/cmudict_SPHINX_40 +133012 -0
- data/lib/pronounce.rb +33 -0
- metadata +104 -0
@@ -0,0 +1,47 @@
+This file documents the code used to represent the word forms in the lexicon
+
+The valid characters in this field are:
+
+Characters  Notes                                                 Example
+[A-Z]       All words are presently in upper case                 AARDVARK
+.           Abbreviations are represented with a trailing '.'     A.
+-           Hyphenated words are separated by '-'                 A-BOMB
+_           Separator for multiple words with one lexical entry   A_PRIORI
+'                                                                 ACCOUNTANTS'
+\'                                                                CAF\'E
+\`                                                                CR\`ECHE
+\^                                                                AR\^ETE
+
+
+Also there are these one-off words:
+
+!EXCLAMATION-POINT
+"DOUBLE-QUOTE
+%PERCENT
+&AMPERSAND
+&EM
+...
+&UN
+
+'EM
+...
+'UN
+(LEFT-PAREN
+)RIGHT-PAREN
++PLUS
+,COMMA
+--DASH
+-HYPHEN
+-SHIRE
+...ELLIPSIS
+.PERIOD
+.POINT
+/SLASH
+:COLON
+;SEMI-COLON
+;SEMI-COLON
+;SEMICOLON
+?QUESTION-MARK
+...
+{LEFT-BRACE
+}RIGHT-BRACE
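The character table in lexicode.doc can be turned into a validity check. The following is a minimal Ruby sketch, not code from the gem: the names `WORD_FORM` and `valid_word_form?` are hypothetical, and it assumes a backslash is only legal as the start of one of the three accent sequences listed in the table. The one-off entries such as `!EXCLAMATION-POINT` are deliberately not covered.

```ruby
# Hypothetical check for BEEP-style word forms, derived only from the
# character table in lexicode.doc. One-off punctuation entries are ignored.
#
# Allowed units: an upper-case letter, an accent sequence (backslash followed
# by ', ` or ^), or one of the separator characters . _ ' -
WORD_FORM = /\A(?:[A-Z]|\\['`^]|[._'-])+\z/

# Returns true when every character of +form+ fits the table above.
def valid_word_form?(form)
  WORD_FORM.match?(form)
end
```

For example, `valid_word_form?("A_PRIORI")` is true, while a lower-case form such as `"aardvark"` is rejected.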
@@ -0,0 +1,48 @@
+ARPAbet  MRPA  Edin.  Alvey  Example  Relative frequency
+p        p     p      p      put      3.1%
+b        b     b      b      but      2.3%
+t        t     t      t      ten      6.8%
+d        d     d      d      den      4.1%
+k        k     k      k      can      4.7%
+m        m     m      m      man      3.1%
+n        n     n      n      not      6.5%
+l        l     l      l      like     5.5%
+r        r     r      r      run      5.4%
+f        f     f      f      full     1.8%
+v        v     v      v      very     1.2%
+s        s     s      s      some     6.6%
+z        z     z      z      zeal     3.6%
+hh       h     h      h      hat      0.8%
+w        w     w      w      went     0.9%
+g        g     g      g      game     1.3%
+ch       ch    ch     tS     chain    0.5%
+jh       jh    j      dZ     Jane     0.8%
+ng       ng    ng     9      long     1.6%
+th       th    th     T      thin     0.3%
+dh       dh    dh     D      then     12.2%
+sh       sh    sh     S      ship     1.2%
+zh       zh    zh     Z      measure  0.1%
+y        y     y      j      yes      0.8%
+iy       ii    ee     i      bean     1.4%
+aa       aa    ar     A      barn     0.9%
+ao       oo    aw     O      born     1.0%
+uw       uu    uu     u      boon     1.0%
+er       @@    er     3      burn     0.7%
+ih       i     i      I      pit      10.0%
+eh       e     e      e      pet      2.4%
+ae       a     aa     &      pat      2.5%
+ah       uh    u      V      putt     1.5%
+oh       o     o      0      pot      1.6%
+uh       u     oo     U      good     0.4%
+ax       @     a      @      about    7.2%
+ey       ei    ai     eI     bay      2.0%
+ay       ai    ie     aI     buy      1.6%
+oy       oi    oi     oI     boy      0.2%
+ow       ou    oa     @U     no       1.5%
+aw       au    ou     aU     now      0.4%
+ia       i@    eer    I@     peer     0.7%
+ea       e@    air    e@     pair     0.2%
+ua       u@    oor    U@     poor     0.2%
+
+See also the files available from the Oxford Text Archive by FTP from
+sable.ox.ac.uk in directories /pub/ota/public/dicts/710 and 1054.
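The table in phoncode.doc relates four phone notations, so a transcription in one can be mapped to another by lookup. The Ruby sketch below illustrates this for ARPAbet to MRPA using a handful of rows copied from the table. The names `ARPABET_TO_MRPA` and `arpabet_to_mrpa` are hypothetical and not part of the gem, and the table is intentionally partial.

```ruby
# A few rows copied from the phoncode.doc table: ARPAbet symbol => MRPA symbol.
# This is a partial, illustrative table; the remaining rows would be added
# in the same way.
ARPABET_TO_MRPA = {
  "p"  => "p",  "t"  => "t",  "k"  => "k",  "hh" => "h",
  "iy" => "ii", "ae" => "a",  "ax" => "@",  "er" => "@@",
  "ey" => "ei", "ow" => "ou"
}.freeze

# Converts a space-separated ARPAbet transcription into an array of MRPA
# symbols. Raises KeyError for any symbol missing from the partial table.
def arpabet_to_mrpa(transcription)
  transcription.split.map { |phone| ARPABET_TO_MRPA.fetch(phone) }
end
```

Using `fetch` rather than `[]` makes an unknown symbol fail loudly instead of silently producing `nil` in the output.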
@@ -0,0 +1,45 @@
+sil
+aa
+ae
+ah
+ao
+aw
+ax
+ay
+b
+ch
+d
+dh
+ea
+eh
+er
+ey
+f
+g
+hh
+ia
+ih
+iy
+jh
+k
+l
+m
+n
+ng
+oh
+ow
+oy
+p
+r
+s
+sh
+t
+th
+ua
+uh
+uw
+v
+w
+y
+z
+zh
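A phone list like phone45.tab is typically used to check that a transcription only contains known symbols. The Ruby sketch below shows one way to do that, with the 45 symbols copied inline so the example is self-contained. The names `PHONES` and `unknown_phones` are hypothetical, and whether the gem uses the file this way is an assumption.

```ruby
require "set"

# The 45 symbols listed in phone45.tab, in file order.
PHONES = %w[
  sil aa ae ah ao aw ax ay b ch d dh ea eh er ey f g hh ia ih iy jh k l
  m n ng oh ow oy p r s sh t th ua uh uw v w y z zh
].to_set.freeze

# Returns the symbols of a space-separated transcription that do not
# appear in the phone set (an empty array means the transcription is clean).
def unknown_phones(transcription)
  transcription.split.reject { |phone| PHONES.include?(phone) }
end
```

Note that stress digits are not part of the list, so a symbol such as `ae1` would be reported as unknown unless the digit is stripped first.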
@@ -0,0 +1,130 @@
+'sayTimit' is a utility to check/develop transcription dictionaries in the "beep"
+notation, using 'say' from rsynth to produce an audible check of the
+transcription. To save time, 'say' is invoked as the destination of a UNIX pipe,
+and info is fed to it under certain commands. No attempt is made to make use of
+the stress markers (this is due to rsynth, not the utility...).
+
+Commands are given to say the current transcription, say arbitrary text, look
+up the current word in a dictionary &c.
+
+--------------------------------------------------------------------------------
+instructions.
+--------------------------------------------------------------------------------
+
+
+Basic instructions
+------------------
+
+usage> sayTimit <dict file> [<start place>]
+
+The dict file is assumed to be in a format like:
+
+    LACKLUSTER l ae1 k l ah2 s t ax axr # comment.
+    LAKEFIELD l ey1 k f iy2 l d
+
+ie, the word, the timit transcription, and an optional comment.
+
+"start place" is a word which should be in the file; this just allows you
+to start work in the middle of a file, if something went wrong on a previous
+run &c &c.
+
+Before running - you need to determine how 'say' (from rsynth) is to be run.
+Just change the line that looks like:
+
+> open(SAY, "| <say-command>"); # crank up 'say'.
+
+eg, "say +h -g 0.2" sets the gain (volume) to 20%, and sends the sound to the
+headphone port. Additionally, the port and gain are controllable on SUNs with
+the 'gaintool' program which is part of the SunOS SOUND demonstration package
+(have a look in /usr/demo/SOUND on your machine).
+
+
+Using it.
+---------
+
+eg on the line for
+    LACKLUSTER l ae1 k l ah2 s t ax axr # comment.
+
+the following is shown: the current line and the translation into ota phonemes.
+
+    >> LACKLUSTER l ae1 k l ah2 s t ax axr
+    == l'&kl,Vst@R
+
+The control is a loop which only exits on 'n'.
+
+* hitting return will 'say' the transcription (in ota format)
+* hitting n <+cr> goes to the next line in the dict.
+* s <text> sends the text verbatim to 'say' - eg you might want to listen to
+    a similar word: s duster
+    or try a different transcription: s [lVst3]
+
+* t <text> makes the current line 'text' - eg you can edit the dict file in a
+    separate window, and paste in a version to try.
+
+* p retrieves the previous version of the current line (ie, before you used 't')
+
+
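The dictionary line format described in sayTimit.doc (a word, its phone symbols, and an optional `#` comment) is simple to parse. The following Ruby sketch shows one way to split such a line. `DictEntry` and `parse_dict_line` are hypothetical names for illustration, not part of sayTimit or the gem.

```ruby
# Hypothetical parser for the dictionary line format described in
# sayTimit.doc: WORD, then phone symbols, then an optional "# comment".
DictEntry = Struct.new(:word, :phones, :comment)

def parse_dict_line(line)
  # Everything after the first "#" is the comment (nil when absent).
  body, comment = line.chomp.split("#", 2)
  # The first whitespace-separated token is the word, the rest are phones.
  word, *phones = body.to_s.split
  DictEntry.new(word, phones, comment&.strip)
end
```

With the sample line from the documentation, `parse_dict_line("LACKLUSTER l ae1 k l ah2 s t ax axr # comment.")` yields the word `LACKLUSTER`, nine phone symbols with their stress digits still attached, and the comment text `comment.`.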
@@ -0,0 +1,174 @@
+#! /usr/local/bin/perl
+
+# sayTimit
+
+# Paul Callaghan, may1994.
+# University of Durham.
+
+# instructions elsewhere (in file sayTimit.doc)
+
+%timit2ota = (
+    "b", "b",
+    "d", "d",
+    "g", "g",
+    "p", "p",
+    "t", "t",
+    "k", "k",
+    "dx", "d",
+    "q", "t",
+
+
+    "jh", "dZ",
+    "ch", "tS",
+
+
+    "s", "s",
+    "sh", "S",
+    "z", "z",
+    "zh", "Z",
+    "f", "f",
+    "th", "T",
+    "v", "v",
+    "dh", "D",
+
+
+    "m", "m",
+    "n", "n",
+    "ng", "N",
+    "em", "m",
+    "en", "n",
+    "eng", "N",
+    "nx", "n",
+
+
+
+    "l", "l",
+    "r", "r",
+    "w", "w",
+    "y", "j",
+    "hh", "h",
+    "hv", "h",
+    "el", "l",
+
+
+
+
+    "iy", "i",
+    "ih", "I",
+    "eh", "e",
+
+    "ea", "e@", # eg bare, or air.
+    "ey", "eI",
+    "ae", "&",
+    "aa", "A",
+    "aw", "aU",
+    "ay", "aI",
+    "ah", "V",
+
+    "oh", "0",
+    "oy", "oI",
+    "ow", "@U",
+    "uh", "U",
+    "uw", "u",
+    "ux", "u",
+
+    "er", "3",
+    "ax", "@",
+    "ix", "I",
+    "axr", "R",
+    "ax-h", "e", # forgotten what this is!
+    # non-timit symbols for RP Vowels
+    "ia", "I@", # as in 'beer'
+    "ao", "O", # as in 'cord'
+    "ua", "U@", # as in 'tour'
+
+    "epi", " ", # epenthetic silence
+    "sil", " ", # silence
+    "pau", " ", # pause
+
+    "1", "'", # primary stress
+    "2", "," # secondary stress.
+);
+
+################################################################################
+# crank up 'say'.
+
+# "say:sound <params>" needs to be changed to your installation of
+# rsynth, with desired parameters.
+
+open(SAY, "| say:sound +h -g 0.2") || die "Couldn't start SAY: $!\n";
+
+# then make it flush after every write/print operation. The method is to
+# set it temporarily as the default output channel, then get it to flush,
+# then reset the old default channel. Surprisingly (for some ppl), SAY will
+# STILL flush as required.
+
+$oldofh = select(SAY); # make it default
+$| = 1; # flush after each IO op.
+select($oldofh); # and reset.
+
+
+# open input file.
+
+open(INPUT, $ARGV[0]) || die "Couldn't open pronunciations file $ARGV[0].\n";
+shift;
+
+# startfrom word?
+$startfrom = $ARGV[0];
+
+if ($startfrom ne "") {
+    do {
+        $_ = <INPUT>;
+        @tmp = split;
+    } until ( $tmp[0] eq $startfrom);
+    print "startfrom " . $_ . "\n";
+
+} else {
+    $_ = <INPUT>; # first line expected
+}
+
+
+# main loop.
+
+START:
+do {
+    $orig = $_;
+    s/#.*//; # kill comment
+    s/[.]//g; # kill '.'
+
+    s/([a-z]*)1/1 \1/g;
+    s/([a-z]*)2/2 \1/g;
+    # change stress notation: OTA seems to require
+    # marks BEFORE the syllable, not AFTER the
+    # 'nuclear'(?) vowel.
+
+    @tmp = split;
+
+    $phons = "";
+    foreach $p (@tmp[1..$#tmp]) {
+        if ($timit2ota{$p} eq "") {
+            print "\nERROR: unknown symbol " . $p;
+        } else {
+            $phons .= $timit2ota{$p};
+        }
+    }
+    $tmp[0] =~ tr/A-Z/a-z/; # current word to lower case.
+
+    $ig = 0;
+    do {
+        print STDOUT ">> " . $orig;
+        print STDOUT "== " . $phons . "\n";
+        unless ($ig) { print SAY "[" . $phons . "]\n"; }
+        # write result to SAY
+        $ig = 0;
+        $cmd = <STDIN>;
+        chop $cmd;
+        if ($cmd =~ /^s/) { $cmd =~ s/^s//; if ($cmd eq "") { print SAY $tmp[0] . "\n"; } else { print SAY $cmd . "\n"; } $ig = 1;}
+        if ($cmd =~ /^t /) { $old = $_; $cmd =~ s/^t //; $_ = $cmd; goto START; }
+        if ($cmd =~ /^p/) { $_ = $old; goto START; }
+
+    } until ($cmd eq "n");
+} while (<INPUT>);
+
+close(INPUT);
+close(SAY);
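The core of sayTimit.pl is the translation step: strip the comment and periods, move each stress digit in front of its phone, then look every symbol up in `%timit2ota` and concatenate the results. The Ruby sketch below restates that step for illustration only. `TIMIT_TO_OTA` is a partial copy of the script's table (just enough for the LACKLUSTER example), `ota_transcription` is a hypothetical name, and the two Perl substitutions for `1` and `2` are folded into one pattern, which is assumed to be equivalent for input like the samples shown.

```ruby
# Partial copy of the script's %timit2ota table, enough for the example line.
TIMIT_TO_OTA = {
  "l" => "l", "k" => "k", "s" => "s", "t" => "t",
  "ae" => "&", "ah" => "V", "ax" => "@", "axr" => "R",
  "1" => "'", "2" => ","   # primary and secondary stress marks
}.freeze

# Translates one dictionary line into the OTA-style string sent to 'say'.
# Raises KeyError for symbols missing from the partial table.
def ota_transcription(line)
  # Drop the trailing comment, then remove every '.' as the script does.
  text = line.sub(/#.*/, "").delete(".")
  # Move each stress digit in front of its phone: "ae1" becomes "1 ae".
  text = text.gsub(/([a-z]*)([12])/) { "#{$2} #{$1}" }
  # The first token is the word itself; the rest are phones and stress marks.
  _word, *symbols = text.split
  symbols.map { |sym| TIMIT_TO_OTA.fetch(sym) }.join
end
```

Applied to the sample line from sayTimit.doc, this reproduces the string shown there after `==`.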
@@ -0,0 +1,36 @@
+
+CMUdict
+-------
+
+CMUdict (the Carnegie Mellon Pronouncing Dictionary) is a free
+pronouncing dictionary of English, suitable for use in speech
+technology, and is maintained by the Speech Group in the School of
+Computer Science at Carnegie Mellon University.
+
+The Carnegie Mellon Speech Group does not guarantee the accuracy of
+this dictionary, nor its suitability for any specific purpose. In
+fact, we expect a number of errors, omissions and inconsistencies to
+remain in the dictionary. We intend to continually update the
+dictionary by correcting existing entries and by adding new ones. From
+time to time a new major version will be released.
+
+We welcome input from users: Please send email to Alex Rudnicky
+(air+cmudict@cs.cmu.edu).
+
+The Carnegie Mellon Pronouncing Dictionary, in its current and
+previous versions is Copyright (C) 1993-2008 by Carnegie Mellon
+University. Use of this dictionary for any research or commercial
+purpose is completely unrestricted. If you make use of or
+redistribute this material we request that you acknowledge its
+origin in your descriptions.
+
+If you add words to or correct words in your version of this
+dictionary, we would appreciate it if you could send these additions
+and corrections to us (air+cmudict@cs.cmu.edu) for consideration in a
+subsequent version. All submissions will be reviewed and approved by
+the current maintainer, Alex Rudnicky at Carnegie Mellon.
+
+------------------------------------------------------------------
+The current version of cmudict is cmudict.0.7a
+[First released October 29, 2007]
+