pronounce 0.0.1

@@ -0,0 +1,47 @@
+ This file documents the code used to represent the word forms in the lexicon.
+
+ The valid characters in this field are:
+
+ Characters  Notes                                                Example
+ [A-Z]       All words are presently in upper case                AARDVARK
+ .           Abbreviations are represented with a trailing '.'    A.
+ -           Hyphenated words are separated by '-'                A-BOMB
+ _           Separator for multiple words with one lexical entry  A_PRIORI
+ '           Apostrophe                                           ACCOUNTANTS'
+ \'          Acute accent on the following letter                 CAF\'E
+ \`          Grave accent on the following letter                 CR\`ECHE
+ \^          Circumflex accent on the following letter            AR\^ETE
+
+
+ There are also these one-off words:
+
+ !EXCLAMATION-POINT
+ "DOUBLE-QUOTE
+ %PERCENT
+ &AMPERSAND
+ &EM
+ ...
+ &UN
+
+ 'EM
+ ...
+ 'UN
+ (LEFT-PAREN
+ )RIGHT-PAREN
+ +PLUS
+ ,COMMA
+ --DASH
+ -HYPHEN
+ -SHIRE
+ ...ELLIPSIS
+ .PERIOD
+ .POINT
+ /SLASH
+ :COLON
+ ;SEMI-COLON
+ ;SEMI-COLON
+ ;SEMICOLON
+ ?QUESTION-MARK
+ ...
+ {LEFT-BRACE
+ }RIGHT-BRACE
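As an illustration of the character rules above, the following Perl sketch checks a headword against the table and turns the accent escapes into accented letters. Perl is used because it is the language of the sayTimit script elsewhere in this change. The subroutine names `is_valid_word_form` and `decode_accents` are hypothetical. The sketch assumes, as the examples suggest, that each escape applies to the letter immediately after it (CAF\'E is "café"). It deliberately does not recognise the one-off punctuation words.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of the lexicon tools): true if a headword
# uses only the characters in the table above. One-off punctuation words
# such as "!EXCLAMATION-POINT" are not covered.
sub is_valid_word_form {
    my ($word) = @_;
    # Each unit is either an accent escape (\' \` \^) or one plain character.
    return $word =~ /\A(?:\\['`^]|[A-Z.'_-])+\z/ ? 1 : 0;
}

# Unicode combining marks for the three accent escapes.
my %COMBINING = (
    "'" => "\x{0301}",    # acute
    "`" => "\x{0300}",    # grave
    "^" => "\x{0302}",    # circumflex
);

# Hypothetical helper: replace each escape with an accented letter,
# assuming the escape applies to the letter immediately after it.
sub decode_accents {
    my ($word) = @_;
    $word =~ s/\\(['`^])([A-Z])/$2$COMBINING{$1}/g;
    return NFC($word);    # compose letter + mark, e.g. E + U+0301 -> U+00C9
}

binmode STDOUT, ':encoding(UTF-8)';
for my $w ("CAF\\'E", "CR\\`ECHE", "AR\\^ETE", "A_PRIORI") {
    printf "%-10s valid=%d decoded=%s\n", $w, is_valid_word_form($w), decode_accents($w);
}
```

Composing with NFC keeps the decoded words comparable with ordinary precomposed text, such as "CAFÉ" typed on a keyboard.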
@@ -0,0 +1,48 @@
+ ARPAbet  MRPA  Edin.  Alvey  Example  Relative frequency
+ p        p     p      p      put      3.1%
+ b        b     b      b      but      2.3%
+ t        t     t      t      ten      6.8%
+ d        d     d      d      den      4.1%
+ k        k     k      k      can      4.7%
+ m        m     m      m      man      3.1%
+ n        n     n      n      not      6.5%
+ l        l     l      l      like     5.5%
+ r        r     r      r      run      5.4%
+ f        f     f      f      full     1.8%
+ v        v     v      v      very     1.2%
+ s        s     s      s      some     6.6%
+ z        z     z      z      zeal     3.6%
+ hh       h     h      h      hat      0.8%
+ w        w     w      w      went     0.9%
+ g        g     g      g      game     1.3%
+ ch       ch    ch     tS     chain    0.5%
+ jh       jh    j      dZ     Jane     0.8%
+ ng       ng    ng     9      long     1.6%
+ th       th    th     T      thin     0.3%
+ dh       dh    dh     D      then     12.2%
+ sh       sh    sh     S      ship     1.2%
+ zh       zh    zh     Z      measure  0.1%
+ y        y     y      j      yes      0.8%
+ iy       ii    ee     i      bean     1.4%
+ aa       aa    ar     A      barn     0.9%
+ ao       oo    aw     O      born     1.0%
+ uw       uu    uu     u      boon     1.0%
+ er       @@    er     3      burn     0.7%
+ ih       i     i      I      pit      10.0%
+ eh       e     e      e      pet      2.4%
+ ae       a     aa     &      pat      2.5%
+ ah       uh    u      V      putt     1.5%
+ oh       o     o      0      pot      1.6%
+ uh       u     oo     U      good     0.4%
+ ax       @     a      @      about    7.2%
+ ey       ei    ai     eI     bay      2.0%
+ ay       ai    ie     aI     buy      1.6%
+ oy       oi    oi     oI     boy      0.2%
+ ow       ou    oa     @U     no       1.5%
+ aw       au    ou     aU     now      0.4%
+ ia       i@    eer    I@     peer     0.7%
+ ea       e@    air    e@     pair     0.2%
+ ua       u@    oor    U@     poor     0.2%
+
+ See also the files available from the Oxford Text Archive by FTP from
+ sable.ox.ac.uk in directories /pub/ota/public/dicts/710 and 1054.
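A minimal sketch of using the table above as a lookup follows. It parses rows in that layout into one ARPAbet-keyed map per notation and converts a transcription. Only five rows are copied in, and the others could be pasted in unchanged. The subroutine names `build_maps` and `convert` and the column keys `mrpa`, `edin` and `alvey` are illustrative assumptions, not part of any existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few rows copied from the table above (ARPAbet, MRPA, Edin., Alvey,
# example word, relative frequency).
my @ROWS = (
    "k   k   k   k   can   4.7%",
    "t   t   t   t   ten   6.8%",
    "ae  a   aa  &   pat   2.5%",
    "ng  ng  ng  9   long  1.6%",
    "sh  sh  sh  S   ship  1.2%",
);

# Hypothetical names for the symbol columns that follow the ARPAbet column.
my @COLUMNS = qw(mrpa edin alvey);

# Build { column => { arpabet => symbol } } from whitespace-separated rows.
sub build_maps {
    my @rows = @_;
    my %map;
    for my $row (@rows) {
        my ($arpabet, @symbols) = split ' ', $row;
        $map{ $COLUMNS[$_] }{$arpabet} = $symbols[$_] for 0 .. $#COLUMNS;
    }
    return \%map;
}

# Convert a space-separated ARPAbet transcription to another column.
# $sep joins the output symbols: '' suits Alvey-style strings, while the
# default space keeps multi-letter MRPA symbols such as "ii" or "@@" apart.
sub convert {
    my ($map, $column, $phones, $sep) = @_;
    $sep = ' ' unless defined $sep;
    my $table = $map->{$column} or die "unknown column '$column'\n";
    my @out;
    for my $phone (split ' ', $phones) {
        die "unknown phone '$phone'\n" unless exists $table->{$phone};
        push @out, $table->{$phone};
    }
    return join $sep, @out;
}

my $maps = build_maps(@ROWS);
print convert($maps, 'alvey', 'k ae t', ''), "\n";    # k&t
print convert($maps, 'mrpa', 'sh ae ng'), "\n";       # sh a ng
```

Dying on an unknown phone, rather than skipping it, makes gaps in a partially copied table obvious.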
@@ -0,0 +1,45 @@
+ sil
+ aa
+ ae
+ ah
+ ao
+ aw
+ ax
+ ay
+ b
+ ch
+ d
+ dh
+ ea
+ eh
+ er
+ ey
+ f
+ g
+ hh
+ ia
+ ih
+ iy
+ jh
+ k
+ l
+ m
+ n
+ ng
+ oh
+ ow
+ oy
+ p
+ r
+ s
+ sh
+ t
+ th
+ ua
+ uh
+ uw
+ v
+ w
+ y
+ z
+ zh
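The list above can serve as a closed phone set for checking transcriptions. The sketch below uses a hypothetical `unknown_phones` helper to report symbols that fall outside the set. It assumes stress is written as a trailing digit ("ae1", "ah2"), as in the sayTimit dictionary examples, and strips the digit before checking. Note that `axr`, which appears in those examples, is not one of the 45 symbols.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 45 symbols listed above: "sil" plus the 44 ARPAbet symbols of the
# phone table.
my @PHONES = qw(
    sil aa ae ah ao aw ax ay b ch d dh ea eh er ey f g hh ia ih iy
    jh k l m n ng oh ow oy p r s sh t th ua uh uw v w y z zh
);
my %IS_PHONE = map { $_ => 1 } @PHONES;

# Return the symbols of a transcription that are not in the phone set.
# Assumption: stress is a trailing digit and is ignored for the check.
sub unknown_phones {
    my ($transcription) = @_;
    my @unknown;
    for my $symbol (split ' ', $transcription) {
        (my $phone = $symbol) =~ s/[0-9]+\z//;    # "ae1" -> "ae"
        push @unknown, $symbol unless $IS_PHONE{$phone};
    }
    return @unknown;
}

my @bad = unknown_phones('l ae1 k l ah2 s t ax axr');
print "not in the phone set: @bad\n";    # not in the phone set: axr
```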
@@ -0,0 +1,130 @@
+ 'sayTimit' is a utility for checking and developing transcription dictionaries
+ in the "beep" notation, using 'say' from rsynth to play each transcription
+ aloud so it can be checked by ear. To save time, 'say' is started once as the
+ destination of a UNIX pipe, and text is written to it in response to certain
+ commands. No attempt is made to use the stress markers (this is a limitation
+ of rsynth, not of the utility).
+
+ Commands are provided to say the current transcription, say arbitrary text,
+ look up the current word in a dictionary &c.
+
+ --------------------------------------------------------------------------------
+ instructions.
+ --------------------------------------------------------------------------------
+
+
+ Basic instructions
+ ------------------
+
+ usage> sayTimit <dict file> [<start place>]
+
+ The dict file is assumed to be in a format like:
+
+ LACKLUSTER l ae1 k l ah2 s t ax axr # comment.
+ LAKEFIELD l ey1 k f iy2 l d
+
+ i.e., the word, the TIMIT transcription, and an optional comment.
+
+ "start place" is a word which should be in the file; this just allows you
+ to start work in the middle of a file, e.g. if something went wrong on a
+ previous run.
+
+ Before running, you need to decide how 'say' (from rsynth) is to be run.
+ Just change the line that looks like:
+
+ > open(SAY, "| <say-command>"); # crank up 'say'.
+
+ e.g., "say +h -g 0.2" sets the gain (volume) to 20% and sends the sound to the
+ headphone port. The port and gain can also be controlled on Suns with the
+ 'gaintool' program, which is part of the SunOS SOUND demonstration package
+ (have a look in /usr/demo/SOUND on your machine).
+
+
+ Using it.
+ ---------
+
+ e.g., on the line for
+ LACKLUSTER l ae1 k l ah2 s t ax axr # comment.
+
+ the following is shown: the current line and its translation into OTA phonemes.
+
+ >> LACKLUSTER l ae1 k l ah2 s t ax axr
+ == l'&kl,Vst@R
+
+ The control loop for each line only exits on 'n'.
+
+ * hitting return sends the transcription (in OTA format) to 'say'.
+ * n followed by return goes to the next line in the dict.
+ * s <text> sends the text verbatim to 'say'; e.g., you might want to listen to
+   a similar word: s duster
+   or try a different transcription: s [lVst3]
+   (s on its own says the current word as plain text).
+
+ * t <text> replaces the current line with <text>; e.g., you can edit the dict
+   file in a separate window and paste in a version to try.
+
+ * p retrieves the previous version of the current line (i.e., before you used 't').
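The dictionary format and the "start place" described above can be sketched as two small Perl subroutines. `parse_dict_line` and `lines_from` are hypothetical names, not part of sayTimit. The comment handling assumes everything after the first '#' on a line is the comment. Unlike a plain read-until-match loop, `lines_from` stops with an error when the start word is missing, instead of running past the end of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one dictionary line into (word, [transcription symbols], comment).
# Blank or comment-only lines give an empty list.
sub parse_dict_line {
    my ($line) = @_;
    chomp $line;
    my $comment = '';
    $comment = $1 if $line =~ s/\s*#\s*(.*)\z//;    # optional "# comment"
    my ($word, @symbols) = split ' ', $line;
    return unless defined $word;
    return ($word, \@symbols, $comment);
}

# Return the lines from the start place onwards, or all lines when no
# start place is given. Dies if the start word never occurs.
sub lines_from {
    my ($start, @lines) = @_;
    return @lines unless defined $start && length $start;
    for my $i (0 .. $#lines) {
        my ($word) = parse_dict_line($lines[$i]);
        return @lines[$i .. $#lines] if defined $word && $word eq $start;
    }
    die "start place '$start' not found\n";
}

my @dict = (
    "LACKLUSTER l ae1 k l ah2 s t ax axr # comment.\n",
    "LAKEFIELD l ey1 k f iy2 l d\n",
);
for my $line (lines_from('LAKEFIELD', @dict)) {
    my ($word, $symbols, $comment) = parse_dict_line($line);
    printf "%s: %d symbols, comment '%s'\n", $word, scalar(@$symbols), $comment;
}
```

Run as is, the loop prints one line, `LAKEFIELD: 7 symbols, comment ''`.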
@@ -0,0 +1,188 @@
+ #! /usr/local/bin/perl
+
+ # sayTimit
+
+ # Paul Callaghan, may1994.
+ # University of Durham.
+
+ # instructions elsewhere (in file sayTimit.doc)
+
+ %timit2ota = (
+     "b", "b",
+     "d", "d",
+     "g", "g",
+     "p", "p",
+     "t", "t",
+     "k", "k",
+     "dx", "d",
+     "q", "t",
+
+
+     "jh", "dZ",
+     "ch", "tS",
+
+
+     "s", "s",
+     "sh", "S",
+     "z", "z",
+     "zh", "Z",
+     "f", "f",
+     "th", "T",
+     "v", "v",
+     "dh", "D",
+
+
+     "m", "m",
+     "n", "n",
+     "ng", "N",
+     "em", "m",
+     "en", "n",
+     "eng", "N",
+     "nx", "n",
+
+
+
+     "l", "l",
+     "r", "r",
+     "w", "w",
+     "y", "j",
+     "hh", "h",
+     "hv", "h",
+     "el", "l",
+
+
+
+
+     "iy", "i",
+     "ih", "I",
+     "eh", "e",
+
+     "ea", "e@",     # eg bare, or air.
+     "ey", "eI",
+     "ae", "&",
+     "aa", "A",
+     "aw", "aU",
+     "ay", "aI",
+     "ah", "V",
+
+     "oh", "0",
+     "oy", "oI",
+     "ow", '@U',     # single quotes: in "@U" Perl would interpolate the array @U
+     "uh", "U",
+     "uw", "u",
+     "ux", "u",
+
+     "er", "3",
+     "ax", "@",
+     "ix", "I",
+     "axr", "R",
+     "ax-h", "e",    # TIMIT devoiced schwa, as in 'suspect'
+     # non-timit symbols for RP Vowels
+     "ia", "I@",     # as in 'beer'
+     "ao", "O",      # as in 'cord'
+     "ua", "U@",     # as in 'tour'
+
+     "epi", " ",     # epenthetic silence
+     "sil", " ",     # silence
+     "pau", " ",     # pause
+
+     "1", "'",       # primary stress
+     "2", ","        # secondary stress.
+ );
+
+ ################################################################################
+ # crank up 'say'.
+
+ # "say:sound <params>" needs to be changed to your installation of
+ # rsynth, with desired parameters.
+
+ open(SAY, "| say:sound +h -g 0.2") || die "Couldn't start SAY: $!\n";
+
+ # then make it flush after every write/print operation. The method is to
+ # set it temporarily as the default output channel, turn on autoflush ($|),
+ # then reset the old default channel. Surprisingly (for some people), SAY
+ # will STILL flush as required, because $| is a per-handle setting.
+
+ $oldofh = select(SAY);  # make it default
+ $| = 1;                 # flush after each IO op.
+ select($oldofh);        # and reset.
+
+
+ # open input file.
+
+ open(INPUT, "<", $ARGV[0])
+     || die "Couldn't open pronunciations file $ARGV[0]: $!\n";
+ shift;
+
+ # startfrom word?
+ $startfrom = $ARGV[0];
+
+ if ($startfrom ne "") {
+     do {
+         $_ = <INPUT>;
+         # stop at end of file instead of looping forever
+         die "Start word $startfrom not found.\n" unless defined $_;
+         @tmp = split;
+     } until ($tmp[0] eq $startfrom);
+     print "startfrom " . $_;    # $_ already ends with a newline
+
+ } else {
+     $_ = <INPUT>; # first line expected
+ }
+
+
+ # main loop.
+
+ START:
+ do {
+     $orig = $_;
+     s/#.*//;        # kill comment
+     s/[.]//g;       # kill '.'
+
+     s/([a-z]*)1/1 $1/g;
+     s/([a-z]*)2/2 $1/g;
+     # change stress notation: OTA seems to require
+     # marks BEFORE the syllable, not AFTER the
+     # 'nuclear'(?) vowel. In practice each mark is moved
+     # to just before its vowel, e.g. "ae1" -> "1 ae".
+
+     @tmp = split;
+
+     $phons = "";
+     foreach $p (@tmp[1..$#tmp]) {
+         if (!exists $timit2ota{$p}) {
+             print "ERROR: unknown symbol $p\n";
+         } else {
+             $phons .= $timit2ota{$p};
+         }
+     }
+     $tmp[0] =~ tr/A-Z/a-z/;    # current word to lower case.
+
+     $ig = 0;
+     do {
+         print STDOUT ">> " . $orig;
+         print STDOUT "== " . $phons . "\n";
+         unless ($ig) { print SAY "[" . $phons . "]\n"; }   # write result to SAY
+         $ig = 0;
+         $cmd = <STDIN>;
+         unless (defined $cmd) { close(INPUT); close(SAY); exit; }   # EOF on STDIN: quit
+         chomp $cmd;
+         if ($cmd =~ /^s/) {
+             ($text = $cmd) =~ s/^s//;    # keep $cmd intact for the 'n' test below
+             if ($text eq "") { print SAY $tmp[0] . "\n"; }   # 's' alone: say the word
+             else             { print SAY $text . "\n"; }
+             $ig = 1;                     # don't re-send the transcription
+         }
+         elsif ($cmd =~ /^t /) {
+             $old = $orig;                # the line as it was before 't'
+             ($_ = $cmd) =~ s/^t //;
+             $_ .= "\n";
+             goto START;
+         }
+         elsif ($cmd =~ /^p/ && defined $old) { $_ = $old; goto START; }
+
+     } until ($cmd eq "n");
+ } while (<INPUT>);
+
+ close(INPUT);
+ close(SAY);
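Because the script is interactive, its conversion step is hard to check on its own. The sketch below factors that step into a hypothetical `timit_line_to_ota` subroutine. It uses the same table and the same stress-moving substitutions, so the LACKLUSTER example from sayTimit.doc can be reproduced. The table is written with qw() lists because in a double-quoted Perl string such as "@U", the @U is interpolated as an (empty) array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sayTimit %timit2ota table, written compactly as (timit, ota) pairs.
my %TIMIT2OTA = (
    qw(b b  d d  g g  p p  t t  k k  dx d  q t  jh dZ  ch tS),
    qw(s s  sh S  z z  zh Z  f f  th T  v v  dh D),
    qw(m m  n n  ng N  em m  en n  eng N  nx n),
    qw(l l  r r  w w  y j  hh h  hv h  el l),
    qw(iy i  ih I  eh e  ea e@  ey eI  ae &  aa A  aw aU  ay aI  ah V),
    qw(oh 0  oy oI  ow @U  uh U  uw u  ux u),
    qw(er 3  ax @  ix I  axr R  ax-h e  ia I@  ao O  ua U@),
    epi => ' ', sil => ' ', pau => ' ',    # silences
    1 => "'", 2 => ',',                     # primary / secondary stress
);

# Non-interactive version of the per-line conversion in sayTimit:
# returns (lower-cased word, OTA string, list of unknown symbols).
sub timit_line_to_ota {
    my ($line) = @_;
    $line =~ s/#.*//s;              # drop the comment
    $line =~ s/[.]//g;              # drop '.'
    $line =~ s/([a-z]*)1/1 $1/g;    # move stress marks in front of their vowel
    $line =~ s/([a-z]*)2/2 $1/g;
    my ($word, @symbols) = split ' ', $line;
    my $ota = '';
    my @unknown;
    for my $s (@symbols) {
        if (exists $TIMIT2OTA{$s}) { $ota .= $TIMIT2OTA{$s} }
        else                       { push @unknown, $s }
    }
    return (defined $word ? lc $word : '', $ota, @unknown);
}

my ($word, $ota) = timit_line_to_ota("LACKLUSTER l ae1 k l ah2 s t ax axr # comment.\n");
print "$word => $ota\n";
```

Run as is, it prints `lackluster => l'&kl,Vst@R`, the same translation shown in sayTimit.doc. Returning unknown symbols as a list, rather than printing them, lets a batch check flag every bad entry in a dictionary without playing any audio.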
@@ -0,0 +1,36 @@
+
+ CMUdict
+ -------
+
+ CMUdict (the Carnegie Mellon Pronouncing Dictionary) is a free
+ pronouncing dictionary of English, suitable for use in speech
+ technology, and is maintained by the Speech Group in the School of
+ Computer Science at Carnegie Mellon University.
+
+ The Carnegie Mellon Speech Group does not guarantee the accuracy of
+ this dictionary, nor its suitability for any specific purpose. In
+ fact, we expect a number of errors, omissions and inconsistencies to
+ remain in the dictionary. We intend to continually update the
+ dictionary by correcting existing entries and by adding new ones. From
+ time to time a new major version will be released.
+
+ We welcome input from users: Please send email to Alex Rudnicky
+ (air+cmudict@cs.cmu.edu).
+
+ The Carnegie Mellon Pronouncing Dictionary, in its current and
+ previous versions, is Copyright (C) 1993-2008 by Carnegie Mellon
+ University. Use of this dictionary for any research or commercial
+ purpose is completely unrestricted. If you make use of or
+ redistribute this material we request that you acknowledge its
+ origin in your descriptions.
+
+ If you add words to or correct words in your version of this
+ dictionary, we would appreciate it if you could send these additions
+ and corrections to us (air+cmudict@cs.cmu.edu) for consideration in a
+ subsequent version. All submissions will be reviewed and approved by
+ the current maintainer, Alex Rudnicky at Carnegie Mellon.
+
+ ------------------------------------------------------------------
+ The current version of cmudict is cmudict.0.7a
+ [First released October 29, 2007]
+