pronounce 0.0.1

@@ -0,0 +1,47 @@
1
+ This file documents the code used to represent the word forms in the lexicon
2
+
3
+ The valid characters in this field are:
4
+
5
+ Characters Notes Example
6
+ [A-Z] All words are presently in upper case AARDVARK
7
+ . Abbreviations are represented with a trailing '.' A.
8
+ - Hyphenated words are separated by '-' A-BOMB
9
+ _ Separator for multiple words with one lexical entry A_PRIORI
10
+ ' ACCOUNTANTS'
11
+ \' CAF\'E
12
+ \` CR\`ECHE
13
+ \^ AR\^ETE
14
+
15
+
16
+ Also there are these one-off words:
17
+
18
+ !EXCLAMATION-POINT
19
+ "DOUBLE-QUOTE
20
+ %PERCENT
21
+ &AMPERSAND
22
+ &EM
23
+ ...
24
+ &UN
25
+
26
+ 'EM
27
+ ...
28
+ 'UN
29
+ (LEFT-PAREN
30
+ )RIGHT-PAREN
31
+ +PLUS
32
+ ,COMMA
33
+ --DASH
34
+ -HYPHEN
35
+ -SHIRE
36
+ ...ELLIPSIS
37
+ .PERIOD
38
+ .POINT
39
+ /SLASH
40
+ :COLON
41
+ ;SEMI-COLON
42
+ ;SEMI-COLON
43
+ ;SEMICOLON
44
+ ?QUESTION-MARK
45
+ ...
46
+ {LEFT-BRACE
47
+ }RIGHT-BRACE
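
The character table above can be turned into a quick validity check. The following is a minimal sketch in Python, not part of the package: the names `HEADWORD` and `is_regular_headword` are hypothetical, and it deliberately covers only the tabulated characters, so the one-off entries that start with punctuation (such as %PERCENT) are reported as irregular.

```python
import re

# Regular headwords: upper-case letters, '.', '-', '_', the apostrophe,
# and the three accent escapes \' \` \^ (a backslash followed by the mark).
HEADWORD = re.compile(r"(?:[A-Z.\-_']|\\['`^])+")


def is_regular_headword(word: str) -> bool:
    """Return True if `word` uses only the characters listed in the table."""
    return HEADWORD.fullmatch(word) is not None
```

A caller would typically treat a False result as "check against the one-off list" rather than as an error.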
@@ -0,0 +1,48 @@
1
+ ARPAbet MRPA Edin. Alvey Example Relative frequency
2
+ p p p p put 3.1%
3
+ b b b b but 2.3%
4
+ t t t t ten 6.8%
5
+ d d d d den 4.1%
6
+ k k k k can 4.7%
7
+ m m m m man 3.1%
8
+ n n n n not 6.5%
9
+ l l l l like 5.5%
10
+ r r r r run 5.4%
11
+ f f f f full 1.8%
12
+ v v v v very 1.2%
13
+ s s s s some 6.6%
14
+ z z z z zeal 3.6%
15
+ hh h h h hat 0.8%
16
+ w w w w went 0.9%
17
+ g g g g game 1.3%
18
+ ch ch ch tS chain 0.5%
19
+ jh jh j dZ Jane 0.8%
20
+ ng ng ng 9 long 1.6%
21
+ th th th T thin 0.3%
22
+ dh dh dh D then 12.2%
23
+ sh sh sh S ship 1.2%
24
+ zh zh zh Z measure 0.1%
25
+ y y y j yes 0.8%
26
+ iy ii ee i bean 1.4%
27
+ aa aa ar A barn 0.9%
28
+ ao oo aw O born 1.0%
29
+ uw uu uu u boon 1.0%
30
+ er @@ er 3 burn 0.7%
31
+ ih i i I pit 10.0%
32
+ eh e e e pet 2.4%
33
+ ae a aa & pat 2.5%
34
+ ah uh u V putt 1.5%
35
+ oh o o 0 pot 1.6%
36
+ uh u oo U good 0.4%
37
+ ax @ a @ about 7.2%
38
+ ey ei ai eI bay 2.0%
39
+ ay ai ie aI buy 1.6%
40
+ oy oi oi oI boy 0.2%
41
+ ow ou oa @U no 1.5%
42
+ aw au ou aU now 0.4%
43
+ ia i@ eer I@ peer 0.7%
44
+ ea e@ air e@ pair 0.2%
45
+ ua u@ oor U@ poor 0.2%
46
+
47
+ See also the files available from the Oxford Text Archive by FTP from
48
+ sable.ox.ac.uk in directories /pub/ota/public/dicts/710 and 1054.
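
Since the table above aligns four notations column by column, converting between them is a dictionary lookup. Here is a minimal sketch in Python, assuming only a handful of rows copied from the table; `ROWS`, `COLUMNS`, `build_map` and `convert` are hypothetical names, and a real converter would load every row.

```python
# A few rows copied from the table: ARPAbet, MRPA, Edinburgh, Alvey.
ROWS = """\
ch ch ch tS
jh jh j dZ
ng ng ng 9
ey ei ai eI
n n n n
l l l l
oh o o 0
"""

COLUMNS = ("arpabet", "mrpa", "edin", "alvey")


def build_map(rows: str, src: str, dst: str) -> dict:
    """Map each symbol in column `src` to its counterpart in column `dst`."""
    i, j = COLUMNS.index(src), COLUMNS.index(dst)
    table = {}
    for line in rows.splitlines():
        fields = line.split()
        table[fields[i]] = fields[j]
    return table


def convert(phones: list, mapping: dict) -> list:
    """Translate a list of phones; raises KeyError on an unknown symbol."""
    return [mapping[p] for p in phones]
```

For example, the ARPAbet phones for "chain" (ch ey n) can be passed through `build_map(ROWS, "arpabet", "alvey")` to obtain the Alvey symbols.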
@@ -0,0 +1,45 @@
1
+ sil
2
+ aa
3
+ ae
4
+ ah
5
+ ao
6
+ aw
7
+ ax
8
+ ay
9
+ b
10
+ ch
11
+ d
12
+ dh
13
+ ea
14
+ eh
15
+ er
16
+ ey
17
+ f
18
+ g
19
+ hh
20
+ ia
21
+ ih
22
+ iy
23
+ jh
24
+ k
25
+ l
26
+ m
27
+ n
28
+ ng
29
+ oh
30
+ ow
31
+ oy
32
+ p
33
+ r
34
+ s
35
+ sh
36
+ t
37
+ th
38
+ ua
39
+ uh
40
+ uw
41
+ v
42
+ w
43
+ y
44
+ z
45
+ zh
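
The phone list above is enough to sanity-check a transcription. This is a minimal sketch in Python, assuming transcriptions are space-separated and carry no stress digits; `PHONES` and `unknown_phones` are hypothetical names introduced for illustration.

```python
# The 45 symbols from the phone list above ("sil" plus 44 phones).
PHONES = set(
    """sil aa ae ah ao aw ax ay b ch d dh ea eh er ey f g hh ia ih iy
    jh k l m n ng oh ow oy p r s sh t th ua uh uw v w y z zh""".split()
)


def unknown_phones(transcription: str) -> list:
    """Return the symbols in `transcription` that are not in the phone list."""
    return [p for p in transcription.split() if p not in PHONES]
```

An empty result means every symbol is known; anything else lists the offending symbols in order of appearance.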
@@ -0,0 +1,130 @@
1
+ 'sayTimit' is a util. to check/develop transcription dictionaries in the "beep"
2
+ notation, using 'say' from rsynth to produce audial checking of the transcrip-
3
+ tion. To save time, 'say' is invoked as the destination of a UNIX pipe, and
4
+ info is fed to it under certain commands. No attempt is made to make use of
5
+ the stress markers (this is due to rsynth, not the utility...).
6
+
7
+ Commands are given to say the current transcription, say arbitrary text, look
8
+ up the current word in a dictionary &c.
9
+
10
+ --------------------------------------------------------------------------------
11
+ instructions.
12
+ --------------------------------------------------------------------------------
13
+
14
+
15
+ Basic instructions
16
+ ------------------
17
+
18
+ usage> sayTimit <dict file> [<start place>]
19
+
20
+ dict file is assumed in a format like:
21
+
22
+ LACKLUSTER l ae1 k l ah2 s t ax axr # comment.
23
+ LAKEFIELD l ey1 k f iy2 l d
24
+
25
+ ie, the word; the timit transcription, and an optional comment.
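
The line format described above can be split into its three parts with a few string operations. The sketch below is in Python rather than the utility's Perl and is not taken from sayTimit; `parse_entry` is a hypothetical name, and it assumes that '#' never occurs inside the word or the transcription.

```python
def parse_entry(line: str):
    """Split a dictionary line into (word, phones, comment).

    Returns None for lines that hold no word (blank or comment-only).
    """
    body, _, comment = line.partition("#")  # everything after '#' is comment
    fields = body.split()
    if not fields:
        return None
    return fields[0], fields[1:], comment.strip()
```

The phones keep their stress digits (for example "ae1"), so a later step can decide how to treat them.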
26
+
27
+ "start place" is a word which should be in the file; this just allows you
28
+ to start work in the middle of a file, if something went wrong on a previous
29
+ run &c &c.
30
+
31
+ Before running - you need to determine how 'say' (from rsynth) is to be run.
32
+ Just change the line that looks like:
33
+
34
+ > open(SAY, "| <say-command>"); # crank up 'say'.
35
+
36
+ eg, "say +h -g 0.2" sets the gain (volume) to 20%, and sends the sound to the
37
+ headphone port. Additionally, the port and gain are controllable on SUNs with
38
+ the 'gaintool' program which is part of the SunOS SOUND demonstration package
39
+ (have a look in /usr/demo/SOUND on your machine).
40
+
41
+
42
+ Using it.
43
+ --------
44
+
45
+ eg on the line for
46
+ LACKLUSTER l ae1 k l ah2 s t ax axr # comment.
47
+
48
+ the following is shown: the current line and the translation into ota phonemes.
49
+
50
+ >> LACKLUSTER l ae1 k l ah2 s t ax axr
51
+ == l'&kl,Vst@R
52
+
53
+ The control is a loop which only exits on 'n'.
54
+
55
+ * hitting return will 'say' the transcription (in ota format)
56
+ * hitting n <+cr> goes to the next line in the dict.
57
+ * s <text> sends the text verbatim to 'say' - eg you might want to listen to
58
+ a similar word: s duster
59
+ or try a different transcription: s [lVst3]
60
+
61
+ * t <text> makes the current line 'text' - eg can edit the dict file in a
62
+ separate window, and paste in a version to try.
63
+
64
+ * p retrieves the previous version of the current line (ie, before you used 't')
@@ -0,0 +1,174 @@
1
+ #! /usr/local/bin/perl
2
+
3
+ # sayTimit
4
+
5
+ # Paul Callaghan, may1994.
6
+ # University of Durham.
7
+
8
+ # instructions elsewhere (in file sayTimit.doc)
9
+
10
+ %timit2ota = (
11
+ "b", "b",
12
+ "d", "d",
13
+ "g", "g",
14
+ "p", "p",
15
+ "t", "t",
16
+ "k", "k",
17
+ "dx", "d",
18
+ "q", "t",
19
+
20
+
21
+ "jh", "dZ",
22
+ "ch", "tS",
23
+
24
+
25
+ "s", "s",
26
+ "sh", "S",
27
+ "z", "z",
28
+ "zh", "Z",
29
+ "f", "f",
30
+ "th", "T",
31
+ "v", "v",
32
+ "dh", "D",
33
+
34
+
35
+ "m", "m",
36
+ "n", "n",
37
+ "ng", "N",
38
+ "em", "m",
39
+ "en", "n",
40
+ "eng", "N",
41
+ "nx", "n",
42
+
43
+
44
+
45
+ "l", "l",
46
+ "r", "r",
47
+ "w", "w",
48
+ "y", "j",
49
+ "hh", "h",
50
+ "hv", "h",
51
+ "el", "l",
52
+
53
+
54
+
55
+
56
+ "iy", "i",
57
+ "ih", "I",
58
+ "eh", "e",
59
+
60
+ "ea", "e@", # eg bare, or air.
61
+ "ey", "eI",
62
+ "ae", "&",
63
+ "aa", "A",
64
+ "aw", "aU",
65
+ "ay", "aI",
66
+ "ah", "V",
67
+
68
+ "oh", "0",
69
+ "oy", "oI",
70
+ "ow", "@U",
71
+ "uh", "U",
72
+ "uw", "u",
73
+ "ux", "u",
74
+
75
+ "er", "3",
76
+ "ax", "@",
77
+ "ix", "I",
78
+ "axr", "R",
79
+ "ax-h", "e", # forgotten what this is!
80
+ # non-timit symbols for RP Vowels
81
+ "ia", "I@", # as in 'beer'
82
+ "ao", "O", # as in 'cord'
83
+ "ua", "U@", # as in 'tour'
84
+
85
+ "epi", " ", # epenthetic silence
86
+ "sil", " ", # silence
87
+ "pau", " ", # pause
88
+
89
+ "1", "'", # primary stress
90
+ "2", "," # secondary stress.
91
+ );
92
+
93
+ ################################################################################
94
+ # crank up 'say'.
95
+
96
+ # "say:sound <params>" needs to be changed to your installation of
97
+ # rsynth, with desired parameters.
98
+
99
+ open(SAY, "| say:sound +h -g 0.2") || die "Couldn't start SAY: $!\n";
100
+
101
+ # then make it flush after every write/print operation. The method is to
102
+ # set it temporarily as the default output channel, then get it to flush,
103
+ # then reset the old default channel. Surprisingly (for some ppl), SAY will
104
+ # STILL flush as required.
105
+
106
+ $oldofh = select(SAY); # make it default
107
+ $| = 1; # flush after each IO op.
108
+ select($oldofh); # and reset.
109
+
110
+
111
+ # open input file.
112
+
113
+ open(INPUT, $ARGV[0]) || die "Couldn't open pronunciations file $ARGV[0].\n";
114
+ shift;
115
+
116
+ # startfrom word?
117
+ $startfrom = $ARGV[0];
118
+
119
+ if ($startfrom ne "") {
120
+ do {
121
+ defined($_ = <INPUT>) || die "Start word $startfrom not found in $ARGV[0].\n"; # avoid looping forever at EOF
122
+ @tmp = split;
123
+ } until ( $tmp[0] eq $startfrom);
124
+ print "startfrom " . $_ . "\n";
125
+
126
+ } else {
127
+ $_ = <INPUT>; # first line expected
128
+ }
129
+
130
+
131
+ # main loop.
132
+
133
+ START:
134
+ do {
135
+ $orig = $_;
136
+ s/#.*//; # kill comment
137
+ s/[.]//g; # kill '.'
138
+
139
+ s/([a-z]*)1/1 $1/g;
140
+ s/([a-z]*)2/2 $1/g;
141
+ # change stress notation: OTA seems to require
142
+ # marks BEFORE the syllable, not AFTER the
143
+ # 'nuclear'(?) vowel.
144
+
145
+ @tmp = split;
146
+
147
+ $phons = "";
148
+ foreach $p (@tmp[1..$#tmp]) {
149
+ if ($timit2ota{$p} eq "") {
150
+ print "\nERROR: unknown symbol " . $p;
151
+ } else {
152
+ $phons .= $timit2ota{$p};
153
+ }
154
+ }
155
+ $tmp[0] =~ tr/A-Z/a-z/; # current word to lower case.
156
+
157
+ $ig = 0;
158
+ do {
159
+ print STDOUT ">> " . $orig;
160
+ print STDOUT "== " . $phons . "\n";
161
+ unless ($ig) { print SAY "[" . $phons . "]\n"; }
162
+ # write result to SAY
163
+ $ig = 0;
164
+ $cmd = <STDIN>;
165
+ chomp $cmd; # strip only a trailing newline
166
+ if ($cmd =~ /^s/) { $cmd =~ s/^s//; if ($cmd eq "") { print SAY $tmp[0] . "\n"; } else { print SAY $cmd . "\n"; } $ig = 1;}
167
+ if ($cmd =~ /^t /) { $old = $_; $cmd =~ s/^t //; $_ = $cmd; goto START; }
168
+ if ($cmd =~ /^p/) { $_ = $old; goto START; }
169
+
170
+ } until ($cmd eq "n");
171
+ } while (<INPUT>);
172
+
173
+ close(INPUT);
174
+ close(SAY);
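
The core of the script above is the conversion from a stressed TIMIT-style transcription to the OTA string sent to 'say'. As a rough illustration, here is a minimal Python sketch of that one step; it is not the author's code, `TIMIT2OTA` holds only the handful of entries needed for the LACKLUSTER example, and `to_ota` is a hypothetical name.

```python
import re

# Subset of the %timit2ota table from the script, enough for one example.
TIMIT2OTA = {
    "l": "l", "k": "k", "s": "s", "t": "t",
    "ae": "&", "ah": "V", "ax": "@", "axr": "R",
    "1": "'",  # primary stress
    "2": ",",  # secondary stress
}


def to_ota(transcription: str):
    """Return (ota_string, unknown_symbols) for a space-separated transcription."""
    # Move a trailing stress digit in front of its phone: "ae1" -> "1 ae".
    moved = re.sub(r"([a-z]*)([12])", r"\2 \1", transcription)
    out, unknown = [], []
    for sym in moved.split():
        if sym in TIMIT2OTA:
            out.append(TIMIT2OTA[sym])
        else:
            unknown.append(sym)  # the Perl script prints an ERROR here
    return "".join(out), unknown
```

Unlike the script, this sketch collects unknown symbols instead of printing them, which makes the function easier to check in isolation.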
@@ -0,0 +1,36 @@
1
+
2
+ CMUdict
3
+ -------
4
+
5
+ CMUdict (the Carnegie Mellon Pronouncing Dictionary) is a free
6
+ pronouncing dictionary of English, suitable for use in speech
7
+ technology. It is maintained by the Speech Group in the School of
8
+ Computer Science at Carnegie Mellon University.
9
+
10
+ The Carnegie Mellon Speech Group does not guarantee the accuracy of
11
+ this dictionary, nor its suitability for any specific purpose. In
12
+ fact, we expect a number of errors, omissions and inconsistencies to
13
+ remain in the dictionary. We intend to continually update the
14
+ dictionary by correcting existing entries and by adding new ones. From
15
+ time to time a new major version will be released.
16
+
17
+ We welcome input from users: Please send email to Alex Rudnicky
18
+ (air+cmudict@cs.cmu.edu).
19
+
20
+ The Carnegie Mellon Pronouncing Dictionary, in its current and
21
+ previous versions is Copyright (C) 1993-2008 by Carnegie Mellon
22
+ University. Use of this dictionary for any research or commercial
23
+ purpose is completely unrestricted. If you make use of or
24
+ redistribute this material we request that you acknowledge its
25
+ origin in your descriptions.
26
+
27
+ If you add words to or correct words in your version of this
28
+ dictionary, we would appreciate it if you could send these additions
29
+ and corrections to us (air+cmudict@cs.cmu.edu) for consideration in a
30
+ subsequent version. All submissions will be reviewed and approved by
31
+ the current maintainer, Alex Rudnicky at Carnegie Mellon.
32
+
33
+ ------------------------------------------------------------------
34
+ The current version of cmudict is cmudict.0.7a
35
+ [First released October 29, 2007]
36
+