syllabify 1.0.0
- data/.gitignore +2 -0
- data/.yardopts +1 -0
- data/LICENSE +10 -0
- data/README.markdown +71 -0
- data/languages/en.yml +120 -0
- data/lib/cody_robbins/syllabify/syllable.rb +64 -0
- data/lib/cody_robbins/syllabify.rb +404 -0
- data/lib/syllabify.rb +4 -0
- data/syllabify.gemspec +19 -0
- metadata +73 -0
data/.gitignore
ADDED
data/.yardopts
ADDED
@@ -0,0 +1 @@
+--no-private --charset UTF-8 --markup markdown lib/**/*.rb - LICENSE
data/LICENSE
ADDED
@@ -0,0 +1,10 @@
+The MIT License
+===============
+
+© 2011 Cody Robbins
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown
ADDED
@@ -0,0 +1,71 @@
+Syllabify
+=========
+
+A Ruby port of the [Penn Phonetics Toolkit](http://www.ling.upenn.edu/phonetics/p2tk/) (P2TK) [syllabifier](https://p2tk.svn.sourceforge.net/svnroot/p2tk/python/syllabify/). Unlike the P2TK syllabifier, this implementation works on transcriptions in [IPA](http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) rather than [Arpabet](http://en.wikipedia.org/wiki/Arpabet). Given a phonemic transcription in IPA, it automatically segments the phonemes into [syllables](http://en.wikipedia.org/wiki/Syllable).
+
+As with the P2TK syllabifier, you must supply a [phoneme](http://en.wikipedia.org/wiki/Phoneme) inventory containing the legal [consonants](http://en.wikipedia.org/wiki/Consonant), [nuclei](http://en.wikipedia.org/wiki/Syllable#Nucleus) (typically the language’s vowels), and [onsets](http://en.wikipedia.org/wiki/Syllable#Onset) of the transcribed language. The inventory is specified as plain text in YAML, and a default phoneme inventory for English from the P2TK syllabifier is included. If you create inventories for other languages, please submit a pull request (or simply email them to me if you’re not a techie) and I will include them in subsequent releases of the gem.
+
+Full documentation is at [RubyDoc.info](http://rubydoc.info/gems/syllabify).
+
+Transcription constraints
+-------------------------
+
+Any phonemes represented in IPA by [digraphs](http://en.wikipedia.org/wiki/Digraph_\(orthography\)) (such as [affricates](http://en.wikipedia.org/wiki/Affricate_consonant), [doubly-articulated consonants](http://en.wikipedia.org/wiki/Doubly_articulated_consonant), and [diphthongs](http://en.wikipedia.org/wiki/Diphthong)) must be transcribed using a [tie](http://en.wikipedia.org/wiki/Tie_\(typography\)). Otherwise there is no way to distinguish them from the phonemes of their individual components, and syllabification will be incorrect in some cases.
+
+For example, in English transcriptions the [voiceless postalveolar affricate](http://en.wikipedia.org/wiki/Voiceless_postalveolar_affricate) is customarily transcribed without the tie. For the purposes of syllabification, however, this is problematic because in English /t͡ʃ/ is a legal onset but /tʃ/ is not. If the affricate were transcribed without the tie as /tʃ/, it would have to be included in the inventory of onsets, but doing so would cause the phoneme sequence /t/ followed by /ʃ/ to be interpreted as an onset as well, which it isn’t. With such an inventory, transcriptions where /tʃ/ represents two phonemes rather than one, such as *nutshell* /nʌtʃɛl/, would be incorrectly syllabified as /nʌ.tʃɛl/ rather than /nʌt.ʃɛl/. When the affricate is always written with the tie as /t͡ʃ/, the untied /tʃ/ in /nʌtʃɛl/ is unambiguously two phonemes and the syllabification is correct. Similarly, without a tie it’s not possible to determine whether the /ɔɪ/ in *clawing* /klɔɪŋ/ is a single diphthong or two separate phonemes.
+
+In other words, tie glyphs together if they represent the same phoneme. The only digraphs requiring ties in English are the voiceless and voiced postalveolar affricates /t͡ʃ, d͡ʒ/ and the diphthongs /a͡ʊ, a͡ɪ, e͡ɪ, o͡ʊ, ɔ͡ɪ/.
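
The *nutshell* case can be reproduced with a small stand-alone sketch. It is not the gem's code: `SKETCH_ONSETS` and `split_cluster` are hypothetical names, and the onset list is cut down to three entries. It only shows how giving the onset as many phonemes as the inventory allows treats the tied and untied transcriptions differently.

```ruby
# encoding: UTF-8

# Hypothetical, pared-down onset list; the gem's real inventory lives in languages/en.yml.
SKETCH_ONSETS = ['t', 'ʃ', "t\u0361ʃ"].freeze

# Split the consonants between two nuclei into [coda, onset], trying the
# longest candidate onset first and shrinking it until the inventory accepts it.
def split_cluster(cluster, onsets = SKETCH_ONSETS)
  (0..cluster.length).each do |midpoint|
    onset = cluster[midpoint..-1].join
    return [cluster[0, midpoint].join, onset] if onset.empty? || onsets.include?(onset)
  end
end

untied = split_cluster(['t', 'ʃ'])   # /t/ followed by /ʃ/, as in "nutshell"
tied   = split_cluster(["t\u0361ʃ"]) # the single affricate /t͡ʃ/

puts untied.inspect # coda "t", onset "ʃ"
puts tied.inspect   # empty coda, onset "t͡ʃ"
```

Because the untied sequence is not itself listed as an onset, the /t/ stays in the preceding syllable's coda, while the tied affricate moves into the onset as a whole.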
+
+### How to enter ties
+
+The tie is represented in Unicode by [Combining Double Inverted Breve (U+0361)](http://www.unicode.org/charts/PDF/U0300.pdf). This character is entered between the two characters to be tied.
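
In Ruby source the tie can be written with a `\u0361` escape, which avoids depending on editor support for combining characters. The sketch below assumes nothing about the gem; `TIE` and `tie` are hypothetical names used only for illustration.

```ruby
# encoding: UTF-8

TIE = "\u0361" # COMBINING DOUBLE INVERTED BREVE

# Hypothetical helper: place the tie between two glyphs so that they form
# one phoneme symbol.
def tie(first, second)
  "#{first}#{TIE}#{second}"
end

affricate = tie('t', 'ʃ')

puts affricate        # renders as a single tied symbol
puts affricate.length # 3 code points: t, U+0361, ʃ
```

Note that the tied symbol is three code points long, so code that walks a transcription character by character will see the tie as its own element.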
+
+### Ligatures
+
+Phonemes that require ties could alternatively be transcribed using their respective [ligatures](http://en.wikipedia.org/wiki/Typographic_ligature), but Unicode doesn’t define glyphs for all the potential ligatures, and ligatures are no longer official IPA usage in any case.
+
+Example
+-------
+
+    transcription = CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən')
+
+    transcription.to_s #=> 'dɪ.ˌsɔr.gə.nə.ˈze͡ɪ.ʃən'
+    transcription.syllables #=> [dɪ, ˌsɔr, gə, nə, ˈze͡ɪ, ʃən]
+
+    syllable = transcription.syllables[4]
+    syllable.stress #=> 'ˈ'
+    syllable.onset #=> 'z'
+    syllable.nucleus #=> 'e͡ɪ'
+    syllable.coda #=> ''
+
+    syllable = transcription.syllables.last
+    syllable.stress #=> nil
+    syllable.onset #=> 'ʃ'
+    syllable.nucleus #=> 'ə'
+    syllable.coda #=> 'n'
+
+Colophon
+--------
+
+### See also
+
+If you like this gem, you may also want to check out [transliterate](http://codyrobbins.com/software/transliterate).
+
+### Tested with
+
+* Ruby 1.9.2-p290, 18 October 2011
+
+### Contributing
+
+* [Source](https://github.com/codyrobbins/syllabify)
+* [Bug reports](https://github.com/codyrobbins/syllabify/issues)
+
+To send patches, please fork on GitHub and submit a pull request.
+
+### Credits
+
+© 2011 [Cody Robbins](http://codyrobbins.com/). See LICENSE for details.
+
+* [Homepage](http://codyrobbins.com/software/syllabify)
+* [My other gems](http://codyrobbins.com/software#gems)
+* [Follow me on Twitter](http://twitter.com/codyrobbins)
data/languages/en.yml
ADDED
@@ -0,0 +1,120 @@
+consonants:
+- t͡ʃ
+- d͡ʒ
+- b
+- d
+- ð
+- f
+- g
+- h
+- k
+- l
+- m
+- n
+- ŋ
+- p
+- r
+- s
+- ʃ
+- t
+- θ
+- v
+- w
+- j
+- z
+- ʒ
+- x
+nuclei:
+- a͡ʊ
+- a͡ɪ
+- e͡ɪ
+- o͡ʊ
+- ɔ͡ɪ
+- ɛ͡ə
+- iː
+- ɔː
+- uː
+- ɑ
+- æ
+- ʌ
+- ə
+- ɔ
+- ɛ
+- ɝ
+- ɚ
+- ɪ
+- i
+- ʊ
+- u
+- ø
+- y
+onsets:
+- p
+- t
+- k
+- b
+- d
+- g
+- f
+- v
+- θ
+- ð
+- s
+- z
+- ʃ
+- t͡ʃ
+- d͡ʒ
+- m
+- n
+- r
+- l
+- h
+- w
+- j
+- pr
+- tr
+- kr
+- br
+- dr
+- gr
+- fr
+- θr
+- ʃr
+- pl
+- kl
+- bl
+- gl
+- fl
+- sl
+- tw
+- kw
+- dw
+- sw
+- sp
+- st
+- sk
+- sf
+- sm
+- sn
+- gw
+- ʃw
+- spr
+- spl
+- str
+- skr
+- skw
+- skl
+- θw
+- ʒ
+- pj
+- kj
+- bj
+- fj
+- hj
+- vj
+- θj
+- mj
+- spj
+- skj
+- gj
+- hw
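
The order of entries in an inventory matters when the entries are joined into a regular-expression alternation, as the tokenizer in `lib/cody_robbins/syllabify.rb` does, because the first alternative that matches wins. The following is a minimal sketch of that effect on a hypothetical five-phoneme excerpt, not the gem's own loading code; the `Regexp.escape` call is an addition of this sketch.

```ruby
# encoding: UTF-8
require 'yaml'

# Hypothetical excerpt in the same shape as languages/en.yml.
inventory = YAML.load(<<~INVENTORY)
  consonants:
  - t͡ʃ
  - t
  - ʃ
  nuclei:
  - iː
  - i
INVENTORY

phonemes = inventory['consonants'] + inventory['nuclei']

# Multi-character phonemes come first, so they are tried before their prefixes.
ordered = Regexp.new(phonemes.map { |p| Regexp.escape(p) }.join('|'))
tokens  = 't͡ʃiːti'.scan(ordered)

# Reversing the list puts 't' before 't͡ʃ' and 'i' before 'iː'.
reversed     = Regexp.new(phonemes.reverse.map { |p| Regexp.escape(p) }.join('|'))
naive_tokens = 't͡ʃiːti'.scan(reversed)

puts tokens.inspect       # four phonemes: the affricate, the long vowel, t, i
puts naive_tokens.inspect # the affricate and the long vowel are broken apart
```

When adding an inventory for another language, listing tied and long phonemes before their single-character counterparts, as `en.yml` does, keeps the tokenizer from splitting them.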
data/lib/cody_robbins/syllabify/syllable.rb
ADDED
@@ -0,0 +1,64 @@
+# encoding: UTF-8
+
+module CodyRobbins
+  class Syllabify
+    class Syllable
+      # Any [stress](http://en.wikipedia.org/wiki/Stress_\(linguistics\)) marks associated with the syllable as a whole.
+      attr_reader(:stress)
+
+      # The [onset](http://en.wikipedia.org/wiki/Syllable#Onset) (ω) of the syllable.
+      attr_reader(:onset)
+
+      # The [nucleus](http://en.wikipedia.org/wiki/Syllable#Nucleus) (ν) of the syllable.
+      attr_reader(:nucleus)
+
+      # The [coda](http://en.wikipedia.org/wiki/Syllable_coda) (κ) of the syllable.
+      attr_reader(:coda)
+
+      # @private
+      def initialize(stress, onset, nucleus, coda = '')
+        set_stress(stress)
+        set_onset(onset)
+        set_nucleus(nucleus)
+        set_coda(coda)
+      end
+
+      # Joins the stress, onset, nucleus, and coda to form a single string representation of the [syllable](http://en.wikipedia.org/wiki/Syllable).
+      #
+      # @return [String]
+      #
+      # @example
+      # CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən').syllables[4].to_s #=> 'ˈze͡ɪ'
+      def to_s
+        join(stress, onset, nucleus, coda)
+      end
+
+      # @private
+      def append_coda(coda)
+        @coda += coda
+      end
+
+      protected
+
+      def set_stress(stress)
+        @stress = stress
+      end
+
+      def set_onset(onset)
+        @onset = onset
+      end
+
+      def set_nucleus(nucleus)
+        @nucleus = nucleus
+      end
+
+      def set_coda(coda)
+        @coda = coda
+      end
+
+      def join(*components)
+        components.join('')
+      end
+    end
+  end
+end
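
`Syllable#to_s` can join its components directly even though `stress` is `nil` for unstressed syllables, because `Array#join` renders `nil` as an empty string. A minimal sketch of that behaviour, using a hypothetical `SketchSyllable` struct as a stand-in for the class above:

```ruby
# encoding: UTF-8

# Hypothetical stand-in for CodyRobbins::Syllabify::Syllable, built with
# Struct for brevity.
SketchSyllable = Struct.new(:stress, :onset, :nucleus, :coda) do
  def to_s
    # Array#join treats a nil stress as an empty string.
    [stress, onset, nucleus, coda].join
  end
end

stressed   = SketchSyllable.new('ˈ', 'z', "e\u0361ɪ", '')
unstressed = SketchSyllable.new(nil, 'ʃ', 'ə', 'n')

puts stressed.to_s   # stress mark, onset, and tied nucleus
puts unstressed.to_s # onset, nucleus, and coda only
```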
data/lib/cody_robbins/syllabify.rb
ADDED
@@ -0,0 +1,404 @@
+# encoding: UTF-8
+
+require('yaml')
+
+module CodyRobbins
+  class Syllabify
+    # Create a new syllabified representation of an [IPA](http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) transcription.
+    #
+    # @param language [Symbol, String] The [ISO 639](http://en.wikipedia.org/wiki/ISO_639) code of the language represented in the transcription. If the language has a two-letter [ISO 639-1](http://en.wikipedia.org/wiki/ISO_639-1) code, use that; otherwise, use the three-letter [ISO 639-3](http://en.wikipedia.org/wiki/ISO_639-3) code. This maps to the phoneme inventory definitions in the `languages` directory.
+    # @param transcription [String] An [IPA](http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) transcription to syllabify. Any phonemes represented by digraphs must be combined with a tie as discussed in the {file:README}.
+    #
+    # @example
+    # transcription = CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən')
+    #
+    # transcription.to_s #=> 'dɪ.ˌsɔr.gə.nə.ˈze͡ɪ.ʃən'
+    # transcription.syllables #=> [dɪ, ˌsɔr, gə, nə, ˈze͡ɪ, ʃən]
+    def initialize(language, transcription)
+      set_language(language)
+      set_transcription(transcription)
+      initialize_coda_and_onset
+    end
+
+    # Render a syllabified [IPA](http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) transcription of the input transcription. Syllables are delimited by the IPA syllable delimiter.
+    #
+    # @return [String]
+    #
+    # @example
+    # CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən').to_s #=> 'dɪ.ˌsɔr.gə.nə.ˈze͡ɪ.ʃən'
+    def to_s
+      syllables_as_strings.join(SYLLABLE_DELIMETER)
+    end
+
+    # Return the individual {Syllable} objects representing the transcription’s syllables.
+    #
+    # @return [Array] The {Syllable} objects representing each individual syllable.
+    #
+    # @example
+    # CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən').syllables #=> [dɪ, ˌsɔr, gə, nə, ˈze͡ɪ, ʃən]
+    def syllables
+      @syllables ||= build_syllables
+    end
+
+    protected
+
+    attr_reader(:language,
+                :transcription,
+                :phoneme,
+                :stress,
+                :coda_and_onset,
+                :coda,
+                :onset)
+
+    # @private
+    SYLLABLE_DELIMETER = '.'
+
+    def set_language(language)
+      @language = language
+    end
+
+    def set_transcription(transcription)
+      @transcription = transcription
+    end
+
+    def initialize_coda_and_onset
+      @coda_and_onset = []
+    end
+
+    def syllables_as_strings
+      syllables.collect(&:to_s)
+    end
+
+    def join(array)
+      array.try(:join, '')
+    end
+
+    def phonemes
+      transcription.scan(transcription_tokenizing_regex)
+    end
+
+    def transcription_tokenizing_regex
+      Regexp.new("[ˈˌ]?(?:#{all_phonemes_disjunction})")
+    end
+
+    def all_phonemes_disjunction
+      all_phonemes.join('|')
+    end
+
+    def all_phonemes
+      phoneme_inventory[:consonants] + phoneme_inventory[:nuclei]
+    end
+
+    def phoneme_inventory
+      HashWithIndifferentAccess.new(phoneme_inventory_yaml)
+    end
+
+    def phoneme_inventory_yaml
+      YAML.load_file(phoneme_inventory_file)
+    end
+
+    def phoneme_inventory_file
+      "#{this_directory}/../../languages/#{language}.yml"
+    end
+
+    def this_directory
+      File.dirname(this_file)
+    end
+
+    def this_file
+      __FILE__
+    end
+
+    def build_syllables
+      initialize_syllables
+      process_each_phoneme
+      process_remaining_coda_and_onset_if_present
+      return_syllables
+    end
+
+    def initialize_syllables
+      @syllables = []
+    end
+
+    def process_each_phoneme
+      phonemes.each do |phoneme|
+        set_phoneme(phoneme)
+        strip_phoneme_and_process
+      end
+    end
+
+    def strip_phoneme_and_process
+      strip_phoneme
+      process_phoneme_unless_blank
+    end
+
+    def set_phoneme(phoneme)
+      @phoneme = phoneme
+    end
+
+    def strip_phoneme
+      @phoneme.strip!
+    end
+
+    def process_phoneme_unless_blank
+      process_phoneme unless phoneme_blank?
+    end
+
+    def phoneme_blank?
+      phoneme.blank?
+    end
+
+    def process_phoneme
+      remove_stress_from_phoneme_if_stressed
+      categorize_phoneme
+    end
+
+    def initialize_stress
+      set_stress(nil)
+    end
+
+    def set_stress(stress)
+      @stress = stress
+    end
+
+    def remove_stress_from_phoneme_if_stressed
+      remove_stress_from_phoneme if phoneme_has_stress?
+    end
+
+    def phoneme_has_stress?
+      phoneme_has_primary_stress? || phoneme_has_secondary_stress?
+    end
+
+    def phoneme_has_primary_stress?
+      first_character_of_phoneme == PRIMARY_STRESS_MARKER
+    end
+
+    def phoneme_has_secondary_stress?
+      first_character_of_phoneme == SECONDARY_STRESS_MARKER
+    end
+
+    # @private
+    PRIMARY_STRESS_MARKER = 'ˈ'
+
+    # @private
+    SECONDARY_STRESS_MARKER = 'ˌ'
+
+    def first_character_of_phoneme
+      phoneme[0]
+    end
+
+    def remove_stress_from_phoneme
+      set_stress_to_first_character_of_phoneme
+      set_phoneme_to_remaining_characters_of_phoneme
+    end
+
+    def set_stress_to_first_character_of_phoneme
+      set_stress(first_character_of_phoneme)
+    end
+
+    def set_phoneme_to_remaining_characters_of_phoneme
+      set_phoneme(remaining_characters_of_phoneme)
+    end
+
+    def remaining_characters_of_phoneme
+      phoneme[1..-1]
+    end
+
+    def categorize_phoneme
+      if nucleus?
+        assemble_syllable
+      elsif not_consonant_or_syllable_delimeter?
+        raise_invalid_phoneme_exception
+      else
+        add_phoneme_to_coda_and_onset
+      end
+    end
+
+    def nucleus?
+      nuclei.include?(phoneme)
+    end
+
+    def nuclei
+      phoneme_inventory[:nuclei]
+    end
+
+    def assemble_syllable
+      split_coda_and_onset
+      append_coda_to_last_syllable_unless_no_syllables?
+      append_new_syllable
+      initialize_stress
+      initialize_coda_and_onset
+    end
+
+    def split_coda_and_onset
+      if coda_and_onset_include_syllable_delimeter?
+        split_coda_and_onset_at_syllable_delimeter
+      else
+        split_coda_and_onset_at_largest_valid_onset
+      end
+    end
+
+    def coda_and_onset_include_syllable_delimeter?
+      coda_and_onset.include?(SYLLABLE_DELIMETER)
+    end
+
+    def split_coda_and_onset_at_syllable_delimeter
+      split_coda_and_onset_at(coda_and_onset_syllable_delimiter_position)
+    end
+
+    def coda_and_onset_syllable_delimiter_position
+      coda_and_onset.index(SYLLABLE_DELIMETER)
+    end
+
+    def split_coda_and_onset_at(midpoint)
+      split_coda(midpoint)
+      split_onset(midpoint)
+    end
+
+    def split_coda(midpoint)
+      set_coda(join(coda_from_coda_and_onset(midpoint)))
+    end
+
+    def set_coda(coda)
+      @coda = coda
+    end
+
+    def coda_from_coda_and_onset(midpoint)
+      coda_and_onset[0, midpoint]
+    end
+
+    def split_onset(midpoint)
+      set_onset(join(onset_from_coda_and_onset(midpoint)))
+    end
+
+    def set_onset(onset)
+      @onset = onset
+    end
+
+    def onset_from_coda_and_onset(midpoint)
+      coda_and_onset[midpoint, coda_and_onset.length]
+    end
+
+    def split_coda_and_onset_at_largest_valid_onset
+      coda_and_onset_split_range.each do |midpoint|
+        split_coda_and_onset_at(midpoint)
+        break if onset_or_start_of_word?(onset)
+      end
+    end
+
+    def coda_and_onset_split_range
+      0..coda_and_onset_range_length
+    end
+
+    def coda_and_onset_range_length
+      coda_and_onset.length + 1
+    end
+
+    def onset_or_start_of_word?(onset)
+      onset?(onset) || start_of_word?
+    end
+
+    def onset?(string)
+      onsets.include?(string)
+    end
+
+    def onsets
+      phoneme_inventory[:onsets]
+    end
+
+    def no_syllables?
+      syllables.empty?
+    end
+    alias :start_of_word? :no_syllables?
+
+    def append_coda_to_last_syllable_unless_no_syllables?
+      append_coda_to_last_syllable unless no_syllables?
+    end
+
+    def append_coda_to_last_syllable
+      last_syllable.append_coda(coda)
+    end
+
+    def last_syllable
+      @syllables.last
+    end
+
+    def append_new_syllable
+      create_syllable(onset, phoneme)
+    end
+
+    def create_syllable(onset, nucleus = nil)
+      @syllables << new_syllable(onset, nucleus)
+    end
+
+    def new_syllable(onset, nucleus)
+      Syllable.new(stress,
+                   onset,
+                   nucleus)
+    end
+
+    def not_consonant_or_syllable_delimeter?
+      !consonant_or_syllable_delimeter?
+    end
+
+    def consonant_or_syllable_delimeter?
+      consonant? || syllable_delimeter?
+    end
+
+    def consonant?
+      consonants.include?(phoneme)
+    end
+
+    def consonants
+      phoneme_inventory[:consonants]
+    end
+
+    def syllable_delimeter?
+      phoneme == SYLLABLE_DELIMETER
+    end
+
+    def raise_invalid_phoneme_exception
+      raise(invalid_phoneme_error)
+    end
+
+    def invalid_phoneme_error
+      "Invalid phoneme: #{phoneme}"
+    end
+
+    def add_phoneme_to_coda_and_onset
+      @coda_and_onset << phoneme
+    end
+
+    def coda_and_onset_empty?
+      coda_and_onset.empty?
+    end
+
+    def create_syllable_from_coda_and_onset
+      create_syllable(coda_and_onset_joined)
+    end
+
+    def append_coda_and_onset_to_last_syllable
+      last_syllable.append_coda(coda_and_onset_joined)
+    end
+
+    def coda_and_onset_joined
+      join(coda_and_onset)
+    end
+
+    def process_remaining_coda_and_onset_if_present
+      process_remaining_coda_and_onset unless coda_and_onset_empty?
+    end
+
+    def process_remaining_coda_and_onset
+      if no_syllables?
+        create_syllable_from_coda_and_onset
+      else
+        append_coda_and_onset_to_last_syllable
+      end
+    end
+
+    def return_syllables
+      @syllables
+    end
+  end
+end
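
The control flow of `lib/cody_robbins/syllabify.rb` is spread over many one-line methods, so a condensed, dependency-free sketch of the same idea may help when reading it. This is an illustration under stated assumptions, not the gem's implementation: `SKETCH_INVENTORY`, `STRESS_MARKS`, and `sketch_syllabify` are hypothetical names, the inventory is a tiny excerpt, and the sketch leaves out the gem's handling of explicit syllable delimiters and its invalid-phoneme error. Unlike the gem, it needs no ActiveSupport (`try`, `blank?`, `HashWithIndifferentAccess`).

```ruby
# encoding: UTF-8

# Hypothetical miniature inventory; the gem reads the real one from languages/en.yml.
SKETCH_INVENTORY = {
  consonants: %w[d s r g n z ʃ],
  nuclei:     ["e\u0361ɪ", 'ɪ', 'ɔ', 'ə'],
  onsets:     %w[d s r g n z ʃ]
}.freeze

STRESS_MARKS = %w[ˈ ˌ].freeze

def sketch_syllabify(transcription, inventory = SKETCH_INVENTORY)
  phonemes  = inventory[:consonants] + inventory[:nuclei]
  tokenizer = Regexp.new("[#{STRESS_MARKS.join}]?(?:#{phonemes.join('|')})")

  syllables = [] # each entry is [stress, onset, nucleus, coda]
  cluster   = [] # consonants seen since the previous nucleus
  stress    = nil

  transcription.scan(tokenizer).each do |token|
    # A stress mark is attached to the phoneme that follows it; peel it off.
    if STRESS_MARKS.include?(token[0])
      stress = token[0]
      token  = token[1..-1]
    end

    unless inventory[:nuclei].include?(token)
      cluster << token
      next
    end

    # Try the longest candidate onset first; at the start of the word the
    # whole cluster is accepted as the onset.
    split = (0..cluster.length).find do |midpoint|
      onset = cluster[midpoint..-1].join
      onset.empty? || inventory[:onsets].include?(onset) || syllables.empty?
    end

    syllables.last[3] += cluster[0, split].join unless syllables.empty?
    syllables << [stress, cluster[split..-1].join, token, '']
    stress  = nil
    cluster = []
  end

  # Consonants left after the last nucleus become its coda.
  syllables.last[3] += cluster.join unless syllables.empty?
  syllables.map(&:join).join('.')
end

puts sketch_syllabify("dɪˌsɔrgənəˈze\u0361ɪʃən")
```

With this excerpt the sketch reproduces the syllabification shown in the README example for the same transcription.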
data/lib/syllabify.rb
ADDED
data/syllabify.gemspec
ADDED
@@ -0,0 +1,19 @@
+Gem::Specification.new do |s|
+  s.name = 'syllabify'
+  s.version = '1.0.0'
+  s.summary = 'A Ruby port of the Penn Phonetics Toolkit (P2TK) syllabifier.'
+  s.homepage = 'http://codyrobbins.com/software/syllabify'
+  s.author = 'Cody Robbins'
+  s.email = 'cody@codyrobbins.com'
+
+  s.post_install_message = '
+-------------------------------------------------------------
+Follow me on Twitter! http://twitter.com/codyrobbins
+-------------------------------------------------------------
+
+'
+
+  s.files = `git ls-files`.split
+
+  s.add_dependency('activesupport')
+end
metadata
ADDED
@@ -0,0 +1,73 @@
+--- !ruby/object:Gem::Specification
+name: syllabify
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+  prerelease:
+platform: ruby
+authors:
+- Cody Robbins
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2011-10-22 00:00:00.000000000Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: activesupport
+  requirement: &70147390060300 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: *70147390060300
+description:
+email: cody@codyrobbins.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- .gitignore
+- .yardopts
+- LICENSE
+- README.markdown
+- languages/en.yml
+- lib/cody_robbins/syllabify.rb
+- lib/cody_robbins/syllabify/syllable.rb
+- lib/syllabify.rb
+- syllabify.gemspec
+homepage: http://codyrobbins.com/software/syllabify
+licenses: []
+post_install_message: ! '
+
+  -------------------------------------------------------------
+
+  Follow me on Twitter! http://twitter.com/codyrobbins
+
+  -------------------------------------------------------------
+
+
+  '
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 1.8.10
+signing_key:
+specification_version: 3
+summary: A Ruby port of the Penn Phonetics Toolkit (P2TK) syllabifier.
+test_files: []