ssmd 0.5.3__py3-none-any.whl
- ssmd/__init__.py +189 -0
- ssmd/_version.py +34 -0
- ssmd/capabilities.py +277 -0
- ssmd/document.py +918 -0
- ssmd/formatter.py +244 -0
- ssmd/parser.py +1049 -0
- ssmd/parser_types.py +41 -0
- ssmd/py.typed +0 -0
- ssmd/segment.py +720 -0
- ssmd/sentence.py +270 -0
- ssmd/ssml_conversions.py +124 -0
- ssmd/ssml_parser.py +599 -0
- ssmd/types.py +122 -0
- ssmd/utils.py +333 -0
- ssmd/xsampa_to_ipa.txt +174 -0
- ssmd-0.5.3.dist-info/METADATA +1210 -0
- ssmd-0.5.3.dist-info/RECORD +20 -0
- ssmd-0.5.3.dist-info/WHEEL +5 -0
- ssmd-0.5.3.dist-info/licenses/LICENSE +21 -0
- ssmd-0.5.3.dist-info/top_level.txt +1 -0
Metadata-Version: 2.4
Name: ssmd
Version: 0.5.3
Summary: Speech Synthesis Markdown (SSMD) is a lightweight alternative syntax for SSML.
Author-email: Holger Nahrstaedt <nahrstaedt@gmail.com>
License: MIT License

Copyright (c) 2026 Holger Nahrstaedt

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Project-URL: Homepage, https://github.com/holgern/ssmd
Keywords: ssml,sssmd,tts,text-to-speech
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: phrasplit>=0.2.2
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: spacy
Requires-Dist: phrasplit[nlp]>=0.2.2; extra == "spacy"
Dynamic: license-file

# SSMD - Speech Synthesis Markdown

**SSMD** (Speech Synthesis Markdown) is a lightweight Python library that provides a
human-friendly markdown-like syntax for creating SSML (Speech Synthesis Markup Language)
documents. It's designed to make TTS (Text-to-Speech) content more readable and
maintainable.

## Features

- **Markdown-like syntax** - More intuitive than raw SSML
- **Full SSML support** - All major SSML features covered
- **Bidirectional** - Convert between SSMD and SSML, or strip to plain text
- **Document-centric** - Build, edit, and export TTS documents
- **TTS capabilities** - Auto-filter features based on engine support
- **Extensible** - Custom extensions for platform-specific features
- **Spec-driven** - Follows the official SSMD specification

## Installation

```bash
pip install ssmd
```

SSMD includes sentence detection via **phrasplit** (regex mode by default: fast and
lightweight).

### Optional: Enhanced Accuracy with spaCy

For the best sentence detection accuracy, especially with complex or informal text,
install spaCy support:

```bash
pip install "ssmd[spacy]"

# Install language models for the languages you need
python -m spacy download en_core_web_sm    # English (small, ~30MB)
python -m spacy download en_core_web_md    # English (medium, better accuracy, ~100MB)
python -m spacy download en_core_web_lg    # English (large, best accuracy, ~500MB)
python -m spacy download fr_core_news_sm   # French
python -m spacy download de_core_news_sm   # German
python -m spacy download es_core_news_sm   # Spanish
# See https://spacy.io/models for all available models
```

**Performance comparison:**

| Mode                   | Speed       | Accuracy | Size    | Use Case                      |
| ---------------------- | ----------- | -------- | ------- | ----------------------------- |
| **Regex (default)**    | ~60x faster | ~85-90%  | 0 MB    | Simple text, speed-critical   |
| **spaCy small models** | Baseline    | ~95%     | ~30 MB  | Balanced accuracy/performance |
| **spaCy large models** | Slower      | ~98%+    | ~500 MB | Best accuracy, complex text   |
| **spaCy transformer**  | Slowest     | ~99%+    | ~1 GB   | Research, maximum quality     |

Without spaCy, SSMD uses fast regex-based sentence splitting that works well for
well-formatted text. With spaCy, you get ML-based detection that also handles harder cases
such as abbreviations, URLs, and informal writing.
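
To see why pure regex splitting struggles with abbreviations, here is a deliberately naive splitter. It is an illustrative sketch only: the `naive_sentences` helper is hypothetical and phrasplit's actual regex rules are more refined than this single pattern.

```python
import re

# Naive rule: split after ".", "!" or "?" when whitespace and a capital letter follow.
NAIVE_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def naive_sentences(text: str) -> list[str]:
    """Split text into sentences with a single regex (illustration only)."""
    return [part for part in NAIVE_SPLIT.split(text.strip()) if part]


print(naive_sentences("Hello world. This is SSMD!"))
# ['Hello world.', 'This is SSMD!']
print(naive_sentences("Dr. Smith arrived. He sat down."))
# ['Dr.', 'Smith arrived.', 'He sat down.']  <- the abbreviation is split off wrongly
```

A model-based splitter learns that "Dr." rarely ends a sentence, which is the kind of case the spaCy mode is meant to handle.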

You can also install SSMD from source:

```bash
git clone https://github.com/holgern/ssmd.git
cd ssmd
pip install -e .
```

## Quick Start

### Basic Usage

```python
import ssmd

# Convert SSMD to SSML
ssml = ssmd.to_ssml("Hello *world*!")
print(ssml)
# Output: <speak>Hello <emphasis>world</emphasis>!</speak>

# Strip SSMD markup for plain text
plain = ssmd.to_text("Hello *world* @marker!")
print(plain)
# Output: Hello world!

# Convert SSML back to SSMD
ssmd_text = ssmd.from_ssml('<speak><emphasis>Hello</emphasis></speak>')
print(ssmd_text)
# Output: *Hello*
```

### Document API - Build TTS Content Incrementally

```python
from ssmd import Document

# Create a document and build it piece by piece
doc = Document()
doc.add_sentence("Hello and *welcome* to SSMD!")
doc.add_sentence("This is a great tool for TTS.")
doc.add_paragraph("Let's start a new paragraph here.")

# Export to different formats
ssml = doc.to_ssml()      # SSML output
markdown = doc.to_ssmd()  # SSMD markdown
text = doc.to_text()      # Plain text

# Access document content
print(doc.ssmd)   # Raw SSMD content
print(len(doc))   # Number of sentences
```

### TTS Streaming Integration

Perfect for streaming TTS, where you process sentences one at a time:

```python
from ssmd import Document

# Create document with configuration
doc = Document(
    config={'auto_sentence_tags': True},
    capabilities='pyttsx3',  # Auto-filter for pyttsx3 support
)

# Build the document
doc.add_paragraph("# Chapter 1: Introduction")
doc.add_sentence("Welcome to the *amazing* world of SSMD!")
doc.add_sentence("This makes TTS content much easier to write.")
doc.add_paragraph("# Chapter 2: Features")
doc.add_sentence("You can use all kinds of markup.")
doc.add_sentence("Including ...500ms pauses and [special pronunciations](ph: speSl).")

# Iterate through sentences for TTS
for i, sentence in enumerate(doc.sentences(), 1):
    print(f"Sentence {i}: {sentence}")
    # Your TTS engine here:
    # tts_engine.speak(sentence)
    # await tts_engine.wait_until_done()  # only inside an async function

# Or access specific sentences
print(f"Total sentences: {len(doc)}")
print(f"First sentence: {doc[0]}")
print(f"Last sentence: {doc[-1]}")
```

### Document Editing

```python
from ssmd import Document

# Load from existing content
doc = Document("First sentence. Second sentence. Third sentence.")

# Edit like a list
doc[0] = "Modified first sentence."
del doc[1]  # Remove second sentence

# String operations
doc.replace("sentence", "line")

# Merge documents
doc2 = Document("Additional content.")
doc.merge(doc2)

# Split into individual sentences
sentences = doc.split()  # Returns list of Document objects
```

### TTS Engine Capabilities

SSMD can automatically filter SSML features based on your TTS engine's capabilities.
This ensures compatibility by stripping unsupported tags to plain text.

#### Using Presets

```python
from ssmd import Document

# Use a preset for your TTS engine
doc = Document("*Hello* [world](en)!", capabilities='pyttsx3')
ssml = doc.to_ssml()

# pyttsx3 doesn't support emphasis or language tags, so they're stripped:
# <speak>Hello world!</speak>
```

**Available Presets:**

- `minimal` - Plain text only (no SSML)
- `pyttsx3` - Minimal support (basic prosody only)
- `espeak` - Moderate support (breaks, language, prosody, phonemes)
- `google` / `azure` / `microsoft` - Full SSML support
- `polly` / `amazon` - Full support + Amazon extensions (whisper, DRC)
- `full` - All features enabled

#### Custom Capabilities

```python
from ssmd import Document, TTSCapabilities

# Define exactly what your TTS supports
caps = TTSCapabilities(
    emphasis=False,    # No <emphasis> support
    break_tags=True,   # Supports <break>
    paragraph=True,    # Supports <p>
    language=False,    # No language switching
    prosody=True,      # Supports volume/rate/pitch
    say_as=False,      # No <say-as>
    audio=False,       # No audio files
    mark=False,        # No markers
)

doc = Document("*Hello* world!", capabilities=caps)
```

#### Capability-Aware Streaming

```python
from ssmd import Document

# Create document for specific TTS engine
doc = Document(capabilities='espeak')

# Build content with various features
doc.add_paragraph("# Welcome")
doc.add_sentence("*Hello* world!")
doc.add_sentence("[Bonjour](fr) everyone!")

# All sentences are filtered for eSpeak compatibility
for sentence in doc.sentences():
    # Features eSpeak doesn't support are automatically removed
    print(sentence)
    # tts_engine.speak(sentence)  # replace with your own TTS engine call
```

**Comparison of Engine Outputs:**

Same input: `*Hello* world... [this is loud](v: 5)!`

| Engine  | Output |
| ------- | ------ |
| minimal | `<speak>Hello world... this is loud!</speak>` |
| pyttsx3 | `<speak>Hello world... <prosody volume="x-loud">this is loud</prosody>!</speak>` |
| espeak  | `<speak>Hello world<break time="1000ms"/> <prosody volume="x-loud">this is loud</prosody>!</speak>` |
| google  | `<speak><emphasis>Hello</emphasis> world<break time="1000ms"/> <prosody volume="x-loud">this is loud</prosody>!</speak>` |

See `examples/tts_with_capabilities.py` for a complete demonstration.
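
Conceptually, filtering unwraps each unsupported element and keeps the text inside it. The standard-library sketch below illustrates that idea with a regular expression. `strip_unsupported` is a hypothetical helper, not SSMD's implementation, and a robust filter would operate on a parsed XML tree rather than on raw strings.

```python
import re


def strip_unsupported(ssml: str, unsupported: set[str]) -> str:
    """Drop the given SSML elements but keep the text they wrap (illustrative only)."""
    for tag in unsupported:
        # Matches <tag>, <tag attr="...">, </tag> and self-closing <tag .../>.
        pattern = rf"</?{re.escape(tag)}(?:\s[^>]*)?/?>"
        ssml = re.sub(pattern, "", ssml)
    return ssml


full_output = (
    '<speak><emphasis>Hello</emphasis> world<break time="1000ms"/> '
    '<prosody volume="x-loud">this is loud</prosody>!</speak>'
)
print(strip_unsupported(full_output, {"emphasis", "break"}))
# <speak>Hello world <prosody volume="x-loud">this is loud</prosody>!</speak>
```

Unlike the pyttsx3 row in the table, this sketch simply deletes the `<break>` and does not turn it back into `...` text.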

## SSMD Syntax Reference

### Text & Emphasis

SSMD supports all four SSML emphasis levels:

```python
import ssmd

# Moderate emphasis (default)
ssmd.to_ssml("*emphasized text*")
# → <speak><emphasis>emphasized text</emphasis></speak>

# Strong emphasis
ssmd.to_ssml("**very important**")
# → <speak><emphasis level="strong">very important</emphasis></speak>

# Reduced emphasis (subtle)
ssmd.to_ssml("_less important_")
# → <speak><emphasis level="reduced">less important</emphasis></speak>

# No emphasis (explicit, rarely used)
ssmd.to_ssml("[monotone](emphasis: none)")
# → <speak><emphasis level="none">monotone</emphasis></speak>
```
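
Because `**` and `*` share a character, a converter has to try the longer delimiter first. The sketch below uses plain regular expressions to show this ordering. It is illustrative only: `convert_emphasis` is a hypothetical helper, and SSMD's own parser may work differently.

```python
import re

# Longest delimiter first, so "**x**" is not consumed as "*" + "*x*" + "*".
EMPHASIS_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r'<emphasis level="strong">\1</emphasis>'),
    (re.compile(r"\*(.+?)\*"), r"<emphasis>\1</emphasis>"),
]


def convert_emphasis(text: str) -> str:
    """Apply the emphasis rules in order (strong before moderate)."""
    for pattern, replacement in EMPHASIS_RULES:
        text = pattern.sub(replacement, text)
    return text


print(convert_emphasis("**very important** and *emphasized*"))
```

If the rule order were reversed, `**very important**` would become `<emphasis>*very important</emphasis>*`.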

### Breaks & Pauses

```python
# Specific time (a suffix is required: a bare ... is preserved as an ellipsis)
ssmd.to_ssml("Hello ...500ms world")
ssmd.to_ssml("Hello ...2s world")
ssmd.to_ssml("Hello ...1s world")

# Strength-based
ssmd.to_ssml("Hello ...n world")  # none
ssmd.to_ssml("Hello ...w world")  # weak (x-weak)
ssmd.to_ssml("Hello ...c world")  # comma (medium)
ssmd.to_ssml("Hello ...s world")  # sentence (strong)
ssmd.to_ssml("Hello ...p world")  # paragraph (x-strong)
```
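
The break syntax is regular enough to sketch with a single regular expression. This sketch mirrors the strength codes listed above but is not SSMD's parser. `convert_breaks` is a hypothetical name, and the sketch assumes durations are always written in `ms` or `s`.

```python
import re

# Strength codes as listed above.
STRENGTHS = {"n": "none", "w": "x-weak", "c": "medium", "s": "strong", "p": "x-strong"}

# "..." followed by a duration (500ms, 2s) or a strength code, ending at a word boundary.
BREAK_RE = re.compile(r"\.\.\.(?:(\d+(?:ms|s))|([nwcsp]))\b")


def convert_breaks(text: str) -> str:
    """Turn '...500ms' / '...c' style markers into <break/> tags; bare '...' is kept."""

    def replace(match: re.Match) -> str:
        duration, code = match.groups()
        if duration:
            return f'<break time="{duration}"/>'
        return f'<break strength="{STRENGTHS[code]}"/>'

    return BREAK_RE.sub(replace, text)


print(convert_breaks("Hello ...500ms world"))  # Hello <break time="500ms"/> world
print(convert_breaks("Wait... what?"))         # Wait... what?
```

The trailing `\b` keeps words such as `...sure` from being read as a `...s` break.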

### Paragraphs

```python
text = """First paragraph here.
Second line of first paragraph.

Second paragraph starts here."""

ssmd.to_ssml(text)
# → <speak>First paragraph here.
# Second line of first paragraph.
# Second paragraph starts here.</speak>
```

### Language

```python
# Auto-complete language codes
ssmd.to_ssml('[Bonjour](fr) world')
# → <speak><lang xml:lang="fr-FR">Bonjour</lang> world</speak>

# Explicit locale
ssmd.to_ssml('[Cheerio](en-GB)')
# → <speak><lang xml:lang="en-GB">Cheerio</lang></speak>
```
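
Auto-completion maps a bare language code to a full locale and leaves explicit locales untouched. The sketch below shows one way to do this. Only `fr` → `fr-FR` comes from the example above; the other table entries and the `xx` → `xx-XX` fallback are assumptions, and `complete_language` is a hypothetical helper rather than SSMD's code.

```python
# Hypothetical defaults: only "fr" -> "fr-FR" is documented above; the rest are assumptions.
DEFAULT_REGIONS = {"en": "en-US", "fr": "fr-FR", "de": "de-DE", "es": "es-ES"}


def complete_language(code: str) -> str:
    """Expand a bare language code to a full locale; leave full locales untouched."""
    if "-" in code:
        return code
    # Fallback heuristic: reuse the language code as the region code.
    return DEFAULT_REGIONS.get(code.lower(), f"{code.lower()}-{code.upper()}")


print(complete_language("fr"))     # fr-FR
print(complete_language("en-GB"))  # en-GB
```

The fallback heuristic is naive. For example, it would produce `ja-JA` instead of `ja-JP`, which is why an explicit lookup table is needed for real use.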

### Voice Selection

SSMD supports two ways to specify voices: **inline annotations** for short phrases and
**block directives** for longer passages (ideal for dialogue and scripts).

#### Inline Voice Annotations

Perfect for short voice changes within a sentence:

```python
# Simple voice name
ssmd.to_ssml('[Hello](voice: Joanna)')
# → <speak><voice name="Joanna">Hello</voice></speak>

# Cloud TTS voice name (e.g., Google Wavenet, AWS Polly)
ssmd.to_ssml('[Hello](voice: en-US-Wavenet-A)')
# → <speak><voice name="en-US-Wavenet-A">Hello</voice></speak>

# Language and gender
ssmd.to_ssml('[Bonjour](voice: fr-FR, gender: female)')
# → <speak><voice language="fr-FR" gender="female">Bonjour</voice></speak>

# All attributes (language, gender, variant)
ssmd.to_ssml('[Text](voice: en-GB, gender: male, variant: 1)')
# → <speak><voice language="en-GB" gender="male" variant="1">Text</voice></speak>
```

#### Voice Directives (Block Syntax)

Perfect for dialogue, podcasts, and scripts with multiple speakers:

```python
# Use @voice: name or @voice(name) for clean dialogue formatting
script = """
@voice: af_sarah
Welcome to Tech Talk! I'm Sarah, and today we're diving into the fascinating
world of text-to-speech technology.
...s

@voice: am_michael
And I'm Michael! We've got an amazing episode lined up. The advances in neural
TTS have been incredible lately.
...s

@voice: af_sarah
So what are we covering today?
"""

ssmd.to_ssml(script)
# Each voice directive creates a separate voice block in SSML
```

**Voice directives support all voice attributes:**

```python
# Language and gender
multilingual = """
@voice: fr-FR, gender: female
Bonjour! Comment allez-vous aujourd'hui?

@voice: en-GB, gender: male
Hello there! Lovely weather we're having.

@voice: es-ES, gender: female, variant: 1
¡Hola! ¿Cómo estás?
"""
```

**Voice directive features:**

- Use `@voice: name` or `@voice(name)` syntax
- Supports all attributes: language, gender, variant
- Applies to all text until the next directive or paragraph break
- Automatically detected on SSML→SSMD conversion for long voice blocks
- Much more readable than inline annotations for dialogue
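
The scoping rules above can be sketched as a small line-based grouper. This is a hypothetical helper, not SSMD's parser. It assigns each line to the most recent directive, lets a blank line (paragraph break) end the current directive, and labels text outside any directive with `None`.

```python
import re
from typing import Optional

# "@voice: name, gender: female" or "@voice(name)" on a line of its own.
DIRECTIVE_RE = re.compile(r"^@voice(?::\s*(.+)|\((.+)\))\s*$")


def split_voice_blocks(script: str) -> list[tuple[Optional[str], str]]:
    """Group lines under the most recent @voice directive (illustrative sketch)."""
    blocks: list[tuple[Optional[str], list[str]]] = [(None, [])]
    for raw in script.splitlines():
        line = raw.strip()
        match = DIRECTIVE_RE.match(line)
        if match:
            blocks.append(((match.group(1) or match.group(2)).strip(), []))
        elif not line:
            # A paragraph break ends the current directive.
            if blocks[-1][1]:
                blocks.append((None, []))
        else:
            blocks[-1][1].append(line)
    # Keep only blocks that actually contain text.
    return [(voice, " ".join(lines)) for voice, lines in blocks if lines]


script = """
@voice: af_sarah
Welcome to Tech Talk!
...s

@voice(am_michael)
And I'm Michael!
"""
print(split_voice_blocks(script))
```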

**Mixing both styles:**

```python
# Block directive for main speaker, inline for interruptions
text = """
@voice: sarah
Hello everyone, [but wait!](voice: michael) Michael interrupts...

@voice: michael
Sorry, I had to jump in there!
"""
```

### Phonetic Pronunciation

```python
# X-SAMPA notation (converted to IPA automatically)
ssmd.to_ssml('[tomato](ph: t@meItoU)')

# Direct IPA
ssmd.to_ssml('[tomato](ipa: təˈmeɪtoʊ)')

# Output: <speak><phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme></speak>
```
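
X-SAMPA to IPA conversion is essentially a table lookup with greedy longest match, because some X-SAMPA symbols span two characters (for example `r\` for ɹ). The table below is a tiny illustrative excerpt chosen for this sketch. The package ships its own mapping file (`xsampa_to_ipa.txt`), whose exact contents and lookup logic are not shown here.

```python
# Small illustrative excerpt of an X-SAMPA -> IPA table (not the package's full mapping).
XSAMPA_TO_IPA = {
    '"': "ˈ",    # primary stress
    "@": "ə",
    "I": "ɪ",
    "U": "ʊ",
    "E": "ɛ",
    "S": "ʃ",
    "{": "æ",
    "r\\": "ɹ",  # two-character symbol: needs longest-match lookup
}
MAX_KEY = max(len(key) for key in XSAMPA_TO_IPA)


def xsampa_to_ipa(xsampa: str) -> str:
    """Convert X-SAMPA to IPA using greedy longest-match lookup."""
    out, i = [], 0
    while i < len(xsampa):
        for size in range(MAX_KEY, 0, -1):
            chunk = xsampa[i : i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            # Symbols identical in both alphabets (t, m, e, o, ...) pass through.
            out.append(xsampa[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa("t@meItoU"))  # təmeɪtoʊ
```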

### Prosody (Volume, Rate, Pitch)

#### Shorthand Notation

```python
# Volume
ssmd.to_ssml("~silent~")       # silent
ssmd.to_ssml("--whisper--")    # x-soft
ssmd.to_ssml("-soft-")         # soft
ssmd.to_ssml("+loud+")         # loud
ssmd.to_ssml("++very loud++")  # x-loud

# Rate
ssmd.to_ssml("<<very slow<<")  # x-slow
ssmd.to_ssml("<slow<")         # slow
ssmd.to_ssml(">fast>")         # fast
ssmd.to_ssml(">>very fast>>")  # x-fast

# Pitch
ssmd.to_ssml("__very low__")   # x-low
ssmd.to_ssml("_low_")          # low
ssmd.to_ssml("^high^")         # high
ssmd.to_ssml("^^very high^^")  # x-high
```

#### Explicit Notation

```python
# Combined (volume, rate, pitch)
ssmd.to_ssml('[loud and fast](vrp: 555)')
# → <prosody volume="x-loud" rate="x-fast" pitch="x-high">loud and fast</prosody>

# Individual attributes
ssmd.to_ssml('[text](v: 5, r: 3, p: 1)')
# → <prosody volume="x-loud" rate="medium" pitch="x-low">text</prosody>

# Relative values
ssmd.to_ssml('[louder](v: +10dB)')
ssmd.to_ssml('[higher](p: +20%)')
```
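
The digits in `vrp` index into a named scale for each attribute. The examples above fix only levels 1, 3 and 5 (x-low, medium, x-loud/x-fast/x-high). In the sketch below, levels 2 and 4 and volume level 0 (`silent`) are assumptions that follow SSML's named prosody values in order. `vrp_to_prosody` is a hypothetical helper.

```python
# Levels 1, 3 and 5 match the examples above; 0, 2 and 4 are assumed.
VOLUME = {0: "silent", 1: "x-soft", 2: "soft", 3: "medium", 4: "loud", 5: "x-loud"}
RATE = {1: "x-slow", 2: "slow", 3: "medium", 4: "fast", 5: "x-fast"}
PITCH = {1: "x-low", 2: "low", 3: "medium", 4: "high", 5: "x-high"}
SCALES = {"volume": VOLUME, "rate": RATE, "pitch": PITCH}


def vrp_to_prosody(text: str, vrp: str) -> str:
    """Expand a digit string such as '555' into a <prosody> element."""
    attrs = []
    # zip() stops early, so "5" sets only volume and "53" sets volume and rate.
    for name, digit in zip(("volume", "rate", "pitch"), vrp):
        try:
            value = SCALES[name][int(digit)]
        except (KeyError, ValueError):
            raise ValueError(f"invalid {name} level: {digit!r}") from None
        attrs.append(f'{name}="{value}"')
    return f"<prosody {' '.join(attrs)}>{text}</prosody>"


print(vrp_to_prosody("loud and fast", "555"))
# <prosody volume="x-loud" rate="x-fast" pitch="x-high">loud and fast</prosody>
```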

### Substitution (Aliases)

```python
ssmd.to_ssml('[H2O](sub: water)')
# → <speak><sub alias="water">H2O</sub></speak>

ssmd.to_ssml('[AWS](sub: Amazon Web Services)')
# → <speak><sub alias="Amazon Web Services">AWS</sub></speak>
```

### Say-As

```python
# Telephone numbers
ssmd.to_ssml('[+1-555-0123](as: telephone)')

# Dates with format
ssmd.to_ssml('[31.12.2024](as: date, format: "dd.mm.yyyy")')

# Say-as with detail attribute (for verbosity control)
ssmd.to_ssml('[123](as: cardinal, detail: 2)')
# → <speak><say-as interpret-as="cardinal" detail="2">123</say-as></speak>

ssmd.to_ssml('[12/31/2024](as: date, format: "mdy", detail: 1)')
# → <speak><say-as interpret-as="date" format="mdy" detail="1">12/31/2024</say-as></speak>

# Spell out
ssmd.to_ssml('[NASA](as: character)')

# Numbers
ssmd.to_ssml('[123](as: cardinal)')
ssmd.to_ssml('[1st](as: ordinal)')

# Expletives (beeped)
ssmd.to_ssml('[damn](as: expletive)')
```

### Audio Files

```python
# Basic audio with description
ssmd.to_ssml('[doorbell](https://example.com/sounds/bell.mp3)')
# → <audio src="https://example.com/sounds/bell.mp3"><desc>doorbell</desc></audio>

# With fallback text
ssmd.to_ssml('[cat purring](cat.ogg Sound file not loaded)')
# → <audio src="cat.ogg"><desc>cat purring</desc>Sound file not loaded</audio>

# No description
ssmd.to_ssml('[](beep.mp3)')
# → <audio src="beep.mp3"></audio>

# Advanced audio attributes
# Clip audio (play from 5s to 30s)
ssmd.to_ssml('[music](song.mp3 clip: 5s-30s)')
# → <audio src="song.mp3" clipBegin="5s" clipEnd="30s"><desc>music</desc></audio>

# Speed control
ssmd.to_ssml('[announcement](speech.mp3 speed: 150%)')
# → <audio src="speech.mp3" speed="150%"><desc>announcement</desc></audio>

# Repeat count
ssmd.to_ssml('[jingle](ad.mp3 repeat: 3)')
# → <audio src="ad.mp3" repeatCount="3"><desc>jingle</desc></audio>

# Volume level
ssmd.to_ssml('[alarm](alert.mp3 level: +6dB)')
# → <audio src="alert.mp3" soundLevel="+6dB"><desc>alarm</desc></audio>

# Combine multiple attributes with fallback text
ssmd.to_ssml('[background](music.mp3 clip: 0s-10s, speed: 120%, level: -3dB Fallback text)')
# → <audio src="music.mp3" clipBegin="0s" clipEnd="10s" speed="120%" soundLevel="-3dB">
#     <desc>background</desc>Fallback text</audio>
```
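
The audio annotation packs a source, optional `key: value` attributes and fallback text into one string. The sketch below shows how such a string could be split into the SSML attributes listed above. `parse_audio_annotation` is a hypothetical helper, not SSMD's parser, and it assumes attribute values never contain spaces or commas.

```python
import re

# "key: value" pairs that may follow the file name; values contain no spaces or commas.
ATTR_RE = re.compile(r"(clip|speed|repeat|level):\s*([^\s,]+),?\s*")
# SSML attribute names, matching the outputs shown above.
SSML_NAMES = {"speed": "speed", "repeat": "repeatCount", "level": "soundLevel"}


def parse_audio_annotation(annotation: str) -> tuple[dict[str, str], str]:
    """Split 'file.mp3 key: value, ... fallback text' into attributes and fallback."""
    src, _, rest = annotation.strip().partition(" ")
    attrs = {"src": src}
    pos = 0
    while match := ATTR_RE.match(rest, pos):
        key, value = match.groups()
        if key == "clip":
            # "5s-30s" -> clipBegin="5s", clipEnd="30s"
            attrs["clipBegin"], _, attrs["clipEnd"] = value.partition("-")
        else:
            attrs[SSML_NAMES[key]] = value
        pos = match.end()
    # Whatever follows the last attribute is the fallback text.
    return attrs, rest[pos:].strip()


print(parse_audio_annotation("music.mp3 clip: 0s-10s, speed: 120%, level: -3dB Fallback text"))
```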

### Markers

```python
ssmd.to_ssml('I always wanted a @animal cat as a pet.')
# → <speak>I always wanted a <mark name="animal"/> cat as a pet.</speak>

# Markers are removed in plain text (with smart whitespace handling)
ssmd.to_text('word @marker word')
# → "word word" (not "word  word")
```
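
"Smart whitespace handling" means that removing a marker should not leave a doubled space or a space before punctuation. The following sketch shows one way to do this. `strip_markers` is a hypothetical helper; it handles only `@name` markers (not `@voice:` directives) and is not SSMD's implementation.

```python
import re

# "@name" at the start of the text or after whitespace, so "user@example.com" is untouched.
MARKER_RE = re.compile(r"(?<!\S)@\w+")


def strip_markers(text: str) -> str:
    """Remove @markers and tidy the whitespace they leave behind."""
    text = MARKER_RE.sub("", text)
    text = re.sub(r"[ \t]{2,}", " ", text)           # collapse doubled spaces
    text = re.sub(r"[ \t]+([.,!?;:])", r"\1", text)  # no space before punctuation
    return text.strip()


print(strip_markers("word @marker word"))     # word word
print(strip_markers("Hello world @marker!"))  # Hello world!
```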

### Headings

Markdown-style headings are rendered according to the per-level `heading_levels`
configuration (pauses, emphasis, or prosody):

```python
from ssmd import Document

doc = Document(config={
    'heading_levels': {
        1: [('pause_before', '300ms'), ('emphasis', 'strong'), ('pause', '300ms')],
        2: [('pause_before', '75ms'), ('emphasis', 'moderate'), ('pause', '75ms')],
        3: [('pause_before', '50ms'), ('prosody', {'rate': 'slow'}), ('pause', '50ms')],
    }
})

doc.add("""
# Chapter 1
## Section 1.1
### Subsection
""")

ssml = doc.to_ssml()
```

### Extensions (Platform-Specific)

```python
import ssmd
from ssmd import Document

# Amazon Polly whisper effect
ssmd.to_ssml('[whispered text](ext: whisper)')
# → <speak><amazon:effect name="whispered">whispered text</amazon:effect></speak>

# Custom extensions
doc = Document(config={
    'extensions': {
        'custom': lambda text: f'<custom-tag>{text}</custom-tag>'
    }
})
doc.add_sentence("[tagged text](ext: custom)")
```

#### Google Cloud TTS Speaking Styles

Google Cloud TTS supports speaking styles via the `google:style` extension. You can use
SSMD's extension system to add these styles:

```python
from ssmd import Document

# Configure Google TTS styles
doc = Document(config={
    'extensions': {
        'cheerful': lambda text: f'<google:style name="cheerful">{text}</google:style>',
        'calm': lambda text: f'<google:style name="calm">{text}</google:style>',
        'empathetic': lambda text: f'<google:style name="empathetic">{text}</google:style>',
        'apologetic': lambda text: f'<google:style name="apologetic">{text}</google:style>',
        'firm': lambda text: f'<google:style name="firm">{text}</google:style>',
    }
})

# Use styles in your content
doc.add_sentence("[Welcome to our service!](ext: cheerful)")
doc.add_sentence("[We apologize for the inconvenience.](ext: apologetic)")
doc.add_sentence("[Please remain calm.](ext: calm)")

ssml = doc.to_ssml()
# → <speak>
#     <google:style name="cheerful">Welcome to our service!</google:style>
#     <google:style name="apologetic">We apologize for the inconvenience.</google:style>
#     <google:style name="calm">Please remain calm.</google:style>
#   </speak>
```
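
Writing one lambda per style quickly becomes repetitive. A small factory function can build the same mapping. This is a plain-Python sketch: `google_style` and `google_extensions` are illustrative names and are not part of SSMD.

```python
def google_style(name: str):
    """Return an extension callable that wraps text in <google:style name=...>."""
    return lambda text: f'<google:style name="{name}">{text}</google:style>'


STYLES = ["cheerful", "calm", "empathetic", "apologetic", "firm"]

# A factory avoids Python's late-binding pitfall: lambdas written directly inside
# the comprehension would all see the final value of the loop variable ("firm").
google_extensions = {style: google_style(style) for style in STYLES}

print(google_extensions["calm"]("Please remain calm."))
# <google:style name="calm">Please remain calm.</google:style>
```

You can pass the resulting dict as `config={'extensions': google_extensions}`, in the same way as the hand-written dictionary above.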

**Available Google TTS Styles:**

- `cheerful` - Upbeat and positive tone
- `calm` - Relaxed and soothing tone
- `empathetic` - Understanding and compassionate tone
- `apologetic` - Regretful, apologetic tone
- `firm` - Confident and authoritative tone
- `news` - Professional news anchor tone
- `conversational` - Natural conversation tone

**Note:** These styles are only supported by specific Google Cloud TTS voices (typically
Neural2 and Studio voices). See the
[Google Cloud TTS documentation](https://cloud.google.com/text-to-speech/docs/speaking-styles)
for voice compatibility.

For a complete example, see `examples/google_tts_styles.py`:

```bash
python examples/google_tts_styles.py
```

## Parser API - Extract Structured Data

The SSMD parser provides an alternative to SSML generation: it extracts structured
segments from SSMD text. This is useful when you need programmatic control over SSMD
features or want to build custom TTS pipelines.

### When to Use the Parser

- **Custom TTS integration** - Process SSMD features programmatically
- **Text transformations** - Handle say-as, substitution, and phoneme conversions
- **Multi-voice dialogue** - Build voice-specific processing pipelines
- **Feature extraction** - Analyze SSMD content without generating SSML

### Quick Example

```python
from ssmd import parse_sentences

script = """
@voice: sarah
Hello! Call [+1-555-0123](as: telephone) for info.
[H2O](sub: water) is important.

@voice: michael
Thanks *Sarah*!
"""

# Parse into structured sentences
sentences = parse_sentences(script)

for sentence in sentences:
    # Get voice configuration
    voice_name = sentence.voice.name if sentence.voice else "default"

    # Process each segment
    full_text = ""
    for seg in sentence.segments:
        # Handle text transformations
        if seg.say_as:
            # Your TTS engine converts based on interpret_as
            # (convert_say_as is a placeholder for your own function)
            text = convert_say_as(seg.text, seg.say_as.interpret_as)
        elif seg.substitution:
            # Use substitution text instead of original
            text = seg.substitution
        elif seg.phoneme:
            # Use phoneme for pronunciation
            text = seg.text  # TTS engine handles phoneme
        else:
            text = seg.text

        full_text += text

    # Speak the complete sentence (tts is a placeholder for your TTS engine)
    tts.speak(full_text, voice=voice_name)
```
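
The example above leaves `convert_say_as` undefined. The following is one hypothetical stand-in. It handles only two `interpret-as` values and passes everything else through unchanged; real engines do far more, such as locale-aware number and date expansion.

```python
def convert_say_as(text: str, interpret_as: str) -> str:
    """Hypothetical stand-in for the placeholder used above (not part of SSMD)."""
    if interpret_as in ("character", "characters"):
        # Spell out letter by letter: "NASA" -> "N A S A"
        return " ".join(ch for ch in text if not ch.isspace())
    if interpret_as == "telephone":
        # Read digits one by one, voicing a leading "+" as "plus"
        digits = " ".join(ch for ch in text if ch.isdigit())
        return f"plus {digits}" if text.startswith("+") else digits
    # Unknown types: fall back to the original text
    return text


print(convert_say_as("+1-555-0123", "telephone"))  # plus 1 5 5 5 0 1 2 3
```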

### Parser Functions

#### `parse_sentences(text, **options)` → `list[Sentence]`

Parse SSMD text into structured sentences with segments.

> **Note:** `SSMDSentence` is a backward-compatibility alias for `Sentence`.

**Parameters:**

- `text` (str): SSMD text to parse
- `sentence_detection` (bool): Split text into sentences (default: True)
- `include_default_voice` (bool): Include text before the first @voice directive (default:
  True)
- `capabilities` (TTSCapabilities | str): Filter features based on TTS engine support
- `language` (str): Language code for sentence detection (default: "en")
- `model_size` (str): spaCy model size - "sm", "md", "lg", "trf" (default: "sm")
- `spacy_model` (str): Custom spaCy model name (overrides model_size)
- `use_spacy` (bool): If False, use fast regex splitting instead of spaCy (default:
  True)

**Returns:** List of `Sentence` objects (alias: `SSMDSentence`)

**Example:**

```python
from ssmd import parse_sentences

# Default: uses small spaCy models (en_core_web_sm) when spaCy is installed
sentences = parse_sentences("Hello *world*! This is great.")

for sent in sentences:
    print(f"Voice: {sent.voice.name if sent.voice else 'default'}")
    print(f"Segments: {len(sent.segments)}")
    for seg in sent.segments:
        print(f"  - {seg.text!r} (emphasis={seg.emphasis})")

# Fast mode: no spaCy required (uses regex)
sentences = parse_sentences("Hello world. Fast mode.", use_spacy=False)

# High quality: use large spaCy model for better accuracy
sentences = parse_sentences("Complex text here.", model_size="lg")

# Custom model: use domain-specific spaCy model
sentences = parse_sentences("Medical text.", spacy_model="en_core_sci_md")
```
|
|
790
|
+
|
|
791
|
+
**Sentence Detection Configuration:**
|
|
792
|
+
|
|
793
|
+
SSMD supports flexible sentence detection with quality/speed tradeoffs:
|
|
794
|
+
|
|
795
|
+
- **Fast mode** (`use_spacy=False`): Regex-based splitting, no dependencies, ~60x faster
|
|
796
|
+
- **Auto-detect** (default): Uses spaCy if installed, falls back to regex
|
|
797
|
+
- **Small models** (`model_size="sm"`): Best balance of speed and accuracy
|
|
798
|
+
- **Medium models** (`model_size="md"`): Better accuracy for complex text
|
|
799
|
+
- **Large models** (`model_size="lg"`): Best accuracy, slower
|
|
800
|
+
- **Transformer models** (`model_size="trf"`): Research-grade accuracy, slowest
|
|
801
|
+
|
|
802
|
+
The parser works out-of-the-box with fast regex mode. Install `ssmd[spacy]` and language
|
|
803
|
+
models for ML-powered accuracy.
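
To make the speed/accuracy tradeoff concrete, here is a minimal regex splitter of the kind a fast mode can rely on. It is an illustrative sketch, not ssmd's actual implementation: the pattern (split after `.`, `!` or `?` when whitespace and an uppercase letter or digit follow) and the name `split_sentences_regex` are assumptions. The second call shows the typical weakness of pure regex splitting, abbreviations such as "Dr.", which a statistical model usually handles better.

```python
import re

# Hypothetical sketch: split after ., ! or ? when followed by whitespace
# and an uppercase letter or digit. Not ssmd's real pattern.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences_regex(text: str) -> list[str]:
    """Split text into sentences using a simple punctuation heuristic."""
    parts = _BOUNDARY.split(text.strip())
    return [p for p in parts if p]


print(split_sentences_regex("Hello world. Fast mode! Is it good?"))
# ['Hello world.', 'Fast mode!', 'Is it good?']
print(split_sentences_regex("Dr. Smith arrived."))
# ['Dr.', 'Smith arrived.']  <- abbreviation mis-split
```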

**Installation note:** Larger spaCy models need manual installation:

```bash
# First install spaCy support
pip install "ssmd[spacy]"

# Then install models
python -m spacy download en_core_web_md
python -m spacy download fr_core_news_md

# Large models
python -m spacy download en_core_web_lg

# Transformer models
python -m spacy download en_core_web_trf
```
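
Because the larger models are optional downloads, an application may want to probe for them before choosing `model_size` or `spacy_model`. spaCy pipelines such as `en_core_web_md` are installed as regular Python packages, so `importlib.util.find_spec` can detect them without importing spaCy. The helper `pick_spacy_model` and its preference order are hypothetical; the result could be passed as `spacy_model=...`, or `use_spacy=False` could be used when nothing is found.

```python
import importlib.util
from typing import Optional


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be imported."""
    return importlib.util.find_spec(package) is not None


def pick_spacy_model(lang: str = "en", prefix: str = "core_web",
                     sizes: tuple[str, ...] = ("trf", "lg", "md", "sm")) -> Optional[str]:
    """Return the largest installed model name, or None (hypothetical helper)."""
    for size in sizes:
        name = f"{lang}_{prefix}_{size}"  # e.g. en_core_web_md, fr_core_news_md
        if is_installed(name):
            return name
    return None


model = pick_spacy_model()
if model is None:
    print("No spaCy model found; fall back to parse_sentences(..., use_spacy=False)")
else:
    print(f"Use parse_sentences(..., spacy_model={model!r})")
```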

#### `parse_segments(text, **options)` → `list[Segment]`

Parse SSMD text into segments without sentence grouping.

> **Note:** `SSMDSegment` is a backward-compatibility alias for `Segment`.

**Parameters:**

- `text` (str): SSMD text to parse
- `capabilities` (TTSCapabilities | str): Filter features based on TTS engine support
- `voice_context` (VoiceAttrs | None): Voice context for the segments (optional)

**Returns:** List of `Segment` objects (alias: `SSMDSegment`)

**Example:**

```python
from ssmd import parse_segments

segments = parse_segments("Call [+1-555-0123](as: telephone) now")

for seg in segments:
    if seg.say_as:
        print(f"Say-as: {seg.text!r} as {seg.say_as.interpret_as}")
```

#### `parse_voice_blocks(text)` → `list[tuple[VoiceAttrs | None, str]]`

Split text into blocks at `@voice` directives.

**Returns:** List of `(voice_attrs, text)` tuples

**Example:**

```python
from ssmd import parse_voice_blocks

blocks = parse_voice_blocks("""
@voice: sarah
Hello from Sarah

@voice: michael
Hello from Michael
""")

for voice, text in blocks:
    # voice may be None (the return type is VoiceAttrs | None)
    name = voice.name if voice else "default"
    print(f"{name}: {text.strip()}")
```
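
The splitting can be pictured as a line scan: every `@voice:` line starts a new block, and text before the first directive has no voice. The sketch below is a simplified stand-in, not ssmd's parser: `split_voice_blocks` is a hypothetical name, and it keeps everything after `@voice:` as a raw string instead of parsing it into `VoiceAttrs`.

```python
import re
from typing import Optional

# Matches a directive line such as "@voice: sarah"
_DIRECTIVE = re.compile(r"^\s*@voice:\s*(.+?)\s*$")


def split_voice_blocks(text: str) -> list[tuple[Optional[str], str]]:
    """Split text on '@voice: name' lines into (name, text) pairs."""
    blocks: list[tuple[Optional[str], str]] = []
    current: Optional[str] = None
    lines: list[str] = []
    for line in text.splitlines():
        match = _DIRECTIVE.match(line)
        if match:
            # Flush the previous block if it contains any non-blank text
            if "".join(lines).strip():
                blocks.append((current, "\n".join(lines)))
            current, lines = match.group(1), []
        else:
            lines.append(line)
    if "".join(lines).strip():
        blocks.append((current, "\n".join(lines)))
    return blocks


script = "@voice: sarah\nHello from Sarah\n\n@voice: michael\nHello from Michael\n"
for name, body in split_voice_blocks(script):
    print(f"{name or 'default'}: {body.strip()}")
```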

### Data Structures

#### `Sentence` (alias: `SSMDSentence`)

Represents a complete sentence with its voice context.

**Attributes:**

- `segments` (list[Segment]): List of text segments
- `voice` (VoiceAttrs | None): Voice configuration
- `is_paragraph_end` (bool): Whether the sentence ends a paragraph
- `breaks_after` (list[BreakAttrs]): Pauses after the sentence

#### `Segment` (alias: `SSMDSegment`)

Represents a text segment with metadata.

**Attributes:**

- `text` (str): The text content
- `emphasis` (bool | str): Emphasis level (True, "moderate", "strong", "reduced", "none")
- `prosody` (ProsodyAttrs | None): Volume, rate, pitch
- `language` (str | None): Language code (e.g., "fr-FR")
- `voice` (VoiceAttrs | None): Inline voice settings
- `say_as` (SayAsAttrs | None): Say-as interpretation
- `substitution` (str | None): Substitution text
- `phoneme` (PhonemeAttrs | None): Phonetic pronunciation (with `ph` and `alphabet` attributes)
- `audio` (AudioAttrs | None): Audio file info
- `extension` (str | None): Platform-specific extension name
- `breaks_before` (list[BreakAttrs]): Pauses before this segment
- `breaks_after` (list[BreakAttrs]): Pauses after this segment
- `marks_before` (list[str]): Marker names before this segment
- `marks_after` (list[str]): Marker names after this segment
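
When `phoneme.alphabet` is X-SAMPA but the engine only accepts IPA, a conversion step is needed (ssmd credits an X-SAMPA to IPA table from the Ruby implementation). The sketch below shows the usual greedy longest-match approach with a tiny hand-picked subset of standard X-SAMPA symbols; the table contents and the name `xsampa_to_ipa` are illustrative, not the package's API.

```python
# Small illustrative subset of the X-SAMPA -> IPA mapping (not the full table).
XSAMPA_TO_IPA = {
    "@": "ə", "{": "æ", "3": "ɜ", "A": "ɑ", "E": "ɛ", "I": "ɪ", "O": "ɔ",
    "Q": "ɒ", "U": "ʊ", "V": "ʌ", "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð",
    "N": "ŋ", "r\\": "ɹ", '"': "ˈ", "%": "ˌ", ":": "ː",
}
_MAX_KEY = max(len(k) for k in XSAMPA_TO_IPA)


def xsampa_to_ipa(text: str) -> str:
    """Convert X-SAMPA to IPA using greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest symbol first so multi-character codes like r\ win
        for size in range(min(_MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            # Unmapped characters (e.g. a, h, l) pass through unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa('"h@loU'))  # ˈhəloʊ
```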

#### `VoiceAttrs`

Voice configuration attributes.

**Attributes:**

- `name` (str | None): Voice name (e.g., "sarah", "en-US-Wavenet-A")
- `language` (str | None): Language code (e.g., "en-US")
- `gender` (str | None): Gender ("male", "female", "neutral")
- `variant` (int | None): Voice variant number
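
`VoiceAttrs` describes the voice a passage asks for; the TTS engine still has to map it onto a concrete voice. One possible strategy, sketched under assumptions: an exact name match wins, otherwise the first voice matching language and gender is used, else a default. `VoiceRequest` is a stand-in mirroring the documented fields (it ignores `variant`), and the `Voice` catalog type, sample IDs, and `select_voice` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceRequest:
    """Stand-in mirroring the documented VoiceAttrs fields."""
    name: Optional[str] = None
    language: Optional[str] = None
    gender: Optional[str] = None


@dataclass
class Voice:
    """Hypothetical entry in a TTS engine's voice catalog."""
    voice_id: str
    language: str
    gender: str


def select_voice(req: Optional[VoiceRequest], catalog: list[Voice], default: Voice) -> Voice:
    """Pick a catalog voice: exact name first, then language/gender, else default."""
    if req is None:
        return default
    if req.name:
        for voice in catalog:
            if voice.voice_id.lower() == req.name.lower():
                return voice
    if req.language or req.gender:
        for voice in catalog:
            lang_ok = req.language is None or voice.language.lower() == req.language.lower()
            gender_ok = req.gender is None or voice.gender == req.gender
            if lang_ok and gender_ok:
                return voice
    return default


catalog = [
    Voice("sarah", "en-US", "female"),
    Voice("michael", "en-US", "male"),
    Voice("claire", "fr-FR", "female"),
]
print(select_voice(VoiceRequest(language="fr-FR", gender="female"), catalog, catalog[0]).voice_id)
```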

#### `ProsodyAttrs`

Prosody (volume, rate, pitch) attributes.

**Attributes:**

- `volume` (str | None): Volume level (e.g., "x-loud", "+10dB")
- `rate` (str | None): Speech rate (e.g., "fast", "120%")
- `pitch` (str | None): Pitch level (e.g., "high", "+20%")
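
The example values mix named levels with relative amounts. If an engine needs numbers, the relative forms can be converted: a decibel offset corresponds to a linear amplitude factor of 10^(dB/20), so "+6dB" roughly doubles the amplitude. The sketch below also treats unsigned percentages ("120%") as multipliers and signed ones ("+20%") as relative changes, an assumption consistent with the examples above; named values such as "x-loud" or "fast" return `None` and are left to the engine. Both function names are hypothetical.

```python
import re
from typing import Optional

_DB = re.compile(r"^([+-]?\d+(?:\.\d+)?)dB$")
_PERCENT = re.compile(r"^([+-]?)(\d+(?:\.\d+)?)%$")


def db_to_gain(value: str) -> Optional[float]:
    """'+6dB' -> linear amplitude factor (about 2.0); None if not a dB value."""
    m = _DB.match(value.strip())
    return 10 ** (float(m.group(1)) / 20) if m else None


def percent_to_factor(value: str) -> Optional[float]:
    """'120%' -> 1.2 (multiplier); '+20%' -> 1.2 and '-10%' -> 0.9 (relative)."""
    m = _PERCENT.match(value.strip())
    if not m:
        return None
    sign, amount = m.group(1), float(m.group(2)) / 100
    if sign == "+":
        return 1 + amount
    if sign == "-":
        return 1 - amount
    return amount


print(db_to_gain("+10dB"), percent_to_factor("120%"), percent_to_factor("fast"))
```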

#### `BreakAttrs`

Pause/break attributes.

**Attributes:**

- `time` (str | None): Break duration (e.g., "500ms", "2s")
- `strength` (str | None): Break strength (e.g., "weak", "strong")
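
Engines that insert silence themselves need the break as a number. A minimal resolver for the "500ms"/"2s" forms above, where an explicit `time` takes precedence over `strength`. The strength-to-duration table is a hypothetical default for illustration only, since real engines choose their own pause lengths; `break_to_ms` is likewise a hypothetical name.

```python
import re
from typing import Optional

_TIME = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")

# Hypothetical pause lengths per strength; real engines choose their own.
STRENGTH_MS = {"none": 0, "x-weak": 100, "weak": 250, "medium": 400,
               "strong": 750, "x-strong": 1200}


def break_to_ms(time: Optional[str] = None, strength: Optional[str] = None) -> int:
    """Resolve a break to milliseconds; an explicit time wins over strength."""
    if time:
        m = _TIME.match(time.strip())
        if not m:
            raise ValueError(f"unsupported break time: {time!r}")
        value, unit = float(m.group(1)), m.group(2)
        return round(value * 1000) if unit == "s" else round(value)
    return STRENGTH_MS.get(strength or "medium", STRENGTH_MS["medium"])


print(break_to_ms("500ms"), break_to_ms("2s"), break_to_ms(strength="weak"))
```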

#### `SayAsAttrs`

Say-as interpretation attributes.

**Attributes:**

- `interpret_as` (str): Interpretation type (e.g., "telephone", "date")
- `format` (str | None): Format string (e.g., "mdy" for dates)
- `detail` (int | None): Verbosity level (1-2, platform-specific)
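
Engines without native say-as support have to expand such text themselves. As one example, a telephone number can be read digit by digit. `expand_telephone` and `expand_say_as` are hypothetical, English-only sketches: they read "+" as "plus", drop separators, and return other `interpret_as` types unchanged. Real engines apply locale-specific grouping rules.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def expand_telephone(number: str) -> str:
    """Read a phone number digit by digit, e.g. '+1-555-0123'."""
    words = []
    for ch in number:
        if ch in "0123456789":
            words.append(DIGITS[int(ch)])
        elif ch == "+":
            words.append("plus")
        # Separators such as '-', ' ', '(' and ')' are skipped
    return " ".join(words)


def expand_say_as(text: str, interpret_as: str) -> str:
    """Dispatch on interpret_as; unsupported types are returned unchanged."""
    if interpret_as == "telephone":
        return expand_telephone(text)
    return text


print(expand_say_as("+1-555-0123", "telephone"))
# plus one five five five zero one two three
```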

#### `AudioAttrs`

Audio file attributes.

**Attributes:**

- `src` (str): Audio file URL
- `alt_text` (str | None): Alternative text if audio fails
- `clip_begin` (str | None): Start time for audio clip (e.g., "5s")
- `clip_end` (str | None): End time for audio clip (e.g., "30s")
- `speed` (str | None): Playback speed (e.g., "150%")
- `repeat_count` (int | None): Number of times to repeat
- `repeat_dur` (str | None): Duration to repeat (e.g., "10s")
- `sound_level` (str | None): Volume adjustment (e.g., "+6dB", "-3dB")
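
The clip, speed, and repeat attributes combine into an effective playback length. Assuming SSML-style `audio` semantics (play from `clip_begin` to `clip_end` at `speed` percent, `repeat_count` times), the sketch below computes that length in seconds from the source file's duration. It handles only plain "s"/"ms" offsets, ignores `repeat_dur`, and both helper names are hypothetical.

```python
import re
from typing import Optional

_OFFSET = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def offset_seconds(value: str) -> float:
    """Parse '5s' or '250ms' into seconds."""
    m = _OFFSET.match(value.strip())
    if not m:
        raise ValueError(f"unsupported time offset: {value!r}")
    number, unit = float(m.group(1)), m.group(2)
    return number / 1000 if unit == "ms" else number


def playback_seconds(source_length: float, clip_begin: Optional[str] = None,
                     clip_end: Optional[str] = None, speed: Optional[str] = None,
                     repeat_count: Optional[int] = None) -> float:
    """Effective duration of an audio clip, given the source length in seconds."""
    begin = offset_seconds(clip_begin) if clip_begin else 0.0
    end = min(offset_seconds(clip_end) if clip_end else source_length, source_length)
    factor = float(speed.rstrip("%")) / 100 if speed else 1.0  # "150%" -> 1.5x faster
    return max(end - begin, 0.0) / factor * (repeat_count or 1)


# 60 s file, clip 5s-30s at 150% speed, played twice: about 33.3 s
print(playback_seconds(60.0, "5s", "30s", "150%", 2))
```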

### Complete Example

See `examples/parser_demo.py` for a comprehensive demonstration of all parser features:

```bash
python examples/parser_demo.py
```

The demo shows:

- Basic segment parsing
- Text transformations (say-as, substitution, phoneme)
- Voice block handling
- Complete TTS workflow with sentence assembly
- Prosody and language annotations
- Advanced sentence parsing options
- Mock TTS integration

## API Reference

### Module Functions

#### `ssmd.to_ssml(ssmd_text, **config)` → `str`

Convert SSMD markup to SSML.

**Parameters:**

- `ssmd_text` (str): SSMD markdown text
- `**config`: Optional configuration parameters

**Returns:** SSML string

#### `ssmd.to_text(ssmd_text, **config)` → `str`

Convert SSMD to plain text (strips all markup).

**Parameters:**

- `ssmd_text` (str): SSMD markdown text
- `**config`: Optional configuration parameters

**Returns:** Plain text string

#### `ssmd.from_ssml(ssml_text, **config)` → `str`

Convert SSML to SSMD format.

**Parameters:**

- `ssml_text` (str): SSML XML string
- `**config`: Optional configuration parameters

**Returns:** SSMD markdown string

### Document Class

#### `Document(content="", config=None, capabilities=None)`

Main document container for building and managing TTS content.

**Parameters:**

- `content` (str): Optional initial SSMD content
- `config` (dict): Configuration options
- `capabilities` (TTSCapabilities | str): TTS capabilities preset or object

**Building Methods:**

- `add(text)` → Add text without separator (returns self for chaining)
- `add_sentence(text)` → Add text with `\n` separator
- `add_paragraph(text)` → Add text with `\n\n` separator

**Export Methods:**

- `to_ssml()` → Export to SSML string
- `to_ssmd()` → Export to SSMD string
- `to_text()` → Export to plain text

**Class Methods:**

- `Document.from_ssml(ssml, **config)` → Create from SSML
- `Document.from_text(text, **config)` → Create from text

**Properties:**

- `ssmd` → Raw SSMD content
- `config` → Configuration dict
- `capabilities` → TTS capabilities

**List-like Interface:**

- `len(doc)` → Number of sentences
- `doc[i]` → Get sentence by index (as SSML)
- `doc[i] = text` → Replace sentence
- `del doc[i]` → Delete sentence
- `doc += text` → Append content

**Iteration:**

- `sentences()` → Iterator yielding SSML sentences
- `sentences(as_documents=True)` → Iterator yielding Document objects

**Editing Methods:**

- `insert(index, text, separator="")` → Insert text at index
- `remove(index)` → Remove sentence
- `clear()` → Remove all content
- `replace(old, new, count=-1)` → Replace text

**Advanced Methods:**

- `merge(other_doc, separator="\n\n")` → Merge another document
- `split()` → Split into sentence Documents
- `get_fragment(index)` → Get raw fragment by index

## Real-World TTS Example

```python
import asyncio

from ssmd import Document


# Placeholder for your TTS engine (e.g. pyttsx3, kokoro-tts)
class TTSEngine:
    async def speak(self, ssml: str):
        """Speak SSML text."""
        # Implementation depends on your TTS engine
        pass

    async def wait_until_done(self):
        """Wait for speech to complete."""
        pass


async def read_document(content: str, tts: TTSEngine):
    """Read an SSMD document sentence by sentence."""
    doc = Document(content, config={'auto_sentence_tags': True})

    print(f"Reading document with {len(doc)} sentences...")

    for i in range(len(doc)):
        sentence = doc[i]  # SSML for the i-th sentence
        print(f"[{i + 1}/{len(doc)}] Speaking...")
        await tts.speak(sentence)
        await tts.wait_until_done()

    print("Done!")


# Usage
document = """
# Welcome
Hello and *welcome* to our presentation!
Today we'll discuss some exciting topics.

# Topic 1
First ...500ms let's talk about SSMD.
It makes writing TTS content [much easier](v: 4, p: 4)!

# Conclusion
Thank you for listening @end_marker!
"""

# Run the coroutine; inside an already running event loop, use
# `await read_document(document, TTSEngine())` instead
asyncio.run(read_document(document, TTSEngine()))
```

## Development

### Running Tests

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=ssmd --cov-report=html

# Run specific test file
pytest tests/test_basic.py -v
```

### Code Quality

```bash
# Format with ruff
ruff format ssmd/ tests/

# Lint
ruff check ssmd/ tests/

# Type check
mypy ssmd/
```

## Specification

This implementation follows the [SSMD Specification](SPECIFICATION.md) with additional features inspired by the [original Ruby SSMD specification](https://github.com/machisuji/ssmd/blob/master/SPECIFICATION.md).

### Implemented Features

- ✅ Text
- ✅ Emphasis (`*text*`, `**strong**`, `_reduced_`, `[text](emphasis: none)`)
- ✅ Break (`...500ms`, `...2s`, `...n/w/c/s/p`)
- ✅ Language (`[text](en)`, `[text](en-GB)`)
- ✅ Voice inline (`[text](voice: Joanna)`, `[text](voice: en-US, gender: female)`)
- ✅ Voice directives (`@voice: name`)
- ✅ Mark (`@marker`)
- ✅ Paragraph (`\n\n`)
- ✅ Phoneme (`[text](ph: xsampa)`, `[text](ipa: ipa)`)
- ✅ Prosody shorthand (`++loud++`, `>>fast>>`, `^^high^^`)
- ✅ Prosody explicit (`[text](vrp: 555)`, `[text](v: 5)`)
- ✅ Substitution (`[text](sub: alias)`)
- ✅ Say-as (`[text](as: telephone)`, `[text](as: date, detail: 1)`)
- ✅ Audio (`[desc](url.mp3 alt)`, `[desc](url.mp3 clip: 5s-30s, speed: 120%)`)
- ✅ Headings (`# ## ###`)
- ✅ Extensions (`[text](ext: whisper)`, Google TTS styles)
- ✅ Auto-sentence tags (`<s>`)
- ✅ **SSML ↔ SSMD bidirectional conversion**

## Related Projects

- **[SSMD (Ruby)](https://github.com/machisuji/ssmd)** - Original reference implementation
- **[SSMD (JavaScript)](https://github.com/fabien88/ssmd)** - JavaScript implementation
- **[Speech Markdown](https://www.speechmarkdown.org/)** - Alternative specification

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Original SSMD specification by [machisuji](https://github.com/machisuji)
- JavaScript implementation by [fabien88](https://github.com/fabien88)
- X-SAMPA to IPA conversion table from the Ruby implementation

## Links

- **Homepage:** https://github.com/holgern/ssmd
- **PyPI:** https://pypi.org/project/ssmd/
- **Issues:** https://github.com/holgern/ssmd/issues
- **Documentation:** https://ssmd.readthedocs.io/