lattifai 0.2.4__py3-none-any.whl → 0.4.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,811 @@
1
+ Metadata-Version: 2.4
2
+ Name: lattifai
3
+ Version: 0.4.0
4
+ Summary: Lattifai Python SDK: Seamless Integration with Lattifai's Speech and Video AI Services
5
+ Author-email: Lattifai Technologies <tech@lattifai.com>
6
+ Maintainer-email: Lattice <tech@lattifai.com>
7
+ License: MIT License
8
+
9
+ Copyright (c) 2025 Lattifai.
10
+
11
+ Permission is hereby granted, free of charge, to any person obtaining a copy
12
+ of this software and associated documentation files (the "Software"), to deal
13
+ in the Software without restriction, including without limitation the rights
14
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
15
+ copies of the Software, and to permit persons to whom the Software is
16
+ furnished to do so, subject to the following conditions:
17
+
18
+ The above copyright notice and this permission notice shall be included in all
19
+ copies or substantial portions of the Software.
20
+
21
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
22
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
23
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
24
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
25
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
26
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
27
+ SOFTWARE.
28
+ Project-URL: Homepage, https://github.com/lattifai/lattifai-python
29
+ Project-URL: Documentation, https://github.com/lattifai/lattifai-python/README.md
30
+ Project-URL: Bug Tracker, https://github.com/lattifai/lattifai-python/issues
31
+ Project-URL: Discussions, https://github.com/lattifai/lattifai-python/discussions
32
+ Project-URL: Changelog, https://github.com/lattifai/lattifai-python/CHANGELOG.md
33
+ Keywords: lattifai,speech recognition,video analysis,ai,sdk,api client
34
+ Classifier: Development Status :: 5 - Production/Stable
35
+ Classifier: Intended Audience :: Developers
36
+ Classifier: Intended Audience :: Science/Research
37
+ Classifier: License :: OSI Approved :: Apache Software License
38
+ Classifier: Programming Language :: Python :: 3.9
39
+ Classifier: Programming Language :: Python :: 3.10
40
+ Classifier: Programming Language :: Python :: 3.11
41
+ Classifier: Programming Language :: Python :: 3.12
42
+ Classifier: Programming Language :: Python :: 3.13
43
+ Classifier: Programming Language :: Python :: 3.14
44
+ Classifier: Operating System :: MacOS :: MacOS X
45
+ Classifier: Operating System :: POSIX :: Linux
46
+ Classifier: Operating System :: Microsoft :: Windows
47
+ Classifier: Topic :: Multimedia :: Sound/Audio
48
+ Classifier: Topic :: Multimedia :: Video
49
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
50
+ Requires-Python: >=3.9
51
+ Description-Content-Type: text/markdown
52
+ License-File: LICENSE
53
+ Requires-Dist: lattifai-core>=0.2.1
54
+ Requires-Dist: httpx
55
+ Requires-Dist: python-dotenv
56
+ Requires-Dist: lhotse>=1.26.0
57
+ Requires-Dist: colorful>=0.5.6
58
+ Requires-Dist: pysubs2
59
+ Requires-Dist: praatio
60
+ Requires-Dist: tgt
61
+ Requires-Dist: onnxruntime
62
+ Requires-Dist: resampy
63
+ Requires-Dist: g2p-phonemizer==0.1.1
64
+ Requires-Dist: wtpsplit>=2.1.6
65
+ Requires-Dist: av
66
+ Requires-Dist: questionary>=2.0
67
+ Requires-Dist: yt-dlp
68
+ Requires-Dist: pycryptodome
69
+ Requires-Dist: google-genai
70
+ Provides-Extra: numpy
71
+ Requires-Dist: numpy; extra == "numpy"
72
+ Provides-Extra: test
73
+ Requires-Dist: pytest; extra == "test"
74
+ Requires-Dist: pytest-cov; extra == "test"
75
+ Requires-Dist: pytest-asyncio; extra == "test"
76
+ Requires-Dist: ruff; extra == "test"
77
+ Requires-Dist: numpy; extra == "test"
78
+ Provides-Extra: all
79
+ Requires-Dist: numpy; extra == "all"
80
+ Requires-Dist: pytest; extra == "all"
81
+ Requires-Dist: pytest-cov; extra == "all"
82
+ Requires-Dist: pytest-asyncio; extra == "all"
83
+ Requires-Dist: ruff; extra == "all"
84
+ Dynamic: license-file
85
+
86
+ <div align="center">
87
+ <img src="https://raw.githubusercontent.com/lattifai/lattifai-python/main/assets/logo.png" width=256>
88
+
89
+ [![PyPI version](https://badge.fury.io/py/lattifai.svg)](https://badge.fury.io/py/lattifai)
90
+ [![Python Versions](https://img.shields.io/pypi/pyversions/lattifai.svg)](https://pypi.org/project/lattifai)
91
+ [![PyPI Status](https://pepy.tech/badge/lattifai)](https://pepy.tech/project/lattifai)
92
+ </div>
93
+
94
+ <p align="center">
95
+ 🌐 <a href="https://lattifai.com"><b>Official Website</b></a> &nbsp;&nbsp; | &nbsp;&nbsp; 🖥️ <a href="https://github.com/lattifai/lattifai-python">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/Lattifai/Lattice-1-Alpha">Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://lattifai.com/blogs">Blog</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://discord.gg/kvF4WsBRK8"><img src="https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white" alt="Discord" style="vertical-align: middle;"></a>
96
+ </p>
97
+
98
+
99
+ # LattifAI Python
100
+
101
+ Advanced forced alignment and subtitle generation powered by the [Lattice-1-Alpha](https://huggingface.co/Lattifai/Lattice-1-Alpha) model.
102
+
103
+ ## Installation
104
+
105
+ ```bash
106
+ pip install install-k2
107
+ # The installer automatically detects and uses your already-installed PyTorch version (up to 2.8).
108
+ install-k2 # Install k2
109
+
110
+ pip install lattifai
111
+ ```
112
+ > **⚠️ Important**: You must run `install-k2` before using the lattifai library.
113
+ ```
114
+ > install-k2 --help
115
+ usage: install-k2 [-h] [--system {linux,darwin,windows}] [--dry-run] [--torch-version TORCH_VERSION]
116
+
117
+ Auto-install the latest k2 wheel for your environment.
118
+
119
+ optional arguments:
120
+ -h, --help show this help message and exit
121
+ --system {linux,darwin,windows}
122
+ Override OS detection. Valid values: linux, darwin (macOS), windows. Default: auto-detect
123
+ --dry-run Show what would be installed without making changes.
124
+ --torch-version TORCH_VERSION
125
+ Specify torch version (e.g., 2.8.0). If not specified, will auto-detect or use latest available.
126
+ ```
127
+
128
+
129
+ ## Quick Start
130
+
131
+ ### Command Line
132
+
133
+ The library provides two equivalent commands: `lai` (recommended for convenience) and `lattifai`.
134
+
135
+ ```bash
136
+ # Align audio with subtitle (using lai command)
137
+ lai align audio.wav subtitle.srt output.srt
138
+
139
+ # Or use the full command
140
+ lattifai align audio.wav subtitle.srt output.srt
141
+
142
+ # Process YouTube videos with intelligent workflow
143
+ lai agent --youtube https://www.youtube.com/watch?v=VIDEO_ID
144
+
145
+ # Download and align YouTube content directly
146
+ lai youtube https://www.youtube.com/watch?v=VIDEO_ID
147
+
148
+ # Convert subtitle format
149
+ lai subtitle convert input.srt output.vtt
150
+ ```
151
+
152
+ > **💡 Tip**: Use `lai` for faster typing in your daily workflow!
153
+
154
+ #### Command Quick Reference
155
+
156
+ | Command | Use Case | Best For |
157
+ |---------|----------|----------|
158
+ | `lai align` | Align existing audio + subtitle files | Local files, custom workflows |
159
+ | `lai youtube` | Download & align YouTube content | Quick one-off YouTube processing |
160
+ | `lai agent` | Intelligent YouTube workflow with retries | Production, batch jobs, automation |
161
+ | `lai subtitle` | Convert subtitle formats | Format conversion only |
162
+
163
+ #### lai align options
164
+ ```
165
+ > lai align --help
166
+ Usage: lattifai align [OPTIONS] INPUT_AUDIO_PATH INPUT_SUBTITLE_PATH OUTPUT_SUBTITLE_PATH
167
+
168
+ Command used to align audio with subtitles
169
+
170
+ Options:
171
+ -F, --input_format [srt|vtt|ass|ssa|sub|sbv|txt|auto|gemini] Input subtitle format.
172
+ -S, --split_sentence Re-segment subtitles by semantics.
173
+ -W, --word_level Include word-level alignment timestamps.
174
+ -D, --device [cpu|cuda|mps] Device to use for inference.
175
+ -M, --model_name_or_path TEXT Model name or path for alignment.
176
+ --api_key TEXT API key for LattifAI.
177
+ --help Show this message and exit.
178
+ ```
179
+
180
+ #### lai youtube command
181
+
182
+ Download and align YouTube videos in one step. Automatically downloads the media, fetches subtitles (falling back to Gemini transcription when none are available), and performs forced alignment.
183
+
184
+ ```bash
185
+ # Basic usage
186
+ lai youtube https://www.youtube.com/watch?v=VIDEO_ID
187
+
188
+ # Common options: audio format, sentence splitting, word-level, GPU
189
+ lai youtube --media-format mp3 --split-sentence --word-level --device mps \
190
+ --output-dir ./output --output-format srt https://www.youtube.com/watch?v=VIDEO_ID
191
+
192
+ # Use Gemini for transcription fallback
193
+ # Gemini API Key: Get yours at https://aistudio.google.com/apikey
194
+ # Note: your Gemini API key is never logged or stored by this library
195
+ lai youtube --gemini-api-key YOUR_GEMINI_KEY https://www.youtube.com/watch?v=VIDEO_ID
196
+ ```
197
+
198
+ **Options**:
199
+ ```
200
+ > lai youtube --help
201
+ Usage: lattifai youtube [OPTIONS] YT_URL
202
+
203
+ Download media and subtitles from YouTube for further alignment.
204
+
205
+ Options:
206
+ -M, --media-format [mp3|wav|m4a|aac|flac|ogg|opus|aiff|mp4|webm|mkv|avi|mov] Media format for YouTube download.
207
+ -S, --split-sentence Re-segment subtitles by semantics.
208
+ -W, --word-level Include word-level alignment timestamps.
209
+ -O, --output-dir PATH Output directory (default: current directory).
210
+ -D, --device [cpu|cuda|mps] Device to use for inference.
211
+ -M, --model-name-or-path TEXT Model name or path for alignment.
212
+ --api-key TEXT API key for LattifAI.
213
+ --gemini-api-key TEXT Gemini API key for transcription fallback.
214
+ -F, --output-format [srt|vtt|ass|ssa|sub|sbv|txt|json|TextGrid] Subtitle output format.
215
+ --help Show this message and exit.
216
+ ```
217
+
218
+ #### lai agent command
219
+
220
+ **Intelligent Agentic Workflow** - Process YouTube videos through an advanced multi-step workflow with automatic retries, smart file management, and comprehensive error handling.
221
+
222
+ ```bash
223
+ # Basic usage
224
+ lai agent --youtube https://www.youtube.com/watch?v=VIDEO_ID
225
+
226
+ # Production workflow with retries, verbose logging, and force overwrite
227
+ lai agent --youtube --media-format mp4 --output-format TextGrid \
228
+ --split-sentence --word-level --device mps --max-retries 2 --verbose --force \
229
+ --output-dir ./outputs https://www.youtube.com/watch?v=VIDEO_ID
230
+ ```
231
+
232
+ **Key Features**:
233
+ - **🔄 Automatic Retry Logic**: Configurable retry mechanism for failed steps
234
+ - **📁 Smart File Management**: Detects existing files and prompts for action
235
+ - **🎯 Intelligent Workflow**: Multi-step pipeline with dependency management
236
+ - **🛡️ Error Recovery**: Graceful handling of failures with detailed logging
237
+ - **📊 Rich Output**: Comprehensive results with metadata and file paths
238
+ - **⚡ Async Processing**: Efficient parallel execution of independent tasks
239
+
240
+ **Options**:
241
+ ```
242
+ > lai agent --help
243
+ Usage: lattifai agent [OPTIONS] URL
244
+
245
+ LattifAI Agentic Workflow Agent
246
+
247
+ Process multimedia content through intelligent agent-based pipelines.
248
+
249
+ Options:
250
+ --youtube, --yt Process YouTube URL through agentic workflow.
251
+ --gemini-api-key TEXT Gemini API key for transcription.
252
+ --media-format [mp3|wav|m4a|aac|opus|mp4|webm|mkv|...] Media format for YouTube download.
253
+ --output-format [srt|vtt|ass|ssa|sub|sbv|txt|json|...] Subtitle output format.
254
+ --output-dir PATH Output directory (default: current directory).
255
+ --max-retries INTEGER Maximum retries for failed steps.
256
+ -S, --split-sentence Re-segment subtitles by semantics.
257
+ --word-level Include word-level alignment timestamps.
258
+ --verbose, -v Enable verbose logging.
259
+ --force, -f Force overwrite without confirmation.
260
+ --help Show this message and exit.
261
+ ```
262
+
263
+ **When to use `lai agent` vs `lai youtube`**:
264
+ - **Use `lai agent`**: For production workflows, batch processing, advanced error handling, and when you need retry logic
265
+ - **Use `lai youtube`**: For quick one-off downloads and alignment with minimal overhead
266
+
267
+ #### Understanding --split_sentence
268
+
269
+ The `--split_sentence` option performs intelligent sentence re-splitting based on punctuation and semantic boundaries. This is especially useful when processing subtitles that combine multiple semantic units in a single segment, such as:
270
+
271
+ - **Mixed content**: Non-speech elements (e.g., `[APPLAUSE]`, `[MUSIC]`) followed by actual dialogue
272
+ - **Natural punctuation boundaries**: Colons, periods, and other punctuation marks that indicate semantic breaks
273
+ - **Concatenated phrases**: Multiple distinct utterances joined together without proper separation
274
+
275
+ **Example transformations**:
276
+ ```
277
+ Input: "[APPLAUSE] >> MIRA MURATI: Thank you all"
278
+ Output: ["[APPLAUSE]", ">> MIRA MURATI: Thank you all"]
279
+
280
+ Input: "[MUSIC] Welcome back. Today we discuss AI."
281
+ Output: ["[MUSIC]", "Welcome back.", "Today we discuss AI."]
282
+ ```
283
+
284
+ This feature helps improve alignment accuracy by:
285
+ 1. Respecting punctuation-based semantic boundaries
286
+ 2. Separating distinct utterances for more precise timing
287
+ 3. Maintaining semantic context for each independent phrase
288
+
289
+ **Usage**:
290
+ ```bash
291
+ lai align --split_sentence audio.wav subtitle.srt output.srt
292
+ ```
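The example transformations above can be mimicked with a naive illustration. This is only a rough sketch of the observable behavior, not the model's actual semantic logic: it separates bracketed non-speech tags (e.g. `[MUSIC]`) and splits on sentence-final punctuation.

```python
import re

def naive_split(segment):
    """Rough illustration of --split_sentence output: separate bracketed
    non-speech tags, then split remaining text at sentence-final punctuation."""
    parts = re.findall(r"\[[A-Z ]+\]|[^\[\]]+?(?:[.!?]|$)", segment)
    return [p.strip() for p in parts if p.strip()]

print(naive_split("[MUSIC] Welcome back. Today we discuss AI."))
print(naive_split("[APPLAUSE] >> MIRA MURATI: Thank you all"))
```

The real feature additionally uses semantic boundaries, so its splits will not always match this regex-based approximation.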
293
+
294
+ #### Understanding --word_level
295
+
296
+ The `--word_level` option enables word-level alignment, providing precise timing information for each individual word in the audio. When enabled, the output includes detailed word boundaries within each subtitle segment, allowing for fine-grained synchronization and analysis.
297
+
298
+ **Key features**:
299
+ - **Individual word timestamps**: Each word gets its own start and end time
300
+ - **Format-specific output**:
301
+ - **JSON (Recommended)**: Full alignment details stored in `alignment.word` field of each segment, preserving all word-level timing information in a structured format
302
+ - **TextGrid**: Separate "words" tier alongside the "utterances" tier for linguistic analysis
303
+ - **TXT**: Each word on a separate line with timestamp range: `[start-end] word`
304
+ - **Standard subtitle formats** (SRT, VTT, ASS, etc.): Each word becomes a separate subtitle event
305
+
306
+ > **💡 Recommended**: Use JSON format (`output.json`) to preserve complete word-level alignment data. Other formats may lose some structural information.
307
+
308
+ **Example output formats**:
309
+
310
+ **JSON format** (with word-level details; each word entry is `[word, start, duration, score]`):
+ ```json
+ [
+   {
+     "id": "6",
+     "recording_id": "",
+     "start": 24.52,
+     "duration": 9.1,
+     "channel": 0,
+     "text": "We will start with why it is so important to us to have a product that we can make truly available and broadly available to everyone.",
+     "custom": {
+       "score": 0.8754
+     },
+     "alignment": {
+       "word": [
+         ["We", 24.6, 0.14, 1.0],
+         ["will", 24.74, 0.14, 1.0],
+         ["start", 24.88, 0.46, 0.771],
+         ["with", 25.34, 0.28, 0.9538],
+         ["why", 26.2, 0.36, 1.0],
+         ["it", 26.56, 0.14, 0.9726],
+         ["is", 26.74, 0.02, 0.6245],
+         ["so", 26.76, 0.16, 0.6615],
+         ["important", 26.92, 0.54, 0.9257],
+         ["to", 27.5, 0.1, 1.0],
+         ["us", 27.6, 0.34, 0.7955],
+         ["to", 28.04, 0.08, 0.8545],
+         ["have", 28.16, 0.46, 0.9994],
+         ["a", 28.76, 0.06, 1.0],
+         ["product", 28.82, 0.56, 0.9975],
+         ["that", 29.38, 0.08, 0.5602],
+         ["we", 29.46, 0.16, 0.7017],
+         ["can", 29.62, 0.22, 1.0],
+         ["make", 29.84, 0.32, 0.9643],
+         ["truly", 30.42, 0.32, 0.6737],
+         ["available", 30.74, 0.6, 0.9349],
+         ["and", 31.4, 0.2, 0.4114],
+         ["broadly", 31.6, 0.44, 0.6726],
+         ["available", 32.04, 0.58, 0.9108],
+         ["to", 32.72, 0.06, 1.0],
+         ["everyone.", 32.78, 0.64, 0.7886]
+       ]
+     }
+   }
+ ]
+ ```
486
+
487
+ **TXT format** (word-level):
488
+ ```
489
+ [0.50-1.20] Hello
490
+ [1.20-2.30] world
491
+ ```
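A minimal sketch for consuming this TXT layout, assuming exactly the `[start-end] word` pattern shown above (the parser itself is hypothetical, not part of the SDK):

```python
import re

def parse_word_lines(text):
    """Parse '[start-end] word' lines into (word, start, end) tuples."""
    entries = []
    for line in text.splitlines():
        m = re.match(r"\[(\d+\.\d+)-(\d+\.\d+)\]\s+(.+)", line.strip())
        if m:
            entries.append((m.group(3), float(m.group(1)), float(m.group(2))))
    return entries

print(parse_word_lines("[0.50-1.20] Hello\n[1.20-2.30] world"))
# [('Hello', 0.5, 1.2), ('world', 1.2, 2.3)]
```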
492
+
493
+ **TextGrid format** (Praat-compatible):
494
+ ```
495
+ Two tiers created:
496
+ - "utterances" tier: Full segments with original text
497
+ - "words" tier: Individual words with precise boundaries
498
+ ```
499
+
500
+ **Use cases**:
501
+ - **Linguistic analysis**: Study pronunciation patterns, speech timing, and prosody
502
+ - **Accessibility**: Create more granular captions for hearing-impaired users
503
+ - **Video/Audio editing**: Enable precise word-level subtitle synchronization
504
+ - **Karaoke applications**: Highlight individual words as they are spoken
505
+ - **Language learning**: Provide precise word boundaries for pronunciation practice
506
+
507
+ **Usage**:
508
+ ```bash
509
+ # Generate word-level aligned JSON
510
+ lai align --word_level audio.wav subtitle.srt output.json
511
+
512
+ # Create TextGrid file for Praat analysis
513
+ lai align --word_level audio.wav subtitle.srt output.TextGrid
514
+
515
+ # Word-level TXT output
516
+ lai align --word_level audio.wav subtitle.srt output.txt
517
+
518
+ # Standard subtitle with word-level events
519
+ lai align --word_level audio.wav subtitle.srt output.srt
520
+ ```
521
+
522
+ **Combined with --split_sentence**:
523
+ ```bash
524
+ # Optimal alignment: semantic splitting + word-level details
525
+ lai align --split_sentence --word_level audio.wav subtitle.srt output.json
526
+ ```
527
+
528
+ ### Python API
529
+
530
+ ```python
531
+ from lattifai import LattifAI
532
+
533
+ client = LattifAI() # api_key will be read from LATTIFAI_API_KEY if not provided
534
+ alignments, output_path = client.alignment(
535
+ audio="audio.wav",
536
+ subtitle="subtitle.srt",
537
+ output_subtitle_path="output.srt",
538
+ )
539
+ ```
540
+
541
+ Need to run inside an async application? Use the drop-in asynchronous client:
542
+
543
+ ```python
544
+ import asyncio
545
+ from lattifai import AsyncLattifAI
546
+
547
+
548
+ async def main():
549
+ async with AsyncLattifAI() as client:
550
+ alignments, output_path = await client.alignment(
551
+ audio="audio.wav",
552
+ subtitle="subtitle.srt",
553
+ split_sentence=False,
554
+ output_subtitle_path="output.srt",
555
+ )
556
+
557
+
558
+ asyncio.run(main())
559
+ ```
560
+
561
+ Both clients return a list of `Supervision` segments with timing information and, if provided, the path where the aligned subtitle was written.
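Once you have those segments, downstream bookkeeping only needs their timing. A tiny self-contained sketch, written against plain `(start, end)` pairs so it runs without the SDK installed:

```python
def speech_stats(spans):
    """Return (segment_count, total_speech_seconds) for (start, end) pairs
    such as those carried by the returned Supervision segments."""
    total = sum(end - start for start, end in spans)
    return len(spans), round(total, 2)

print(speech_stats([(0.5, 1.2), (1.2, 2.3)]))  # (2, 1.8)
```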
562
+
563
+ ## Supported Formats
564
+
565
+ **Audio**: WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, AIFF
566
+
567
+ **Video**: MP4, MKV, MOV, WEBM, AVI
568
+
569
+ **Subtitle Input**: SRT, VTT, ASS, SSA, SUB, SBV, TXT (plain text), Gemini (Google Gemini transcript format)
570
+
571
+ **Subtitle Output**: All input formats plus TextGrid (Praat format for linguistic analysis)
572
+
573
+ ## API Reference
574
+
575
+ ### LattifAI (sync)
576
+
577
+ ```python
578
+ LattifAI(
579
+ api_key: Optional[str] = None,
580
+ model_name_or_path: str = 'Lattifai/Lattice-1-Alpha',
581
+ device: str = 'cpu', # 'cpu', 'cuda', or 'mps'
582
+ )
583
+ ```
584
+
585
+ ### AsyncLattifAI (async)
586
+
587
+ ```python
588
+ AsyncLattifAI(
589
+ api_key: Optional[str] = None,
590
+ model_name_or_path: str = 'Lattifai/Lattice-1-Alpha',
591
+ device: str = 'cpu',
592
+ )
593
+ ```
594
+
595
+ Use `async with AsyncLattifAI() as client:` or call `await client.close()` when you are done to release the underlying HTTP session.
596
+
597
+ ### alignment()
598
+
599
+ ```python
600
+ client.alignment(
601
+ audio: str, # Path to audio file
602
+ subtitle: str, # Path to subtitle/text file
603
+ format: Optional[str] = None, # Input format: 'srt', 'vtt', 'ass', 'txt', 'gemini', or 'auto' (auto-detect if None)
604
+ split_sentence: bool = False, # Smart sentence splitting based on punctuation semantics
605
+ return_details: bool = False, # Enable word-level alignment details
606
+ output_subtitle_path: Optional[str] = None
607
+ ) -> Tuple[List[Supervision], Optional[str]] # await client.alignment(...) for AsyncLattifAI
608
+ ```
609
+
610
+ **Parameters**:
611
+ - `audio`: Path to the audio file to be aligned
612
+ - `subtitle`: Path to the subtitle or text file
613
+ - `format`: Input subtitle format. Supported values: 'srt', 'vtt', 'ass', 'txt', 'gemini', 'auto'. When set to None or 'auto', the format is automatically detected from the file extension. Additional formats (ssa, sub, sbv) are supported through automatic format detection
614
+ - `split_sentence`: Enable intelligent sentence re-splitting (default: False). Set to True when subtitles combine multiple semantic units (non-speech elements + dialogue, or multiple sentences) that would benefit from separate timing alignment
615
+ - `return_details`: Enable word-level alignment details (default: False). When True, each `Supervision` object includes an `alignment` field with word-level timestamps, accessible via `supervision.alignment['word']`. This provides precise timing for each individual word within the segment
616
+ - `output_subtitle_path`: Output path for aligned subtitle (optional)
617
+
618
+ **Returns**:
619
+ - A tuple containing:
620
+ - `alignments`: List of aligned `Supervision` objects with timing information
621
+ - `output_subtitle_path`: Path where the subtitle was written (if `output_subtitle_path` was provided)
622
+
623
+ ## Examples
624
+
625
+ ### Basic Text Alignment
626
+
627
+ ```python
628
+ from lattifai import LattifAI
629
+
630
+ client = LattifAI()
631
+ alignments, output_path = client.alignment(
632
+ audio="speech.wav",
633
+ subtitle="transcript.txt",
634
+ format="txt",
635
+ split_sentence=False,
636
+ output_subtitle_path="output.srt"
637
+ )
638
+ ```
639
+
640
+ ### Word-Level Alignment
641
+
642
+ ```python
643
+ from lattifai import LattifAI
644
+
645
+ client = LattifAI()
646
+ alignments, output_path = client.alignment(
647
+ audio="speech.wav",
648
+ subtitle="transcript.srt",
649
+ return_details=True, # Enable word-level alignment
650
+ output_subtitle_path="output.json" # JSON format preserves word-level data
651
+ )
652
+
653
+ # Access word-level timestamps
654
+ for segment in alignments:
655
+ print(f"Segment: {segment.text} ({segment.start:.2f}s - {segment.end:.2f}s)")
656
+ if segment.alignment and 'word' in segment.alignment:
657
+ for word in segment.alignment['word']:
658
+ print(f" Word: {word.symbol} ({word.start:.2f}s - {word.end:.2f}s)")
659
+ ```
660
+
661
+ ### Batch Processing
662
+
663
+ ```python
664
+ from pathlib import Path
665
+ from lattifai import LattifAI
666
+
667
+ client = LattifAI()
668
+ audio_dir = Path("audio_files")
669
+ subtitle_dir = Path("subtitles")
670
+ output_dir = Path("aligned")
671
+
672
+ for audio in audio_dir.glob("*.wav"):
673
+ subtitle = subtitle_dir / f"{audio.stem}.srt"
674
+ if subtitle.exists():
675
+ alignments, output_path = client.alignment(
676
+ audio=audio,
677
+ subtitle=subtitle,
678
+ output_subtitle_path=output_dir / f"{audio.stem}_aligned.srt"
679
+ )
680
+ ```
681
+
682
+ ### GPU Acceleration
683
+
684
+ ```python
685
+ from lattifai import LattifAI
686
+
687
+ # NVIDIA GPU
688
+ client = LattifAI(device='cuda')
689
+
690
+ # Apple Silicon
691
+ client = LattifAI(device='mps')
692
+
693
+ # CLI
694
+ # lai align --device mps audio.wav subtitle.srt output.srt
695
+ ```
696
+
697
+ ### YouTube Processing with Agent Workflow
698
+
699
+ ```python
700
+ import asyncio
701
+ from lattifai.workflows import YouTubeSubtitleAgent
702
+
703
+ async def process_youtube():
704
+ # Initialize agent with configuration
705
+ agent = YouTubeSubtitleAgent(
706
+ gemini_api_key="your-gemini-api-key",
707
+ video_format="mp4", # or "mp3", "wav", etc.
708
+ output_format="srt",
709
+ max_retries=2,
710
+ split_sentence=True,
711
+ word_level=True,
712
+ force_overwrite=False
713
+ )
714
+
715
+ # Process YouTube URL
716
+ result = await agent.process_youtube_url(
717
+ url="https://www.youtube.com/watch?v=VIDEO_ID",
718
+ output_dir="./output",
719
+ output_format="srt"
720
+ )
721
+
722
+ # Access results
723
+ print(f"Title: {result['metadata']['title']}")
724
+ print(f"Duration: {result['metadata']['duration']} seconds")
725
+ print(f"Subtitle count: {result['subtitle_count']}")
726
+
727
+ # Access generated files
728
+ for format_name, file_path in result['exported_files'].items():
729
+ print(f"{format_name.upper()}: {file_path}")
730
+
731
+ # Run the async workflow
732
+ asyncio.run(process_youtube())
733
+ ```
734
+
735
+ ## Configuration
736
+
737
+ ### API Key Setup
738
+
739
+ First, create your API key at [https://lattifai.com/dashboard/api-keys](https://lattifai.com/dashboard/api-keys)
740
+
741
+ **Recommended: Using .env file**
742
+
743
+ Create a `.env` file in your project root:
744
+ ```bash
745
+ LATTIFAI_API_KEY=your-api-key
746
+ ```
747
+
748
+ The library automatically loads the `.env` file (python-dotenv is included as a dependency).
749
+
750
+ **Alternative: Environment variable**
751
+ ```bash
752
+ export LATTIFAI_API_KEY="your-api-key"
753
+ ```
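For reference, loading a `.env` file boils down to putting `KEY=VALUE` pairs into the process environment. A hypothetical minimal parser sketch (python-dotenv's real implementation handles quoting, comments, and more):

```python
import os

def load_env_line(line):
    """Put one KEY=VALUE line into the environment without overriding
    variables that are already set (mirroring python-dotenv's default)."""
    key, _, value = line.strip().partition("=")
    if key and not key.startswith("#"):
        os.environ.setdefault(key, value.strip('"'))

load_env_line('LATTIFAI_API_KEY=your-api-key')
print(os.environ["LATTIFAI_API_KEY"])
```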
754
+
755
+ ## Model Information
756
+
757
+ **[Lattice-1-Alpha](https://huggingface.co/Lattifai/Lattice-1-Alpha)** features:
758
+ - State-of-the-art alignment precision
759
+ - **Language Support**: Currently supports English only. The upcoming **Lattice-1** release will support English, Chinese, and mixed English-Chinese content.
760
+ - Handles noisy audio and imperfect transcripts
761
+ - Optimized for CPU and GPU (CUDA/MPS)
762
+
763
+ **Requirements**:
764
+ - Python 3.9+
765
+ - 4GB RAM recommended
766
+ - ~2GB storage for model files
767
+
768
+ ## Development
769
+
770
+ ### Setup
771
+
772
+ ```bash
773
+ git clone https://github.com/lattifai/lattifai-python.git
774
+ cd lattifai-python
775
+ pip install -e ".[test]"
776
+ ./scripts/install-hooks.sh # Optional: install pre-commit hooks
777
+ ```
778
+
779
+ ### Testing
780
+
781
+ ```bash
782
+ pytest # Run all tests
783
+ pytest --cov=src # With coverage
784
+ pytest tests/test_basic.py # Specific test
785
+ ```
786
+
787
+ ### Code Quality
788
+
789
+ ```bash
790
+ ruff check src/ tests/ # Lint
791
+ ruff format src/ tests/ # Format
792
+ isort src/ tests/ # Sort imports
793
+ ```
794
+
795
+ ## Contributing
796
+
797
+ 1. Fork the repository
798
+ 2. Create a feature branch
799
+ 3. Make changes and add tests
800
+ 4. Run `pytest` and `ruff check`
801
+ 5. Submit a pull request
802
+
803
+ ## License
804
+
805
+ Apache License 2.0
806
+
807
+ ## Support
808
+
809
+ - **Issues**: [GitHub Issues](https://github.com/lattifai/lattifai-python/issues)
810
+ - **Discussions**: [GitHub Discussions](https://github.com/lattifai/lattifai-python/discussions)
811
+ - **Discord**: [Join our community](https://discord.gg/kvF4WsBRK8)