lattifai 0.2.5__py3-none-any.whl → 0.4.1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Metadata-Version: 2.4
Name: lattifai
Version: 0.4.1
Summary: Lattifai Python SDK: Seamless Integration with Lattifai's Speech and Video AI Services
Author-email: Lattifai Technologies <tech@lattifai.com>
Maintainer-email: Lattice <tech@lattifai.com>
License: MIT License

Copyright (c) 2025 Lattifai.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Project-URL: Homepage, https://github.com/lattifai/lattifai-python
Project-URL: Documentation, https://github.com/lattifai/lattifai-python/README.md
Project-URL: Bug Tracker, https://github.com/lattifai/lattifai-python/issues
Project-URL: Discussions, https://github.com/lattifai/lattifai-python/discussions
Project-URL: Changelog, https://github.com/lattifai/lattifai-python/CHANGELOG.md
Keywords: lattifai,speech recognition,video analysis,ai,sdk,api client
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lattifai-core>=0.2.1
Requires-Dist: httpx
Requires-Dist: python-dotenv
Requires-Dist: lhotse>=1.26.0
Requires-Dist: colorful>=0.5.6
Requires-Dist: pysubs2
Requires-Dist: praatio
Requires-Dist: tgt
Requires-Dist: onnxruntime
Requires-Dist: resampy
Requires-Dist: g2p-phonemizer==0.1.1
Requires-Dist: wtpsplit>=2.1.6
Requires-Dist: av
Requires-Dist: questionary>=2.0
Requires-Dist: yt-dlp
Requires-Dist: pycryptodome
Requires-Dist: google-genai
Provides-Extra: numpy
Requires-Dist: numpy; extra == "numpy"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-asyncio; extra == "test"
Requires-Dist: ruff; extra == "test"
Requires-Dist: numpy; extra == "test"
Provides-Extra: all
Requires-Dist: numpy; extra == "all"
Requires-Dist: pytest; extra == "all"
Requires-Dist: pytest-cov; extra == "all"
Requires-Dist: pytest-asyncio; extra == "all"
Requires-Dist: ruff; extra == "all"
Dynamic: license-file

<div align="center">
  <img src="https://raw.githubusercontent.com/lattifai/lattifai-python/main/assets/logo.png" width=256>

[![PyPI version](https://badge.fury.io/py/lattifai.svg)](https://badge.fury.io/py/lattifai)
[![Python Versions](https://img.shields.io/pypi/pyversions/lattifai.svg)](https://pypi.org/project/lattifai)
[![PyPI Status](https://pepy.tech/badge/lattifai)](https://pepy.tech/project/lattifai)
</div>

<p align="center">
🌐 <a href="https://lattifai.com"><b>Official Website</b></a> &nbsp;&nbsp; | &nbsp;&nbsp; 🖥️ <a href="https://github.com/lattifai/lattifai-python">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/Lattifai/Lattice-1-Alpha">Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://lattifai.com/blogs">Blog</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://discord.gg/kvF4WsBRK8"><img src="https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white" alt="Discord" style="vertical-align: middle;"></a>
</p>


# LattifAI Python

Advanced forced alignment and subtitle generation powered by the [Lattice-1-Alpha](https://huggingface.co/Lattifai/Lattice-1-Alpha) model.

## Installation

```bash
pip install install-k2
# install-k2 automatically detects and uses your installed PyTorch version (up to 2.8).
install-k2  # Install k2

pip install lattifai
```

> **⚠️ Important**: You must run `install-k2` before using the lattifai library.

```
> install-k2 --help
usage: install-k2 [-h] [--system {linux,darwin,windows}] [--dry-run] [--torch-version TORCH_VERSION]

Auto-install the latest k2 wheel for your environment.

optional arguments:
  -h, --help            show this help message and exit
  --system {linux,darwin,windows}
                        Override OS detection. Valid values: linux, darwin (macOS), windows. Default: auto-detect
  --dry-run             Show what would be installed without making changes.
  --torch-version TORCH_VERSION
                        Specify torch version (e.g., 2.8.0). If not specified, will auto-detect or use latest available.
```


## Quick Start

### Command Line

The library provides two equivalent commands: `lai` (recommended for brevity) and `lattifai`.

```bash
# Align audio with subtitle (using the lai command)
lai align audio.wav subtitle.srt output.srt
# Or use the full command
lattifai align audio.wav subtitle.srt output.srt

# Download and align YouTube content directly
lai youtube https://www.youtube.com/watch?v=VIDEO_ID

# Process YouTube videos with the intelligent workflow (equivalent to lai youtube)
lai agent --youtube https://www.youtube.com/watch?v=VIDEO_ID

# Convert subtitle format
lai subtitle convert input.srt output.vtt
```

> **💡 Tip**: Use `lai` for faster typing in your daily workflow!

#### Command Quick Reference

| Command | Use Case | Best For |
|---------|----------|----------|
| `lai align` | Align existing audio + subtitle files | Local files, custom workflows |
| `lai youtube` | Download & align YouTube content | Quick one-off YouTube processing |
| `lai agent` | Intelligent YouTube workflow with retries | Production, batch jobs, automation |
| `lai subtitle` | Convert subtitle formats | Format conversion only |
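
The SDK depends on pysubs2 for subtitle I/O, so `lai subtitle convert` presumably delegates to it. To see why the SRT → VTT case in particular is nearly mechanical, here is a stdlib-only sketch (a toy illustration, not the SDK's actual converter):

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Toy SRT -> WebVTT conversion: prepend the WEBVTT header and
    switch the millisecond separator in timestamps from comma to period."""
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + body

srt = "1\n00:00:00,500 --> 00:00:02,000\nHello world\n"
print(srt_to_vtt(srt))
```

Real converters also deal with styling tags, cue settings, and encoding quirks, which is why the CLI remains the better tool for anything beyond trivial files.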

#### lai align options
```
> lai align --help
Usage: lattifai align [OPTIONS] INPUT_AUDIO_PATH INPUT_SUBTITLE_PATH OUTPUT_SUBTITLE_PATH

  Command used to align audio with subtitles

Options:
  -F, --input_format [srt|vtt|ass|ssa|sub|sbv|txt|auto|gemini]
                                  Input subtitle format.
  -S, --split_sentence            Re-segment subtitles by semantics.
  -W, --word_level                Include word-level alignment timestamps.
  -D, --device [cpu|cuda|mps]     Device to use for inference.
  -M, --model_name_or_path TEXT   Model name or path for alignment.
  --api_key TEXT                  API key for LattifAI.
  --help                          Show this message and exit.
```

#### lai youtube command

Download and align YouTube videos in one step. The command automatically downloads media, fetches subtitles (or falls back to Gemini transcription when none are available), and performs forced alignment.

```bash
# Basic usage
lai youtube https://www.youtube.com/watch?v=VIDEO_ID

# Common options: audio format, sentence splitting, word-level, GPU
lai youtube --media-format mp3 --split-sentence --word-level --device mps \
  --output-dir ./output --output-format srt https://www.youtube.com/watch?v=VIDEO_ID

# Use Gemini as a transcription fallback
# Gemini API key: get yours at https://aistudio.google.com/apikey
# Note: your API key is never logged or stored by this codebase
lai youtube --gemini-api-key YOUR_GEMINI_KEY https://www.youtube.com/watch?v=VIDEO_ID
```

**Options**:
```
> lai youtube --help
Usage: lattifai youtube [OPTIONS] YT_URL

  Download media and subtitles from YouTube for further alignment.

Options:
  -M, --media-format [mp3|wav|m4a|aac|flac|ogg|opus|aiff|mp4|webm|mkv|avi|mov]
                                  Media format for YouTube download.
  -S, --split-sentence            Re-segment subtitles by semantics.
  -W, --word-level                Include word-level alignment timestamps.
  -O, --output-dir PATH           Output directory (default: current directory).
  -D, --device [cpu|cuda|mps]     Device to use for inference.
  -M, --model-name-or-path TEXT   Model name or path for alignment.
  --api-key TEXT                  API key for LattifAI.
  --gemini-api-key TEXT           Gemini API key for transcription fallback.
  -F, --output-format [srt|vtt|ass|ssa|sub|sbv|txt|json|TextGrid]
                                  Subtitle output format.
  --help                          Show this message and exit.
```

#### lai agent command

**Intelligent Agentic Workflow**: process YouTube videos through a multi-step workflow with automatic retries, smart file management, and comprehensive error handling.

```bash
# Basic usage
lai agent --youtube https://www.youtube.com/watch?v=VIDEO_ID

# Production workflow with retries, verbose logging, and force overwrite
lai agent --youtube --media-format mp4 --output-format TextGrid \
  --split-sentence --word-level --device mps --max-retries 2 --verbose --force \
  --output-dir ./outputs https://www.youtube.com/watch?v=VIDEO_ID
```

**Key Features**:
- **🔄 Automatic Retry Logic**: Configurable retry mechanism for failed steps
- **📁 Smart File Management**: Detects existing files and prompts for action
- **🎯 Intelligent Workflow**: Multi-step pipeline with dependency management
- **🛡️ Error Recovery**: Graceful handling of failures with detailed logging
- **📊 Rich Output**: Comprehensive results with metadata and file paths
- **⚡ Async Processing**: Efficient parallel execution of independent tasks

**Options**:
```
> lai agent --help
Usage: lattifai agent [OPTIONS] URL

  LattifAI Agentic Workflow Agent

  Process multimedia content through intelligent agent-based pipelines.

Options:
  --youtube, --yt                 Process YouTube URL through agentic workflow.
  --gemini-api-key TEXT           Gemini API key for transcription.
  --media-format [mp3|wav|m4a|aac|opus|mp4|webm|mkv|...]
                                  Media format for YouTube download.
  --output-format [srt|vtt|ass|ssa|sub|sbv|txt|json|...]
                                  Subtitle output format.
  --output-dir PATH               Output directory (default: current directory).
  --max-retries INTEGER           Maximum retries for failed steps.
  -S, --split-sentence            Re-segment subtitles by semantics.
  --word-level                    Include word-level alignment timestamps.
  --verbose, -v                   Enable verbose logging.
  --force, -f                     Force overwrite without confirmation.
  --help                          Show this message and exit.
```

**When to use `lai agent` vs `lai youtube`**:
- Both `lai agent --youtube URL` and `lai youtube URL` provide the same core functionality for downloading and aligning YouTube content
- **Use `lai agent --youtube`** for production workflows, batch processing, advanced error handling, and retry logic
- **Use `lai youtube`** for quick one-off downloads and alignment with minimal overhead

#### Understanding --split_sentence

The `--split_sentence` option re-splits subtitles at punctuation and semantic boundaries. It is especially useful when a single subtitle segment combines multiple semantic units, such as:

- **Mixed content**: non-speech elements (e.g., `[APPLAUSE]`, `[MUSIC]`) followed by actual dialogue
- **Natural punctuation boundaries**: colons, periods, and other punctuation marks that indicate semantic breaks
- **Concatenated phrases**: multiple distinct utterances joined together without proper separation

**Example transformations**:
```
Input:  "[APPLAUSE] >> MIRA MURATI: Thank you all"
Output: ["[APPLAUSE]", ">> MIRA MURATI: Thank you all"]

Input:  "[MUSIC] Welcome back. Today we discuss AI."
Output: ["[MUSIC]", "Welcome back.", "Today we discuss AI."]
```

This feature improves alignment accuracy by:
1. Respecting punctuation-based semantic boundaries
2. Separating distinct utterances for more precise timing
3. Maintaining semantic context for each independent phrase

**Usage**:
```bash
lai align --split_sentence audio.wav subtitle.srt output.srt
```
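
The transformations above can be imitated with plain string handling. The sketch below is only a toy illustration of punctuation-based splitting; the SDK's actual re-segmentation is semantic and model-aware, and the `[APPLAUSE]`-marker regex here is an assumption made for the demo:

```python
import re

def toy_split(segment: str) -> list[str]:
    """Peel off a leading non-speech marker like [APPLAUSE] or [MUSIC],
    then split the remainder at sentence-final punctuation."""
    parts, rest = [], segment.strip()
    m = re.match(r"^(\[[A-Z ]+\])\s*(.*)$", rest)
    if m:
        parts.append(m.group(1))
        rest = m.group(2)
    parts += [s for s in re.split(r"(?<=[.!?])\s+", rest) if s]
    return parts

print(toy_split("[APPLAUSE] >> MIRA MURATI: Thank you all"))
# ['[APPLAUSE]', '>> MIRA MURATI: Thank you all']
print(toy_split("[MUSIC] Welcome back. Today we discuss AI."))
# ['[MUSIC]', 'Welcome back.', 'Today we discuss AI.']
```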

#### Understanding --word_level

The `--word_level` option enables word-level alignment, providing precise timing information for each individual word in the audio. When enabled, the output includes detailed word boundaries within each subtitle segment, allowing for fine-grained synchronization and analysis.

**Key features**:
- **Individual word timestamps**: Each word gets its own start and end time
- **Format-specific output**:
  - **JSON (Recommended)**: Full alignment details stored in the `alignment.word` field of each segment, preserving all word-level timing information in a structured format
  - **TextGrid**: Separate "words" tier alongside the "utterances" tier for linguistic analysis
  - **TXT**: Each word on a separate line with a timestamp range: `[start-end] word`
  - **Standard subtitle formats** (SRT, VTT, ASS, etc.): Each word becomes a separate subtitle event

> **💡 Recommended**: Use JSON format (`output.json`) to preserve complete word-level alignment data. Other formats may lose some structural information.

**Example output formats**:

**JSON format** (with word-level details):
```json
[
  {
    "id": "6",
    "recording_id": "",
    "start": 24.52,
    "duration": 9.1,
    "channel": 0,
    "text": "We will start with why it is so important to us to have a product that we can make truly available and broadly available to everyone.",
    "custom": {
      "score": 0.8754
    },
    "alignment": {
      "word": [
        ["We", 24.6, 0.14, 1.0],
        ["will", 24.74, 0.14, 1.0],
        ["start", 24.88, 0.46, 0.771],
        ["with", 25.34, 0.28, 0.9538],
        ["why", 26.2, 0.36, 1.0],
        ["it", 26.56, 0.14, 0.9726],
        ["is", 26.74, 0.02, 0.6245],
        ["so", 26.76, 0.16, 0.6615],
        ["important", 26.92, 0.54, 0.9257],
        ["to", 27.5, 0.1, 1.0],
        ["us", 27.6, 0.34, 0.7955],
        ["to", 28.04, 0.08, 0.8545],
        ["have", 28.16, 0.46, 0.9994],
        ["a", 28.76, 0.06, 1.0],
        ["product", 28.82, 0.56, 0.9975],
        ["that", 29.38, 0.08, 0.5602],
        ["we", 29.46, 0.16, 0.7017],
        ["can", 29.62, 0.22, 1.0],
        ["make", 29.84, 0.32, 0.9643],
        ["truly", 30.42, 0.32, 0.6737],
        ["available", 30.74, 0.6, 0.9349],
        ["and", 31.4, 0.2, 0.4114],
        ["broadly", 31.6, 0.44, 0.6726],
        ["available", 32.04, 0.58, 0.9108],
        ["to", 32.72, 0.06, 1.0],
        ["everyone.", 32.78, 0.64, 0.7886]
      ]
    }
  }
]
```

**TXT format** (word-level):
```
[0.50-1.20] Hello
[1.20-2.30] world
```

**TextGrid format** (Praat-compatible):
```
Two tiers created:
- "utterances" tier: Full segments with original text
- "words" tier: Individual words with precise boundaries
```
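
Judging from the JSON example above, each `alignment.word` entry appears to be a `[word, start, duration, score]` quadruplet. If that layout holds, producing the TXT-style `[start-end] word` lines from the JSON output takes only the standard library:

```python
import json

# A trimmed-down segment in the shape shown above
segment = json.loads("""
{
  "text": "We will start",
  "alignment": {
    "word": [["We", 24.6, 0.14, 1.0], ["will", 24.74, 0.14, 1.0], ["start", 24.88, 0.46, 0.771]]
  }
}
""")

# End time of each word is start + duration
for word, start, duration, score in segment["alignment"]["word"]:
    print(f"[{start:.2f}-{start + duration:.2f}] {word}")
# [24.60-24.74] We
# [24.74-24.88] will
# [24.88-25.34] start
```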

**Use cases**:
- **Linguistic analysis**: Study pronunciation patterns, speech timing, and prosody
- **Accessibility**: Create more granular captions for hearing-impaired users
- **Video/Audio editing**: Enable precise word-level subtitle synchronization
- **Karaoke applications**: Highlight individual words as they are spoken
- **Language learning**: Provide precise word boundaries for pronunciation practice

**Usage**:
```bash
# Generate word-level aligned JSON
lai align --word_level audio.wav subtitle.srt output.json

# Create TextGrid file for Praat analysis
lai align --word_level audio.wav subtitle.srt output.TextGrid

# Word-level TXT output
lai align --word_level audio.wav subtitle.srt output.txt

# Standard subtitle with word-level events
lai align --word_level audio.wav subtitle.srt output.srt
```

**Combined with --split_sentence**:
```bash
# Optimal alignment: semantic splitting + word-level details
lai align --split_sentence --word_level audio.wav subtitle.srt output.json
```

### Python API

```python
from lattifai import LattifAI

client = LattifAI()  # api_key is read from LATTIFAI_API_KEY if not provided
alignments, output_path = client.alignment(
    audio="audio.wav",
    subtitle="subtitle.srt",
    output_subtitle_path="output.srt",
)
```

Need to run inside an async application? Use the drop-in asynchronous client:

```python
import asyncio

from lattifai import AsyncLattifAI


async def main():
    async with AsyncLattifAI() as client:
        alignments, output_path = await client.alignment(
            audio="audio.wav",
            subtitle="subtitle.srt",
            split_sentence=False,
            output_subtitle_path="output.srt",
        )


asyncio.run(main())
```

Both clients return a list of `Supervision` segments with timing information and, if provided, the path where the aligned subtitle was written.

## Supported Formats

**Audio**: WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, AIFF

**Video**: MP4, MKV, MOV, WEBM, AVI

**Subtitle Input**: SRT, VTT, ASS, SSA, SUB, SBV, TXT (plain text), Gemini (Google Gemini transcript format)

**Subtitle Output**: All input formats plus TextGrid (Praat format for linguistic analysis)

## API Reference

### LattifAI (sync)

```python
LattifAI(
    api_key: Optional[str] = None,
    model_name_or_path: str = 'Lattifai/Lattice-1-Alpha',
    device: str = 'cpu',  # 'cpu', 'cuda', or 'mps'
)
```

### AsyncLattifAI (async)

```python
AsyncLattifAI(
    api_key: Optional[str] = None,
    model_name_or_path: str = 'Lattifai/Lattice-1-Alpha',
    device: str = 'cpu',
)
```

Use `async with AsyncLattifAI() as client:` or call `await client.close()` when you are done to release the underlying HTTP session.

### alignment()

```python
client.alignment(
    audio: str,                    # Path to audio file
    subtitle: str,                 # Path to subtitle/text file
    format: Optional[str] = None,  # Input format: 'srt', 'vtt', 'ass', 'txt', 'gemini', or 'auto' (auto-detect if None)
    split_sentence: bool = False,  # Smart sentence splitting based on punctuation semantics
    return_details: bool = False,  # Enable word-level alignment details
    output_subtitle_path: Optional[str] = None,
) -> Tuple[List[Supervision], Optional[str]]  # await client.alignment(...) for AsyncLattifAI
```

**Parameters**:
- `audio`: Path to the audio file to be aligned
- `subtitle`: Path to the subtitle or text file
- `format`: Input subtitle format. Supported values: 'srt', 'vtt', 'ass', 'txt', 'gemini', 'auto'. When set to None or 'auto', the format is detected from the file extension. Additional formats (ssa, sub, sbv) are supported through automatic detection
- `split_sentence`: Enable intelligent sentence re-splitting (default: False). Set to True when subtitles combine multiple semantic units (non-speech elements plus dialogue, or multiple sentences) that would benefit from separate timing alignment
- `return_details`: Enable word-level alignment details (default: False). When True, each `Supervision` object includes an `alignment` field with word-level timestamps, accessible via `supervision.alignment['word']`
- `output_subtitle_path`: Output path for the aligned subtitle (optional)

**Returns**:
- A tuple containing:
  - `alignments`: List of aligned `Supervision` objects with timing information
  - `output_subtitle_path`: Path where the subtitle was written (if `output_subtitle_path` was provided)

## Examples

### Basic Text Alignment

```python
from lattifai import LattifAI

client = LattifAI()
alignments, output_path = client.alignment(
    audio="speech.wav",
    subtitle="transcript.txt",
    format="txt",
    split_sentence=False,
    output_subtitle_path="output.srt",
)
```

### Word-Level Alignment

```python
from lattifai import LattifAI

client = LattifAI()
alignments, output_path = client.alignment(
    audio="speech.wav",
    subtitle="transcript.srt",
    return_details=True,                 # Enable word-level alignment
    output_subtitle_path="output.json",  # JSON format preserves word-level data
)

# Access word-level timestamps
for segment in alignments:
    print(f"Segment: {segment.text} ({segment.start:.2f}s - {segment.end:.2f}s)")
    if segment.alignment and 'word' in segment.alignment:
        for word in segment.alignment['word']:
            print(f"  Word: {word.symbol} ({word.start:.2f}s - {word.end:.2f}s)")
```

### Batch Processing

```python
from pathlib import Path

from lattifai import LattifAI

client = LattifAI()
audio_dir = Path("audio_files")
subtitle_dir = Path("subtitles")
output_dir = Path("aligned")
output_dir.mkdir(parents=True, exist_ok=True)  # Ensure the target directory exists

for audio in audio_dir.glob("*.wav"):
    subtitle = subtitle_dir / f"{audio.stem}.srt"
    if subtitle.exists():
        alignments, output_path = client.alignment(
            audio=audio,
            subtitle=subtitle,
            output_subtitle_path=output_dir / f"{audio.stem}_aligned.srt",
        )
```

### GPU Acceleration

```python
from lattifai import LattifAI

# NVIDIA GPU
client = LattifAI(device='cuda')

# Apple Silicon
client = LattifAI(device='mps')
```

```bash
# CLI
lai align --device mps audio.wav subtitle.srt output.srt
```

### YouTube Processing with Agent Workflow

```python
import asyncio

from lattifai.workflows import YouTubeSubtitleAgent


async def process_youtube():
    # Initialize the agent with configuration
    agent = YouTubeSubtitleAgent(
        gemini_api_key="your-gemini-api-key",
        video_format="mp4",  # or "mp3", "wav", etc.
        output_format="srt",
        max_retries=2,
        split_sentence=True,
        word_level=True,
        force_overwrite=False,
    )

    # Process a YouTube URL
    result = await agent.process_youtube_url(
        url="https://www.youtube.com/watch?v=VIDEO_ID",
        output_dir="./output",
        output_format="srt",
    )

    # Access results
    print(f"Title: {result['metadata']['title']}")
    print(f"Duration: {result['metadata']['duration']} seconds")
    print(f"Subtitle count: {result['subtitle_count']}")

    # Access generated files
    for format_name, file_path in result['exported_files'].items():
        print(f"{format_name.upper()}: {file_path}")


# Run the async workflow
asyncio.run(process_youtube())
```

## Configuration

### API Key Setup

First, create your API key at [https://lattifai.com/dashboard/api-keys](https://lattifai.com/dashboard/api-keys).

**Recommended: use a .env file**

Create a `.env` file in your project root:
```bash
LATTIFAI_API_KEY=your-api-key
```

The library automatically loads the `.env` file (python-dotenv is included as a dependency).
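
For scripts that do not go through the SDK, you can mimic that auto-loading yourself. The snippet below is a deliberately simplified stand-in; python-dotenv itself additionally handles quoting, multiline values, and interpolation:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Toy .env loader: KEY=VALUE lines and '#' comments only.
    Existing environment variables take precedence, matching dotenv's default."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_env_file()
api_key = os.environ.get("LATTIFAI_API_KEY")
```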
748
+
749
+ **Alternative: Environment variable**
750
+ ```bash
751
+ export LATTIFAI_API_KEY="your-api-key"
752
+ ```
753
+
754
+ ## Model Information
755
+
756
+ **[Lattice-1-Alpha](https://huggingface.co/Lattifai/Lattice-1-Alpha)** features:
757
+ - State-of-the-art alignment precision
758
+ - **Language Support**: Currently supports English only. The upcoming **Lattice-1** release will support English, Chinese, and mixed English-Chinese content.
759
+ - Handles noisy audio and imperfect transcripts
760
+ - Optimized for CPU and GPU (CUDA/MPS)
761
+
762
+ **Requirements**:
763
+ - Python 3.9+
764
+ - 4GB RAM recommended
765
+ - ~2GB storage for model files
766
+
767
+ ## Development
768
+
769
+ ### Setup
770
+
771
+ ```bash
772
+ git clone https://github.com/lattifai/lattifai-python.git
773
+ cd lattifai-python
774
+ pip install -e ".[test]"
775
+ ./scripts/install-hooks.sh # Optional: install pre-commit hooks
776
+ ```
777
+
778
+ ### Testing
779
+
780
+ ```bash
781
+ pytest # Run all tests
782
+ pytest --cov=src # With coverage
783
+ pytest tests/test_basic.py # Specific test
784
+ ```
785
+
786
+ ### Code Quality
787
+
788
+ ```bash
789
+ ruff check src/ tests/ # Lint
790
+ ruff format src/ tests/ # Format
791
+ isort src/ tests/ # Sort imports
792
+ ```
793
+
794
+ ## Contributing
795
+
796
+ 1. Fork the repository
797
+ 2. Create a feature branch
798
+ 3. Make changes and add tests
799
+ 4. Run `pytest` and `ruff check`
800
+ 5. Submit a pull request
801
+
802
+ ## License
803
+
804
+ Apache License 2.0
805
+
806
+ ## Support
807
+
808
+ - **Issues**: [GitHub Issues](https://github.com/lattifai/lattifai-python/issues)
809
+ - **Discussions**: [GitHub Discussions](https://github.com/lattifai/lattifai-python/discussions)
810
+ - **Discord**: [Join our community](https://discord.gg/kvF4WsBRK8)