lattifai 1.2.0__py3-none-any.whl → 1.2.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. lattifai/__init__.py +0 -24
  2. lattifai/alignment/__init__.py +10 -1
  3. lattifai/alignment/lattice1_aligner.py +66 -58
  4. lattifai/alignment/lattice1_worker.py +1 -6
  5. lattifai/alignment/punctuation.py +38 -0
  6. lattifai/alignment/segmenter.py +1 -1
  7. lattifai/alignment/sentence_splitter.py +350 -0
  8. lattifai/alignment/text_align.py +440 -0
  9. lattifai/alignment/tokenizer.py +91 -220
  10. lattifai/caption/__init__.py +82 -6
  11. lattifai/caption/caption.py +335 -1143
  12. lattifai/caption/formats/__init__.py +199 -0
  13. lattifai/caption/formats/base.py +211 -0
  14. lattifai/caption/formats/gemini.py +722 -0
  15. lattifai/caption/formats/json.py +194 -0
  16. lattifai/caption/formats/lrc.py +309 -0
  17. lattifai/caption/formats/nle/__init__.py +9 -0
  18. lattifai/caption/formats/nle/audition.py +561 -0
  19. lattifai/caption/formats/nle/avid.py +423 -0
  20. lattifai/caption/formats/nle/fcpxml.py +549 -0
  21. lattifai/caption/formats/nle/premiere.py +589 -0
  22. lattifai/caption/formats/pysubs2.py +642 -0
  23. lattifai/caption/formats/sbv.py +147 -0
  24. lattifai/caption/formats/tabular.py +338 -0
  25. lattifai/caption/formats/textgrid.py +193 -0
  26. lattifai/caption/formats/ttml.py +652 -0
  27. lattifai/caption/formats/vtt.py +469 -0
  28. lattifai/caption/parsers/__init__.py +9 -0
  29. lattifai/caption/{text_parser.py → parsers/text_parser.py} +4 -2
  30. lattifai/caption/standardize.py +636 -0
  31. lattifai/caption/utils.py +474 -0
  32. lattifai/cli/__init__.py +2 -1
  33. lattifai/cli/caption.py +108 -1
  34. lattifai/cli/transcribe.py +4 -9
  35. lattifai/cli/youtube.py +4 -1
  36. lattifai/client.py +48 -84
  37. lattifai/config/__init__.py +11 -1
  38. lattifai/config/alignment.py +9 -2
  39. lattifai/config/caption.py +267 -23
  40. lattifai/config/media.py +20 -0
  41. lattifai/diarization/__init__.py +41 -1
  42. lattifai/mixin.py +36 -18
  43. lattifai/transcription/base.py +6 -1
  44. lattifai/transcription/lattifai.py +19 -54
  45. lattifai/utils.py +81 -13
  46. lattifai/workflow/__init__.py +28 -4
  47. lattifai/workflow/file_manager.py +2 -5
  48. lattifai/youtube/__init__.py +43 -0
  49. lattifai/youtube/client.py +1170 -0
  50. lattifai/youtube/types.py +23 -0
  51. lattifai-1.2.2.dist-info/METADATA +615 -0
  52. lattifai-1.2.2.dist-info/RECORD +76 -0
  53. {lattifai-1.2.0.dist-info → lattifai-1.2.2.dist-info}/entry_points.txt +1 -2
  54. lattifai/caption/gemini_reader.py +0 -371
  55. lattifai/caption/gemini_writer.py +0 -173
  56. lattifai/cli/app_installer.py +0 -142
  57. lattifai/cli/server.py +0 -44
  58. lattifai/server/app.py +0 -427
  59. lattifai/workflow/youtube.py +0 -577
  60. lattifai-1.2.0.dist-info/METADATA +0 -1133
  61. lattifai-1.2.0.dist-info/RECORD +0 -57
  62. {lattifai-1.2.0.dist-info → lattifai-1.2.2.dist-info}/WHEEL +0 -0
  63. {lattifai-1.2.0.dist-info → lattifai-1.2.2.dist-info}/licenses/LICENSE +0 -0
  64. {lattifai-1.2.0.dist-info → lattifai-1.2.2.dist-info}/top_level.txt +0 -0
@@ -1,1133 +0,0 @@
- Metadata-Version: 2.4
- Name: lattifai
- Version: 1.2.0
- Summary: Lattifai Python SDK: Seamless Integration with Lattifai's Speech and Video AI Services
- Author-email: Lattifai Technologies <tech@lattifai.com>
- Maintainer-email: Lattice <tech@lattifai.com>
- License: MIT License
-
- Copyright (c) 2025 LattifAI.
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
-
- Project-URL: Homepage, https://github.com/lattifai/lattifai-python
- Project-URL: Documentation, https://github.com/lattifai/lattifai-python/blob/main/README.md
- Project-URL: Bug Tracker, https://github.com/lattifai/lattifai-python/issues
- Project-URL: Discussions, https://github.com/lattifai/lattifai-python/discussions
- Project-URL: Changelog, https://github.com/lattifai/lattifai-python/blob/main/CHANGELOG.md
- Keywords: lattifai,speech recognition,video analysis,ai,sdk,api client
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: Intended Audience :: Science/Research
- Classifier: License :: OSI Approved :: Apache Software License
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Programming Language :: Python :: 3.13
- Classifier: Programming Language :: Python :: 3.14
- Classifier: Operating System :: MacOS :: MacOS X
- Classifier: Operating System :: POSIX :: Linux
- Classifier: Operating System :: Microsoft :: Windows
- Classifier: Topic :: Multimedia :: Sound/Audio
- Classifier: Topic :: Multimedia :: Video
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
- Requires-Python: <3.15,>=3.10
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: k2py>=0.2.1
- Requires-Dist: lattifai-core>=0.6.0
- Requires-Dist: lattifai-run>=1.0.1
- Requires-Dist: python-dotenv
- Requires-Dist: lhotse>=1.26.0
- Requires-Dist: colorful>=0.5.6
- Requires-Dist: pysubs2
- Requires-Dist: praatio
- Requires-Dist: tgt
- Requires-Dist: onnx>=1.16.0
- Requires-Dist: onnxruntime
- Requires-Dist: msgpack
- Requires-Dist: scipy!=1.16.3
- Requires-Dist: g2p-phonemizer>=0.4.0
- Requires-Dist: av
- Requires-Dist: wtpsplit>=2.1.7
- Requires-Dist: modelscope==1.33.0
- Requires-Dist: OmniSenseVoice>=0.4.2
- Requires-Dist: nemo_toolkit_asr[asr]>=2.7.0rc4
- Requires-Dist: pyannote-audio-notorchdeps>=4.0.2
- Requires-Dist: questionary>=2.0
- Requires-Dist: yt-dlp
- Requires-Dist: pycryptodome
- Requires-Dist: google-genai>=1.22.0
- Requires-Dist: fastapi>=0.111.0
- Requires-Dist: uvicorn>=0.30.0
- Requires-Dist: python-multipart>=0.0.9
- Requires-Dist: jinja2>=3.1.4
- Provides-Extra: numpy
- Requires-Dist: numpy; extra == "numpy"
- Provides-Extra: diarization
- Requires-Dist: torch-audiomentations==0.12.0; extra == "diarization"
- Requires-Dist: pyannote.audio>=4.0.2; extra == "diarization"
- Provides-Extra: transcription
- Requires-Dist: OmniSenseVoice>=0.4.0; extra == "transcription"
- Requires-Dist: nemo_toolkit_asr[asr]>=2.7.0rc3; extra == "transcription"
- Provides-Extra: test
- Requires-Dist: pytest; extra == "test"
- Requires-Dist: pytest-cov; extra == "test"
- Requires-Dist: pytest-asyncio; extra == "test"
- Requires-Dist: numpy; extra == "test"
- Provides-Extra: all
- Requires-Dist: numpy; extra == "all"
- Requires-Dist: pytest; extra == "all"
- Requires-Dist: pytest-cov; extra == "all"
- Requires-Dist: pytest-asyncio; extra == "all"
- Requires-Dist: pyannote.audio>=4.0.2; extra == "all"
- Dynamic: license-file
-
- <div align="center">
- <img src="https://raw.githubusercontent.com/lattifai/lattifai-python/main/assets/logo.png" width=256>
-
- [![PyPI version](https://badge.fury.io/py/lattifai.svg)](https://badge.fury.io/py/lattifai)
- [![Python Versions](https://img.shields.io/pypi/pyversions/lattifai.svg)](https://pypi.org/project/lattifai)
- [![PyPI Status](https://pepy.tech/badge/lattifai)](https://pepy.tech/project/lattifai)
- </div>
-
- <p align="center">
- 🌐 <a href="https://lattifai.com"><b>Official Website</b></a> &nbsp;&nbsp; | &nbsp;&nbsp; 🖥️ <a href="https://github.com/lattifai/lattifai-python">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/Lattifai/Lattice-1">Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://lattifai.com/blogs">Blog</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://discord.gg/kvF4WsBRK8"><img src="https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white" alt="Discord" style="vertical-align: middle;"></a>
- </p>
-
- # LattifAI: Precision Alignment, Infinite Possibilities
-
- Advanced forced alignment and subtitle generation powered by the [🤗 Lattice-1](https://huggingface.co/Lattifai/Lattice-1) model.
-
- ## Table of Contents
-
- - [Core Capabilities](#core-capabilities)
- - [Installation](#installation)
- - [Quick Start](#quick-start)
- - [Command Line Interface](#command-line-interface)
- - [Python SDK (5 Lines of Code)](#python-sdk-5-lines-of-code)
- - [Web Interface](#web-interface)
- - [CLI Reference](#cli-reference)
- - [lai alignment align](#lai-alignment-align)
- - [lai alignment youtube](#lai-alignment-youtube)
- - [lai transcribe run](#lai-transcribe-run)
- - [lai caption convert](#lai-caption-convert)
- - [lai caption shift](#lai-caption-shift)
- - [Python SDK Reference](#python-sdk-reference)
- - [Basic Alignment](#basic-alignment)
- - [YouTube Processing](#youtube-processing)
- - [Configuration Objects](#configuration-objects)
- - [Advanced Features](#advanced-features)
- - [Audio Preprocessing](#audio-preprocessing)
- - [Long-Form Audio Support](#long-form-audio-support)
- - [Word-Level Alignment](#word-level-alignment)
- - [Smart Sentence Splitting](#smart-sentence-splitting)
- - [Speaker Diarization](#speaker-diarization)
- - [YAML Configuration Files](#yaml-configuration-files)
- - [Architecture Overview](#architecture-overview)
- - [Performance & Optimization](#performance--optimization)
- - [Supported Formats](#supported-formats)
- - [Supported Languages](#supported-languages)
- - [Roadmap](#roadmap)
- - [Development](#development)
-
- ---
-
- ## Core Capabilities
-
- LattifAI provides comprehensive audio-text alignment powered by the Lattice-1 model:
-
- | Feature | Description | Status |
- |---------|-------------|--------|
- | **Forced Alignment** | Precise word-level and segment-level synchronization with audio | ✅ Production |
- | **Multi-Model Transcription** | Gemini (100+ languages), Parakeet (25 languages), SenseVoice (5 languages) | ✅ Production |
- | **Speaker Diarization** | Automatic multi-speaker identification with label preservation | ✅ Production |
- | **Audio Preprocessing** | Multi-channel selection, device optimization (CPU/CUDA/MPS) | ✅ Production |
- | **Streaming Mode** | Process audio up to 20 hours with minimal memory footprint | ✅ Production |
- | **Smart Text Processing** | Intelligent sentence splitting and non-speech element separation | ✅ Production |
- | **Universal Format Support** | 30+ caption/subtitle formats with text normalization | ✅ Production |
- | **Configuration System** | YAML-based configs for reproducible workflows | ✅ Production |
-
- **Key Highlights:**
- - 🎯 **Accuracy**: State-of-the-art alignment precision with the Lattice-1 model
- - 🌍 **Multilingual**: Support for 100+ languages via multiple transcription models
- - 🚀 **Performance**: Hardware-accelerated processing with streaming support
- - 🔧 **Flexible**: CLI, Python SDK, and Web UI interfaces
- - 📦 **Production-Ready**: Battle-tested on diverse audio/video content
-
- ---
-
- ## Installation
-
- ### Step 1: Install SDK
-
- **Using pip:**
- ```bash
- pip install lattifai
- ```
-
- **Using uv (Recommended - 10-100x faster):**
- ```bash
- # Install uv if you haven't already
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Create a new project with uv
- uv init my-project
- cd my-project
- source .venv/bin/activate
-
- # Install LattifAI
- uv pip install lattifai
- ```
-
- ### Step 2: Get Your API Key
-
- **LattifAI API Key (Required)**
-
- Get your **free API key** at [https://lattifai.com/dashboard/api-keys](https://lattifai.com/dashboard/api-keys)
-
- **Option A: Environment variable (recommended)**
- ```bash
- export LATTIFAI_API_KEY="lf_your_api_key_here"
- ```
-
- **Option B: `.env` file**
- ```bash
- # .env
- LATTIFAI_API_KEY=lf_your_api_key_here
- ```
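For completeness, a stdlib-only sketch of what loading a `.env` file amounts to. This is illustrative only: in practice the SDK's `python-dotenv` dependency handles this, and `load_env_file` is a hypothetical helper, not part of the SDK.

```python
import os
import tempfile

# Hypothetical helper, illustrative only -- python-dotenv does this for real.
def load_env_file(path):
    """Read KEY=VALUE lines into os.environ without overwriting existing vars."""
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if "=" in line:
                key, value = line.split("=", 1)
                os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway .env file
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("LATTIFAI_API_KEY=lf_your_api_key_here\n")
    env_path = f.name
load_env_file(env_path)
print(os.environ["LATTIFAI_API_KEY"])
```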
-
- **Gemini API Key (Optional - for transcription)**
-
- If you want to use Gemini models for transcription (e.g., `gemini-2.5-pro`), get your **free Gemini API key** at [https://aistudio.google.com/apikey](https://aistudio.google.com/apikey)
-
- ```bash
- # Add to environment variable
- export GEMINI_API_KEY="your_gemini_api_key_here"
-
- # Or add to .env file
- GEMINI_API_KEY=your_gemini_api_key_here # AIzaSyxxxx
- ```
-
- > **Note**: Gemini API key is only required if you use Gemini models for transcription. It's not needed for alignment or when using other transcription models.
-
- ---
-
- ## Quick Start
-
- ### Command Line Interface
-
- ![CLI Demo](assets/cli.png)
-
- ```bash
- # Align local audio with subtitle
- lai alignment align audio.wav subtitle.srt output.srt
-
- # Download and align YouTube video
- lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID"
- ```
-
- ### Python SDK (5 Lines of Code)
-
- ```python
- from lattifai import LattifAI
-
- client = LattifAI()
- caption = client.alignment(
- input_media="audio.wav",
- input_caption="subtitle.srt",
- output_caption_path="aligned.srt",
- )
- ```
-
- That's it! Your aligned subtitles are saved to `aligned.srt`.
-
- ### 🚧 Web Interface
-
- ![web Demo](assets/web.png)
-
- 1. **Install the web application (one-time setup):**
- ```bash
- lai-app-install
- ```
-
- This command will:
- - Check if Node.js/npm is installed (and install if needed)
- - Install frontend dependencies
- - Build the application
- - Set up the `lai-app` command globally
-
- 2. **Start the backend server:**
- ```bash
- lai-server
-
- # Custom port (default: 8001)
- lai-server --port 9000
-
- # Custom host
- lai-server --host 127.0.0.1 --port 9000
-
- # Production mode (disable auto-reload)
- lai-server --no-reload
- ```
-
- **Backend Server Options:**
- - `-p, --port` - Server port (default: 8001)
- - `--host` - Host address (default: 0.0.0.0)
- - `--no-reload` - Disable auto-reload for production
- - `-h, --help` - Show help message
-
- 3. **Start the frontend application:**
- ```bash
- lai-app
-
- # Custom port (default: 5173)
- lai-app --port 8080
-
- # Custom backend URL
- lai-app --backend http://localhost:9000
-
- # Don't auto-open browser
- lai-app --no-open
- ```
-
- **Frontend Application Options:**
- - `-p, --port` - Frontend server port (default: 5173)
- - `--backend` - Backend API URL (default: http://localhost:8001)
- - `--no-open` - Don't automatically open browser
- - `-h, --help` - Show help message
-
- The web interface will automatically open in your browser at `http://localhost:5173`.
-
- **Features:**
- - ✅ **Drag-and-Drop Upload**: Visual file upload for audio/video and captions
- - ✅ **Real-Time Progress**: Live alignment progress with detailed status
- - ✅ **Multiple Transcription Models**: Gemini, Parakeet, SenseVoice selection
-
- ---
-
- ## CLI Reference
-
- ### Command Overview
-
- | Command | Description |
- |---------|-------------|
- | `lai alignment align` | Align local audio/video with caption |
- | `lai alignment youtube` | Download & align YouTube content |
- | `lai transcribe run` | Transcribe audio/video or YouTube URL to caption |
- | `lai transcribe align` | Transcribe audio/video and align with generated transcript |
- | `lai caption convert` | Convert between caption formats |
- | `lai caption normalize` | Clean and normalize caption text |
- | `lai caption shift` | Shift caption timestamps |
-
- ### lai alignment align
-
- ```bash
- # Basic usage
- lai alignment align <audio> <caption> <output>
-
- # Examples
- lai alignment align audio.wav caption.srt output.srt
- lai alignment align video.mp4 caption.vtt output.srt alignment.device=cuda
- lai alignment align audio.wav caption.srt output.json \
- caption.split_sentence=true \
- caption.word_level=true
- ```
-
- ### lai alignment youtube
-
- ```bash
- # Basic usage
- lai alignment youtube <url>
-
- # Examples
- lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID"
- lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID" \
- media.output_dir=~/Downloads \
- caption.output_path=aligned.srt \
- caption.split_sentence=true
- ```
-
- ### lai transcribe run
-
- Perform automatic speech recognition (ASR) on audio/video files or YouTube URLs to generate timestamped transcriptions.
-
- ```bash
- # Basic usage - local file
- lai transcribe run <input> <output>
-
- # Basic usage - YouTube URL
- lai transcribe run <url> <output_dir>
-
- # Examples - Local files
- lai transcribe run audio.wav output.srt
- lai transcribe run audio.mp4 output.ass \
- transcription.model_name=nvidia/parakeet-tdt-0.6b-v3
-
- # Examples - YouTube URLs
- lai transcribe run "https://youtube.com/watch?v=VIDEO_ID" output_dir=./output
- lai transcribe run "https://youtube.com/watch?v=VIDEO_ID" output.ass output_dir=./output \
- transcription.model_name=gemini-2.5-pro \
- transcription.gemini_api_key=YOUR_GEMINI_API_KEY
-
- # Full configuration with keyword arguments
- lai transcribe run \
- input=audio.wav \
- output_caption=output.srt \
- channel_selector=average \
- transcription.device=cuda \
- transcription.model_name=iic/SenseVoiceSmall
- ```
-
- **Parameters:**
- - `input`: Path to audio/video file or YouTube URL (required)
- - `output_caption`: Path for output caption file (for local files)
- - `output_dir`: Directory for output files (for YouTube URLs, defaults to current directory)
- - `media_format`: Media format for YouTube downloads (default: mp3)
- - `channel_selector`: Audio channel selection - "average", "left", "right", or channel index (default: "average")
- - Note: Ignored when transcribing YouTube URLs with Gemini models
- - `transcription`: Transcription configuration (model_name, device, language, gemini_api_key)
-
- **Supported Transcription Models (More Coming Soon):**
- - `gemini-2.5-pro` - Google Gemini API (requires API key)
- - Languages: 100+ languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, and more
- - `gemini-3-pro-preview` - Google Gemini API (requires API key)
- - Languages: 100+ languages (same as gemini-2.5-pro)
- - `nvidia/parakeet-tdt-0.6b-v3` - NVIDIA Parakeet model
- - Languages: Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk)
- - `iic/SenseVoiceSmall` - Alibaba SenseVoice model
- - Languages: Chinese/Mandarin (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
- - More models will be integrated in future releases
-
- **Note:** For transcription with alignment on local files, use `lai transcribe align` instead.
-
- ### lai transcribe align
-
- Transcribe audio/video file and automatically align the generated transcript with the audio.
-
- This command combines transcription and alignment in a single step, producing precisely aligned captions.
-
- ```bash
- # Basic usage
- lai transcribe align <input_media> <output_caption>
-
- # Examples
- lai transcribe align audio.wav output.srt
- lai transcribe align audio.mp4 output.ass \
- transcription.model_name=nvidia/parakeet-tdt-0.6b-v3 \
- alignment.device=cuda
-
- # Using Gemini transcription with alignment
- lai transcribe align audio.wav output.srt \
- transcription.model_name=gemini-2.5-pro \
- transcription.gemini_api_key=YOUR_KEY \
- caption.split_sentence=true
-
- # Full configuration
- lai transcribe align \
- input_media=audio.wav \
- output_caption=output.srt \
- transcription.device=mps \
- transcription.model_name=iic/SenseVoiceSmall \
- alignment.device=cuda \
- caption.word_level=true
- ```
-
- **Parameters:**
- - `input_media`: Path to input audio/video file (required)
- - `output_caption`: Path for output aligned caption file (required)
- - `transcription`: Transcription configuration (model_name, device, language, gemini_api_key)
- - `alignment`: Alignment configuration (model_name, device)
- - `caption`: Caption formatting options (split_sentence, word_level, etc.)
-
- ### lai caption convert
-
- ```bash
- lai caption convert input.srt output.vtt
- lai caption convert input.srt output.json
- # Enable normalization to clean HTML entities and special characters:
- lai caption convert input.srt output.json normalize_text=true
- ```
-
- ### lai caption shift
-
- ```bash
- lai caption shift input.srt output.srt 2.0 # Delay by 2 seconds
- lai caption shift input.srt output.srt -1.5 # Advance by 1.5 seconds
- ```
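Conceptually, a timestamp shift is just a constant offset applied to every cue. A minimal self-contained sketch of the operation (illustrative only, not the SDK's implementation; negative results are clamped to zero here as one reasonable choice):

```python
# Illustrative sketch of what a caption shift does conceptually:
# offset every cue by a constant, clamping negative times to zero.
def shift_cues(cues, offset):
    """cues: list of (start_secs, end_secs, text) tuples."""
    return [(max(0.0, start + offset), max(0.0, end + offset), text)
            for start, end, text in cues]

cues = [(1.0, 2.5, "hello"), (3.0, 4.0, "world")]
print(shift_cues(cues, 2.0))   # delay by 2 seconds
print(shift_cues(cues, -1.5))  # advance by 1.5 seconds
```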
-
- ---
-
- ## Python SDK Reference
-
- ### Basic Alignment
-
- ```python
- from lattifai import LattifAI
-
- # Initialize client (uses LATTIFAI_API_KEY from environment)
- client = LattifAI()
-
- # Align audio/video with subtitle
- caption = client.alignment(
- input_media="audio.wav", # Audio or video file
- input_caption="subtitle.srt", # Input subtitle file
- output_caption_path="output.srt", # Output aligned subtitle
- split_sentence=True, # Enable smart sentence splitting
- )
-
- # Access alignment results
- for segment in caption.supervisions:
- print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")
- ```
-
- ### YouTube Processing
-
- ```python
- from lattifai import LattifAI
-
- client = LattifAI()
-
- # Download YouTube video and align with auto-downloaded subtitles
- caption = client.youtube(
- url="https://youtube.com/watch?v=VIDEO_ID",
- output_dir="./downloads",
- output_caption_path="aligned.srt",
- split_sentence=True,
- )
- ```
-
- ### Configuration Objects
-
- LattifAI uses a config-driven architecture for fine-grained control:
-
- #### ClientConfig - API Settings
-
- ```python
- from lattifai import LattifAI, ClientConfig
-
- client = LattifAI(
- client_config=ClientConfig(
- api_key="lf_your_api_key", # Or use LATTIFAI_API_KEY env var
- timeout=30.0,
- max_retries=3,
- )
- )
- ```
-
- #### AlignmentConfig - Model Settings
-
- ```python
- from lattifai import LattifAI, AlignmentConfig
-
- client = LattifAI(
- alignment_config=AlignmentConfig(
- model_name="Lattifai/Lattice-1",
- device="cuda", # "cpu", "cuda", "cuda:0", "mps"
- )
- )
- ```
-
- #### CaptionConfig - Subtitle Settings
-
- ```python
- from lattifai import LattifAI, CaptionConfig
-
- client = LattifAI(
- caption_config=CaptionConfig(
- split_sentence=True, # Smart sentence splitting (default: False)
- word_level=True, # Word-level timestamps (default: False)
- normalize_text=True, # Clean HTML entities (default: True)
- include_speaker_in_text=False, # Include speaker labels (default: True)
- )
- )
- ```
-
- #### Complete Configuration Example
-
- ```python
- from lattifai import (
- LattifAI,
- ClientConfig,
- AlignmentConfig,
- CaptionConfig
- )
-
- client = LattifAI(
- client_config=ClientConfig(
- api_key="lf_your_api_key",
- timeout=60.0,
- ),
- alignment_config=AlignmentConfig(
- model_name="Lattifai/Lattice-1",
- device="cuda",
- ),
- caption_config=CaptionConfig(
- split_sentence=True,
- word_level=True,
- output_format="json",
- ),
- )
-
- caption = client.alignment(
- input_media="audio.wav",
- input_caption="subtitle.srt",
- output_caption_path="output.json",
- )
- ```
-
- ### Available Exports
-
- ```python
- from lattifai import (
- # Client classes
- LattifAI,
- # AsyncLattifAI, # For async support
-
- # Config classes
- ClientConfig,
- AlignmentConfig,
- CaptionConfig,
- DiarizationConfig,
- MediaConfig,
-
- # I/O classes
- Caption,
- )
- ```
-
- ---
-
- ## Advanced Features
-
- ### Audio Preprocessing
-
- LattifAI provides powerful audio preprocessing capabilities for optimal alignment:
-
- **Channel Selection**
-
- Control which audio channel to process for stereo/multi-channel files:
-
- ```python
- from lattifai import LattifAI
-
- client = LattifAI()
-
- # Use left channel only
- caption = client.alignment(
- input_media="stereo.wav",
- input_caption="subtitle.srt",
- channel_selector="left", # Options: "left", "right", "average", or channel index (0, 1, 2, ...)
- )
-
- # Average all channels (default)
- caption = client.alignment(
- input_media="stereo.wav",
- input_caption="subtitle.srt",
- channel_selector="average",
- )
- ```
-
- **CLI Usage:**
- ```bash
- # Use right channel
- lai alignment align audio.wav subtitle.srt output.srt \
- media.channel_selector=right
-
- # Use specific channel index
- lai alignment align audio.wav subtitle.srt output.srt \
- media.channel_selector=1
- ```
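Conceptually, `channel_selector` reduces multi-channel audio to a single stream before alignment. A toy sketch of what each value means (illustrative only; the SDK applies this to real audio buffers, not Python lists):

```python
# Illustrative only -- shows what each channel_selector value conceptually
# does to interleaved stereo frames before alignment.
def select_channel(frames, selector="average"):
    """frames: list of (left, right) sample pairs."""
    if selector == "left":
        return [left for left, _ in frames]
    if selector == "right":
        return [right for _, right in frames]
    if selector == "average":
        return [(left + right) / 2 for left, right in frames]
    return [frame[selector] for frame in frames]  # integer channel index

frames = [(0.25, 0.75), (1.0, 0.0)]
print(select_channel(frames, "average"))  # [0.5, 0.5]
print(select_channel(frames, 1))          # [0.75, 0.0]
```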
-
- **Device Management**
-
- Optimize processing for your hardware:
-
- ```python
- from lattifai import LattifAI, AlignmentConfig
-
- # Use CUDA GPU
- client = LattifAI(
- alignment_config=AlignmentConfig(device="cuda")
- )
-
- # Use specific GPU
- client = LattifAI(
- alignment_config=AlignmentConfig(device="cuda:0")
- )
-
- # Use Apple Silicon MPS
- client = LattifAI(
- alignment_config=AlignmentConfig(device="mps")
- )
-
- # Use CPU
- client = LattifAI(
- alignment_config=AlignmentConfig(device="cpu")
- )
- ```
-
- **Supported Formats**
- - **Audio**: WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, AIFF, and more
- - **Video**: MP4, MKV, MOV, WEBM, AVI, and more
- - All formats supported by FFmpeg are compatible
-
- ### Long-Form Audio Support
-
- LattifAI now supports processing long audio files (up to 20 hours) through streaming mode. Enable streaming by setting the `streaming_chunk_secs` parameter:
-
- **Python SDK:**
- ```python
- from lattifai import LattifAI
-
- client = LattifAI()
-
- # Enable streaming for long audio files
- caption = client.alignment(
- input_media="long_audio.wav",
- input_caption="subtitle.srt",
- output_caption_path="output.srt",
- streaming_chunk_secs=600.0, # Process in 10-minute chunks
- )
- ```
-
- **CLI:**
- ```bash
- # Enable streaming with chunk size
- lai alignment align long_audio.wav subtitle.srt output.srt \
- media.streaming_chunk_secs=300.0
-
- # For YouTube videos
- lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID" \
- media.streaming_chunk_secs=300.0
- ```
-
- **MediaConfig:**
- ```python
- from lattifai import LattifAI, MediaConfig
-
- client = LattifAI(
- media_config=MediaConfig(
- streaming_chunk_secs=600.0, # Chunk duration in seconds (1-1800), default: 600 (10 minutes)
- )
- )
- ```
-
- **Technical Details:**
-
- | Parameter | Description | Recommendation |
- |-----------|-------------|----------------|
- | **Default Value** | 600 seconds (10 minutes) | Good for most use cases |
- | **Memory Impact** | Smaller chunks = less RAM usage | Adjust based on available RAM |
- | **Accuracy Impact** | Virtually zero degradation | Our precise implementation preserves quality |
-
- **Performance Characteristics:**
- - ✅ **Near-Perfect Accuracy**: Streaming implementation maintains alignment precision
- - 🚧 **Memory Efficient**: Process 20-hour audio with <10GB RAM (600-sec chunks)
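To make the chunking arithmetic concrete, here is a small sketch of how a recording's duration divides into fixed-size streaming chunks (illustrative only; `chunk_spans` is not an SDK function):

```python
# Illustrative sketch: split a recording's duration into streaming chunks.
def chunk_spans(total_secs, chunk_secs=600.0):
    spans, start = [], 0.0
    while start < total_secs:
        spans.append((start, min(start + chunk_secs, total_secs)))
        start += chunk_secs
    return spans

# A 20-hour recording at the default 600 s chunk size -> 120 chunks.
spans = chunk_spans(20 * 3600, 600.0)
print(len(spans), spans[0], spans[-1])
```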
-
- ### Word-Level Alignment
-
- Enable `word_level=True` to get precise timestamps for each word:
-
- ```python
- from lattifai import LattifAI, CaptionConfig
-
- client = LattifAI(
- caption_config=CaptionConfig(word_level=True)
- )
-
- caption = client.alignment(
- input_media="audio.wav",
- input_caption="subtitle.srt",
- output_caption_path="output.json", # JSON preserves word-level data
- )
-
- # Access word-level alignments
- for segment in caption.alignments:
- if segment.alignment and "word" in segment.alignment:
- for word_item in segment.alignment["word"]:
- print(f"{word_item.start:.2f}s: {word_item.symbol} (confidence: {word_item.score:.2f})")
- ```
-
- ### Smart Sentence Splitting
-
- The `split_sentence` option intelligently separates:
- - Non-speech elements (`[APPLAUSE]`, `[MUSIC]`) from dialogue
- - Multiple sentences within a single subtitle
- - Speaker labels from content
-
- ```python
- caption = client.alignment(
- input_media="audio.wav",
- input_caption="subtitle.srt",
- split_sentence=True,
- )
- ```
-
- ### Speaker Diarization
-
- Speaker diarization automatically identifies and labels different speakers in audio using state-of-the-art models.
-
- **Core Capabilities:**
- - 🎤 **Multi-Speaker Detection**: Automatically detect speaker changes in audio
- - 🏷️ **Smart Labeling**: Assign speaker labels (SPEAKER_00, SPEAKER_01, etc.)
- - 🔄 **Label Preservation**: Maintain existing speaker names from input captions
- - 🤖 **Gemini Integration**: Extract speaker names intelligently during transcription
-
- **How It Works:**
-
- 1. **Without Existing Labels**: System assigns generic labels (SPEAKER_00, SPEAKER_01)
- 2. **With Existing Labels**: System preserves your speaker names during alignment
- - Formats: `[Alice]`, `>> Bob:`, `SPEAKER_01:`, `Alice:` are all recognized
- 3. **Gemini Transcription**: When using Gemini models, speaker names are extracted from context
- - Example: "Hi, I'm Alice" → System labels as `Alice` instead of `SPEAKER_00`
808
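
The recognized label prefixes can be sketched with a small parser. This is purely illustrative, based on the formats listed above; the patterns and the `split_speaker` helper are assumptions for the example, not LattifAI's internal implementation:

```python
import re

# Illustrative patterns for the speaker-label formats listed above.
PATTERNS = [
    re.compile(r"^\[(?P<name>[^\]]+)\]\s*(?P<text>.*)$"),          # [Alice] text
    re.compile(r"^>>\s*(?P<name>[^:]+):\s*(?P<text>.*)$"),         # >> Bob: text
    re.compile(r"^(?P<name>[A-Za-z_][\w ]*?):\s*(?P<text>.*)$"),   # Alice: / SPEAKER_01: text
]

def split_speaker(line: str):
    """Return (speaker, text); speaker is None when no label is found."""
    for pat in PATTERNS:
        m = pat.match(line)
        if m:
            return m.group("name").strip(), m.group("text").strip()
    return None, line

print(split_speaker("[Alice] Hi there"))      # ('Alice', 'Hi there')
print(split_speaker(">> Bob: Good morning"))  # ('Bob', 'Good morning')
print(split_speaker("SPEAKER_01: Thanks"))    # ('SPEAKER_01', 'Thanks')
```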

**Speaker Label Integration:**

The diarization engine matches detected speakers against existing labels:
- If input captions have speaker names → **Preserved during alignment**
- If Gemini transcription provides names → **Used for labeling**
- Otherwise → **Generic labels (SPEAKER_00, etc.) assigned**

🚧 **Future Enhancement:**
- **AI-Powered Speaker Name Inference**: An upcoming feature will combine large language models with metadata (video title, description, context) to infer speaker names, making transcripts more human-readable and contextually accurate

**CLI:**
```bash
# Enable speaker diarization during alignment
lai alignment align audio.wav subtitle.srt output.srt \
    diarization.enabled=true

# With additional diarization settings
lai alignment align audio.wav subtitle.srt output.srt \
    diarization.enabled=true \
    diarization.device=cuda \
    diarization.min_speakers=2 \
    diarization.max_speakers=4

# For YouTube videos with diarization
lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID" \
    diarization.enabled=true
```

**Python SDK:**
```python
from lattifai import LattifAI, DiarizationConfig

client = LattifAI(
    diarization_config=DiarizationConfig(enabled=True)
)

caption = client.alignment(
    input_media="audio.wav",
    input_caption="subtitle.srt",
    output_caption_path="output.srt",
)

# Access speaker information
for segment in caption.supervisions:
    print(f"[{segment.speaker}] {segment.text}")
```

### YAML Configuration Files

> 🚧 **Note**: This feature is under development.

Create reusable configuration files:

```yaml
# config/alignment.yaml
model_name: "Lattifai/Lattice-1"
device: "cuda"
batch_size: 1
```

```bash
lai alignment align audio.wav subtitle.srt output.srt \
    alignment=config/alignment.yaml
```

---

## Architecture Overview

LattifAI uses a modular, config-driven architecture for maximum flexibility:

```
┌─────────────────────────────────────────────────────────────┐
│ LattifAI Client                                             │
├─────────────────────────────────────────────────────────────┤
│ Configuration Layer (Config-Driven)                         │
│ ├── ClientConfig (API settings)                             │
│ ├── AlignmentConfig (Model & device)                        │
│ ├── CaptionConfig (I/O formats)                             │
│ ├── TranscriptionConfig (ASR models)                        │
│ └── DiarizationConfig (Speaker detection)                   │
├─────────────────────────────────────────────────────────────┤
│ Core Components                                             │
│ ├── AudioLoader → Load & preprocess audio                   │
│ ├── Aligner → Lattice-1 forced alignment                    │
│ ├── Transcriber → Multi-model ASR                           │
│ ├── Diarizer → Speaker identification                       │
│ └── Tokenizer → Intelligent text segmentation               │
├─────────────────────────────────────────────────────────────┤
│ Data Flow                                                   │
│ Input → AudioLoader → Aligner → Diarizer → Caption          │
│                          ↓                                  │
│                 Transcriber (optional)                      │
└─────────────────────────────────────────────────────────────┘
```

**Component Responsibilities:**

| Component | Purpose | Configuration |
|-----------|---------|---------------|
| **AudioLoader** | Load audio/video, channel selection, format conversion | `MediaConfig` |
| **Aligner** | Forced alignment using Lattice-1 model | `AlignmentConfig` |
| **Transcriber** | ASR with Gemini/Parakeet/SenseVoice | `TranscriptionConfig` |
| **Diarizer** | Speaker diarization with pyannote.audio | `DiarizationConfig` |
| **Tokenizer** | Sentence splitting and text normalization | `CaptionConfig` |
| **Caption** | Unified data structure for alignments | `CaptionConfig` |

**Data Flow:**

1. **Audio Loading**: `AudioLoader` loads media, applies channel selection, and converts to a numpy array
2. **Transcription** (optional): `Transcriber` generates a transcript if no caption is provided
3. **Text Preprocessing**: `Tokenizer` splits sentences and normalizes text
4. **Alignment**: `Aligner` uses Lattice-1 to compute word-level timestamps
5. **Diarization** (optional): `Diarizer` identifies speakers and assigns labels
6. **Output**: The `Caption` object contains all results, exported to the desired format

**Configuration Philosophy:**
- ✅ **Declarative**: Describe what you want, not how to do it
- ✅ **Composable**: Mix and match configurations
- ✅ **Reproducible**: Save configs to YAML for consistent results
- ✅ **Flexible**: Override configs per method or globally

---

## Performance & Optimization

### Device Selection

Choose the optimal device for your hardware:

```python
from lattifai import LattifAI, AlignmentConfig

# NVIDIA GPU (recommended for speed)
client = LattifAI(
    alignment_config=AlignmentConfig(device="cuda")
)

# Apple Silicon GPU
client = LattifAI(
    alignment_config=AlignmentConfig(device="mps")
)

# CPU (maximum compatibility)
client = LattifAI(
    alignment_config=AlignmentConfig(device="cpu")
)
```

**Performance Comparison** (30-minute audio):

| Device | Time |
|--------|------|
| CUDA (RTX 4090) | ~18 sec |
| MPS (M4) | ~26 sec |

### Memory Management

**Streaming Mode** for long audio:

```python
# Process 20-hour audio with <10GB RAM
caption = client.alignment(
    input_media="long_audio.wav",
    input_caption="subtitle.srt",
    streaming_chunk_secs=600.0,  # 10-minute chunks
)
```

**Memory Usage** (approximate):

| Chunk Size | Peak RAM | Suitable For |
|------------|----------|--------------|
| 600 sec | ~5 GB | Recommended |
| No streaming | ~10 GB+ | Short audio only |

### Optimization Tips

1. **Use GPU when available**: 10x faster than CPU
2. **Enable streaming for long audio** (WIP): Process 20+ hour files without running out of memory
3. **Choose an appropriate chunk size**: Balance memory usage against performance
4. **Batch processing**: Process multiple files in sequence (coming soon)
5. **Profile alignment**: Set `client.profile=True` to identify bottlenecks
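
As a rough illustration of the chunk-size trade-off in tip 3, the number of streaming chunks for a recording is simple ceiling division. The `num_chunks` helper below is illustrative arithmetic, not part of the SDK:

```python
def num_chunks(duration_secs: float, chunk_secs: float = 600.0) -> int:
    """How many streaming chunks cover a recording (ceiling division)."""
    return -(-int(duration_secs) // int(chunk_secs))

# A 20-hour recording with the default 10-minute chunks:
print(num_chunks(20 * 3600))         # 120
# Halving the chunk size doubles the chunk count (lower peak RAM, more per-chunk overhead):
print(num_chunks(20 * 3600, 300.0))  # 240
```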

---

## Supported Formats

LattifAI supports virtually all common media and subtitle formats:

| Type | Formats |
|------|---------|
| **Audio** | WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, AIFF, and more |
| **Video** | MP4, MKV, MOV, WEBM, AVI, and more |
| **Caption/Subtitle Input** | SRT, VTT, ASS, SSA, SUB, SBV, TXT, Gemini, and more |
| **Caption/Subtitle Output** | All input formats + TextGrid (Praat) |

**Tabular Formats:**
- **TSV**: Tab-separated values with an optional speaker column
- **CSV**: Comma-separated values with an optional speaker column
- **AUD**: Audacity labels format with `[[speaker]]` notation
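
For instance, an Audacity label line is tab-separated (start time, end time, label text), and the `[[speaker]]` notation prefixes the label text. The decomposition below is a sketch for illustration, not the library's parser:

```python
# One Audacity label line: tab-separated start/end seconds, then the label
# text carrying the [[speaker]] prefix described above.
line = "12.500\t15.200\t[[Alice]] Welcome to the show"

start, end, label = line.split("\t")
speaker, text = None, label
if label.startswith("[[") and "]]" in label:
    speaker, text = label[2:].split("]]", 1)

print(float(start), float(end), speaker, text.strip())
# 12.5 15.2 Alice Welcome to the show
```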

> **Note**: If a format is not listed above but commonly used, it's likely supported. Feel free to try it or reach out if you encounter any issues.

---

## Supported Languages

LattifAI supports multiple transcription models with different language capabilities:

### Gemini Models (100+ Languages)

**Models**: `gemini-2.5-pro`, `gemini-3-pro-preview`, `gemini-3-flash-preview`

**Supported Languages**: English, Chinese (Mandarin & Cantonese), Spanish, French, German, Italian, Portuguese, Japanese, Korean, Arabic, Russian, Hindi, Bengali, Turkish, Dutch, Polish, Swedish, Danish, Norwegian, Finnish, Greek, Hebrew, Thai, Vietnamese, Indonesian, Malay, Filipino, Ukrainian, Czech, Romanian, Hungarian, Swahili, Tamil, Telugu, Marathi, Gujarati, Kannada, and 70+ more languages.

> **Note**: Requires a Gemini API key from [Google AI Studio](https://aistudio.google.com/apikey)

### NVIDIA Parakeet (24 European Languages)

**Model**: `nvidia/parakeet-tdt-0.6b-v3`

**Supported Languages**:
- **Western Europe**: English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Dutch (nl)
- **Nordic**: Danish (da), Swedish (sv), Norwegian (no), Finnish (fi)
- **Eastern Europe**: Polish (pl), Czech (cs), Slovak (sk), Hungarian (hu), Romanian (ro), Bulgarian (bg), Ukrainian (uk), Russian (ru)
- **Others**: Croatian (hr), Estonian (et), Latvian (lv), Lithuanian (lt), Slovenian (sl), Maltese (mt), Greek (el)

### Alibaba SenseVoice (5 Asian Languages)

**Model**: `iic/SenseVoiceSmall`

**Supported Languages**:
- Chinese/Mandarin (zh)
- English (en)
- Japanese (ja)
- Korean (ko)
- Cantonese (yue)

### Language Selection

```python
from lattifai import LattifAI, TranscriptionConfig

# Specify language for transcription
client = LattifAI(
    transcription_config=TranscriptionConfig(
        model_name="nvidia/parakeet-tdt-0.6b-v3",
        language="de",  # German
    )
)
```

**CLI Usage:**
```bash
lai transcribe run audio.wav output.srt \
    transcription.model_name=nvidia/parakeet-tdt-0.6b-v3 \
    transcription.language=de
```

> **Tip**: Use Gemini models for maximum language coverage, Parakeet for European languages, and SenseVoice for Asian languages.

---

## Roadmap

Visit our [LattifAI roadmap](https://lattifai.com/roadmap) for the latest updates.

| Date | Model Release | Features |
|------|---------------|----------|
| **Oct 2025** | **Lattice-1-Alpha** | ✅ English forced alignment<br>✅ Multi-format support<br>✅ CPU/GPU optimization |
| **Nov 2025** | **Lattice-1** | ✅ English + Chinese + German<br>✅ Mixed-language alignment<br>✅ Speaker diarization<br>✅ Multi-model transcription (Gemini, Parakeet, SenseVoice)<br>✅ Web interface with React<br>🚧 Advanced segmentation strategies (entire/transcription/hybrid)<br>🚧 Audio event detection ([MUSIC], [APPLAUSE], etc.) |
| **Q1 2026** | **Lattice-2** | ✅ Streaming mode for long audio<br>🔮 40+ languages support<br>🔮 Real-time alignment |

**Legend**: ✅ Released | 🚧 In Development | 📋 Planned | 🔮 Future

---

## Development

### Setup

```bash
git clone https://github.com/lattifai/lattifai-python.git
cd lattifai-python

# Using uv (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate

# Or using pip
pip install -e ".[test]"

pre-commit install
```

### Testing

```bash
pytest                      # Run all tests
pytest --cov=src            # With coverage
pytest tests/test_basic.py  # Specific test
```

---

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes and add tests
4. Run `pytest` and `pre-commit run`
5. Submit a pull request

## License

Apache License 2.0

## Support

- **Issues**: [GitHub Issues](https://github.com/lattifai/lattifai-python/issues)
- **Discussions**: [GitHub Discussions](https://github.com/lattifai/lattifai-python/discussions)
- **Discord**: [Join our community](https://discord.gg/kvF4WsBRK8)