langvision-0.0.1-py3-none-any.whl → langvision-0.0.2-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of langvision has been flagged as potentially problematic.

@@ -0,0 +1,372 @@
1
+ Metadata-Version: 2.4
2
+ Name: langvision
3
+ Version: 0.0.2
4
+ Summary: A package for finetuning vision models.
5
+ Author-email: Pritesh Raj <priteshraj10@gmail.com>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 Plim
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+ Project-URL: Homepage, https://github.com/langtrain-ai/langtrain
28
+ Project-URL: Documentation, https://github.com/langtrain-ai/langtrain/tree/main/docs
29
+ Project-URL: Source, https://github.com/langtrain-ai/langtrain
30
+ Project-URL: Tracker, https://github.com/langtrain-ai/langtrain/issues
31
+ Requires-Python: >=3.8
32
+ Description-Content-Type: text/markdown
33
+ License-File: LICENSE
34
+ Requires-Dist: torch>=1.10
35
+ Requires-Dist: numpy
36
+ Requires-Dist: tqdm
37
+ Requires-Dist: pyyaml
38
+ Requires-Dist: scipy
39
+ Requires-Dist: matplotlib
40
+ Requires-Dist: pillow
41
+ Dynamic: license-file
42
+
43
+ # Langvision: Vision LLMs with Efficient LoRA Fine-Tuning
44
+
45
+ <hr/>
46
+ <p align="center">
47
+ <picture>
48
+ <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/langtrain-ai/langtrain/main/static/langvision-use-dark.png">
49
+ <img alt="Langvision Logo" src="https://raw.githubusercontent.com/langtrain-ai/langtrain/main/static/langvision-white.png" width="full" />
50
+ </picture>
51
+ </p>
52
+
53
+ <!-- Badges -->
54
+ <p align="center">
55
+ <a href="https://pypi.org/project/langvision/"><img src="https://img.shields.io/pypi/v/langvision.svg" alt="PyPI version"></a>
56
+ <a href="https://pepy.tech/project/langvision"><img src="https://pepy.tech/badge/langvision" alt="Downloads"></a>
57
+ <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License"></a>
58
+ <a href="https://img.shields.io/badge/coverage-90%25-brightgreen" alt="Coverage"> <img src="https://img.shields.io/badge/coverage-90%25-brightgreen"/></a>
59
+ <a href="https://img.shields.io/badge/python-3.8%2B-blue" alt="Python Version"> <img src="https://img.shields.io/badge/python-3.8%2B-blue"/></a>
60
+ <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
61
+ </p>
62
+
63
+ <p align="center">
64
+ <b>Langvision provides modular components for vision models and LoRA-based fine-tuning.</b><br/>
65
+ <span style="font-size:1.1em"><i>Adapt and fine-tune vision models for a range of tasks.</i></span>
66
+ </p>
67
+ <hr/>
68
+
69
+ ## Quick Links
70
+ - [Documentation](docs/index.md)
71
+ - [Tutorials](docs/tutorials/index.md)
72
+ - [Changelog](CHANGELOG.md)
73
+ - [Contributing Guide](CONTRIBUTING.md)
74
+ - [Roadmap](ROADMAP.md)
75
+
76
+ ---
77
+
78
+ ## Table of Contents
79
+ - [Features](#features)
80
+ - [Showcase](#showcase)
81
+ - [Getting Started](#getting-started)
82
+ - [Supported Python Versions](#supported-python-versions)
83
+ - [Why langvision?](#why-langvision)
84
+ - [Architecture Overview](#architecture-overview)
85
+ - [Core Modules](#core-modules)
86
+ - [Performance & Efficiency](#performance--efficiency)
87
+ - [Advanced Configuration](#advanced-configuration)
88
+ - [Documentation & Resources](#documentation--resources)
89
+ - [Testing & Quality](#testing--quality)
90
+ - [Examples & Use Cases](#examples--use-cases)
91
+ - [Extending the Framework](#extending-the-framework)
92
+ - [Contributing](#contributing)
93
+ - [License & Citation](#license--citation)
94
+ - [Acknowledgements](#acknowledgements)
97
+
98
+ ---
99
+
100
+ ## Features
101
+ - LoRA adapters for parameter-efficient fine-tuning
102
+ - Modular Vision Transformer (ViT) backbone
103
+ - Model zoo for open-source visual models
104
+ - Configurable and extensible codebase
105
+ - Checkpointing and resume support (see the sketch after this list)
106
+ - Mixed precision and distributed training
107
+ - Built-in metrics and visualization tools
108
+ - CLI for fine-tuning and evaluation
109
+ - Extensible callbacks (early stopping, logging, etc.)
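+
+ The checkpointing item above maps onto standard PyTorch state dicts; the snippet below is a minimal, framework-agnostic sketch of saving and resuming (plain PyTorch, not a langvision-specific API).
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ model = nn.Linear(8, 2)                      # stand-in for a VisionTransformer
+ optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
+
+ # Save a checkpoint after an epoch ...
+ torch.save({"epoch": 3,
+             "model": model.state_dict(),
+             "optimizer": optimizer.state_dict()}, "checkpoint.pt")
+
+ # ... and resume training later
+ state = torch.load("checkpoint.pt")
+ model.load_state_dict(state["model"])
+ optimizer.load_state_dict(state["optimizer"])
+ start_epoch = state["epoch"] + 1
+ ```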
110
+
111
+ ---
112
+
113
+ ## Showcase
114
+
115
+ Langvision is a framework for building and fine-tuning vision models with LoRA support. It is suitable for tasks such as image classification, visual question answering, and custom vision applications.
116
+
117
+ ---
118
+
119
+ ## Getting Started
120
+
121
+ Install with pip:
122
+
123
+ ```bash
124
+ pip install langvision
125
+ ```
126
+
127
+ Minimal example:
128
+
129
+ ```python
130
+ import torch
131
+ from langvision.models.vision_transformer import VisionTransformer
132
+ from langvision.utils.config import default_config
133
+
134
+ x = torch.randn(2, 3, 224, 224)
135
+ model = VisionTransformer(
136
+     img_size=default_config['img_size'],
137
+     patch_size=default_config['patch_size'],
138
+     in_chans=default_config['in_chans'],
139
+     num_classes=default_config['num_classes'],
140
+     embed_dim=default_config['embed_dim'],
141
+     depth=default_config['depth'],
142
+     num_heads=default_config['num_heads'],
143
+     mlp_ratio=default_config['mlp_ratio'],
144
+     lora_config=default_config['lora'],
145
+ )
146
+
147
+ with torch.no_grad():
148
+     out = model(x)
149
+ print('Output shape:', out.shape)
150
+ ```
151
+
152
+ For more details, see the [Documentation](docs/index.md) and [src/langvision/cli/finetune.py](src/langvision/cli/finetune.py).
153
+
154
+ ---
155
+
156
+ ## Supported Python Versions
157
+ - Python 3.8+
158
+
159
+ ---
160
+
161
+ ## Why langvision?
162
+
163
+ - Parameter-efficient fine-tuning with LoRA adapters
164
+ - Modular ViT backbone for flexible model design
165
+ - Unified interface for open-source vision models
166
+ - Designed for both research and production
167
+ - Efficient memory usage for large models
168
+
169
+ ---
170
+
171
+ ## Architecture Overview
172
+
173
+ Langvision uses a modular Vision Transformer backbone with LoRA adapters injected into the attention and MLP layers, so pre-trained models can be adapted with far fewer trainable parameters than full fine-tuning. A minimal code sketch of the adapter follows the diagram below.
174
+
175
+ ### Model Data Flow
176
+
177
+ ```mermaid
178
+ ---
179
+ config:
180
+ layout: dagre
181
+ ---
182
+ flowchart TD
183
+ subgraph LoRA_Adapters["LoRA Adapters in Attention and MLP"]
184
+ LA1(["LoRA Adapter 1"])
185
+ LA2(["LoRA Adapter 2"])
186
+ LA3(["LoRA Adapter N"])
187
+ end
188
+ A(["Input Image"]) --> B(["Patch Embedding"])
189
+ B --> C(["CLS Token & Positional Encoding"])
190
+ C --> D1(["Encoder Layer 1"])
191
+ D1 --> D2(["Encoder Layer 2"])
192
+ D2 --> D3(["Encoder Layer N"])
193
+ D3 --> E(["LayerNorm"])
194
+ E --> F(["MLP Head"])
195
+ F --> G(["Output Class Logits"])
196
+ LA1 -.-> D1
197
+ LA2 -.-> D2
198
+ LA3 -.-> D3
199
+ LA1:::loraStyle
200
+ LA2:::loraStyle
201
+ LA3:::loraStyle
202
+ classDef loraStyle fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
203
+ ```
204
+
205
+ ---
206
+
207
+ ## Core Modules
208
+
209
+ | Module | Description | Key Features |
210
+ |--------|-------------|--------------|
211
+ | PatchEmbedding | Image-to-patch conversion and embedding | Configurable patch sizes, position embeddings |
212
+ | TransformerEncoder | Multi-layer transformer backbone | Self-attention, LoRA integration, checkpointing |
213
+ | LoRALinear | Low-rank adaptation layers | Configurable rank, memory-efficient updates |
214
+ | MLPHead | Output projection layer | Classification, regression, dropout |
215
+ | Config System | Centralized configuration | YAML/JSON config, CLI overrides |
216
+ | Data Utils | Preprocessing and augmentation | Built-in transforms, custom loaders |
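+
+ As an illustration of the patch-embedding step in the table above, here is a generic ViT-style sketch (not langvision's `PatchEmbedding` class): a strided convolution splits the image into non-overlapping patches and projects each one to the embedding dimension.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class PatchEmbeddingSketch(nn.Module):
+     """Split an image into patches and project each patch to embed_dim (illustrative only)."""
+     def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
+         super().__init__()
+         self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
+         self.num_patches = (img_size // patch_size) ** 2
+
+     def forward(self, x):
+         x = self.proj(x)                       # (B, embed_dim, H/ps, W/ps)
+         return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)
+
+ tokens = PatchEmbeddingSketch()(torch.randn(2, 3, 224, 224))
+ print(tokens.shape)                            # torch.Size([2, 196, 768])
+ ```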
217
+
218
+ ---
219
+
220
+ ## Performance & Efficiency
221
+
222
+ | Metric | Full Fine-tuning | LoRA Fine-tuning | Improvement |
223
+ |--------|------------------|------------------|-------------|
224
+ | Trainable Parameters | 86M | 2.4M | 97% reduction |
225
+ | Memory Usage | 12GB | 4GB | 67% reduction |
226
+ | Training Time | 4h | 1.5h | 62% faster |
227
+ | Storage per Task | 344MB | 9.6MB | 97% smaller |
228
+
229
+ *Benchmarks: ViT-Base, CIFAR-100, RTX 3090*
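+
+ As a rough sanity check on the trainable-parameter figure: a rank-r LoRA update for a weight of shape (d_in, d_out) adds r × (d_in + d_out) parameters. Summing over the fused qkv, attention projection, and both MLP projections of the 12 ViT-Base blocks (embedding dim 768, MLP ratio 4, rank 16; the adapter placement assumed in the configuration below) lands close to the 2.4M in the table.
+
+ ```python
+ # Back-of-the-envelope LoRA parameter count for ViT-Base (assumed adapter placement)
+ embed_dim, mlp_dim, depth, rank = 768, 3072, 12, 16
+
+ def lora_params(d_in, d_out, r=rank):
+     return r * (d_in + d_out)                        # A: (d_in, r) plus B: (r, d_out)
+
+ per_block = (lora_params(embed_dim, 3 * embed_dim)   # fused qkv projection
+              + lora_params(embed_dim, embed_dim)     # attention output projection
+              + lora_params(embed_dim, mlp_dim)       # mlp.fc1
+              + lora_params(mlp_dim, embed_dim))      # mlp.fc2
+ print(depth * per_block)                             # 2,359,296 ≈ 2.4M trainable parameters
+ ```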
230
+
231
+ Supported model sizes: ViT-Tiny, ViT-Small, ViT-Base, ViT-Large
232
+
233
+ ---
234
+
235
+ ## Advanced Configuration
236
+
237
+ Example LoRA config:
238
+
239
+ ```python
240
+ lora_config = {
241
+ "rank": 16,
242
+ "alpha": 32,
243
+ "dropout": 0.1,
244
+ "target_modules": ["attention.qkv", "attention.proj", "mlp.fc1", "mlp.fc2"],
245
+ "merge_weights": False
246
+ }
247
+ ```
248
+
249
+ Example training config:
250
+
251
+ ```yaml
252
+ model:
253
+ name: "vit_base"
254
+ img_size: 224
255
+ patch_size: 16
256
+ num_classes: 1000
257
+ training:
258
+ epochs: 10
259
+ batch_size: 32
260
+ learning_rate: 1e-4
261
+ weight_decay: 0.01
262
+ warmup_steps: 1000
263
+ lora:
264
+ rank: 16
265
+ alpha: 32
266
+ dropout: 0.1
267
+ ```
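+
+ A config file like this can be loaded with PyYAML, which is already a declared dependency. A minimal sketch, assuming the block above is saved as `config.yaml`; note that PyYAML only parses scientific notation as a float when it contains a decimal point, which is why the learning rate is written `1.0e-4` rather than `1e-4`.
+
+ ```python
+ import yaml
+
+ with open("config.yaml") as f:                # the training config shown above
+     cfg = yaml.safe_load(f)
+
+ print(cfg["model"]["name"])                   # vit_base
+ print(cfg["training"]["learning_rate"])       # 0.0001 (a float thanks to 1.0e-4)
+ print(cfg["lora"]["rank"])                    # 16
+ ```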
268
+
269
+ ---
270
+
271
+ ## Documentation & Resources
272
+ - [API Reference](docs/api/index.md)
273
+ - [Tutorials and Examples](docs/tutorials/index.md)
274
+ - [Research Papers](#research-papers)
275
+ - [Best Practices Guide](docs/best_practices.md)
276
+ - [Troubleshooting](docs/troubleshooting.md)
277
+
278
+ ### Research Papers
279
+ - [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
280
+ - [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
281
+ - [Vision Transformer for Fine-Grained Image Classification](https://arxiv.org/abs/2103.07579)
282
+
283
+ ---
284
+
285
+ ## Testing & Quality
286
+
287
+ Run tests:
288
+
289
+ ```bash
290
+ pytest tests/
291
+ ```
292
+
293
+ Code quality tools:
294
+
295
+ ```bash
296
+ flake8 src/
297
+ black src/ --check
298
+ mypy src/
299
+ bandit -r src/
300
+ ```
301
+
302
+ ---
303
+
304
+ ## Examples & Use Cases
305
+
306
+ Image classification:
307
+
308
+ ```python
309
+ from langvision import VisionTransformer
310
+ from langvision.datasets import CIFAR10Dataset
311
+
312
+ model = VisionTransformer.from_pretrained("vit_base_patch16_224")
313
+ dataset = CIFAR10Dataset(train=True, transform=model.default_transform)
314
+ model.finetune(dataset, epochs=10, lora_rank=16)
315
+ ```
316
+
317
+ Custom dataset:
318
+
319
+ ```python
320
+ from langvision.datasets import ImageFolderDataset
321
+
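+ # `model` is the VisionTransformer instance from the image classification example above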
322
+ dataset = ImageFolderDataset(
323
+ root="/path/to/dataset",
324
+ split="train",
325
+ transform=model.default_transform
326
+ )
327
+ model.finetune(dataset, config_path="configs/custom_config.yaml")
328
+ ```
329
+
330
+ ---
331
+
332
+ ## Extending the Framework
333
+ - Add datasets in `src/langvision/data/datasets.py` (see the sketch after this list)
334
+ - Add callbacks in `src/langvision/callbacks/`
335
+ - Add models in `src/langvision/models/`
336
+ - Add CLI tools in `src/langvision/cli/`
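+
+ For the dataset case, a minimal sketch of what a new entry might look like, assuming standard `torch.utils.data.Dataset` objects are accepted downstream (the class name and file format here are illustrative, not an existing langvision API):
+
+ ```python
+ from PIL import Image
+ from torch.utils.data import Dataset
+
+ class CSVImageDataset(Dataset):
+     """Illustrative dataset: one `path,label` pair per line of an index file."""
+     def __init__(self, index_file, transform=None):
+         with open(index_file) as f:
+             self.items = [line.strip().split(",") for line in f if line.strip()]
+         self.transform = transform
+
+     def __len__(self):
+         return len(self.items)
+
+     def __getitem__(self, idx):
+         path, label = self.items[idx]
+         image = Image.open(path).convert("RGB")
+         if self.transform is not None:
+             image = self.transform(image)
+         return image, int(label)
+ ```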
337
+
338
+ ## Documentation
339
+ - See code comments and docstrings for details.
340
+ - For advanced usage, see `src/langvision/cli/finetune.py`.
341
+
342
+ ## Contributing
343
+ We welcome contributions. See the [Contributing Guide](CONTRIBUTING.md) for details.
344
+
345
+ ## License & Citation
346
+
347
+ This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
348
+
349
+ If you use langvision in your research, please cite:
350
+
351
+ ```bibtex
352
+ @software{langtrain2025,
353
+ author = {Pritesh Raj},
354
+ title = {langtrain: Vision LLMs with Efficient LoRA Fine-Tuning},
355
+ url = {https://github.com/langtrain-ai/langvision},
356
+ year = {2025},
357
+ version = {1.0.0}
358
+ }
359
+ ```
360
+
361
+ ## Acknowledgements
362
+
363
+ We thank the following projects and communities:
364
+ - [PyTorch](https://pytorch.org/)
365
+ - [HuggingFace](https://huggingface.co/)
366
+ - [timm](https://github.com/rwightman/pytorch-image-models)
367
+ - [PEFT](https://github.com/huggingface/peft)
368
+
369
+ <p align="center">
370
+ <b>Made in India 🇮🇳 with ❤️ by the langtrain team</b><br/>
371
+ <i>Star ⭐ this repo if you find it useful!</i>
372
+ </p>
@@ -32,9 +32,9 @@ langvision/utils/config.py,sha256=0T5Vl8S0hVhCq1NTpiQnXFmsRxOidiYPsXGv51ruEBM,27
32
32
  langvision/utils/cuda.py,sha256=rpp0j-tvwGK6uWVe7fFTOS1BdZcsfRkw2AS5B08L-R0,1372
33
33
  langvision/utils/data.py,sha256=95U8Z_R2slwpaBVFrqyvcBA6kYINcEuzp1r5djvwQYg,277
34
34
  langvision/utils/device.py,sha256=TLVwXHFcildDyKh1Q7zczIyU702UJj7OHpsGhjRwLjk,552
35
- langvision-0.0.1.dist-info/licenses/LICENSE,sha256=S5-0GNdCWFpkatz1c_DXeMYJM1uiWBWx-LNGRcFNP70,1061
36
- langvision-0.0.1.dist-info/METADATA,sha256=5X3tjtJy8GuK7RVAm_DVJR7FL7quSK87WUNZk5B72WM,16076
37
- langvision-0.0.1.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
38
- langvision-0.0.1.dist-info/entry_points.txt,sha256=9kxJHpRvjLKPMf88vdPS_Aed9nXr3HmoqwN7eEb_Nv4,69
39
- langvision-0.0.1.dist-info/top_level.txt,sha256=l3g51kgU-5cacP4nhASSfYADOCAwLC4UzlgVwuoJCgc,11
40
- langvision-0.0.1.dist-info/RECORD,,
35
+ langvision-0.0.2.dist-info/licenses/LICENSE,sha256=S5-0GNdCWFpkatz1c_DXeMYJM1uiWBWx-LNGRcFNP70,1061
36
+ langvision-0.0.2.dist-info/METADATA,sha256=nGqZ-YWp_swWrBUj6mWRMioae0SRWnVbTQA81XI6YNE,11285
37
+ langvision-0.0.2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
38
+ langvision-0.0.2.dist-info/entry_points.txt,sha256=9kxJHpRvjLKPMf88vdPS_Aed9nXr3HmoqwN7eEb_Nv4,69
39
+ langvision-0.0.2.dist-info/top_level.txt,sha256=l3g51kgU-5cacP4nhASSfYADOCAwLC4UzlgVwuoJCgc,11
40
+ langvision-0.0.2.dist-info/RECORD,,
@@ -1,463 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: langvision
3
- Version: 0.0.1
4
- Summary: Vision LLMs with Efficient LoRA Fine-Tuning
5
- Author-email: Pritesh Raj <priteshraj10@gmail.com>
6
- License: MIT License
7
-
8
- Copyright (c) 2025 Plim
9
-
10
- Permission is hereby granted, free of charge, to any person obtaining a copy
11
- of this software and associated documentation files (the "Software"), to deal
12
- in the Software without restriction, including without limitation the rights
13
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
- copies of the Software, and to permit persons to whom the Software is
15
- furnished to do so, subject to the following conditions:
16
-
17
- The above copyright notice and this permission notice shall be included in all
18
- copies or substantial portions of the Software.
19
-
20
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
- SOFTWARE.
27
- Project-URL: Homepage, https://github.com/langtrain-ai/langtrain
28
- Project-URL: Documentation, https://github.com/langtrain-ai/langtrain/tree/main/docs
29
- Project-URL: Source, https://github.com/langtrain-ai/langtrain
30
- Project-URL: Tracker, https://github.com/langtrain-ai/langtrain/issues
31
- Requires-Python: >=3.8
32
- Description-Content-Type: text/markdown
33
- License-File: LICENSE
34
- Requires-Dist: torch>=1.10
35
- Requires-Dist: numpy
36
- Requires-Dist: tqdm
37
- Requires-Dist: pyyaml
38
- Requires-Dist: scipy
39
- Requires-Dist: matplotlib
40
- Requires-Dist: pillow
41
- Dynamic: license-file
42
-
43
- # langtrain: Vision LLMs (Large Language Models for Vision) with Efficient LoRA Fine-Tuning
44
-
45
- <hr/>
46
- <p align="center">
47
- <picture>
48
- <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/langtrain-ai/langtrain/main/static/langvision-use-dark.png">
49
- <img alt="Langvision Logo" src="https://raw.githubusercontent.com/langtrain-ai/langtrain/main/static/langvision-white.png" width="full" />
50
- </picture>
51
- </p>
52
-
53
- <!-- Badges -->
54
- <p align="center">
55
- <a href="https://pypi.org/project/langvision/"><img src="https://img.shields.io/pypi/v/langvision.svg" alt="PyPI version"></a>
56
- <a href="https://pepy.tech/project/langvision"><img src="https://pepy.tech/badge/langvision" alt="Downloads"></a>
57
- <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License"></a>
58
- <a href="https://img.shields.io/badge/coverage-90%25-brightgreen" alt="Coverage"> <img src="https://img.shields.io/badge/coverage-90%25-brightgreen"/></a>
59
- <a href="https://img.shields.io/badge/python-3.8%2B-blue" alt="Python Version"> <img src="https://img.shields.io/badge/python-3.8%2B-blue"/></a>
60
- <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
61
- </p>
62
-
63
- <p align="center">
64
- <b>Modular Vision LLMs (Large Language Models for Vision) with Efficient LoRA Fine-Tuning</b><br/>
65
- <span style="font-size:1.1em"><i>Build, adapt, and fine-tune vision models with ease and efficiency.</i></span>
66
- </p>
67
- <hr/>
68
-
69
- ## 🚀 Quick Links
70
- - [Documentation](docs/index.md)
71
- - [Tutorials](docs/tutorials/index.md)
72
- - [Changelog](CHANGELOG.md)
73
- - [Contributing Guide](CONTRIBUTING.md)
74
- - [Roadmap](ROADMAP.md)
75
-
76
- ---
77
-
78
- ## 📚 Table of Contents
79
- - [Features](#-features)
80
- - [Showcase](#-showcase)
81
- - [Getting Started](#-getting-started)
82
- - [Supported Python Versions](#-supported-python-versions)
83
- - [Why langvision?](#-why-langvision)
84
- - [Architecture Overview](#-architecture-overview)
85
- - [Core Modules](#-core-modules)
86
- - [Performance & Efficiency](#-performance--efficiency)
87
- - [Advanced Configuration](#-advanced-configuration)
88
- - [Documentation & Resources](#-documentation--resources)
89
- - [Testing & Quality](#-testing--quality)
90
- - [Examples & Use Cases](#-examples--use-cases)
91
- - [Extending the Framework](#-extending-the-framework)
92
- - [Contributing](#-contributing)
93
- - [FAQ](#-faq)
94
- - [Citation](#-citation)
95
- - [Acknowledgements](#-acknowledgements)
96
- - [License](#-license)
97
-
98
- ---
99
-
100
- ## ✨ Features
101
- - 🔧 **Plug-and-play LoRA adapters** for parameter-efficient fine-tuning
102
- - 🏗️ **Modular Vision Transformer (ViT) backbone** with customizable components
103
- - 🎯 **Unified model zoo** for open-source visual models
104
- - ⚙️ **Easy configuration** and extensible codebase
105
- - 🚀 **Production ready** with comprehensive testing and documentation
106
- - 💾 **Memory efficient** training with gradient checkpointing support
107
- - 📊 **Built-in metrics** and visualization tools
108
- - 🧩 **Modular training loop** with LoRA support
109
- - 🎯 **Unified CLI** for fine-tuning and evaluation
110
- - 🔌 **Extensible callbacks** (early stopping, logging, etc.)
111
- - 📦 **Checkpointing and resume**
112
- - 🚀 **Mixed precision training**
113
- - 🔧 **Easy dataset and model extension**
114
- - ⚡ **Ready for distributed/multi-GPU training**
115
-
116
- ---
117
-
118
- ## 🚀 Showcase
119
-
120
- **langvision** is a modular, research-friendly framework for building and fine-tuning Vision Large Language Models (LLMs) with efficient Low-Rank Adaptation (LoRA) support. Whether you're working on image classification, visual question answering, or custom vision tasks, langvision provides the tools you need for parameter-efficient model adaptation.
121
-
122
- ---
123
-
124
- ## 🏁 Getting Started
125
-
126
- Here's a minimal example to get you up and running:
127
-
128
- ```bash
129
- pip install langvision
130
- ```
131
-
132
- ```python
133
- import torch
134
- from langvision.models.vision_transformer import VisionTransformer
135
- from langvision.utils.config import default_config
136
-
137
- # Create model
138
- x = torch.randn(2, 3, 224, 224)
139
- model = VisionTransformer(
140
- img_size=default_config['img_size'],
141
- patch_size=default_config['patch_size'],
142
- in_chans=default_config['in_chans'],
143
- num_classes=default_config['num_classes'],
144
- embed_dim=default_config['embed_dim'],
145
- depth=default_config['depth'],
146
- num_heads=default_config['num_heads'],
147
- mlp_ratio=default_config['mlp_ratio'],
148
- lora_config=default_config['lora'],
149
- )
150
-
151
- # Forward pass
152
- with torch.no_grad():
153
- out = model(x)
154
- print('Output shape:', out.shape)
155
- ```
156
-
157
- For advanced usage, CLI details, and more, see the [Documentation](docs/index.md) and [src/langvision/cli/finetune.py](src/langvision/cli/finetune.py).
158
-
159
- ---
160
-
161
- ## 🐍 Supported Python Versions
162
- - Python 3.8+
163
-
164
- ---
165
-
166
- ## 🧩 Why langvision?
167
-
168
- - **Parameter-efficient fine-tuning**: Plug-and-play LoRA adapters for fast, memory-efficient adaptation with minimal computational overhead
169
- - **Modular ViT backbone**: Swap or extend components like patch embedding, attention, or MLP heads with ease
170
- - **Unified model zoo**: Access and experiment with open-source visual models through a consistent interface
171
- - **Research & production ready**: Clean, extensible codebase with comprehensive configuration options and robust utilities
172
- - **Memory efficient**: Fine-tune large models on consumer hardware by updating only a small fraction of parameters
173
-
174
- ---
175
-
176
- ## 🏗️ Architecture Overview
177
-
178
- langvision is built around a modular Vision Transformer (ViT) backbone, with LoRA adapters strategically injected into attention and MLP layers for efficient fine-tuning. This approach allows you to adapt large pre-trained models using only a fraction of the original parameters.
179
-
180
- ### Model Data Flow
181
-
182
- ```mermaid
183
- ---
184
- config:
185
- layout: dagre
186
- ---
187
- flowchart TD
188
- subgraph LoRA_Adapters["LoRA Adapters in Attention and MLP"]
189
- LA1(["LoRA Adapter 1"])
190
- LA2(["LoRA Adapter 2"])
191
- LA3(["LoRA Adapter N"])
192
- end
193
- A(["Input Image"]) --> B(["Patch Embedding"])
194
- B --> C(["CLS Token & Positional Encoding"])
195
- C --> D1(["Encoder Layer 1"])
196
- D1 --> D2(["Encoder Layer 2"])
197
- D2 --> D3(["Encoder Layer N"])
198
- D3 --> E(["LayerNorm"])
199
- E --> F(["MLP Head"])
200
- F --> G(["Output Class Logits"])
201
- LA1 -.-> D1
202
- LA2 -.-> D2
203
- LA3 -.-> D3
204
- LA1:::loraStyle
205
- LA2:::loraStyle
206
- LA3:::loraStyle
207
- classDef loraStyle fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
208
- ```
209
-
210
- ### Architecture Components
211
-
212
- **Legend:**
213
- - **Solid arrows**: Main data flow through the Vision Transformer
214
- - **Dashed arrows**: LoRA adapter injection points in encoder layers
215
- - **Blue boxes**: LoRA adapters for parameter-efficient fine-tuning
216
-
217
- **Data Flow Steps:**
218
- 1. **Input Image** (224×224×3): Raw image data ready for processing
219
- 2. **Patch Embedding**: Image split into 16×16 patches and projected to embedding dimension
220
- 3. **CLS Token & Positional Encoding**: Classification token prepended with learnable position embeddings
221
- 4. **Transformer Encoder Stack**: Multi-layer transformer with self-attention and MLP blocks
222
- - **LoRA Integration**: Low-rank adapters injected into attention and MLP layers
223
- - **Efficient Updates**: Only LoRA parameters updated during fine-tuning
224
- 5. **LayerNorm**: Final normalization of encoder outputs
225
- 6. **MLP Head**: Task-specific classification or regression head
226
- 7. **Output**: Final predictions (class probabilities, regression values, etc.)
227
-
228
- ---
229
-
230
- ## 🧩 Core Modules
231
-
232
- | Module | Description | Key Features |
233
- |--------|-------------|--------------|
234
- | **PatchEmbedding** | Image-to-patch conversion and embedding | • Configurable patch sizes<br>• Learnable position embeddings<br>• Support for different input resolutions |
235
- | **TransformerEncoder** | Multi-layer transformer backbone | • Self-attention mechanisms<br>• LoRA adapter integration<br>• Gradient checkpointing support |
236
- | **LoRALinear** | Low-rank adaptation layers | • Configurable rank and scaling<br>• Memory-efficient updates<br>• Easy enable/disable functionality |
237
- | **MLPHead** | Output projection layer | • Multi-class classification<br>• Regression support<br>• Dropout regularization |
238
- | **Config System** | Centralized configuration management | • YAML/JSON config files<br>• Command-line overrides<br>• Validation and defaults |
239
- | **Data Utils** | Preprocessing and augmentation | • Built-in transforms<br>• Custom dataset loaders<br>• Efficient data pipelines |
240
-
241
- ---
242
-
243
- ## 📊 Performance & Efficiency
244
-
245
- ### LoRA Benefits
246
-
247
- | Metric | Full Fine-tuning | LoRA Fine-tuning | Improvement |
248
- |--------|------------------|------------------|-------------|
249
- | **Trainable Parameters** | 86M | 2.4M | **97% reduction** |
250
- | **Memory Usage** | 12GB | 4GB | **67% reduction** |
251
- | **Training Time** | 4 hours | 1.5 hours | **62% faster** |
252
- | **Storage per Task** | 344MB | 9.6MB | **97% smaller** |
253
-
254
- *Benchmarks on ViT-Base with CIFAR-100, RTX 3090*
255
-
256
- ### Supported Model Sizes
257
-
258
- - **ViT-Tiny**: 5.7M parameters, perfect for experimentation
259
- - **ViT-Small**: 22M parameters, good balance of performance and efficiency
260
- - **ViT-Base**: 86M parameters, strong performance across tasks
261
- - **ViT-Large**: 307M parameters, state-of-the-art results
262
-
263
- ---
264
-
265
- ## 🔧 Advanced Configuration
266
-
267
- ### LoRA Configuration
268
-
269
- ```python
270
- lora_config = {
271
- "rank": 16, # Low-rank dimension
272
- "alpha": 32, # Scaling factor
273
- "dropout": 0.1, # Dropout rate
274
- "target_modules": [ # Modules to adapt
275
- "attention.qkv",
276
- "attention.proj",
277
- "mlp.fc1",
278
- "mlp.fc2"
279
- ],
280
- "merge_weights": False # Whether to merge during inference
281
- }
282
- ```
283
-
284
- ### Training Configuration
285
-
286
- ```yaml
287
- # config.yaml
288
- model:
289
- name: "vit_base"
290
- img_size: 224
291
- patch_size: 16
292
- num_classes: 1000
293
-
294
- training:
295
- epochs: 10
296
- batch_size: 32
297
- learning_rate: 1e-4
298
- weight_decay: 0.01
299
- warmup_steps: 1000
300
-
301
- lora:
302
- rank: 16
303
- alpha: 32
304
- dropout: 0.1
305
- ```
306
-
307
- ---
308
-
309
- ## 📚 Documentation & Resources
310
-
311
- - 📖 [Complete API Reference](docs/api/index.md)
312
- - 🎓 [Tutorials and Examples](docs/tutorials/index.md)
313
- - 🔬 [Research Papers](#research-papers)
314
- - 💡 [Best Practices Guide](docs/best_practices.md)
315
- - 🐛 [Troubleshooting](docs/troubleshooting.md)
316
-
317
- ### Research Papers
318
- - [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
319
- - [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
320
- - [Vision Transformer for Fine-Grained Image Classification](https://arxiv.org/abs/2103.07579)
321
-
322
- ---
323
-
324
- ## 🧪 Testing & Quality
325
-
326
- Run the comprehensive test suite:
327
-
328
- ```bash
329
- # Unit tests
330
- pytest tests/unit/
331
-
332
- # Integration tests
333
- pytest tests/integration/
334
-
335
- # Performance benchmarks
336
- pytest tests/benchmarks/
337
-
338
- # All tests with coverage
339
- pytest tests/ --cov=langvision --cov-report=html
340
- ```
341
-
342
- ### Code Quality Tools
343
-
344
- ```bash
345
- # Linting
346
- flake8 src/
347
- black src/ --check
348
-
349
- # Type checking
350
- mypy src/
351
-
352
- # Security scanning
353
- bandit -r src/
354
- ```
355
-
356
- ---
357
-
358
- ## 🚀 Examples & Use Cases
359
-
360
- ### Image Classification
361
- ```python
362
- from langvision import VisionTransformer
363
- from langvision.datasets import CIFAR10Dataset
364
-
365
- # Load pre-trained model
366
- model = VisionTransformer.from_pretrained("vit_base_patch16_224")
367
-
368
- # Fine-tune on CIFAR-10
369
- dataset = CIFAR10Dataset(train=True, transform=model.default_transform)
370
- model.finetune(dataset, epochs=10, lora_rank=16)
371
- ```
372
-
373
- ### Custom Dataset
374
- ```python
375
- from langvision.datasets import ImageFolderDataset
376
-
377
- # Your custom dataset
378
- dataset = ImageFolderDataset(
379
- root="/path/to/dataset",
380
- split="train",
381
- transform=model.default_transform
382
- )
383
-
384
- # Fine-tune with custom configuration
385
- model.finetune(
386
- dataset,
387
- config_path="configs/custom_config.yaml"
388
- )
389
- ```
390
-
391
- ---
392
-
393
- ## 🧩 Extending the Framework
394
- - Add new datasets in `src/langvision/data/datasets.py`
395
- - Add new callbacks in `src/langvision/callbacks/`
396
- - Add new models in `src/langvision/models/`
397
- - Add new CLI tools in `src/langvision/cli/`
398
-
399
- ## 📖 Documentation
400
- - See code comments and docstrings for details on each module.
401
- - For advanced usage, see the `src/langvision/cli/finetune.py` script.
402
-
403
- ## 🤝 Contributing
404
- We welcome contributions from the community! Here's how you can get involved:
405
-
406
- ### Ways to Contribute
407
- - 🐛 **Report bugs** by opening issues with detailed reproduction steps
408
- - 💡 **Suggest features** through feature requests and discussions
409
- - 📝 **Improve documentation** with examples, tutorials, and API docs
410
- - 🔧 **Submit pull requests** for bug fixes and new features
411
- - 🧪 **Add tests** to improve code coverage and reliability
412
-
413
- ### Development Setup
414
- ```bash
415
- # Clone and setup development environment
416
- git clone https://github.com/langtrain-ai/langvision.git
417
- cd langvision
418
- pip install -e ".[dev]"
419
-
420
- # Install pre-commit hooks
421
- pre-commit install
422
-
423
- # Run tests
424
- pytest tests/
425
- ```
426
-
427
- ### Community Resources
428
- - 💬 [GitHub Discussions](https://github.com/langtrain-ai/langvision/discussions) - Ask questions and share ideas
429
- - 🐛 [Issue Tracker](https://github.com/langtrain-ai/langvision/issues) - Report bugs and request features
430
- - 📖 [Contributing Guide](CONTRIBUTING.md) - Detailed contribution guidelines
431
- - 🎯 [Roadmap](ROADMAP.md) - See what's planned for future releases
432
-
433
- ## 📄 License & Citation
434
-
435
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
436
-
437
- ### Citation
438
-
439
- If you use langtrain in your research, please cite:
440
-
441
- ```bibtex
442
- @software{langtrain2025,
443
- author = {Pritesh Raj},
444
- title = {langtrain: Vision LLMs with Efficient LoRA Fine-Tuning},
445
- url = {https://github.com/langtrain-ai/langvision},
446
- year = {2025},
447
- version = {1.0.0}
448
- }
449
- ```
450
-
451
- ## 🌟 Acknowledgements
452
-
453
- We thank the following projects and communities:
454
-
455
- - [PyTorch](https://pytorch.org/) - Deep learning framework
456
- - [HuggingFace](https://huggingface.co/) - Transformers and model hub
457
- - [timm](https://github.com/rwightman/pytorch-image-models) - Vision model implementations
458
- - [PEFT](https://github.com/huggingface/peft) - Parameter-efficient fine-tuning methods
459
-
460
- <p align="center">
461
- <b>Made in India 🇮🇳 with ❤️ by the langtrain team</b><br/>
462
- <i>Star ⭐ this repo if you find it useful!</i>
463
- </p>