langtune 0.1.0__tar.gz → 0.1.1__tar.gz

This diff shows the content of publicly released versions of this package as they appear in their public registry, and is provided for informational purposes only.

Note: this version of langtune has been flagged as potentially problematic.

@@ -0,0 +1,369 @@
Metadata-Version: 2.4
Name: langtune
Version: 0.1.1
Summary: A package for finetuning text models.
Author-email: Pritesh Raj <priteshraj41@gmail.com>
License: MIT License

Copyright (c) 2025 Pritesh Raj

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Project-URL: Homepage, https://github.com/langtrain-ai/langtune
Project-URL: Documentation, https://github.com/langtrain-ai/langtune/tree/main/docs
Project-URL: Source, https://github.com/langtrain-ai/langtune
Project-URL: Tracker, https://github.com/langtrain-ai/langtune/issues
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.10
Requires-Dist: numpy
Requires-Dist: tqdm
Requires-Dist: pyyaml
Requires-Dist: scipy
Dynamic: license-file

# Langtune: Efficient LoRA Fine-Tuning for Text LLMs

<hr/>
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/langtrain-ai/langtrain/main/static/langtune-use-dark.png">
<img alt="Langtune Logo" src="https://raw.githubusercontent.com/langtrain-ai/langtrain/main/static/langtune-white.png" width="100%" />
</picture>
</p>

<!-- Badges -->
<p align="center">
<a href="https://pypi.org/project/langtune/"><img src="https://img.shields.io/pypi/v/langtune.svg" alt="PyPI version"></a>
<a href="https://pepy.tech/project/langtune"><img src="https://pepy.tech/badge/langtune" alt="Downloads"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License"></a>
<img src="https://img.shields.io/badge/coverage-90%25-brightgreen" alt="Coverage"/>
<img src="https://img.shields.io/badge/python-3.8%2B-blue" alt="Python Version"/>
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
</p>

<p align="center">
<b>Langtune is a Python package for fine-tuning large language models on text data using LoRA.</b><br/>
<span style="font-size:1.1em"><i>It provides modular components for adapting language models to various NLP tasks.</i></span>
</p>
<hr/>

## Quick Links
- [Documentation](docs/index.md)
- [Tutorials](docs/tutorials/index.md)
- [Changelog](CHANGELOG.md)
- [Contributing Guide](CONTRIBUTING.md)
- [Roadmap](ROADMAP.md)

---

## Table of Contents
- [Features](#features)
- [Showcase](#showcase)
- [Getting Started](#getting-started)
- [Supported Python Versions](#supported-python-versions)
- [Why langtune?](#why-langtune)
- [Architecture Overview](#architecture-overview)
- [Core Modules](#core-modules)
- [Performance & Efficiency](#performance--efficiency)
- [Advanced Configuration](#advanced-configuration)
- [Documentation & Resources](#documentation--resources)
- [Testing & Quality](#testing--quality)
- [Examples & Use Cases](#examples--use-cases)
- [Extending the Framework](#extending-the-framework)
- [Contributing](#contributing)
- [License](#license)
- [Citation](#citation)
- [Acknowledgements](#acknowledgements)

---

## Features
- LoRA adapters for efficient fine-tuning
- Modular transformer backbone
- Model zoo for language models
- Configurable and extensible codebase
- Checkpointing and resume
- Mixed precision and distributed training
- Metrics and visualization tools
- CLI for training and evaluation
- Callback support (early stopping, logging, etc.)

---

## Showcase

Langtune is intended for building and fine-tuning large language models with LoRA. It can be used for text classification, summarization, question answering, and other NLP tasks.

---

## Getting Started

Install:

```bash
pip install langtune
```

Example usage:

```python
import torch
from langtune.models.llm import LanguageModel
from langtune.utils.config import default_config

input_ids = torch.randint(0, 1000, (2, 128))
model = LanguageModel(
    vocab_size=default_config['vocab_size'],
    embed_dim=default_config['embed_dim'],
    num_layers=default_config['num_layers'],
    num_heads=default_config['num_heads'],
    mlp_ratio=default_config['mlp_ratio'],
    lora_config=default_config['lora'],
)

with torch.no_grad():
    out = model(input_ids)
print('Output shape:', out.shape)
```

See the [Documentation](docs/index.md) and `src/langtune/cli/finetune.py` for more details.

---

## Supported Python Versions
- Python 3.8 or newer

---

## Why langtune?

- Fine-tuning with LoRA adapters
- Modular transformer design
- Unified interface for language models
- Suitable for research and production
- Efficient memory usage

---

## Architecture Overview

Langtune uses a transformer backbone with LoRA adapters in attention and MLP layers. This enables adaptation of pre-trained models with fewer trainable parameters.

### Model Data Flow

```mermaid
---
config:
  layout: dagre
---
flowchart TD
    subgraph LoRA_Adapters["LoRA Adapters in Attention and MLP"]
        LA1(["LoRA Adapter 1"])
        LA2(["LoRA Adapter 2"])
        LA3(["LoRA Adapter N"])
    end
    A(["Input Tokens"]) --> B(["Embedding Layer"])
    B --> C(["Positional Encoding"])
    C --> D1(["Encoder Layer 1"])
    D1 --> D2(["Encoder Layer 2"])
    D2 --> D3(["Encoder Layer N"])
    D3 --> E(["LayerNorm"])
    E --> F(["MLP Head"])
    F --> G(["Output Logits"])
    LA1 -.-> D1
    LA2 -.-> D2
    LA3 -.-> D3
    LA1:::loraStyle
    LA2:::loraStyle
    LA3:::loraStyle
    classDef loraStyle fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
```
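
To make the adapter idea concrete, here is a minimal, self-contained sketch of a LoRA-style linear layer in PyTorch. It is illustrative only and does not reproduce langtune's actual `LoRALinear` signature or internals:

```python
import torch
import torch.nn as nn


class LoRALinearSketch(nn.Module):
    """Frozen base projection plus a trainable low-rank update (illustrative)."""

    def __init__(self, in_features, out_features, rank=16, alpha=32, dropout=0.1):
        super().__init__()
        # The pre-trained projection is frozen; only A and B are trained.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # y = W0 x + (alpha / rank) * B A x
        delta = self.dropout(x) @ self.lora_a.T @ self.lora_b.T
        return self.base(x) + self.scaling * delta


layer = LoRALinearSketch(768, 768, rank=16, alpha=32)
y = layer(torch.randn(2, 128, 768))
print(y.shape)  # torch.Size([2, 128, 768])
```

The dashed arrows in the diagram correspond to these low-rank paths added alongside the frozen attention and MLP projections in each encoder layer.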

---

## Core Modules

| Module | Description | Key Features |
|--------|-------------|--------------|
| Embedding | Token embedding and positional encoding | Configurable vocab size, position embeddings |
| TransformerEncoder | Multi-layer transformer backbone | Self-attention, LoRA integration, checkpointing |
| LoRALinear | Low-rank adaptation layers | Configurable rank, memory-efficient updates |
| MLPHead | Output projection layer | Classification, regression, dropout |
| Config System | Centralized configuration | YAML/JSON config, CLI overrides |
| Data Utils | Preprocessing and augmentation | Built-in tokenization, custom loaders |

---

## Performance & Efficiency

| Metric | Full Fine-tuning | LoRA Fine-tuning | Improvement |
|--------|------------------|------------------|-------------|
| Trainable Parameters | 125M | 3.2M | 97% reduction |
| Memory Usage | 16GB | 5GB | 69% reduction |
| Training Time | 6h | 2h | 67% faster |
| Storage per Task | 500MB | 12MB | 98% smaller |

*Benchmarks: Transformer-Base, WikiText-103, RTX 3090*

Supported model sizes: Transformer-Tiny, Transformer-Small, Transformer-Base, Transformer-Large
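
As a rough sanity check on the parameter reduction, here is a back-of-envelope calculation with assumed layer shapes; it is illustrative only and does not reproduce the measured numbers in the table above:

```python
# Back-of-envelope count for a Transformer-Base-like model:
# embed_dim=768, 12 layers, rank-16 LoRA on qkv/proj/fc1/fc2.
embed_dim, num_layers, rank = 768, 12, 16

# (in_features, out_features) of the adapted projections in one layer.
adapted = [
    (embed_dim, 3 * embed_dim),  # attention.qkv
    (embed_dim, embed_dim),      # attention.proj
    (embed_dim, 4 * embed_dim),  # mlp.fc1
    (4 * embed_dim, embed_dim),  # mlp.fc2
]

# LoRA trains rank * (d_in + d_out) parameters per adapted matrix,
# versus d_in * d_out when the same matrices are fully fine-tuned.
lora_params = num_layers * sum(rank * (d_in + d_out) for d_in, d_out in adapted)
full_params = num_layers * sum(d_in * d_out for d_in, d_out in adapted)

print(f"LoRA-trained parameters: {lora_params / 1e6:.1f}M")       # ~2.4M
print(f"Same matrices, fully trained: {full_params / 1e6:.1f}M")  # ~84.9M
```

The exact totals in the table also include embeddings and other modules, but the ratio shows why the reduction lands in the 95-98% range.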

---

## Advanced Configuration

Example LoRA config:

```python
lora_config = {
    "rank": 16,
    "alpha": 32,
    "dropout": 0.1,
    "target_modules": ["attention.qkv", "attention.proj", "mlp.fc1", "mlp.fc2"],
    "merge_weights": False
}
```

Example training config:

```yaml
model:
  name: "transformer_base"
  vocab_size: 50257
  embed_dim: 768
  num_layers: 12
  num_heads: 12
training:
  epochs: 10
  batch_size: 32
  learning_rate: 1e-4
  weight_decay: 0.01
  warmup_steps: 1000
lora:
  rank: 16
  alpha: 32
  dropout: 0.1
```
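
Because `pyyaml` is a declared dependency, a config like the one above can be loaded into plain Python dictionaries. The snippet below is a sketch: the path is an example, and how langtune itself reads the file may differ.

```python
import yaml

# Load the training config shown above (example path).
with open("configs/custom_config.yaml") as f:
    config = yaml.safe_load(f)

print(config["model"]["embed_dim"])  # 768
print(config["lora"]["rank"])        # 16

# Gotcha: PyYAML's default (YAML 1.1) resolver parses `1e-4` without a
# decimal point as the string "1e-4", so cast numeric fields defensively.
learning_rate = float(config["training"]["learning_rate"])
```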

---

## Documentation & Resources
- [API Reference](docs/api/index.md)
- [Tutorials and Examples](docs/tutorials/index.md)
- [Research Papers](#research-papers)
- [Best Practices Guide](docs/best_practices.md)
- [Troubleshooting](docs/troubleshooting.md)

### Research Papers
- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) (its update rule is recalled below)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
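
For reference, the LoRA update keeps a pretrained weight $W_0$ frozen and adds a scaled low-rank product, so only the small factors are trained:

$$
h = W_0 x + \frac{\alpha}{r}\, B A x, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Only $A$ and $B$ receive gradients; $W_0$ stays frozen, which is where the parameter, memory, and storage savings reported above come from.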

---

## Testing & Quality

Run tests:

```bash
pytest tests/
```

Code quality tools:

```bash
flake8 src/
black src/ --check
mypy src/
bandit -r src/
```

---

## Examples & Use Cases

Text classification:

```python
from langtune import LanguageModel
from langtune.datasets import TextClassificationDataset

model = LanguageModel.from_pretrained("transformer_base")
dataset = TextClassificationDataset(train=True, tokenizer=model.tokenizer)
model.finetune(dataset, epochs=10, lora_rank=16)
```

Custom dataset:

```python
from langtune.datasets import CustomTextDataset

# Reuses `model` from the text-classification example above.
dataset = CustomTextDataset(
    file_path="/path/to/dataset.txt",
    split="train",
    tokenizer=model.tokenizer
)
model.finetune(dataset, config_path="configs/custom_config.yaml")
```

---

## Extending the Framework
- Add datasets in `src/langtune/data/datasets.py`
- Add callbacks in `src/langtune/callbacks/` (see the sketch below)
- Add models in `src/langtune/models/`
- Add CLI tools in `src/langtune/cli/`
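
The hook names below are hypothetical (langtune's actual callback base class and method names may differ); this is only a sketch of what an early-stopping callback could look like:

```python
class EarlyStopping:
    """Illustrative early-stopping callback (hypothetical interface)."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def on_epoch_end(self, epoch: int, logs: dict) -> bool:
        """Return True to request that training stop."""
        val_loss = logs.get("val_loss", float("inf"))
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

A trainer would call `on_epoch_end` after each epoch and stop once it returns `True`.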

## Documentation
- See code comments and docstrings for details.
- For advanced usage, see `src/langtune/cli/finetune.py`.

## Contributing
Contributions are welcome. See the [Contributing Guide](CONTRIBUTING.md) for details.

## License

This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.

## Citation

If you use langtune in your research, please cite:

```bibtex
@software{langtune2025,
  author  = {Pritesh Raj},
  title   = {langtune: Efficient LoRA Fine-Tuning for Text LLMs},
  url     = {https://github.com/langtrain-ai/langtune},
  year    = {2025},
  version = {0.1.1}
}
```

## Acknowledgements

We thank the following projects and communities:
- [PyTorch](https://pytorch.org/)
- [HuggingFace](https://huggingface.co/)
- [PEFT](https://github.com/huggingface/peft)

<p align="center">
<b>Made in India 🇮🇳 with ❤️ by the langtune team</b><br/>
<i>Star ⭐ this repo if you find it useful!</i>
</p>
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "langtune"
-version = "0.1.0"
+version = "0.1.1"
 description = "A package for finetuning text models."
 authors = [
     { name = "Pritesh Raj", email = "priteshraj41@gmail.com" }