EvoScientist 0.0.1.dev3__py3-none-any.whl → 0.1.0rc1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. EvoScientist/EvoScientist.py +17 -49
  2. EvoScientist/backends.py +0 -26
  3. EvoScientist/cli.py +1109 -255
  4. EvoScientist/middleware.py +8 -61
  5. EvoScientist/stream/__init__.py +0 -25
  6. EvoScientist/stream/utils.py +16 -23
  7. EvoScientist/tools.py +0 -64
  8. evoscientist-0.1.0rc1.dist-info/METADATA +199 -0
  9. evoscientist-0.1.0rc1.dist-info/RECORD +21 -0
  10. evoscientist-0.1.0rc1.dist-info/entry_points.txt +2 -0
  11. EvoScientist/memory.py +0 -715
  12. EvoScientist/paths.py +0 -45
  13. EvoScientist/skills/accelerate/SKILL.md +0 -332
  14. EvoScientist/skills/accelerate/references/custom-plugins.md +0 -453
  15. EvoScientist/skills/accelerate/references/megatron-integration.md +0 -489
  16. EvoScientist/skills/accelerate/references/performance.md +0 -525
  17. EvoScientist/skills/bitsandbytes/SKILL.md +0 -411
  18. EvoScientist/skills/bitsandbytes/references/memory-optimization.md +0 -521
  19. EvoScientist/skills/bitsandbytes/references/qlora-training.md +0 -521
  20. EvoScientist/skills/bitsandbytes/references/quantization-formats.md +0 -447
  21. EvoScientist/skills/find-skills/SKILL.md +0 -133
  22. EvoScientist/skills/find-skills/scripts/install_skill.py +0 -211
  23. EvoScientist/skills/flash-attention/SKILL.md +0 -367
  24. EvoScientist/skills/flash-attention/references/benchmarks.md +0 -215
  25. EvoScientist/skills/flash-attention/references/transformers-integration.md +0 -293
  26. EvoScientist/skills/llama-cpp/SKILL.md +0 -258
  27. EvoScientist/skills/llama-cpp/references/optimization.md +0 -89
  28. EvoScientist/skills/llama-cpp/references/quantization.md +0 -213
  29. EvoScientist/skills/llama-cpp/references/server.md +0 -125
  30. EvoScientist/skills/lm-evaluation-harness/SKILL.md +0 -490
  31. EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +0 -490
  32. EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +0 -488
  33. EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +0 -602
  34. EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +0 -519
  35. EvoScientist/skills/ml-paper-writing/SKILL.md +0 -937
  36. EvoScientist/skills/ml-paper-writing/references/checklists.md +0 -361
  37. EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +0 -562
  38. EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +0 -367
  39. EvoScientist/skills/ml-paper-writing/references/sources.md +0 -159
  40. EvoScientist/skills/ml-paper-writing/references/writing-guide.md +0 -476
  41. EvoScientist/skills/ml-paper-writing/templates/README.md +0 -251
  42. EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +0 -534
  43. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +0 -144
  44. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +0 -952
  45. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +0 -111
  46. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +0 -1493
  47. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +0 -315
  48. EvoScientist/skills/ml-paper-writing/templates/acl/README.md +0 -50
  49. EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +0 -312
  50. EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +0 -377
  51. EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +0 -101
  52. EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +0 -1940
  53. EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +0 -26
  54. EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +0 -70
  55. EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +0 -326
  56. EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +0 -3
  57. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +0 -11
  58. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +0 -1440
  59. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  60. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +0 -218
  61. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +0 -305
  62. EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +0 -485
  63. EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +0 -508
  64. EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +0 -1246
  65. EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +0 -485
  66. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +0 -24
  67. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +0 -1440
  68. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  69. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +0 -246
  70. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +0 -414
  71. EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +0 -508
  72. EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +0 -1246
  73. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +0 -79
  74. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +0 -201
  75. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +0 -75
  76. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  77. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +0 -662
  78. EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +0 -864
  79. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +0 -1443
  80. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +0 -767
  81. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  82. EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +0 -36
  83. EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +0 -53
  84. EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +0 -38
  85. EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +0 -382
  86. EvoScientist/skills/peft/SKILL.md +0 -431
  87. EvoScientist/skills/peft/references/advanced-usage.md +0 -514
  88. EvoScientist/skills/peft/references/troubleshooting.md +0 -480
  89. EvoScientist/skills/ray-data/SKILL.md +0 -326
  90. EvoScientist/skills/ray-data/references/integration.md +0 -82
  91. EvoScientist/skills/ray-data/references/transformations.md +0 -83
  92. EvoScientist/skills/skill-creator/LICENSE.txt +0 -202
  93. EvoScientist/skills/skill-creator/SKILL.md +0 -356
  94. EvoScientist/skills/skill-creator/references/output-patterns.md +0 -82
  95. EvoScientist/skills/skill-creator/references/workflows.md +0 -28
  96. EvoScientist/skills/skill-creator/scripts/init_skill.py +0 -303
  97. EvoScientist/skills/skill-creator/scripts/package_skill.py +0 -110
  98. EvoScientist/skills/skill-creator/scripts/quick_validate.py +0 -95
  99. EvoScientist/skills_manager.py +0 -392
  100. EvoScientist/stream/display.py +0 -604
  101. EvoScientist/stream/events.py +0 -415
  102. EvoScientist/stream/state.py +0 -343
  103. evoscientist-0.0.1.dev3.dist-info/METADATA +0 -321
  104. evoscientist-0.0.1.dev3.dist-info/RECORD +0 -113
  105. evoscientist-0.0.1.dev3.dist-info/entry_points.txt +0 -5
  106. {evoscientist-0.0.1.dev3.dist-info → evoscientist-0.1.0rc1.dist-info}/WHEEL +0 -0
  107. {evoscientist-0.0.1.dev3.dist-info → evoscientist-0.1.0rc1.dist-info}/licenses/LICENSE +0 -0
  108. {evoscientist-0.0.1.dev3.dist-info → evoscientist-0.1.0rc1.dist-info}/top_level.txt +0 -0
EvoScientist/skills/peft/SKILL.md DELETED
@@ -1,431 +0,0 @@
- ---
- name: peft
- description: Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. Hugging Face's official library, integrated with the transformers ecosystem.
- version: 1.0.0
- author: Orchestra Research
- license: MIT
- tags: [Fine-Tuning, PEFT, LoRA, QLoRA, Parameter-Efficient, Adapters, Low-Rank, Memory Optimization, Multi-Adapter]
- dependencies: [peft>=0.13.0, transformers>=4.45.0, torch>=2.0.0, bitsandbytes>=0.43.0]
- ---
-
- # PEFT (Parameter-Efficient Fine-Tuning)
-
- Fine-tune LLMs by training <1% of parameters using LoRA, QLoRA, and 25+ adapter methods.
-
- ## When to use PEFT
-
- **Use PEFT/LoRA when:**
- - Fine-tuning 7B-70B models on consumer GPUs (RTX 4090) or single data-center GPUs (A100)
- - You need to train <1% of parameters (6MB adapters vs a 14GB full model)
- - You want fast iteration with multiple task-specific adapters
- - Deploying multiple fine-tuned variants from one base model
-
- **Use QLoRA (PEFT + quantization) when:**
- - Fine-tuning 65B-70B models on a single 48GB GPU (rough memory math is sketched below)
- - Memory is the primary constraint
- - You can accept a ~5% quality trade-off vs full fine-tuning
-
- **Use full fine-tuning instead when:**
- - Training small models (<1B parameters)
- - You need maximum quality and have the compute budget
- - Significant domain shift requires updating all weights
-
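- A quick way to sanity-check these cut-offs is back-of-the-envelope memory arithmetic. The sketch below is an estimate, not a measurement: it uses the usual rules of thumb (2 bytes per fp16 weight, 2 per fp16 gradient, 8 per pair of fp32 AdamW moments) and ignores activations and KV cache.
-
- ```python
- def finetune_memory_gb(params_b, trainable_frac=1.0, weight_bits=16):
-     """Rough GPU memory (GB) for fine-tuning, ignoring activations."""
-     weights = params_b * weight_bits / 8          # all weights, possibly quantized
-     grads = params_b * trainable_frac * 2         # fp16 gradients, trainable params only
-     optim = params_b * trainable_frac * 8         # fp32 AdamW m + v, trainable params only
-     return weights + grads + optim
-
- print(finetune_memory_gb(8))                          # full FT, 8B: ~96 GB
- print(finetune_memory_gb(8, trainable_frac=0.002))    # LoRA, 8B: ~16 GB
- print(finetune_memory_gb(70, 0.002, weight_bits=4))   # QLoRA, 70B: ~35 GB
- ```
-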
- ## Quick start
-
- ### Installation
-
- ```bash
- # Basic installation
- pip install peft
-
- # With quantization support (recommended)
- pip install peft bitsandbytes
-
- # Full stack
- pip install peft transformers accelerate bitsandbytes datasets
- ```
-
- ### LoRA fine-tuning (standard)
-
- ```python
- from transformers import (AutoModelForCausalLM, AutoTokenizer, TrainingArguments,
-                           Trainer, DataCollatorForLanguageModeling)
- from peft import get_peft_model, LoraConfig, TaskType
- from datasets import load_dataset
-
- # Load base model
- model_name = "meta-llama/Llama-3.1-8B"
- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- tokenizer.pad_token = tokenizer.eos_token
-
- # LoRA configuration
- lora_config = LoraConfig(
-     task_type=TaskType.CAUSAL_LM,
-     r=16,                 # Rank (8-64, higher = more capacity)
-     lora_alpha=32,        # Scaling factor (typically 2*r)
-     lora_dropout=0.05,    # Dropout for regularization
-     target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # Attention layers
-     bias="none"           # Don't train biases
- )
-
- # Apply LoRA
- model = get_peft_model(model, lora_config)
- model.print_trainable_parameters()
- # Output: trainable params: 13,631,488 || all params: 8,043,307,008 || trainable%: 0.17%
-
- # Prepare dataset
- dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
-
- def tokenize(example):
-     text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
-     return tokenizer(text, truncation=True, max_length=512, padding="max_length")
-
- tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
-
- # Training
- training_args = TrainingArguments(
-     output_dir="./lora-llama",
-     num_train_epochs=3,
-     per_device_train_batch_size=4,
-     gradient_accumulation_steps=4,
-     learning_rate=2e-4,
-     fp16=True,
-     logging_steps=10,
-     save_strategy="epoch"
- )
-
- trainer = Trainer(
-     model=model,
-     args=training_args,
-     train_dataset=tokenized,
-     # Causal-LM collator: batches features and derives labels from input_ids
-     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
- )
-
- trainer.train()
-
- # Save adapter only (~27 MB at r=16 vs ~16 GB for the full fp16 model)
- model.save_pretrained("./lora-llama-adapter")
- ```
-
- ### QLoRA fine-tuning (memory-efficient)
-
- ```python
- from transformers import AutoModelForCausalLM, BitsAndBytesConfig
- from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
-
- # 4-bit quantization config
- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",           # NormalFloat4 (best for LLMs)
-     bnb_4bit_compute_dtype="bfloat16",   # Compute in bf16
-     bnb_4bit_use_double_quant=True       # Nested quantization
- )
-
- # Load quantized model
- model = AutoModelForCausalLM.from_pretrained(
-     "meta-llama/Llama-3.1-70B",
-     quantization_config=bnb_config,
-     device_map="auto"
- )
-
- # Prepare for training (enables gradient checkpointing)
- model = prepare_model_for_kbit_training(model)
-
- # LoRA config for QLoRA
- lora_config = LoraConfig(
-     r=64,               # Higher rank for 70B
-     lora_alpha=128,
-     lora_dropout=0.1,
-     target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
-     bias="none",
-     task_type="CAUSAL_LM"
- )
-
- model = get_peft_model(model, lora_config)
- # 4-bit weights shrink the 70B model to roughly 35 GB, so it trains on a single 48GB GPU
- ```
-
- ## LoRA parameter selection
-
- ### Rank (r) - capacity vs efficiency
-
- | Rank | Trainable Params | Memory | Quality | Use Case |
- |------|-----------------|--------|---------|----------|
- | 4 | ~3M | Minimal | Lower | Simple tasks, prototyping |
- | **8** | ~7M | Low | Good | **Recommended starting point** |
- | **16** | ~14M | Medium | Better | **General fine-tuning** |
- | 32 | ~27M | Higher | High | Complex tasks |
- | 64 | ~54M | High | Highest | Domain adaptation, 70B models |
-
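- The parameter counts above follow from simple arithmetic: each adapted matrix of shape d_out x d_in gains factors B (d_out x r) and A (r x d_in), i.e. r * (d_in + d_out) trainable parameters. A minimal sketch; the layer shapes are illustrative Llama-style assumptions:
-
- ```python
- def lora_params(r, n_layers, shapes):
-     """Total LoRA params: r * (d_in + d_out) summed over adapted matrices per layer."""
-     return n_layers * sum(r * (d_in + d_out) for d_in, d_out in shapes)
-
- # q/k/v/o projections, 32 layers, hidden size 4096 (k/v are narrower under
- # grouped-query attention, so real counts come out somewhat lower)
- attn_shapes = [(4096, 4096)] * 4
- print(lora_params(16, 32, attn_shapes))  # 16,777,216 -- same ballpark as the r=16 row
- ```
-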
- ### Alpha (lora_alpha) - scaling factor
-
- LoRA scales the learned update by `lora_alpha / r`, so alpha acts like a learning-rate multiplier on the adapter.
-
- ```python
- # Rule of thumb: alpha = 2 * rank
- LoraConfig(r=16, lora_alpha=32)  # Standard (scale 2.0)
- LoraConfig(r=16, lora_alpha=16)  # Conservative (scale 1.0, lower learning-rate effect)
- LoraConfig(r=16, lora_alpha=64)  # Aggressive (scale 4.0, higher learning-rate effect)
- ```
-
- ### Target modules by architecture
-
- ```python
- # Llama / Mistral / Qwen
- target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
-
- # GPT-2 / GPT-Neo
- target_modules = ["c_attn", "c_proj", "c_fc"]
-
- # Falcon
- target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]
-
- # BLOOM
- target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]
-
- # Auto-detect all linear layers
- target_modules = "all-linear"  # PEFT 0.6.0+
- ```
-
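- For an architecture not listed above, the names can be read off the model itself. This is plain PyTorch introspection, not a PEFT API:
-
- ```python
- import torch.nn as nn
- from transformers import AutoModelForCausalLM
-
- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
-
- # Distinct leaf names of all nn.Linear submodules = candidate target_modules
- linear_names = {name.split(".")[-1] for name, module in model.named_modules()
-                 if isinstance(module, nn.Linear)}
- print(sorted(linear_names))
- # ['down_proj', 'gate_proj', 'k_proj', 'lm_head', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
- # (lm_head is normally left out of target_modules)
- ```
-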
- ## Loading and merging adapters
-
- ### Load trained adapter
-
- ```python
- from peft import PeftModel, AutoPeftModelForCausalLM
- from transformers import AutoModelForCausalLM
-
- # Option 1: Load with PeftModel
- base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
- model = PeftModel.from_pretrained(base_model, "./lora-llama-adapter")
-
- # Option 2: Load directly (recommended)
- model = AutoPeftModelForCausalLM.from_pretrained(
-     "./lora-llama-adapter",
-     device_map="auto"
- )
- ```
-
- ### Merge adapter into base model
-
- ```python
- # Merge for deployment (no adapter overhead)
- merged_model = model.merge_and_unload()
-
- # Save merged model
- merged_model.save_pretrained("./llama-merged")
- tokenizer.save_pretrained("./llama-merged")
-
- # Push to Hub
- merged_model.push_to_hub("username/llama-finetuned")
- ```
-
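- Merging folds the adapter into full-precision weights, so for a QLoRA-trained adapter the usual pattern is to reload the base model unquantized first. A sketch under that assumption (paths are illustrative):
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM
- from peft import PeftModel
-
- # Reload the base in bf16 rather than 4-bit so the LoRA delta can be folded in
- base = AutoModelForCausalLM.from_pretrained(
-     "meta-llama/Llama-3.1-70B", torch_dtype=torch.bfloat16, device_map="auto"
- )
- model = PeftModel.from_pretrained(base, "./qlora-llama-adapter")
- merged = model.merge_and_unload()
- merged.save_pretrained("./llama-70b-merged")
- ```
-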
- ### Multi-adapter serving
-
- ```python
- from peft import AutoPeftModelForCausalLM
-
- # Load base with first adapter (registered under the name "default")
- model = AutoPeftModelForCausalLM.from_pretrained("./adapter-task1")
-
- # Load additional adapters
- model.load_adapter("./adapter-task2", adapter_name="task2")
- model.load_adapter("./adapter-task3", adapter_name="task3")
-
- # Switch between adapters at runtime
- model.set_adapter("default")   # Use the task1 adapter
- output1 = model.generate(**inputs)
-
- model.set_adapter("task2")     # Switch to task2
- output2 = model.generate(**inputs)
-
- # Disable adapters (use base model)
- with model.disable_adapter():
-     base_output = model.generate(**inputs)
- ```
-
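- Adapters can be blended as well as swapped: `add_weighted_adapter` builds a new LoRA adapter as a weighted combination of loaded ones. The weights below are illustrative, and `combination_type="linear"` assumes the adapters share a rank:
-
- ```python
- # Blend two task adapters into a third
- model.add_weighted_adapter(
-     adapters=["default", "task2"],
-     weights=[0.7, 0.3],
-     adapter_name="blended",
-     combination_type="linear",
- )
- model.set_adapter("blended")
- ```
-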
- ## PEFT methods comparison
-
- | Method | Trainable % | Memory | Speed | Best For |
- |--------|------------|--------|-------|----------|
- | **LoRA** | 0.1-1% | Low | Fast | General fine-tuning |
- | **QLoRA** | 0.1-1% | Very Low | Medium | Memory-constrained |
- | AdaLoRA | 0.1-1% | Low | Medium | Automatic rank selection |
- | IA3 | 0.01% | Minimal | Fastest | Few-shot adaptation |
- | Prefix Tuning | 0.1% | Low | Medium | Generation control |
- | Prompt Tuning | 0.001% | Minimal | Fast | Simple task adaptation |
- | P-Tuning v2 | 0.1% | Low | Medium | NLU tasks |
-
- ### IA3 (minimal parameters)
-
- ```python
- from peft import IA3Config
-
- ia3_config = IA3Config(
-     target_modules=["q_proj", "v_proj", "k_proj", "down_proj"],
-     feedforward_modules=["down_proj"]
- )
- model = get_peft_model(model, ia3_config)
- # Trains only 0.01% of parameters!
- ```
-
- ### Prefix Tuning
-
- ```python
- from peft import PrefixTuningConfig
-
- prefix_config = PrefixTuningConfig(
-     task_type="CAUSAL_LM",
-     num_virtual_tokens=20,   # Prepended virtual tokens
-     prefix_projection=True   # Use MLP projection
- )
- model = get_peft_model(model, prefix_config)
- ```
-
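- The comparison table also lists prompt tuning, which trains only a few virtual token embeddings; a minimal sketch (the init text is an illustrative choice):
-
- ```python
- from peft import PromptTuningConfig, PromptTuningInit
-
- prompt_config = PromptTuningConfig(
-     task_type="CAUSAL_LM",
-     num_virtual_tokens=8,
-     prompt_tuning_init=PromptTuningInit.TEXT,           # Initialize from text embeddings
-     prompt_tuning_init_text="Classify the sentiment:",  # Illustrative
-     tokenizer_name_or_path="meta-llama/Llama-3.1-8B",
- )
- model = get_peft_model(model, prompt_config)
- ```
-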
- ## Integration patterns
-
- ### With TRL (SFTTrainer)
-
- ```python
- from trl import SFTTrainer, SFTConfig
- from peft import LoraConfig
-
- lora_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")
-
- trainer = SFTTrainer(
-     model=model,
-     args=SFTConfig(output_dir="./output", max_seq_length=512),
-     train_dataset=dataset,
-     peft_config=lora_config,  # Pass LoRA config directly
- )
- trainer.train()
- ```
-
- ### With Axolotl (YAML config)
-
- ```yaml
- # axolotl config.yaml
- adapter: lora
- lora_r: 16
- lora_alpha: 32
- lora_dropout: 0.05
- lora_target_modules:
-   - q_proj
-   - v_proj
-   - k_proj
-   - o_proj
- lora_target_linear: true  # Target all linear layers
- ```
-
- ### With vLLM (inference)
-
- ```python
- from vllm import LLM
- from vllm.lora.request import LoRARequest
-
- # Load base model with LoRA support
- llm = LLM(model="meta-llama/Llama-3.1-8B", enable_lora=True)
-
- # Serve with adapter
- outputs = llm.generate(
-     prompts,
-     lora_request=LoRARequest("adapter1", 1, "./lora-adapter")
- )
- ```
-
- ## Performance benchmarks
-
- ### Memory usage (Llama 3.1 8B)
-
- | Method | GPU Memory | Trainable Params |
- |--------|-----------|------------------|
- | Full fine-tuning | 60+ GB | 8B (100%) |
- | LoRA r=16 | 18 GB | 14M (0.17%) |
- | QLoRA r=16 | 6 GB | 14M (0.17%) |
- | IA3 | 16 GB | 800K (0.01%) |
-
- ### Training speed (A100 80GB)
-
- | Method | Tokens/sec | vs Full FT |
- |--------|-----------|------------|
- | Full FT | 2,500 | 1.0x |
- | LoRA | 3,200 | 1.3x |
- | QLoRA | 2,100 | 0.84x |
-
- ### Quality (MMLU benchmark)
-
- | Model | Full FT | LoRA | QLoRA |
- |-------|---------|------|-------|
- | Llama 2-7B | 45.3 | 44.8 | 44.1 |
- | Llama 2-13B | 54.8 | 54.2 | 53.5 |
-
- ## Common issues
-
- ### CUDA OOM during training
-
- ```python
- # Solution 1: Enable gradient checkpointing
- model.gradient_checkpointing_enable()
-
- # Solution 2: Reduce batch size + increase accumulation
- TrainingArguments(
-     per_device_train_batch_size=1,
-     gradient_accumulation_steps=16
- )
-
- # Solution 3: Use QLoRA
- from transformers import BitsAndBytesConfig
- bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
- ```
-
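- A fourth option, assuming a transformers version with bitsandbytes optimizer support: paged 8-bit AdamW, which keeps optimizer states in pageable memory and absorbs allocation spikes.
-
- ```python
- # Solution 4: Paged 8-bit AdamW optimizer states
- TrainingArguments(optim="paged_adamw_8bit")
- ```
-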
- ### Adapter not applying
-
- ```python
- # Verify adapter is active
- print(model.active_adapters)  # Should show adapter name
-
- # Check trainable parameters
- model.print_trainable_parameters()
-
- # Ensure model is in training mode
- model.train()
- ```
-
- ### Quality degradation
-
- ```python
- # Increase rank
- LoraConfig(r=32, lora_alpha=64)
-
- # Target more modules
- target_modules = "all-linear"
-
- # Use more training data and epochs
- TrainingArguments(num_train_epochs=5)
-
- # Lower learning rate
- TrainingArguments(learning_rate=1e-4)
- ```
-
- ## Best practices
-
- 1. **Start with r=8-16**, increase if quality is insufficient
- 2. **Use alpha = 2 * rank** as a starting point
- 3. **Target attention + MLP layers** for the best quality/efficiency trade-off
- 4. **Enable gradient checkpointing** for memory savings
- 5. **Save adapters frequently** (small files, easy rollback)
- 6. **Evaluate on held-out data** before merging
- 7. **Use QLoRA for 70B-class models** that would not otherwise fit in GPU memory
-
- ## References
-
- - **[Advanced Usage](references/advanced-usage.md)** - DoRA, LoftQ, rank stabilization, custom modules
- - **[Troubleshooting](references/troubleshooting.md)** - Common errors, debugging, optimization
-
- ## Resources
-
- - **GitHub**: https://github.com/huggingface/peft
- - **Docs**: https://huggingface.co/docs/peft
- - **LoRA Paper**: arXiv:2106.09685
- - **QLoRA Paper**: arXiv:2305.14314
- - **Models**: https://huggingface.co/models?library=peft