EvoScientist 0.0.1.dev1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (107)
  1. EvoScientist/EvoScientist.py +157 -0
  2. EvoScientist/__init__.py +24 -0
  3. EvoScientist/__main__.py +4 -0
  4. EvoScientist/backends.py +392 -0
  5. EvoScientist/cli.py +1553 -0
  6. EvoScientist/middleware.py +35 -0
  7. EvoScientist/prompts.py +277 -0
  8. EvoScientist/skills/accelerate/SKILL.md +332 -0
  9. EvoScientist/skills/accelerate/references/custom-plugins.md +453 -0
  10. EvoScientist/skills/accelerate/references/megatron-integration.md +489 -0
  11. EvoScientist/skills/accelerate/references/performance.md +525 -0
  12. EvoScientist/skills/bitsandbytes/SKILL.md +411 -0
  13. EvoScientist/skills/bitsandbytes/references/memory-optimization.md +521 -0
  14. EvoScientist/skills/bitsandbytes/references/qlora-training.md +521 -0
  15. EvoScientist/skills/bitsandbytes/references/quantization-formats.md +447 -0
  16. EvoScientist/skills/find-skills/SKILL.md +133 -0
  17. EvoScientist/skills/find-skills/scripts/install_skill.py +211 -0
  18. EvoScientist/skills/flash-attention/SKILL.md +367 -0
  19. EvoScientist/skills/flash-attention/references/benchmarks.md +215 -0
  20. EvoScientist/skills/flash-attention/references/transformers-integration.md +293 -0
  21. EvoScientist/skills/llama-cpp/SKILL.md +258 -0
  22. EvoScientist/skills/llama-cpp/references/optimization.md +89 -0
  23. EvoScientist/skills/llama-cpp/references/quantization.md +213 -0
  24. EvoScientist/skills/llama-cpp/references/server.md +125 -0
  25. EvoScientist/skills/lm-evaluation-harness/SKILL.md +490 -0
  26. EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  27. EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  28. EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  29. EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  30. EvoScientist/skills/ml-paper-writing/SKILL.md +937 -0
  31. EvoScientist/skills/ml-paper-writing/references/checklists.md +361 -0
  32. EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  33. EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  34. EvoScientist/skills/ml-paper-writing/references/sources.md +159 -0
  35. EvoScientist/skills/ml-paper-writing/references/writing-guide.md +476 -0
  36. EvoScientist/skills/ml-paper-writing/templates/README.md +251 -0
  37. EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  38. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  39. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  40. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  41. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  42. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  43. EvoScientist/skills/ml-paper-writing/templates/acl/README.md +50 -0
  44. EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  45. EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  46. EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  47. EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  48. EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  49. EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  50. EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  51. EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  52. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  53. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  54. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  55. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  56. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  57. EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  58. EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  59. EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  60. EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  61. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  62. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  63. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  64. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  65. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  66. EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  67. EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  68. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  69. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  70. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  71. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  72. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  73. EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  74. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  75. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  76. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  77. EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  78. EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  79. EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  80. EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  81. EvoScientist/skills/peft/SKILL.md +431 -0
  82. EvoScientist/skills/peft/references/advanced-usage.md +514 -0
  83. EvoScientist/skills/peft/references/troubleshooting.md +480 -0
  84. EvoScientist/skills/ray-data/SKILL.md +326 -0
  85. EvoScientist/skills/ray-data/references/integration.md +82 -0
  86. EvoScientist/skills/ray-data/references/transformations.md +83 -0
  87. EvoScientist/skills/skill-creator/LICENSE.txt +202 -0
  88. EvoScientist/skills/skill-creator/SKILL.md +356 -0
  89. EvoScientist/skills/skill-creator/references/output-patterns.md +82 -0
  90. EvoScientist/skills/skill-creator/references/workflows.md +28 -0
  91. EvoScientist/skills/skill-creator/scripts/init_skill.py +303 -0
  92. EvoScientist/skills/skill-creator/scripts/package_skill.py +110 -0
  93. EvoScientist/skills/skill-creator/scripts/quick_validate.py +95 -0
  94. EvoScientist/stream/__init__.py +53 -0
  95. EvoScientist/stream/emitter.py +94 -0
  96. EvoScientist/stream/formatter.py +168 -0
  97. EvoScientist/stream/tracker.py +115 -0
  98. EvoScientist/stream/utils.py +255 -0
  99. EvoScientist/subagent.yaml +147 -0
  100. EvoScientist/tools.py +135 -0
  101. EvoScientist/utils.py +207 -0
  102. evoscientist-0.0.1.dev1.dist-info/METADATA +222 -0
  103. evoscientist-0.0.1.dev1.dist-info/RECORD +107 -0
  104. evoscientist-0.0.1.dev1.dist-info/WHEEL +5 -0
  105. evoscientist-0.0.1.dev1.dist-info/entry_points.txt +2 -0
  106. evoscientist-0.0.1.dev1.dist-info/licenses/LICENSE +21 -0
  107. evoscientist-0.0.1.dev1.dist-info/top_level.txt +1 -0
EvoScientist/skills/bitsandbytes/SKILL.md
@@ -0,0 +1,411 @@
+ ---
+ name: bitsandbytes
+ description: Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.
+ version: 1.0.0
+ author: Orchestra Research
+ license: MIT
+ tags: [Optimization, Bitsandbytes, Quantization, 8-Bit, 4-Bit, Memory Optimization, QLoRA, NF4, INT8, HuggingFace, Efficient Inference]
+ dependencies: [bitsandbytes, transformers, accelerate, torch]
+ ---
+
+ # bitsandbytes - LLM Quantization
+
+ ## Quick start
+
+ bitsandbytes reduces LLM weight memory by 50% (8-bit) or 75% (4-bit) with minimal accuracy loss (typically under 0.5% for 8-bit and 1-2% for 4-bit).
+
+ **Installation**:
+ ```bash
+ pip install bitsandbytes transformers accelerate
+ ```
+
+ **8-bit quantization** (50% memory reduction):
+ ```python
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ config = BitsAndBytesConfig(load_in_8bit=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-7b-hf",
+     quantization_config=config,
+     device_map="auto"
+ )
+
+ # Memory: 14GB → 7GB
+ ```
+
+ **4-bit quantization** (75% memory reduction):
+ ```python
+ import torch  # needed for bnb_4bit_compute_dtype
+
+ config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-7b-hf",
+     quantization_config=config,
+     device_map="auto"
+ )
+
+ # Memory: 14GB → 3.5GB
+ ```
+
+ ## Common workflows
+
+ ### Workflow 1: Load large model in limited GPU memory
+
+ Copy this checklist:
+
+ ```
+ Quantization Loading:
+ - [ ] Step 1: Calculate memory requirements
+ - [ ] Step 2: Choose quantization level (4-bit or 8-bit)
+ - [ ] Step 3: Configure quantization
+ - [ ] Step 4: Load and verify model
+ ```
+
+ **Step 1: Calculate memory requirements**
+
+ Estimate model memory (weights only; activations and the KV cache add a few extra GB):
+ ```
+ FP16 memory (GB) = Parameters × 2 bytes / 1e9
+ INT8 memory (GB) = Parameters × 1 byte / 1e9
+ INT4 memory (GB) = Parameters × 0.5 bytes / 1e9
+
+ Example (Llama 2 7B):
+ FP16: 7B × 2 / 1e9 = 14 GB
+ INT8: 7B × 1 / 1e9 = 7 GB
+ INT4: 7B × 0.5 / 1e9 = 3.5 GB
+ ```
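+
+ The same arithmetic as a small Python sketch (the helper name is illustrative, not part of bitsandbytes):
+ ```python
+ def estimate_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
+     """Weight memory only; activations and the KV cache are extra."""
+     return num_params * (bits_per_param / 8) / 1e9
+
+ for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
+     print(f"Llama 2 7B @ {label}: {estimate_weight_memory_gb(7e9, bits):.1f} GB")
+ # FP16: 14.0 GB, INT8: 7.0 GB, INT4: 3.5 GB
+ ```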
+
+ **Step 2: Choose quantization level**
+
+ | GPU VRAM | Model Size | Recommended |
+ |----------|------------|-------------|
+ | 8 GB | 3B | 4-bit |
+ | 12 GB | 7B | 4-bit |
+ | 16 GB | 7B | 8-bit or 4-bit |
+ | 24 GB | 13B | 8-bit or 4-bit |
+ | 40+ GB | 70B | 4-bit (8-bit needs ~70 GB of weights) |
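+
+ A rough, illustrative heuristic that encodes this choice (the headroom needed for activations and CUDA overhead varies with batch size and sequence length, so treat the table above as the reference):
+ ```python
+ def pick_quantization(vram_gb: float, num_params_b: float, headroom_gb: float = 2.0) -> str:
+     """Prefer 8-bit when weights + headroom fit, else 4-bit, else offload."""
+     if num_params_b * 1.0 + headroom_gb <= vram_gb:
+         return "8-bit"
+     if num_params_b * 0.5 + headroom_gb <= vram_gb:
+         return "4-bit"
+     return "4-bit + CPU/disk offload, or a smaller model"
+
+ print(pick_quantization(24, 13))  # 8-bit
+ print(pick_quantization(40, 70))  # 4-bit
+ ```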
+
+ **Step 3: Configure quantization**
+
+ For 8-bit (better accuracy):
+ ```python
+ from transformers import BitsAndBytesConfig
+ import torch
+
+ config = BitsAndBytesConfig(
+     load_in_8bit=True,
+     llm_int8_threshold=6.0,  # Outlier threshold
+     llm_int8_has_fp16_weight=False
+ )
+ ```
+
+ For 4-bit (maximum memory savings):
+ ```python
+ config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,  # Compute in FP16
+     bnb_4bit_quant_type="nf4",             # NormalFloat4 (recommended)
+     bnb_4bit_use_double_quant=True         # Nested quantization
+ )
+ ```
+
+ **Step 4: Load and verify model**
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-13b-hf",
+     quantization_config=config,
+     device_map="auto",  # Automatic device placement
+     torch_dtype=torch.float16
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")
+
+ # Test inference
+ inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_length=50)
+ print(tokenizer.decode(outputs[0]))
+
+ # Check memory
+ print(f"Memory allocated: {torch.cuda.memory_allocated()/1e9:.2f}GB")
+ ```
+
+ ### Workflow 2: Fine-tune with QLoRA (4-bit training)
+
+ QLoRA enables fine-tuning large models on consumer GPUs.
+
+ Copy this checklist:
+
+ ```
+ QLoRA Fine-tuning:
+ - [ ] Step 1: Install dependencies
+ - [ ] Step 2: Configure 4-bit base model
+ - [ ] Step 3: Add LoRA adapters
+ - [ ] Step 4: Train with standard Trainer
+ ```
+
+ **Step 1: Install dependencies**
+
+ ```bash
+ pip install bitsandbytes transformers peft accelerate datasets
+ ```
+
+ **Step 2: Configure 4-bit base model**
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ import torch
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-7b-hf",
+     quantization_config=bnb_config,
+     device_map="auto"
+ )
+
+ # Tokenizer is used by the Trainer in Step 4
+ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
+ ```
+
+ **Step 3: Add LoRA adapters**
+
+ ```python
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+ # Prepare model for training
+ model = prepare_model_for_kbit_training(model)
+
+ # Configure LoRA
+ lora_config = LoraConfig(
+     r=16,            # LoRA rank
+     lora_alpha=32,   # LoRA alpha
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM"
+ )
+
+ # Add LoRA adapters
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()
+ # Output: trainable params: 4.2M || all params: 6.7B || trainable%: 0.06%
+ ```
+
+ **Step 4: Train with standard Trainer**
+
+ ```python
+ from transformers import Trainer, TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="./qlora-output",
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=4,
+     num_train_epochs=3,
+     learning_rate=2e-4,
+     fp16=True,
+     logging_steps=10,
+     save_strategy="epoch"
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_dataset,  # your tokenized dataset
+     tokenizer=tokenizer
+ )
+
+ trainer.train()
+
+ # Save LoRA adapters (only ~20MB)
+ model.save_pretrained("./qlora-adapters")
+ ```
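+
+ To use the result later, the saved adapters can be reloaded on top of the 4-bit base model. A minimal sketch with PEFT, assuming the same `bnb_config` and adapter path as above:
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ # Reload the quantized base model (same bnb_config as in Step 2)
+ base = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-7b-hf",
+     quantization_config=bnb_config,
+     device_map="auto"
+ )
+
+ # Attach the saved LoRA adapters for inference
+ model = PeftModel.from_pretrained(base, "./qlora-adapters")
+ model.eval()
+ ```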
+
+ ### Workflow 3: 8-bit optimizer for memory-efficient training
+
+ Use 8-bit Adam/AdamW to reduce optimizer memory by 75%.
+
+ ```
+ 8-bit Optimizer Setup:
+ - [ ] Step 1: Replace standard optimizer
+ - [ ] Step 2: Compare optimizer memory
+ - [ ] Step 3: Monitor memory savings
+ ```
+
+ **Step 1: Replace standard optimizer**
+
+ ```python
+ from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
+
+ # Instead of torch.optim.AdamW, use the paged 8-bit AdamW provided by
+ # bitsandbytes (must be installed) via the Trainer's `optim` argument
+ model = AutoModelForCausalLM.from_pretrained("model-name")
+
+ training_args = TrainingArguments(
+     output_dir="./output",
+     per_device_train_batch_size=8,
+     optim="paged_adamw_8bit",  # 8-bit optimizer
+     learning_rate=5e-5
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_dataset
+ )
+
+ trainer.train()
+ ```
+
+ **Manual optimizer usage**:
+ ```python
+ import bitsandbytes as bnb
+
+ optimizer = bnb.optim.AdamW8bit(
+     model.parameters(),
+     lr=1e-4,
+     betas=(0.9, 0.999),
+     eps=1e-8
+ )
+
+ # Training loop
+ for batch in dataloader:
+     loss = model(**batch).loss
+     loss.backward()
+     optimizer.step()
+     optimizer.zero_grad()
+ ```
+
+ **Step 2: Compare optimizer memory**
+
+ ```
+ Standard AdamW optimizer memory = model_params × 8 bytes (two FP32 states)
+ 8-bit AdamW memory = model_params × 2 bytes
+ Savings = 75% of optimizer memory
+
+ Example (Llama 2 7B):
+ Standard: 7B × 8 = 56 GB
+ 8-bit: 7B × 2 = 14 GB
+ Savings: 42 GB
+ ```
+
+ **Step 3: Monitor memory savings**
+
+ ```python
+ import torch
+
+ before = torch.cuda.memory_allocated()
+
+ # Training step
+ optimizer.step()
+
+ after = torch.cuda.memory_allocated()
+ print(f"Memory used: {(after-before)/1e9:.2f}GB")
+ ```
+
+ ## When to use vs alternatives
+
+ **Use bitsandbytes when:**
+ - GPU memory is limited (you need to fit a larger model)
+ - Training with QLoRA (fine-tune a 70B model on a single GPU)
+ - Inference only (50-75% memory reduction)
+ - Using HuggingFace Transformers
+ - A 0-2% accuracy degradation is acceptable
+
+ **Use alternatives instead:**
+ - **GPTQ/AWQ**: Production serving (faster inference than bitsandbytes)
+ - **GGUF**: CPU inference (llama.cpp)
+ - **FP8**: H100 GPUs (hardware FP8 is faster)
+ - **Full precision**: Accuracy is critical and memory is not constrained
+
+ ## Common issues
+
+ **Issue: CUDA error during loading**
+
+ Reinstall bitsandbytes so its binaries match your CUDA toolkit:
+ ```bash
+ # Check CUDA version
+ nvcc --version
+
+ # Reinstall matching bitsandbytes
+ pip install bitsandbytes --no-cache-dir
+ ```
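+
+ If the error persists, recent bitsandbytes releases also ship a built-in diagnostic you can run (output and availability depend on the installed version):
+ ```bash
+ # Print the detected CUDA setup and library paths
+ python -m bitsandbytes
+ ```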
+
+ **Issue: Model loading is slow**
+
+ Cap GPU memory and offload the remainder to CPU for large models:
+ ```python
+ model = AutoModelForCausalLM.from_pretrained(
+     "model-name",
+     quantization_config=config,
+     device_map="auto",
+     max_memory={0: "20GB", "cpu": "30GB"}  # Offload to CPU
+ )
+ ```
+
+ **Issue: Lower accuracy than expected**
+
+ Try 8-bit instead of 4-bit:
+ ```python
+ config = BitsAndBytesConfig(load_in_8bit=True)
+ # 8-bit has <0.5% accuracy loss vs 1-2% for 4-bit
+ ```
+
+ Or use NF4 with double quantization:
+ ```python
+ config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",       # Better than fp4
+     bnb_4bit_use_double_quant=True   # Nested quantization; saves memory with negligible accuracy impact
+ )
+ ```
+
+ **Issue: OOM even with 4-bit**
+
+ Enable CPU offload:
+ ```python
+ model = AutoModelForCausalLM.from_pretrained(
+     "model-name",
+     quantization_config=config,
+     device_map="auto",
+     offload_folder="offload",  # Disk offload
+     offload_state_dict=True
+ )
+ ```
+
+ ## Advanced topics
+
+ **QLoRA training guide**: See [references/qlora-training.md](references/qlora-training.md) for complete fine-tuning workflows, hyperparameter tuning, and multi-GPU training.
+
+ **Quantization formats**: See [references/quantization-formats.md](references/quantization-formats.md) for INT8, NF4, FP4 comparison, double quantization, and custom quantization configs.
+
+ **Memory optimization**: See [references/memory-optimization.md](references/memory-optimization.md) for CPU offloading strategies, gradient checkpointing, and memory profiling.
+
+ ## Hardware requirements
+
+ - **GPU**: NVIDIA with compute capability 7.0+ (Turing, Ampere, Hopper)
+ - **VRAM**: Depends on model and quantization
+   - 4-bit Llama 2 7B: ~4GB
+   - 4-bit Llama 2 13B: ~8GB
+   - 4-bit Llama 2 70B: ~35GB
+ - **CUDA**: 11.1+ (12.0+ recommended)
+ - **PyTorch**: 2.0+
+
+ **Supported platforms**: NVIDIA GPUs (primary), AMD ROCm, Intel GPUs (experimental)
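+
+ A quick way to check these requirements on your machine, using only standard PyTorch calls (illustrative snippet, not part of bitsandbytes):
+ ```python
+ import torch
+
+ print("PyTorch:", torch.__version__)
+ print("CUDA available:", torch.cuda.is_available())
+ print("CUDA runtime:", torch.version.cuda)
+ if torch.cuda.is_available():
+     major, minor = torch.cuda.get_device_capability(0)
+     vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
+     print(f"Compute capability: {major}.{minor} (7.0+ required)")
+     print(f"VRAM: {vram_gb:.1f} GB")
+ ```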
+
+ ## Resources
+
+ - GitHub: https://github.com/bitsandbytes-foundation/bitsandbytes
+ - HuggingFace docs: https://huggingface.co/docs/transformers/quantization/bitsandbytes
+ - QLoRA paper: "QLoRA: Efficient Finetuning of Quantized LLMs" (2023)
+ - LLM.int8() paper: "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale" (2022)
+