EvoScientist 0.1.0rc1__py3-none-any.whl → 0.1.0rc2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. EvoScientist/EvoScientist.py +1 -1
  2. EvoScientist/cli.py +450 -178
  3. EvoScientist/middleware.py +5 -1
  4. EvoScientist/skills/accelerate/SKILL.md +332 -0
  5. EvoScientist/skills/accelerate/references/custom-plugins.md +453 -0
  6. EvoScientist/skills/accelerate/references/megatron-integration.md +489 -0
  7. EvoScientist/skills/accelerate/references/performance.md +525 -0
  8. EvoScientist/skills/bitsandbytes/SKILL.md +411 -0
  9. EvoScientist/skills/bitsandbytes/references/memory-optimization.md +521 -0
  10. EvoScientist/skills/bitsandbytes/references/qlora-training.md +521 -0
  11. EvoScientist/skills/bitsandbytes/references/quantization-formats.md +447 -0
  12. EvoScientist/skills/clip/SKILL.md +253 -0
  13. EvoScientist/skills/clip/references/applications.md +207 -0
  14. EvoScientist/skills/find-skills/SKILL.md +133 -0
  15. EvoScientist/skills/find-skills/scripts/install_skill.py +211 -0
  16. EvoScientist/skills/flash-attention/SKILL.md +367 -0
  17. EvoScientist/skills/flash-attention/references/benchmarks.md +215 -0
  18. EvoScientist/skills/flash-attention/references/transformers-integration.md +293 -0
  19. EvoScientist/skills/langgraph-docs/SKILL.md +36 -0
  20. EvoScientist/skills/llama-cpp/SKILL.md +258 -0
  21. EvoScientist/skills/llama-cpp/references/optimization.md +89 -0
  22. EvoScientist/skills/llama-cpp/references/quantization.md +213 -0
  23. EvoScientist/skills/llama-cpp/references/server.md +125 -0
  24. EvoScientist/skills/lm-evaluation-harness/SKILL.md +490 -0
  25. EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  26. EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  27. EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  28. EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  29. EvoScientist/skills/ml-paper-writing/SKILL.md +937 -0
  30. EvoScientist/skills/ml-paper-writing/references/checklists.md +361 -0
  31. EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  32. EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  33. EvoScientist/skills/ml-paper-writing/references/sources.md +159 -0
  34. EvoScientist/skills/ml-paper-writing/references/writing-guide.md +476 -0
  35. EvoScientist/skills/ml-paper-writing/templates/README.md +251 -0
  36. EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  37. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  38. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  39. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  40. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  41. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  42. EvoScientist/skills/ml-paper-writing/templates/acl/README.md +50 -0
  43. EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  44. EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  45. EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  46. EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  47. EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  48. EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  49. EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  50. EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  51. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  52. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  53. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  54. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  55. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  56. EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  57. EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  58. EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  59. EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  60. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  61. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  62. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  63. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  64. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  65. EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  66. EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  67. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  68. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  69. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  70. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  71. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  72. EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  73. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  74. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  75. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  76. EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  77. EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  78. EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  79. EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  80. EvoScientist/skills/peft/SKILL.md +431 -0
  81. EvoScientist/skills/peft/references/advanced-usage.md +514 -0
  82. EvoScientist/skills/peft/references/troubleshooting.md +480 -0
  83. EvoScientist/skills/ray-data/SKILL.md +326 -0
  84. EvoScientist/skills/ray-data/references/integration.md +82 -0
  85. EvoScientist/skills/ray-data/references/transformations.md +83 -0
  86. EvoScientist/skills/skill-creator/LICENSE.txt +202 -0
  87. EvoScientist/skills/skill-creator/SKILL.md +356 -0
  88. EvoScientist/skills/skill-creator/references/output-patterns.md +82 -0
  89. EvoScientist/skills/skill-creator/references/workflows.md +28 -0
  90. EvoScientist/skills/skill-creator/scripts/init_skill.py +303 -0
  91. EvoScientist/skills/skill-creator/scripts/package_skill.py +110 -0
  92. EvoScientist/skills/skill-creator/scripts/quick_validate.py +95 -0
  93. EvoScientist/skills/tensorboard/SKILL.md +629 -0
  94. EvoScientist/skills/tensorboard/references/integrations.md +638 -0
  95. EvoScientist/skills/tensorboard/references/profiling.md +545 -0
  96. EvoScientist/skills/tensorboard/references/visualization.md +620 -0
  97. EvoScientist/skills/vllm/SKILL.md +364 -0
  98. EvoScientist/skills/vllm/references/optimization.md +226 -0
  99. EvoScientist/skills/vllm/references/quantization.md +284 -0
  100. EvoScientist/skills/vllm/references/server-deployment.md +255 -0
  101. EvoScientist/skills/vllm/references/troubleshooting.md +447 -0
  102. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/METADATA +26 -3
  103. evoscientist-0.1.0rc2.dist-info/RECORD +119 -0
  104. evoscientist-0.1.0rc1.dist-info/RECORD +0 -21
  105. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/WHEEL +0 -0
  106. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/entry_points.txt +0 -0
  107. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/licenses/LICENSE +0 -0
  108. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,447 @@
1
+ # Quantization Formats
2
+
3
+ Complete guide to INT8, NF4, FP4 quantization formats, double quantization, and custom configurations in bitsandbytes.
4
+
5
+ ## Overview
6
+
7
+ bitsandbytes supports multiple quantization formats:
8
+ - **INT8**: 8-bit integer quantization (LLM.int8())
9
+ - **NF4**: 4-bit NormalFloat (for normally distributed weights)
10
+ - **FP4**: 4-bit floating point (for more uniformly distributed weights)
11
+ - **Double Quantization**: Quantize the quantization constants
12
+
13
+ ## INT8 Quantization
14
+
15
+ ### LLM.int8() Algorithm
16
+
17
+ LLM.int8() uses mixed 8-bit/16-bit matrix multiplication:
18
+ - Most features (>99.9%) computed in INT8
19
+ - Outlier features (activation magnitude above the threshold) computed in FP16
20
+ - Results combined for final output
21
+
22
+ **Memory**: 50% reduction (2 bytes → 1 byte per parameter)
23
+ **Accuracy**: <0.5% degradation
24
+
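To make the decomposition concrete, here is a toy PyTorch sketch (an illustration only, not the bitsandbytes kernel; the per-tensor absmax scaling and the high-precision path kept in FP32 are simplifications):

```python
import torch

# Toy outlier decomposition in the spirit of LLM.int8() (illustrative only).
threshold = 6.0
X = torch.randn(4, 8)        # activations (batch x features)
X[:, 3] *= 20                # make one feature column an outlier
W = torch.randn(8, 16)       # weights (features x outputs)

outlier_cols = (X.abs() > threshold).any(dim=0)

# High-precision path for the outlier columns (FP16 in the real kernel)
y_hi = X[:, outlier_cols] @ W[outlier_cols]

# Symmetric absmax INT8 path for the remaining columns
X_r, W_r = X[:, ~outlier_cols], W[~outlier_cols]
sx, sw = X_r.abs().max() / 127, W_r.abs().max() / 127
Xq = (X_r / sx).round().clamp(-127, 127)
Wq = (W_r / sw).round().clamp(-127, 127)
y_int8 = (Xq @ Wq) * sx * sw

y = y_hi + y_int8                 # combine the two partial results
print((y - X @ W).abs().max())    # small error despite the 8-bit path
```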
25
+ ### Configuration
26
+
27
+ ```python
28
+ from transformers import BitsAndBytesConfig
29
+
30
+ config = BitsAndBytesConfig(
31
+ load_in_8bit=True,
32
+ llm_int8_threshold=6.0, # Outlier threshold
33
+ llm_int8_has_fp16_weight=False, # Use INT8 storage
34
+ llm_int8_skip_modules=["lm_head"] # Skip certain layers
35
+ )
36
+ ```
37
+
38
+ ### Parameters Explained
39
+
40
+ **`llm_int8_threshold`** (default: 6.0):
41
+ - Activations with magnitude > threshold are kept in FP16
42
+ - Lower = more FP16 (slower but more accurate)
43
+ - Higher = more INT8 (faster but less accurate)
44
+
45
+ ```python
46
+ # Conservative (more accurate)
47
+ llm_int8_threshold=5.0
48
+
49
+ # Aggressive (faster)
50
+ llm_int8_threshold=8.0
51
+ ```
52
+
53
+ **`llm_int8_has_fp16_weight`** (default: False):
54
+ - `False`: Store weights in INT8 (50% memory savings)
55
+ - `True`: Store in FP16, quantize only during computation (no memory savings)
56
+
57
+ **`llm_int8_skip_modules`**:
58
+ ```python
59
+ # Skip specific layers (keep in FP16)
60
+ llm_int8_skip_modules=["lm_head", "embed_tokens"]
61
+ ```
62
+
63
+ ### Example
64
+
65
+ ```python
66
+ from transformers import AutoModelForCausalLM
67
+
68
+ model = AutoModelForCausalLM.from_pretrained(
69
+ "meta-llama/Llama-2-13b-hf",
70
+ quantization_config=config,
71
+ device_map="auto"
72
+ )
73
+
74
+ # Memory: 26GB (FP16) → 13GB (INT8)
75
+ ```
76
+
77
+ ### When to Use INT8
78
+
79
+ ✅ **Use INT8 when**:
80
+ - Need high accuracy (<0.5% loss)
81
+ - Model fits with 50% reduction
82
+ - Have Turing+ GPU (tensor cores)
83
+
84
+ ❌ **Don't use when**:
85
+ - Need maximum memory savings (use 4-bit)
86
+ - Inference speed critical (use GPTQ/AWQ)
87
+
88
+ ## 4-Bit Quantization
89
+
90
+ ### NormalFloat4 (NF4)
91
+
92
+ Optimized for normally distributed weights (most neural networks).
93
+
94
+ **How it works**:
95
+ - Bins chosen to minimize quantization error for normal distribution
96
+ - Asymmetric quantization bins
97
+ - Better for transformer weights
98
+
99
+ **Configuration**:
100
+ ```python
+ import torch
+ from transformers import BitsAndBytesConfig
101
+ config = BitsAndBytesConfig(
102
+ load_in_4bit=True,
103
+ bnb_4bit_compute_dtype=torch.bfloat16,
104
+ bnb_4bit_quant_type="nf4" # NormalFloat4
105
+ )
106
+ ```
107
+
108
+ **Memory**: 75% reduction (2 bytes → 0.5 bytes per parameter)
109
+
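To illustrate the "bins tailored to a normal distribution" idea, the sketch below builds 16 quantile-based levels and quantizes one weight block to the nearest level after absmax scaling. The level construction is a simplification for intuition only; the exact NF4 code points are hard-coded inside bitsandbytes.

```python
import torch

# Simplified NF4-style levels: 16 values placed at standard-normal quantiles,
# rescaled to [-1, 1]. Not the exact bitsandbytes lookup table.
normal = torch.distributions.Normal(0.0, 1.0)
levels = normal.icdf(torch.linspace(0.02, 0.98, 16))
levels = levels / levels.abs().max()

w = torch.randn(64)                     # one 64-element block of weights
scale = w.abs().max()                   # per-block absmax scaling factor
idx = (w / scale).unsqueeze(-1).sub(levels).abs().argmin(dim=-1)
w_deq = levels[idx] * scale             # dequantized block

print("mean abs error:", (w - w_deq).abs().mean().item())
```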
110
+ ### Floating Point 4 (FP4)
111
+
112
+ Standard 4-bit floating point for uniform distributions.
113
+
114
+ **How it works**:
115
+ - Symmetric quantization bins
116
+ - Better for weights with broader dynamic range
117
+ - Less common for transformers
118
+
119
+ **Configuration**:
120
+ ```python
121
+ config = BitsAndBytesConfig(
122
+ load_in_4bit=True,
123
+ bnb_4bit_compute_dtype=torch.bfloat16,
124
+ bnb_4bit_quant_type="fp4" # 4-bit floating point
125
+ )
126
+ ```
127
+
128
+ ### NF4 vs FP4 Comparison
129
+
130
+ | Aspect | NF4 | FP4 |
131
+ |--------|-----|-----|
132
+ | Distribution | Normal | Uniform |
133
+ | Typical use | **Transformers** | CNNs, unusual architectures |
134
+ | Accuracy | **Better for LLMs** | Worse for LLMs |
135
+ | Speed | Same | Same |
136
+ | Recommendation | ✅ Default | Use only if NF4 fails |
137
+
138
+ **Rule of thumb**: Always use NF4 for transformers.
139
+
140
+ ### Example Comparison
141
+
142
+ ```python
143
+ # NF4 (recommended)
144
+ nf4_config = BitsAndBytesConfig(
145
+ load_in_4bit=True,
146
+ bnb_4bit_quant_type="nf4"
147
+ )
148
+
149
+ # FP4 (alternative)
150
+ fp4_config = BitsAndBytesConfig(
151
+ load_in_4bit=True,
152
+ bnb_4bit_quant_type="fp4"
153
+ )
154
+
155
+ # Load and compare
156
+ model_nf4 = AutoModelForCausalLM.from_pretrained(
157
+ "meta-llama/Llama-2-7b-hf",
158
+ quantization_config=nf4_config
159
+ )
160
+
161
+ model_fp4 = AutoModelForCausalLM.from_pretrained(
162
+ "meta-llama/Llama-2-7b-hf",
163
+ quantization_config=fp4_config
164
+ )
165
+
166
+ # Typical results on MMLU:
167
+ # NF4: 45.2%
168
+ # FP4: 43.8%
169
+ # FP16: 45.9%
170
+ ```
171
+
172
+ ## Compute Dtype
173
+
174
+ The `bnb_4bit_compute_dtype` controls the precision used for actual computation.
175
+
176
+ ### Options
177
+
178
+ **torch.bfloat16** (recommended):
179
+ ```python
180
+ bnb_4bit_compute_dtype=torch.bfloat16
181
+ ```
182
+ - Good balance of speed and accuracy
183
+ - Recommended for A100/H100
184
+ - Prevents numerical instability
185
+
186
+ **torch.float16**:
187
+ ```python
188
+ bnb_4bit_compute_dtype=torch.float16
189
+ ```
190
+ - Slightly faster than BF16
191
+ - Risk of overflow/underflow
192
+ - Use only if BF16 unavailable
193
+
194
+ **torch.float32**:
195
+ ```python
196
+ bnb_4bit_compute_dtype=torch.float32
197
+ ```
198
+ - Most accurate
199
+ - Slowest (no tensor core acceleration)
200
+ - Debugging only
201
+
202
+ ### Performance Comparison
203
+
204
+ | Dtype | Speed | Accuracy | Memory |
205
+ |-------|-------|----------|--------|
206
+ | FP32 | 1× (baseline) | 100% | 4 bytes |
207
+ | FP16 | 3-4× | 99.5% | 2 bytes |
208
+ | BF16 | 3-4× | **99.8%** | 2 bytes |
209
+
210
+ **Recommendation**: Always use `torch.bfloat16` if supported.
211
+
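When the target GPU is not known in advance, the compute dtype can be picked at runtime; a minimal sketch using the standard `torch.cuda.is_bf16_supported()` check:

```python
import torch
from transformers import BitsAndBytesConfig

# Use BF16 when the GPU supports it, otherwise fall back to FP16.
compute_dtype = (
    torch.bfloat16
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    else torch.float16
)

config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
)
```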
212
+ ## Double Quantization
213
+
214
+ Quantize the quantization constants for additional memory savings.
215
+
216
+ ### How It Works
217
+
218
+ Standard 4-bit quantization stores:
219
+ - 4-bit quantized weights
220
+ - FP32 scaling factors (4 bytes per block)
221
+
222
+ Double quantization:
223
+ - 4-bit quantized weights
224
+ - **INT8 quantized scaling factors** (1 byte per block)
225
+
226
+ **Additional savings**: ~2-3% memory reduction
227
+
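A rough bits-per-parameter calculation shows where the saving comes from (assuming QLoRA-style defaults of 64-element weight blocks and 256-element blocks of 8-bit second-level constants; exact figures vary with block sizes):

```python
# Storage overhead per weight, in bits (illustrative arithmetic)
weight_bits = 4.0
single_quant = weight_bits + 32 / 64                    # FP32 absmax per 64-weight block
double_quant = weight_bits + 8 / 64 + 32 / (64 * 256)   # 8-bit constants + 2nd-level FP32

print(f"{single_quant:.3f} vs {double_quant:.3f} bits per parameter")
# Roughly 0.37 bits/parameter saved; the end-to-end GB saving also depends on
# activations, KV cache, and other buffers that are not quantized.
```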
228
+ ### Configuration
229
+
230
+ ```python
231
+ config = BitsAndBytesConfig(
232
+ load_in_4bit=True,
233
+ bnb_4bit_quant_type="nf4",
234
+ bnb_4bit_use_double_quant=True # Enable double quantization
235
+ )
236
+ ```
237
+
238
+ ### Example
239
+
240
+ ```python
241
+ # Without double quant
242
+ model_single = AutoModelForCausalLM.from_pretrained(
243
+ "meta-llama/Llama-2-70b-hf",
244
+ quantization_config=BitsAndBytesConfig(
245
+ load_in_4bit=True,
246
+ bnb_4bit_use_double_quant=False
247
+ )
248
+ )
249
+ # Memory: ~36GB
250
+
251
+ # With double quant
252
+ model_double = AutoModelForCausalLM.from_pretrained(
253
+ "meta-llama/Llama-2-70b-hf",
254
+ quantization_config=BitsAndBytesConfig(
255
+ load_in_4bit=True,
256
+ bnb_4bit_use_double_quant=True
257
+ )
258
+ )
259
+ # Memory: ~35GB (saves ~1GB)
260
+ ```
261
+
262
+ **Accuracy impact**: Negligible (<0.1%)
263
+
264
+ **Recommendation**: Always enable for maximum memory savings.
265
+
266
+ ## Quantization Storage
267
+
268
+ Controls storage dtype for quantized weights (important for FSDP).
269
+
270
+ ### Configuration
271
+
272
+ ```python
273
+ config = BitsAndBytesConfig(
274
+ load_in_4bit=True,
275
+ bnb_4bit_quant_storage=torch.bfloat16 # Storage dtype
276
+ )
277
+ ```
278
+
279
+ ### When to Use
280
+
281
+ **Default (uint8)**:
282
+ - Single GPU training/inference
283
+ - No special requirements
284
+
285
+ **torch.bfloat16** (for FSDP):
286
+ ```python
287
+ bnb_4bit_quant_storage=torch.bfloat16
288
+ ```
289
+ - **Required for FSDP+QLoRA**
290
+ - Ensures 4-bit layers are wrapped like regular layers
291
+ - Enables proper model sharding
292
+
293
+ ### Example: FSDP Configuration
294
+
295
+ ```python
296
+ # CRITICAL: Set quant_storage for FSDP
297
+ fsdp_config = BitsAndBytesConfig(
298
+ load_in_4bit=True,
299
+ bnb_4bit_compute_dtype=torch.bfloat16,
300
+ bnb_4bit_quant_type="nf4",
301
+ bnb_4bit_use_double_quant=True,
302
+ bnb_4bit_quant_storage=torch.bfloat16 # Must match torch_dtype!
303
+ )
304
+
305
+ model = AutoModelForCausalLM.from_pretrained(
306
+ "meta-llama/Llama-2-70b-hf",
307
+ quantization_config=fsdp_config,
308
+ torch_dtype=torch.bfloat16 # Must match quant_storage!
309
+ )
310
+ ```
311
+
312
+ ## Recommended Configurations
313
+
314
+ ### Production Inference (Best Accuracy)
315
+
316
+ ```python
317
+ BitsAndBytesConfig(
318
+ load_in_8bit=True,
319
+ llm_int8_threshold=6.0
320
+ )
321
+ ```
322
+
323
+ **Use case**: Maximum accuracy with 50% memory savings
324
+
325
+ ### Production Inference (Maximum Memory Savings)
326
+
327
+ ```python
328
+ BitsAndBytesConfig(
329
+ load_in_4bit=True,
330
+ bnb_4bit_compute_dtype=torch.bfloat16,
331
+ bnb_4bit_quant_type="nf4",
332
+ bnb_4bit_use_double_quant=True
333
+ )
334
+ ```
335
+
336
+ **Use case**: 75% memory reduction with <1% accuracy loss
337
+
338
+ ### QLoRA Training (Single GPU)
339
+
340
+ ```python
341
+ BitsAndBytesConfig(
342
+ load_in_4bit=True,
343
+ bnb_4bit_compute_dtype=torch.bfloat16,
344
+ bnb_4bit_quant_type="nf4",
345
+ bnb_4bit_use_double_quant=True
346
+ )
347
+ ```
348
+
349
+ **Use case**: Fine-tune models up to ~33B on a single 24 GB GPU (RTX 3090/4090); a 70B QLoRA run needs roughly 48 GB
350
+
351
+ ### FSDP + QLoRA (Multi-GPU)
352
+
353
+ ```python
354
+ BitsAndBytesConfig(
355
+ load_in_4bit=True,
356
+ bnb_4bit_compute_dtype=torch.bfloat16,
357
+ bnb_4bit_quant_type="nf4",
358
+ bnb_4bit_use_double_quant=True,
359
+ bnb_4bit_quant_storage=torch.bfloat16 # CRITICAL!
360
+ )
361
+ ```
362
+
363
+ **Use case**: Fine-tune 405B on 8×H100
364
+
365
+ ## Advanced: Block-wise Quantization
366
+
367
+ bitsandbytes uses block-wise quantization:
368
+ - Weights divided into blocks (typically 64 or 128 elements)
369
+ - Each block has own scaling factor
370
+ - Better accuracy than tensor-wise quantization
371
+
372
+ **Block size** (automatically determined):
373
+ ```python
374
+ # Typical block sizes
375
+ # 4-bit: 64 elements per block
376
+ # 8-bit: 64 elements per block
377
+ ```
378
+
379
+ **Not configurable through `BitsAndBytesConfig`** (treated as an internal implementation detail).
380
+
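The sketch below illustrates why per-block scales help (assumptions: symmetric absmax quantize-dequantize and a 64-element block size; this is not the bitsandbytes kernel):

```python
import torch

def absmax_quantize(x, bits=8):
    # Symmetric absmax quantize-dequantize of a 1-D tensor.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-12) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

w = torch.randn(4096)
w[0] = 50.0                                   # one outlier inflates a shared scale

err_tensor = (w - absmax_quantize(w)).abs().mean()           # one scale for all weights
blocks = w.view(-1, 64)                                      # independent 64-element blocks
w_block = torch.stack([absmax_quantize(b) for b in blocks]).view(-1)
err_block = (w - w_block).abs().mean()

print(f"per-tensor error {err_tensor:.4f} vs block-wise error {err_block:.4f}")
```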
381
+ ## Quantization Quality Metrics
382
+
383
+ ### Perplexity (Lower is Better)
384
+
385
+ | Model | FP16 | INT8 | NF4 | NF4+DQ |
386
+ |-------|------|------|-----|--------|
387
+ | Llama 2 7B | 5.12 | 5.14 | 5.18 | 5.19 |
388
+ | Llama 2 13B | 4.88 | 4.90 | 4.93 | 4.94 |
389
+ | Llama 2 70B | 3.32 | 3.33 | 3.35 | 3.36 |
390
+
391
+ **Conclusion**: <1% degradation for all quantization methods
392
+
393
+ ### MMLU Accuracy (Higher is Better)
394
+
395
+ | Model | FP16 | INT8 | NF4 | FP4 |
396
+ |-------|------|------|-----|-----|
397
+ | Llama 2 7B | 45.9% | 45.7% | 45.2% | 43.8% |
398
+ | Llama 2 13B | 54.8% | 54.6% | 54.1% | 52.9% |
399
+ | Llama 2 70B | 68.9% | 68.7% | 68.4% | 67.2% |
400
+
401
+ **Conclusion**: NF4 is significantly better than FP4 for transformers
402
+
403
+ ## Troubleshooting
404
+
405
+ ### "Quantization failed" Error
406
+
407
+ Try different quant type:
408
+ ```python
409
+ # If NF4 fails
410
+ bnb_4bit_quant_type="fp4"
411
+ ```
412
+
413
+ ### Numerical Instability
414
+
415
+ Use BF16 compute:
416
+ ```python
417
+ bnb_4bit_compute_dtype=torch.bfloat16
418
+ ```
419
+
420
+ ### Poor Quality with 4-bit
421
+
422
+ 1. Try 8-bit instead:
423
+ ```python
424
+ load_in_8bit=True
425
+ ```
426
+
427
+ 2. Enable double quantization:
428
+ ```python
429
+ bnb_4bit_use_double_quant=True
430
+ ```
431
+
432
+ 3. Use BF16 compute dtype
433
+
434
+ ### FSDP Errors
435
+
436
+ Ensure quant_storage matches torch_dtype:
437
+ ```python
438
+ bnb_4bit_quant_storage=torch.bfloat16
439
+ torch_dtype=torch.bfloat16 # Must match!
440
+ ```
441
+
442
+ ## References
443
+
444
+ - LLM.int8() paper: "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale" (2022)
445
+ - QLoRA paper: "QLoRA: Efficient Finetuning of Quantized LLMs" (2023)
446
+ - bitsandbytes GitHub: https://github.com/bitsandbytes-foundation/bitsandbytes
447
+ - HuggingFace quantization docs: https://huggingface.co/docs/transformers/quantization/bitsandbytes
@@ -0,0 +1,253 @@
1
+ ---
2
+ name: clip
3
+ description: OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
4
+ version: 1.0.0
5
+ author: Orchestra Research
6
+ license: MIT
7
+ tags: [Multimodal, CLIP, Vision-Language, Zero-Shot, Image Classification, OpenAI, Image Search, Cross-Modal Retrieval, Content Moderation]
8
+ dependencies: [transformers, torch, pillow]
9
+ ---
10
+
11
+ # CLIP - Contrastive Language-Image Pre-Training
12
+
13
+ OpenAI's model that understands images from natural language.
14
+
15
+ ## When to use CLIP
16
+
17
+ **Use when:**
18
+ - Zero-shot image classification (no training data needed)
19
+ - Image-text similarity/matching
20
+ - Semantic image search
21
+ - Content moderation (detect NSFW, violence)
22
+ - Visual question answering
23
+ - Cross-modal retrieval (image→text, text→image)
24
+
25
+ **Metrics**:
26
+ - **25,300+ GitHub stars**
27
+ - Trained on 400M image-text pairs
28
+ - Matches ResNet-50 on ImageNet (zero-shot)
29
+ - MIT License
30
+
31
+ **Use alternatives instead**:
32
+ - **BLIP-2**: Better captioning
33
+ - **LLaVA**: Vision-language chat
34
+ - **Segment Anything**: Image segmentation
35
+
36
+ ## Quick start
37
+
38
+ ### Installation
39
+
40
+ ```bash
41
+ pip install git+https://github.com/openai/CLIP.git
42
+ pip install torch torchvision ftfy regex tqdm
43
+ ```
44
+
45
+ ### Zero-shot classification
46
+
47
+ ```python
48
+ import torch
49
+ import clip
50
+ from PIL import Image
51
+
52
+ # Load model
53
+ device = "cuda" if torch.cuda.is_available() else "cpu"
54
+ model, preprocess = clip.load("ViT-B/32", device=device)
55
+
56
+ # Load image
57
+ image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
58
+
59
+ # Define possible labels
60
+ text = clip.tokenize(["a dog", "a cat", "a bird", "a car"]).to(device)
61
+
62
+ # Compute similarity
63
+ with torch.no_grad():
64
+ image_features = model.encode_image(image)
65
+ text_features = model.encode_text(text)
66
+
67
+ # Cosine similarity
68
+ logits_per_image, logits_per_text = model(image, text)
69
+ probs = logits_per_image.softmax(dim=-1).cpu().numpy()
70
+
71
+ # Print results
72
+ labels = ["a dog", "a cat", "a bird", "a car"]
73
+ for label, prob in zip(labels, probs[0]):
74
+ print(f"{label}: {prob:.2%}")
75
+ ```
76
+
77
+ ## Available models
78
+
79
+ ```python
80
+ # Models (sorted by size)
81
+ models = [
82
+ "RN50", # ResNet-50
83
+ "RN101", # ResNet-101
84
+ "ViT-B/32", # Vision Transformer (recommended)
85
+ "ViT-B/16", # Better quality, slower
86
+ "ViT-L/14", # Best quality, slowest
87
+ ]
88
+
89
+ model, preprocess = clip.load("ViT-B/32")
90
+ ```
91
+
92
+ | Model | Parameters | Speed | Quality |
93
+ |-------|------------|-------|---------|
94
+ | RN50 | 102M | Fast | Good |
95
+ | ViT-B/32 | 151M | Medium | Better |
96
+ | ViT-L/14 | 428M | Slow | Best |
97
+
98
+ ## Image-text similarity
99
+
100
+ ```python
101
+ # Compute embeddings
102
+ image_features = model.encode_image(image)
103
+ text_features = model.encode_text(text)
104
+
105
+ # Normalize
106
+ image_features /= image_features.norm(dim=-1, keepdim=True)
107
+ text_features /= text_features.norm(dim=-1, keepdim=True)
108
+
109
+ # Cosine similarity
110
+ similarity = (image_features @ text_features.T).item()
111
+ print(f"Similarity: {similarity:.4f}")
112
+ ```
113
+
114
+ ## Semantic image search
115
+
116
+ ```python
117
+ # Index images
118
+ image_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
119
+ image_embeddings = []
120
+
121
+ for img_path in image_paths:
122
+ image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
123
+ with torch.no_grad():
124
+ embedding = model.encode_image(image)
125
+ embedding /= embedding.norm(dim=-1, keepdim=True)
126
+ image_embeddings.append(embedding)
127
+
128
+ image_embeddings = torch.cat(image_embeddings)
129
+
130
+ # Search with text query
131
+ query = "a sunset over the ocean"
132
+ text_input = clip.tokenize([query]).to(device)
133
+ with torch.no_grad():
134
+ text_embedding = model.encode_text(text_input)
135
+ text_embedding /= text_embedding.norm(dim=-1, keepdim=True)
136
+
137
+ # Find most similar images
138
+ similarities = (text_embedding @ image_embeddings.T).squeeze(0)
139
+ top_k = similarities.topk(3)
140
+
141
+ for idx, score in zip(top_k.indices, top_k.values):
142
+ print(f"{image_paths[idx]}: {score:.3f}")
143
+ ```
144
+
145
+ ## Content moderation
146
+
147
+ ```python
148
+ # Define categories
149
+ categories = [
150
+ "safe for work",
151
+ "not safe for work",
152
+ "violent content",
153
+ "graphic content"
154
+ ]
155
+
156
+ text = clip.tokenize(categories).to(device)
157
+
158
+ # Check image
159
+ with torch.no_grad():
160
+ logits_per_image, _ = model(image, text)
161
+ probs = logits_per_image.softmax(dim=-1)
162
+
163
+ # Get classification
164
+ max_idx = probs.argmax().item()
165
+ max_prob = probs[0, max_idx].item()
166
+
167
+ print(f"Category: {categories[max_idx]} ({max_prob:.2%})")
168
+ ```
169
+
170
+ ## Batch processing
171
+
172
+ ```python
173
+ # Process multiple images
174
+ images = [preprocess(Image.open(f"img{i}.jpg")) for i in range(10)]
175
+ images = torch.stack(images).to(device)
176
+
177
+ with torch.no_grad():
178
+ image_features = model.encode_image(images)
179
+ image_features /= image_features.norm(dim=-1, keepdim=True)
180
+
181
+ # Batch text
182
+ texts = ["a dog", "a cat", "a bird"]
183
+ text_tokens = clip.tokenize(texts).to(device)
184
+
185
+ with torch.no_grad():
186
+ text_features = model.encode_text(text_tokens)
187
+ text_features /= text_features.norm(dim=-1, keepdim=True)
188
+
189
+ # Similarity matrix (10 images × 3 texts)
190
+ similarities = image_features @ text_features.T
191
+ print(similarities.shape) # (10, 3)
192
+ ```
193
+
194
+ ## Integration with vector databases
195
+
196
+ ```python
197
+ # Store CLIP embeddings in Chroma/FAISS
198
+ import chromadb
199
+
200
+ client = chromadb.Client()
201
+ collection = client.create_collection("image_embeddings")
202
+
203
+ # Add image embeddings
204
+ for img_path, embedding in zip(image_paths, image_embeddings):
205
+ collection.add(
206
+ embeddings=[embedding.cpu().numpy().tolist()],
207
+ metadatas=[{"path": img_path}],
208
+ ids=[img_path]
209
+ )
210
+
211
+ # Query with text
212
+ query = "a sunset"
213
+ text_embedding = model.encode_text(clip.tokenize([query]))
214
+ results = collection.query(
215
+ query_embeddings=[text_embedding.cpu().numpy().tolist()],
216
+ n_results=5
217
+ )
218
+ ```
219
+
220
+ ## Best practices
221
+
222
+ 1. **Use ViT-B/32 for most cases** - Good balance
223
+ 2. **Normalize embeddings** - Required for cosine similarity
224
+ 3. **Batch processing** - More efficient
225
+ 4. **Cache embeddings** - Expensive to recompute
226
+ 5. **Use descriptive labels** - Better zero-shot performance
227
+ 6. **GPU recommended** - 10-50× faster
228
+ 7. **Preprocess images** - Use provided preprocess function
229
+
230
+ ## Performance
231
+
232
+ | Operation | CPU | GPU (V100) |
233
+ |-----------|-----|------------|
234
+ | Image encoding | ~200ms | ~20ms |
235
+ | Text encoding | ~50ms | ~5ms |
236
+ | Similarity compute | <1ms | <1ms |
237
+
238
+ ## Limitations
239
+
240
+ 1. **Not for fine-grained tasks** - Best for broad categories
241
+ 2. **Requires descriptive text** - Vague labels perform poorly
242
+ 3. **Biased on web data** - May have dataset biases
243
+ 4. **No bounding boxes** - Whole image only
244
+ 5. **Limited spatial understanding** - Position/counting weak
245
+
246
+ ## Resources
247
+
248
+ - **GitHub**: https://github.com/openai/CLIP ⭐ 25,300+
249
+ - **Paper**: https://arxiv.org/abs/2103.00020
250
+ - **Colab**: https://colab.research.google.com/github/openai/clip/
251
+ - **License**: MIT
252
+
253
+