@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,450 @@

---
name: gptq
description: Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Optimization, GPTQ, Quantization, 4-Bit, Post-Training, Memory Optimization, Consumer GPUs, Fast Inference, QLoRA, Group-Wise Quantization]
dependencies: [auto-gptq, transformers, optimum, peft]
---

# GPTQ (Generative Pre-trained Transformer Quantization)

Post-training quantization method that compresses LLMs to 4-bit with minimal accuracy loss using group-wise quantization.

## When to use GPTQ

**Use GPTQ when:**
- Need to fit large models (70B+) on limited GPU memory
- Want 4× memory reduction with <2% accuracy loss
- Deploying on consumer GPUs (RTX 4090, 3090)
- Need faster inference (3-4× speedup vs FP16)

**Use AWQ instead when:**
- Need slightly better accuracy (<1% loss)
- Have newer GPUs (Ampere, Ada)
- Want Marlin kernel support (2× faster on some GPUs)

**Use bitsandbytes instead when:**
- Need simple integration with transformers
- Want 8-bit quantization (less compression, better quality)
- Don't need pre-quantized model files (see the contrast sketch below)

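For contrast with that last point, here is a minimal sketch of the bitsandbytes path, where the model is quantized on the fly at load time and no pre-quantized checkpoint is needed. It assumes `bitsandbytes` and `accelerate` are installed; the model id is only illustrative.

```python
# Minimal sketch: on-the-fly 8-bit load via bitsandbytes (no GPTQ checkpoint needed).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative; any causal LM works

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # quantize at load time
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
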
## Quick start

### Installation

```bash
# Install AutoGPTQ
pip install auto-gptq

# With Triton (Linux only, faster)
pip install auto-gptq[triton]

# With CUDA extensions (faster)
pip install auto-gptq --no-build-isolation

# Full installation
pip install auto-gptq transformers accelerate
```

### Load pre-quantized model

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load quantized model from HuggingFace
model_name = "TheBloke/Llama-2-7B-Chat-GPTQ"

model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_triton=False  # Set True on Linux for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate
prompt = "Explain quantum computing"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```

### Quantize your own model

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from datasets import load_dataset

# Load model
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Quantization config
quantize_config = BaseQuantizeConfig(
    bits=4,             # 4-bit quantization
    group_size=128,     # Group size (recommended: 128)
    desc_act=False,     # Activation order (False for CUDA kernel)
    damp_percent=0.01   # Dampening factor
)

# Load model for quantization
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    quantize_config=quantize_config
)

# Prepare calibration data (AutoGPTQ expects tokenizer-style dicts with input_ids/attention_mask)
dataset = load_dataset("c4", "en", split="train", streaming=True)
calibration_data = [
    tokenizer(example["text"], truncation=True, max_length=512)
    for example in dataset.take(128)
]

# Quantize
model.quantize(calibration_data)

# Save quantized model
model.save_quantized("llama-2-7b-gptq")
tokenizer.save_pretrained("llama-2-7b-gptq")

# Push to HuggingFace
model.push_to_hub("username/llama-2-7b-gptq")
```

## Group-wise quantization

**How GPTQ works**:
1. **Group weights**: Divide each weight matrix into groups (typically 128 elements)
2. **Quantize per-group**: Each group has its own scale/zero-point
3. **Minimize error**: Uses Hessian information to minimize quantization error
4. **Result**: 4-bit weights with near-FP16 accuracy

**Group size trade-off**:

| Group Size | Model Size | Accuracy | Speed | Recommendation |
|------------|------------|----------|-------|----------------|
| -1 (per-column) | Smallest | Lowest | Fastest | Research only |
| 32 | Largest | Best | Slower | High accuracy needed |
| **128** | Medium | Good | **Fast** | **Recommended default** |
| 256 | Smaller | Lower | Faster | Speed critical |
| 1024 | Smallest | Lowest | Fastest | Not recommended |

Smaller groups store more scales and zero-points (slightly larger files) but quantize at finer granularity, so accuracy improves; very large groups or per-column quantization minimize that overhead at the cost of accuracy.

**Example**:
```
Weight matrix: [1024, 4096] = 4.2M elements

Group size = 128:
- Groups: 4.2M / 128 = 32,768 groups
- Each group: its own scale + zero-point (weights stored as 4-bit)
- Result: Better granularity → better accuracy
```

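To make the per-group mechanics concrete, below is a toy sketch of asymmetric per-group 4-bit quantization in PyTorch. It only illustrates the grouping and scale/zero-point bookkeeping; real GPTQ additionally uses second-order (Hessian) information to compensate quantization error, which this sketch omits.

```python
# Toy per-group 4-bit quantization (illustration only; no Hessian-based compensation).
import torch

def quantize_per_group(weight: torch.Tensor, group_size: int = 128, bits: int = 4):
    qmax = 2**bits - 1                               # 4-bit code range: 0..15
    w = weight.reshape(-1, group_size)               # [num_groups, group_size]
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / qmax   # one scale per group
    zero = (-w_min / scale).round()                  # one zero-point per group
    q = (w / scale + zero).round().clamp(0, qmax)    # 4-bit codes
    deq = (q - zero) * scale                         # dequantized weights
    return q.reshape(weight.shape), scale, zero, deq.reshape(weight.shape)

w = torch.randn(1024, 4096)
q, scale, zero, w_hat = quantize_per_group(w, group_size=128)
print("groups:", scale.numel())                      # 1024 * 4096 / 128 = 32,768
print("mean abs error:", (w - w_hat).abs().mean().item())
```

Running it on a `[1024, 4096]` matrix with `group_size=128` reproduces the 32,768 groups from the example above; shrinking the group size reduces the range each scale must cover, which is why accuracy improves.
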
## Quantization configurations

### Standard 4-bit (recommended)

```python
from auto_gptq import BaseQuantizeConfig

config = BaseQuantizeConfig(
    bits=4,             # 4-bit quantization
    group_size=128,     # Standard group size
    desc_act=False,     # Faster CUDA kernel
    damp_percent=0.01   # Dampening factor
)
```

**Performance**:
- Memory: 4× reduction (70B model: 140GB → 35GB)
- Accuracy: ~1.5% perplexity increase
- Speed: 3-4× faster than FP16

### Higher compression (3-bit)

```python
config = BaseQuantizeConfig(
    bits=3,             # 3-bit (more compression)
    group_size=128,     # Keep standard group size
    desc_act=True,      # Better accuracy (slower)
    damp_percent=0.01
)
```

**Trade-off**:
- Memory: ~5× reduction
- Accuracy: ~3% perplexity increase
- Speed: up to 5× faster than FP16 (but less accurate)

### Maximum accuracy (4-bit with small groups)

```python
config = BaseQuantizeConfig(
    bits=4,
    group_size=32,      # Smaller groups (better accuracy)
    desc_act=True,      # Activation reordering
    damp_percent=0.005  # Lower dampening
)
```

**Trade-off**:
- Memory: 3.5× reduction (slightly larger)
- Accuracy: ~0.8% perplexity increase (best)
- Speed: 2-3× faster (kernel overhead)

## Kernel backends

### ExLlamaV2 (default, fastest)

```python
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_exllama=True,            # Use ExLlamaV2
    exllama_config={"version": 2}
)
```

**Performance**: 1.5-2× faster than Triton

### Marlin (Ampere+ GPUs)

```python
# Quantize with Marlin format
config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False   # Required for Marlin
)

model.quantize(calibration_data, use_marlin=True)

# Load with Marlin
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_marlin=True  # 2× faster on A100/H100
)
```

**Requirements**:
- NVIDIA Ampere or newer (A100, H100, RTX 40xx)
- Compute capability ≥ 8.0

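To check whether the current GPU meets that requirement, a quick sketch using PyTorch's CUDA helpers:

```python
# Check whether the active GPU can use the Marlin kernel (compute capability >= 8.0).
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
print("Marlin-capable:", (major, minor) >= (8, 0))
```
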
### Triton (Linux only)

```python
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_triton=True  # Linux only
)
```

**Performance**: 1.2-1.5× faster than CUDA backend

## Integration with transformers

### Direct transformers usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load quantized model (transformers auto-detects GPTQ)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-Chat-GPTQ",
    device_map="auto",
    trust_remote_code=False
)

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-Chat-GPTQ")

# Use like any transformers model
inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
```

### QLoRA fine-tuning (GPTQ + LoRA)

```python
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

# Load GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    device_map="auto"
)

# Prepare for LoRA training
model = prepare_model_for_kbit_training(model)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Add LoRA adapters
model = get_peft_model(model, lora_config)

# Fine-tune (memory efficient!)
# 70B model trainable on single A100 80GB
```

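The block above stops before the training loop itself. A minimal continuation sketch using the standard `transformers` `Trainer`; the dataset and hyperparameters are illustrative placeholders, not part of this skill:

```python
# Minimal training-loop sketch (dataset and hyperparameters are illustrative only).
from datasets import load_dataset
from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("Abirate/english_quotes", split="train")  # illustrative dataset
dataset = dataset.map(
    lambda ex: tokenizer(ex["quote"], truncation=True, max_length=256),
    batched=True,
)

trainer = Trainer(
    model=model,  # the PEFT-wrapped GPTQ model from the block above
    args=TrainingArguments(
        output_dir="llama2-7b-gptq-lora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("llama2-7b-gptq-lora")  # saves only the LoRA adapter weights
```
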
## Performance benchmarks

### Memory reduction

| Model | FP16 | GPTQ 4-bit | Reduction |
|-------|------|------------|-----------|
| Llama 2-7B | 14 GB | 3.5 GB | 4× |
| Llama 2-13B | 26 GB | 6.5 GB | 4× |
| Llama 2-70B | 140 GB | 35 GB | 4× |
| Llama 3.1-405B | 810 GB | 203 GB | 4× |

**Enables**:
- 70B on single A100 80GB (vs 2× A100 needed for FP16)
- 405B on 3× A100 80GB (vs 11× A100 needed for FP16)
- 13B on RTX 4090 24GB (vs OOM with FP16)

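These figures follow from bytes-per-parameter arithmetic. A quick back-of-the-envelope sketch (the per-group overhead term is an approximation, and real checkpoints differ slightly because some layers stay in higher precision):

```python
# Rough memory estimate: FP16 vs GPTQ 4-bit weights, with per-group scale overhead
# approximated as 16 extra bits per 128-weight group.
def estimate_gib(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for name, params in [("Llama 2-7B", 7), ("Llama 2-70B", 70), ("Llama 3.1-405B", 405)]:
    fp16 = estimate_gib(params, 16)
    gptq = estimate_gib(params, 4 + 16 / 128)   # ~4.1 bits/weight at group size 128
    print(f"{name}: FP16 ~ {fp16:.0f} GiB, GPTQ 4-bit ~ {gptq:.0f} GiB")
```

The roughly 4× ratio comes from 16 bits per weight dropping to about 4.1 bits once group metadata is included.
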
### Inference speed (Llama 2-7B, A100)

| Precision | Tokens/sec | vs FP16 |
|-----------|------------|---------|
| FP16 | 25 tok/s | 1× |
| GPTQ 4-bit (CUDA) | 85 tok/s | 3.4× |
| GPTQ 4-bit (ExLlama) | 105 tok/s | 4.2× |
| GPTQ 4-bit (Marlin) | 120 tok/s | 4.8× |

### Accuracy (perplexity on WikiText-2)

| Model | FP16 | GPTQ 4-bit (g=128) | Degradation |
|-------|------|--------------------|-------------|
| Llama 2-7B | 5.47 | 5.55 | +1.5% |
| Llama 2-13B | 4.88 | 4.95 | +1.4% |
| Llama 2-70B | 3.32 | 3.38 | +1.8% |

**Excellent quality preservation** - less than 2% degradation!

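To measure perplexity for your own checkpoint in the same way, a compact strided-evaluation sketch (window and stride sizes are illustrative; it reuses `model` and `tokenizer` from the sections above):

```python
# Strided perplexity on WikiText-2 (sketch; max_length and stride are illustrative).
import torch
from datasets import load_dataset

def perplexity(model, tokenizer, max_length=2048, stride=512):
    text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    nlls, n_tokens = [], 0
    for begin in range(0, input_ids.size(1) - 1, stride):
        end = min(begin + max_length, input_ids.size(1))
        ids = input_ids[:, begin:end]
        targets = ids.clone()
        targets[:, :-stride] = -100            # score only the last `stride` tokens
        with torch.no_grad():
            loss = model(ids, labels=targets).loss
        valid = (targets != -100).sum().item()
        nlls.append(loss * valid)
        n_tokens += valid
        if end == input_ids.size(1):
            break
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()

print("ppl:", perplexity(model, tokenizer))
```
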
## Common patterns

### Multi-GPU deployment

```python
# Automatic device mapping
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-70B-GPTQ",
    device_map="auto",                  # Automatically split across GPUs
    max_memory={0: "40GB", 1: "40GB"}   # Limit per GPU
)

# Manual device mapping (accelerate expects one entry per module prefix,
# so spell out the layer assignments rather than using ranges)
device_map = {
    "model.embed_tokens": 0,
    **{f"model.layers.{i}": 0 for i in range(0, 40)},   # First 40 layers on GPU 0
    **{f"model.layers.{i}": 1 for i in range(40, 80)},  # Last 40 layers on GPU 1
    "model.norm": 1,
    "lm_head": 1
}

model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device_map=device_map
)
```

### CPU offloading

```python
# Offload some layers to CPU (for very large models)
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-405B-GPTQ",
    device_map="auto",
    max_memory={
        0: "80GB",      # GPU 0
        1: "80GB",      # GPU 1
        2: "80GB",      # GPU 2
        "cpu": "200GB"  # Offload overflow to CPU
    }
)
```

### Batch inference

```python
# Process multiple prompts efficiently
prompts = [
    "Explain AI",
    "Explain ML",
    "Explain DL"
]

# Decoder-only models need a pad token and left padding for batched generation
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id
)

for i, output in enumerate(outputs):
    print(f"Prompt {i}: {tokenizer.decode(output, skip_special_tokens=True)}")
```

## Finding pre-quantized models

**TheBloke on HuggingFace**:
- https://huggingface.co/TheBloke
- 1000+ models in GPTQ format
- Multiple group sizes (32, 128)
- Both CUDA and Marlin formats

**Search**:
```bash
# Find GPTQ models on HuggingFace
https://huggingface.co/models?library=gptq
```

**Download**:
```python
from auto_gptq import AutoGPTQForCausalLM

# Automatically downloads from HuggingFace
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-70B-Chat-GPTQ",
    device="cuda:0"
)
```

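The same search can also be done programmatically; a small sketch with `huggingface_hub`, assuming the `library="gptq"` filter mirrors the URL above:

```python
# List GPTQ-format checkpoints programmatically (mirrors the ?library=gptq search above).
from huggingface_hub import HfApi

api = HfApi()
for m in api.list_models(library="gptq", sort="downloads", direction=-1, limit=10):
    print(m.id)
```
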
## Supported models

- **LLaMA family**: Llama 2, Llama 3, Code Llama
- **Mistral**: Mistral 7B, Mixtral 8x7B, 8x22B
- **Qwen**: Qwen, Qwen2, QwQ
- **DeepSeek**: V2, V3
- **Phi**: Phi-2, Phi-3
- **Yi, Falcon, BLOOM, OPT**
- **100+ models** on HuggingFace

## References

- **[Calibration Guide](references/calibration.md)** - Dataset selection, quantization process, quality optimization
- **[Integration Guide](references/integration.md)** - Transformers, PEFT, vLLM, TensorRT-LLM
- **[Troubleshooting](references/troubleshooting.md)** - Common issues, performance optimization

## Resources

- **GitHub**: https://github.com/AutoGPTQ/AutoGPTQ
- **Paper**: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (arXiv:2210.17323)
- **Models**: https://huggingface.co/models?library=gptq
- **Discord**: https://discord.gg/autogptq