@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,337 @@
# GPTQ Calibration Guide

Complete guide to calibration data selection and the quantization process.

## Calibration Data Selection

### Why calibration matters

Calibration data is used to:
1. **Compute weight importance** (Hessian matrix)
2. **Minimize quantization error** for important weights
3. **Preserve model accuracy** after quantization

**Impact**:
- Good calibration: <1.5% perplexity increase
- Poor calibration: 5-10% perplexity increase
- No calibration: model may output gibberish
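The "weight importance" in step 1 is second-order information: GPTQ minimizes the layer-wise error ||WX − ŴX||², whose Hessian with respect to each weight row is H = 2XXᵀ, accumulated over calibration activations. A toy sketch of that accumulation (my own code, not the library's implementation):

```python
import numpy as np

def accumulate_hessian(activation_batches):
    """Accumulate H = (2/n) * sum_x x x^T over calibration activations --
    GPTQ's proxy for how sensitive the layer output is to each weight."""
    H, n = None, 0
    for X in activation_batches:  # X has shape (batch, in_features)
        if H is None:
            H = np.zeros((X.shape[1], X.shape[1]))
        H += 2.0 * X.T @ X
        n += X.shape[0]
    return H / n

rng = np.random.default_rng(0)
batches = [rng.standard_normal((8, 16)) for _ in range(32)]
H = accumulate_hessian(batches)
print(H.shape)  # (16, 16)
```

This is why unrepresentative calibration text hurts: the Hessian then describes the wrong activation distribution, and the error-minimizing weight updates optimize for the wrong inputs.
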
### Dataset size

**Recommended**:
- **128-256 samples** of 512 tokens each
- Total: 65K-131K tokens

**More is not always better**:
- <64 samples: underfitting (poor quality)
- 128-256 samples: the sweet spot
- >512 samples: diminishing returns, slower quantization
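The totals above are plain arithmetic (samples × tokens per sample); a one-line check of the recommended budget:

```python
def calibration_token_budget(num_samples: int, tokens_per_sample: int = 512) -> int:
    """Total number of calibration tokens."""
    return num_samples * tokens_per_sample

print(calibration_token_budget(128))  # 65536 (~65K)
print(calibration_token_budget(256))  # 131072 (~131K)
```
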
### Dataset selection by domain

**General purpose models (GPT, Llama)**:
```python
from datasets import load_dataset

# C4 dataset (recommended for general models); the "en" config is required
dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)
calibration_data = [
    tokenizer(example["text"])["input_ids"][:512]
    for example in dataset.take(128)
]
```
**Code models (CodeLlama, StarCoder)**:
```python
# The Stack dataset; filter to the target language *before* taking 128 samples
dataset = load_dataset("bigcode/the-stack", split="train", streaming=True)
python_only = dataset.filter(lambda x: x["lang"] == "Python")  # or your target language
calibration_data = [
    tokenizer(example["content"])["input_ids"][:512]
    for example in python_only.take(128)
]
```
**Chat models**:
```python
# ShareGPT format (Alpaca works similarly)
dataset = load_dataset("anon8231489123/ShareGPT_Vicuna_unfiltered", split="train")

calibration_data = []
for example in dataset.select(range(128)):
    # ShareGPT stores turns as {"from", "value"}; chat templates expect {"role", "content"}
    messages = [
        {"role": "user" if turn["from"] == "human" else "assistant",
         "content": turn["value"]}
        for turn in example["conversations"]
    ]
    token_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        truncation=True,
        max_length=512,
    )
    calibration_data.append(token_ids)
```
**Domain-specific (medical, legal)**:
```python
# Use domain-specific text (replace with your dataset)
dataset = load_dataset("medical_dataset", split="train", streaming=True)
calibration_data = [
    tokenizer(example["text"])["input_ids"][:512]
    for example in dataset.take(256)  # more samples for niche domains
]
```
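All four recipes share one selection step: tokenize, truncate to a fixed length, keep a fixed sample count. A framework-free sketch of that step (function and parameter names are my own; the whitespace "tokenizer" is a stand-in for a real one):

```python
def build_calibration_set(texts, tokenize, max_tokens=512, num_samples=128, min_tokens=4):
    """Tokenize texts, truncate each to max_tokens, skip near-empty samples,
    and stop once num_samples have been collected."""
    samples = []
    for text in texts:
        ids = tokenize(text)[:max_tokens]
        if len(ids) < min_tokens:  # very short samples carry little signal
            continue
        samples.append(ids)
        if len(samples) == num_samples:
            break
    return samples

# Stand-in tokenizer: one fake token id per whitespace-separated word
toy_tokenize = lambda text: list(range(len(text.split())))
texts = ["too short", "a longer example sentence " * 20, "another reasonably long text " * 10]
calib = build_calibration_set(texts, toy_tokenize, max_tokens=32, num_samples=2)
print(len(calib), [len(s) for s in calib])  # 2 [32, 32]
```

The same shape works with a real tokenizer by passing `lambda t: tokenizer(t)["input_ids"]`.
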
## Quantization Process

### Basic quantization

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer
from datasets import load_dataset

# 1. Load model
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    quantize_config=BaseQuantizeConfig(
        bits=4,
        group_size=128,
        desc_act=False
    )
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2. Prepare calibration data (model.quantize expects tokenizer output
#    dicts with input_ids and attention_mask)
dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)
calibration_data = [
    tokenizer(example["text"], truncation=True, max_length=512)
    for example in dataset.take(128)
]

# 3. Quantize
model.quantize(calibration_data)

# 4. Save
model.save_quantized("llama-2-7b-gptq")
```

**Time**: ~10-30 minutes for a 7B model on an A100
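As a sanity check on the saved artifact, the quantized weight size can be estimated from the bit width plus group overhead. Assuming one fp16 scale and one packed 4-bit zero-point per group (a common storage layout, not something auto-gptq guarantees):

```python
def gptq_weight_size_gb(num_params: float, bits: int = 4, group_size: int = 128,
                        scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Approximate quantized weight storage in GiB."""
    bits_per_weight = bits + (scale_bits + zero_bits) / group_size
    return num_params * bits_per_weight / 8 / 2**30

# ~7B parameters at 4 bits, group_size=128 -> roughly 3.4 GiB of weights
print(round(gptq_weight_size_gb(7e9), 2))  # 3.39
```

If the saved directory is far larger than this estimate, something other than the packed weights (e.g. an unquantized copy) likely got written.
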
### Advanced configuration

```python
config = BaseQuantizeConfig(
    bits=4,                # 3, 4, or 8 bits
    group_size=128,        # 32, 64, 128, or -1 (per-column)
    desc_act=False,        # activation order (True = better accuracy, slower)
    damp_percent=0.01,     # dampening (0.001-0.1, default 0.01)
    static_groups=False,   # static quantization
    sym=True,              # symmetric quantization
    true_sequential=True,  # sequential quantization (more accurate)
    model_seqlen=2048      # model sequence length
)
```

**Parameter tuning**:
- `damp_percent`: lower = more accurate but slower. Try 0.005-0.02.
- `desc_act=True`: 0.5-1% better accuracy, 20-30% slower inference
- `group_size=32`: better accuracy, slightly larger model
### Multi-GPU quantization

```python
# Quantize on multiple GPUs (faster)
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    quantize_config=config,
    device_map="auto",  # distribute across GPUs
    max_memory={0: "40GB", 1: "40GB"}
)

model.quantize(calibration_data)
```
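The `max_memory` argument is just a dict keyed by device index; a small helper (my own naming) for building it uniformly on an N-GPU machine:

```python
def max_memory_map(num_gpus: int, gb_per_gpu: int, cpu_gb: int = 0) -> dict:
    """Build an accelerate-style max_memory dict, e.g. {0: "40GB", 1: "40GB"}."""
    mapping = {i: f"{gb_per_gpu}GB" for i in range(num_gpus)}
    if cpu_gb:
        mapping["cpu"] = f"{cpu_gb}GB"  # optional CPU offload budget
    return mapping

print(max_memory_map(2, 40))  # {0: '40GB', 1: '40GB'}
```

Leaving a few GB of headroom below the physical card size avoids OOM during the quantization forward passes.
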
## Quality Evaluation

### Perplexity testing

```python
from datasets import load_dataset
import torch

# Load test dataset
test_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
test_text = "\n\n".join(test_dataset["text"])

# Tokenize
encodings = tokenizer(test_text, return_tensors="pt")
max_length = 2048  # the model's training sequence length

# Calculate perplexity over non-overlapping windows
model.eval()
nlls = []
for begin_loc in range(0, encodings.input_ids.size(1), max_length):
    end_loc = min(begin_loc + max_length, encodings.input_ids.size(1))
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to("cuda")

    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        nll = outputs.loss

    nlls.append(nll)

ppl = torch.exp(torch.stack(nlls).mean())
print(f"Perplexity: {ppl.item():.2f}")
```

**Quality targets** (perplexity increase vs. the FP16 baseline):
- <1.5% increase: excellent
- 1.5-3% increase: good
- 3-5% increase: acceptable for some use cases
- >5% increase: poor; redo calibration
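The targets above are easy to apply mechanically once you have baseline and quantized perplexities; a small classifier (thresholds from the list above, function names my own):

```python
def ppl_increase_pct(baseline: float, quantized: float) -> float:
    """Relative perplexity increase, in percent."""
    return (quantized - baseline) / baseline * 100

def rate_quantization(baseline: float, quantized: float) -> str:
    pct = ppl_increase_pct(baseline, quantized)
    if pct < 1.5:
        return "excellent"
    if pct < 3:
        return "good"
    if pct < 5:
        return "acceptable"
    return "poor: redo calibration"

# Suppose the FP16 baseline scores 5.47 and the GPTQ model 5.52 -> +0.91%
print(rate_quantization(5.47, 5.52))  # excellent
```
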
### Benchmark evaluation

```python
from lm_eval import evaluator
from lm_eval.models.huggingface import HFLM

# Evaluate on standard benchmarks
# (simple_evaluate expects an lm_eval LM wrapper, not a raw HF model)
results = evaluator.simple_evaluate(
    model=HFLM(pretrained=model, tokenizer=tokenizer),
    tasks=["hellaswag", "mmlu", "arc_challenge"],
    num_fewshot=5
)

print(results["results"])

# Compare to baseline FP16 scores
```
## Optimization Tips

### Improving accuracy

**1. Use more calibration samples**:
```python
# Try 256 or 512 samples
calibration_data = [... for example in dataset.take(256)]
```

**2. Use domain-specific data**:
```python
# Match your use case
if code_model:
    dataset = load_dataset("bigcode/the-stack")
elif chat_model:
    dataset = load_dataset("anon8231489123/ShareGPT_Vicuna_unfiltered")
```

**3. Enable activation reordering**:
```python
config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True  # better accuracy, slower inference
)
```

**4. Use smaller group size**:
```python
config = BaseQuantizeConfig(
    bits=4,
    group_size=32,  # vs. the default 128
    desc_act=False
)
```
|
|
244
|
+
|
|
245
|
+
### Reducing quantization time
|
|
246
|
+
|
|
247
|
+
**1. Use fewer samples**:
|
|
248
|
+
```python
|
|
249
|
+
# 64-128 samples usually sufficient
|
|
250
|
+
calibration_data = [... for example in dataset.take(64)]
|
|
251
|
+
```
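
The `...` in the calibration snippets above stands for whatever tokenization your model expects. A generic, hypothetical sketch of capping the sample count (the helper and the `tokenize` callable are illustrative stand-ins, not AutoGPTQ API):

```python
from itertools import islice

def build_calibration_data(texts, tokenize, n_samples=128):
    """Tokenize at most n_samples examples from an iterable of raw texts."""
    return [tokenize(t) for t in islice(texts, n_samples)]

# Stand-in tokenizer for demonstration; in practice use tokenizer(text).input_ids
toy_tokenize = lambda t: t.split()
data = build_calibration_data(["a b c"] * 1000, toy_tokenize, n_samples=64)
print(len(data))  # 64
```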

**2. Disable activation ordering**:
```python
config = BaseQuantizeConfig(
    desc_act=False  # Faster quantization
)
```

**3. Use multi-GPU**:
```python
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    device_map="auto"  # Parallelize across GPUs
)
```

## Troubleshooting

### Poor quality after quantization

**Symptom**: >5% perplexity increase or gibberish output

**Solutions**:
1. **Check calibration data**:
   ```python
   # Verify the data is representative
   for sample in calibration_data[:5]:
       print(tokenizer.decode(sample))
   ```

2. **Try more samples**:
   ```python
   calibration_data = [... for example in dataset.take(256)]
   ```

3. **Use domain-specific data**:
   ```python
   # Match your model's use case
   dataset = load_dataset("domain_specific_dataset")
   ```

4. **Adjust dampening**:
   ```python
   config = BaseQuantizeConfig(damp_percent=0.005)  # Lower dampening
   ```

### Quantization OOM

**Solutions**:
1. **Reduce batch size**:
   ```python
   model.quantize(calibration_data, batch_size=1)  # Default: auto
   ```

2. **Use CPU offloading**:
   ```python
   model = AutoGPTQForCausalLM.from_pretrained(
       model_name,
       device_map="auto",
       max_memory={"cpu": "100GB"}
   )
   ```

3. **Quantize on a larger GPU** or use multi-GPU

### Slow quantization

**Typical times** (7B model):
- Single A100: 10-15 minutes
- Single RTX 4090: 20-30 minutes
- CPU: 2-4 hours (not recommended)

**Speedup**:
- Use fewer samples (64 vs 256)
- Disable `desc_act`
- Use multi-GPU

## Best Practices

1. **Use the C4 dataset for general models** - well-balanced and diverse
2. **Match the domain** - code models need code data; chat models need conversations
3. **Start with 128 samples** - a good balance of speed and quality
4. **Test perplexity** - always verify quality before deployment
5. **Compare kernels** - try ExLlama, Marlin, and Triton for speed
6. **Save multiple versions** - try group_size 32, 128, and 256
7. **Document settings** - save quantize_config.json for reproducibility
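
For point 7, the settings worth logging are exactly the quantization parameters used above. A library-free sketch using only the standard `json` module (the extra bookkeeping fields are illustrative additions for your own experiment log, not standard `quantize_config.json` keys):

```python
import json

# Illustrative experiment record; mirrors the BaseQuantizeConfig fields used above
quantize_config = {
    "bits": 4,
    "group_size": 128,
    "desc_act": False,
    "damp_percent": 0.01,
    # Extra bookkeeping (assumed fields, not read by any tool):
    "calibration_dataset": "c4",
    "num_calibration_samples": 128,
}

with open("quantize_config.json", "w") as f:
    json.dump(quantize_config, f, indent=2)

# Reload later to reproduce or compare runs
with open("quantize_config.json") as f:
    loaded = json.load(f)

assert loaded == quantize_config
```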
@@ -0,0 +1,129 @@

# GPTQ Integration Guide

Integration with transformers, PEFT, vLLM, and other frameworks.

## Transformers Integration

### Auto-detection
```python
from transformers import AutoModelForCausalLM

# Automatically detects and loads GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-GPTQ",
    device_map="auto"
)
```

### Manual loading
```python
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-13B-GPTQ",
    device="cuda:0",
    use_exllama=True
)
```

## QLoRA Fine-Tuning

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model
from trl import SFTTrainer

# Load GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-70B-GPTQ",
    device_map="auto"
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

# Train (a 70B model on a single A100!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-4,
        num_train_epochs=3,
        output_dir="./results"
    )
)

trainer.train()
```
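
One sanity check on the `TrainingArguments` above: with a per-device batch of 1 and 16 gradient-accumulation steps, each optimizer step sees an effective batch of 16 sequences per device:

```python
# Effective batch size for the TrainingArguments above
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 1  # assumption: the single-A100 setup mentioned above

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch)  # 16
```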

## vLLM Integration

```python
from vllm import LLM, SamplingParams

# Load a GPTQ model in vLLM
llm = LLM(
    model="TheBloke/Llama-2-70B-GPTQ",
    quantization="gptq",
    dtype="float16",
    gpu_memory_utilization=0.95
)

# Generate
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=200
)

outputs = llm.generate(["Explain AI"], sampling_params)
```

## Text Generation Inference (TGI)

```bash
# Docker with GPTQ support
docker run --gpus all -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id TheBloke/Llama-2-70B-GPTQ \
    --quantize gptq
```

## LangChain Integration

```python
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-GPTQ")
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-GPTQ",
    device_map="auto"
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200)
llm = HuggingFacePipeline(pipeline=pipe)

# Use in LangChain
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

chain = LLMChain(llm=llm, prompt=PromptTemplate(...))
result = chain.run(input="...")
```
@@ -0,0 +1,95 @@

# GPTQ Troubleshooting Guide

Common issues and solutions for GPTQ quantization and inference.

## Installation Issues

### CUDA mismatch
```bash
# Check CUDA version
nvcc --version
python -c "import torch; print(torch.version.cuda)"

# Install the matching build
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # CUDA 11.8
```

### Build errors
```bash
# Install build dependencies
pip install auto-gptq --no-build-isolation

# On Ubuntu
sudo apt-get install python3-dev
```

## Runtime Issues

### Slow inference
```python
# Try different backends
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_exllama=True  # Fastest (try v1 or v2)
)

# Or Marlin (Ampere+ GPUs)
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_marlin=True
)
```

### OOM during inference
```python
# Reduce the batch size (generate for one prompt at a time)
outputs = model.generate(**inputs)

# Use CPU offloading
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device_map="auto",
    max_memory={"cpu": "100GB"}
)

# Reduce context
model.seqlen = 1024  # Instead of 2048
```

### Poor quality outputs
```python
# Requantize with better calibration:
# 1. Use more samples (256 instead of 128)
# 2. Use domain-specific data
# 3. Lower dampening: damp_percent=0.005
# 4. Enable desc_act=True
```

## Quantization Issues

### Very slow quantization
```bash
# Expected times (7B model):
# - A100: 10-15 min
# - RTX 4090: 20-30 min
# - CPU: 2-4 hours

# Speed up:
# 1. Use a GPU
# 2. Reduce samples (64 instead of 256)
# 3. Disable desc_act
# 4. Use multi-GPU
```

### Quantization crashes
```python
# Reduce memory usage
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    max_memory={"cpu": "100GB"}  # Offload to CPU
)

# Or quantize with a minimal batch size (slower, but avoids OOM)
model.quantize(calibration_data, batch_size=1)
```
@@ -0,0 +1,97 @@

# GRPO/RL Training Skill

**Expert-level guidance for Group Relative Policy Optimization with TRL**

## 📁 Skill Structure

```
grpo-rl-training/
├── SKILL.md                         # Main skill documentation (READ THIS FIRST)
├── README.md                        # This file
├── templates/
│   └── basic_grpo_training.py       # Production-ready training template
└── examples/
    └── reward_functions_library.py  # 20+ reward function examples
```

## 🚀 Quick Start

1. **Read SKILL.md** - comprehensive guide with all concepts and patterns
2. **Copy `templates/basic_grpo_training.py`** - start with working code
3. **Browse `examples/reward_functions_library.py`** - pick reward functions for your task
4. **Modify for your use case** - adapt the dataset, rewards, and config

## 💡 What's Inside

### SKILL.md (Main Documentation)
- Core GRPO concepts and algorithm fundamentals
- Complete implementation workflow (dataset → rewards → training → deployment)
- 10+ reward function examples with code
- Hyperparameter tuning guide
- Training insights (loss behavior, metrics, debugging)
- Troubleshooting guide
- Production best practices

### Templates
- **basic_grpo_training.py**: Minimal, production-ready training script
  - Uses Qwen 2.5 1.5B Instruct
  - 3 reward functions (format + correctness)
  - LoRA for efficient training
  - Fully documented and ready to run

### Examples
- **reward_functions_library.py**: 20+ battle-tested reward functions
  - Correctness rewards (exact match, fuzzy match, numeric, code execution)
  - Format rewards (XML, JSON, strict/soft)
  - Length rewards (ideal length, min/max)
  - Style rewards (reasoning quality, citations, repetition penalty)
  - Combined rewards (multi-objective optimization)
  - Preset collections for common tasks

## 📖 Usage for Agents

When this skill is loaded in your agent's context:

1. **Always read SKILL.md first** before implementing
2. **Start simple** - use a length-based reward to validate the setup
3. **Build incrementally** - add one reward function at a time
4. **Reference examples** - copy patterns from reward_functions_library.py
5. **Monitor training** - watch reward metrics (not loss!)

## 🎯 Common Use Cases

| Task Type | Recommended Rewards | Template |
|-----------|---------------------|----------|
| Math reasoning | `MATH_REASONING_REWARDS` preset | basic_grpo_training.py |
| Code generation | `CODE_GENERATION_REWARDS` preset | Modify dataset in template |
| Summarization | `SUMMARIZATION_REWARDS` preset | Adjust prompts + rewards |
| Q&A | `QA_REWARDS` preset | Use fuzzy match + citations |

## ⚠️ Critical Reminders

- **Loss goes UP during training** - this is normal (it's the KL divergence term)
- **Use 3-5 reward functions** - single rewards often fail
- **Test rewards before training** - debug each function independently
- **Monitor reward_std** - should stay > 0.1 (to avoid mode collapse)
- **Start with num_generations=4-8** - scale up if your GPU allows
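
"Test rewards before training" is cheap to follow because a GRPO reward function is a plain callable mapping a batch of completions to a list of floats. A minimal sketch, assuming plain-text completions (the tag name and function are illustrative, not taken from the library files above):

```python
import re

def format_reward(completions, **kwargs):
    """+1.0 if the completion wraps its answer in <answer> tags, else 0.0."""
    pattern = re.compile(r"<answer>.*?</answer>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

# Debug the function independently, before handing it to the trainer
sample = ["Let me think... <answer>42</answer>", "The answer is 42."]
print(format_reward(sample))  # [1.0, 0.0]
```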

## 🔗 External Resources

- [TRL Documentation](https://huggingface.co/docs/trl)
- [DeepSeek R1 Paper](https://arxiv.org/abs/2501.12948)
- [Open R1 Implementation](https://github.com/huggingface/open-r1)
- [Unsloth (2-3x faster)](https://docs.unsloth.ai/)

## 📝 Version

**v1.0.0** - Initial release (January 2025)

## 👨‍💻 Maintained By

Synthetic Sciences
For questions or improvements, see https://orchestra.com

---

**License:** MIT
**Last Updated:** January 2025