@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,334 @@
# MiniLLM: Reverse KL Divergence for LLM Distillation

Based on arXiv 2306.08543 (2024) - MiniLLM: Knowledge Distillation of Large Language Models

## Overview

**Source**: https://arxiv.org/abs/2306.08543
**GitHub**: https://github.com/microsoft/LMOps/tree/main/minillm

MiniLLM replaces the standard forward KLD objective with reverse KLD for knowledge distillation, achieving better performance on generative language models.

## Problem with Standard KLD

### Forward KL Divergence (Standard)

**Formula**: `KL(Teacher || Student)`

**Minimization behavior**: Mode-covering
```
Student tries to COVER all of the teacher's probability mass
→ Student must place probability everywhere the teacher does
→ Student overestimates the teacher's low-probability ("void") regions
```

**Issue for generative models**: The student wastes capacity on the teacher's long tail and generates unlikely, degenerate text.

### Why Forward KL Fails for Generation

```python
# Teacher distribution (diverse)
teacher_probs = [0.3, 0.3, 0.2, 0.1, 0.1]  # Multiple valid options

# Forward KL minimization, KL(teacher || student):
# a capacity-limited student learns something like [0.25, 0.25, 0.2, 0.15, 0.15]
# Problem: mass is smeared over ALL options, overweighting the tail (mode-covering)
```
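
The asymmetry can be checked numerically. The sketch below uses illustrative numbers (a peakier teacher than above, so that committing to the top modes is worthwhile) and compares a mass-spreading student against one that concentrates on the two dominant modes:

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher  = [0.45, 0.45, 0.04, 0.03, 0.03]            # two dominant modes + a tail
covering = [0.20, 0.20, 0.20, 0.20, 0.20]            # spreads mass everywhere
seeking  = [0.4999985, 0.4999985, 1e-6, 1e-6, 1e-6]  # commits to the top modes

# Forward KL(teacher || student) punishes missing mass -> prefers the spreader
print(kl(teacher, covering) < kl(teacher, seeking))   # True

# Reverse KL(student || teacher) punishes mass where the teacher is small
# -> prefers the mode-seeker
print(kl(seeking, teacher) < kl(covering, teacher))   # True
```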

## MiniLLM Solution: Reverse KLD

### Reverse KL Divergence

**Formula**: `KL(Student || Teacher)`

**Minimization behavior**: Mode-seeking
```
Student concentrates on the teacher's MAJOR modes
→ Student never overestimates regions the teacher considers unlikely
→ Student generates precise, high-quality text
```

### Mathematical Formulation

**Forward KL** (standard distillation):
```
L_forward = Σ p_teacher(x) log(p_teacher(x) / p_student(x))
          = E_{x~teacher} [log p_teacher(x) - log p_student(x)]
```

**Reverse KL** (MiniLLM):
```
L_reverse = Σ p_student(x) log(p_student(x) / p_teacher(x))
          = E_{x~student} [log p_student(x) - log p_teacher(x)]
```

**Key difference**: The expectation is taken under the student's own distribution (reverse) rather than the teacher's (forward).
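
Differentiating the reverse objective shows why it calls for policy gradient rather than a plain cross-entropy. Writing q_θ for the student and p for the teacher, and using Σ_x ∇_θ q_θ(x) = 0:

```
∇_θ KL(q_θ || p) = ∇_θ Σ q_θ(x) [log q_θ(x) - log p(x)]
                 = Σ (∇_θ q_θ(x)) [log q_θ(x) - log p(x)]     (Σ q_θ ∇_θ log q_θ = Σ ∇_θ q_θ = 0)
                 = E_{x~q_θ} [ (log q_θ(x) - log p(x)) ∇_θ log q_θ(x) ]
```

Sampling x from the student and weighting the score ∇_θ log q_θ(x) by the log-ratio therefore gives an unbiased gradient estimate — the REINFORCE form.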

## Implementation

### Reverse KLD Loss

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """
    Reverse KL divergence: KL(Student || Teacher).

    Args:
        student_logits: Model predictions (batch, seq_len, vocab_size)
        teacher_logits: Teacher predictions (batch, seq_len, vocab_size)
        temperature: Softening parameter

    Returns:
        Reverse KL divergence loss
    """
    # Student distribution (learnable)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_student = log_p_student.exp()

    # Teacher distribution (target, detached - don't backprop through teacher)
    log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1).detach()

    # Reverse KL: Σ p_student * (log p_student - log p_teacher)
    reverse_kl = (p_student * (log_p_student - log_p_teacher)).sum(dim=-1).mean()

    # Temperature correction (keeps gradient magnitude comparable across temperatures)
    return reverse_kl * (temperature ** 2)
```
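
As a sanity check (a standalone sketch with random logits), the reverse KL `KL(Student || Teacher)` coincides with PyTorch's built-in `F.kl_div` when the *teacher's* log-probabilities are passed as `input` and the *student's* probabilities as `target` — the reverse of the usual forward-KL call:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)

log_q = F.log_softmax(student_logits, dim=-1)  # student
log_p = F.log_softmax(teacher_logits, dim=-1)  # teacher

# Manual reverse KL(student || teacher), summed over vocab, averaged over batch
manual = (log_q.exp() * (log_q - log_p)).sum(dim=-1).mean()

# F.kl_div(input, target) computes Σ target * (log target - input):
# passing the TEACHER as input and the STUDENT as target yields the reverse KL
builtin = F.kl_div(log_p, log_q.exp(), reduction="batchmean")

print(torch.allclose(manual, builtin, atol=1e-6))  # True
```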

### Policy Gradient Optimization

**Challenge**: The reverse-KL expectation is over the *student's* distribution, so estimating it requires sampling from the student — and the sampling distribution is the very thing being optimized.

**Solution**: Treat the student as a policy, the teacher/student log-ratio as the reward, and optimize with policy gradient.

```python
def minillm_policy_gradient(student_model, teacher_model, prompt_batch):
    """
    MiniLLM training step with policy gradient (REINFORCE).

    Steps:
    1. Sample responses from the student (the policy)
    2. Score those samples under both teacher and student
    3. Push up student log-probs weighted by the reward
       r(x) = log p_teacher(x) - log p_student(x)
    """
    # 1. Generate from the student (no gradient through sampling)
    with torch.no_grad():
        sequences = student_model.generate(
            prompt_batch,
            max_new_tokens=256,
            do_sample=True,
            temperature=1.0,
        )

    # 2. Per-token log-probs of the sampled sequences under both models
    def token_logprobs(model, seqs):
        logits = model(seqs).logits[:, :-1]  # position t predicts token t+1
        return torch.gather(
            F.log_softmax(logits, dim=-1), 2, seqs[:, 1:, None]
        ).squeeze(-1)

    student_logprobs = token_logprobs(student_model, sequences)
    with torch.no_grad():
        teacher_logprobs = token_logprobs(teacher_model, sequences)

    # 3. REINFORCE surrogate: minimizing this follows the reverse-KL gradient
    # (a full implementation would also mask prompt tokens and add a baseline)
    reward = (teacher_logprobs - student_logprobs).detach()
    loss = -(reward * student_logprobs).sum(dim=-1).mean()

    return loss
```
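
On a toy categorical "policy" one can verify that the REINFORCE estimate converges to the exact reverse-KL gradient (a standalone sketch, not from the paper):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, requires_grad=True)       # student "policy"
p_teacher = torch.softmax(torch.randn(5), dim=0)  # fixed teacher

# Exact gradient of reverse KL(q || p) w.r.t. the student logits
q = torch.softmax(logits, dim=0)
exact = torch.autograd.grad(
    (q * (q.log() - p_teacher.log())).sum(), logits
)[0]

# REINFORCE estimate: sample x ~ q, reward = log p(x) - log q(x),
# differentiate the surrogate -mean(reward * log q(x))
n = 200_000
samples = torch.multinomial(q.detach(), n, replacement=True)
log_q = torch.log_softmax(logits, dim=0)[samples]
reward = (p_teacher.log()[samples] - log_q).detach()
estimate = torch.autograd.grad(-(reward * log_q).mean(), logits)[0]

print((exact - estimate).abs().max().item())  # small; shrinks as n grows
```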

## Training Procedure

### Two-Stage MiniLLM

**Stage 1**: Imitation learning (token-level reverse KLD on teacher samples)
```python
# Warm-up: match the teacher's next-token distributions on teacher-generated text
for epoch in range(num_imitation_epochs):
    for batch in dataloader:
        # Sample continuations from the teacher
        teacher_samples = teacher.generate(batch['prompts'], do_sample=True)

        # Per-token reverse KL between the two next-token distributions
        loss = reverse_kl_loss(
            student(teacher_samples).logits,
            teacher(teacher_samples).logits,
        )

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

**Stage 2**: Self-training (optional)
```python
# Fine-tune on the student's own generations
for epoch in range(num_self_train_epochs):
    for batch in dataloader:
        # Student generates
        student_samples = student.generate(batch['prompts'], do_sample=True)

        # Standard LM loss on the student's own samples
        loss = student(student_samples, labels=student_samples).loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

### Complete Training Script

```python
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

def train_minillm(
    teacher_name="meta-llama/Llama-2-70b-hf",
    student_name="meta-llama/Llama-2-7b-hf",
    output_dir="./minillm-7b",
):
    # Load models
    teacher = AutoModelForCausalLM.from_pretrained(teacher_name, torch_dtype=torch.bfloat16, device_map="auto")
    student = AutoModelForCausalLM.from_pretrained(student_name, torch_dtype=torch.bfloat16)

    # Custom trainer with reverse KLD
    class MiniLLMTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False):
            # Sample continuations from the teacher for these prompts
            with torch.no_grad():
                teacher_sequences = teacher.generate(
                    inputs['input_ids'],
                    max_new_tokens=256,
                    do_sample=True,
                )
                # Re-score the full sequences to get teacher logits at every position
                teacher_logits = teacher(teacher_sequences).logits

            # Student evaluates the same sequences
            student_outputs = model(input_ids=teacher_sequences)

            # Reverse KL loss (reverse_kl_loss defined above)
            loss = reverse_kl_loss(student_outputs.logits, teacher_logits)

            return (loss, student_outputs) if return_outputs else loss

    # Training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=5,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=16,
        learning_rate=5e-5,
        warmup_steps=1000,
        logging_steps=100,
        save_steps=1000,
        bf16=True,
    )

    # Train
    trainer = MiniLLMTrainer(
        model=student,
        args=training_args,
        train_dataset=train_dataset,
    )

    trainer.train()
    student.save_pretrained(output_dir)

# Usage
train_minillm(
    teacher_name="meta-llama/Llama-2-70b-hf",
    student_name="meta-llama/Llama-2-7b-hf",
)
```

## Performance Results

**From paper (LLaMA models)**:

| Student | Teacher | Method | MT-Bench Score | AlpacaEval |
|---------|---------|--------|----------------|------------|
| LLaMA-7B | - | Baseline | 5.2 | 55% |
| LLaMA-7B | LLaMA-70B | Forward KL | 5.8 | 62% |
| LLaMA-7B | LLaMA-70B | **MiniLLM (Reverse KL)** | **6.4** | **71%** |

**Key findings**:
- Reverse KL outperforms forward KL by ~10%
- Distilled 7B model approaches 70B performance
- Higher generation quality: the student stops overestimating low-probability regions of the teacher distribution

## Comparison: Forward vs Reverse KL

### Generation Quality

```python
# Prompt: "Explain quantum computing"

# Forward KL (mode-covering / zero-avoiding)
# Student spreads probability mass over all teacher modes,
# including regions it lacks the capacity to model
# → Overestimates low-probability regions; unlikely, low-quality text

# Reverse KL (mode-seeking / zero-forcing)
# Student concentrates on the teacher's major modes
# → Focused, precise, higher-quality generations
```
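
The mode-seeking vs mode-covering distinction can be checked numerically: fit a single Gaussian "student" to a bimodal "teacher" by grid search, once under each divergence. This is a toy illustration, not from the paper; the mixture and search grid are arbitrary choices.

```python
import numpy as np

# "Teacher": bimodal mixture of N(-3, 1) and N(3, 1) on a grid
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def normal(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = 0.5 * normal(x, -3, 1) + 0.5 * normal(x, 3, 1)

def kl(a, b):
    # Discretized KL(a || b); eps guards against log(0)
    eps = 1e-12
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx)

# Grid-search a single Gaussian student under each divergence
best_fwd = best_rev = (np.inf, 0.0, 1.0)
for mu in np.arange(-4.0, 4.01, 0.25):
    for sigma in np.arange(0.5, 5.01, 0.25):
        q = normal(x, mu, sigma)
        f = kl(p, q)  # forward KL(p || q): mode-covering
        r = kl(q, p)  # reverse KL(q || p): mode-seeking
        if f < best_fwd[0]:
            best_fwd = (f, mu, sigma)
        if r < best_rev[0]:
            best_rev = (r, mu, sigma)

# Forward KL spreads the student across both modes (mu near 0, large sigma);
# reverse KL locks onto a single mode (mu near +/-3, sigma near 1)
print("forward-KL fit: mu=%.2f sigma=%.2f" % best_fwd[1:])
print("reverse-KL fit: mu=%.2f sigma=%.2f" % best_rev[1:])
```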

### When to Use Each

**Forward KL**:
- Classification tasks
- Single correct answer
- Need deterministic output

**Reverse KL (MiniLLM)**:
- Generative tasks
- Multiple valid outputs
- Sample quality matters more than exhaustive coverage
- Open-ended generation

## Hyperparameters

### Temperature

```python
# Temperature for both teacher and student

T = 1.0  # Standard (from paper)
T = 0.8  # Sharper (less diversity)
T = 1.2  # Softer (more diversity)

# Rule: use T=1.0 for MiniLLM (the paper's setting); raise it only if the
# mode-seeking objective collapses onto too few outputs
```
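
The effect of the `T` values above can be seen directly on a toy logit vector (the values are arbitrary): dividing logits by `T` before the softmax raises the entropy of the sampling distribution when `T > 1` and lowers it when `T < 1`.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # arbitrary example logits

def entropy_at(T):
    # Entropy (in nats) of the temperature-scaled softmax distribution
    probs = F.softmax(logits / T, dim=-1)
    return float(-(probs * probs.log()).sum())

for T in (0.8, 1.0, 1.2):
    print(f"T={T}: entropy={entropy_at(T):.3f} nats")
```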

### Learning Rate

```python
# MiniLLM uses a higher LR than standard distillation

lr_forward_kl = 2e-5  # Standard distillation
lr_minillm = 5e-5     # MiniLLM (can handle a higher LR)

# Reason: reverse KL has better-behaved gradients
```

## Limitations

1. **Computational cost**: Requires sampling from the teacher during training
2. **Implementation complexity**: More complex than standard distillation
3. **Memory**: Teacher samples must be stored alongside both models

## Resources

- **Paper**: https://arxiv.org/abs/2306.08543
- **GitHub**: https://github.com/microsoft/LMOps/tree/main/minillm
- **Blog**: https://www.microsoft.com/en-us/research/blog/minillm-small-language-models-via-large-language-model-distillation/