@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,82 @@
# Unsloth Documentation

## Unsloth Documentation

- [Unsloth Docs](/get-started/unsloth-docs.md): Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.
- [Beginner? Start here!](/get-started/beginner-start-here.md)
- [Unsloth Requirements](/get-started/beginner-start-here/unsloth-requirements.md): Here are Unsloth's requirements including system and GPU VRAM requirements.
- [FAQ + Is Fine-tuning Right For Me?](/get-started/beginner-start-here/faq-+-is-fine-tuning-right-for-me.md): If you're unsure whether fine-tuning is right for you, see here! Learn about fine-tuning misconceptions, how it compares to RAG, and more.
- [Unsloth Notebooks](/get-started/unsloth-notebooks.md): Explore our catalog of Unsloth notebooks.
- [All Our Models](/get-started/all-our-models.md)
- [Install & Update](/get-started/install-and-update.md): Learn to install Unsloth locally or online.
- [Updating](/get-started/install-and-update/updating.md): To update or use an old version of Unsloth, follow the steps below:
- [Pip Install](/get-started/install-and-update/pip-install.md): To install Unsloth locally via Pip, follow the steps below:
- [Docker](/get-started/install-and-update/docker.md): Install Unsloth using our official Docker container.
- [Windows Installation](/get-started/install-and-update/windows-installation.md): See how to install Unsloth on Windows with or without WSL.
- [AMD](/get-started/install-and-update/amd.md): Fine-tune with Unsloth on AMD GPUs.
- [Conda Install](/get-started/install-and-update/conda-install.md): To install Unsloth locally on Conda, follow the steps below:
- [Google Colab](/get-started/install-and-update/google-colab.md): To install and run Unsloth on Google Colab, follow the steps below:
- [Fine-tuning LLMs Guide](/get-started/fine-tuning-llms-guide.md): Learn all the basics and best practices of fine-tuning. Beginner-friendly.
- [What Model Should I Use?](/get-started/fine-tuning-llms-guide/what-model-should-i-use.md)
- [Datasets Guide](/get-started/fine-tuning-llms-guide/datasets-guide.md): Learn how to create & prepare a dataset for fine-tuning.
- [LoRA Hyperparameters Guide](/get-started/fine-tuning-llms-guide/lora-hyperparameters-guide.md): Optimal LoRA rank, alpha, number of epochs, batch size & gradient accumulation, QLoRA vs LoRA, target modules and more!
- [Tutorial: How to Finetune Llama-3 and Use In Ollama](/get-started/fine-tuning-llms-guide/tutorial-how-to-finetune-llama-3-and-use-in-ollama.md): Beginner's guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama.
- [Reinforcement Learning (RL) Guide](/get-started/reinforcement-learning-rl-guide.md): Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.
- [Tutorial: Train your own Reasoning model with GRPO](/get-started/reinforcement-learning-rl-guide/tutorial-train-your-own-reasoning-model-with-grpo.md): Beginner's guide to transforming a model like Llama 3.1 (8B) into a reasoning model using Unsloth and GRPO.
- [Advanced RL Documentation](/get-started/reinforcement-learning-rl-guide/advanced-rl-documentation.md): Advanced settings and documentation for using Unsloth with GRPO.
- [Memory Efficient RL](/get-started/reinforcement-learning-rl-guide/memory-efficient-rl.md)
- [RL Reward Hacking](/get-started/reinforcement-learning-rl-guide/rl-reward-hacking.md): Learn what reward hacking is in Reinforcement Learning and how to counter it.
- [GSPO Reinforcement Learning](/get-started/reinforcement-learning-rl-guide/gspo-reinforcement-learning.md): Train with GSPO (Group Sequence Policy Optimization) RL in Unsloth.
- [Reinforcement Learning - DPO, ORPO & KTO](/get-started/reinforcement-learning-rl-guide/reinforcement-learning-dpo-orpo-and-kto.md): To use the reward modelling functions for DPO, GRPO, ORPO or KTO with Unsloth, follow the steps below:
- [DeepSeek-OCR: How to Run & Fine-tune](/new/deepseek-ocr-how-to-run-and-fine-tune.md): Guide on how to run and fine-tune DeepSeek-OCR locally.
- [How to Fine-tune LLMs with Unsloth & Docker](/new/how-to-fine-tune-llms-with-unsloth-and-docker.md): Learn how to fine-tune LLMs or do Reinforcement Learning (RL) with Unsloth's Docker image.
- [Vision Reinforcement Learning (VLM RL)](/new/vision-reinforcement-learning-vlm-rl.md): Train vision/multimodal models via GRPO and RL with Unsloth!
- [gpt-oss Reinforcement Learning](/new/gpt-oss-reinforcement-learning.md)
- [Tutorial: How to Train gpt-oss with RL](/new/gpt-oss-reinforcement-learning/tutorial-how-to-train-gpt-oss-with-rl.md): Learn to train OpenAI gpt-oss with GRPO to autonomously beat 2048 locally or on Colab.
- [Unsloth Dynamic GGUFs on Aider Polyglot](/new/unsloth-dynamic-ggufs-on-aider-polyglot.md): Performance of Unsloth Dynamic GGUFs on Aider Polyglot benchmarks.
- [Qwen3-VL: How to Run & Fine-tune](/models/qwen3-vl-how-to-run-and-fine-tune.md): Learn to fine-tune and run Qwen3-VL locally with Unsloth.
- [gpt-oss: How to Run & Fine-tune](/models/gpt-oss-how-to-run-and-fine-tune.md): Run & fine-tune OpenAI's new open-source models!
- [Tutorial: How to Fine-tune gpt-oss](/models/gpt-oss-how-to-run-and-fine-tune/tutorial-how-to-fine-tune-gpt-oss.md): Learn step-by-step how to train OpenAI gpt-oss locally with Unsloth.
- [Long Context gpt-oss Training](/models/gpt-oss-how-to-run-and-fine-tune/long-context-gpt-oss-training.md)
- [GLM-4.6: How to Run Locally](/models/glm-4.6-how-to-run-locally.md): A guide on how to run Z.ai's new GLM-4.6 model on your own local device!
- [IBM Granite 4.0](/models/ibm-granite-4.0.md): How to run IBM Granite-4.0 with Unsloth GGUFs on llama.cpp and Ollama, and how to fine-tune!
- [DeepSeek-V3.1: How to Run Locally](/models/deepseek-v3.1-how-to-run-locally.md): A guide on how to run DeepSeek-V3.1 and Terminus on your own local device!
- [Qwen3-Coder: How to Run Locally](/models/qwen3-coder-how-to-run-locally.md): Run Qwen3-Coder-30B-A3B-Instruct and 480B-A35B locally with Unsloth Dynamic quants.
- [Gemma 3: How to Run & Fine-tune](/models/gemma-3-how-to-run-and-fine-tune.md): How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama, Open WebUI, and how to fine-tune with Unsloth!
- [Gemma 3n: How to Run & Fine-tune](/models/gemma-3-how-to-run-and-fine-tune/gemma-3n-how-to-run-and-fine-tune.md): Run Google's new Gemma 3n locally with Dynamic GGUFs on llama.cpp, Ollama, Open WebUI and fine-tune with Unsloth!
- [Qwen3: How to Run & Fine-tune](/models/qwen3-how-to-run-and-fine-tune.md): Learn to run & fine-tune Qwen3 locally with Unsloth + our Dynamic 2.0 quants.
- [Qwen3-2507](/models/qwen3-how-to-run-and-fine-tune/qwen3-2507.md): Run Qwen3-30B-A3B-2507 and 235B-A22B Thinking and Instruct versions locally on your device!
- [Tutorials: How To Fine-tune & Run LLMs](/models/tutorials-how-to-fine-tune-and-run-llms.md): Learn how to run and fine-tune models for optimal performance 100% locally with Unsloth.
- [DeepSeek-R1-0528: How to Run Locally](/models/tutorials-how-to-fine-tune-and-run-llms/deepseek-r1-0528-how-to-run-locally.md): A guide on how to run DeepSeek-R1-0528, including Qwen3, on your own local device!
- [Magistral: How to Run & Fine-tune](/models/tutorials-how-to-fine-tune-and-run-llms/magistral-how-to-run-and-fine-tune.md): Meet Magistral - Mistral's new reasoning models.
- [Llama 4: How to Run & Fine-tune](/models/tutorials-how-to-fine-tune-and-run-llms/llama-4-how-to-run-and-fine-tune.md): How to run Llama 4 locally using our dynamic GGUFs, which recover accuracy compared to standard quantization.
- [Kimi K2: How to Run Locally](/models/tutorials-how-to-fine-tune-and-run-llms/kimi-k2-how-to-run-locally.md): Guide on running Kimi K2 and Kimi-K2-Instruct-0905 on your own local device!
- [Grok 2](/models/tutorials-how-to-fine-tune-and-run-llms/grok-2.md): Run xAI's Grok 2 model locally!
- [Devstral: How to Run & Fine-tune](/models/tutorials-how-to-fine-tune-and-run-llms/devstral-how-to-run-and-fine-tune.md): Run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505.
- [DeepSeek-V3-0324: How to Run Locally](/models/tutorials-how-to-fine-tune-and-run-llms/deepseek-v3-0324-how-to-run-locally.md): How to run DeepSeek-V3-0324 locally using our dynamic quants, which recover accuracy.
- [DeepSeek-R1: How to Run Locally](/models/tutorials-how-to-fine-tune-and-run-llms/deepseek-r1-how-to-run-locally.md): A guide on how you can run our 1.58-bit Dynamic Quants for DeepSeek-R1 using llama.cpp.
- [DeepSeek-R1 Dynamic 1.58-bit](/models/tutorials-how-to-fine-tune-and-run-llms/deepseek-r1-how-to-run-locally/deepseek-r1-dynamic-1.58-bit.md): See performance comparison tables for Unsloth's Dynamic GGUF Quants vs standard IMatrix quants.
- [QwQ-32B: How to Run effectively](/models/tutorials-how-to-fine-tune-and-run-llms/qwq-32b-how-to-run-effectively.md): How to run QwQ-32B effectively with our bug fixes and without endless generations + GGUFs.
- [Phi-4 Reasoning: How to Run & Fine-tune](/models/tutorials-how-to-fine-tune-and-run-llms/phi-4-reasoning-how-to-run-and-fine-tune.md): Learn to run & fine-tune Phi-4 reasoning models locally with Unsloth + our Dynamic 2.0 quants.
- [Running & Saving Models](/basics/running-and-saving-models.md): Learn how to save your finetuned model so you can run it in your favorite inference engine.
- [Saving to GGUF](/basics/running-and-saving-models/saving-to-gguf.md): Saving models to 16-bit for GGUF so you can use them with Ollama, Jan AI, Open WebUI and more!
- [Saving to Ollama](/basics/running-and-saving-models/saving-to-ollama.md)
- [Saving to vLLM for deployment](/basics/running-and-saving-models/saving-to-vllm-for-deployment.md): Saving models to 16-bit for vLLM deployment and serving.
- [Saving to SGLang for deployment](/basics/running-and-saving-models/saving-to-sglang-for-deployment.md): Saving models to 16-bit for SGLang deployment and serving.
- [Unsloth Inference](/basics/running-and-saving-models/unsloth-inference.md): Learn how to run your finetuned model with Unsloth's faster inference.
- [Troubleshooting Inference](/basics/running-and-saving-models/troubleshooting-inference.md): If you're experiencing issues when running or saving your model.
- [vLLM Engine Arguments](/basics/running-and-saving-models/vllm-engine-arguments.md)
- [LoRA Hot Swapping Guide](/basics/running-and-saving-models/lora-hot-swapping-guide.md)
- [Text-to-Speech (TTS) Fine-tuning](/basics/text-to-speech-tts-fine-tuning.md): Learn how to fine-tune TTS & STT voice models with Unsloth.
- [Unsloth Dynamic 2.0 GGUFs](/basics/unsloth-dynamic-2.0-ggufs.md): A big new upgrade to our Dynamic Quants!
- [Vision Fine-tuning](/basics/vision-fine-tuning.md): Learn how to fine-tune vision/multimodal LLMs with Unsloth.
- [Fine-tuning LLMs with NVIDIA DGX Spark and Unsloth](/basics/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth.md): Tutorial on how to fine-tune and do reinforcement learning (RL) with OpenAI gpt-oss on NVIDIA DGX Spark.
- [Fine-tuning LLMs with Blackwell, RTX 50 series & Unsloth](/basics/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth.md): Learn how to fine-tune LLMs on NVIDIA's Blackwell RTX 50 series and B200 GPUs with our step-by-step guide.
- [Multi-GPU Training with Unsloth](/basics/multi-gpu-training-with-unsloth.md): Learn how to fine-tune LLMs on multiple GPUs with parallelism using Unsloth.
- [Finetuning from Last Checkpoint](/basics/finetuning-from-last-checkpoint.md): Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
- [Troubleshooting & FAQs](/basics/troubleshooting-and-faqs.md): Tips to solve issues, and frequently asked questions.
- [Chat Templates](/basics/chat-templates.md): Learn the fundamentals and customization options of chat templates, including Conversational, ChatML, ShareGPT, Alpaca formats, and more!
- [Quantization-Aware Training (QAT)](/basics/quantization-aware-training-qat.md): Quantize models to 4-bit with Unsloth and PyTorch to recover accuracy.
- [Unsloth Environment Flags](/basics/unsloth-environment-flags.md): Advanced flags which might be useful if you see breaking finetunes, or you want to turn features off.
- [Continued Pretraining](/basics/continued-pretraining.md): Also known as Continued Finetuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- [Unsloth Benchmarks](/basics/unsloth-benchmarks.md): Unsloth's recorded benchmarks on NVIDIA GPUs.
@@ -0,0 +1,391 @@
---
name: verl-rl-training
description: Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Reinforcement Learning, RLHF, GRPO, PPO, Post-Training, Distributed Training]
dependencies: [verl>=0.3.0, torch>=2.0.0, ray>=2.41.0, vllm>=0.8.2, transformers>=4.40.0]
---

# verl: Volcano Engine Reinforcement Learning for LLMs

verl is a flexible, efficient, and production-ready RL training library for large language models from ByteDance's Seed team. It implements the HybridFlow framework (EuroSys 2025) and powers models such as Doubao-1.5-pro, which achieves o1-level performance on math benchmarks.

## When to Use verl

**Choose verl when you need:**
- Production-ready RL training at scale (tested up to 671B parameters)
- Flexibility to swap backends (FSDP ↔ Megatron-LM ↔ vLLM ↔ SGLang)
- Support for multiple RL algorithms (PPO, GRPO, RLOO, REINFORCE++, DAPO)
- Multi-turn rollout with tool calling for agentic workflows
- Vision-language model RL training

**Consider alternatives when:**
- You need Megatron-native training → use **slime** or **miles**
- You want PyTorch-native abstractions with Monarch → use **torchforge**
- You only need simple SFT/DPO → use **TRL** or **Axolotl**

## Key Features

- **Training backends**: FSDP, FSDP2, Megatron-LM
- **Rollout engines**: vLLM, SGLang, HuggingFace Transformers
- **Algorithms**: PPO, GRPO, DAPO, RLOO, ReMax, REINFORCE++, SPIN, SPPO
- **Models**: Qwen-3, Llama-3.1, DeepSeek, Gemma-2 (0.5B to 671B)
- **Advanced**: LoRA RL, sequence parallelism, expert parallelism, multi-turn tools

## Installation

```bash
# Option 1: pip install
pip install verl[vllm]  # or verl[sglang] for the SGLang backend

# Option 2: Docker (recommended for production)
docker pull verlai/verl:vllm011.latest

# Option 3: From source
git clone https://github.com/volcengine/verl.git
cd verl && pip install -e .[vllm,math]
```
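
A quick sanity check that the install resolved; this sketch assumes the `verl` package exposes a `__version__` attribute (true for recent releases, but verify against your version):

```python
# Minimal import check; __version__ is assumed to exist on the installed package.
import verl
print(verl.__version__)
```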

## Quick Start: GRPO Training

```bash
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=~/data/gsm8k/train.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-7B \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.actor.use_kl_loss=True \
    trainer.n_gpus_per_node=8
```

## Core Architecture

verl uses a **HybridFlow** programming model separating control flow from computation:

```
┌─────────────────────────────────────────────────────────┐
│            Single-Process Controller (Ray)              │
│  - Orchestrates: rollout → reward → train → sync        │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                  Multi-Process Workers                   │
│  ├── ActorRolloutRefWorker (policy + generation)        │
│  ├── CriticWorker (value estimation, PPO only)          │
│  └── RewardManager (model-based or rule-based rewards)  │
└─────────────────────────────────────────────────────────┘
```
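
To make the control/compute split concrete, here is a minimal sketch of what one controller iteration does. The worker names mirror the diagram, but the method names (`generate`, `compute_rewards`, `update`, `sync_weights`) are illustrative placeholders, not verl's actual API:

```python
# Illustrative sketch of the HybridFlow controller loop (not verl's real API).
# The controller is ordinary single-process Python; each call fans out to
# Ray workers that run the heavy compute (generation, reward, training).

def train_loop(actor_rollout, reward_manager, trainer, dataloader, num_steps):
    for step, prompts in zip(range(num_steps), dataloader):
        # 1. Rollout: sample n responses per prompt on the inference engine
        responses = actor_rollout.generate(prompts)
        # 2. Reward: score responses (rule-based or reward-model-based)
        rewards = reward_manager.compute_rewards(prompts, responses)
        # 3. Train: one RL update (e.g. GRPO/PPO) on the training backend
        metrics = trainer.update(prompts, responses, rewards)
        # 4. Sync: push updated weights back to the rollout engine
        actor_rollout.sync_weights(trainer.state_dict())
        print(f"step {step}: {metrics}")
```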

---

## Workflow 1: Math Reasoning with GRPO

Use this workflow for training reasoning models on math tasks like GSM8K or MATH.

### Prerequisites Checklist
- [ ] GPU cluster with 8+ GPUs (H100 recommended)
- [ ] Dataset in parquet format with `prompt` and `reward_model` columns
- [ ] Base model from HuggingFace Hub

### Step 1: Prepare Dataset

```python
import pandas as pd

data = [
    {
        "prompt": [{"role": "user", "content": "What is 15 + 27?"}],
        "reward_model": {"ground_truth": "42"}
    },
    # ... more examples
]
df = pd.DataFrame(data)
df.to_parquet("train.parquet")
```

### Step 2: Define Reward Function

```python
# reward_function.py
import re

def compute_reward(responses, ground_truths):
    rewards = []
    for response, gt in zip(responses, ground_truths):
        # Extract answer from response
        match = re.search(r'\\boxed{([^}]+)}', response)
        if match and match.group(1).strip() == gt.strip():
            rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards
```
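
A quick check of the scoring logic above (the sample responses are made up for illustration):

```python
from reward_function import compute_reward

responses = [
    r"15 + 27 = 42, so the answer is \boxed{42}.",  # correct      -> 1.0
    r"The answer is \boxed{41}.",                   # wrong answer -> 0.0
    "I think it's 42.",                             # no \boxed{}  -> 0.0
]
print(compute_reward(responses, ["42", "42", "42"]))  # [1.0, 0.0, 0.0]
```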

### Step 3: Create Training Config

```yaml
# config/grpo_math.yaml
algorithm:
  adv_estimator: grpo
  gamma: 1.0
  lam: 1.0

data:
  train_files: /path/to/train.parquet
  val_files: /path/to/val.parquet
  train_batch_size: 256
  max_prompt_length: 512
  max_response_length: 2048

actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-7B-Instruct
  actor:
    use_kl_loss: true
    kl_loss_coef: 0.001
    ppo_mini_batch_size: 64
  rollout:
    name: vllm
    n: 8  # samples per prompt
    temperature: 0.7
    top_p: 0.95

trainer:
  total_epochs: 3
  n_gpus_per_node: 8
  save_freq: 100
```

### Step 4: Launch Training

```bash
python3 -m verl.trainer.main_ppo \
    --config-path config \
    --config-name grpo_math \
    trainer.experiment_name=grpo_math_qwen7b
```

### Step 5: Monitor and Validate
- [ ] Check WandB/TensorBoard for loss curves
- [ ] Verify reward is increasing over steps
- [ ] Run evaluation on a held-out test set

---

## Workflow 2: PPO with Critic Model

Use this workflow when you need value-based advantage estimation (GAE).

### Key Differences from GRPO
- Requires a separate critic model
- Uses Generalized Advantage Estimation (GAE); see the sketch after this list
- Better for tasks with dense rewards
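
For intuition, here is a minimal NumPy sketch of GAE using the same `gamma`/`lam` names as the config keys below. This is the textbook recurrence (Schulman et al., 2015), not verl's internal implementation, and the reward/value numbers are made up:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Textbook GAE: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t),
    A_t = delta_t + gamma*lam*A_{t+1}, computed backwards in time."""
    T = len(rewards)
    advantages = np.zeros(T)
    next_adv, next_value = 0.0, 0.0  # bootstrap value of 0 at episode end
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values[t]
        advantages[t] = next_adv = delta + gamma * lam * next_adv
        next_value = values[t]
    return advantages

# Sparse reward (only the final step is rewarded) plus per-step critic values
print(gae_advantages(rewards=[0.0, 0.0, 1.0], values=[0.4, 0.6, 0.8]))
```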

### Configuration

```yaml
algorithm:
  adv_estimator: gae  # use GAE instead of GRPO
  gamma: 0.99
  lam: 0.95

critic:
  model:
    path: Qwen/Qwen2.5-7B-Instruct  # can be the same as or different from the actor
  ppo_mini_batch_size: 64

actor_rollout_ref:
  actor:
    use_kl_loss: true
    kl_loss_coef: 0.02
    clip_ratio: 0.2  # PPO clipping
```

### Launch with Critic

```bash
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=gae \
    critic.model.path=Qwen/Qwen2.5-7B-Instruct \
    trainer.n_gpus_per_node=8
```

---

## Workflow 3: Large-Scale Training with Megatron

Use this workflow for models >70B parameters or when you need expert parallelism.

### Prerequisites
- [ ] Install the Megatron-LM bridge: `pip install mbridge`
- [ ] Convert the model to Megatron format
- [ ] Multi-node cluster with NVLink/InfiniBand

### Configuration for 70B+ Models

```yaml
actor_rollout_ref:
  model:
    path: /path/to/megatron/checkpoint
    backend: megatron
  actor:
    strategy: megatron
    tensor_model_parallel_size: 8
    pipeline_model_parallel_size: 2
  rollout:
    name: vllm
    tensor_parallel_size: 8
```

### Launch Multi-Node

```bash
# On the head node
ray start --head --port=6379

# On worker nodes
ray start --address='head_ip:6379'

# Launch training
python3 -m verl.trainer.main_ppo \
    trainer.nnodes=4 \
    trainer.n_gpus_per_node=8
```

---

## Configuration Reference

### Algorithm Selection

| Algorithm | `adv_estimator` | Use Case |
|-----------|-----------------|----------|
| GRPO | `grpo` | Critic-free, math/reasoning |
| PPO/GAE | `gae` | Dense rewards, value estimation |
| REINFORCE++ | `reinforce_plus_plus` | Variance reduction |
| RLOO | `rloo` | Leave-one-out baseline |
| ReMax | `remax` | Maximum reward baseline |
| OPO | `opo` | Optimal policy optimization |
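
These estimators differ mainly in how they turn raw rewards into advantages. As one concrete example, GRPO's critic-free estimate normalizes each reward against its prompt group (the `rollout.n` samples drawn per prompt). A minimal sketch of that idea, not verl's internals:

```python
import numpy as np

def grpo_advantages(rewards, n=8, eps=1e-6):
    """Group-relative advantages: `rewards` has shape (num_prompts * n,),
    where each consecutive block of n samples shares one prompt."""
    groups = np.asarray(rewards, dtype=np.float64).reshape(-1, n)
    # Normalize by the group's own mean and std: no critic model needed
    adv = (groups - groups.mean(axis=1, keepdims=True)) / (
        groups.std(axis=1, keepdims=True) + eps
    )
    return adv.reshape(-1)

# Two prompts, n=4 samples each; correct answers scored 1, others 0
print(grpo_advantages([1, 0, 0, 1, 0, 0, 0, 1], n=4).round(2))
```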

### Key Parameters

```yaml
# Rollout parameters
actor_rollout_ref.rollout.n: 8              # samples per prompt
actor_rollout_ref.rollout.temperature: 0.7  # sampling temperature
actor_rollout_ref.rollout.top_p: 0.95       # nucleus sampling

# Training parameters
actor_rollout_ref.actor.lr: 1e-6            # learning rate
actor_rollout_ref.actor.ppo_mini_batch_size: 64
actor_rollout_ref.actor.clip_ratio: 0.2     # PPO clip range

# KL control
actor_rollout_ref.actor.use_kl_loss: true
actor_rollout_ref.actor.kl_loss_coef: 0.001
algorithm.kl_ctrl.target_kl: 0.1            # for adaptive KL control
```
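
For reference, `target_kl` drives the classic adaptive KL controller: the penalty coefficient grows when the measured KL overshoots the target and shrinks when it undershoots. A minimal sketch of that scheme (the 1.5/2 factors follow Schulman et al., 2017; verl's exact update rule may differ):

```python
def adaptive_kl_coef(kl_coef, observed_kl, target_kl=0.1):
    """Adaptive KL penalty update in the style of the PPO paper."""
    if observed_kl > 1.5 * target_kl:
        kl_coef *= 2.0    # policy drifted too far: penalize harder
    elif observed_kl < target_kl / 1.5:
        kl_coef *= 0.5    # policy too conservative: relax the penalty
    return kl_coef

coef = 0.001
for kl in [0.05, 0.2, 0.3, 0.04]:  # measured KL per update (made-up values)
    coef = adaptive_kl_coef(coef, kl)
    print(f"kl={kl:.2f} -> coef={coef:.4f}")
```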

---

## Common Issues and Solutions

### Issue: OOM During Rollout

**Symptoms**: CUDA out of memory during the generation phase

**Solutions**:
```yaml
# Reduce batch size
actor_rollout_ref.rollout.log_prob_micro_batch_size: 4

# Enable gradient checkpointing
actor_rollout_ref.model.enable_gradient_checkpointing: true

# Use FSDP2 with CPU offloading
actor_rollout_ref.actor.strategy: fsdp2
actor_rollout_ref.actor.fsdp_config.offload_policy: true
```

### Issue: Training Instability

**Symptoms**: Loss spikes, reward collapse

**Solutions**:
```yaml
# Reduce the learning rate
actor_rollout_ref.actor.lr: 5e-7

# Increase the KL penalty
actor_rollout_ref.actor.kl_loss_coef: 0.01

# Enable gradient clipping
actor_rollout_ref.actor.max_grad_norm: 1.0
```

### Issue: Slow Weight Sync

**Symptoms**: Long pauses between rollout and training

**Solutions**:
```bash
# Use FSDP2 for faster resharding
actor_rollout_ref.actor.strategy=fsdp2

# Enable async weight transfer
trainer.async_weight_update=true
```

### Issue: vLLM Version Mismatch

**Symptoms**: Import errors or generation failures

**Solution**: Use compatible versions:
```bash
pip install "vllm>=0.8.5,<=0.12.0"
# Avoid vLLM 0.7.x (known bugs)
```

---

## Advanced Topics

### Multi-Turn Tool Calling

See [references/multi-turn.md](references/multi-turn.md) for agentic workflows with tool use.

### Vision-Language Models

```yaml
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-VL-7B-Instruct
  rollout:
    name: vllm
    enable_vision: true
```

### LoRA Training

```yaml
actor_rollout_ref:
  actor:
    lora:
      enabled: true
      r: 16
      alpha: 32
      target_modules: ["q_proj", "v_proj"]
```

---

## Resources

- **Documentation**: https://verl.readthedocs.io/
- **Paper**: https://arxiv.org/abs/2409.19256
- **GitHub**: https://github.com/volcengine/verl
- **Recipes**: https://github.com/verl-project/verl-recipe (DAPO, GSPO, etc.)
- **Community**: Slack at verl-project