@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,301 @@

# verl API Reference

## Core Classes

### RayPPOTrainer

The central controller for the training loop. Manages resource allocation and coordinates worker groups.

```python
from verl import RayPPOTrainer

trainer = RayPPOTrainer(
    config=config,
    resource_pool_manager=resource_manager,
    ray_worker_group_cls=RayWorkerGroup,
)
trainer.init_workers()
trainer.fit()
```

### ResourcePoolManager

Manages GPU allocation across different worker groups using Ray PlacementGroups.

```python
from verl.trainer.ppo.resource_pool import ResourcePoolManager

manager = ResourcePoolManager(
    resource_pool_spec={
        "actor_rollout_ref": {"gpu": 4},
        "critic": {"gpu": 2},
    }
)
```

### RayWorkerGroup

Abstraction for distributed method execution. Spawns Ray actors and dispatches method calls.

```python
from verl.trainer.ppo.ray_worker_group import RayWorkerGroup

worker_group = RayWorkerGroup(
    num_workers=8,
    worker_cls=ActorRolloutRefWorker,
    resource_pool=pool,
)
```

### ActorRolloutRefWorker

Worker class implementing policy training, generation, and reference model computations. Manages hybrid engine mode switching.

```python
# Typically configured via YAML, not instantiated directly
# See configuration section below
```

### RolloutReplica

Interface for inference backends, with implementations for vLLM, SGLang, TensorRT-LLM, and HuggingFace.

```python
from verl.workers.rollout import RolloutReplica
```

The backend is selected via config:

```yaml
rollout:
  name: vllm  # or: sglang, hf, tensorrt-llm
```

## Configuration Schema

### PPO Configuration (`verl/trainer/config/ppo_trainer.yaml`)

```yaml
# Data configuration
data:
  train_files: /path/to/train.parquet
  val_files: /path/to/val.parquet
  train_batch_size: 256        # Global batch size of prompts
  max_prompt_length: 512
  max_response_length: 2048

# Algorithm configuration
algorithm:
  adv_estimator: gae           # gae, grpo, rloo, reinforce_plus_plus
  gamma: 0.99                  # Discount factor
  lam: 0.95                    # GAE lambda
  use_kl_in_reward: false      # Add KL term to reward

# Actor configuration
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-7B-Instruct
    backend: fsdp              # fsdp, fsdp2, megatron
  actor:
    ppo_mini_batch_size: 64    # Mini-batch for actor updates
    ppo_epochs: 1              # Number of actor update epochs
    clip_ratio: 0.2            # PPO clip range
    use_kl_loss: true          # Use KL loss in actor
    kl_loss_coef: 0.001        # KL loss coefficient
    kl_loss_type: low_var      # KL divergence calculation method
    loss_agg_mode: token-mean  # token-mean or sequence-mean
    gradient_checkpointing: true
    max_grad_norm: 1.0         # Gradient clipping
    lr: 1e-6                   # Learning rate
  rollout:
    name: vllm                 # vllm, sglang, hf
    n: 8                       # Samples per prompt
    temperature: 0.7
    top_p: 0.95
    log_prob_micro_batch_size: 8

# Critic configuration (PPO only)
critic:
  model:
    path: Qwen/Qwen2.5-7B-Instruct
  ppo_mini_batch_size: 64
  ppo_epochs: 1                # Defaults to actor epochs

# Trainer configuration
trainer:
  total_epochs: 3
  n_gpus_per_node: 8
  nnodes: 1
  save_freq: 100
  experiment_name: my_experiment
  async_weight_update: false
```

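As a quick sanity check on how the batch settings above compose, a worked example (plain arithmetic only; the assumption that mini-batches are counted in sampled sequences is ours, not verl's documented contract — verify against the docs for your version):

```python
# Each training step draws train_batch_size prompts and generates
# rollout.n samples per prompt, so the actor update phase sees
# train_batch_size * n sequences in total.
train_batch_size = 256    # data.train_batch_size
rollout_n = 8             # actor_rollout_ref.rollout.n
ppo_mini_batch_size = 64  # actor_rollout_ref.actor.ppo_mini_batch_size

sequences_per_step = train_batch_size * rollout_n
# If mini-batches are counted in sequences (assumption), one epoch of
# actor updates performs this many optimizer steps:
minibatches_per_epoch = sequences_per_step // ppo_mini_batch_size

print(sequences_per_step)     # 2048
print(minibatches_per_epoch)  # 32
```
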
### GRPO Configuration (`docs/algo/grpo.md`)

```yaml
algorithm:
  adv_estimator: grpo        # Enable GRPO
  gamma: 1.0
  lam: 1.0

actor_rollout_ref:
  rollout:
    n: 8                     # Must be > 1 for GRPO
  actor:
    use_kl_loss: true        # Required for GRPO
    kl_loss_coef: 0.001
    kl_loss_type: low_var    # or: k1, k2, k3
    loss_agg_mode: token-mean
```

### Multi-Turn Configuration (`verl/trainer/config/rollout/rollout.yaml`)

```yaml
actor_rollout_ref:
  rollout:
    name: sglang  # Required for multi-turn
    multi_turn:
      enable: true
      tool_config_path: /path/to/tools.yaml
      interaction_config_path: /path/to/interaction.yaml
```

## Reward Functions

### Built-in Reward Types

```yaml
# Model-based reward
reward_model:
  path: OpenRLHF/Llama-3-8b-rm-700k

# Custom function-based reward
custom_reward_function:
  path: /path/to/reward.py
  name: compute_score  # Function name, default: compute_score
```

### Custom Reward Function Signature

```python
# reward.py
def compute_score(responses: list[str], ground_truths: list[str], **kwargs) -> list[float]:
    """
    Compute rewards for a batch of responses.

    Args:
        responses: Generated completions
        ground_truths: Expected answers from data
        **kwargs: Additional metadata

    Returns:
        List of reward scores (floats)
    """
    rewards = []
    for response, gt in zip(responses, ground_truths):
        # Your reward logic here, e.g. simple exact match:
        score = 1.0 if response.strip() == gt.strip() else 0.0
        rewards.append(score)
    return rewards
```

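For a more concrete instance of this signature, here is a sketch of a rule-based math reward; the `\boxed{...}` extraction logic is illustrative only, not part of verl:

```python
import re

def compute_score(responses: list[str], ground_truths: list[str], **kwargs) -> list[float]:
    """Reward 1.0 when the last \\boxed{...} span (or the whole response,
    if no box is present) exactly matches the ground-truth answer, else 0.0."""
    rewards = []
    for response, gt in zip(responses, ground_truths):
        boxed = re.findall(r"\\boxed\{([^}]*)\}", response)
        answer = boxed[-1].strip() if boxed else response.strip()
        rewards.append(1.0 if answer == gt.strip() else 0.0)
    return rewards

print(compute_score([r"So the result is \boxed{42}.", "maybe 7?"], ["42", "7"]))  # [1.0, 0.0]
```

Pointing `custom_reward_function.path` at a file like this wires it into training per the configuration above.
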
|
|
199
|
+
|
|
200
|
+
## Backend-Specific Configuration
|
|
201
|
+
|
|
202
|
+
### FSDP Configuration
|
|
203
|
+
|
|
204
|
+
```yaml
|
|
205
|
+
actor_rollout_ref:
|
|
206
|
+
actor:
|
|
207
|
+
strategy: fsdp
|
|
208
|
+
fsdp_config:
|
|
209
|
+
mixed_precision: bf16
|
|
210
|
+
sharding_strategy: FULL_SHARD
|
|
211
|
+
offload_policy: false
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
### FSDP2 Configuration
|
|
215
|
+
|
|
216
|
+
```yaml
|
|
217
|
+
actor_rollout_ref:
|
|
218
|
+
actor:
|
|
219
|
+
strategy: fsdp2
|
|
220
|
+
fsdp_config:
|
|
221
|
+
offload_policy: true # CPU offloading
|
|
222
|
+
reshard_after_forward: true
|
|
223
|
+
```

### Megatron Configuration

```yaml
actor_rollout_ref:
  model:
    backend: megatron
  actor:
    strategy: megatron
    tensor_model_parallel_size: 8
    pipeline_model_parallel_size: 2
    megatron:
      use_mbridge: true  # Required for format conversion
```
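A quick sanity check on these sizes: tensor, pipeline, and data parallelism must multiply to the world size (generic arithmetic, not a verl API; the world size below is illustrative):

```python
# World-size bookkeeping for TP=8, PP=2 (generic arithmetic, no verl API).
tensor_parallel = 8
pipeline_parallel = 2
world_size = 64  # e.g. 8 nodes x 8 GPUs (illustrative)

model_parallel = tensor_parallel * pipeline_parallel
assert world_size % model_parallel == 0, "world size must divide evenly"
data_parallel = world_size // model_parallel
print(data_parallel)  # 4 data-parallel replicas
```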

### vLLM Rollout Configuration

```yaml
actor_rollout_ref:
  rollout:
    name: vllm
    tensor_parallel_size: 2
    gpu_memory_utilization: 0.9
    max_num_seqs: 256
    enforce_eager: false
```

### SGLang Rollout Configuration

```yaml
actor_rollout_ref:
  rollout:
    name: sglang
    tp_size: 2
    mem_fraction_static: 0.8
    context_length: 8192
```

## Algorithm Reference

| Algorithm | `adv_estimator` | Requires Critic | Best For |
|-----------|-----------------|-----------------|----------|
| PPO | `gae` | Yes | Dense rewards, value estimation |
| GRPO | `grpo` | No | Sparse rewards, math/reasoning |
| RLOO | `rloo` | No | Leave-one-out baseline |
| REINFORCE++ | `reinforce_plus_plus` | No | Variance reduction |
| DAPO | `dapo` | No | Decoupled clipping, dynamic sampling |
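As a rough illustration of why GRPO needs no critic: each sampled response's advantage is just its reward normalized within its prompt group (a simplified sketch, not verl's implementation):

```python
# Simplified GRPO-style advantage: normalize rewards within the group of
# responses sampled for one prompt (no learned value function needed).
def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled responses with binary rewards
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```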

## Vision-Language Model Support

```yaml
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-VL-7B-Instruct
  rollout:
    name: vllm
    enable_vision: true
    max_model_len: 32768
```

## LoRA Configuration

```yaml
actor_rollout_ref:
  actor:
    lora:
      enabled: true
      r: 16
      alpha: 32
      target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
      dropout: 0.05
```
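As a sanity check on these settings, each adapted projection adds `r * (d_in + d_out)` trainable parameters (generic LoRA arithmetic; the hidden size below is illustrative, not tied to a specific model):

```python
# Trainable parameters added by one LoRA adapter on a d_out x d_in linear
# layer: A is (r x d_in), B is (d_out x r). Hidden size is illustrative.
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    return r * d_in + d_out * r

hidden = 4096                      # assumed model hidden size
per_proj = lora_param_count(hidden, hidden, r=16)
total_per_layer = per_proj * 4     # q_proj, k_proj, v_proj, o_proj
print(per_proj, total_per_layer)   # 131072 524288 (before multiplying by layer count)
```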

## Resources

- Documentation: https://verl.readthedocs.io/
- GitHub: https://github.com/volcengine/verl
- Paper: https://arxiv.org/abs/2409.19256 (HybridFlow)
# verl Troubleshooting Guide

## Common Issues and Solutions

### OOM (Out of Memory) Issues

#### Issue: OOM During Rollout

**Symptoms**: CUDA out of memory during generation phase

**Solutions**:

1. **Reduce log prob batch size**:
   ```yaml
   actor_rollout_ref:
     rollout:
       log_prob_micro_batch_size: 4  # Reduce from 8
   ```

2. **Enable gradient checkpointing**:
   ```yaml
   actor_rollout_ref:
     actor:
       gradient_checkpointing: true
   ```

3. **Use FSDP2 with CPU offloading**:
   ```yaml
   actor_rollout_ref:
     actor:
       strategy: fsdp2
       fsdp_config:
         offload_policy: true
   ```

4. **Reduce vLLM memory utilization**:
   ```yaml
   actor_rollout_ref:
     rollout:
       gpu_memory_utilization: 0.7  # Reduce from 0.9
   ```

#### Issue: OOM During Training

**Symptoms**: CUDA OOM in backward pass

**Solutions**:

1. **Reduce batch sizes**:
   ```yaml
   actor_rollout_ref:
     actor:
       ppo_mini_batch_size: 32  # Reduce from 64
   ```

2. **Use gradient accumulation**:
   ```yaml
   actor_rollout_ref:
     actor:
       gradient_accumulation_steps: 4
   ```

3. **Enable mixed precision**:
   ```yaml
   actor_rollout_ref:
     actor:
       fsdp_config:
         mixed_precision: bf16
   ```
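Gradient accumulation keeps the effective batch size while shrinking per-step memory; the bookkeeping (generic arithmetic, not a verl API; GPU count is illustrative) is:

```python
# Effective global batch with gradient accumulation (generic arithmetic).
micro_batch_size = 8        # what fits in GPU memory per step
accumulation_steps = 4      # e.g. gradient_accumulation_steps above
num_gpus = 8                # data-parallel workers (illustrative)

effective_batch = micro_batch_size * accumulation_steps * num_gpus
print(effective_batch)  # 256 samples per optimizer update
```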

### Training Stability Issues

#### Issue: Training Instability / Loss Spikes

**Symptoms**: Loss spikes, reward collapse, divergence

**Solutions**:

1. **Reduce learning rate**:
   ```yaml
   actor_rollout_ref:
     actor:
       lr: 5e-7  # Reduce from 1e-6
   ```

2. **Increase KL penalty**:
   ```yaml
   actor_rollout_ref:
     actor:
       kl_loss_coef: 0.01  # Increase from 0.001
   ```

3. **Enable gradient clipping**:
   ```yaml
   actor_rollout_ref:
     actor:
       max_grad_norm: 1.0
   ```

4. **Use smaller PPO clip range**:
   ```yaml
   actor_rollout_ref:
     actor:
       clip_ratio: 0.1  # Reduce from 0.2
   ```

#### Issue: Policy Collapse (Entropy Drops to Zero)

**Symptoms**: Model outputs become deterministic, entropy approaches zero

**Solutions**:

1. **Increase temperature during rollout**:
   ```yaml
   actor_rollout_ref:
     rollout:
       temperature: 0.9  # Increase from 0.7
   ```

2. **Add entropy bonus**:
   ```yaml
   algorithm:
     entropy_coef: 0.01
   ```

3. **Reduce KL penalty**:
   ```yaml
   actor_rollout_ref:
     actor:
       kl_loss_coef: 0.0001  # Reduce
   ```
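To catch collapse early, it helps to log policy entropy during rollout; a minimal sketch of the quantity being watched (pure Python, not verl's logging):

```python
import math

# Shannon entropy of a next-token distribution; a collapsing policy
# drives this toward zero as outputs become deterministic.
def entropy(probs: list[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)

print(round(entropy([0.25, 0.25, 0.25, 0.25]), 4))  # 1.3863, the max for 4 tokens
print(round(entropy([0.999, 0.001, 0.0, 0.0]), 4))  # 0.0079, near collapse
```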

### Weight Synchronization Issues

#### Issue: Slow Weight Sync

**Symptoms**: Long pauses between rollout and training phases

**Solutions**:

1. **Use FSDP2 for faster resharding**:
   ```yaml
   actor_rollout_ref:
     actor:
       strategy: fsdp2
   ```

2. **Enable async weight transfer**:
   ```yaml
   trainer:
     async_weight_update: true
   ```

3. **Reduce sync frequency**:
   ```yaml
   trainer:
     weight_sync_interval: 2  # Sync every 2 steps
   ```

#### Issue: Weight Sync Timeout

**Symptoms**: Ray actor timeouts during weight synchronization

**Solutions**:

1. **Increase Ray timeout**:
   ```python
   import ray
   ray.init(num_gpus=8, timeout=3600)  # 1 hour timeout
   ```

2. **Use colocated mode** (if memory allows):
   ```yaml
   trainer:
     colocate_actor_ref: true
   ```

### vLLM Version Issues

#### Issue: vLLM Import Errors or Generation Failures

**Symptoms**: Import errors, generation hangs, incorrect outputs

**Solutions**:

1. **Use compatible vLLM version**:
   ```bash
   pip install "vllm>=0.8.2,<=0.12.0"  # Quote so the shell doesn't treat <= as redirection
   # Avoid vLLM 0.7.x (known bugs)
   ```

2. **For vLLM 0.8.x issues**:
   ```yaml
   actor_rollout_ref:
     rollout:
       enforce_eager: true  # Disable CUDA graphs
   ```

3. **Check CUDA version compatibility**:
   ```bash
   # vLLM 0.11+ requires CUDA 12.1+
   nvidia-smi  # Check CUDA version
   ```

### Ray Issues

#### Issue: Ray Cluster Connection Failures

**Symptoms**: Cannot connect to Ray cluster

**Solutions**:

1. **Check Ray head node**:
   ```bash
   ray status
   ```

2. **Restart Ray cluster**:
   ```bash
   ray stop
   ray start --head --port=6379 --num-gpus=8
   ```

3. **Verify network connectivity**:
   ```bash
   ping head_node_ip
   ```

#### Issue: Ray Actor OOM

**Symptoms**: Ray actors killed due to OOM

**Solutions**:

1. **Increase Ray object store memory**:
   ```bash
   ray start --head --object-store-memory=10000000000  # 10GB
   ```

2. **Enable spilling to disk**:
   ```bash
   export RAY_object_spilling_config='{"type":"filesystem","params":{"directory_path":"/tmp/ray_spill"}}'
   ```

### Multi-Node Issues

#### Issue: NCCL Timeout

**Symptoms**: NCCL operations time out on multi-node runs

**Solutions**:

1. **Set NCCL environment variables**:
   ```bash
   export NCCL_DEBUG=INFO
   export NCCL_SOCKET_IFNAME=eth0
   export NCCL_IB_DISABLE=0  # Enable InfiniBand if available
   ```

2. **Increase NCCL timeout**:
   ```bash
   export NCCL_TIMEOUT=1800  # 30 minutes
   ```

3. **Check network interface**:
   ```bash
   ifconfig  # Verify correct interface
   ```

#### Issue: DeepSpeed GPU Index Out of Range

**Symptoms**: "GPU index out of range" error with DeepSpeed

**Solutions**:

```bash
export RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1
```

### Data Issues

#### Issue: Empty Batches

**Symptoms**: Training receives empty batches

**Solutions**:

1. **Verify data format**:
   ```python
   import pandas as pd

   df = pd.read_parquet("train.parquet")
   print(df.columns)  # Should include 'prompt', 'reward_model'
   ```

2. **Check data loading**:
   ```yaml
   data:
     train_files: /absolute/path/to/train.parquet  # Use absolute path
   ```

#### Issue: Tokenization Errors

**Symptoms**: Tokenizer errors, sequence length mismatches

**Solutions**:

1. **Set padding token**:
   ```python
   tokenizer.pad_token = tokenizer.eos_token
   ```

2. **Verify max length configuration**:
   ```yaml
   data:
     max_prompt_length: 512
     max_response_length: 2048
     # Total should not exceed the model's max length
   ```

### Megatron-Specific Issues

#### Issue: Megatron Checkpoint Loading Fails

**Symptoms**: Cannot load Megatron checkpoints

**Solutions**:

1. **Enable mbridge conversion**:
   ```yaml
   actor_rollout_ref:
     actor:
       megatron:
         use_mbridge: true
   ```

2. **Convert HuggingFace to Megatron format**:
   ```bash
   python tools/convert_hf_to_megatron.py \
     --hf_model_path /path/to/hf/model \
     --save_path /path/to/megatron/checkpoint
   ```

#### Issue: Megatron on AMD GPUs

**Current Limitation**: The Megatron-LM backend is not supported on AMD GPUs. Use the FSDP backend instead:

```yaml
actor_rollout_ref:
  model:
    backend: fsdp
```

### Debugging Tips

#### Enable Verbose Logging

```yaml
trainer:
  logging_level: DEBUG
```

```bash
export VERL_DEBUG=1
export RAY_DEDUP_LOGS=0
```

#### Check GPU Utilization

```bash
watch -n 1 nvidia-smi
```

#### Profile Training

```python
# Add profiling to the training loop
import torch.profiler

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    trainer.fit()
prof.export_chrome_trace("trace.json")
```

## Resources

- GitHub Issues: https://github.com/volcengine/verl/issues
- Documentation: https://verl.readthedocs.io/
- Community Slack: verl-project