@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,433 @@
---
name: torchforge-rl-training
description: Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Reinforcement Learning, PyTorch, GRPO, SFT, Monarch, TorchTitan, Meta]
dependencies: [torch>=2.9.0, torchtitan>=0.2.0, vllm, monarch]
---

# torchforge: PyTorch-Native Agentic RL Library

torchforge is Meta's PyTorch-native RL library that separates infrastructure concerns from algorithm concerns. It enables rapid RL research by letting you focus on algorithms while it handles distributed training, inference, and weight sync automatically.

## When to Use torchforge

**Choose torchforge when you need:**
- Clean separation between RL algorithms and infrastructure
- PyTorch-native abstractions (no Ray dependency)
- Easy algorithm experimentation (GRPO, DAPO, SAPO in ~100 lines)
- Scalable training with the Monarch actor system
- Integration with TorchTitan for model parallelism

**Consider alternatives when:**
- You need production-ready stability → use **miles** or **verl** (torchforge is experimental and its APIs may change)
- You want Megatron-native training → use **slime**

## Key Features

- **Algorithm isolation**: Implement RL algorithms without touching infrastructure
- **Scalability**: From a single GPU to thousands via Monarch
- **Modern stack**: TorchTitan (training), vLLM (inference), TorchStore (weight sync)
- **Loss functions**: GRPO, DAPO, CISPO, GSPO, SAPO built in

## Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│              Application Layer (Your Code)              │
│   - Define reward models, loss functions, sampling      │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                    Forge API Layer                      │
│   - Episode, Group dataclasses                          │
│   - Service interfaces (async/await)                    │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│             Distributed Services (Monarch)              │
│   ├── Trainer (TorchTitan FSDP)                         │
│   ├── Generator (vLLM inference)                        │
│   ├── Reference Model (frozen KL baseline)              │
│   └── Reward Actors (compute rewards)                   │
└─────────────────────────────────────────────────────────┘
```

## Installation

```bash
# Create environment
conda create -n forge python=3.12
conda activate forge

# Install (handles PyTorch nightly + dependencies)
./scripts/install.sh

# Verify
python -c "import torch, forge, vllm; print('OK')"
```

### ROCm Installation

```bash
./scripts/install_rocm.sh
```

## Quick Start

### SFT Training (2+ GPUs)

```bash
python -m apps.sft.main --config apps/sft/llama3_8b.yaml
```

### GRPO Training (3+ GPUs)

```bash
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
```

---

## Workflow 1: GRPO Training for Math Reasoning

Use this workflow for training reasoning models with group-relative advantages.

### Prerequisites Checklist
- [ ] 3+ GPUs (GPU0: trainer, GPU1: ref_model, GPU2: generator)
- [ ] Model from HuggingFace Hub
- [ ] Training dataset (GSM8K, MATH, etc.)

### Step 1: Create Configuration

```yaml
# config/grpo_math.yaml
model: "Qwen/Qwen2.5-7B-Instruct"

dataset:
  path: "openai/gsm8k"
  split: "train"
  streaming: true

training:
  batch_size: 4
  learning_rate: 1e-6
  seq_len: 4096
  dtype: bfloat16
  gradient_accumulation_steps: 4

grpo:
  n_samples: 8    # Responses per prompt
  clip_low: 0.2
  clip_high: 0.28
  beta: 0.1       # KL penalty coefficient
  temperature: 0.7

services:
  generator:
    procs: 1
    num_replicas: 1
    with_gpus: true
  trainer:
    procs: 1
    num_replicas: 1
    with_gpus: true
  ref_model:
    procs: 1
    num_replicas: 1
    with_gpus: true
```

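To make the `grpo` fields concrete: each prompt gets `n_samples` responses, and each response's advantage is its reward normalized within that group. A minimal sketch (the helper name is hypothetical, not torchforge internals):

```python
# Illustration only: how group-relative advantages behave for one prompt.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within one group of n_samples responses."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# n_samples = 8 responses to the same prompt, binary correctness rewards
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
adv = group_relative_advantages(rewards)
print(adv)  # correct responses get positive advantage, incorrect negative
```

This is why `n_samples > 1` is required: with a single sample per prompt the group statistics are degenerate and every advantage collapses to zero.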
### Step 2: Define Reward Function

```python
# rewards.py
# Reward functions are in forge.data.rewards
from forge.data.rewards import MathReward, ThinkingReward
import re

# Or define your own reward function
class CustomMathReward:
    def __call__(self, prompt: str, response: str, target: str) -> float:
        # Extract answer from response
        match = re.search(r'\\boxed{([^}]+)}', response)
        if not match:
            return 0.0

        answer = match.group(1).strip()
        return 1.0 if answer == target else 0.0
```

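It is worth sanity-checking the answer extraction before a long run. This standalone sketch restates the reward logic as a plain function (`math_reward` is a hypothetical mirror of `CustomMathReward` above, kept self-contained for testing):

```python
import re

def math_reward(response: str, target: str) -> float:
    # Same extraction logic as CustomMathReward: match \boxed{...}
    match = re.search(r'\\boxed{([^}]+)}', response)
    if not match:
        return 0.0
    return 1.0 if match.group(1).strip() == target else 0.0

assert math_reward(r"The sum is \boxed{42}.", "42") == 1.0   # correct answer
assert math_reward(r"The sum is \boxed{41}.", "42") == 0.0   # wrong answer
assert math_reward("No boxed answer here.", "42") == 0.0     # unparseable
```

A reward that returns 0.0 for every response gives GRPO no signal (all group advantages become zero), so checking the parser against real model outputs first saves GPU hours.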
### Step 3: Launch Training

```bash
python -m apps.grpo.main --config config/grpo_math.yaml
```

### Step 4: Monitor Progress
- [ ] Check W&B dashboard for loss curves
- [ ] Verify entropy is decreasing (policy becoming more deterministic)
- [ ] Monitor KL divergence (should stay bounded)
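The checklist quantities can be computed from per-token log-probabilities. The sketch below is illustrative only; the tensor names and the full-vocabulary KL estimator are assumptions, not torchforge's exact dashboard definitions:

```python
import torch

torch.manual_seed(0)
# Fake per-token log-probs: (batch, seq_len, vocab) for policy and reference
logprobs = torch.log_softmax(torch.randn(4, 16, 1000), dim=-1)
ref_logprobs = torch.log_softmax(torch.randn(4, 16, 1000), dim=-1)

# Token-level entropy of the policy: high early in training, should trend down
entropy = -(logprobs.exp() * logprobs).sum(-1).mean()

# Full-vocab KL(policy || reference): should stay bounded during training
kl = (logprobs.exp() * (logprobs - ref_logprobs)).sum(-1).mean()

print(f"entropy={entropy.item():.3f} kl={kl.item():.3f}")
```

A KL that grows without bound usually means `beta` is too small or the learning rate is too high; entropy that collapses to near zero early suggests the temperature or clip range needs revisiting.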

---

## Workflow 2: Custom Loss Function

Use this workflow to implement new RL algorithms.

### Step 1: Create Loss Class

```python
|
|
184
|
+
# src/forge/losses/custom_loss.py
|
|
185
|
+
import torch
|
|
186
|
+
import torch.nn as nn
|
|
187
|
+
|
|
188
|
+
class CustomLoss(nn.Module):
|
|
189
|
+
def __init__(self, clip_range: float = 0.2, beta: float = 0.1):
|
|
190
|
+
super().__init__()
|
|
191
|
+
self.clip_range = clip_range
|
|
192
|
+
self.beta = beta
|
|
193
|
+
|
|
194
|
+
def forward(
|
|
195
|
+
self,
|
|
196
|
+
logprobs: torch.Tensor,
|
|
197
|
+
ref_logprobs: torch.Tensor,
|
|
198
|
+
advantages: torch.Tensor,
|
|
199
|
+
padding_mask: torch.Tensor,
|
|
200
|
+
) -> torch.Tensor:
|
|
201
|
+
# Compute importance ratio
|
|
202
|
+
ratio = torch.exp(logprobs - ref_logprobs)
|
|
203
|
+
|
|
204
|
+
# Clipped policy gradient
|
|
205
|
+
clipped_ratio = torch.clamp(
|
|
206
|
+
ratio,
|
|
207
|
+
1 - self.clip_range,
|
|
208
|
+
1 + self.clip_range
|
|
209
|
+
)
|
|
210
|
+
pg_loss = -torch.min(ratio * advantages, clipped_ratio * advantages)
|
|
211
|
+
|
|
212
|
+
# KL penalty
|
|
213
|
+
kl = ref_logprobs - logprobs
|
|
214
|
+
|
|
215
|
+
# Apply mask and aggregate
|
|
216
|
+
masked_loss = (pg_loss + self.beta * kl) * padding_mask
|
|
217
|
+
loss = masked_loss.sum() / padding_mask.sum()
|
|
218
|
+
|
|
219
|
+
return loss
|
|
220
|
+
```
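To see what the clipping is doing, the same math can be traced on a single token with plain floats. This is a dependency-free sketch of the arithmetic, not the forge implementation:

```python
import math

# Scalar version of the clipped-PG-plus-KL loss above, for one token.
def clipped_pg_with_kl(logprob, ref_logprob, advantage, clip_range=0.2, beta=0.1):
    ratio = math.exp(logprob - ref_logprob)
    clipped = max(1 - clip_range, min(1 + clip_range, ratio))
    # Take the more pessimistic (smaller) surrogate objective
    pg_loss = -min(ratio * advantage, clipped * advantage)
    kl = ref_logprob - logprob
    return pg_loss + beta * kl

# Ratio inside the clip window: loss is just -advantage (kl = 0)
print(clipped_pg_with_kl(-1.0, -1.0, 1.0))  # -1.0
# Large positive update: ratio e^1 ≈ 2.72 is clipped at 1.2
print(clipped_pg_with_kl(-0.5, -1.5, 1.0))  # ≈ -1.3
```

The `min` over the raw and clipped surrogates is what caps how far a single update can push the policy away from the reference.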

### Step 2: Integrate into Application

```python
# apps/custom/main.py
from forge.losses.custom_loss import CustomLoss

loss_fn = CustomLoss(clip_range=0.2, beta=0.1)

# In training loop
loss = loss_fn(
    logprobs=logprobs,
    ref_logprobs=ref_logprobs,
    advantages=advantages,
    padding_mask=padding_mask,
)
```

---

## Workflow 3: Multi-GPU Distributed Training

Use this workflow for scaling to multiple GPUs or nodes.

### Configuration for Distributed

```yaml
# config/distributed.yaml
model: "meta-llama/Meta-Llama-3.1-8B-Instruct"

parallelism:
  tensor_parallel_degree: 2   # Split model across GPUs
  pipeline_parallel_degree: 1
  data_parallel_shard_degree: 2

services:
  generator:
    procs: 2          # 2 processes for TP=2
    num_replicas: 1
    with_gpus: true
  trainer:
    procs: 2
    num_replicas: 1
    with_gpus: true
```
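A quick way to budget hardware before submitting a job, assuming each GPU-backed service needs one GPU per process per replica (an illustrative helper, not part of the forge API):

```python
# Rough GPU budget check for a services config. Assumes every
# GPU-backed service needs procs * num_replicas GPUs.
def gpus_required(services: dict) -> int:
    total = 0
    for svc in services.values():
        if svc.get("with_gpus"):
            total += svc.get("procs", 1) * svc.get("num_replicas", 1)
    return total

services = {
    "generator": {"procs": 2, "num_replicas": 1, "with_gpus": True},
    "trainer":   {"procs": 2, "num_replicas": 1, "with_gpus": True},
}
print(gpus_required(services))  # 4
```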

### Launch with SLURM

```bash
# Submit job
sbatch --nodes=2 --gpus-per-node=8 run_grpo.sh
```

### Launch Locally (Multi-GPU)

```bash
# 8-GPU setup
python -m apps.grpo.main \
  --config config/distributed.yaml \
  --trainer.procs 4 \
  --generator.procs 4
```

---

## Core API Reference

### Training Batch Format

torchforge uses dictionary-based batches for training:

```python
# inputs: list of dicts with torch.Tensor values
inputs = [{"tokens": torch.Tensor}]

# targets: list of dicts with training signals
targets = [{
    "response": torch.Tensor,
    "ref_logprobs": torch.Tensor,
    "advantages": torch.Tensor,
    "padding_mask": torch.Tensor,
}]

# train_step returns loss as float
loss = trainer.train_step(inputs, targets)
```
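The `padding_mask` marks which positions hold real response tokens (1) versus padding (0) when variable-length responses are batched to a common length; the loss divides by `padding_mask.sum()`, so padded positions contribute nothing. One way to build it, shown with plain lists for clarity (a real pipeline would produce tensors):

```python
# Pad variable-length token sequences to a common length and build
# the matching mask: 1.0 for real tokens, 0.0 for padding.
def pad_and_mask(responses: list[list[int]], pad_id: int = 0):
    max_len = max(len(r) for r in responses)
    padded, mask = [], []
    for r in responses:
        n_pad = max_len - len(r)
        padded.append(r + [pad_id] * n_pad)
        mask.append([1.0] * len(r) + [0.0] * n_pad)
    return padded, mask

padded, mask = pad_and_mask([[5, 7, 9], [3]])
print(padded)  # [[5, 7, 9], [3, 0, 0]]
print(mask)    # [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
```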

### Completion

Generated output from vLLM:

```python
@dataclass
class Completion:
    text: str              # Generated text
    token_ids: list[int]   # Token IDs
    logprobs: list[float]  # Log probabilities
    metadata: dict         # Custom metadata
```
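In practice, reward functions typically read only `text`, while loss computation consumes `token_ids` and `logprobs`, which line up one-to-one. A standalone sketch (the dataclass is redefined locally and the token IDs are made up, so the snippet runs without forge installed):

```python
from dataclasses import dataclass

@dataclass
class Completion:
    text: str              # Generated text
    token_ids: list[int]   # Token IDs
    logprobs: list[float]  # Log probabilities
    metadata: dict         # Custom metadata

c = Completion(
    text="The answer is \\boxed{4}",
    token_ids=[791, 4320, 374],     # hypothetical token IDs
    logprobs=[-0.2, -0.1, -0.3],
    metadata={"prompt_id": 0},
)
# token_ids and logprobs are parallel per-token sequences
assert len(c.token_ids) == len(c.logprobs)
avg_logprob = sum(c.logprobs) / len(c.logprobs)
print(round(avg_logprob, 2))  # -0.2
```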

---

## Built-in Loss Functions

### SimpleGRPOLoss

Loss functions are in the `forge.losses` module:

```python
from forge.losses import SimpleGRPOLoss, ReinforceLoss

# SimpleGRPOLoss for GRPO training
loss_fn = SimpleGRPOLoss(beta=0.1)

# Forward pass
loss = loss_fn(
    logprobs=logprobs,
    ref_logprobs=ref_logprobs,
    advantages=advantages,
    padding_mask=padding_mask,
)
```

### ReinforceLoss

```python
from forge.losses.reinforce_loss import ReinforceLoss

# With optional importance-ratio clipping
loss_fn = ReinforceLoss(clip_ratio=0.2)
```

---

## Common Issues and Solutions

### Issue: Not Enough GPUs

**Symptoms**: "Insufficient GPU resources" error

**Solutions**:
```yaml
# Reduce service requirements
services:
  generator:
    procs: 1
    with_gpus: true
  trainer:
    procs: 1
    with_gpus: true
# Remove ref_model (uses generator weights)
```

Or run the reference model on CPU:
```yaml
ref_model:
  with_gpus: false
```

### Issue: OOM During Generation

**Symptoms**: CUDA OOM in vLLM

**Solutions**:
```yaml
# Reduce batch size
grpo:
  n_samples: 4   # Reduce from 8

# Or reduce sequence length
training:
  seq_len: 2048
```

### Issue: Slow Weight Sync

**Symptoms**: Long pauses between training and generation

**Solutions**:
```bash
# Enable RDMA (if available)
export TORCHSTORE_USE_RDMA=1
```

Or reduce sync frequency:
```yaml
training:
  sync_interval: 10   # Sync every 10 steps
```

### Issue: Policy Collapse

**Symptoms**: Entropy drops to zero, reward stops improving

**Solutions**:
```yaml
# Increase KL penalty
grpo:
  beta: 0.2   # Increase from 0.1

# Or add entropy bonus
training:
  entropy_coef: 0.01
```
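Catching collapse early means tracking entropy between evaluations. The mean negative logprob of the sampled tokens gives a simple Monte Carlo estimate of policy entropy; when it trends toward zero, sampling has become near-deterministic. An illustrative monitoring helper (not part of the forge API):

```python
# Monte Carlo entropy estimate from sampled-token logprobs:
# the mean negative logprob of the tokens the policy sampled.
def estimate_entropy(logprobs: list[float]) -> float:
    return -sum(logprobs) / len(logprobs)

healthy   = [-2.1, -1.8, -2.5, -1.9]   # policy still exploring
collapsed = [-0.01, -0.02, -0.01]      # near-deterministic sampling
print(estimate_entropy(healthy) > estimate_entropy(collapsed))  # True
```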

---

## Resources

- **Documentation**: https://meta-pytorch.org/torchforge
- **GitHub**: https://github.com/meta-pytorch/torchforge
- **Discord**: https://discord.gg/YsTYBh6PD9
- **TorchTitan**: https://github.com/pytorch/torchtitan
- **Monarch**: https://github.com/meta-pytorch/monarch