@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,315 @@
---
name: miles-rl-training
description: Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Reinforcement Learning, MoE, FP8, INT4, Enterprise, SGLang, Megatron-LM]
dependencies: [sglang-router>=0.2.3, ray, torch>=2.0.0, transformers>=4.40.0]
---

# miles: Enterprise-Grade RL for Large-Scale Model Training

miles is a high-performance, enterprise-ready RL framework optimized for large-scale model post-training. Built as a production fork of slime, it addresses critical challenges in MoE training stability, low-precision training, and train-inference alignment.

## When to Use miles

**Choose miles when you need:**
- Training 1TB+ MoE models (DeepSeek V3, Qwen3-MoE)
- FP8 or INT4 quantization-aware training
- Bit-wise identical train-inference alignment
- Speculative RL for maximum throughput
- Production stability with enterprise support

**Consider alternatives when:**
- You want the research-grade original → use **slime**
- You need flexible backend swapping → use **verl**
- You want PyTorch-native abstractions → use **torchforge**

## Key Features

### Low-Precision Training
- **Unified FP8**: End-to-end FP8 for both inference and training
- **INT4 QAT**: 1TB models on single-machine VRAM (H200)
- **Rollout Routing Replay (R3)**: Bit-wise expert alignment for MoE

### Performance Optimizations
- **Speculative RL**: 25%+ rollout speedup with online SFT draft models
- **Zero-Copy Weight Sync**: CUDA IPC zero-copy mapping
- **Partial Rollout**: Recycle half-finished trajectories

### Train-Inference Alignment
- **TIS/MIS**: Truncated/Masked Importance Sampling for off-policy correction
- **Kernel-level optimization**: FlashAttention-3, DeepGEMM integration

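The off-policy correction behind TIS can be sketched in a few lines. This is a toy illustration under assumed semantics, not miles' API; the function name and the truncation threshold `c` are hypothetical:

```python
import math

def truncated_is_weights(logp_train, logp_rollout, c=2.0):
    """Per-token importance ratios pi_train / pi_rollout, truncated at c.

    Truncation bounds the variance introduced when the inference engine's
    logprobs drift from the trainer's (the gap TIS/MIS correct for).
    """
    return [min(math.exp(t - r), c) for t, r in zip(logp_train, logp_rollout)]

# Matching logprobs give weight 1.0; a large mismatch is truncated to c.
w = truncated_is_weights([-1.0, -0.5], [-1.0, -3.0])
# w == [1.0, 2.0]
```

Masked variants (the "MIS" in TIS/MIS) drop rather than clip the offending tokens; the clipping shown here is the truncated form.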
## Installation

```bash
# Recommended: Docker
docker pull radixark/miles:latest
docker run --rm --gpus all --ipc=host --shm-size=16g \
  -it radixark/miles:latest /bin/bash

# From source
git clone https://github.com/radixark/miles.git
cd miles
pip install -r requirements.txt
pip install -e .
```

## Quick Start

miles inherits slime's configuration system. Basic training:

```bash
python train.py \
  --advantage-estimator grpo \
  --model-name qwen3-30b-a3b \
  --hf-checkpoint /path/to/qwen3-30b-a3b-hf \
  --rollout-batch-size 512 \
  --n-samples-per-prompt 8
```

---

## Workflow 1: Large MoE Training

Use this workflow for training large MoE models like DeepSeek V3 or Qwen3-MoE.

### Prerequisites Checklist
- [ ] H100/H200 GPUs with FP8 support
- [ ] MoE model (DeepSeek V3, Qwen3-MoE)
- [ ] Docker environment with miles

### Step 1: Environment Setup

```bash
# FP8 block scaling (recommended for stability)
export NVTE_FP8_BLOCK_SCALING_FP32_SCALES=1
export CUDA_DEVICE_MAX_CONNECTIONS=1
```

### Step 2: Configure Training

```bash
python train.py \
  --actor-num-gpus-per-node 8 \
  --rollout-num-gpus 8 \
  --hf-checkpoint /path/to/deepseek-v3 \
  --advantage-estimator grpo \
  --tensor-model-parallel-size 8 \
  --expert-model-parallel-size 4 \
  --prompt-data /path/to/data.jsonl \
  --num-rollout 3000
```

### Verification Checklist
- [ ] Model loads without errors
- [ ] Routing decisions are consistent
- [ ] No NaN/Inf in loss values

---

## Workflow 2: Speculative RL Training

Use this workflow for maximum rollout throughput with EAGLE speculative decoding.

### How Speculative RL Works

1. Small draft model generates candidate tokens
2. Target model verifies in parallel
3. Draft model updated via online SFT to track policy

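The steps above can be sketched with toy next-token functions. This is a hypothetical illustration of the draft-verify mechanism only; in miles the loop runs inside SGLang's EAGLE implementation:

```python
def speculative_decode(draft_next, target_next, prompt, num_draft=5, max_len=12):
    """Toy speculative decoding over token lists (greedy verification).

    draft_next/target_next map a token sequence to the next token. The
    draft proposes num_draft tokens; the target keeps the longest prefix
    it agrees with, then appends one token of its own, so progress is
    guaranteed even when every draft token is rejected.
    """
    seq = list(prompt)
    while len(seq) < max_len:
        # 1. Draft model proposes a block of candidate tokens.
        proposal = []
        for _ in range(num_draft):
            proposal.append(draft_next(seq + proposal))
        # 2. Target model verifies: accept the longest matching prefix.
        accepted = []
        for tok in proposal:
            if target_next(seq + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # 3. Target always contributes the token after the accepted prefix.
        accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:max_len]

# Toy models: target emits last+1; draft disagrees whenever that is a multiple of 5.
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + 1 if (s[-1] + 1) % 5 else s[-1] + 2
out = speculative_decode(draft, target, [0])
# out == [0, 1, ..., 11]: runs of 4 draft tokens accepted, each 5th rejected.
```

Step 3 of the workflow (online SFT of the draft) is what keeps the acceptance rate high as the policy moves during RL.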
### Step 1: Enable Speculative Decoding

miles supports EAGLE speculative decoding via SGLang:

```bash
python train.py \
  --actor-num-gpus-per-node 8 \
  --hf-checkpoint /path/to/target-model \
  --sglang-speculative-algorithm EAGLE \
  --sglang-speculative-num-steps 3 \
  --sglang-speculative-eagle-topk 1 \
  --sglang-speculative-num-draft-tokens 4 \
  --sglang-speculative-draft-model-path /path/to/draft-model \
  --advantage-estimator grpo \
  --prompt-data /path/to/data.jsonl
```

### Step 2: Enable Online MTP Training (Optional)

For online SFT of the draft model during training:

```bash
--mtp-num-layers 1 \
--enable-mtp-training \
--mtp-loss-scaling-factor 0.2
```

**Note**: Online MTP training requires a torch dist checkpoint with MTP weights. Add `--mtp-num-layers 1` during checkpoint conversion from HuggingFace.

### Expected Speedup

- **Standard rollout**: Baseline
- **Speculative RL**: 25-40% faster rollout
- **With partial rollout**: Additional 10-15% throughput

---

## Configuration Reference

miles inherits all slime arguments. See the [slime API Reference](../slime/references/api-reference.md) for the complete list.

### Cluster Resources (from slime)

```bash
--actor-num-nodes 1
--actor-num-gpus-per-node 8
--rollout-num-gpus 8
--rollout-num-gpus-per-engine 2
--colocate
```

### Megatron Parallelism (from slime)

```bash
--tensor-model-parallel-size 8
--pipeline-model-parallel-size 2
--expert-model-parallel-size 4  # MoE expert parallelism
```

### Speculative Decoding (miles-specific)

```bash
--sglang-speculative-algorithm EAGLE
--sglang-speculative-num-steps 3
--sglang-speculative-eagle-topk 1
--sglang-speculative-num-draft-tokens 4
--sglang-enable-draft-weights-cpu-backup
--sglang-speculative-draft-model-path /your/draft/model/path
```

### Online MTP Training (miles-specific)

```bash
--mtp-num-layers 1
--enable-mtp-training
--mtp-loss-scaling-factor 0.2
```

---

## Key Features (Conceptual)

The following features are documented in miles, but the specific CLI flags may vary. Consult the miles repository for the latest configuration.

### Unified FP8 Pipeline

End-to-end FP8 sampling and training that eliminates the quantization-induced discrepancy that causes RL collapse in MoE models.

### Rollout Routing Replay (R3)

Records expert routing decisions during SGLang inference and replays them during Megatron training for bit-wise expert alignment.

**How R3 Works**:
1. During SGLang inference, expert routing decisions are recorded
2. Routing decisions are stored in `sample.rollout_routed_experts`
3. During Megatron training, routing is replayed instead of recomputed
4. Ensures identical expert selection between train and inference

221
|
+
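The steps above can be sketched in a few lines. This is an illustrative toy (NumPy, hypothetical helper names), not miles' implementation: the top-k expert indices recorded at inference are reused verbatim at training time instead of re-running the router's top-k, so tiny numeric drift between engines can never flip an expert choice.

```python
import numpy as np

def topk_experts(router_logits, k=2):
    """Standard routing: pick the k highest-scoring experts per token."""
    return np.argsort(-router_logits, axis=-1)[:, :k]

# -- Inference (SGLang side): record routing per token --
rng = np.random.default_rng(0)
logits_infer = rng.normal(size=(4, 8))       # 4 tokens, 8 experts
recorded = topk_experts(logits_infer)        # -> sample.rollout_routed_experts

# -- Training (Megatron side): replay instead of recompute --
# The training engine sees slightly different logits (kernel/numeric drift),
# but R3 ignores them and reuses the recorded indices.
logits_train = logits_infer + 1e-3 * rng.normal(size=(4, 8))
replayed = recorded                          # replay, not topk_experts(logits_train)

assert np.array_equal(replayed, recorded)    # expert selection matches bit-wise
```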
### INT4 Quantization-Aware Training

Enables single-machine deployment of 1TB+ models (e.g., on H200).

**Memory Savings with INT4**:

| Model Size | BF16 VRAM | INT4 VRAM | Reduction |
|------------|-----------|-----------|-----------|
| 70B | 140GB | 45GB | 3.1x |
| 235B | 470GB | 150GB | 3.1x |
| 671B | 1.3TB | 420GB | 3.1x |
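The arithmetic behind the table can be sketched as follows. This is an estimate under assumptions (group size, scale precision) that this document does not specify; real deployments also keep some layers in higher precision and add KV-cache and activation memory, which is why the table's INT4 figures sit above pure weight storage:

```python
def weight_gb(n_params_billion, bits_per_weight, scale_bytes=2.0, group_size=128):
    """Approximate weight storage in GB: quantized weights plus one
    higher-precision scale per `group_size` weights (assumed layout)."""
    bytes_per_param = bits_per_weight / 8 + scale_bytes / group_size
    return n_params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 = GB

bf16 = weight_gb(70, 16, scale_bytes=0)  # 140.0 GB, matching the table
int4 = weight_gb(70, 4)                  # ~36 GB for the weights alone
```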
### Train-Inference Alignment

miles achieves "exactly 0 KL divergence" between training and inference through:
- Flash Attention 3
- DeepGEMM
- Batch-invariant kernels from Thinking Machines Lab
- `torch.compile` integration
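One way to verify this claim on your own runs is to compare per-token log-probs from the two engines: a Monte-Carlo KL estimate over the sampled tokens is zero exactly when they agree. A generic diagnostic sketch, not a miles utility:

```python
def mc_kl(rollout_logprobs, train_logprobs):
    """Monte-Carlo estimate of KL(pi_rollout || pi_train) over the tokens
    actually sampled: the mean of log p_rollout - log p_train."""
    assert len(rollout_logprobs) == len(train_logprobs)
    diffs = [r - t for r, t in zip(rollout_logprobs, train_logprobs)]
    return sum(diffs) / len(diffs)

# Bit-wise aligned engines give identical log-probs -> KL estimate is 0.
assert mc_kl([-1.2, -0.3, -2.5], [-1.2, -0.3, -2.5]) == 0.0
```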
---

## Sample Data Structure

miles uses the same `Sample` dataclass as slime, with the `rollout_routed_experts` field added for MoE routing replay:

```python
@dataclass
class Sample:
    prompt: str | list[dict]
    tokens: list[int]
    response: str
    reward: float | dict
    loss_mask: list[int]
    status: Status
    metadata: dict
    rollout_log_probs: list[float]
    rollout_routed_experts: list[list[int]]  # MoE routing for R3
```

See the [slime API Reference](../slime/references/api-reference.md) for the complete `Sample` definition.

---
## Common Issues and Solutions

### Issue: FP8 Training Collapse

**Symptoms**: Loss explodes; NaN values appear.

**Solutions**:
- Use block scaling: `export NVTE_FP8_BLOCK_SCALING_FP32_SCALES=1`
- Reduce the learning rate: `--lr 5e-7`
- Ensure MoE routing is consistent between training and inference

### Issue: Speculative Draft Drift

**Symptoms**: Acceptance rate degrades over time.

**Solutions**:
- Enable online MTP training to keep the draft model aligned
- Reduce speculative steps: `--sglang-speculative-num-steps 2`
- Use the CPU backup: `--sglang-enable-draft-weights-cpu-backup`

### Issue: Train-Inference Mismatch

**Symptoms**: Policy divergence; reward collapse.

**Solutions**:
- Use TIS for off-policy correction: `--use-tis --tis-threshold 0.9`
- Verify that log probs match between SGLang and Megatron
- Enable R3 for MoE models
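The TIS correction referenced above can be sketched as a per-token truncated importance weight. This is the generic formulation; the exact semantics of `--tis-threshold` in slime/miles may differ from the simple cap shown here:

```python
import math

def tis_weight(train_logprob, rollout_logprob, cap):
    """Truncated importance sampling: per-token ratio pi_train/pi_rollout,
    truncated at `cap` to bound the variance of off-policy gradients."""
    ratio = math.exp(train_logprob - rollout_logprob)
    return min(ratio, cap)

# Small mismatch passes through; large ratios are truncated at the cap.
w_small = tis_weight(-1.00, -1.05, cap=2.0)  # exp(0.05) ~ 1.05
w_large = tis_weight(-0.10, -2.00, cap=2.0)  # exp(1.9) ~ 6.7 -> capped at 2.0
```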
---

## Supported Models

| Family | Models | MoE Support |
|--------|--------|-------------|
| DeepSeek | R1, V3, V3.2 | Full |
| Qwen | 2, 2.5, 3 (including MoE) | Full |
| Llama | 3, 3.1, 3.3, 4 | Dense only |
| Gemma | 2, 3, 3N | Dense only |
| GLM | 4.5, 4.6, 4.7 | Dense only |
| MiniMax | M2, M2.1 | Full |

---

## Resources

- **GitHub**: https://github.com/radixark/miles
- **Introduction Blog**: https://lmsys.org/blog/2025-11-19-miles/
- **Slime (upstream)**: https://github.com/THUDM/slime
- **SGLang**: https://github.com/sgl-project/sglang
# miles API Reference

## Overview

miles is an enterprise-grade RL framework built on slime, adding advanced features for large-scale MoE training:

- Unified FP8 training and inference
- INT4 Quantization-Aware Training
- Rollout Routing Replay (R3)
- Speculative RL training

**Note**: miles inherits slime's configuration system. See the [slime API Reference](../../slime/references/api-reference.md) for the base arguments.

## Core Data Structures

miles uses the same `Sample` dataclass as slime, with the `rollout_routed_experts` field added for MoE routing replay.

## Quick Start

```bash
python train.py \
  --advantage-estimator grpo \
  --model-name qwen3-30b-a3b \
  --hf-checkpoint /path/to/qwen3-30b-a3b-hf \
  --rollout-batch-size 512 \
  --n-samples-per-prompt 8
```
## Configuration Options

miles inherits slime's three argument categories (Megatron, SGLang via the `--sglang-` prefix, and slime-specific). Key additions:

### Cluster Resources (inherited from slime)

```bash
--actor-num-nodes 1
--actor-num-gpus-per-node 8
--rollout-num-gpus 8
--rollout-num-gpus-per-engine 2
--colocate
```

### Megatron Parallelism (inherited from slime)

```bash
--tensor-model-parallel-size 8
--pipeline-model-parallel-size 2
--expert-model-parallel-size 4  # MoE expert parallelism
```

### Speculative Decoding

Verified flags from the miles documentation:

```bash
# Basic speculative decoding
--sglang-speculative-algorithm EAGLE
--sglang-speculative-num-steps 3
--sglang-speculative-eagle-topk 1
--sglang-speculative-num-draft-tokens 4
--sglang-enable-draft-weights-cpu-backup

# Draft model path
--sglang-speculative-draft-model-path /your/draft/model/path

# Online SFT for the draft model (MTP)
--mtp-num-layers 1
--enable-mtp-training
--mtp-loss-scaling-factor 0.2
```

**Note**: Online MTP training requires a torch dist checkpoint with MTP weights. Add `--mtp-num-layers 1` during checkpoint conversion from HuggingFace to torch dist format.
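For intuition on the flags above: with `--sglang-speculative-num-draft-tokens 4`, the expected number of tokens emitted per target-model verification pass follows a geometric sum, under the simplifying assumption that each draft token is accepted independently with probability p. A back-of-envelope model, not SGLang's actual scheduler:

```python
def expected_tokens_per_pass(p, n_draft):
    """1 + p + p^2 + ... + p^n: the target model always emits at least one
    token, and each additional draft token survives with probability p."""
    return sum(p ** k for k in range(n_draft + 1))

# At 80% per-token acceptance with 4 draft tokens:
expected_tokens_per_pass(0.8, 4)  # ~3.36 tokens per verification pass
```

Speedup comes from amortizing one expensive target-model forward over several emitted tokens, minus the cost of running the draft model.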
## Key Features (Conceptual)

The following features are documented in miles, but their specific CLI flags are not publicly documented; consult the miles repository for the latest configuration options.

### Unified FP8 Pipeline

End-to-end FP8 sampling and training that eliminates the quantization-induced discrepancy that causes RL collapse in MoE models.

### Rollout Routing Replay (R3)

Records expert routing decisions during SGLang inference and replays them during Megatron training for bit-wise expert alignment.

**How R3 Works**:
1. During SGLang inference, expert routing decisions are recorded
2. Routing decisions are stored in `sample.rollout_routed_experts`
3. During Megatron training, routing is replayed instead of recomputed
4. This ensures identical expert selection between training and inference

### INT4 Quantization-Aware Training

Enables single-machine deployment of 1TB+ models (e.g., on H200).

**Memory Savings with INT4**:

| Model Size | BF16 VRAM | INT4 VRAM | Reduction |
|------------|-----------|-----------|-----------|
| 70B | 140GB | 45GB | 3.1x |
| 235B | 470GB | 150GB | 3.1x |
| 671B | 1.3TB | 420GB | 3.1x |

### Train-Inference Alignment

miles achieves "exactly 0 KL divergence" between training and inference through infrastructure optimizations:
- Flash Attention 3
- DeepGEMM
- Batch-invariant kernels from Thinking Machines Lab
- `torch.compile` integration

### Truncated/Masked Importance Sampling (TIS/MIS)

Algorithmic corrections for off-policy training. See the slime documentation for the `--use-tis` flag.
## Custom Functions

Same interface as slime:

```bash
--custom-generate-function-path generate.py
--custom-rm-path reward.py
```
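For illustration, a minimal rule-based `reward.py` might look like the following. The function name and signature here are hypothetical; this document does not show the interface slime actually expects, so check the slime documentation for the real contract before using this shape:

```python
# reward.py -- hypothetical custom reward model. The name `reward` and the
# single-argument signature are assumptions, not slime's documented API.

def reward(sample) -> float:
    """Toy rule-based reward: 1.0 if the response contains a boxed answer."""
    response = getattr(sample, "response", "")
    return 1.0 if "\\boxed{" in response else 0.0
```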
## Supported Models

| Family | Models | MoE Support |
|--------|--------|-------------|
| DeepSeek | R1, V3, V3.2 | Full |
| Qwen | 2, 2.5, 3 (including MoE) | Full |
| Llama | 3, 3.1, 3.3, 4 | Dense only |
| Gemma | 2, 3, 3N | Dense only |
| GLM | 4.5, 4.6, 4.7 | Dense only |
| MiniMax | M2, M2.1 | Full |

## Resources

- GitHub: https://github.com/radixark/miles
- Introduction Blog: https://lmsys.org/blog/2025-11-19-miles/
- Slime (upstream): https://github.com/THUDM/slime
- SGLang: https://github.com/sgl-project/sglang