@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,413 @@

# RadixAttention Deep Dive

Complete guide to RadixAttention, SGLang's key innovation for automatic prefix caching.
## What is RadixAttention?

**RadixAttention** is an algorithm that automatically caches and reuses the KV cache for common prefixes across requests, using a radix tree data structure.

**Key insight**: in real-world LLM serving:
- System prompts are repeated across requests
- Few-shot examples are shared
- Multi-turn conversations build on previous context
- Agent tools/functions are defined once

**Problem with traditional serving**:
- Every request recomputes the entire prompt
- Wasteful for shared prefixes
- Often 5-10× slower than necessary

**RadixAttention solution**:
- Build a radix tree over all processed tokens
- Automatically detect shared prefixes
- Reuse the KV cache for matching tokens
- Compute only new/different tokens
## How It Works

### Radix Tree Structure

```
Example requests:
1. "System: You are helpful\nUser: What's AI?"
2. "System: You are helpful\nUser: What's ML?"
3. "System: You are helpful\nUser: What's DL?"

Radix tree:
Root
└── "System: You are helpful\nUser: What's "
    ├── "AI?" → [KV cache for request 1]
    ├── "ML?" → [KV cache for request 2]
    └── "DL?" → [KV cache for request 3]

Shared prefix: "System: You are helpful\nUser: What's "
→ Computed once, reused 3 times
→ Nearly 3× less prefill compute for these 3 requests
```
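The tree above can be modeled with a toy token-level prefix cache. This is an illustrative sketch, not SGLang's implementation (which compresses edges into token segments and manages GPU memory blocks); it only shows how a lookup decides how many leading tokens are reusable.

```python
# Toy sketch of a token-level prefix cache: insert() records that KV cache
# exists for every prefix of a request; match_prefix() counts how many
# leading tokens of a new request can reuse cached KV entries.

class RadixNode:
    def __init__(self):
        self.children = {}   # token id -> RadixNode
        self.has_kv = False  # KV cache exists for the path ending here

class PrefixCache:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens):
        """Record that KV cache now exists for every prefix of `tokens`."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            node.has_kv = True

    def match_prefix(self, tokens):
        """Return the number of leading tokens whose KV cache is reusable."""
        node, matched = self.root, 0
        for t in tokens:
            if t in node.children and node.children[t].has_kv:
                node = node.children[t]
                matched += 1
            else:
                break
        return matched

cache = PrefixCache()
cache.insert([15496, 1917])              # "Hello world"
print(cache.match_prefix([15496, 612]))  # "Hello there" -> 1 token reused
```

A production tree additionally compresses single-child chains into one edge, which is what makes it a radix tree rather than a plain trie.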
### Token-Level Matching

RadixAttention works at the token level:

```
# Request 1: "Hello world"
Tokens: [15496, 1917]   # Hello=15496, world=1917
→ KV cache computed and stored in the tree

# Request 2: "Hello there"
Tokens: [15496, 612]    # Hello=15496, there=612
→ Reuses the KV cache for token 15496
→ Computes only token 612
→ Half the prefill work for this request
```
### Automatic Eviction

When memory is full:
1. **LRU policy**: evict the least recently used prefixes
2. **Leaf-first**: remove leaf nodes before internal nodes
3. **Preserves common prefixes**: frequently used prefixes stay cached

```
Before eviction (memory full):
Root
├── "System A" (used 5 min ago)
│   ├── "Task 1" (used 1 min ago)   ← keep (recent)
│   └── "Task 2" (used 30 min ago)  ← evict (old + leaf)
└── "System B" (used 60 min ago)    ← evict (very old)

After eviction:
Root
└── "System A"
    └── "Task 1"
```
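The three rules above can be sketched as a toy model (again an illustration, not SGLang's code): track idle time per node and repeatedly drop the stalest *leaf*, so shared internal prefixes survive until their children are gone.

```python
# Leaf-first LRU eviction sketch: each node records minutes since last use;
# evict_leaves() removes the stalest leaf first, never an internal node.

class Node:
    def __init__(self, name, idle_min=0, children=None):
        self.name = name
        self.idle_min = idle_min       # minutes since last use
        self.children = children or []

def evict_leaves(root, count):
    """Remove up to `count` leaves, stalest first; return their names."""
    evicted = []
    for _ in range(count):
        leaves = []
        def collect(node, parent):
            if not node.children:
                leaves.append((node, parent))
            for child in node.children:
                collect(child, node)
        for child in root.children:
            collect(child, root)
        if not leaves:
            break
        leaf, parent = max(leaves, key=lambda lp: lp[0].idle_min)
        parent.children.remove(leaf)
        evicted.append(leaf.name)
    return evicted

tree = Node("root", children=[
    Node("System A", 5, [Node("Task 1", 1), Node("Task 2", 30)]),
    Node("System B", 60),
])
print(evict_leaves(tree, 2))  # ['System B', 'Task 2']
```

Note that "System A" is never a candidate while "Task 1" exists: internal nodes become evictable only after all of their children are gone, which is exactly the leaf-first rule.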
## Performance Analysis

### Few-Shot Prompting

**Scenario**: 10 examples in the prompt (2000 tokens), user query (50 tokens)

**Without RadixAttention** (vLLM):
- Request 1: compute 2050 tokens (2000 examples + 50 query)
- Request 2: compute 2050 tokens (recompute all examples)
- Request 3: compute 2050 tokens (recompute all examples)
- Total: 6150 tokens computed

**With RadixAttention** (SGLang):
- Request 1: compute 2050 tokens (initial)
- Request 2: reuse 2000 tokens, compute 50 (query only)
- Request 3: reuse 2000 tokens, compute 50 (query only)
- Total: 2150 tokens computed
- **Speedup: 2.86×** (6150 / 2150)
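The arithmetic above generalizes to any number of requests. A small helper (hypothetical, for the worked numbers only): `n` requests share a prefix of `p` tokens, each adding a unique suffix of `s` tokens.

```python
# Total prefill tokens computed, with and without prefix caching.

def prefill_tokens(n, p, s, cached):
    if cached:
        return (p + s) + (n - 1) * s  # prefix computed once, suffixes always
    return n * (p + s)                # everything recomputed per request

# Few-shot scenario from the text: p=2000, s=50, n=3
without = prefill_tokens(3, 2000, 50, cached=False)
with_cache = prefill_tokens(3, 2000, 50, cached=True)
print(without, with_cache, round(without / with_cache, 2))  # 6150 2150 2.86
```

As `n` grows, the ratio approaches `(p + s) / s`, which is why long shared prefixes with short unique suffixes benefit the most.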
### Agent Workflows

**Scenario**: system prompt (1000 tokens) + tools (500 tokens) + query (100 tokens)

**Without RadixAttention**:
- Request 1: 1600 tokens
- Request 2: 1600 tokens
- Request 3: 1600 tokens
- Total: 4800 tokens

**With RadixAttention**:
- Request 1: 1600 tokens (initial)
- Request 2: reuse 1500, compute 100
- Request 3: reuse 1500, compute 100
- Total: 1800 tokens
- **Speedup: 2.67×** (4800 / 1800)
### Multi-Turn Conversations

**Scenario**: a conversation grows from 100 → 500 → 1000 tokens

| Turn | Tokens | vLLM | SGLang (RadixAttention) |
|------|--------|------|-------------------------|
| 1 | 100 | 100 | 100 (initial) |
| 2 | 500 | 500 | 400 (reuse 100) |
| 3 | 1000 | 1000 | 500 (reuse 500) |
| **Total** | | **1600** | **1000** |
| **Speedup** | | | **1.6×** |

As the conversation grows, the speedup increases.
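The table reduces to a one-liner: with caching, turn *k* recomputes only the tokens added since turn *k−1*. A quick check of the totals:

```python
# Prompt length at each turn; with prefix caching, each turn pays only
# for the newly appended tokens.

lengths = [100, 500, 1000]
no_cache = sum(lengths)                                          # recompute all
cached = lengths[0] + sum(b - a for a, b in zip(lengths, lengths[1:]))
print(no_cache, cached, round(no_cache / cached, 1))  # 1600 1000 1.6
```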
## Benchmarks

### Throughput Comparison (Llama-3-8B, A100)

| Workload | Prefix Length | vLLM | SGLang | Speedup |
|----------|---------------|------|--------|---------|
| Simple generation | 0 | 2500 tok/s | 2800 tok/s | 1.12× |
| Few-shot (5 ex) | 1000 | 800 tok/s | 3200 tok/s | 4× |
| Few-shot (10 ex) | 2000 | 500 tok/s | 5000 tok/s | **10×** |
| Agent (tools) | 1500 | 800 tok/s | 4000 tok/s | 5× |
| Chat (history) | 500-2000 | 1200 tok/s | 3600 tok/s | 3× |

**Key insight**: longer shared prefixes yield bigger speedups.
### Latency Reduction

**Agent workflow** (1000-token system prompt):

| Metric | vLLM | SGLang | Improvement |
|--------|------|--------|-------------|
| First request | 1.8s | 1.8s | Same (no cache) |
| Subsequent requests | 1.8s | **0.35s** | **5× faster** |
| P50 latency (100 req) | 1.8s | 0.42s | 4.3× faster |
| P99 latency | 2.1s | 0.58s | 3.6× faster |
### Memory Efficiency

**Without RadixAttention**:
- Each request stores its own KV cache
- 100 requests with a 2000-token prefix = 200K tokens cached
- Memory: roughly 24 GB (Llama-3-8B, FP16: about 128 KB of KV cache per token)

**With RadixAttention**:
- The prefix is stored once in the radix tree
- 100 requests share the 2000-token prefix
- Memory: roughly 0.25 GB for the shared prefix, plus each request's unique tokens
- **Savings: ~99%** on the shared portion
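As a sanity check on scale, KV-cache memory can be estimated directly from model geometry. The numbers below assume Llama-3-8B's published shape (32 layers, 8 KV heads under GQA, head dimension 128) and FP16 storage:

```python
# Per-token KV-cache size: K and V, per layer, per KV head, per head dim.

def kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * dtype_bytes  # 2 = K and V

prefix_bytes = 2000 * kv_bytes_per_token()     # one 2000-token prefix
print(kv_bytes_per_token() // 1024)            # 128  (KiB per token)
print(round(100 * prefix_bytes / 2**30, 1))    # 24.4 (GiB if duplicated 100x)
print(round(prefix_bytes / 2**30, 2))          # 0.24 (GiB if stored once)
```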
## Configuration

### Enable/Disable RadixAttention

```bash
# Enabled by default
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct

# Disable (for comparison)
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --disable-radix-cache
```
### Cache Size Tuning

```bash
# Set max cache size (default: 90% of GPU memory)
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --max-radix-cache-len 16384  # cache at most 16K tokens

# Reserve GPU memory for the KV cache
    --mem-fraction-static 0.85   # use 85% of GPU memory for the cache
```
### Eviction Policy

```bash
# LRU eviction (default)
--eviction-policy lru

# FIFO eviction
--eviction-policy fifo
```

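The eviction policy decides which cached prefixes to drop when the cache is full: LRU keeps recently reused prefixes, FIFO drops the oldest regardless of use. A minimal LRU sketch of the idea (illustrative only — SGLang evicts least-recently-used radix-tree branches, not flat entries like this):

```python
from collections import OrderedDict

class LRUPrefixCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # prefix -> cached KV handle

    def get(self, prefix):
        if prefix not in self.entries:
            return None
        self.entries.move_to_end(prefix)  # mark as recently used
        return self.entries[prefix]

    def put(self, prefix, kv):
        self.entries[prefix] = kv
        self.entries.move_to_end(prefix)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = LRUPrefixCache(capacity=2)
cache.put("system-A", "kv_a")
cache.put("system-B", "kv_b")
cache.get("system-A")          # touch A, so B becomes least recently used
cache.put("system-C", "kv_c")  # capacity exceeded → evicts "system-B"
print(list(cache.entries))     # → ['system-A', 'system-C']
```

Under FIFO, `system-A` would have been evicted instead, since recency of use is ignored.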
## Best Practices

### Design prompts for prefix sharing

**Bad** (no prefix sharing):
```python
# Each request has a unique prefix
request_1 = "User Alice asks: What is AI?"
request_2 = "User Bob asks: What is ML?"
request_3 = "User Carol asks: What is DL?"

# No common prefix → no speedup
```

**Good** (maximize prefix sharing):
```python
# Shared system prompt
system = "You are a helpful AI assistant.\n\n"

request_1 = system + "User: What is AI?"
request_2 = system + "User: What is ML?"
request_3 = system + "User: What is DL?"

# Shared prefix → 5× speedup!
```

### Structure agent prompts

```python
# Template layered for maximum caching
@sgl.function
def agent_template(s, user_query):
    # Layer 1: System prompt (always cached)
    s += "You are a helpful assistant.\n\n"

    # Layer 2: Tool definitions (always cached)
    s += "Available tools:\n"
    s += "- get_weather(location)\n"
    s += "- send_email(to, subject, body)\n\n"

    # Layer 3: Examples (always cached)
    s += "Examples:\n"
    s += "User: What's the weather?\n"
    s += "Assistant: <tool>get_weather('NYC')</tool>\n\n"

    # Layer 4: User query (unique per request)
    s += f"User: {user_query}\n"
    s += "Assistant: "
    s += sgl.gen("response", max_tokens=200)

# Layers 1-3 are cached; only Layer 4 is computed.
# ~5× faster for typical agent queries.
```

### Optimize few-shot prompting

```python
# BAD: unique query first, examples after
def bad_few_shot(s, query):
    s += f"Query: {query}\n"  # Unique prefix
    s += "Example 1: ..."     # Falls after unique text → can't be cached
    s += "Example 2: ..."
    s += sgl.gen("answer")

# GOOD: examples first, then query
def good_few_shot(s, query):
    # Examples form the shared prefix (always cached)
    s += "Example 1: ...\n"
    s += "Example 2: ...\n"
    s += "Example 3: ...\n\n"

    # Query is the unique suffix (computed)
    s += f"Query: {query}\n"
    s += sgl.gen("answer")

# Up to 10× faster with RadixAttention
```

## Monitoring

### Cache hit rate

```python
# Check cache statistics from the server's stats endpoint
import requests

response = requests.get("http://localhost:30000/stats")
stats = response.json()

print(f"Cache hit rate: {stats['radix_cache_hit_rate']:.2%}")
print(f"Tokens cached: {stats['radix_cache_tokens']}")
print(f"Cache size: {stats['radix_cache_size_mb']} MB")

# Target: >80% hit rate for agent/few-shot workloads
```

### Optimization metrics

```bash
# Monitor cache usage
curl http://localhost:30000/metrics | grep radix

# Key metrics:
# - radix_cache_hit_tokens: tokens reused from cache
# - radix_cache_miss_tokens: tokens computed (not cached)
# - radix_cache_evictions: number of evictions (should be low)
```

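The hit rate is just the ratio of the two token counters above. A small helper (the counter names come from the metrics listed; everything else is illustrative):

```python
def radix_hit_rate(hit_tokens, miss_tokens):
    """Fraction of prompt tokens served from the radix cache."""
    total = hit_tokens + miss_tokens
    return hit_tokens / total if total else 0.0

# e.g. 180K tokens reused vs 20K computed
print(f"{radix_hit_rate(180_000, 20_000):.0%}")  # → 90%
```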
## Advanced Patterns

### Hierarchical caching

```python
@sgl.function
def hierarchical_agent(s, domain, task, query):
    # Level 1: Global system prompt (cached across all requests)
    s += "You are an AI assistant.\n\n"

    # Level 2: Domain knowledge (cached per domain)
    s += f"Domain: {domain}\n"
    s += f"Knowledge: {get_domain_knowledge(domain)}\n\n"

    # Level 3: Task context (cached per task)
    s += f"Task: {task}\n"
    s += f"Instructions: {get_task_instructions(task)}\n\n"

    # Level 4: User query (unique)
    s += f"Query: {query}\n"
    s += sgl.gen("response")

# Example cache tree:
# Root
# └── "You are an AI assistant.\n\n"   (L1)
#     ├── "Domain: Finance\n..."       (L2)
#     │   ├── "Task: Analysis\n..."    (L3)
#     │   │   └── "Query: ..."         (L4)
#     │   └── "Task: Forecast\n..."    (L3)
#     └── "Domain: Legal\n..."         (L2)
```

### Batch requests with common prefix

```python
# All requests share the system prompt
system_prompt = "You are a helpful assistant.\n\n"

queries = [
    "What is AI?",
    "What is ML?",
    "What is DL?",
]

# Run in batch (RadixAttention optimizes automatically)
results = sgl.run_batch([
    agent.bind(prefix=system_prompt, query=q)
    for q in queries
])

# System prompt computed once, shared across all 3 requests:
# ~3× faster than sequential
```

## Troubleshooting

### Low cache hit rate (<50%)

**Causes**:
1. Prompts have no common structure
2. Dynamic content in the prefix (timestamps, IDs)
3. Cache size too small (frequent evictions)

**Solutions**:
1. Restructure prompts (shared prefix first)
2. Move dynamic content to the suffix
3. Increase `--max-radix-cache-len`

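Cause 2 is the most common: a timestamp at the top of the prompt makes every prefix unique. Moving dynamic content after the static sections restores sharing (a sketch with hypothetical prompt text):

```python
import time

def bad_prompt(query):
    # Timestamp first → every request has a unique prefix, ~0% cache hits
    return f"[{time.time()}] You are a helpful assistant.\n\nUser: {query}\n"

def good_prompt(query):
    # Static text first → long shared prefix; dynamic parts go last
    return f"You are a helpful assistant.\n\nUser: {query}\n[sent: {time.time()}]\n"

def shared_prefix_len(a, b):
    # Length of the common prefix two prompts could share in the cache
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

p1, p2 = good_prompt("What is AI?"), good_prompt("What is ML?")
print(shared_prefix_len(p1, p2))  # whole system prompt + "User: What is " shared
```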
### High memory usage

**Cause**: Too many unique prefixes cached

**Solutions**:
```bash
# Reduce cache size
--max-radix-cache-len 8192

# Shrink the cache memory pool (forces more aggressive eviction)
--mem-fraction-static 0.75
```

### Performance worse than vLLM

**Cause**: No prefix sharing in the workload

**Solution**: RadixAttention adds a small lookup overhead when nothing is shared. For simple generation workloads without repeated prefixes, vLLM may be the better fit.

## Comparison with Other Systems

| System | Prefix Caching | Automatic | Performance |
|--------|----------------|-----------|-------------|
| **SGLang** | ✅ RadixAttention | ✅ Automatic | 5-10× for agents |
| vLLM | ⚠️ Opt-in (`--enable-prefix-caching`) | ❌ Off by default | Baseline without it |
| Text Generation Inference | ✅ Prefix caching | ❌ Manual | 2-3× (if configured) |
| TensorRT-LLM | ✅ Static prefix | ❌ Manual | 2× (if configured) |

**SGLang advantage**: Fully automatic. No configuration is needed, and it works for any workload with prefix sharing.
|