@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,336 @@
# Supported Models

Reference list of the model architectures supported by LitGPT, with parameter sizes and variants.

## Overview

LitGPT supports **20+ model families** with **100+ model variants**, ranging from 14M (Pythia) to 405B (Llama 3.1) parameters.

**List all models**:
```bash
litgpt download list
```

**List pretrain-capable models**:
```bash
litgpt pretrain list
```
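
Both listings are long, so you may want to narrow one to a single family with `grep`. The sketch below assumes each command prints one repository ID or model name per line. `filter_models` is a hypothetical helper name and is not part of LitGPT.

```bash
#!/usr/bin/env bash
# filter_models (hypothetical helper): read model IDs on stdin, one per line,
# and keep only those matching a case-insensitive extended regex.
filter_models() {
  local pattern="$1"
  grep -iE -- "$pattern"
}

# Example usage (requires LitGPT to be installed):
#   litgpt download list | filter_models 'qwen2\.5'
#   litgpt pretrain list | filter_models 'pythia|tinyllama'
```

When nothing matches, `grep` exits with status 1. Scripts running under `set -e` should therefore append `|| true` to the pipeline.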

## Model Families

### Llama Family

**Llama 3, 3.1, 3.2, 3.3**:
- **Sizes**: 1B, 3B, 8B, 70B, 405B
- **Use Cases**: General-purpose, long-context (128K tokens in 3.1 and later), multimodal
- **Best For**: Production applications, research, instruction following

**Code Llama**:
- **Sizes**: 7B, 13B, 34B, 70B
- **Use Cases**: Code generation, completion, infilling
- **Best For**: Programming assistants, code analysis

**Function Calling Llama 2**:
- **Sizes**: 7B
- **Use Cases**: Tool use, API integration
- **Best For**: Agents, function execution

**Llama 2**:
- **Sizes**: 7B, 13B, 70B
- **Use Cases**: General-purpose (predecessor to Llama 3)
- **Best For**: Established baselines, research comparisons

**Llama 3.1 Nemotron**:
- **Sizes**: 70B
- **Use Cases**: NVIDIA-optimized variant
- **Best For**: Enterprise deployments

**TinyLlama**:
- **Sizes**: 1.1B
- **Use Cases**: Edge devices, resource-constrained environments
- **Best For**: Fast inference, mobile deployment

**OpenLLaMA**:
- **Sizes**: 3B, 7B, 13B
- **Use Cases**: Open-source Llama reproduction
- **Best For**: Research, education

**Vicuna**:
- **Sizes**: 7B, 13B, 33B
- **Use Cases**: Chatbot, instruction following
- **Best For**: Conversational AI

**R1 Distill Llama**:
- **Sizes**: 8B, 70B
- **Use Cases**: Distilled reasoning models
- **Best For**: Efficient reasoning tasks

**MicroLlama**:
- **Sizes**: 300M
- **Use Cases**: Extremely small Llama variant
- **Best For**: Prototyping, testing

**Platypus**:
- **Sizes**: 7B, 13B, 70B
- **Use Cases**: STEM-focused fine-tune
- **Best For**: Science, math, technical domains

### Mistral Family

**Mistral**:
- **Sizes**: 7B, 123B
- **Use Cases**: Efficient open models, long-context
- **Best For**: Cost-effective deployments

**Mathstral**:
- **Sizes**: 7B
- **Use Cases**: Math reasoning
- **Best For**: Mathematical problem solving

**Mixtral MoE**:
- **Sizes**: 8×7B (47B total, 13B active), 8×22B (141B total, 39B active)
- **Use Cases**: Sparse mixture of experts
- **Best For**: High capacity with lower compute
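
The total and active counts follow from how Mixtral is built. Only the feed-forward blocks are replicated per expert, while attention and embedding weights are shared, and each token is routed to two of the eight experts. Let $S$ be the shared parameters and $E$ be one expert's feed-forward parameters summed over all layers. Solving with the rounded figures above gives an approximate split. This is a back-of-the-envelope sketch, not official numbers:

```math
\begin{aligned}
N_{\text{total}} &= S + 8E, \qquad N_{\text{active}} = S + 2E \\
E &= \frac{N_{\text{total}} - N_{\text{active}}}{6}, \qquad S = N_{\text{active}} - 2E \\
8{\times}7\text{B}:\quad E &= \frac{47 - 13}{6} \approx 5.7\text{B}, \qquad S \approx 13 - 11.3 \approx 1.7\text{B} \\
8{\times}22\text{B}:\quad E &= \frac{141 - 39}{6} = 17\text{B}, \qquad S = 39 - 34 = 5\text{B}
\end{aligned}
```

Because the shared weights are counted only once, 8×7B totals roughly 47B rather than 8 × 7 = 56B. Per-token compute scales with the roughly 13B active parameters.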

### Falcon Family

**Falcon**:
- **Sizes**: 7B, 40B, 180B
- **Use Cases**: Open-source models from TII
- **Best For**: Multilingual applications

**Falcon 3**:
- **Sizes**: 1B, 3B, 7B, 10B
- **Use Cases**: Newer Falcon generation
- **Best For**: Efficient multilingual models

### Phi Family (Microsoft)

**Phi 1.5 & 2**:
- **Sizes**: 1.3B, 2.7B
- **Use Cases**: Small language models with strong performance
- **Best For**: Edge deployment, low-resource environments

**Phi 3 & 3.5**:
- **Sizes**: 3.8B
- **Use Cases**: Improved small models
- **Best For**: Mobile, browser-based applications

**Phi 4**:
- **Sizes**: 14B
- **Use Cases**: Mid-size, high-performance model
- **Best For**: Balance of size and capability

**Phi 4 Mini Instruct**:
- **Sizes**: 3.8B
- **Use Cases**: Instruction-tuned variant
- **Best For**: Chat, task completion

### Gemma Family (Google)

**Gemma**:
- **Sizes**: 2B, 7B
- **Use Cases**: Google's open models
- **Best For**: Research, education

**Gemma 2**:
- **Sizes**: 2B, 9B, 27B
- **Use Cases**: Second-generation Gemma
- **Best For**: Enhanced performance

**Gemma 3**:
- **Sizes**: 1B, 4B, 12B, 27B
- **Use Cases**: Latest Gemma generation
- **Best For**: State-of-the-art open models

**CodeGemma**:
- **Sizes**: 7B
- **Use Cases**: Code-specialized Gemma
- **Best For**: Code generation, analysis

### Qwen Family (Alibaba)

**Qwen2.5**:
- **Sizes**: 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B
- **Use Cases**: General-purpose multilingual models
- **Best For**: Chinese/English applications

**Qwen2.5 Coder**:
- **Sizes**: 0.5B, 1.5B, 3B, 7B, 14B, 32B
- **Use Cases**: Code-specialized variants
- **Best For**: Programming in multiple languages

**Qwen2.5 Math**:
- **Sizes**: 1.5B, 7B, 72B
- **Use Cases**: Mathematical reasoning
- **Best For**: Math problems, STEM education

**QwQ & QwQ-Preview**:
- **Sizes**: 32B
- **Use Cases**: Reasoning-focused models that produce long chain-of-thought outputs
- **Best For**: Reasoning tasks
|
|
172
|
+
|
|
173
|
+
### Pythia Family (EleutherAI)
|
|
174
|
+
|
|
175
|
+
**Pythia**:
|
|
176
|
+
- **Sizes**: 14M, 31M, 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B
|
|
177
|
+
- **Use Cases**: Research, interpretability
|
|
178
|
+
- **Best For**: Scientific studies, ablations
|
|
179
|
+
|
|
180
|
+
### StableLM Family (Stability AI)
|
|
181
|
+
|
|
182
|
+
**StableLM**:
|
|
183
|
+
- **Sizes**: 3B, 7B
|
|
184
|
+
- **Use Cases**: Open models from Stability AI
|
|
185
|
+
- **Best For**: Research, commercial use
|
|
186
|
+
|
|
187
|
+
**StableLM Zephyr**:
|
|
188
|
+
- **Sizes**: 3B
|
|
189
|
+
- **Use Cases**: Instruction-tuned variant
|
|
190
|
+
- **Best For**: Chat applications
|
|
191
|
+
|
|
192
|
+
**StableCode**:
|
|
193
|
+
- **Sizes**: 3B
|
|
194
|
+
- **Use Cases**: Code generation
|
|
195
|
+
- **Best For**: Programming tasks
|
|
196
|
+
|
|
197
|
+
**FreeWilly2 (Stable Beluga 2)**:
|
|
198
|
+
- **Sizes**: 70B
|
|
199
|
+
- **Use Cases**: Large Stability AI model
|
|
200
|
+
- **Best For**: High-capability tasks
|
|
201
|
+
|
|
202
|
+
### Other Models
|
|
203
|
+
|
|
204
|
+
**Danube2**:
|
|
205
|
+
- **Sizes**: 1.8B
|
|
206
|
+
- **Use Cases**: Efficient small model
|
|
207
|
+
- **Best For**: Resource-constrained environments

**Dolly**:
- **Sizes**: 3B, 7B, 12B
- **Use Cases**: Databricks' instruction-following model
- **Best For**: Enterprise applications

**LongChat**:
- **Sizes**: 7B, 13B
- **Use Cases**: Extended context windows
- **Best For**: Long-document understanding

**Nous-Hermes**:
- **Sizes**: 7B, 13B, 70B
- **Use Cases**: Instruction-following fine-tune
- **Best For**: Task completion, reasoning

**OLMo**:
- **Sizes**: 1B, 7B
- **Use Cases**: Allen AI's fully open model
- **Best For**: Research transparency

**RedPajama-INCITE**:
- **Sizes**: 3B, 7B
- **Use Cases**: Open reproduction project
- **Best For**: Research, education

**Salamandra**:
- **Sizes**: 2B, 7B
- **Use Cases**: Multilingual European model
- **Best For**: European language support

**SmolLM2**:
- **Sizes**: 135M, 360M, 1.7B
- **Use Cases**: Ultra-small models
- **Best For**: Edge devices, testing

## Download Examples

**Download a specific model**:
```bash
litgpt download meta-llama/Llama-3.2-1B
litgpt download microsoft/phi-2
litgpt download google/gemma-2-9b
```
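
To fetch several models in one go, a small shell wrapper can loop over repo IDs and skip any that are already on disk. This is a sketch, not part of litgpt: `download_missing` and `CHECKPOINT_DIR` are hypothetical names, and it assumes `litgpt download` places converted weights at `checkpoints/<org>/<model>/lit_model.pth` by default.

```bash
# Hypothetical helper (not a litgpt command): download each repo ID unless a
# converted checkpoint already exists. Assumes litgpt's default layout of
# checkpoints/<org>/<model>/lit_model.pth; set CHECKPOINT_DIR if you
# download to a different directory.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-checkpoints}"

download_missing() {
  local repo_id
  for repo_id in "$@"; do
    if [ -f "$CHECKPOINT_DIR/$repo_id/lit_model.pth" ]; then
      echo "skip: $repo_id (already in $CHECKPOINT_DIR)"
    else
      echo "download: $repo_id"
      # Stop at the first failed download instead of silently continuing.
      litgpt download "$repo_id" || return 1
    fi
  done
}
```

For example, `download_missing meta-llama/Llama-3.2-1B microsoft/phi-2 google/gemma-2-9b` fetches only the models that are not yet present.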

**Download with a Hugging Face token** (for gated models):
```bash
export HF_TOKEN=hf_...
litgpt download meta-llama/Llama-3.1-405B
```
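
Gated downloads fail without a token, so a wrapper can check for `HF_TOKEN` before starting a large transfer. A minimal sketch: `download_gated` is a hypothetical helper, it assumes `litgpt download` reads the token from the `HF_TOKEN` environment variable as in the example above, and you still need to have been granted access on the model's Hugging Face page.

```bash
# Hypothetical wrapper: fail fast with a clear message if no token is set,
# rather than letting a gated download fail partway through.
download_gated() {
  local repo_id="$1"
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "error: HF_TOKEN is not set; request access to $repo_id on Hugging Face and export a token first" >&2
    return 1
  fi
  litgpt download "$repo_id"
}
```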

## Model Selection Guide

### By Use Case

**General Chat/Instruction Following**:
- Small: Phi-2 (2.7B), TinyLlama (1.1B)
- Medium: Llama-3.1-8B, Mistral-7B
- Large: Llama-3.1-70B, Mixtral-8x22B

**Code Generation**:
- Small: Qwen2.5-Coder-3B
- Medium: CodeLlama-13B, CodeGemma-7B
- Large: CodeLlama-70B, Qwen2.5-Coder-32B

**Math/Reasoning**:
- Small: Qwen2.5-Math-1.5B
- Medium: Mathstral-7B, Qwen2.5-Math-7B
- Large: QwQ-32B, Qwen2.5-Math-72B

**Multilingual**:
- Small: Salamandra-2B
- Medium: Qwen2.5-7B, Falcon-7B
- Large: Qwen2.5-72B

**Research/Education**:
- Pythia family (14M-12B for ablations)
- OLMo (fully open)
- TinyLlama (fast iteration)

### By Hardware

**Consumer GPU (8-16GB VRAM)**:
- Phi-2 (2.7B)
- TinyLlama (1.1B)
- Gemma-2B
- SmolLM2 family

**Single A100 (40-80GB)**:
- Llama-3.1-8B
- Mistral-7B
- CodeLlama-13B
- Gemma-2-9B

**Multi-GPU (200GB+ total)**:
- Llama-3.1-70B (TP=4)
- Mixtral-8x22B (TP=2)
- Falcon-40B

**Large Cluster**:
- Llama-3.1-405B (FSDP)
- Falcon-180B
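
The hardware tiers above track weight memory closely. A rough rule of thumb is parameters (in billions) × bytes per parameter ≈ gigabytes of weights. The helper below is a hypothetical sketch of that arithmetic only; it ignores activations, the KV cache, and training state, which all add to the total.

```bash
# Hypothetical helper: approximate memory for model weights only, in GB.
#   GB ≈ parameters (billions) × bytes per parameter
# Typical bytes per parameter: bf16/fp16 = 2, int8 = 1, 4-bit ≈ 0.5.
# Activations, the KV cache, and (for training) gradients and optimizer
# state all come on top of this figure.
weights_gb() {
  local params_b="$1" bytes_per_param="$2"
  awk -v p="$params_b" -v b="$bytes_per_param" 'BEGIN { printf "%.1f\n", p * b }'
}

weights_gb 2.7 2   # Phi-2 in bf16: prints 5.4
weights_gb 70 2    # Llama-3.1-70B in bf16: prints 140.0
weights_gb 405 2   # Llama-3.1-405B in bf16: prints 810.0
```

By this estimate Llama-3.1-70B needs about 140 GB for bf16 weights alone, which is why it sits in the multi-GPU tier rather than on a single 80 GB A100.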

## Model Capabilities

### Context Lengths

| Model | Context Window (tokens) |
|-------|-------------------------|
| Llama 3.1 | 128K |
| Llama 3.2/3.3 | 128K |
| Mistral Large 2 (123B) | 128K |
| Mixtral 8x7B | 32K |
| Mixtral 8x22B | 64K |
| Gemma 2 | 8K |
| Phi-3 | Up to 128K (varies by variant) |
| Qwen2.5 | 32K |
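
When planning a generation, the prompt plus the tokens to be generated must stay within the window. A minimal sketch, assuming 1K = 1024 tokens (so 8K = 8,192) and that you already know the prompt's token count from the model's tokenizer; `fits_context` is a hypothetical helper, not a litgpt command.

```bash
# Hypothetical helper: succeed (exit status 0) if prompt + generated tokens
# fit in a context window given in "K" units, assuming 1K = 1024 tokens.
fits_context() {
  local window_k="$1" prompt_tokens="$2" new_tokens="$3"
  local window=$(( window_k * 1024 ))
  (( prompt_tokens + new_tokens <= window ))
}

# Gemma 2 (8K): a 7,500-token prompt leaves room for 500 new tokens.
if fits_context 8 7500 500; then echo "fits"; else echo "too long"; fi
```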

### Training Data

- **Llama 3**: 15T+ tokens (multilingual)
- **Mistral**: Web data, code
- **Qwen**: Multilingual (Chinese/English focus)
- **Pythia**: The Pile (every size trained on the same data in the same order)

## References

- LitGPT GitHub: https://github.com/Lightning-AI/litgpt
- Model configs: `litgpt/config.py`
- Download tutorial: `tutorials/download_model_weights.md`