@synsci/cli-darwin-x64 1.1.49
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
|
@@ -0,0 +1,539 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: model-merging
|
|
3
|
+
description: Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
author: Synthetic Sciences
|
|
6
|
+
license: MIT
|
|
7
|
+
tags: [Emerging Techniques, Model Merging, Mergekit, SLERP, TIES, DARE, Task Arithmetic, Model Fusion, No Retraining, Multi-Capability, Arcee AI]
|
|
8
|
+
dependencies: [mergekit, transformers, torch]
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# Model Merging: Combining Pre-trained Models
|
|
12
|
+
|
|
13
|
+
## When to Use This Skill
|
|
14
|
+
|
|
15
|
+
Use Model Merging when you need to:
|
|
16
|
+
- **Combine capabilities** from multiple fine-tuned models without retraining
|
|
17
|
+
- **Create specialized models** by blending domain-specific expertise (math + coding + chat)
|
|
18
|
+
- **Improve performance** beyond single models (gains of 5-10% on benchmarks are often reported, though results vary by model pair and task)
|
|
19
|
+
- **Reduce training costs** - no GPU required; merges can run on CPU
|
|
20
|
+
- **Experiment rapidly** - create new model variants in minutes, not days
|
|
21
|
+
- **Preserve multiple skills** - merge without catastrophic forgetting
|
|
22
|
+
|
|
23
|
+
**Success Stories**: Marcoro14-7B-slerp (best on Open LLM Leaderboard 02/2024); many top Hugging Face models use merging
|
|
24
|
+
|
|
25
|
+
**Tools**: mergekit (Arcee AI), LazyMergekit, Model Soup
|
|
26
|
+
|
|
27
|
+
## Installation
|
|
28
|
+
|
|
29
|
+
```bash
# Install mergekit from source
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Or via pip
pip install mergekit

# Optional: Transformers and PyTorch, for loading the merged model
pip install transformers torch
```
|
|
41
|
+
|
|
42
|
+
## Quick Start
|
|
43
|
+
|
|
44
|
+
### Simple Linear Merge
|
|
45
|
+
|
|
46
|
+
```yaml
# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16
```
|
|
58
|
+
|
|
59
|
+
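```bash
# Run merge (--cuda is optional; omit it to merge on CPU)
mergekit-yaml config.yml ./merged-model --cuda

# Use merged model (quick generation check via the transformers pipeline API)
python -c "from transformers import pipeline; print(pipeline('text-generation', model='./merged-model')('Hello', max_new_tokens=20)[0]['generated_text'])"
```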
|
|
66
|
+
|
|
67
|
+
### SLERP Merge (Best for 2 Models)
|
|
68
|
+
|
|
69
|
+
```yaml
# config.yml - Spherical interpolation
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1  # SLERP interpolates starting from this model
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # Interpolation factor (0 = base model, 1 = other model)
dtype: bfloat16
```
|
|
82
|
+
|
|
83
|
+
## Core Concepts
|
|
84
|
+
|
|
85
|
+
### 1. Merge Methods
|
|
86
|
+
|
|
87
|
+
**Linear (Model Soup)**
|
|
88
|
+
- Simple weighted average of parameters
|
|
89
|
+
- Fast, works well for similar models
|
|
90
|
+
- Can merge 2+ models
|
|
91
|
+
|
|
92
|
+
```python
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
```
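The weighted average above can be sketched in plain Python. This is an illustrative toy, not mergekit's implementation: `linear_merge` is a hypothetical helper, parameters are represented as flat lists of floats rather than tensors, and the `normalize` flag (which rescales weights so they sum to 1) is an assumption made for convenience.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of parameter dicts that share the same keys and shapes."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        weights = [w / total for w in weights]
    merged = {}
    for name in state_dicts[0]:
        # Walk the same parameter position across all models at once.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return merged


model1 = {"layer.weight": [1.0, 2.0]}
model2 = {"layer.weight": [3.0, 4.0]}
merged = linear_merge([model1, model2], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 3.0]}
```

With `normalize=True`, weights such as `[2, 2]` give the same result as `[0.5, 0.5]`, which keeps the merged parameters on the same scale as the inputs.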
|
|
96
|
+
|
|
97
|
+
**SLERP (Spherical Linear Interpolation)**
|
|
98
|
+
- Interpolates along an arc on a hypersphere in weight space
|
|
99
|
+
- Preserves magnitude of weight vectors
|
|
100
|
+
- Best for merging 2 models
|
|
101
|
+
- Smoother than linear
|
|
102
|
+
|
|
103
|
+
```python
# SLERP formula
merged = (sin((1-t)*θ) / sin(θ)) * model1 + (sin(t*θ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2) / (‖model1‖ * ‖model2‖)), the angle between the weight vectors
# t ∈ [0, 1]
```
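A minimal runnable sketch of the SLERP formula, assuming each tensor has been flattened to a list of floats. The function name `slerp` and the fallback to plain linear interpolation when the vectors are nearly parallel (where `sin(θ)` approaches zero) are choices made for this example, not a description of mergekit's code.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        return lerp  # angle is undefined for a zero vector
    # Angle between the two vectors, computed from the normalized dot product.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        return lerp  # nearly parallel: the formula would divide by ~0
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


start = slerp(0.0, [1.0, 0.0], [0.0, 1.0])
halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(start, halfway)
```

For two unit vectors, the halfway point also has unit length, whereas a plain average of `[1, 0]` and `[0, 1]` would shrink to length of about 0.707. That difference is what "preserves magnitude" refers to.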
|
|
109
|
+
|
|
110
|
+
**Task Arithmetic**
|
|
111
|
+
- Extract "task vectors" (fine-tuned weights minus base weights)
|
|
112
|
+
- Combine task vectors, add to base
|
|
113
|
+
- Good for merging multiple specialized models
|
|
114
|
+
|
|
115
|
+
```python
# Task vector
task_vector = finetuned_model - base_model

# Merge multiple task vectors
merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
```
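The two steps above can be made concrete with a small sketch. The helpers `task_vector` and `apply_task_vectors` are hypothetical names for illustration, and parameters are flat lists of floats standing in for tensors.

```python
def task_vector(finetuned, base):
    """Per-parameter difference between a fine-tuned model and its base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def apply_task_vectors(base, vectors, alphas):
    """Add a weighted sum of task vectors back onto the base parameters."""
    merged = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + sum(a * tv[name][i] for a, tv in zip(alphas, vectors))
            for i, b in enumerate(base_vals)
        ]
    return merged


base = {"w": [1.0, 1.0]}
math_model = {"w": [2.0, 1.0]}  # fine-tuning changed only the first entry
code_model = {"w": [1.0, 3.0]}  # fine-tuning changed only the second entry

vectors = [task_vector(math_model, base), task_vector(code_model, base)]
merged = apply_task_vectors(base, vectors, alphas=[0.5, 0.5])
print(merged)  # {'w': [1.5, 2.0]}
```

Because each toy model changed a different entry, the two task vectors do not interfere and both edits survive in the merge. When task vectors overlap with opposite signs they partially cancel, which is the problem TIES-Merging targets.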
|
|
122
|
+
|
|
123
|
+
**TIES-Merging**
|
|
124
|
+
- Task arithmetic + sparsification
|
|
125
|
+
- Resolves sign conflicts in parameters
|
|
126
|
+
- Best for merging many task-specific models
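A simplified sketch of the three TIES steps (trim, elect sign, merge only the agreeing values), operating on flat lists of task-vector entries. The names `trim` and `ties_merge` are hypothetical, per-model weights are omitted, and the sketch is meant to illustrate the idea rather than reproduce mergekit's implementation.

```python
def trim(vec, density):
    """Keep the `density` fraction of entries with the largest magnitude."""
    k = int(round(density * len(vec)))
    ranked = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def ties_merge(base, task_vectors, density=0.5):
    """Trim each task vector, elect a sign per entry, average agreeing values."""
    trimmed = [trim(tv, density) for tv in task_vectors]
    merged = []
    for i, b in enumerate(base):
        vals = [tv[i] for tv in trimmed]
        # Elected sign: the direction with the larger total magnitude.
        sign = 1.0 if sum(vals) >= 0 else -1.0
        agreeing = [v for v in vals if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + delta)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
tv_a = [0.9, -0.1, 0.5, 0.05]
tv_b = [-0.7, 0.8, 0.6, -0.02]
merged = ties_merge(base, [tv_a, tv_b], density=0.5)
print(merged)
```

In the first position the two models disagree (0.9 versus -0.7). The positive direction has the larger total, so only 0.9 is kept instead of averaging the pair down to 0.1.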
|
|
127
|
+
|
|
128
|
+
**DARE (Drop And REscale)**
|
|
129
|
+
- Randomly drops entries of the task vector (the fine-tuned minus base deltas)
|
|
130
|
+
- Rescales the remaining deltas by 1 / (1 - drop rate) to preserve their expected value
|
|
131
|
+
- Reduces redundancy, maintains performance
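The drop-and-rescale step can be sketched as follows. The helper name `dare` is hypothetical, and `density` is used here to mean the fraction of deltas kept (so the drop rate is `1 - density`), which is an assumption about how the parameter is interpreted.

```python
import random


def dare(task_vec, density, rng):
    """Keep each delta with probability `density`; rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [v / density if rng.random() < density else 0.0 for v in task_vec]


rng = random.Random(0)  # fixed seed so the example is repeatable
deltas = [0.1] * 10_000
sparse = dare(deltas, density=0.5, rng=rng)

kept = sum(1 for v in sparse if v != 0.0)
mean = sum(sparse) / len(sparse)
print(f"kept {kept} of {len(sparse)} deltas, mean delta {mean:.4f}")
```

Roughly half the deltas become zero and the survivors double, so the average delta stays close to the original 0.1. This is why the sparsified task vector can stand in for the dense one.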
|
|
132
|
+
|
|
133
|
+
### 2. Configuration Structure
|
|
134
|
+
|
|
135
|
+
```yaml
# Basic structure
merge_method: <method>  # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path>      # Base model: needed by slerp, task_arithmetic, ties, dare_ties; not used by linear

models:
  - model: <path/to/model1>
    parameters:
      weight: <float>   # Merge weight
      density: <float>  # For TIES/DARE

  - model: <path/to/model2>
    parameters:
      weight: <float>

parameters:
  # Method-specific parameters

dtype: <dtype>  # bfloat16, float16, float32

# Optional
slices:     # Layer-wise merging (used in place of models)
tokenizer:  # Tokenizer configuration
```
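As a rough illustration of how the fields above relate, the sketch below checks a config that has already been parsed into a Python dict. `check_config` is a hypothetical helper, not part of mergekit, and the rules it encodes (which methods need `base_model`, and that `models` and `slices` are alternatives) are assumptions drawn from the structure shown here.

```python
# Methods assumed to need `base_model` (an assumption for this sketch).
NEEDS_BASE_MODEL = {"slerp", "task_arithmetic", "ties", "dare_ties"}
KNOWN_DTYPES = {"bfloat16", "float16", "float32"}


def check_config(cfg):
    """Return a list of human-readable problems found in a merge config dict."""
    problems = []
    method = cfg.get("merge_method")
    if not method:
        problems.append("merge_method is required")
    elif method in NEEDS_BASE_MODEL and not cfg.get("base_model"):
        problems.append(f"{method} needs base_model")
    # Assumed rule: inputs come from either `models` or `slices`, not both.
    if bool(cfg.get("models")) == bool(cfg.get("slices")):
        problems.append("specify exactly one of models or slices")
    dtype = cfg.get("dtype")
    if dtype is not None and dtype not in KNOWN_DTYPES:
        problems.append(f"unknown dtype: {dtype}")
    return problems


good = {
    "merge_method": "linear",
    "models": [
        {"model": "model_a", "parameters": {"weight": 0.5}},
        {"model": "model_b", "parameters": {"weight": 0.5}},
    ],
    "dtype": "bfloat16",
}
bad = {"merge_method": "ties", "models": [{"model": "model_a"}], "dtype": "int4"}

print(check_config(good))
print(check_config(bad))
```

A check like this is cheap to run before starting a merge, since a misconfigured run may only fail after the model weights have been downloaded.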
|
|
159
|
+
|
|
160
|
+
## Merge Methods Guide
|
|
161
|
+
|
|
162
|
+
### Linear Merge
|
|
163
|
+
|
|
164
|
+
**Best for**: Simple model combinations, equal weighting
|
|
165
|
+
|
|
166
|
+
```yaml
merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16
```
|
|
180
|
+
|
|
181
|
+
### SLERP Merge
|
|
182
|
+
|
|
183
|
+
**Best for**: Two models, smooth interpolation
|
|
184
|
+
|
|
185
|
+
```yaml
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1  # SLERP interpolates starting from this model
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # 0.0 = base model, 1.0 = other model
dtype: bfloat16
```
|
|
197
|
+
|
|
198
|
+
**Layer-specific SLERP:**
|
|
199
|
+
|
|
200
|
+
```yaml
merge_method: slerp
base_model: model_a  # SLERP interpolates starting from this model
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn  # Attention layers
      value: 0.3
    - filter: mlp        # MLP layers
      value: 0.7
    - value: 0.5         # Default for other layers
dtype: bfloat16
```
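One way to read the `t` list above is as an ordered set of rules: the first rule whose `filter` appears in a tensor's name supplies the value, and a rule without a `filter` acts as the default. The sketch below encodes that reading. `resolve_t` is a hypothetical helper, the tensor names are illustrative, and the substring-match behavior is an assumption rather than a statement about mergekit internals.

```python
def resolve_t(tensor_name, rules):
    """Return the value of the first rule that matches `tensor_name`."""
    for rule in rules:
        name_filter = rule.get("filter")
        # A rule with no filter matches everything and serves as the default.
        if name_filter is None or name_filter in tensor_name:
            return rule["value"]
    raise KeyError(f"no rule matches {tensor_name!r}")


rules = [
    {"filter": "self_attn", "value": 0.3},
    {"filter": "mlp", "value": 0.7},
    {"value": 0.5},
]

for name in (
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.embed_tokens.weight",
):
    print(name, "->", resolve_t(name, rules))
```

Under this reading, attention weights stay closer to the first model (`t = 0.3`), MLP weights lean toward the second (`t = 0.7`), and everything else is split evenly.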

### Task Arithmetic

**Best for**: Combining specialized skills

```yaml
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1  # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B  # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B  # Code
    parameters:
      weight: 0.2
dtype: bfloat16
```
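
Task arithmetic adds weighted "task vectors" (fine-tuned weights minus base weights) back onto the base model. A minimal sketch on toy lists, with `task_arithmetic` as an illustrative name rather than a mergekit function:

```python
def task_arithmetic(base, finetuned, weights):
    """merged = base + sum_i weight_i * (finetuned_i - base)."""
    merged = list(base)
    for model, w in zip(finetuned, weights):
        for i, (m, b) in enumerate(zip(model, base)):
            merged[i] += w * (m - b)  # add the scaled task vector
    return merged


base = [1.0, 1.0]
math_model = [3.0, 1.0]  # task vector [2.0, 0.0]
chat_model = [1.0, 2.0]  # task vector [0.0, 1.0]
merged = task_arithmetic(base, [math_model, chat_model], [0.5, 0.3])
print(merged)
```

Because the weights scale differences from the base rather than the models themselves, they do not need to sum to 1.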

### TIES-Merging

**Best for**: Many models, resolving sign conflicts between them

```yaml
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5  # Keep the top 50% of this model's deltas by magnitude
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 1.0
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      density: 0.5
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
```
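
The three TIES steps (trim, elect sign, disjoint merge) can be sketched on toy data. This is a simplified, unweighted illustration under stated assumptions: every model has weight 1.0, ties at the trim threshold are kept, and `ties_merge` is a hypothetical helper, not mergekit's code.

```python
def ties_merge(base, finetuned, density):
    """Unweighted TIES sketch on flat lists of floats."""
    n = len(base)
    keep = max(1, round(density * n))
    trimmed = []
    for model in finetuned:
        delta = [m - b for m, b in zip(model, base)]
        # Trim: zero out everything below the keep-th largest magnitude
        threshold = sorted((abs(d) for d in delta), reverse=True)[keep - 1]
        trimmed.append([d if abs(d) >= threshold else 0.0 for d in delta])
    merged = []
    for i in range(n):
        column = [delta[i] for delta in trimmed]
        # Elect sign: the direction carrying the larger total mass wins
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [c for c in column if c * sign > 0]
        # Disjoint merge: average only the deltas that agree with the sign
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + update)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [4.0, -1.0, 0.5, 2.0]
model_b = [-3.0, 3.0, 0.2, -0.1]
merged = ties_merge(base, [model_a, model_b], density=0.5)
print(merged)
```

At index 0 the two models disagree in sign (4.0 versus -3.0). A plain average would nearly cancel them, whereas the elected sign keeps the larger contribution intact.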

### DARE Merge

**Best for**: Reducing redundancy

```yaml
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5  # Keep a random 50% of deltas, drop the rest
      weight: 0.6
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
parameters:
  int8_mask: true  # Use int8 for masks (saves memory)
dtype: bfloat16
```
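
The drop-and-rescale idea behind DARE can be illustrated as follows. This sketch covers only the random sparsification step on a flat list of deltas; `dare_sparsify` is an illustrative name, and the seeded `random.Random` is there only to make the example repeatable.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each delta with probability `density`; rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


rng = random.Random(0)
delta = [1.0] * 10_000
sparse = dare_sparsify(delta, density=0.5, rng=rng)

kept = sum(1 for d in sparse if d != 0.0)
mean = sum(sparse) / len(sparse)
print(f"kept {kept} of {len(sparse)} deltas, mean after rescaling = {mean:.3f}")
```

Rescaling by `1 / density` keeps the expected value of each delta unchanged, which is why a large share of deltas can often be dropped without shifting the merged weights on average.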

## Advanced Patterns

### Layer-wise Merging

```yaml
# Different models for different layers
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16]  # First half
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [16, 32]  # Second half
dtype: bfloat16
```
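
A passthrough merge copies layers rather than blending them, so the result is simply the slices stacked in order. The sketch below expands the config above into the output layer list, assuming `layer_range` is half-open (`[start, end)`); `stack_layers` is a hypothetical helper used only for illustration.

```python
def stack_layers(slices):
    """Expand (model, (start, end)) slices into the ordered output layer list."""
    layers = []
    for model, (start, end) in slices:
        # Assumption: the end index is exclusive, as in Python's range()
        layers.extend((model, index) for index in range(start, end))
    return layers


layers = stack_layers([
    ("mistralai/Mistral-7B-v0.1", (0, 16)),
    ("teknium/OpenHermes-2.5-Mistral-7B", (16, 32)),
])
print(len(layers), layers[15], layers[16])
```

Listing the layers this way is a quick sanity check that the slices cover every layer exactly once before running a long merge.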

### MoE from Merged Models

```yaml
# Create a Mixture of Experts (run with `mergekit-moe`, not `mergekit-yaml`)
base_model: mistralai/Mistral-7B-v0.1
gate_mode: hidden  # Or cheap_embed / random
experts:
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "math"
      - "calculate"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: ajibawa-2023/Code-Mistral-7B
    positive_prompts:
      - "code"
      - "python"
dtype: bfloat16
```

### Tokenizer Merging

```yaml
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5  # Linear merges need a weight per model
  - model: custom/specialized-model
    parameters:
      weight: 0.5

tokenizer:
  source: "union"  # Combine vocabularies from both models
  tokens:
    <|special_token|>:
      source: "custom/specialized-model"
```

## Best Practices

### 1. Model Compatibility

```python
# ✅ Good: Same architecture
models = [
    "mistralai/Mistral-7B-v0.1",
    "teknium/OpenHermes-2.5-Mistral-7B",  # Both Mistral 7B
]

# ❌ Bad: Different architectures
models = [
    "meta-llama/Llama-2-7b-hf",   # Llama
    "mistralai/Mistral-7B-v0.1",  # Mistral (incompatible!)
]
```
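
A quick pre-merge check is to compare the shape-defining fields of each model's `config.json`. The sketch below works on plain dictionaries; the field values are written from memory of the published configs for the two models above and should be verified against the actual files, and `shape_mismatches` is a hypothetical helper.

```python
SHAPE_KEYS = (
    "hidden_size",
    "intermediate_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "vocab_size",
)


def shape_mismatches(cfg_a, cfg_b):
    """Return {field: (value_a, value_b)} for every shape field that differs."""
    return {
        key: (cfg_a.get(key), cfg_b.get(key))
        for key in SHAPE_KEYS
        if cfg_a.get(key) != cfg_b.get(key)
    }


# Assumed values; check each model's config.json before relying on them
llama2_7b = {
    "hidden_size": 4096, "intermediate_size": 11008, "num_hidden_layers": 32,
    "num_attention_heads": 32, "num_key_value_heads": 32, "vocab_size": 32000,
}
mistral_7b = {
    "hidden_size": 4096, "intermediate_size": 14336, "num_hidden_layers": 32,
    "num_attention_heads": 32, "num_key_value_heads": 8, "vocab_size": 32000,
}

print(shape_mismatches(llama2_7b, mistral_7b))
```

Any non-empty result means at least some tensors have different shapes, so an element-wise merge of those two models cannot work as-is.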

### 2. Weight Selection

```yaml
# ✅ Good: Weights sum to 1.0
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4  # 0.6 + 0.4 = 1.0

# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
  - model: model_a
    parameters:
      weight: 0.8
  - model: model_b
    parameters:
      weight: 0.8  # May boost performance
```

### 3. Method Selection

```python
# Choose merge method based on use case:

# 2 models, smooth blend → SLERP
merge_method = "slerp"

# 3+ models, simple average → Linear
merge_method = "linear"

# Multiple task-specific models → Task Arithmetic or TIES
merge_method = "ties"

# Want to reduce redundancy → DARE
merge_method = "dare_ties"
```

### 4. Density Tuning (TIES/DARE)

```yaml
# Start conservative (keep more parameters)
parameters:
  density: 0.8  # Keep 80%

# If performance holds, increase sparsity
parameters:
  density: 0.5  # Keep 50%

# If performance degrades, reduce sparsity
parameters:
  density: 0.9  # Keep 90%
```

### 5. Layer-specific Merging

```yaml
# Preserve base model's beginning and end
merge_method: passthrough
slices:
  - sources:
      - model: base_model
        layer_range: [0, 2]  # Keep first layers
  - sources:
      - model: merged_middle  # Merge middle layers
        layer_range: [2, 30]
  - sources:
      - model: base_model
        layer_range: [30, 32]  # Keep last layers
```

## Evaluation & Testing

### Benchmark Merged Models

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Test on various tasks
test_prompts = {
    "math": "Calculate: 25 * 17 =",
    "code": "Write a Python function to reverse a string:",
    "chat": "What is the capital of France?",
}

for task, prompt in test_prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    # max_new_tokens bounds the generated text only, not prompt + output
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(f"{task}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
```

### Common Benchmarks

- **Open LLM Leaderboard**: General capabilities
- **MT-Bench**: Multi-turn conversation
- **MMLU**: Multitask accuracy
- **HumanEval**: Code generation
- **GSM8K**: Math reasoning

## Production Deployment

### Save and Upload

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Upload to HuggingFace Hub
model.push_to_hub("username/my-merged-model")
tokenizer.push_to_hub("username/my-merged-model")
```

### Quantize Merged Model

```bash
# Convert to GGUF with llama.cpp's conversion script
# (f16 output is not yet quantized; newer llama.cpp checkouts name the script convert_hf_to_gguf.py)
python convert.py ./merged-model --outtype f16 --outfile merged-model.gguf

# Quantize with GPTQ
python quantize_gptq.py ./merged-model --bits 4 --group_size 128
```

## Common Pitfalls

### ❌ Pitfall 1: Merging Incompatible Models

```yaml
# Wrong: Different architectures
models:
  - model: meta-llama/Llama-2-7b-hf   # Llama architecture
  - model: mistralai/Mistral-7B-v0.1  # Mistral architecture
```

**Fix**: Merge only models that share the same architecture

### ❌ Pitfall 2: Over-weighting One Model

```yaml
# Suboptimal: One model dominates
models:
  - model: model_a
    parameters:
      weight: 0.95  # Too high
  - model: model_b
    parameters:
      weight: 0.05  # Too low
```

**Fix**: Use more balanced weights (roughly 0.3 to 0.7 per model)

### ❌ Pitfall 3: Not Evaluating

```bash
# Wrong: Merge and deploy without testing
mergekit-yaml config.yml ./merged-model
# Deploy immediately (risky!)
```

**Fix**: Always benchmark before deploying

## Resources

- **mergekit GitHub**: https://github.com/arcee-ai/mergekit
- **HuggingFace Tutorial**: https://huggingface.co/blog/mlabonne/merge-models
- **LazyMergekit**: Automated merging notebook
- **TIES Paper**: https://arxiv.org/abs/2306.01708
- **DARE Paper**: https://arxiv.org/abs/2311.03099

## See Also

- `references/methods.md` - Deep dive into merge algorithms
- `references/examples.md` - Real-world merge configurations
- `references/evaluation.md` - Benchmarking and testing strategies