@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,539 @@
+ ---
+ name: model-merging
+ description: Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.
+ version: 1.0.0
+ author: Synthetic Sciences
+ license: MIT
+ tags: [Emerging Techniques, Model Merging, Mergekit, SLERP, TIES, DARE, Task Arithmetic, Model Fusion, No Retraining, Multi-Capability, Arcee AI]
+ dependencies: [mergekit, transformers, torch]
+ ---
+
+ # Model Merging: Combining Pre-trained Models
+
+ ## When to Use This Skill
+
+ Use Model Merging when you need to:
+ - **Combine capabilities** from multiple fine-tuned models without retraining
+ - **Create specialized models** by blending domain-specific expertise (math + coding + chat)
+ - **Improve performance** beyond single models (often +5-10% on benchmarks)
+ - **Reduce training costs** - no GPUs needed, merges run on CPU
+ - **Experiment rapidly** - create new model variants in minutes, not days
+ - **Preserve multiple skills** - merge without catastrophic forgetting
+
+ **Success Stories**: Marcoro14-7B-slerp (top of the Open LLM Leaderboard in early 2024); many top Hugging Face models use merging
+
+ **Tools**: mergekit (Arcee AI), LazyMergekit, Model Soup
+
+ ## Installation
+
+ ```bash
+ # Install mergekit from source
+ git clone https://github.com/arcee-ai/mergekit.git
+ cd mergekit
+ pip install -e .
+
+ # Or via pip
+ pip install mergekit
+
+ # Optional: Transformers library
+ pip install transformers torch
+ ```
+
+ ## Quick Start
+
+ ### Simple Linear Merge
+
+ ```yaml
+ # config.yml - Merge two models with equal weights
+ merge_method: linear
+ models:
+   - model: mistralai/Mistral-7B-v0.1
+     parameters:
+       weight: 0.5
+   - model: teknium/OpenHermes-2.5-Mistral-7B
+     parameters:
+       weight: 0.5
+ dtype: bfloat16
+ ```
+
+ ```bash
+ # Run merge (--cuda is optional; the merge also runs on CPU)
+ mergekit-yaml config.yml ./merged-model --cuda
+
+ # Use merged model
+ python -c "from transformers import pipeline; print(pipeline('text-generation', model='./merged-model')('Hello')[0]['generated_text'])"
+ ```
+
+ ### SLERP Merge (Best for 2 Models)
+
+ ```yaml
+ # config.yml - Spherical interpolation
+ merge_method: slerp
+ base_model: mistralai/Mistral-7B-v0.1 # slerp needs one of the two models as base
+ slices:
+   - sources:
+       - model: mistralai/Mistral-7B-v0.1
+         layer_range: [0, 32]
+       - model: teknium/OpenHermes-2.5-Mistral-7B
+         layer_range: [0, 32]
+ parameters:
+   t: 0.5 # Interpolation factor (0=base model, 1=other model)
+ dtype: bfloat16
+ ```
+
+ ## Core Concepts
+
+ ### 1. Merge Methods
+
+ **Linear (Model Soup)**
+ - Simple weighted average of parameters
+ - Fast, works well for similar models
+ - Can merge 2+ models
+
+ ```python
+ merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
+ # where w1 + w2 + w3 = 1
+ ```
+
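As a minimal sketch of the weighted average above, the following pure-Python function merges toy "state dicts" stored as plain lists of floats. The helper name `linear_merge` and the list-based representation are illustrative assumptions, not part of mergekit; the weights are normalized here so that they sum to 1.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model's named tensors (assumption for illustration).
StateDict = Dict[str, List[float]]


def linear_merge(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Weighted average of matching parameters (hypothetical helper)."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # enforce w1 + w2 + ... = 1

    merged: StateDict = {}
    for name in models[0]:
        # Walk the same parameter across all models, element by element.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(w * v for w, v in zip(norm, col)) for col in columns]
    return merged


model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 6.0]}
merged = linear_merge([model_a, model_b], [0.5, 0.5])
print(merged)
```

With equal weights each merged value is the plain mean of the two inputs; unequal weights pull the result toward the more heavily weighted model.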
+ **SLERP (Spherical Linear Interpolation)**
+ - Interpolates along a sphere in weight space
+ - Preserves magnitude of weight vectors
+ - Best for merging 2 models
+ - Smoother than linear
+
+ ```python
+ # SLERP formula
+ merged = (sin((1-t)*θ) / sin(θ)) * model1 + (sin(t*θ) / sin(θ)) * model2
+ # where θ = arccos(dot(model1, model2) / (‖model1‖ * ‖model2‖))
+ # t ∈ [0, 1]
+ ```
+
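The SLERP formula can be sketched on flat Python lists as follows. This is an illustration under stated assumptions rather than mergekit's implementation: the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation for nearly parallel vectors are choices made for this example.

```python
import math
from typing import List


def slerp(t: float, v0: List[float], v1: List[float], eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flattened weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two vectors, computed from the normalized dot product.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly (anti-)parallel vectors: the formula is ill-conditioned.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    c0 = math.sin((1 - t) * theta) / sin_theta
    c1 = math.sin(t * theta) / sin_theta
    return [c0 * a + c1 * b for a, b in zip(v0, v1)]


mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(mid)
```

For two orthogonal unit vectors, the midpoint stays on the unit circle, whereas a linear average of the same vectors would shrink to a norm of about 0.71.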
+ **Task Arithmetic**
+ - Extract "task vectors" (fine-tuned - base)
+ - Combine task vectors, add to base
+ - Good for merging multiple specialized models
+
+ ```python
+ # Task vector
+ task_vector = finetuned_model - base_model
+
+ # Merge multiple task vectors
+ merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
+ ```
+
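A small runnable version of the task-vector arithmetic, again on toy list-based state dicts, might look like this. The helper names `task_vector` and `task_arithmetic` and the example values are hypothetical and only illustrate the two steps described above.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model's named tensors (assumption for illustration).
StateDict = Dict[str, List[float]]


def task_vector(finetuned: StateDict, base: StateDict) -> StateDict:
    """Per-parameter difference between a fine-tuned model and its base."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])] for name in base}


def task_arithmetic(base: StateDict, finetuned_models: Sequence[StateDict],
                    alphas: Sequence[float]) -> StateDict:
    """Add scaled task vectors to a copy of the base model."""
    merged = {name: list(values) for name, values in base.items()}
    for model, alpha in zip(finetuned_models, alphas):
        delta = task_vector(model, base)
        for name, values in delta.items():
            merged[name] = [m + alpha * d for m, d in zip(merged[name], values)]
    return merged


base = {"w": [1.0, 1.0]}
math_model = {"w": [2.0, 1.0]}  # fine-tuning moved only the first value
code_model = {"w": [1.0, 3.0]}  # fine-tuning moved only the second value
merged = task_arithmetic(base, [math_model, code_model], [0.5, 0.5])
print(merged)
```

Because each toy model changed a different parameter, the merged result carries half of each change without the two updates interfering.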
+ **TIES-Merging**
+ - Task arithmetic + sparsification
+ - Resolves sign conflicts in parameters
+ - Best for merging many task-specific models
+
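The three TIES steps (trim small deltas, elect a sign per parameter, average only the deltas that agree with it) can be sketched as below. This is a simplified illustration on flat lists; the function names, the `lam` scaling factor, and the toy values are assumptions for this example and do not reproduce mergekit's code.

```python
from typing import List, Sequence


def trim(delta: List[float], density: float) -> List[float]:
    """Keep the `density` fraction of entries with the largest magnitude."""
    k = int(round(density * len(delta)))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(delta)]


def ties_merge(base: List[float], finetuned: Sequence[List[float]],
               density: float = 0.5, lam: float = 1.0) -> List[float]:
    """Trim task vectors, elect a sign per parameter, average agreeing deltas."""
    deltas = [trim([f - b for f, b in zip(model, base)], density) for model in finetuned]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # Elected sign: direction with the larger total magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        mean = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + lam * mean)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [4.0, 0.1, 3.0, 0.2]
model_b = [-2.0, 0.3, 5.0, 0.1]
merged = ties_merge(base, [model_a, model_b], density=0.5)
print(merged)
```

In the first position the two models disagree in sign (4.0 versus -2.0); the elected sign is positive, so only the 4.0 delta is kept, where a plain average would have diluted it to 1.0.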
+ **DARE (Drop And REscale)**
+ - Randomly drops fine-tuned parameter deltas (differences from the base model)
+ - Rescales the remaining deltas so the expected update is unchanged
+ - Reduces redundancy, maintains performance
+
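Drop-and-rescale for a single fine-tuned model can be sketched as follows. The function name `dare`, the `seed` argument, and the toy inputs are assumptions for illustration; `density` is treated as the fraction of deltas kept, so kept deltas are divided by `density` to preserve the expected update.

```python
import random
from typing import List, Optional


def dare(base: List[float], finetuned: List[float], density: float,
         seed: Optional[int] = None) -> List[float]:
    """Randomly keep a `density` fraction of deltas and rescale them by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        keep = rng.random() < density  # drop with probability 1 - density
        merged.append(b + (delta / density if keep else 0.0))
    return merged


# Every delta is 1.0, so each output is either 0.0 (dropped) or 2.0 (kept, rescaled).
sparse = dare([0.0] * 1000, [1.0] * 1000, density=0.5, seed=0)
print(sum(sparse) / len(sparse))
```

The average of the sparsified deltas stays close to the original delta of 1.0, which is the point of the rescaling step. The `dare_ties` method additionally applies TIES-style sign election when several models are combined.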
+ ### 2. Configuration Structure
+
+ ```yaml
+ # Basic structure
+ merge_method: <method> # linear, slerp, ties, dare_ties, task_arithmetic
+ base_model: <path> # Base model; needed for slerp, task_arithmetic, ties, dare_ties
+
+ models:
+   - model: <path/to/model1>
+     parameters:
+       weight: <float> # Merge weight
+       density: <float> # For TIES/DARE
+
+   - model: <path/to/model2>
+     parameters:
+       weight: <float>
+
+ parameters:
+   # Method-specific parameters
+
+ dtype: <dtype> # bfloat16, float16, float32
+
+ # Optional
+ slices: # Layer-wise merging
+ tokenizer: # Tokenizer configuration
+ ```
+
+ ## Merge Methods Guide
+
+ ### Linear Merge
+
+ **Best for**: Simple model combinations, equal weighting
+
+ ```yaml
+ merge_method: linear
+ models:
+   - model: WizardLM/WizardMath-7B-V1.1
+     parameters:
+       weight: 0.4
+   - model: teknium/OpenHermes-2.5-Mistral-7B
+     parameters:
+       weight: 0.3
+   - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
+     parameters:
+       weight: 0.3
+ dtype: bfloat16
+ ```
+
181
+ ### SLERP Merge
182
+
183
+ **Best for**: Two models, smooth interpolation
184
+
185
+ ```yaml
186
+ merge_method: slerp
187
+ slices:
188
+ - sources:
189
+ - model: mistralai/Mistral-7B-v0.1
190
+ layer_range: [0, 32]
191
+ - model: teknium/OpenHermes-2.5-Mistral-7B
192
+ layer_range: [0, 32]
193
+ parameters:
194
+ t: 0.5 # 0.0 = first model, 1.0 = second model
195
+ dtype: bfloat16
196
+ ```
197
+
198
+ **Layer-specific SLERP:**
199
+
200
+ ```yaml
201
+ merge_method: slerp # mergekit also expects a top-level base_model naming one of the two models
201
+ slices:
202
+   - sources:
203
+       - model: model_a
204
+         layer_range: [0, 32]
205
+       - model: model_b
206
+         layer_range: [0, 32]
207
+ parameters:
208
+   t:
209
+     - filter: self_attn # Attention layers
210
+       value: 0.3
211
+     - filter: mlp # MLP layers
212
+       value: 0.7
213
+     - value: 0.5 # Default for other layers
215
+ dtype: bfloat16
216
+ ```
217
+
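The interpolation behind SLERP can be sketched for two flat vectors. This is a minimal sketch of the standard spherical interpolation formula, assuming a fallback to plain linear interpolation when the vectors are nearly parallel or one of them is zero; it is not a copy of mergekit's code.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp  # angle is undefined for a zero vector
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    theta = math.acos(cos_theta)
    if abs(math.sin(theta)) < eps:
        return lerp  # nearly parallel: SLERP degenerates to LERP
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Unlike a straight average, the midpoint of two orthogonal unit vectors keeps unit length here, which is the property that motivates SLERP for blending two models.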
218
+ ### Task Arithmetic
219
+
220
+ **Best for**: Combining specialized skills
221
+
222
+ ```yaml
223
+ merge_method: task_arithmetic
224
+ base_model: mistralai/Mistral-7B-v0.1
225
+ models:
226
+   - model: WizardLM/WizardMath-7B-V1.1 # Math
226
+     parameters:
227
+       weight: 0.5
228
+   - model: teknium/OpenHermes-2.5-Mistral-7B # Chat
229
+     parameters:
230
+       weight: 0.3
231
+   - model: ajibawa-2023/Code-Mistral-7B # Code
232
+     parameters:
233
+       weight: 0.2
235
+ dtype: bfloat16
236
+ ```
237
+
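The weights in a task arithmetic config scale each model's task vector (fine-tuned minus base) before it is added back to the base. A minimal sketch on flat lists follows; the function name `task_arithmetic` and the toy values are illustrative assumptions.

```python
def task_arithmetic(base, finetuned, weights):
    """Return base + sum_i weights[i] * (finetuned[i] - base), element-wise."""
    merged = list(base)
    for model, w in zip(finetuned, weights):
        for i, (p, b) in enumerate(zip(model, base)):
            merged[i] += w * (p - b)  # add the scaled task vector entry
    return merged


base = [1.0, 1.0]
math_model = [2.0, 1.0]  # hypothetical: changed only the first parameter
chat_model = [1.0, 3.0]  # hypothetical: changed only the second parameter
print(task_arithmetic(base, [math_model, chat_model], [0.5, 0.5]))
```

Because the weights scale deltas rather than whole models, they do not have to sum to 1.0.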
238
+ ### TIES-Merging
239
+
240
+ **Best for**: Many models, resolving conflicts
241
+
242
+ ```yaml
243
+ merge_method: ties
244
+ base_model: mistralai/Mistral-7B-v0.1
245
+ models:
246
+   - model: WizardLM/WizardMath-7B-V1.1
246
+     parameters:
247
+       density: 0.5 # Keep top 50% of parameters
248
+       weight: 1.0
249
+   - model: teknium/OpenHermes-2.5-Mistral-7B
250
+     parameters:
251
+       density: 0.5
252
+       weight: 1.0
253
+   - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
254
+     parameters:
255
+       density: 0.5
256
+       weight: 1.0
257
+ parameters:
258
+   normalize: true
260
+ dtype: bfloat16
261
+ ```
262
+
263
+ ### DARE Merge
264
+
265
+ **Best for**: Reducing redundancy
266
+
267
+ ```yaml
268
+ merge_method: dare_ties
269
+ base_model: mistralai/Mistral-7B-v0.1
270
+ models:
271
+   - model: WizardLM/WizardMath-7B-V1.1
271
+     parameters:
272
+       density: 0.5 # Drop 50% of deltas
273
+       weight: 0.6
274
+   - model: teknium/OpenHermes-2.5-Mistral-7B
275
+     parameters:
276
+       density: 0.5
277
+       weight: 0.4
278
+ parameters:
279
+   int8_mask: true # Use int8 for masks (saves memory)
281
+ dtype: bfloat16
282
+ ```
283
+
284
+ ## Advanced Patterns
285
+
286
+ ### Layer-wise Merging
287
+
288
+ ```yaml
289
+ # Different models for different layers
290
+ merge_method: passthrough
291
+ slices:
292
+   - sources:
292
+       - model: mistralai/Mistral-7B-v0.1
293
+         layer_range: [0, 16] # First half
294
+   - sources:
295
+       - model: teknium/OpenHermes-2.5-Mistral-7B
296
+         layer_range: [16, 32] # Second half
298
+ dtype: bfloat16
299
+ ```
300
+
301
+ ### MoE from Merged Models
302
+
303
+ ```yaml
304
+ # Create Mixture of Experts (run with mergekit-moe, not mergekit-yaml)
304
+ gate_mode: hidden # mergekit-moe configs use gate_mode (hidden, cheap_embed, or random) rather than merge_method
305
+ base_model: mistralai/Mistral-7B-v0.1
306
+ experts:
307
+   - source_model: WizardLM/WizardMath-7B-V1.1
308
+     positive_prompts:
309
+       - "math"
310
+       - "calculate"
311
+   - source_model: teknium/OpenHermes-2.5-Mistral-7B
312
+     positive_prompts:
313
+       - "chat"
314
+       - "conversation"
315
+   - source_model: ajibawa-2023/Code-Mistral-7B
316
+     positive_prompts:
317
+       - "code"
318
+       - "python"
320
+ dtype: bfloat16
321
+ ```
322
+
323
+ ### Tokenizer Merging
324
+
325
+ ```yaml
326
+ merge_method: linear
327
+ models:
328
+   - model: mistralai/Mistral-7B-v0.1
328
+   - model: custom/specialized-model
329
+
330
+ tokenizer:
331
+   source: "union" # Combine vocabularies from both models
332
+   tokens:
333
+     <|special_token|>:
334
+       source: "custom/specialized-model"
336
+ ```
337
+
338
+ ## Best Practices
339
+
340
+ ### 1. Model Compatibility
341
+
342
+ ```python
343
+ # ✅ Good: Same architecture
344
+ models = [
345
+     "mistralai/Mistral-7B-v0.1",
345
+     "teknium/OpenHermes-2.5-Mistral-7B", # Both Mistral 7B
346
+ ]
347
+
348
+ # ❌ Bad: Different architectures
349
+ models = [
350
+     "meta-llama/Llama-2-7b-hf", # Llama
351
+     "mistralai/Mistral-7B-v0.1", # Mistral (incompatible!)
353
+ ]
354
+ ```
355
+
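A compatibility check can be automated by comparing a few architecture fields before merging. The sketch below works on plain dictionaries; in practice they might come from something like `AutoConfig.from_pretrained(name).to_dict()`. The helper name `config_mismatches`, the chosen keys, and the example values are assumptions for illustration.

```python
# Fields that typically have to match for weights to line up tensor by tensor.
KEYS = (
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "vocab_size",
)


def config_mismatches(cfg_a, cfg_b, keys=KEYS):
    """Return {key: (value_a, value_b)} for every compared field that differs."""
    return {
        k: (cfg_a.get(k), cfg_b.get(k))
        for k in keys
        if cfg_a.get(k) != cfg_b.get(k)
    }


# Illustrative config values, not read from the Hub.
mistral_like = {"model_type": "mistral", "hidden_size": 4096,
                "num_hidden_layers": 32, "num_attention_heads": 32,
                "vocab_size": 32000}
llama_like = dict(mistral_like, model_type="llama")

print(config_mismatches(mistral_like, llama_like))
```

An empty result suggests the models are candidates for merging; a non-empty one points at the field to investigate. A vocabulary-size mismatch, for example, may be handled with the tokenizer options rather than ruling the merge out.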
356
+ ### 2. Weight Selection
357
+
358
+ ```yaml
359
+ # ✅ Good: Weights sum to 1.0
360
+ models:
361
+   - model: model_a
361
+     parameters:
362
+       weight: 0.6
363
+   - model: model_b
364
+     parameters:
365
+       weight: 0.4 # 0.6 + 0.4 = 1.0
366
+
367
+ # ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
368
+ models:
369
+   - model: model_a
370
+     parameters:
371
+       weight: 0.8
372
+   - model: model_b
373
+     parameters:
374
+       weight: 0.8 # May boost performance
376
+ ```
377
+
378
+ ### 3. Method Selection
379
+
380
+ ```python
381
+ # Choose merge method based on use case:
382
+
383
+ # 2 models, smooth blend → SLERP
384
+ merge_method = "slerp"
385
+
386
+ # 3+ models, simple average → Linear
387
+ merge_method = "linear"
388
+
389
+ # Multiple task-specific models → Task Arithmetic or TIES
390
+ merge_method = "ties"
391
+
392
+ # Want to reduce redundancy → DARE
393
+ merge_method = "dare_ties"
394
+ ```
395
+
396
+ ### 4. Density Tuning (TIES/DARE)
397
+
398
+ ```yaml
399
+ # Start conservative (keep more parameters)
400
+ parameters:
401
+   density: 0.8 # Keep 80%
401
+
402
+ # If performance is good, increase sparsity
403
+ parameters:
404
+   density: 0.5 # Keep 50%
405
+
406
+ # If performance degrades, reduce sparsity
407
+ parameters:
408
+   density: 0.9 # Keep 90%
410
+ ```
411
+
412
+ ### 5. Layer-specific Merging
413
+
414
+ ```yaml
415
+ # Preserve base model's beginning and end
416
+ merge_method: passthrough
417
+ slices:
418
+   - sources:
418
+       - model: base_model
419
+         layer_range: [0, 2] # Keep first layers
420
+   - sources:
421
+       - model: merged_middle # Merge middle layers
422
+         layer_range: [2, 30]
423
+   - sources:
424
+       - model: base_model
425
+         layer_range: [30, 32] # Keep last layers
427
+ ```
428
+
429
+ ## Evaluation & Testing
430
+
431
+ ### Benchmark Merged Models
432
+
433
+ ```python
434
+ from transformers import AutoModelForCausalLM, AutoTokenizer
435
+
436
+ # Load merged model
437
+ model = AutoModelForCausalLM.from_pretrained("./merged-model")
438
+ tokenizer = AutoTokenizer.from_pretrained("./merged-model")
439
+
440
+ # Test on various tasks
441
+ test_prompts = {
442
+     "math": "Calculate: 25 * 17 =",
442
+     "code": "Write a Python function to reverse a string:",
443
+     "chat": "What is the capital of France?",
444
+ }
445
+
446
+ for task, prompt in test_prompts.items():
447
+     inputs = tokenizer(prompt, return_tensors="pt")
448
+     outputs = model.generate(**inputs, max_length=100)
449
+     print(f"{task}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
451
+ ```
452
+
453
+ ### Common Benchmarks
454
+
455
+ - **Open LLM Leaderboard**: General capabilities
456
+ - **MT-Bench**: Multi-turn conversation
457
+ - **MMLU**: Multitask accuracy
458
+ - **HumanEval**: Code generation
459
+ - **GSM8K**: Math reasoning
460
+
461
+ ## Production Deployment
462
+
463
+ ### Save and Upload
464
+
465
+ ```python
466
+ from transformers import AutoModelForCausalLM, AutoTokenizer
467
+
468
+ # Load merged model
469
+ model = AutoModelForCausalLM.from_pretrained("./merged-model")
470
+ tokenizer = AutoTokenizer.from_pretrained("./merged-model")
471
+
472
+ # Upload to HuggingFace Hub
473
+ model.push_to_hub("username/my-merged-model")
474
+ tokenizer.push_to_hub("username/my-merged-model")
475
+ ```
476
+
477
+ ### Quantize Merged Model
478
+
479
+ ```bash
480
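+ # Convert to GGUF (f16) with llama.cpp's conversion script; the script name varies by llama.cpp version
481
+ python convert.py ./merged-model --outtype f16 --outfile merged-model.gguf
482
+
483
+ # Quantize with GPTQ (quantize_gptq.py is a placeholder; substitute your own GPTQ tooling)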
484
+ python quantize_gptq.py ./merged-model --bits 4 --group_size 128
485
+ ```
486
+
487
+ ## Common Pitfalls
488
+
489
+ ### ❌ Pitfall 1: Merging Incompatible Models
490
+
491
+ ```yaml
492
+ # Wrong: Different architectures
493
+ models:
494
+   - model: meta-llama/Llama-2-7b # Llama architecture
494
+   - model: mistralai/Mistral-7B # Mistral architecture
496
+ ```
497
+
498
+ **Fix**: Only merge models with the same architecture
499
+
500
+ ### ❌ Pitfall 2: Over-weighting One Model
501
+
502
+ ```yaml
503
+ # Suboptimal: One model dominates
504
+ models:
505
+   - model: model_a
505
+     parameters:
506
+       weight: 0.95 # Too high
507
+   - model: model_b
508
+     parameters:
509
+       weight: 0.05 # Too low
511
+ ```
512
+
513
+ **Fix**: Use more balanced weights (0.3-0.7 range)
514
+
515
+ ### ❌ Pitfall 3: Not Evaluating
516
+
517
+ ```bash
518
+ # Wrong: Merge and deploy without testing
519
+ mergekit-yaml config.yml ./merged-model
520
+ # Deploy immediately (risky!)
521
+ ```
522
+
523
+ **Fix**: Always benchmark before deploying
524
+
525
+ ## Resources
526
+
527
+ - **mergekit GitHub**: https://github.com/arcee-ai/mergekit
528
+ - **HuggingFace Tutorial**: https://huggingface.co/blog/mlabonne/merge-models
529
+ - **LazyMergekit**: Automated merging notebook
530
+ - **TIES Paper**: https://arxiv.org/abs/2306.01708
531
+ - **DARE Paper**: https://arxiv.org/abs/2311.03099
532
+
533
+ ## See Also
534
+
535
+ - `references/methods.md` - Deep dive into merge algorithms
536
+ - `references/examples.md` - Real-world merge configurations
537
+ - `references/evaluation.md` - Benchmarking and testing strategies
538
+
539
+