@synsci/cli-darwin-x64 1.1.49

Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,404 @@
1
+ # Parallelism Strategies Guide
2
+
3
+ Complete guide to parallelism in Megatron-Core: when to use each strategy, performance characteristics, and configuration examples.
4
+
5
+ ## Parallelism Types
6
+
7
+ ### 1. Data Parallel (DP)
8
+
9
+ **How it works**: Replicate full model on each GPU, split data batches, synchronize gradients.
10
+
11
+ **Memory**: Low efficiency - full model replication
12
+ **Communication**: Low - only gradients (all-reduce)
13
+ **Scalability**: Good for small models
14
+
15
+ **When to use**:
16
+ - Models <1B parameters that fit on a single GPU
17
+ - Plenty of GPUs available
18
+ - Limited model complexity
19
+
20
+ **Configuration**:
21
+ ```bash
22
+ # 8 GPUs, no model parallelism
23
+ torchrun --nproc_per_node=8 train.py \
24
+ --tensor-model-parallel-size 1 \
25
+ --pipeline-model-parallel-size 1
26
+ # Implicit DP = 8
27
+ ```
28
+
29
+ **Performance**:
30
+ - Near-linear scaling for small models
31
+ - 7B model on 8×A100: ~90% efficiency
32
+
33
+ ### 2. Tensor Parallel (TP)
34
+
35
+ **How it works**: Split individual layers/tensors across GPUs (column/row partitioning of weight matrices).
36
+
37
+ **Memory**: Excellent - each GPU holds 1/N of the sharded weights (N = TP size)
38
+ **Communication**: Very high - all-reduce after every layer
39
+ **Scalability**: Best at ≤8 GPUs within a single node (needs NVLink)
40
+
41
+ **When to use**:
42
+ - Models >10B parameters
43
+ - Have NVLink-connected GPUs
44
+ - Within a single node (network latency kills performance across nodes)
45
+
46
+ **Configuration**:
47
+ ```bash
48
+ # Split model across 4 GPUs with TP
49
+ torchrun --nproc_per_node=4 train.py \
50
+ --tensor-model-parallel-size 4
51
+ ```
52
+
53
+ **Performance**:
54
+ - **1 node (8 GPUs, NVLink)**: 85-95% efficiency
55
+ - **Across nodes**: <50% efficiency (avoid)
56
+
57
+ **Memory savings**:
58
+ ```
59
+ LLaMA 70B without TP: 140GB (won't fit on 80GB GPU)
60
+ LLaMA 70B with TP=4: 35GB per GPU (fits easily)
61
+ ```
62
+
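The figures above follow from simple arithmetic if weights are stored in 16-bit precision (2 bytes per parameter) and sharded evenly across tensor-parallel ranks. A minimal sketch under those assumptions; `weights_gb_per_gpu` is a hypothetical helper, not a Megatron-Core API, and it ignores gradients, optimizer state, and activations:

```python
def weights_gb_per_gpu(params_billion: float, tp: int = 1, bytes_per_param: int = 2) -> float:
    """Weight memory per GPU in GB, assuming an even split across TP ranks."""
    if tp < 1:
        raise ValueError("tp must be >= 1")
    # (params_billion * 1e9 params) * bytes / (1e9 bytes per GB): the 1e9 factors cancel.
    return params_billion * bytes_per_param / tp


print(weights_gb_per_gpu(70))        # 140.0 -> does not fit on an 80GB GPU
print(weights_gb_per_gpu(70, tp=4))  # 35.0  -> fits
```

Training needs considerably more than the weights alone, so treat these numbers as a lower bound when sizing a job.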
63
+ **Communication volume** (70B model):
64
+ - Per layer: ~20GB all-reduce
65
+ - 80 layers × 20GB = 1.6TB total traffic
66
+ - With NVLink (600GB/s): Manageable
67
+ - With Ethernet (100Gb/s = 12.5GB/s): Too slow
68
+
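To see why the interconnect matters, the traffic figure above can be turned into a rough transfer time. This is only a back-of-the-envelope sketch: it divides volume by link bandwidth and ignores latency, overlap with compute, and the actual all-reduce algorithm. `transfer_seconds` is a hypothetical helper, and the 20GB-per-layer value is taken from the list above as-is:

```python
def transfer_seconds(volume_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on transfer time: volume divided by link bandwidth."""
    return volume_gb / bandwidth_gb_per_s


LAYERS = 80
PER_LAYER_GB = 20                     # per-layer all-reduce volume quoted above
total_gb = LAYERS * PER_LAYER_GB      # 1600 GB = 1.6 TB

nvlink_s = transfer_seconds(total_gb, 600)         # NVLink at 600 GB/s
ethernet_s = transfer_seconds(total_gb, 100 / 8)   # 100 Gb/s = 12.5 GB/s

print(f"NVLink: {nvlink_s:.1f} s, Ethernet: {ethernet_s:.1f} s")
# NVLink: 2.7 s, Ethernet: 128.0 s
```

Under these assumptions Ethernet is about 48× slower for the same traffic, which is the reason to keep TP groups inside one node.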
69
+ ### 3. Pipeline Parallel (PP)
70
+
71
+ **How it works**: Divide model layers into stages, assign stages to different GPUs, process microbatches in pipeline.
72
+
73
+ **Memory**: Very good savings - each stage holds only its share of the layers (about 1/PP of the model)
74
+ **Communication**: Low-medium - only activations between stages
75
+ **Scalability**: Good across nodes
76
+
77
+ **Pipeline Schedules**:
78
+
79
+ **GPipe** (simple but inefficient):
80
+ ```
81
+ GPU0: F F F F ........ B B B B
82
+ GPU1: .... F F F F .... B B B B
83
+ GPU2: ........ F F F F B B B B
84
+ ```
85
+ Bubble: 50% idle time
86
+
87
+ **1F1B** (one-forward-one-backward):
88
+ ```
89
+ GPU0: F F F F B B B B B B B B
90
+ GPU1: .. F F F F B B B B B B B B
91
+ GPU2: .... F F F F B B B B B B B B
92
+ ```
93
+ Bubble: ~25% idle time
94
+
95
+ **Interleaved 1F1B** (best):
96
+ ```
97
+ GPU0: F1 F2 F3 F4 B1 B2 B3 B4 ...
98
+ GPU1: F1 F2 F3 F4 B1 B2 B3 B4 ...
99
+ ```
100
+ Bubble: 5-10% idle time
101
+
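The idle percentages above depend on how many stages and microbatches are used. A common idealized model (every microbatch costs the same on every stage, communication is free) gives an idle fraction of (p - 1) / (m + p - 1) for p stages and m microbatches, and interleaving with v virtual stages per GPU divides the idle term by v. In this simplified model plain 1F1B and GPipe idle for the same fraction, and 1F1B's main gain is holding fewer in-flight activations. The sketch below is an illustration of that model only; `bubble_fraction` is a hypothetical helper, not part of Megatron-Core:

```python
def bubble_fraction(stages: int, microbatches: int, virtual_stages: int = 1) -> float:
    """Idle fraction of one training step under the idealized pipeline model.

    Assumes equal per-microbatch cost on every stage and free communication.
    virtual_stages > 1 models the interleaved schedule.
    """
    if stages < 1 or microbatches < 1 or virtual_stages < 1:
        raise ValueError("all arguments must be >= 1")
    # Idle time per GPU, measured in units of one microbatch's forward+backward work.
    idle = (stages - 1) / virtual_stages
    return idle / (microbatches + idle)


for m in (4, 16, 64):
    plain = bubble_fraction(stages=4, microbatches=m)
    interleaved = bubble_fraction(stages=4, microbatches=m, virtual_stages=2)
    print(f"m={m:>2}: plain {plain:.1%}, interleaved {interleaved:.1%}")
```

The practical takeaway is that raising the number of microbatches per step shrinks the bubble for every schedule, and interleaving shrinks it further at the cost of extra communication.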
102
+ **When to use**:
103
+ - Models >70B parameters
104
+ - Multi-node training
105
+ - Limited intra-node bandwidth
106
+
107
+ **Configuration**:
108
+ ```bash
109
+ # 4-stage pipeline
110
+ torchrun --nproc_per_node=8 --nnodes=4 train.py \
111
+ --pipeline-model-parallel-size 4 \
112
+ --num-layers 80 \
113
+ --num-layers-per-virtual-pipeline-stage 2 # Interleaved
114
+ ```
115
+
116
+ **Performance**:
117
+ - Interleaved schedule: 90-95% efficiency
118
+ - Standard 1F1B: 75-85% efficiency
119
+
120
+ ### 4. Sequence Parallel (SP)
121
+
122
+ **How it works**: Split sequence dimension across tensor-parallel GPUs, reduce activation memory.
123
+
124
+ **Memory**: Reduces activations by TP factor
125
+ **Communication**: Same volume as TP (each all-reduce becomes a reduce-scatter plus an all-gather)
126
+ **Scalability**: Tied to TP
127
+
128
+ **When to use**:
129
+ - Long sequences (>4K tokens)
130
+ - Using TP already
131
+ - Activation memory is the bottleneck
132
+
133
+ **Configuration**:
134
+ ```bash
135
+ torchrun --nproc_per_node=8 train.py \
136
+ --tensor-model-parallel-size 4 \
137
+ --sequence-parallel # Requires TP > 1
138
+ ```
139
+
140
+ **Memory savings**:
141
+ ```
142
+ 70B model, 4K sequence, TP=4:
143
+ Without SP: 48GB activations per GPU
144
+ With SP: 12GB activations per GPU
145
+ Savings: 75%
146
+ ```
147
+
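The saving above is the TP factor applied to activation memory. A minimal sketch of that arithmetic, assuming (as the example does) that all of the quoted activation memory is divided by the TP size once sequence parallelism is enabled; `activation_gb_per_gpu` is a hypothetical helper:

```python
def activation_gb_per_gpu(activation_gb: float, tp: int, sequence_parallel: bool) -> float:
    """Activation memory per GPU under the simplified 'divide by TP' model."""
    if tp < 1:
        raise ValueError("tp must be >= 1")
    return activation_gb / tp if sequence_parallel else float(activation_gb)


without_sp = activation_gb_per_gpu(48, tp=4, sequence_parallel=False)
with_sp = activation_gb_per_gpu(48, tp=4, sequence_parallel=True)
savings = 1 - with_sp / without_sp

print(f"{without_sp:.0f}GB -> {with_sp:.0f}GB ({savings:.0%} saved)")
# 48GB -> 12GB (75% saved)
```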
148
+ ### 5. Context Parallel (CP)
149
+
150
+ **How it works**: Split very long sequences across GPUs using Ring Attention.
151
+
152
+ **Memory**: Reduces KV cache and activations
153
+ **Communication**: Medium - ring communication pattern
154
+ **Scalability**: Good for >8K sequences
155
+
156
+ **When to use**:
157
+ - Sequences >8K tokens
158
+ - Long-context models (>32K)
159
+ - KV cache memory bottleneck
160
+
161
+ **Configuration**:
162
+ ```bash
163
+ torchrun --nproc_per_node=8 train.py \
164
+ --context-parallel-size 2 \
165
+ --seq-length 32768 # 32K tokens
166
+ ```
167
+
168
+ **Memory savings** (32K sequence):
169
+ ```
170
+ Without CP: 64GB KV cache
171
+ With CP=4: 16GB KV cache per GPU
172
+ ```
173
+
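Note that the configuration above uses CP=2 while the memory example uses CP=4. The sketch below shows both cases, assuming the sequence and the quoted 64GB figure divide evenly across context-parallel ranks; `context_parallel_shard` is a hypothetical helper and does not model how Ring Attention orders its chunks:

```python
def context_parallel_shard(seq_length: int, cp: int, kv_cache_gb: float) -> tuple[int, float]:
    """Return (tokens per GPU, KV cache GB per GPU) for an even split across CP ranks."""
    if cp < 1 or seq_length % cp != 0:
        raise ValueError("seq_length must be divisible by cp")
    return seq_length // cp, kv_cache_gb / cp


for cp in (1, 2, 4):
    tokens, cache_gb = context_parallel_shard(32768, cp, kv_cache_gb=64)
    print(f"CP={cp}: {tokens} tokens and {cache_gb:.0f}GB per GPU")
```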
174
+ ### 6. Expert Parallel (EP)
175
+
176
+ **How it works**: For MoE models, distribute different experts across GPUs.
177
+
178
+ **Memory**: Excellent - only store 1/N experts per GPU
179
+ **Communication**: All-to-all - tokens are routed to their experts and back in each MoE layer
180
+ **Scalability**: Matches number of experts
181
+
182
+ **When to use**:
183
+ - Mixture of Experts models
184
+ - Want model capacity without memory cost
185
+ - Have ≥8 GPUs
186
+
187
+ **Configuration**:
188
+ ```bash
189
+ # Mixtral 8x7B: 8 experts
190
+ torchrun --nproc_per_node=8 train.py \
191
+ --expert-model-parallel-size 4 \
192
+ --num-experts 8 \
193
+ --tensor-model-parallel-size 2
194
+ ```
195
+
196
+ **Memory** (Mixtral 8×7B):
197
+ ```
198
+ Without EP: 8 experts × 7B = 56B params (~112GB in FP16)
199
+ With EP=4: 2 experts × 7B = 14B params (~28GB in FP16)
200
+ Savings: 75%
201
+ ```
202
+
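The savings come from each GPU holding only its share of the experts. A small Python sketch of that split, assuming the expert count must divide evenly by the EP degree (the helper name is hypothetical):

```python
def experts_per_gpu(num_experts: int, ep: int) -> int:
    """Number of experts each GPU stores for a given expert-parallel degree."""
    if num_experts % ep != 0:
        raise ValueError("num_experts must be divisible by the EP degree")
    return num_experts // ep


num_experts = 8  # Mixtral-style layer from the example above
local = experts_per_gpu(num_experts, ep=4)
fraction_saved = 1 - local / num_experts
print(local, f"{fraction_saved:.0%}")  # 2 75%
```

Only the expert weights shrink this way; attention and embedding weights are shared across experts and are unaffected by EP.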
203
+ ## Combining Parallelism Strategies
204
+
205
+ ### 3D Parallelism (TP + PP + DP)
206
+
207
+ Standard for large models.
208
+
209
+ **LLaMA 3 70B on 64 GPUs**:
210
+ ```bash
211
+ TP=4 # Within each node
212
+ PP=4 # Across nodes
213
+ DP=4 # Remaining dimension
214
+ Total = 4 × 4 × 4 = 64 GPUs
215
+ ```
216
+
217
+ **Memory per GPU** (weights only): 70B / 4 (TP) / 4 (PP) ≈ 4.4B params ≈ 8.75GB in FP16
218
+
219
+ **Configuration**:
220
+ ```bash
221
+ torchrun --nproc_per_node=8 --nnodes=8 train.py \
222
+ --tensor-model-parallel-size 4 \
223
+ --pipeline-model-parallel-size 4
224
+ # DP is implicit: 64 / (4*4) = 4
225
+ ```
226
+
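Because DP is whatever remains after the model-parallel dimensions are chosen, it helps to check the split before launching. A minimal Python sketch of that check (the function name is hypothetical and not part of any launcher):

```python
def data_parallel_size(world_size: int, tp: int, pp: int, cp: int = 1) -> int:
    """Derive the implicit DP degree from the world size and model-parallel sizes."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by TP*PP*CP = {model_parallel}"
        )
    return world_size // model_parallel


# 8 nodes x 8 GPUs with TP=4 and PP=4, as in the configuration above
dp = data_parallel_size(world_size=8 * 8, tp=4, pp=4)
print(dp)  # 4
```

The same helper covers the 4D case by passing `cp`, for example `data_parallel_size(256, tp=8, pp=8, cp=2)`.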
227
+ ### 4D Parallelism (TP + PP + DP + CP)
228
+
229
+ For very large models or long context.
230
+
231
+ **LLaMA 3 405B on 256 GPUs**:
232
+ ```bash
233
+ TP=8 # Max NVLink
234
+ PP=8 # Across nodes
235
+ CP=2 # Long sequences
236
+ DP=2 # Remaining
237
+ Total = 8 × 8 × 2 × 2 = 256 GPUs
238
+ ```
239
+
240
+ **Configuration**:
241
+ ```bash
242
+ torchrun --nproc_per_node=8 --nnodes=32 train.py \
243
+ --tensor-model-parallel-size 8 \
244
+ --pipeline-model-parallel-size 8 \
245
+ --context-parallel-size 2
246
+ ```
247
+
248
+ ### 4D + EP (5D Parallelism)
249
+
250
+ For sparse MoE models.
251
+
252
+ **DeepSeek-V3 671B (37B active) on 1024 GPUs**:
253
+ ```bash
254
+ TP=2 # Limited by active params
255
+ PP=16 # Many stages
256
+ EP=64 # 256 experts / 4 experts per GPU
257
+ DP=32 # 1024 / (TP × PP)
257
+ Total = 2 × 16 × 32 = 1024 GPUs (EP=64 is carved out of the TP × DP ranks in each stage, not multiplied in)
259
+ ```
260
+
261
+ ## Decision Guide
262
+
263
+ ### By Model Size
264
+
265
+ | Model Size | GPUs | Recommended Strategy |
266
+ |------------|------|---------------------|
267
+ | <1B | 1-8 | DP only |
268
+ | 1-10B | 8-16 | TP=2-4 + DP |
269
+ | 10-70B | 16-64 | TP=4 + PP=2-4 + DP |
270
+ | 70-175B | 64-256 | TP=8 + PP=4-8 + DP |
271
+ | 175-500B | 256-1024 | TP=8 + PP=8-16 + CP=2 + DP |
272
+ | 500B+ | 1024+ | 4D or 5D (with EP) |
273
+
274
+ ### By Hardware Topology
275
+
276
+ **Single node (8 GPUs with NVLink)**:
277
+ ```bash
278
+ # Up to 70B
279
+ TP=8 # Use all NVLink bandwidth
280
+ ```
281
+
282
+ **Multiple nodes (InfiniBand)**:
283
+ ```bash
284
+ # Minimize cross-node communication
285
+ TP=8 # Within node only
286
+ PP=N # Across nodes
287
+ DP=remaining
288
+ ```
289
+
290
+ **Limited network (Ethernet)**:
291
+ ```bash
292
+ # Avoid TP across nodes
293
+ TP=1-4 # Within node
294
+ PP=many # PP has low communication
295
+ ```
296
+
297
+ ### By Sequence Length
298
+
299
+ | Sequence | Parallelism |
300
+ |----------|------------|
301
+ | <2K | Standard (TP + PP + DP) |
302
+ | 2K-8K | + SP (sequence parallel) |
303
+ | 8K-32K | + CP=2 (context parallel) |
304
+ | 32K+ | + CP=4-8 |
305
+
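The table can be read as a cumulative lookup: each row adds to the one before it. A Python sketch under that reading, with 1K taken as 1024 tokens and each boundary assigned to the higher row (both are assumptions, as is the function name):

```python
K = 1024  # assumption: "2K" in the table means 2048 tokens


def long_sequence_settings(seq_len: int) -> dict:
    """Map a sequence length to the SP/CP additions suggested by the table."""
    if seq_len < 2 * K:
        return {"sequence_parallel": False, "context_parallel_size": (1, 1)}
    if seq_len < 8 * K:
        return {"sequence_parallel": True, "context_parallel_size": (1, 1)}
    if seq_len < 32 * K:
        return {"sequence_parallel": True, "context_parallel_size": (2, 2)}
    # 32K and above: the table suggests a CP range of 4 to 8
    return {"sequence_parallel": True, "context_parallel_size": (4, 8)}


print(long_sequence_settings(16 * K))
```

The CP entry is a (min, max) range, so the last row leaves the final choice between 4 and 8 to profiling.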
306
+ ## Performance Characteristics
307
+
308
+ ### Communication Volume (per iteration)
309
+
310
+ **Data Parallel**: O(model_size) - all-reduce gradients
311
+ **Tensor Parallel**: O(batch × sequence × hidden × layers) - activation all-reduce in every layer
311
+ **Pipeline Parallel**: O(batch × sequence × hidden) per stage boundary - activations only
313
+ **Context Parallel**: O(sequence × hidden) - ring communication
314
+
315
+ ### Memory Breakdown (70B model example)
316
+
317
+ Without parallelism:
318
+ ```
319
+ Model parameters: 140GB (FP16)
320
+ Gradients: 140GB
321
+ Optimizer states: 280GB (Adam: 2 states × 140GB; FP32 states would double this)
322
+ Activations: 48GB (batch=1, seq=4K)
323
+ Total: 608GB (won't fit!)
324
+ ```
325
+
326
+ With TP=4, PP=4, DP=4 (64 GPUs):
327
+ ```
328
+ Parameters: 140GB / 4 / 4 = 8.75GB per GPU
329
+ Gradients: 8.75GB per GPU
330
+ Optimizer: 17.5GB per GPU
331
+ Activations: 48GB / 4 / 4 = 3GB per GPU
332
+ Total: ~38GB per GPU (fits on A100 80GB)
333
+ ```
334
+
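Both breakdowns use the same simplified accounting: 2 bytes per parameter for weights and gradients, two Adam states of the same width, and everything divided evenly by TP × PP. A Python sketch that reproduces the totals under those assumptions (the function name is hypothetical; real footprints also depend on optimizer precision, recomputation, and framework overhead):

```python
def memory_breakdown_gb(params_b: float, activations_gb: float,
                        tp: int = 1, pp: int = 1,
                        bytes_per_param: int = 2) -> dict:
    """Per-GPU memory estimate using the simplified accounting from the text."""
    shard = tp * pp                      # model state is split across TP and PP
    weights = params_b * bytes_per_param / shard
    grads = weights                      # one gradient per weight, same width
    optimizer = 2 * weights              # two Adam states, same width (assumption)
    activations = activations_gb / shard
    return {
        "weights": weights,
        "grads": grads,
        "optimizer": optimizer,
        "activations": activations,
        "total": weights + grads + optimizer + activations,
    }


single = memory_breakdown_gb(params_b=70, activations_gb=48)
sharded = memory_breakdown_gb(params_b=70, activations_gb=48, tp=4, pp=4)
print(single["total"], sharded["total"])  # 608.0 38.0
```

DP does not appear in the formula because plain data parallelism replicates model state rather than sharding it.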
335
+ ## Best Practices
336
+
337
+ 1. **Start with TP within single node**
338
+ ```bash
339
+ --tensor-model-parallel-size 8 # Use all NVLink
340
+ ```
341
+
342
+ 2. **Add PP for cross-node scaling**
343
+ ```bash
344
+ --pipeline-model-parallel-size 4
345
+ --num-layers-per-virtual-pipeline-stage 2 # Interleaved
346
+ ```
347
+
348
+ 3. **Enable SP when using TP**
349
+ ```bash
350
+ --sequence-parallel # Free activation savings
351
+ ```
352
+
353
+ 4. **Use CP for long sequences**
354
+ ```bash
355
+ --context-parallel-size 2 # If seq_len > 8K
356
+ ```
357
+
358
+ 5. **Avoid TP across nodes** (network latency kills performance)
359
+
360
+ 6. **Match TP to GPU topology** (TP=8 for 8-GPU nodes)
361
+
362
+ 7. **Profile first iteration** to check memory and communication:
363
+ ```bash
364
+ --profile # Enable profiling
365
+ --profile-ranks 0 # Profile first rank only
366
+ ```
367
+
368
+ ## Troubleshooting
369
+
370
+ **High communication overhead (low MFU)**:
371
+ - Reduce TP degree (especially across nodes)
372
+ - Increase PP degree instead
373
+ - Enable interleaved pipeline schedule
374
+
375
+ **Out of memory**:
376
+ - Increase TP/PP (split model more)
377
+ - Enable gradient checkpointing:
378
+ ```bash
379
+ --recompute-granularity full
380
+ --recompute-method block
381
+ ```
382
+ - Reduce micro-batch size
383
+
384
+ **Pipeline bubbles (low GPU util)**:
385
+ - Use interleaved schedule:
386
+ ```bash
387
+ --num-layers-per-virtual-pipeline-stage 2
388
+ ```
389
+ - Increase number of microbatches:
390
+ ```bash
391
+ --global-batch-size 1024
392
+ --micro-batch-size 1 # More microbatches = smaller bubbles
393
+ ```
394
+
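The effect of both fixes can be estimated with the usual pipeline bubble ratio, (p - 1) / (v × m), where p is the number of pipeline stages, m the number of microbatches, and v the number of virtual stages per GPU (v = 1 without interleaving). A Python sketch of that estimate, with hypothetical function names and a DP degree of 4 assumed for the example:

```python
def num_microbatches(global_batch: int, micro_batch: int, dp: int) -> int:
    """Microbatches each pipeline processes per iteration."""
    per_step = micro_batch * dp
    if global_batch % per_step != 0:
        raise ValueError("global batch must be divisible by micro_batch * dp")
    return global_batch // per_step


def bubble_ratio(pp: int, microbatches: int, virtual_stages: int = 1) -> float:
    """Idle (bubble) time relative to ideal compute time: (p - 1) / (v * m)."""
    return (pp - 1) / (virtual_stages * microbatches)


m = num_microbatches(global_batch=1024, micro_batch=1, dp=4)
print(m)                                       # 256
print(bubble_ratio(pp=4, microbatches=m))      # 0.01171875
print(bubble_ratio(pp=4, microbatches=m, virtual_stages=2))  # 0.005859375
```

Raising the micro-batch size to 8 in the same setup drops m to 32 and raises the ratio to about 9%, which is why smaller micro-batches shrink the bubble.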
395
+ **Load imbalance in MoE**:
396
+ - Tune load balancing:
397
+ ```bash
398
+ --moe-router-load-balancing-type aux_loss
399
+ --moe-aux-loss-coeff 0.01
400
+ ```
401
+ - Increase expert parallel degree:
402
+ ```bash
403
+ --expert-model-parallel-size 8 # Fewer experts per GPU
404
+ ```