@synsci/cli-darwin-x64 1.1.49

Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,547 @@
1
+ # Training Recipes
2
+
3
+ Complete hyperparameter configurations and training recipes for GPT, LLaMA, and Mixtral models.
4
+
5
+ ## GPT-3 Training Recipes
6
+
7
+ ### GPT-3 15B Configuration
8
+
9
+ **Model Architecture**:
10
+ ```yaml
11
+ num-layers: 32
12
+ hidden-size: 6144
13
+ num-attention-heads: 48
14
+ ffn-hidden-size: 24576 # 4 × hidden-size
15
+ seq-length: 4096
16
+ max-position-embeddings: 4096
17
+ position-embedding-type: rope
18
+ squared-relu: true
19
+ group-query-attention: true
20
+ num-query-groups: 8
21
+ ```
22
+
23
+ **Training Hyperparameters**:
24
+ ```yaml
25
+ # Batch Configuration
26
+ micro-batch-size: 4
27
+ global-batch-size: 1152
28
+ rampup-batch-size: [384, 384, 97656250] # start, increment, total samples
29
+
30
+ # Learning Rate Schedule
31
+ lr: 4.5e-4
32
+ min-lr: 4.5e-5
33
+ lr-decay-style: cosine
34
+ lr-decay-samples: 1949218748
35
+ lr-warmup-samples: 3906252 # ~16B tokens with seq_len=4096
36
+
37
+ # Optimizer
38
+ optimizer: adam
39
+ adam-beta1: 0.9
40
+ adam-beta2: 0.95
41
+ weight-decay: 0.1
42
+ clip-grad: 1.0
43
+
44
+ # Precision
45
+ bf16: true
46
+
47
+ # Parallelism
48
+ tensor-model-parallel-size: 8
49
+ pipeline-model-parallel-size: 1
50
+ sequence-parallel: true
51
+ use-distributed-optimizer: true
52
+ overlap-grad-reduce: true
53
+ overlap-param-gather: true
54
+ ```
55
+
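The sample-based schedule values in this recipe can be cross-checked against token budgets. The following is a minimal sketch, assuming one sample is a single sequence of `seq-length` tokens and that the job runs on 32 GPUs (the `--nproc_per_node=8 --nnodes=4` layout used in this recipe's launch command). The constant and function names are illustrative and not part of Megatron.

```python
# Values copied from the GPT-3 15B recipe.
SEQ_LENGTH = 4096
LR_WARMUP_SAMPLES = 3_906_252
LR_DECAY_SAMPLES = 1_949_218_748
GLOBAL_BATCH_SIZE = 1152
MICRO_BATCH_SIZE = 4
TENSOR_PARALLEL = 8
PIPELINE_PARALLEL = 1
WORLD_SIZE = 32  # assumption: 8 GPUs per node x 4 nodes


def samples_to_tokens(samples: int, seq_length: int = SEQ_LENGTH) -> int:
    """Convert a sample count to tokens, assuming fixed-length sequences."""
    return samples * seq_length


warmup_tokens = samples_to_tokens(LR_WARMUP_SAMPLES)

# Numeric observation for this recipe: warmup + decay samples add up to
# the --train-samples value passed on the command line.
total_samples = LR_WARMUP_SAMPLES + LR_DECAY_SAMPLES
total_tokens = samples_to_tokens(total_samples)

# GPUs left over after model parallelism hold data-parallel replicas.
data_parallel = WORLD_SIZE // (TENSOR_PARALLEL * PIPELINE_PARALLEL)

# Micro-batches each replica accumulates before one optimizer step.
grad_accum_steps = GLOBAL_BATCH_SIZE // (MICRO_BATCH_SIZE * data_parallel)

print(f"warmup tokens:    {warmup_tokens:,}")
print(f"total samples:    {total_samples:,}")
print(f"total tokens:     {total_tokens:,}")
print(f"data parallel:    {data_parallel}")
print(f"grad accum steps: {grad_accum_steps}")
```

Under these assumptions the warmup covers roughly 16B tokens and the full run roughly 8T tokens.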
56
+ **Command**:
57
+ ```bash
58
+ torchrun --nproc_per_node=8 --nnodes=4 pretrain_gpt.py \
59
+ --num-layers 32 \
60
+ --hidden-size 6144 \
61
+ --num-attention-heads 48 \
62
+ --ffn-hidden-size 24576 \
63
+ --seq-length 4096 \
64
+ --max-position-embeddings 4096 \
65
+ --micro-batch-size 4 \
66
+ --global-batch-size 1152 \
67
+ --lr 4.5e-4 \
68
+ --min-lr 4.5e-5 \
69
+ --lr-decay-style cosine \
70
+ --lr-warmup-samples 3906252 \
71
+ --train-samples 1953125000 \
72
+ --adam-beta1 0.9 \
73
+ --adam-beta2 0.95 \
74
+ --weight-decay 0.1 \
75
+ --clip-grad 1.0 \
76
+ --bf16 \
77
+ --tensor-model-parallel-size 8 \
78
+ --pipeline-model-parallel-size 1 \
79
+ --sequence-parallel \
80
+ --use-distributed-optimizer \
81
+ --overlap-grad-reduce \
82
+ --overlap-param-gather \
83
+ --data-path /path/to/data \
84
+ --vocab-file /path/to/vocab.json \
85
+ --merge-file /path/to/merges.txt \
86
+ --save /checkpoints/gpt3-15b \
87
+ --load /checkpoints/gpt3-15b \
88
+ --save-interval 1000 \
89
+ --eval-interval 100
90
+ ```
91
+
92
+ ### GPT-3 175B Configuration
93
+
94
+ **Model Architecture**:
95
+ ```yaml
96
+ num-layers: 96
97
+ hidden-size: 12288
98
+ num-attention-heads: 96
99
+ ffn-hidden-size: 49152
100
+ seq-length: 2048
101
+ max-position-embeddings: 2048
102
+ ```
103
+
104
+ **Training Hyperparameters**:
105
+ ```yaml
106
+ micro-batch-size: 1
107
+ global-batch-size: 1536
108
+ lr: 6e-5
109
+ min-lr: 6e-6
110
+ lr-decay-style: cosine
111
+ lr-warmup-iters: 2000
112
+ train-iters: 150000
113
+ adam-beta1: 0.9
114
+ adam-beta2: 0.95
115
+ weight-decay: 0.1
116
+ clip-grad: 1.0
117
+ bf16: true
118
+
119
+ # Parallelism for 512 GPUs
120
+ tensor-model-parallel-size: 4
121
+ pipeline-model-parallel-size: 8
122
+ # Data parallel: 512 / (4 * 8) = 16
123
+ ```
124
+
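The data-parallel comment in the parallelism block follows from dividing the GPU count by the model-parallel group size. A short sketch of that arithmetic, together with the gradient accumulation it implies for this recipe's batch settings, might look like this. The helper names are hypothetical and are not Megatron functions.

```python
def data_parallel_size(world_size: int, tensor_parallel: int,
                       pipeline_parallel: int, context_parallel: int = 1) -> int:
    """Number of data-parallel replicas left after model parallelism."""
    model_parallel = tensor_parallel * pipeline_parallel * context_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"TP*PP*CP = {model_parallel}"
        )
    return world_size // model_parallel


def gradient_accumulation_steps(global_batch: int, micro_batch: int,
                                data_parallel: int) -> int:
    """Micro-batches each replica processes per optimizer step."""
    samples_per_pass = micro_batch * data_parallel
    if global_batch % samples_per_pass != 0:
        raise ValueError(
            f"global batch {global_batch} is not divisible by "
            f"micro_batch*DP = {samples_per_pass}"
        )
    return global_batch // samples_per_pass


# GPT-3 175B recipe: 512 GPUs, TP=4, PP=8, micro batch 1, global batch 1536.
dp = data_parallel_size(world_size=512, tensor_parallel=4, pipeline_parallel=8)
accum = gradient_accumulation_steps(global_batch=1536, micro_batch=1,
                                    data_parallel=dp)
print(f"data parallel = {dp}, gradient accumulation steps = {accum}")
```

With a micro batch of 1, each of the 16 replicas accumulates 96 micro-batches per step, which also helps keep the 8-stage pipeline busy.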
125
+ ## LLaMA Training Recipes
126
+
127
+ ### LLaMA-3 8B
128
+
129
+ **Model Architecture**:
130
+ ```yaml
131
+ num-layers: 32
132
+ hidden-size: 4096
133
+ num-attention-heads: 32
134
+ num-query-groups: 8 # GQA
135
+ ffn-hidden-size: 14336
136
+ seq-length: 8192
137
+ max-position-embeddings: 8192
138
+ position-embedding-type: rope
139
+ rope-theta: 500000
140
+ normalization: RMSNorm
141
+ swiglu: true
142
+ untie-embeddings-and-output-weights: true
143
+ ```
144
+
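As a sanity check, the architecture values above can be turned into an approximate parameter count. This is a rough sketch, assuming no bias terms, two RMSNorm weight vectors per layer plus one final norm, and a vocabulary of 128,256 tokens. The vocabulary size is not stated in this recipe and is an assumption here.

```python
# Values from the LLaMA-3 8B architecture block.
NUM_LAYERS = 32
HIDDEN = 4096
NUM_HEADS = 32
NUM_QUERY_GROUPS = 8
FFN_HIDDEN = 14336
VOCAB_SIZE = 128_256  # assumption: not given in the recipe

head_dim = HIDDEN // NUM_HEADS
kv_dim = NUM_QUERY_GROUPS * head_dim  # GQA shrinks the K and V projections

# Attention: Q and output projections are hidden x hidden,
# K and V projections are hidden x kv_dim.
attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim

# SwiGLU MLP uses three matrices: gate, up, and down.
mlp = 3 * HIDDEN * FFN_HIDDEN

# Two RMSNorm weight vectors per transformer layer.
norms = 2 * HIDDEN

per_layer = attention + mlp + norms

# Untied embeddings: separate input embedding and output projection.
embeddings = 2 * VOCAB_SIZE * HIDDEN
final_norm = HIDDEN

total_params = NUM_LAYERS * per_layer + embeddings + final_norm
print(f"per layer: {per_layer:,}")
print(f"total:     {total_params:,} (~{total_params / 1e9:.2f}B)")
```

Under these assumptions the count comes out at about 8.03B parameters, consistent with the model's name.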
145
+ **Training Hyperparameters**:
146
+ ```yaml
147
+ micro-batch-size: 4
148
+ global-batch-size: 128
149
+ lr: 3e-4
150
+ min-lr: 3e-5
151
+ lr-decay-style: cosine
152
+ lr-warmup-iters: 2000
153
+ train-iters: 100000
154
+ adam-beta1: 0.9
155
+ adam-beta2: 0.95
156
+ weight-decay: 0.1
157
+ clip-grad: 1.0
158
+ bf16: true
159
+
160
+ # Parallelism for 8 GPUs
161
+ tensor-model-parallel-size: 1
162
+ pipeline-model-parallel-size: 1
163
+ context-parallel-size: 2 # For 8K sequences
164
+ ```
165
+
166
+ **FP8 Training** (H100):
167
+ ```bash
168
+ ./examples/llama/train_llama3_8b_fp8.sh
169
+ ```
170
+
171
+ Script contents:
172
+ ```bash
173
+ #!/bin/bash
174
+ torchrun --nproc_per_node=8 pretrain_gpt.py \
175
+ --num-layers 32 \
176
+ --hidden-size 4096 \
177
+ --num-attention-heads 32 \
178
+ --num-query-groups 8 \
179
+ --ffn-hidden-size 14336 \
180
+ --seq-length 8192 \
181
+ --max-position-embeddings 8192 \
182
+ --micro-batch-size 2 \
183
+ --global-batch-size 128 \
184
+ --lr 3e-4 \
185
+ --train-iters 100000 \
186
+ --lr-decay-style cosine \
187
+ --lr-warmup-iters 2000 \
188
+ --weight-decay 0.1 \
189
+ --clip-grad 1.0 \
190
+ --fp8-hybrid \
191
+ --fp8-amax-history-len 1024 \
192
+ --fp8-amax-compute-algo max \
193
+ --apply-query-key-layer-scaling \
194
+ --attention-softmax-in-fp32 \
195
+ --tensor-model-parallel-size 1 \
196
+ --pipeline-model-parallel-size 1 \
197
+ --context-parallel-size 2 \
198
+ --sequence-parallel \
199
+ --use-mcore-models \
200
+ --transformer-impl transformer_engine \
201
+ --data-path /data/llama_train \
202
+ --vocab-file /data/tokenizer.model \
203
+ --save-interval 1000
204
+ ```
205
+
206
+ ### LLaMA-3 70B
207
+
208
+ **Model Architecture**:
209
+ ```yaml
210
+ num-layers: 80
211
+ hidden-size: 8192
212
+ num-attention-heads: 64
213
+ num-query-groups: 8
214
+ ffn-hidden-size: 28672
215
+ seq-length: 4096
216
+ max-position-embeddings: 4096
217
+ position-embedding-type: rope
218
+ rope-theta: 500000
219
+ normalization: RMSNorm
220
+ swiglu: true
221
+ ```
222
+
223
+ **Training Hyperparameters**:
224
+ ```yaml
225
+ micro-batch-size: 1
226
+ global-batch-size: 1024
227
+ lr: 1.5e-4
228
+ min-lr: 1.5e-5
229
+ lr-decay-style: cosine
230
+ lr-warmup-iters: 2000
231
+ adam-beta1: 0.9
232
+ adam-beta2: 0.95
233
+ weight-decay: 0.1
234
+ clip-grad: 1.0
235
+ bf16: true
236
+
237
+ # Parallelism for 64 GPUs
238
+ tensor-model-parallel-size: 4
239
+ pipeline-model-parallel-size: 4
240
+ context-parallel-size: 2
241
+ # Data parallel: 64 / (4 * 4 * 2) = 2
242
+ ```
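
The data-parallel comment above comes from dividing the world size by the product of the tensor, pipeline, and context parallel sizes. The sketch below reproduces that arithmetic and derives the number of micro-batches each replica accumulates per optimizer step. The function names are illustrative and are not part of the Megatron-LM API.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Data-parallel replicas left after TP, PP and CP claim their ranks."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tp * pp * cp")
    return world_size // model_parallel


def grad_accum_steps(global_batch: int, micro_batch: int, dp: int) -> int:
    """Micro-batches each replica processes per optimizer step."""
    samples_per_pass = micro_batch * dp
    if global_batch % samples_per_pass != 0:
        raise ValueError("global batch must be divisible by micro_batch * dp")
    return global_batch // samples_per_pass


# LLaMA-3 70B recipe from the block above: 64 GPUs, TP=4, PP=4, CP=2
dp_70b = data_parallel_size(64, tp=4, pp=4, cp=2)
accum_70b = grad_accum_steps(global_batch=1024, micro_batch=1, dp=dp_70b)
print(dp_70b, accum_70b)
```

With only 2 data-parallel replicas and a micro-batch of 1, each replica accumulates 512 micro-batches per step, which is what keeps the pipeline stages busy.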
243
+
244
+ ### LLaMA-3.1 405B
245
+
246
+ **Model Architecture**:
247
+ ```yaml
248
+ num-layers: 126
249
+ hidden-size: 16384
250
+ num-attention-heads: 128
251
+ num-query-groups: 8
252
+ ffn-hidden-size: 53248
253
+ seq-length: 4096
254
+ max-position-embeddings: 131072 # Supports up to 128K
255
+ position-embedding-type: rope
256
+ rope-theta: 500000
257
+ ```
258
+
259
+ **Training Hyperparameters**:
260
+ ```yaml
261
+ micro-batch-size: 1
262
+ global-batch-size: 2048
263
+ lr: 8e-5
264
+ min-lr: 8e-6
265
+ lr-decay-style: cosine
266
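+ lr-warmup-samples: 16384000 # 8000 iterations x 2048 global batch; sample-based runs need sample-based warmup
266
+ train-samples: 3662109375 # 15T tokens / 4096 tokens per sample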
268
+ adam-beta1: 0.9
269
+ adam-beta2: 0.95
270
+ weight-decay: 0.1
271
+ clip-grad: 1.0
272
+ bf16: true
273
+
274
+ # Parallelism for 1024 GPUs
275
+ tensor-model-parallel-size: 8
276
+ pipeline-model-parallel-size: 8 # 126 layers do not split evenly across 8 stages; an uneven stage layout is required
277
+ context-parallel-size: 2
278
+ # Data parallel: 1024 / (8 * 8 * 2) = 8
279
+ ```
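
`train-samples` counts sequences rather than tokens, so a token budget has to be divided by the sequence length before it is passed to the trainer. The sketch below converts the 15T-token budget into samples and iterations, and converts the 8000-iteration warmup into samples. All variable names are illustrative.

```python
# Convert a token budget into the sample and iteration counts a trainer expects.
token_budget = 15 * 10**12     # 15T tokens
seq_length = 4096              # tokens per sample
global_batch_size = 2048       # samples per optimizer step
warmup_iters = 8000

train_samples = token_budget // seq_length
train_iters = train_samples // global_batch_size     # floor: last partial batch is dropped
warmup_samples = warmup_iters * global_batch_size

print(train_samples, train_iters, warmup_samples)
```

The resulting sample count matches the `--train-samples 3662109375` value used in the launch command for this model.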
280
+
281
+ **Example Launch Command** (128 nodes × 8 GPUs = 1024 GPUs):
282
+ ```bash
283
+ torchrun --nproc_per_node=8 --nnodes=128 pretrain_gpt.py \
284
+ --num-layers 126 \
285
+ --hidden-size 16384 \
286
+ --num-attention-heads 128 \
287
+ --num-query-groups 8 \
288
+ --ffn-hidden-size 53248 \
289
+ --seq-length 4096 \
290
+ --max-position-embeddings 131072 \
291
+ --micro-batch-size 1 \
292
+ --global-batch-size 2048 \
293
+ --lr 8e-5 \
294
+ --min-lr 8e-6 \
295
+ --lr-decay-style cosine \
296
+ --lr-warmup-samples 16384000 \
297
+ --train-samples 3662109375 \
298
+ --adam-beta1 0.9 \
299
+ --adam-beta2 0.95 \
300
+ --weight-decay 0.1 \
301
+ --clip-grad 1.0 \
302
+ --bf16 \
303
+ --tensor-model-parallel-size 8 \
304
+ --pipeline-model-parallel-size 8 \
305
+ --context-parallel-size 2 \
306
+ --sequence-parallel \
307
+ --use-distributed-optimizer \
308
+ --overlap-grad-reduce \
309
+ --overlap-param-gather \
310
+ --use-flash-attn \
311
+ --position-embedding-type rope \
312
+ --normalization RMSNorm \
313
+ --swiglu \
314
+ --untie-embeddings-and-output-weights \
315
+ --use-mcore-models \
316
+ --transformer-impl transformer_engine \
317
+ --data-path /data/llama3_pretraining \
318
+ --vocab-file /data/llama3_tokenizer.model \
319
+ --save /checkpoints/llama3-405b \
320
+ --save-interval 500 \
321
+ --eval-interval 100
322
+ ```
323
+
324
+ ## Mixtral Training Recipes
325
+
326
+ ### Mixtral 8×7B (47B Total, 13B Active)
327
+
328
+ **Model Architecture**:
329
+ ```yaml
330
+ num-layers: 32
331
+ hidden-size: 4096
332
+ num-attention-heads: 32
333
+ num-query-groups: 8
334
+ ffn-hidden-size: 14336
335
+ seq-length: 4096
336
+ max-position-embeddings: 32768 # 32K context window
337
+ position-embedding-type: rope
338
+ normalization: RMSNorm
339
+ swiglu: true
340
+
341
+ # MoE Configuration
342
+ num-experts: 8
343
+ moe-router-topk: 2 # Activate 2 experts per token
344
+ moe-router-load-balancing-type: aux_loss
345
+ moe-aux-loss-coeff: 0.01
346
+ ```
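
The total and active parameter counts can be estimated directly from the architecture values above. The sketch below assumes a 32,000-token vocabulary, untied input and output embeddings, bias-free linear layers, and SwiGLU experts with three weight matrices each. It is an estimate, not an exact checkpoint size.

```python
def mixtral_param_counts(num_layers, hidden, ffn_hidden, num_heads, num_query_groups,
                         num_experts, top_k, vocab_size):
    """Return (total, active-per-token) parameter counts for a Mixtral-style MoE."""
    head_dim = hidden // num_heads
    kv_dim = head_dim * num_query_groups           # grouped-query attention shrinks K and V
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim   # Q and O, then K and V
    expert = 3 * hidden * ffn_hidden               # SwiGLU: gate, up and down projections
    router = hidden * num_experts
    norms = 2 * hidden                             # two RMSNorm weights per layer
    shared = attention + router + norms            # used by every token
    embeddings = 2 * vocab_size * hidden + hidden  # untied in/out embeddings + final norm

    total = num_layers * (shared + num_experts * expert) + embeddings
    active = num_layers * (shared + top_k * expert) + embeddings
    return total, active


total, active = mixtral_param_counts(
    num_layers=32, hidden=4096, ffn_hidden=14336, num_heads=32,
    num_query_groups=8, num_experts=8, top_k=2, vocab_size=32000,
)
print(f"total: {total / 1e9:.1f}B, active: {active / 1e9:.1f}B")
```

The total is well below 8 × 7B because only the feed-forward experts are replicated; attention, embeddings, and the router are shared across experts.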
347
+
348
+ **Training Hyperparameters**:
349
+ ```yaml
350
+ micro-batch-size: 2
351
+ global-batch-size: 512
352
+ lr: 1e-4
353
+ min-lr: 1e-5
354
+ lr-decay-style: cosine
355
+ lr-warmup-iters: 2000
356
+ adam-beta1: 0.9
357
+ adam-beta2: 0.95
358
+ weight-decay: 0.1
359
+ clip-grad: 1.0
360
+ bf16: true
361
+
362
+ # Parallelism for 64 GPUs
363
+ tensor-model-parallel-size: 1
364
+ pipeline-model-parallel-size: 4
365
+ expert-model-parallel-size: 8
366
+ context-parallel-size: 1
367
+ # Data parallel: 64 / (1 * 4 * 1) = 16; expert parallelism (8) splits these ranks, leaving 16 / 8 = 2 replicas of each expert
368
+ ```
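
For MoE models the "2" in the comment above is best read as the number of replicas of each expert. The sketch below assumes the common Megatron-style convention in which expert parallelism is carved out of the data-parallel ranks instead of multiplying the world size, so the expert-parallel size must divide the data-parallel size. The helper name `moe_layout` is hypothetical, and newer releases may allow other layouts.

```python
def moe_layout(world_size: int, tp: int, pp: int, ep: int, cp: int = 1):
    """Return (dense data-parallel size, replicas of each expert).

    Assumes expert parallelism subdivides the data-parallel group.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tp * pp * cp")
    dense_dp = world_size // model_parallel
    if dense_dp % ep != 0:
        raise ValueError("expert-parallel size must divide the data-parallel size")
    return dense_dp, dense_dp // ep


# Mixtral 8x7B on 64 GPUs: TP=1, PP=4, EP=8
dense_dp, expert_replicas = moe_layout(64, tp=1, pp=4, ep=8)
print(dense_dp, expert_replicas)
```

Attention and other dense weights are therefore replicated 16 times, while each expert's weights are replicated only twice.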
369
+
370
+ **Training Command**:
371
+ ```bash
372
+ torchrun --nproc_per_node=8 --nnodes=8 pretrain_gpt.py \
373
+ --num-layers 32 \
374
+ --hidden-size 4096 \
375
+ --num-attention-heads 32 \
376
+ --num-query-groups 8 \
377
+ --ffn-hidden-size 14336 \
378
+ --seq-length 4096 \
379
+ --max-position-embeddings 32768 \
380
+ --micro-batch-size 2 \
381
+ --global-batch-size 512 \
382
+ --lr 1e-4 \
383
+ --min-lr 1e-5 \
384
+ --lr-decay-style cosine \
385
+ --lr-warmup-iters 2000 \
386
+ --train-iters 100000 \
387
+ --adam-beta1 0.9 \
388
+ --adam-beta2 0.95 \
389
+ --weight-decay 0.1 \
390
+ --clip-grad 1.0 \
391
+ --bf16 \
392
+ --tensor-model-parallel-size 1 \
393
+ --pipeline-model-parallel-size 4 \
394
+ --expert-model-parallel-size 8 \
395
+ --num-experts 8 \
396
+ --moe-router-topk 2 \
397
+ --moe-router-load-balancing-type aux_loss \
398
+ --moe-aux-loss-coeff 0.01 \
399
+ --position-embedding-type rope \
400
+ --normalization RMSNorm \
401
+ --swiglu \
402
+ --use-mcore-models \
403
+ --transformer-impl transformer_engine \
404
+ --data-path /data/mixtral_train \
405
+ --vocab-file /data/mixtral_tokenizer.model \
406
+ --save /checkpoints/mixtral-8x7b \
407
+ --save-interval 1000
408
+ ```
409
+
410
+ ### Mixtral 8×22B (141B Total, 39B Active)
411
+
412
+ **Model Architecture**:
413
+ ```yaml
414
+ num-layers: 56
415
+ hidden-size: 6144
416
+ num-attention-heads: 48
417
+ num-query-groups: 8
418
+ ffn-hidden-size: 16384
419
+ seq-length: 4096
420
+ max-position-embeddings: 65536
421
+
422
+ # MoE Configuration
423
+ num-experts: 8
424
+ moe-router-topk: 2
425
+ ```
426
+
427
+ **Training Hyperparameters**:
428
+ ```yaml
429
+ micro-batch-size: 1
430
+ global-batch-size: 1024
431
+ lr: 7e-5
432
+ min-lr: 7e-6
433
+ lr-decay-style: cosine
434
+ adam-beta1: 0.9
435
+ adam-beta2: 0.95
436
+ weight-decay: 0.1
437
+ clip-grad: 1.0
438
+ bf16: true
439
+
440
+ # Parallelism for 256 GPUs
441
+ tensor-model-parallel-size: 4
442
+ pipeline-model-parallel-size: 4
443
+ expert-model-parallel-size: 8
444
+ # Data parallel: 256 / (4 * 4) = 16; expert parallelism (8) splits these ranks, leaving 16 / 8 = 2 replicas of each expert
445
+ ```
446
+
447
+ ## DeepSeek-V3 (671B Total, 37B Active)
448
+
449
+ **Model Architecture**:
450
+ ```yaml
451
+ num-layers: 61
452
+ hidden-size: 7168
453
+ num-attention-heads: 128
454
+ num-query-groups: 16
455
+ ffn-hidden-size: 18432
456
+
457
+ # MoE Configuration
458
+ num-experts: 256
459
+ moe-router-topk: 8 # Activate 8 routed experts per token
460
+ shared-expert-intermediate-size: 18432
461
+ ```
462
+
463
+ **Training Hyperparameters**:
464
+ ```yaml
465
+ micro-batch-size: 1
466
+ global-batch-size: 4096
467
+ lr: 2.7e-4
468
+ min-lr: 2.7e-5
469
+ lr-decay-style: cosine
470
+ lr-warmup-tokens: 5B
471
+ train-tokens: 14.8T
472
+ adam-beta1: 0.9
473
+ adam-beta2: 0.95
474
+ weight-decay: 0.1
475
+ clip-grad: 1.0
476
+ bf16: true
477
+
478
+ # Parallelism for 1024 GPUs
479
+ tensor-model-parallel-size: 2
480
+ pipeline-model-parallel-size: 16
481
+ expert-model-parallel-size: 64
482
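+ # Data parallel (dense layers): 1024 / (2 * 16) = 32; expert parallelism is laid out across the 64 ranks of each pipeline stage rather than multiplying the world size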
483
+ ```
484
+
485
+ ## Common Training Patterns
486
+
487
+ ### Batch Size Ramp-Up
488
+
489
+ Many large runs start with a small global batch size and increase it gradually to the target value:
490
+
491
+ ```yaml
492
+ rampup-batch-size: [start_batch, increment, total_samples]
493
+ # Example: [384, 384, 97656250]
494
+ # Start at 384 and add 384 at evenly spaced intervals, reaching the target global batch size after 97656250 samples
495
+ ```
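
The ramp-up can be sketched as a step function of the number of samples consumed so far. The version below is modeled on the behavior described above, not copied from Megatron-LM. The target global batch size of 1536 is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
def rampup_batch_size(consumed_samples: int, start: int, increment: int,
                      ramp_samples: int, global_batch: int) -> int:
    """Global batch size after `consumed_samples` during a linear ramp-up."""
    if (global_batch - start) % increment != 0:
        raise ValueError("global_batch - start must be a multiple of increment")
    num_increments = (global_batch - start) // increment
    if num_increments == 0 or consumed_samples >= ramp_samples:
        return global_batch
    # Integer arithmetic avoids float rounding at interval boundaries.
    completed = consumed_samples * num_increments // ramp_samples
    return start + completed * increment


# Example values from above, with an assumed target of 1536
for consumed in (0, 40_000_000, 70_000_000, 97_656_250):
    print(consumed, rampup_batch_size(consumed, 384, 384, 97_656_250, 1536))
```

With these values the batch size moves through 384, 768, and 1152 before settling at 1536 once the ramp-up samples are exhausted.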
496
+
497
+ ### Learning Rate Schedules
498
+
499
+ **Cosine Decay** (most common):
500
+ ```python
501
+ import math  # step runs from 0 to total_steps
+ lr = min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))
502
+ ```
503
+
504
+ **Linear Warmup + Cosine Decay**:
505
+ ```python
506
+ if step < warmup_steps:
507
+     lr = max_lr * step / warmup_steps  # linear ramp from 0 to max_lr
508
+ else:
509
+     lr = cosine_decay(step - warmup_steps)  # cosine formula above, over total_steps - warmup_steps
510
+ ```
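
The two schedules above combine into a single function. The sketch below assumes the cosine phase spans the steps remaining after warmup and that the rate holds at `min_lr` afterwards. The name `lr_at` and the 100000-step horizon are illustrative.

```python
import math


def lr_at(step: int, max_lr: float, min_lr: float,
          warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    decay_steps = total_steps - warmup_steps
    progress = min((step - warmup_steps) / decay_steps, 1.0)  # clamp after the end
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# LLaMA-3 70B values from above: lr 1.5e-4, min-lr 1.5e-5, 2000 warmup iterations
for step in (0, 1000, 2000, 51000, 100000):
    print(step, lr_at(step, 1.5e-4, 1.5e-5, 2000, 100000))
```

The rate peaks exactly at the end of warmup, passes through the midpoint of `max_lr` and `min_lr` halfway through the decay phase, and finishes at `min_lr`.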
511
+
512
+ ### Optimizer Settings
513
+
514
+ **Standard Adam**:
515
+ ```yaml
516
+ optimizer: adam
517
+ adam-beta1: 0.9
518
+ adam-beta2: 0.95 # Lower than typical 0.999
519
+ weight-decay: 0.1
520
+ clip-grad: 1.0
521
+ ```
522
+
523
+ **Why beta2=0.95?**
524
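+ - Shorter second-moment averaging window (roughly 20 steps instead of roughly 1000 with 0.999), so the update scale adapts faster to recent gradients
525
+ - Helps damp loss spikes when gradient statistics shift during large-scale training
526
+ - Used in the GPT-3 and LLaMA training recipes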
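
The responsiveness argument can be made concrete by treating Adam's second-moment estimate as an exponential moving average. The sketch below computes two standard summaries of that average: the effective window 1 / (1 - beta) and the half-life of a past gradient's weight. The function names are illustrative.

```python
import math


def ema_window(beta: float) -> float:
    """Approximate number of recent steps an EMA with decay `beta` averages over."""
    return 1.0 / (1.0 - beta)


def ema_half_life(beta: float) -> float:
    """Steps until a past gradient's weight in the EMA falls to one half."""
    return math.log(0.5) / math.log(beta)


for beta2 in (0.95, 0.999):
    print(beta2, round(ema_window(beta2)), round(ema_half_life(beta2), 1))
```

A beta2 of 0.95 forgets old gradient magnitudes about 50 times faster than 0.999, which is why it reacts more quickly when the gradient scale changes.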
527
+
528
+ ### Data Configuration
529
+
530
+ **Vocabulary Sizes**:
531
+ - GPT-3: 50,257 tokens
532
+ - LLaMA-3: 128,256 tokens (expanded for multilingual)
533
+ - Mixtral: 32,000 tokens
534
+
535
+ **Typical Data Mix** (by tokens):
536
+ - Web pages: 60-70%
537
+ - Books: 10-15%
538
+ - GitHub code: 5-10%
539
+ - Academic papers: 5-10%
540
+ - Other (Wikipedia, etc.): 5-10%
541
+
542
+ ## References
543
+
544
+ - Megatron-LM configurations: `tests/functional_tests/test_cases/`
545
+ - LLaMA-3 training: Meta AI technical report
546
+ - Mixtral training: Mistral AI blog
547
+ - DeepSeek-V3: DeepSeek technical report