@synsci/cli-darwin-x64 1.1.49

Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,406 @@
---
name: ray-train
description: Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Ray Train, Distributed Training, Orchestration, Ray, Hyperparameter Tuning, Fault Tolerance, Elastic Scaling, Multi-Node, PyTorch, TensorFlow]
dependencies: [ray[train], torch, transformers]
---

# Ray Train - Distributed Training Orchestration

## Quick start

Ray Train scales machine learning training from a single GPU to multi-node clusters with minimal code changes.

**Installation**:
```bash
pip install -U "ray[train]"
```

**Basic PyTorch training** (single node):

```python
import torch
import torch.nn as nn
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

# Training function: runs once on every worker
def train_func(config):
    # Your normal PyTorch code
    model = nn.Linear(10, 1)

    # Wrap in DDP and move to this worker's device (a GPU when use_gpu=True)
    model = train.torch.prepare_model(model)
    device = train.torch.get_device()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(10):
        # Toy batch; inputs must live on the same device as the model
        output = model(torch.randn(32, 10, device=device))
        loss = output.sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Report metrics back to the driver
        train.report({"loss": loss.item(), "epoch": epoch})

# Run distributed training
trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(
        num_workers=4,  # 4 workers, one GPU each
        use_gpu=True,
    ),
)

result = trainer.fit()
print(f"Final loss: {result.metrics['loss']}")
```
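
**That's it!** Ray handles:
- Distributed coordination (process-group setup and DDP wrapping)
- GPU allocation and device placement
- Fault tolerance (worker restarts from the latest reported checkpoint when `FailureConfig(max_failures=...)` is set)
- Checkpoint persistence (you still save and report checkpoints yourself; see Workflow 4)
- Metric reporting from the workers back to the driver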
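
Here is a minimal sketch of what "distributed coordination" means for data-parallel training. Each worker computes gradients on its own shard. DDP, which `prepare_model` sets up, then all-reduces those gradients so that every worker applies the same averaged update. The plain-Python code below does not use Ray or PyTorch. It assumes a one-parameter model `y = w * x` with mean-squared-error loss and equal-sized shards, and it checks that the averaged shard gradients equal the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def allreduce_mean(values):
    """Stand-in for DDP's all-reduce: every worker receives the average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # targets generated by w_true = 2

world_size = 4
shard = len(xs) // world_size  # equal-sized contiguous shards, one per worker
local_grads = [
    grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
    for r in range(world_size)
]
synced = allreduce_mean(local_grads)
full_batch = grad_mse(w, xs, ys)

print(f"averaged shard gradient: {synced[0]}, full-batch gradient: {full_batch}")
```

The equality depends on the shards being the same size. The `DistributedSampler` that `prepare_data_loader` inserts pads the dataset so that every rank receives the same number of samples.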

## Common workflows

### Workflow 1: Scale existing PyTorch code

**Original single-GPU code** (`MyModel`, `dataloader`, and `epochs` are placeholders):
```python
model = MyModel().cuda()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(epochs):
    for batch in dataloader:
        loss = model(batch)  # assumes the model returns its loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```
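
**Ray Train version** (scales to multi-GPU/multi-node):
```python
import torch
from torch.utils.data import DataLoader
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func(config):
    model = MyModel()  # placeholder, as above

    # Build the DataLoader inside train_func so each worker creates its own
    # (my_dataset is a placeholder for your torch Dataset)
    dataloader = DataLoader(my_dataset, batch_size=config["batch_size"], shuffle=True)

    # Prepare for distributed: DDP wrapping, device placement,
    # DistributedSampler insertion, and moving batches to the device
    model = train.torch.prepare_model(model)
    dataloader = train.torch.prepare_data_loader(dataloader)
    optimizer = torch.optim.Adam(model.parameters())

    for epoch in range(config["epochs"]):
        # Reshuffle differently each epoch when a DistributedSampler is in use
        if train.get_context().get_world_size() > 1:
            dataloader.sampler.set_epoch(epoch)
        for batch in dataloader:
            loss = model(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        # Report metrics once per epoch
        train.report({"loss": loss.item(), "epoch": epoch})

# Scale to 8 GPUs
trainer = TorchTrainer(
    train_func,
    train_loop_config={"epochs": 10, "batch_size": 32},
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
)
trainer.fit()
```

**Benefit**: the same `train_func` runs on 1 GPU or 1000 GPUs. Only `ScalingConfig` changes.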
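
`prepare_data_loader` uses PyTorch's `DistributedSampler` to give each worker a slice of the data. The sketch below reproduces that sampler's index assignment in plain Python for the non-shuffled case (`shuffle=False`, `drop_last=False`) and shows how 10 samples are split across 4 workers. It illustrates the sampler's logic only; it is not Ray code.

```python
import math

def distributed_indices(dataset_len, world_size, rank):
    """Mimic DistributedSampler(shuffle=False, drop_last=False) index assignment."""
    indices = list(range(dataset_len))
    # Pad by repeating leading indices so every rank gets the same count
    num_samples = math.ceil(dataset_len / world_size)
    total_size = num_samples * world_size
    padding = total_size - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Interleaved assignment: rank r takes positions r, r + world_size, ...
    return indices[rank:total_size:world_size]

for rank in range(4):
    print(rank, distributed_indices(10, 4, rank))
```

Because of the padding, samples 0 and 1 are visited twice per epoch in this example. Evaluation code that must count every sample exactly once has to account for those duplicates.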

### Workflow 2: HuggingFace Transformers integration

`TransformersTrainer` is deprecated in recent Ray releases. The current pattern runs a standard HuggingFace `Trainer` inside a `TorchTrainer` and connects the two with `RayTrainReportCallback` and `prepare_trainer`.

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.train.huggingface.transformers import RayTrainReportCallback, prepare_trainer
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

def train_func(config):
    # Load model and tokenizer on every worker
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # Hypothetical helper: returns a tokenized datasets.Dataset
    train_dataset = build_train_dataset(tokenizer)

    # Training arguments (HuggingFace API)
    training_args = TrainingArguments(
        output_dir="./output",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
    )

    # Forward HF metrics and checkpoints to Ray Train, then validate the distributed setup
    trainer.add_callback(RayTrainReportCallback())
    trainer = prepare_trainer(trainer)

    trainer.train()

# Scale to multi-node (2 nodes × 8 GPUs = 16 workers)
ray_trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(
        num_workers=16,
        use_gpu=True,
        resources_per_worker={"GPU": 1},
    ),
)

result = ray_trainer.fit()
```
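
With `per_device_train_batch_size=8` on 16 workers, each optimizer step consumes far more samples than a single-GPU run does. This often matters when you reuse a learning rate tuned on one GPU. The hypothetical helpers below (not part of Ray or Transformers) do that arithmetic. They assume pure data parallelism with one GPU per worker and the `DistributedSampler` padding behavior described in Workflow 1.

```python
def global_batch_size(per_device_batch_size, num_workers, grad_accum_steps=1):
    """Samples consumed per optimizer step under pure data parallelism."""
    return per_device_batch_size * num_workers * grad_accum_steps

def batches_per_epoch(num_examples, per_device_batch_size, num_workers):
    """Batches each worker processes per epoch (sampler pads, drop_last=False)."""
    samples_per_worker = -(-num_examples // num_workers)  # ceiling division
    return -(-samples_per_worker // per_device_batch_size)

# Workflow 2 settings: 16 workers, per_device_train_batch_size=8
print(global_batch_size(8, 16))
# 100,000 examples is an arbitrary illustrative dataset size
print(batches_per_epoch(100_000, 8, 16))
```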
161
+
162
+ ### Workflow 3: Hyperparameter tuning with Ray Tune
163
+
164
+ ```python
+ import torch
+ from ray import train, tune
+ from ray.train import ScalingConfig
+ from ray.train.torch import TorchTrainer, prepare_model
+ from ray.tune.schedulers import ASHAScheduler
+
+ def train_func(config):
+     # Hyperparameters sampled by Tune for this trial
+     lr = config["lr"]
+     batch_size = config["batch_size"]
+
+     model = prepare_model(MyModel())  # MyModel: your nn.Module
+     optimizer = torch.optim.Adam(model.parameters(), lr=lr)
+
+     for epoch in range(10):
+         # Training loop (train_epoch: your per-epoch training function)
+         loss = train_epoch(model, optimizer, batch_size)
+         train.report({"loss": loss, "epoch": epoch})
+
+ # Define search space; Tune passes the "train_loop_config" entry to train_func as `config`
+ param_space = {
+     "train_loop_config": {
+         "lr": tune.loguniform(1e-5, 1e-2),
+         "batch_size": tune.choice([16, 32, 64, 128]),
+     }
+ }
+
+ # Run 20 trials (4 GPU workers each); ASHA stops poor trials early
+ tuner = tune.Tuner(
+     TorchTrainer(
+         train_func,
+         scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
+     ),
+     param_space=param_space,
+     tune_config=tune.TuneConfig(
+         num_samples=20,
+         scheduler=ASHAScheduler(metric="loss", mode="min"),
+     ),
+ )
+
+ results = tuner.fit()
+ best = results.get_best_result(metric="loss", mode="min")
+ print(f"Best hyperparameters: {best.config['train_loop_config']}")
+ ```
+
+ **Result**: Distributed hyperparameter search across the cluster, with ASHA stopping unpromising trials early
+
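`tune.loguniform(1e-5, 1e-2)` samples uniformly in log space, so each decade of learning rates (1e-5 to 1e-4, 1e-4 to 1e-3, 1e-3 to 1e-2) is drawn about equally often; a plain uniform draw over the same range would put roughly 90% of samples above 1e-3. The standalone sketch below illustrates the idea with the standard library only; `sample_loguniform` and `decade_counts` are hypothetical helpers, not Ray's implementation.

```python
import math
import random


def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high))."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def decade_counts(samples: list[float]) -> dict[int, int]:
    """Count samples per power-of-ten bucket, keyed by floor(log10(x))."""
    counts: dict[int, int] = {}
    for x in samples:
        k = math.floor(math.log10(x))
        counts[k] = counts.get(k, 0) + 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    lrs = [sample_loguniform(1e-5, 1e-2, rng) for _ in range(30_000)]
    # Buckets -5, -4 and -3 should each hold roughly a third of the samples
    print(decade_counts(lrs))
```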
+ ### Workflow 4: Checkpointing and fault tolerance
+
+ ```python
+ import os
+ import tempfile
+
+ import torch
+ from ray import train
+ from ray.train import Checkpoint, FailureConfig, RunConfig, ScalingConfig
+ from ray.train.torch import TorchTrainer, prepare_model
+
+ def train_func(config):
+     model = MyModel()  # MyModel: your nn.Module
+     state, start_epoch = None, 0
+
+     # Try to resume from the latest reported checkpoint
+     checkpoint = train.get_checkpoint()
+     if checkpoint:
+         with checkpoint.as_directory() as checkpoint_dir:
+             state = torch.load(os.path.join(checkpoint_dir, "model.pt"), map_location="cpu")
+         model.load_state_dict(state["model"])
+         start_epoch = state["epoch"] + 1  # continue after the saved epoch
+
+     model = prepare_model(model)  # moves to this worker's device, wraps in DDP
+     optimizer = torch.optim.Adam(model.parameters())
+     if state is not None:
+         optimizer.load_state_dict(state["optimizer"])
+
+     for epoch in range(start_epoch, 100):
+         loss = train_epoch(model, optimizer)  # train_epoch: your training step
+
+         with tempfile.TemporaryDirectory() as tmpdir:
+             checkpoint = None
+             # Save every 10 epochs, from rank 0 only (all ranks hold the same weights)
+             if epoch % 10 == 0 and train.get_context().get_world_rank() == 0:
+                 torch.save({
+                     "model": getattr(model, "module", model).state_dict(),  # unwrap DDP
+                     "optimizer": optimizer.state_dict(),
+                     "epoch": epoch,
+                 }, os.path.join(tmpdir, "model.pt"))
+                 checkpoint = Checkpoint.from_directory(tmpdir)
+             # Every worker calls report; Ray persists the checkpoint before tmpdir is removed
+             train.report({"loss": loss}, checkpoint=checkpoint)
+
+ trainer = TorchTrainer(
+     train_func,
+     scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
+     # Retry failed runs up to 3 times; each retry resumes from the latest checkpoint
+     run_config=RunConfig(failure_config=FailureConfig(max_failures=3)),
+ )
+
+ result = trainer.fit()
+ ```
+
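Resume bookkeeping is easy to get off by one. The standalone sketch below simulates it in plain Python (no Ray or PyTorch): it checkpoints every 10 epochs, crashes once at a chosen epoch, and restarts from the epoch after the latest checkpoint, which shows how much work a failure costs. `run_with_restarts` and its in-memory checkpoint are hypothetical stand-ins for `train.report(..., checkpoint=...)` and `train.get_checkpoint()`.

```python
from typing import Optional


def run_with_restarts(total_epochs: int, every: int, fail_at: set[int]) -> list[int]:
    """Simulate checkpointed training and return every epoch that was executed.

    A checkpoint (here just the epoch number) is saved whenever epoch % every == 0.
    Each epoch listed in fail_at crashes once before finishing; the run then restarts
    from the latest checkpoint, as an automatic retry would.
    """
    executed: list[int] = []
    checkpoint: Optional[int] = None  # stand-in for the latest reported checkpoint
    pending = set(fail_at)
    while True:
        start = 0 if checkpoint is None else checkpoint + 1  # resume after the saved epoch
        crashed = False
        for epoch in range(start, total_epochs):
            if epoch in pending:
                pending.discard(epoch)  # fail only once per listed epoch
                crashed = True
                break
            executed.append(epoch)
            if epoch % every == 0:
                checkpoint = epoch
        if not crashed:
            return executed


epochs = run_with_restarts(30, every=10, fail_at={15})
print(len(epochs))  # 34: epochs 11-14 run twice
```

A crash at epoch 15 with `every=10` repeats epochs 11 to 14; checkpointing more often shrinks that window at the cost of more checkpoint writes.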
+ ### Workflow 5: Multi-node training
+
+ ```python
+ import ray
+ from ray.train import ScalingConfig
+ from ray.train.torch import TorchTrainer
+
+ # Connect to an existing Ray cluster
+ ray.init(address="auto")  # Or ray.init("ray://head-node:10001") via Ray Client
+
+ # Train across 4 nodes × 8 GPUs = 32 workers
+ trainer = TorchTrainer(
+     train_func,
+     scaling_config=ScalingConfig(
+         num_workers=32,
+         use_gpu=True,
+         resources_per_worker={"GPU": 1, "CPU": 4},
+         placement_strategy="SPREAD",  # Spread workers across nodes
+     ),
+ )
+
+ result = trainer.fit()
+ ```
+
+ **Launch Ray cluster**:
+ ```bash
+ # On head node
+ ray start --head --port=6379
+
+ # On worker nodes
+ ray start --address=<head-node-ip>:6379
+ ```
+
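Workers start only when Ray can reserve `resources_per_worker` for every one of them; otherwise the run stays pending. The hypothetical helper below (plain Python, not a Ray API) checks whether a `ScalingConfig`-style request fits a set of nodes, assuming integer resource counts and that each worker must fit entirely on one node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Schedulable resources on one machine (integers only in this sketch)."""
    cpus: int
    gpus: int


def max_workers_per_node(node: Node, cpu_per_worker: int, gpu_per_worker: int) -> int:
    """Number of workers with the given per-worker request that fit on one node."""
    limits = []
    if cpu_per_worker > 0:
        limits.append(node.cpus // cpu_per_worker)
    if gpu_per_worker > 0:
        limits.append(node.gpus // gpu_per_worker)
    if not limits:
        raise ValueError("each worker must request at least one CPU or GPU")
    return min(limits)


def fits(num_workers: int, nodes: list[Node], cpu_per_worker: int, gpu_per_worker: int) -> bool:
    """True if num_workers can be placed, each entirely on a single node."""
    capacity = sum(max_workers_per_node(n, cpu_per_worker, gpu_per_worker) for n in nodes)
    return num_workers <= capacity


# 32 workers with {"GPU": 1, "CPU": 4} on 4 nodes of 8 GPUs each
print(fits(32, [Node(cpus=32, gpus=8)] * 4, cpu_per_worker=4, gpu_per_worker=1))  # True
print(fits(32, [Node(cpus=16, gpus=8)] * 4, cpu_per_worker=4, gpu_per_worker=1))  # False
```

With only 16 CPUs per node, the CPU request caps each node at 4 workers (16 in total), so a 32-worker run would wait for resources even though all 32 GPUs are free.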
+ ## When to use vs alternatives
+
+ **Use Ray Train when**:
+ - Training across multiple machines (multi-node)
+ - Hyperparameter tuning at scale
+ - Fault tolerance is needed (automatic restart of failed workers)
+ - The cluster should grow or shrink with demand (Ray autoscaler)
+ - One API across frameworks (PyTorch, TensorFlow, HuggingFace)
+
+ **Key advantages**:
+ - **Multi-node orchestration**: The same training function scales from one GPU to a cluster by changing `ScalingConfig`
+ - **Ray Tune integration**: Trainers plug directly into `tune.Tuner` for distributed hyperparameter search
+ - **Fault tolerance**: Failed runs restart and resume from the latest checkpoint (configured with `FailureConfig`)
+ - **Elastic**: Works with the Ray autoscaler to add and remove nodes; a run that loses workers resumes from its latest checkpoint
+ - **Framework agnostic**: PyTorch, TensorFlow, HuggingFace, XGBoost
+
+ **Use alternatives instead**:
+ - **Accelerate**: Simpler setup, especially for single-node multi-GPU
+ - **PyTorch Lightning**: High-level abstractions, callbacks
+ - **DeepSpeed**: Maximum performance, complex setup
+ - **Raw DDP**: Maximum control, minimal overhead
+
+ ## Common issues
+
+ **Issue: Ray cluster not connecting**
+
+ Check the cluster state with `ray status`:
+ ```bash
+ ray status
+
+ # Confirm the active node count and total GPUs match the cluster you launched
+ # (e.g. 4 nodes and 32 GPUs for 4 × 8-GPU machines)
+ ```
+
+ If not connected:
+ ```bash
+ # Restart head node
+ ray stop
+ ray start --head --port=6379 --dashboard-host=0.0.0.0
+
+ # Restart worker nodes
+ ray stop
+ ray start --address=<head-ip>:6379
+ ```
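+
+ **Issue: Out of memory**
+
+ Each worker holds a full model replica, so GPU memory is driven by the per-worker batch size. Shrink the batch and use gradient accumulation to keep the effective batch size; fewer workers per node also lowers host (CPU) memory pressure:
+ ```python
+ scaling_config = ScalingConfig(
+     num_workers=4,  # Reduce from 8 (fewer replicas per node)
+     use_gpu=True,
+ )
+
+ # In train_func: smaller batches, accumulate gradients over several steps
+ accumulation_steps = 4
+ for i, batch in enumerate(dataloader):
+     loss = model(batch) / accumulation_steps  # assumes the model returns a scalar loss
+     loss.backward()
+
+     if (i + 1) % accumulation_steps == 0:
+         optimizer.step()
+         optimizer.zero_grad()
+ ```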
+
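Dividing each micro-batch loss by `accumulation_steps` is what makes accumulation match one large batch: for a mean-reduced loss over equal-sized micro-batches, the scaled gradients sum to the full-batch gradient. A dependency-free sketch for a one-parameter linear model with mean squared error (the model and data are illustrative, not from Ray):

```python
def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient d/dw of mean((w * x - y) ** 2) over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float], accumulation_steps: int) -> float:
    """Sum of micro-batch gradients, each scaled by 1 / accumulation_steps."""
    if len(xs) % accumulation_steps != 0:
        raise ValueError("batch must split into equal-sized micro-batches")
    micro = len(xs) // accumulation_steps
    total = 0.0
    for i in range(accumulation_steps):
        part = slice(i * micro, (i + 1) * micro)
        # Same effect as backpropagating loss / accumulation_steps for this micro-batch
        total += mse_grad(w, xs[part], ys[part]) / accumulation_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
full = mse_grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, accumulation_steps=4)
print(abs(full - accum) < 1e-9)  # True
```

With unequal micro-batch sizes (for example a short final batch) the scaled sum no longer equals the full-batch mean, which is one reason to pass `drop_last=True` to the `DataLoader` when accumulating.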
+
+ **Issue: Slow training**
+
+ Check whether data loading is the bottleneck:
+ ```python
+ import time
+
+ def train_func(config):
+     for epoch in range(epochs):
+         data_time = 0.0
+         epoch_start = start = time.perf_counter()
+         for batch in dataloader:
+             data_time += time.perf_counter() - start  # time spent waiting for this batch
+             # Train...
+             start = time.perf_counter()
+         total = time.perf_counter() - epoch_start
+         print(f"Epoch {epoch}: data loading {data_time:.3f}s of {total:.3f}s")
+ ```
+
+ If data loading is slow, increase the DataLoader's worker processes (these are separate from Ray Train workers):
+ ```python
+ from torch.utils.data import DataLoader
+
+ dataloader = DataLoader(dataset, num_workers=8)
+ ```
+
+ ## Advanced topics
+
+ **Multi-node setup**: See [references/multi-node.md](references/multi-node.md) for Ray cluster deployment on AWS, GCP, Kubernetes, and SLURM.
+
+ **Hyperparameter tuning**: See [references/hyperparameter-tuning.md](references/hyperparameter-tuning.md) for Ray Tune integration, search algorithms (Optuna, HyperOpt), and population-based training.
+
+ **Custom training loops**: See [references/custom-loops.md](references/custom-loops.md) for advanced Ray Train usage, custom backends, and integration with other frameworks.
+
+ ## Hardware requirements
+
+ - **Single node**: 1+ GPUs (or CPUs)
+ - **Multi-node**: 2+ machines with network connectivity
+ - **Cloud**: AWS, GCP, Azure (Ray autoscaling)
+ - **On-prem**: Kubernetes, SLURM clusters
+
+ **Supported accelerators**:
+ - NVIDIA GPUs (CUDA)
+ - AMD GPUs (ROCm)
+ - TPUs (Google Cloud)
+ - CPUs
+
+ ## Resources
+
+ - Docs: https://docs.ray.io/en/latest/train/train.html
+ - GitHub: https://github.com/ray-project/ray ⭐ 36,000+
+ - Version: 2.40.0+
+ - Examples: https://docs.ray.io/en/latest/train/examples.html
+ - Slack: https://forms.gle/9TSdDYUgxYs8SA9e8
+ - Used by: OpenAI, Uber, Spotify, Shopify, Instacart
+
+