@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,432 @@
+ # MoE Model Architectures
+
+ Comprehensive guide to different Mixture of Experts architectures and their design patterns.
+
+ ## Table of Contents
+ - Mixtral 8x7B (Mistral AI)
+ - DeepSeek-V3 (DeepSeek AI)
+ - Switch Transformers (Google)
+ - GLaM (Google)
+ - Comparison Table
+
+ ## Mixtral 8x7B (Mistral AI - December 2023)
+
+ ### Architecture Overview
+
+ **Parameters:**
+ - Total: 47B parameters
+ - Active per token: 13B (2 experts out of 8)
+ - Each expert: ~5.6B FFN parameters summed over 32 layers; attention and embeddings are shared, so the total is 47B rather than 8 x 7B = 56B
+
+ **Key Features:**
+ - **Top-2 routing**: Each token routed to 2 experts
+ - **8 experts per layer**: Sparse activation
+ - **SMoE architecture**: Sparse Mixture of Experts
+ - **Grouped-Query Attention (GQA)**: Efficient attention mechanism
+
+ ### Layer Structure
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ # Mixtral Transformer Block
+ # (MixtralAttention and MixtralRMSNorm are assumed to be defined elsewhere)
+ class MixtralDecoderLayer(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+
+         # Self-attention
+         self.self_attn = MixtralAttention(config)
+
+         # MoE Feed-Forward
+         self.block_sparse_moe = MixtralSparseMoeBlock(config)
+
+         # Layer norms
+         self.input_layernorm = MixtralRMSNorm(config.hidden_size)
+         self.post_attention_layernorm = MixtralRMSNorm(config.hidden_size)
+
+     def forward(self, hidden_states, attention_mask=None):
+         residual = hidden_states
+
+         # Self-attention (pre-norm, then residual add)
+         hidden_states = self.input_layernorm(hidden_states)
+         hidden_states = self.self_attn(hidden_states, attention_mask)
+         hidden_states = residual + hidden_states
+
+         # MoE FFN (pre-norm, then residual add)
+         residual = hidden_states
+         hidden_states = self.post_attention_layernorm(hidden_states)
+         hidden_states = self.block_sparse_moe(hidden_states)
+         hidden_states = residual + hidden_states
+
+         return hidden_states
+ ```
+
+ ### Sparse MoE Block
+
+ ```python
+ class MixtralSparseMoeBlock(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.hidden_dim = config.hidden_size
+         self.ffn_dim = config.intermediate_size
+         self.num_experts = config.num_local_experts  # 8
+         self.top_k = config.num_experts_per_tok  # 2
+
+         # Router (gating network)
+         self.gate = nn.Linear(self.hidden_dim, self.num_experts, bias=False)
+
+         # 8 expert FFNs
+         self.experts = nn.ModuleList([
+             MixtralBlockSparseTop2MLP(config)
+             for _ in range(self.num_experts)
+         ])
+
+     def forward(self, hidden_states):
+         batch_size, sequence_length, hidden_dim = hidden_states.shape
+         hidden_states = hidden_states.view(-1, hidden_dim)
+
+         # Router logits (batch * seq_len, num_experts)
+         router_logits = self.gate(hidden_states)
+
+         # Top-2 routing
+         routing_weights = F.softmax(router_logits, dim=1)
+         routing_weights, selected_experts = torch.topk(
+             routing_weights, self.top_k, dim=-1
+         )
+
+         # Normalize top-2 weights to sum to 1
+         routing_weights /= routing_weights.sum(dim=-1, keepdim=True)
+
+         # Route to experts
+         final_hidden_states = torch.zeros(
+             (batch_size * sequence_length, hidden_dim),
+             dtype=hidden_states.dtype,
+             device=hidden_states.device
+         )
+
+         # Process each expert
+         for expert_idx in range(self.num_experts):
+             expert_layer = self.experts[expert_idx]
+             # idx: token positions routed to this expert
+             # top_x: which top-k slot (0 or 1) selected this expert
+             idx, top_x = torch.where(selected_experts == expert_idx)
+
+             if idx.shape[0] == 0:
+                 continue
+
+             # Tokens routed to this expert
+             top_x_list = top_x.tolist()
+             idx_list = idx.tolist()
+
+             # Current expert input
+             current_state = hidden_states[None, idx_list].reshape(-1, hidden_dim)
+             current_hidden_states = expert_layer(current_state)
+
+             # Weight by routing scores
+             current_hidden_states *= routing_weights[idx_list, top_x_list, None]
+
+             # Accumulate
+             final_hidden_states.index_add_(0, idx, current_hidden_states.to(hidden_states.dtype))
+
+         final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+         return final_hidden_states
+ ```
+
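The routing arithmetic in the block above can be traced by hand for a single token. The following is a minimal pure-Python sketch, not the library's implementation: the helper names `softmax` and `top_k_route` and the logit values are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k=2):
    """Return (expert_indices, renormalized_weights) for one token."""
    probs = softmax(logits)
    # Indices of the k largest probabilities, highest first
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = [probs[i] for i in top]
    norm = sum(kept)
    return top, [p / norm for p in kept]

# Hypothetical router logits for one token over 8 experts
logits = [0.1, 2.0, -1.0, 0.5, 1.0, -0.3, 0.0, 0.2]
experts, weights = top_k_route(logits, k=2)
print(experts, [round(w, 3) for w in weights])  # [1, 4] [0.731, 0.269]
```

Renormalizing the two kept softmax probabilities gives the same result as a softmax over only the two selected logits, which is why the weights here depend only on the gap between logits 2.0 and 1.0.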
+ ### Expert FFN
+
+ ```python
+ class MixtralBlockSparseTop2MLP(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.ffn_dim = config.intermediate_size
+         self.hidden_dim = config.hidden_size
+
+         self.w1 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)  # gate projection
+         self.w2 = nn.Linear(self.ffn_dim, self.hidden_dim, bias=False)  # down projection
+         self.w3 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)  # up projection
+
+         self.act_fn = nn.SiLU()
+
+     def forward(self, hidden_states):
+         # SwiGLU activation: SiLU(w1(x)) * w3(x), then project back with w2
+         current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
+         current_hidden_states = self.w2(current_hidden_states)
+         return current_hidden_states
+ ```
+
+ ### Configuration
+
+ ```json
+ {
+   "architectures": ["MixtralForCausalLM"],
+   "hidden_size": 4096,
+   "intermediate_size": 14336,
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "num_local_experts": 8,
+   "num_experts_per_tok": 2,
+   "vocab_size": 32000,
+   "max_position_embeddings": 32768,
+   "rms_norm_eps": 1e-5,
+   "rope_theta": 1000000.0
+ }
+ ```
+
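The parameter totals quoted in the overview can be cross-checked against this configuration. The sketch below is a back-of-the-envelope count, not an official tool. It assumes bias-free linear layers (as in the code above), untied input and output embeddings, and a final norm layer; all variable names are illustrative.

```python
# Values copied from the configuration above
hidden = 4096
ffn = 14336
layers = 32
heads = 32
kv_heads = 8
experts = 8
top_k = 2
vocab = 32000

head_dim = hidden // heads        # 128
kv_dim = kv_heads * head_dim      # 1024: GQA shrinks the K and V projections

# Per-layer parameter counts (bias-free linear layers)
attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, plus k and v
expert_ffn = 3 * hidden * ffn                     # w1, w2, w3 of one expert
router = hidden * experts                         # gating matrix
norms = 2 * hidden                                # two RMSNorm weight vectors

# Assumption: separate input embedding and output head, plus one final norm
embeddings = 2 * vocab * hidden + hidden

total = layers * (attn + experts * expert_ffn + router + norms) + embeddings
active = layers * (attn + top_k * expert_ffn + router + norms) + embeddings

print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B")
print(f"one expert across all layers: {layers * expert_ffn / 1e9:.1f}B")
```

Under these assumptions the script reports about 46.7B total and 12.9B active parameters, in line with the rounded 47B and 13B figures. The "8x7B" name overstates the total because only the expert FFNs are replicated eight times.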
+ ## DeepSeek-V3 (DeepSeek AI - December 2024)
+
+ ### Architecture Overview
+
+ **Parameters:**
+ - Total: 671B parameters
+ - Active per token: 37B
+ - Experts per MoE layer: 256 routed + 1 shared, with 8 routed experts active per token
+
+ **Key Innovations:**
+ 1. **DeepSeekMoE**: Finer-grained experts with shared experts
+ 2. **Multi-Head Latent Attention (MLA)**: Reduced KV cache memory
+ 3. **Auxiliary-Loss-Free Load Balancing**: Balances expert load with a per-expert routing bias instead of an auxiliary loss
+ 4. **Multi-Token Prediction (MTP)**: Training objective that predicts additional future tokens at each position
+
+ ### DeepSeekMoE Architecture
+
+ ```python
+ class DeepSeekMoE(nn.Module):
+     """Finer-grained experts with shared experts (simplified, unbatched dispatch)."""
+
+     def __init__(self, config):
+         super().__init__()
+         self.num_experts = config.num_experts  # More fine-grained
+         self.num_shared_experts = config.num_shared_experts  # e.g., 2
+         self.num_routed_experts = self.num_experts - self.num_shared_experts
+         self.top_k = config.top_k
+
+         # Shared experts (always activated)
+         self.shared_experts = nn.ModuleList([
+             FFN(config) for _ in range(self.num_shared_experts)
+         ])
+
+         # Routed experts (top-k activated)
+         self.routed_experts = nn.ModuleList([
+             FFN(config) for _ in range(self.num_routed_experts)
+         ])
+
+         # Router for routed experts only
+         self.gate = nn.Linear(config.hidden_size, self.num_routed_experts, bias=False)
+
+     def forward(self, x):
+         # x: (batch, seq, hidden)
+         # Shared experts (always computed)
+         shared_output = sum(expert(x) for expert in self.shared_experts)
+
+         # Router for top-k routed experts
+         router_logits = self.gate(x)
+         routing_weights = F.softmax(router_logits, dim=-1)
+         routing_weights, selected_experts = torch.topk(routing_weights, self.top_k, dim=-1)
+         # Renormalize so the selected weights sum to 1 (out-of-place, autograd-safe)
+         routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
+
+         # Routed experts output
+         routed_output = torch.zeros_like(x)
+         for i in range(self.top_k):
+             expert_idx = selected_experts[:, :, i]
+             expert_weight = routing_weights[:, :, i:i+1]
+             for eidx in range(self.num_routed_experts):
+                 mask = (expert_idx == eidx)  # tokens whose i-th choice is this expert
+                 if mask.any():
+                     routed_output[mask] += expert_weight[mask] * self.routed_experts[eidx](x[mask])
+
+         # Combine shared and routed
+         return shared_output + routed_output
+ ```
+
+ ### Multi-Head Latent Attention (MLA)
+
+ ```python
+ class MultiHeadLatentAttention(nn.Module):
+     """Compress KV cache with latent vectors.
+
+     Simplified: omits positional encoding (the decoupled RoPE path) and query compression.
+     """
+
+     def __init__(self, config):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+         self.num_heads = config.num_attention_heads
+         self.head_dim = self.hidden_size // self.num_heads
+         self.latent_dim = config.latent_dim  # Compressed dimension
+
+         # Project to latent space
+         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim)
+         self.kv_proj = nn.Linear(self.hidden_size, self.latent_dim)  # Compress!
+
+         # Decompress for attention
+         self.k_decompress = nn.Linear(self.latent_dim, self.num_heads * self.head_dim)
+         self.v_decompress = nn.Linear(self.latent_dim, self.num_heads * self.head_dim)
+
+         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size)
+
+     def forward(self, hidden_states, past_key_value=None):
+         batch_size, seq_len, _ = hidden_states.shape
+
+         # Query
+         q = self.q_proj(hidden_states)
+         q = q.view(batch_size, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
+
+         # Compress KV to latent
+         kv_latent = self.kv_proj(hidden_states)  # (batch, seq, latent_dim)
+
+         # Only the latent vectors are cached (huge memory savings!)
+         if past_key_value is not None:
+             kv_latent = torch.cat([past_key_value, kv_latent], dim=1)
+
+         # Decompress for attention
+         k = self.k_decompress(kv_latent)
+         v = self.v_decompress(kv_latent)
+         k = k.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
+         v = v.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
+
+         # Attention: causal mask for prefill; single-token decode attends to the whole cache.
+         # Multi-token steps on top of a cache would need an explicit attn_mask instead.
+         is_causal = past_key_value is None and seq_len > 1
+         attn_output = F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
+         attn_output = attn_output.transpose(1, 2).contiguous()
+         attn_output = attn_output.view(batch_size, seq_len, -1)
+
+         # Return the updated latent cache alongside the output
+         return self.o_proj(attn_output), kv_latent
+ ```
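The memory saving comes from caching one latent vector per token instead of a full key and value per head. A back-of-the-envelope sketch, using hypothetical sizes that are not taken from any released model:

```python
def kv_cache_values_per_token(num_heads, head_dim, latent_dim=None):
    """Number of cached values per token per layer.

    Standard attention caches one key and one value vector per head.
    The latent variant sketched above caches a single latent vector instead.
    """
    if latent_dim is None:
        return 2 * num_heads * head_dim
    return latent_dim


# Hypothetical sizes chosen for illustration only
num_heads, head_dim, latent_dim = 32, 128, 512

standard = kv_cache_values_per_token(num_heads, head_dim)
latent = kv_cache_values_per_token(num_heads, head_dim, latent_dim)
print(standard, latent, standard / latent)  # 8192 512 16.0
```

The price is the extra decompression matmuls (`k_decompress`, `v_decompress`) on every step, so the design trades compute for cache memory.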
+
+ ### Auxiliary-Loss-Free Load Balancing
+
+ ```python
+ import math
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ # DeepSeek-V3 uses per-expert bias terms instead of an auxiliary loss
+ class DeepSeekRouter(nn.Module):
+     def __init__(self, hidden_size, num_experts):
+         super().__init__()
+         self.weight = nn.Parameter(torch.empty(num_experts, hidden_size))
+         # Load balancing bias! Simplification: in DeepSeek-V3 this bias is adjusted
+         # from observed expert load rather than by gradients, and it affects only
+         # which experts are selected, not the gating weights applied to their outputs.
+         self.bias = nn.Parameter(torch.zeros(num_experts))
+
+         # Initialize
+         nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
+
+     def forward(self, x):
+         # Router with bias for load balancing
+         logits = F.linear(x, self.weight, self.bias)
+         return logits
+ ```
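The router above only shows where the bias enters. A minimal pure-Python sketch of the balancing rule itself follows, assuming the scheme described in the DeepSeek-V3 report: the bias is added to the scores only when choosing experts, and after each step it is lowered for overloaded experts and raised for underloaded ones. The function names, the mean-load threshold, and the `gamma` value are illustrative assumptions.

```python
def update_routing_bias(bias, tokens_per_expert, gamma=0.001):
    """Nudge each expert's bias down if overloaded, up if underloaded."""
    mean_load = sum(tokens_per_expert) / len(tokens_per_expert)
    new_bias = []
    for b, load in zip(bias, tokens_per_expert):
        if load > mean_load:
            new_bias.append(b - gamma)
        elif load < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias


def select_experts(scores, bias, top_k):
    """Pick top-k experts by biased score; gate weights use the unbiased scores."""
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


# Expert 0 received far more tokens than the others in the last step
new_bias = update_routing_bias([0.0, 0.0, 0.0, 0.0], [10, 2, 4, 4])
print(new_bias)  # [-0.001, 0.001, 0.001, 0.001]

# A strongly negative bias steers selection away from expert 0
gates = select_experts([0.4, 0.35, 0.15, 0.1], [-0.3, 0.0, 0.0, 0.1], top_k=2)
print(sorted(gates))  # [1, 3]
```

Because the bias never enters the gate weights, balancing changes which experts run without distorting how their outputs are mixed.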
+
+ ## Switch Transformers (Google - 2021)
+
+ ### Architecture Overview
+
+ **Key Innovation**: Top-1 routing. Each token is sent to exactly one expert, the simplest MoE routing scheme.
+
+ **Parameters:**
+ - Switch-C: 1.6T parameters
+ - Active per token: ~10B
+
+ ### Top-1 Routing
+
+ ```python
+ class SwitchTransformersTop1Router(nn.Module):
+     """Simplest routing: one expert per token."""
+
+     def __init__(self, config):
+         super().__init__()
+         self.num_experts = config.num_experts
+         self.expert_capacity = config.expert_capacity
+         # Stored here because `config` is not in scope inside forward()
+         self.jitter_noise = config.router_jitter_noise
+
+         # Router
+         self.classifier = nn.Linear(config.d_model, config.num_experts)
+
+     def forward(self, hidden_states):
+         # Router logits: (batch, seq, num_experts)
+         router_logits = self.classifier(hidden_states)
+
+         # Add noise for load balancing (during training)
+         if self.training:
+             router_logits = router_logits + torch.randn_like(router_logits) * self.jitter_noise
+
+         # Top-1: Argmax (hard routing)
+         router_probs = F.softmax(router_logits, dim=-1)
+         expert_index = torch.argmax(router_probs, dim=-1)
+
+         # Expert capacity: drop tokens if expert is full
+         expert_mask = F.one_hot(expert_index, self.num_experts)
+         expert_capacity_mask = self._get_capacity_mask(expert_mask)
+
+         return expert_index, expert_mask, expert_capacity_mask
+
+     def _get_capacity_mask(self, expert_mask):
+         """Enforce expert capacity limits (per token, per expert)."""
+         # 1-based position of each token in its expert's queue, along the sequence dimension
+         position_in_expert = torch.cumsum(expert_mask, dim=-2) * expert_mask
+
+         # Keep tokens that fit; tokens past capacity are dropped
+         capacity_mask = (position_in_expert > 0) & (position_in_expert <= self.expert_capacity)
+         return capacity_mask
+ ```
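The capacity limit is easier to follow on a small example. The sketch below assumes the Switch Transformer definition of capacity, (tokens per batch / number of experts) × capacity factor; rounding up with `ceil`, the default factor, and the helper names are assumptions made for illustration.

```python
import math


def expert_capacity(num_tokens, num_experts, capacity_factor=1.25):
    """Maximum tokens each expert may process in one batch."""
    return math.ceil(num_tokens / num_experts * capacity_factor)


def keep_mask(expert_index, num_experts, capacity):
    """Per-token flags: True if the token fits within its expert's capacity."""
    seen = [0] * num_experts
    mask = []
    for e in expert_index:
        seen[e] += 1
        mask.append(seen[e] <= capacity)
    return mask


assignments = [0, 0, 0, 1, 0, 2, 0, 3]  # expert chosen by each of 8 tokens
cap = expert_capacity(len(assignments), num_experts=4, capacity_factor=1.0)
print(cap)  # 2
print(keep_mask(assignments, 4, cap))
# [True, True, False, True, False, True, False, True]
```

Expert 0 was chosen by five tokens but only the first two fit, so three tokens are dropped. A larger capacity factor drops fewer tokens at the cost of more padded compute.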
+
+ ### Load Balancing Loss
+
+ ```python
+ def switch_load_balancing_loss(router_probs, expert_indices, num_experts):
+     """Auxiliary loss to encourage uniform expert usage."""
+     # Fraction of probability mass assigned to each expert
+     router_prob_per_expert = router_probs.mean(dim=0)  # (num_experts,)
+
+     # Fraction of tokens routed to each expert
+     expert_counts = F.one_hot(expert_indices, num_experts).float().mean(dim=0)
+
+     # Loss: num_experts * sum(prob_mass * token_fraction)
+     # Minimized when both are uniform (1/num_experts)
+     loss = num_experts * (router_prob_per_expert * expert_counts).sum()
+
+     return loss
+ ```
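To see what values this loss takes, here is a pure-Python version of the same formula evaluated at the two extremes. It assumes `router_probs` is a list of per-token probability rows and `expert_indices` a flat list of chosen experts; the function name `load_balancing_loss` is illustrative.

```python
def load_balancing_loss(router_probs, expert_indices, num_experts):
    """Pure-Python version of the loss: N * sum_i f_i * P_i."""
    num_tokens = len(expert_indices)
    # f_i: fraction of tokens dispatched to expert i
    f = [expert_indices.count(i) / num_tokens for i in range(num_experts)]
    # P_i: mean router probability assigned to expert i
    p = [sum(row[i] for row in router_probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


n = 4

# Perfectly balanced: uniform probabilities, one token per expert
uniform_probs = [[0.25] * n for _ in range(n)]
balanced = load_balancing_loss(uniform_probs, [0, 1, 2, 3], n)

# Fully collapsed: every token goes to expert 0 with probability 1
collapsed_probs = [[1.0, 0.0, 0.0, 0.0] for _ in range(n)]
collapsed = load_balancing_loss(collapsed_probs, [0, 0, 0, 0], n)

print(balanced, collapsed)  # 1.0 4.0
```

The loss is 1 under uniform routing and grows to `num_experts` when all tokens collapse onto a single expert, which is why scaling by `num_experts` keeps the balanced value independent of the expert count.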
+
+ ## Architecture Comparison Table
+
+ | Model | Total Params | Active Params | Routing | Experts/Layer | Top-K | Key Innovation |
+ |-------|-------------|---------------|---------|---------------|-------|----------------|
+ | **Mixtral 8x7B** | 47B | 13B | Top-2 | 8 | 2 | Balanced top-2, GQA |
+ | **DeepSeek-V3** | 671B | 37B | Top-8 | 256 routed + 1 shared | 8 | MLA, shared experts, no aux loss |
+ | **Switch-C** | 1.6T | ~10B | Top-1 | 2048 | 1 | Simplest routing |
+ | **GLaM** | 1.2T | ~97B | Top-2 | 64 | 2 | Capacity factor tuning |
+
+ ## Design Patterns
+
+ ### Pattern 1: Shared + Routed Experts (DeepSeek)
+
+ ```python
+ # Best for: Ensuring some experts are always activated
+ output = shared_experts(x) + routed_experts(x)
+ ```
+
+ **Pros:**
+ - Every token gets a guaranteed baseline of computation from the shared experts
+ - Shared experts learn common patterns
+ - Routed experts specialize
+
+ ### Pattern 2: Pure Sparse Routing (Mixtral, Switch)
+
+ ```python
+ # Best for: Maximum sparsity and efficiency
+ # Pseudocode: top_k_weights and top_k_indices come from the router
+ output = sum(w * experts[i](x) for w, i in zip(top_k_weights, top_k_indices))
+ ```
+
+ **Pros:**
+ - Simplest implementation
+ - Maximum sparsity: only the selected top-k experts run for each token
+ - Clear expert specialization
+
+ ### Pattern 3: Expert Choice Routing
+
+ ```python
+ # Pseudocode: experts choose tokens (instead of tokens choosing experts)
+ for expert in experts:
+     top_k_tokens = expert.select_top_k_tokens(all_tokens)
+     expert.process(top_k_tokens)
+ ```
+
+ **Pros:**
+ - Perfect load balancing: every expert processes the same number of tokens
+ - No tokens dropped to capacity overflow (although a token can be picked by no expert)
+ - Variable number of experts per token
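
A minimal pure-Python sketch of expert choice routing follows, assuming a token-by-expert score matrix and a fixed per-expert capacity. The function name `expert_choice_route` and the scores are hypothetical.

```python
def expert_choice_route(scores, capacity):
    """scores[t][e] is the affinity of token t for expert e.

    Returns, per expert, the sorted indices of the `capacity` tokens it selects.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignments = []
    for e in range(num_experts):
        # Each expert ranks all tokens by its own affinity column
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignments.append(sorted(ranked[:capacity]))
    return assignments


scores = [
    [0.9, 0.1],  # token 0
    [0.8, 0.7],  # token 1
    [0.2, 0.6],  # token 2
    [0.1, 0.3],  # token 3
]
routes = expert_choice_route(scores, capacity=2)
print(routes)  # [[0, 1], [1, 2]]
```

Both experts process exactly two tokens, but token 1 is handled by two experts while token 3 is handled by none, which is the trade-off this pattern makes for balanced load.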
+
+ ## Resources
+
+ - **Mixtral Paper**: https://arxiv.org/abs/2401.04088
+ - **DeepSeek-V3**: https://arxiv.org/abs/2412.19437
+ - **Switch Transformers**: https://arxiv.org/abs/2101.03961
+ - **GLaM**: https://arxiv.org/abs/2112.06905