@synsci/cli-darwin-arm64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,382 @@
+ # NanoGPT Architecture
+
+ ## Model Structure (~300 Lines)
+
+ NanoGPT implements a clean GPT-2 architecture in minimal code for educational purposes.
+
+ ### Complete Model (model.py)
+
9
+ ```python
10
+ import torch
11
+ import torch.nn as nn
12
+ from torch.nn import functional as F
13
+
14
+ class CausalSelfAttention(nn.Module):
15
+ """Multi-head masked self-attention layer."""
16
+
17
+ def __init__(self, config):
18
+ super().__init__()
19
+ assert config.n_embd % config.n_head == 0
20
+
21
+ # Key, query, value projections for all heads (batched)
22
+ self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd, bias=config.bias)
23
+ # Output projection
24
+ self.c_proj = nn.Linear(config.n_embd, config.n_embd, bias=config.bias)
25
+
26
+ # Regularization
27
+ self.attn_dropout = nn.Dropout(config.dropout)
28
+ self.resid_dropout = nn.Dropout(config.dropout)
29
+
30
+ self.n_head = config.n_head
31
+ self.n_embd = config.n_embd
32
+ self.dropout = config.dropout
33
+
34
+ # Flash attention flag
35
+ self.flash = hasattr(torch.nn.functional, 'scaled_dot_product_attention')
36
+
37
+ if not self.flash:
38
+ # Causal mask (lower triangular)
39
+ self.register_buffer("bias", torch.tril(
40
+ torch.ones(config.block_size, config.block_size)
41
+ ).view(1, 1, config.block_size, config.block_size))
42
+
43
+ def forward(self, x):
44
+ B, T, C = x.size() # batch, seq_len, embedding_dim
45
+
46
+ # Calculate Q, K, V for all heads in batch
47
+ q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
48
+
49
+ # Reshape for multi-head attention
50
+ k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2) # (B, nh, T, hs)
51
+ q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2) # (B, nh, T, hs)
52
+ v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2) # (B, nh, T, hs)
53
+
54
+ # Attention
55
+ if self.flash:
56
+ # Flash Attention (PyTorch 2.0+)
57
+ y = torch.nn.functional.scaled_dot_product_attention(
58
+ q, k, v,
59
+ attn_mask=None,
60
+ dropout_p=self.dropout if self.training else 0,
61
+ is_causal=True
62
+ )
63
+ else:
64
+ # Manual attention implementation
65
+ att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
66
+ att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))
67
+ att = F.softmax(att, dim=-1)
68
+ att = self.attn_dropout(att)
69
+ y = att @ v # (B, nh, T, hs)
70
+
71
+ # Reassemble all head outputs
72
+ y = y.transpose(1, 2).contiguous().view(B, T, C)
73
+
74
+ # Output projection
75
+ y = self.resid_dropout(self.c_proj(y))
76
+ return y
77
+
78
+
79
+ class MLP(nn.Module):
80
+ """Feedforward network (2-layer with GELU activation)."""
81
+
82
+ def __init__(self, config):
83
+ super().__init__()
84
+ self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias)
85
+ self.gelu = nn.GELU()
86
+ self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias)
87
+ self.dropout = nn.Dropout(config.dropout)
88
+
89
+ def forward(self, x):
90
+ x = self.c_fc(x)
91
+ x = self.gelu(x)
92
+ x = self.c_proj(x)
93
+ x = self.dropout(x)
94
+ return x
95
+
96
+
97
+ class Block(nn.Module):
98
+ """Transformer block (attention + MLP with residuals)."""
99
+
100
+ def __init__(self, config):
101
+ super().__init__()
102
+ self.ln_1 = nn.LayerNorm(config.n_embd)
103
+ self.attn = CausalSelfAttention(config)
104
+ self.ln_2 = nn.LayerNorm(config.n_embd)
105
+ self.mlp = MLP(config)
106
+
107
+ def forward(self, x):
108
+ x = x + self.attn(self.ln_1(x)) # Pre-norm + residual
109
+ x = x + self.mlp(self.ln_2(x)) # Pre-norm + residual
110
+ return x
111
+
112
+
113
+ @dataclass
114
+ class GPTConfig:
115
+ """GPT model configuration."""
116
+ block_size: int = 1024 # Max sequence length
117
+ vocab_size: int = 50304 # GPT-2 vocab size (50257 rounded up for efficiency)
118
+ n_layer: int = 12 # Number of layers
119
+ n_head: int = 12 # Number of attention heads
120
+ n_embd: int = 768 # Embedding dimension
121
+ dropout: float = 0.0 # Dropout rate
122
+ bias: bool = True # Use bias in Linear and LayerNorm layers
123
+
124
+
125
+ class GPT(nn.Module):
126
+ """GPT Language Model."""
127
+
128
+ def __init__(self, config):
129
+ super().__init__()
130
+ assert config.vocab_size is not None
131
+ assert config.block_size is not None
132
+ self.config = config
133
+
134
+ self.transformer = nn.ModuleDict(dict(
135
+ wte=nn.Embedding(config.vocab_size, config.n_embd), # Token embeddings
136
+ wpe=nn.Embedding(config.block_size, config.n_embd), # Position embeddings
137
+ drop=nn.Dropout(config.dropout),
138
+ h=nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
139
+ ln_f=nn.LayerNorm(config.n_embd),
140
+ ))
141
+ self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
142
+
143
+ # Weight tying (share embeddings and output projection)
144
+ self.transformer.wte.weight = self.lm_head.weight
145
+
146
+ # Initialize weights
147
+ self.apply(self._init_weights)
148
+ # Apply special scaled init to residual projections
149
+ for pn, p in self.named_parameters():
150
+ if pn.endswith('c_proj.weight'):
151
+ torch.nn.init.normal_(p, mean=0.0, std=0.02/math.sqrt(2 * config.n_layer))
152
+
153
+ def _init_weights(self, module):
154
+ if isinstance(module, nn.Linear):
155
+ torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
156
+ if module.bias is not None:
157
+ torch.nn.init.zeros_(module.bias)
158
+ elif isinstance(module, nn.Embedding):
159
+ torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
160
+
161
+     def forward(self, idx, targets=None):
+         device = idx.device
+         b, t = idx.size()
+         assert t <= self.config.block_size, f"Cannot forward sequence length {t}, max is {self.config.block_size}"
+
+         # Generate position indices
+         pos = torch.arange(0, t, dtype=torch.long, device=device).unsqueeze(0)  # (1, t)
+
+         # Forward the GPT model
+         tok_emb = self.transformer.wte(idx)  # Token embeddings (b, t, n_embd)
+         pos_emb = self.transformer.wpe(pos)  # Position embeddings (1, t, n_embd)
+         x = self.transformer.drop(tok_emb + pos_emb)
+
+         for block in self.transformer.h:
+             x = block(x)
+
+         x = self.transformer.ln_f(x)
+
+         if targets is not None:
+             # Training mode: compute loss
+             logits = self.lm_head(x)
+             loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
+         else:
+             # Inference mode: only compute logits for last token
+             logits = self.lm_head(x[:, [-1], :])  # (b, 1, vocab_size)
+             loss = None
+
+         return logits, loss
189
+
190
+     @torch.no_grad()
+     def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
+         """Generate new tokens autoregressively."""
+         for _ in range(max_new_tokens):
+             # Crop context if needed
+             idx_cond = idx if idx.size(1) <= self.config.block_size else idx[:, -self.config.block_size:]
+
+             # Forward pass
+             logits, _ = self(idx_cond)
+             logits = logits[:, -1, :] / temperature  # Scale by temperature
+
+             # Optionally crop logits to top k
+             if top_k is not None:
+                 v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
+                 logits[logits < v[:, [-1]]] = -float('Inf')
+
+             # Sample from distribution
+             probs = F.softmax(logits, dim=-1)
+             idx_next = torch.multinomial(probs, num_samples=1)
+
+             # Append to sequence
+             idx = torch.cat((idx, idx_next), dim=1)
+
+         return idx
214
+ ```
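The sampling step inside `generate` (temperature scaling, optional top-k filtering, then a draw from the softmax distribution) can be sketched without PyTorch. This is a minimal illustration using plain lists; the helper names `top_k_filter`, `softmax`, and `sample_next` are hypothetical and not part of NanoGPT.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf (mirrors the top_k branch)."""
    k = min(k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Pick the next token id from raw logits."""
    scaled = [x / temperature for x in logits]  # lower temperature sharpens the distribution
    if top_k is not None:
        scaled = top_k_filter(scaled, top_k)
    probs = softmax(scaled)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


example_logits = [2.0, 1.0, 0.5, -1.0]
filtered = top_k_filter(example_logits, 2)
greedy_like = sample_next(example_logits, top_k=1)  # only the largest logit survives
```

With `top_k=1` the draw is deterministic, because every other token gets probability zero.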
215
+
216
+ ## Key Design Decisions
217
+
218
+ ### 1. Pre-Norm vs Post-Norm
219
+
220
+ **NanoGPT uses Pre-Norm** (LayerNorm before sub-layers):
221
+
222
+ ```python
223
+ # Pre-norm (NanoGPT)
224
+ x = x + attn(ln(x))
225
+ x = x + mlp(ln(x))
226
+
227
+ # Post-norm (original Transformer)
228
+ x = ln(x + attn(x))
229
+ x = ln(x + mlp(x))
230
+ ```
231
+
232
+ **Why Pre-Norm?**
233
+ - More stable training in deep stacks: the residual path stays an identity, so gradients reach early layers without passing through LayerNorm
234
+ - Used in GPT-2, GPT-3
235
+ - Standard for large language models
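The difference between the two orderings can be seen on a toy vector. The sketch below uses plain Python lists and a LayerNorm without learned scale or shift; `toy_sublayer` is a hypothetical stand-in for attention or the MLP, not NanoGPT code.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm(x, sublayer):
    """x + sublayer(ln(x)): the input x passes through unchanged on the residual path."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm(x, sublayer):
    """ln(x + sublayer(x)): the residual sum itself is renormalized."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(x):
    """Hypothetical sub-layer: scales its input by 0.5."""
    return [0.5 * v for v in x]


def zero_sublayer(x):
    """Sub-layer that contributes nothing, to expose the residual path."""
    return [0.0 for _ in x]


x = [10.0, 20.0, 30.0]
pre_out = pre_norm(x, toy_sublayer)
post_out = post_norm(x, toy_sublayer)
```

With `zero_sublayer`, `pre_norm` returns `x` exactly, while `post_norm` still rescales it. That untouched identity path is what the stability argument above relies on.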
236
+
237
+ ### 2. Weight Tying
238
+
239
+ **Shared weights between embeddings and output**:
240
+
241
+ ```python
242
+ self.transformer.wte.weight = self.lm_head.weight
243
+ ```
244
+
245
+ **Why?**
246
+ - Reduces parameters: `vocab_size × n_embd` saved
247
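+ - Input and output token representations share one matrix, so each token's vector is trained by both embedding lookups and output predictions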
248
+ - Standard in GPT-2
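As a rough illustration of what tying means, the toy class below uses a single matrix for both the embedding lookup and the output projection. The class name `TiedEmbedding` and the tiny weight values are made up for this sketch; the parameter saving is computed for GPT-2 Small's `vocab_size=50257` and `n_embd=768`.

```python
class TiedEmbedding:
    """Toy model with one (vocab_size x n_embd) matrix used for input and output."""

    def __init__(self, weight):
        self.weight = weight  # the single shared matrix

    def embed(self, token_id):
        # Row lookup, the role played by wte
        return self.weight[token_id]

    def logits(self, hidden):
        # hidden @ W^T, the role played by lm_head (no bias)
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]


model = TiedEmbedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = model.logits(model.embed(2))  # [1.0, 1.0, 2.0]: token 2 scores itself highest

# Parameters saved by not storing a separate output matrix (GPT-2 Small sizes)
saved = 50257 * 768  # 38,597,376
```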
249
+
250
+ ### 3. Scaled Residual Initialization
251
+
252
+ ```python
253
+ # Scale down residual projections by layer depth
254
+ std = 0.02 / math.sqrt(2 * n_layer)
255
+ torch.nn.init.normal_(c_proj.weight, mean=0.0, std=std)
256
+ ```
257
+
258
+ **Why?**
259
+ - Keeps the variance of the residual stream from growing with depth, since every block adds two projections to it
260
+ - Each of the `2 * n_layer` residual additions contributes roughly equally
261
+ - From the GPT-2 paper, which scales residual-layer weights at initialization by 1/√N (N = number of residual layers)
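A quick way to see why the factor is `sqrt(2 * n_layer)`: if the `2 * n_layer` residual contributions are treated as independent (a simplifying assumption made only for this sketch), their summed variance stays at `0.02 ** 2` regardless of depth. The helper names below are hypothetical.

```python
import math


def residual_proj_std(n_layer, base_std=0.02):
    """Std used for c_proj weights: base_std / sqrt(2 * n_layer)."""
    return base_std / math.sqrt(2 * n_layer)


def summed_variance(n_layer, base_std=0.02):
    """Variance of a sum of 2 * n_layer independent terms, each with the scaled std."""
    return 2 * n_layer * residual_proj_std(n_layer, base_std) ** 2


# Depths of the four GPT-2 sizes
for n_layer in (12, 24, 36, 48):
    print(n_layer, round(residual_proj_std(n_layer), 5), summed_variance(n_layer))
```

The std shrinks as the model gets deeper, while the summed variance printed in the last column stays constant up to floating-point error.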
262
+
263
+ ### 4. Flash Attention
264
+
265
+ ```python
266
+ if hasattr(torch.nn.functional, 'scaled_dot_product_attention'):
+     # PyTorch >= 2.0: fused kernel (can dispatch to FlashAttention on supported GPUs)
+     y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
+ else:
+     # Fallback: manual causal attention; q, k, v have shape (B, n_head, T, head_dim)
+     att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
+     att = att.masked_fill(causal_mask == 0, float('-inf'))  # causal_mask: (T, T) lower-triangular ones
+     y = F.softmax(att, dim=-1) @ v
274
+ ```
275
+
276
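+ **Speedup**: often around 2× over the manual path on supported GPUs, with the same output up to floating-point error; the actual gain depends on hardware, dtype, and sequence length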
277
+
278
+ ## Model Sizes
279
+
280
+ | Model | n_layer | n_head | n_embd | Params | Config Name |
281
+ |-------|---------|--------|--------|--------|-------------|
282
+ | GPT-2 Small | 12 | 12 | 768 | 124M | `gpt2` |
283
+ | GPT-2 Medium | 24 | 16 | 1024 | 350M | `gpt2-medium` |
284
+ | GPT-2 Large | 36 | 20 | 1280 | 774M | `gpt2-large` |
285
+ | GPT-2 XL | 48 | 25 | 1600 | 1558M | `gpt2-xl` |
286
+
287
+ **NanoGPT character-level Shakespeare config** (`config/train_shakespeare_char.py`):
288
+ ```python
289
+ config = GPTConfig(
290
+ block_size=256, # Short context for char-level
291
+ vocab_size=65, # Small vocab (letters, punctuation, whitespace)
292
+ n_layer=6, # Shallow network
293
+ n_head=6,
294
+ n_embd=384, # Small embeddings
295
+ dropout=0.2 # Regularization
296
+ )
297
+ # Total: ~10M parameters
298
+ ```
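The parameter counts above can be approximated from the config fields alone. The sketch below assumes the usual GPT-2 block layout (a fused QKV projection, an output projection, and an MLP with a 4× hidden size), tied input and output embeddings, and that `bias` applies to both Linear and LayerNorm layers. The function `count_params` is a hypothetical helper, not part of NanoGPT.

```python
def count_params(n_layer, n_embd, vocab_size, block_size, bias=True):
    """Estimate the parameter count of a GPT-2 style model with tied embeddings."""
    b = 1 if bias else 0
    wte = vocab_size * n_embd              # token embeddings (shared with lm_head)
    wpe = block_size * n_embd              # position embeddings
    layer_norm = n_embd * (1 + b)          # weight plus optional bias
    attn = (n_embd * 3 * n_embd + 3 * n_embd * b) + (n_embd * n_embd + n_embd * b)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd * b) + (4 * n_embd * n_embd + n_embd * b)
    block = 2 * layer_norm + attn + mlp
    return wte + wpe + n_layer * block + layer_norm  # trailing term is ln_f


gpt2_small = count_params(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024)
shakespeare = count_params(n_layer=6, n_embd=384, vocab_size=65, block_size=256)
print(f"{gpt2_small:,}")   # 124,439,808
print(f"{shakespeare:,}")  # 10,770,816
```

Under these assumptions the first result matches the 124M row of the table, and the second lands near the "~10M" figure for the character-level config.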
299
+
300
+ ## Attention Visualization
301
+
302
+ ```text
303
+ # What each token attends to (lower triangular)
304
+ # Token t can only attend to tokens 0...t
305
+
306
+ Attention Pattern (causal mask):
307
+ t=0 t=1 t=2 t=3
308
+ t=0 ✓ - - -
309
+ t=1 ✓ ✓ - -
310
+ t=2 ✓ ✓ ✓ -
311
+ t=3 ✓ ✓ ✓ ✓
312
+
313
+ # Prevents "cheating" by looking at future tokens
314
+ ```
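The pattern above can be reproduced in a few lines. This is a plain-Python sketch with hypothetical helper names (`causal_mask`, `masked_softmax`); it shows that masked positions receive exactly zero attention weight.

```python
import math


def causal_mask(t):
    """Lower-triangular boolean mask: row i may attend to columns 0..i."""
    return [[col <= row for col in range(t)] for row in range(t)]


def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions; masked positions get weight 0."""
    kept = [s for s, keep in zip(scores, mask_row) if keep]
    m = max(kept)
    exps = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(4)
for row in mask:
    print(" ".join("✓" if allowed else "-" for allowed in row))

# Token t=1 with equal scores splits its attention between t=0 and t=1 only
weights = masked_softmax([0.0, 0.0, 0.0, 0.0], mask[1])  # [0.5, 0.5, 0.0, 0.0]
```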
315
+
316
+ ## Residual Stream
317
+
318
+ **Information flow through residuals**:
319
+
320
+ ```python
321
+ # Input
322
+ x = token_emb + pos_emb
323
+
324
+ # Block 1
325
+ x = x + attn_1(ln(x)) # Attention adds to residual
326
+ x = x + mlp_1(ln(x)) # MLP adds to residual
327
+
328
+ # Block 2
329
+ x = x + attn_2(ln(x))
330
+ x = x + mlp_2(ln(x))
331
+
332
+ # ... (repeat for all layers)
333
+
334
+ # Output
335
+ logits = lm_head(ln(x))
336
+ ```
337
+
338
+ **Key insight**: Each layer adds a refinement to the running representation, while the residual connections give gradients a direct path back to earlier layers
339
+
340
+ ## Tokenization
341
+
342
+ ### Character-Level (Shakespeare)
343
+
344
+ ```python
345
+ # data/shakespeare_char/prepare.py
346
+ with open('input.txt', 'r') as f:
+     text = f.read()
347
+ chars = sorted(set(text)) # ['\n', ' ', '!', ..., 'y', 'z']
348
+ vocab_size = len(chars) # 65
349
+
350
+ stoi = {ch: i for i, ch in enumerate(chars)}
351
+ itos = {i: ch for i, ch in enumerate(chars)}
352
+
353
+ # Encode
354
+ encode = lambda s: [stoi[c] for c in s]
355
+ decode = lambda l: ''.join([itos[i] for i in l])
356
+
357
+ data = torch.tensor(encode(text), dtype=torch.long)
358
+ ```
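The same scheme can be tried on a short string without any input file. This is a minimal sketch; the sample text is made up, and the real dataset yields 65 distinct characters rather than 8.

```python
sample_text = "hello world"
chars = sorted(set(sample_text))  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
vocab_size = len(chars)           # 8

stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for i, ch in enumerate(chars)}  # integer id -> character


def encode(s):
    """Map a string to a list of integer ids (raises KeyError on unseen characters)."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)


ids = encode("hello")      # [3, 2, 4, 4, 5]
round_trip = decode(ids)   # "hello"
```

Characters outside the training text have no id, which is one reason the GPT-2 setup uses byte-level BPE instead.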
359
+
360
+ ### BPE (GPT-2)
361
+
362
+ ```python
363
+ # data/openwebtext/prepare.py
364
+ import tiktoken
365
+
366
+ enc = tiktoken.get_encoding("gpt2") # GPT-2 BPE tokenizer
367
+ vocab_size = enc.n_vocab # 50257
368
+
369
+ # Encode
370
+ tokens = enc.encode_ordinary("Hello world") # [15496, 995]
371
+
372
+ # Decode
373
+ text = enc.decode(tokens) # "Hello world"
374
+ ```
375
+
376
+ ## Resources
377
+
378
+ - **GitHub**: https://github.com/karpathy/nanoGPT ⭐ 48,000+
379
+ - **Video**: "Let's build GPT" by Andrej Karpathy
380
+ - **Paper**: "Attention is All You Need" (Vaswani et al.)
381
+ - **Paper**: "Language Models are Unsupervised Multitask Learners" (GPT-2)
382
+ - **Model code**: https://github.com/karpathy/nanoGPT/blob/master/model.py