@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,468 @@
+ # Context Extension Methods
+
+ Comprehensive comparison of YaRN, ALiBi, and Position Interpolation based on published research.
+
+ ## Table of Contents
+ - YaRN (Yet another RoPE extensioN)
+ - ALiBi (Attention with Linear Biases)
+ - Position Interpolation
+ - Method Comparison
+
+ ## YaRN: Yet another RoPE extensioN
+
+ **Paper**: arXiv 2309.00071 (2023)
+ **Authors**: Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole
+
+ ### Overview
+
+ YaRN extends RoPE-based models to 128k+ context with 10× less training data than previous methods.
+
+ ### Key Innovations
+
+ 1. **NTK-aware interpolation**: Scales different frequency components differently
+ 2. **Attention temperature scaling**: Adjusts attention sharpness
+ 3. **NTK-by-parts**: Hybrid interpolation/extrapolation
+
+ ### Technical Details
+
+ **Problem**: Naive position interpolation compresses all frequencies uniformly, losing high-frequency information.
+
+ **Solution**: Treat each frequency band differently: interpolate low frequencies, leave high frequencies unchanged, and blend smoothly in between.
+
+ ```python
+ import math
+
+ import torch
+
+ # Frequency decomposition, keyed on how many full rotations a RoPE
+ # dimension completes over the original context window:
+ # Low frequencies (fewer than beta_slow rotations): Interpolate (compress)
+ # High frequencies (more than beta_fast rotations): Extrapolate (extend as-is)
+ # Middle frequencies: Smooth ramp between the two
+
+ def yarn_get_mscale(scale=1.0):
+     """Attention temperature scaling."""
+     if scale <= 1:
+         return 1.0
+     return 0.1 * math.log(scale) + 1.0
+
+ def yarn_find_correction_dim(num_rotations, dim, base=10000, max_position_embeddings=2048):
+     """Find dimension cutoffs for NTK-by-parts."""
+     return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (2 * math.log(base))
+
+ def yarn_find_correction_range(low_rot, high_rot, dim, base=10000, max_position_embeddings=2048):
+     """Find frequency ranges for interpolation."""
+     low = math.floor(yarn_find_correction_dim(low_rot, dim, base, max_position_embeddings))
+     high = math.ceil(yarn_find_correction_dim(high_rot, dim, base, max_position_embeddings))
+     return max(low, 0), min(high, dim - 1)
+
+ def yarn_linear_ramp_mask(min_val, max_val, dim):
+     """Create smooth ramp between interpolation and extrapolation."""
+     if min_val == max_val:
+         max_val += 0.001  # Avoid division by zero
+     linear_func = (torch.arange(dim, dtype=torch.float32) - min_val) / (max_val - min_val)
+     ramp_func = torch.clamp(linear_func, 0, 1)
+     return ramp_func
+ ```
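To make the cutoffs concrete, here is a minimal sketch in plain Python (no PyTorch) that restates the helpers above and blends the two frequency sets. The function names `correction_dim`, `correction_range`, and `blended_inv_freq` are illustrative, and the head dimension of 128 is an assumption, not a value taken from the paper.

```python
import math

def correction_dim(num_rotations, dim, base=10000.0, max_pos=2048):
    """Index of the RoPE pair that completes `num_rotations` turns over `max_pos` tokens."""
    return dim * math.log(max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

def correction_range(beta_fast, beta_slow, dim, base=10000.0, max_pos=2048):
    """Pair indices between which the ramp moves from 'keep as-is' to 'interpolate'."""
    low = math.floor(correction_dim(beta_fast, dim, base, max_pos))
    high = math.ceil(correction_dim(beta_slow, dim, base, max_pos))
    return max(low, 0), min(high, dim - 1)

def blended_inv_freq(dim=128, base=10000.0, scale=16.0, max_pos=2048,
                     beta_fast=32, beta_slow=1):
    """Per-pair inverse frequencies after the NTK-by-parts blend."""
    low, high = correction_range(beta_fast, beta_slow, dim, base, max_pos)
    span = (high - low) or 0.001  # avoid division by zero, as in the helper above
    blended = []
    for i in range(dim // 2):
        extrapolated = base ** (-2 * i / dim)   # original RoPE frequency
        interpolated = extrapolated / scale     # position-interpolated frequency
        ramp = min(max((i - low) / span, 0.0), 1.0)
        keep = 1.0 - ramp                       # 1 = keep as-is, 0 = fully interpolate
        blended.append(interpolated * (1.0 - keep) + extrapolated * keep)
    return blended

low, high = correction_range(32, 1, 128)
inv_freq = blended_inv_freq()
print(low, high)                  # ramp boundaries (pair indices)
print(inv_freq[0], inv_freq[-1])  # fastest pair unchanged, slowest pair divided by scale
```

Pairs below `low` rotate many times within the original window and keep their frequency; pairs above `high` never complete a full rotation and are compressed by `scale`.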
+
+ ### Complete YaRN Implementation
+
+ ```python
+ import torch
+ from torch import nn
+
+ # Relies on yarn_get_mscale, yarn_find_correction_range and
+ # yarn_linear_ramp_mask from the previous block.
+
+ class YaRNScaledRoPE(nn.Module):
+     """Full YaRN implementation."""
+
+     def __init__(
+         self,
+         dim,
+         max_position_embeddings=2048,
+         base=10000,
+         scale=1.0,
+         original_max_position_embeddings=2048,
+         extrapolation_factor=1.0,
+         attn_factor=1.0,
+         beta_fast=32,
+         beta_slow=1,
+         device=None
+     ):
+         super().__init__()
+         self.dim = dim
+         self.max_position_embeddings = max_position_embeddings
+         self.base = base
+         self.scale = scale
+         self.original_max_position_embeddings = original_max_position_embeddings
+         self.extrapolation_factor = extrapolation_factor
+         self.attn_factor = attn_factor
+         self.beta_fast = beta_fast
+         self.beta_slow = beta_slow
+
+         # Compute mscale (attention temperature)
+         self.mscale = float(yarn_get_mscale(self.scale) * self.attn_factor)
+
+         # Compute frequency bands
+         self.low, self.high = yarn_find_correction_range(
+             self.beta_fast,
+             self.beta_slow,
+             self.dim,
+             self.base,
+             self.original_max_position_embeddings
+         )
+
+         # Compute inverse frequencies: unscaled (extrapolation) and divided by scale (interpolation)
+         pos_freqs = self.base ** (torch.arange(0, self.dim, 2, dtype=torch.float32) / self.dim)
+         inv_freq_extrapolation = 1.0 / pos_freqs
+         inv_freq_interpolation = 1.0 / (self.scale * pos_freqs)
+
+         # Create ramp mask: 1 for high frequencies (keep as-is), 0 for low frequencies (interpolate)
+         inv_freq_mask = (1.0 - yarn_linear_ramp_mask(self.low, self.high, self.dim // 2)) * self.extrapolation_factor
+         inv_freq = inv_freq_interpolation * (1.0 - inv_freq_mask) + inv_freq_extrapolation * inv_freq_mask
+
+         self.register_buffer("inv_freq", inv_freq)
+
+     def forward(self, seq_len, device):
+         t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
+
+         # Rotation angles per position (YaRN scaling is already baked into inv_freq)
+         freqs = torch.outer(t, self.inv_freq.to(device))
+
+         # Attention temperature scaling
+         emb = torch.cat((freqs, freqs), dim=-1)
+         cos = emb.cos() * self.mscale
+         sin = emb.sin() * self.mscale
+
+         return cos, sin
+ ```
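Multiplying both `cos` and `sin` by `mscale` scales the rotated query and the rotated key alike, so each attention logit grows by `mscale ** 2`. The sketch below checks this on a single 2-D rotary pair in plain Python; the vectors, angle, and positions are arbitrary illustrative values, and `rotate` is a hypothetical helper, not part of any library.

```python
import math

def mscale(scale):
    """Same formula as yarn_get_mscale above."""
    return 1.0 if scale <= 1 else 0.1 * math.log(scale) + 1.0

def rotate(vec, angle, gain=1.0):
    """Rotate a 2-D vector by `angle`, with cos and sin both multiplied by `gain`."""
    x, y = vec
    c, s = math.cos(angle) * gain, math.sin(angle) * gain
    return (x * c - y * s, x * s + y * c)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

q, k = (0.3, -1.2), (0.8, 0.5)   # one rotary pair of a query and a key
theta = 0.25                     # rotation frequency of this pair
m, n = 7, 3                      # query and key positions
g = mscale(16.0)

plain = dot(rotate(q, m * theta), rotate(k, n * theta))
scaled = dot(rotate(q, m * theta, g), rotate(k, n * theta, g))
ratio = scaled / plain
print(g, ratio)                  # ratio matches g squared
```

This is why the temperature can be folded into the rotary embedding without touching the attention code itself.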
+
+ ### YaRN Parameters
+
+ ```python
+ # Default YaRN configuration (from paper)
+ yarn_config = {
+     "scale": 16,                                 # 16× extension (2k → 32k)
+     "original_max_position_embeddings": 2048,    # Original context length
+     "extrapolation_factor": 1.0,                 # How much to extrapolate high freqs
+     "attn_factor": 1.0,                          # Base attention temperature
+     "beta_fast": 32,                             # High-frequency threshold (rotations)
+     "beta_slow": 1,                              # Low-frequency threshold (rotations)
+ }
+
+ # For larger extensions (64k, 128k)
+ yarn_config_large = {
+     "scale": 64,
+     "beta_fast": 64,  # Increase for larger scales
+     "beta_slow": 2,
+ }
+ ```
+
+ ### Performance
+
+ **Results from the paper (LLaMA 7B)**:
+
+ | Method | Training Tokens | Steps | Final Perplexity | Context Length |
+ |--------|----------------|-------|------------------|----------------|
+ | Full Fine-tune | 10B | 10000 | 11.2 | 32k |
+ | Position Interpolation | 1B | 1000 | 12.5 | 32k |
+ | **YaRN** | **100M** | **400** | **11.8** | **32k** |
+
+ **YaRN uses 10× less data and 2.5× fewer steps than Position Interpolation.**
+
+ ## ALiBi: Attention with Linear Biases
+
+ **Paper**: arXiv 2108.12409 (ICLR 2022)
+ **Authors**: Ofir Press, Noah A. Smith, Mike Lewis
+ **Title**: "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation"
+
+ ### Core Concept
+
+ **Key idea**: Don't add positional embeddings. Instead, bias attention scores based on distance.
+
+ ```
+ attention_score[i, j] = q_i · k_j + bias[i, j]
+
+ where bias[i, j] = -m * |i - j|
+       m = slope for each head
+ ```
177
+
178
+ ### Mathematical Formulation
179
+
180
+ **Standard attention**:
181
+ ```
182
+ Attention(Q, K, V) = softmax(QK^T / √d_k) V
183
+ ```
184
+
185
+ **ALiBi attention**:
186
+ ```
187
+ Attention(Q, K, V) = softmax((QK^T + m · L) / √d_k) V
188
+
189
+ where L[i,j] = -(i - j) (lower triangular)
190
+ m = head-specific slope
191
+ ```
192
+
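To make the bias term concrete, here is a small sketch that builds `m · L` for one head over four tokens. The slope `m = 0.5` and the helper name `alibi_bias_matrix` are illustrative assumptions. Future positions (j > i) are set to negative infinity to stand in for the causal mask.

```python
def alibi_bias_matrix(seq_len: int, slope: float) -> list[list[float]]:
    """Return m * L for one head, with future positions masked to -inf."""
    neg_inf = float("-inf")
    return [
        # slope * (j - i) equals -slope * (i - j): zero on the diagonal,
        # increasingly negative for keys further in the past.
        [slope * (j - i) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]

bias = alibi_bias_matrix(4, slope=0.5)
for row in bias:
    print(row)
```

The last row reads `[-1.5, -1.0, -0.5, 0.0]`: the current token is not penalized, and each step further back costs one more multiple of the slope.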
193
+ ### Implementation
194
+
195
+ ```python
196
+ import math
197
+ import torch
198
+ import torch.nn.functional as F
199
+
200
+ def get_alibi_slopes(num_heads):
201
+ """Compute ALiBi slope for each attention head.
202
+
203
+ Source: Official ALiBi implementation
204
+ """
205
+ def get_slopes_power_of_2(n):
206
+ start = 2 ** (-(2 ** -(math.log2(n) - 3)))
207
+ ratio = start
208
+ return [start * (ratio ** i) for i in range(n)]
209
+
210
+ # If power of 2
211
+ if math.log2(num_heads).is_integer():
212
+ return get_slopes_power_of_2(num_heads)
213
+
214
+ # If not power of 2, use closest power of 2 and interpolate
215
+ closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
216
+ slopes = get_slopes_power_of_2(closest_power_of_2)
217
+
218
+ # Add extra slopes from next power of 2
219
+ extra_slopes = get_slopes_power_of_2(2 * closest_power_of_2)
220
+ slopes.extend(extra_slopes[0::2][:num_heads - closest_power_of_2])
221
+
222
+ return slopes
223
+
224
+ def create_alibi_bias(seq_len, num_heads, device='cpu'):
225
+ """Create ALiBi attention bias matrix."""
226
+ # Relative positions: L[i, j] = -(i - j)
227
+ context_position = torch.arange(seq_len, device=device)[:, None]
228
+ memory_position = torch.arange(seq_len, device=device)[None, :]
229
+
230
+ # Distance matrix (negative for causal)
231
+ relative_position = memory_position - context_position
232
+ relative_position = torch.abs(relative_position).unsqueeze(0) # (1, seq_len, seq_len)
233
+
234
+ # Get slopes for each head
235
+ slopes = torch.tensor(get_alibi_slopes(num_heads), device=device).unsqueeze(-1).unsqueeze(-1)
236
+
237
+ # Apply slopes: (num_heads, seq_len, seq_len)
238
+ alibi = -slopes * relative_position
239
+
240
+ return alibi
241
+
242
+ def alibi_attention(query, key, value, num_heads, scale=None):
243
+ """Multi-head attention with ALiBi."""
244
+ batch_size, seq_len, embed_dim = query.shape
245
+ head_dim = embed_dim // num_heads
246
+
247
+ if scale is None:
248
+ scale = head_dim ** -0.5
249
+
250
+ # Reshape for multi-head: (batch, num_heads, seq_len, head_dim)
251
+ query = query.reshape(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)
252
+ key = key.reshape(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)
253
+ value = value.reshape(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)
254
+
255
+ # Attention scores: (batch, num_heads, seq_len, seq_len)
256
+ attn_scores = torch.matmul(query, key.transpose(-2, -1)) * scale
257
+
258
+ # Add ALiBi bias
259
+ alibi_bias = create_alibi_bias(seq_len, num_heads, device=query.device)
260
+ attn_scores = attn_scores + alibi_bias
261
+
262
+ # Softmax and apply to values
263
+ attn_weights = F.softmax(attn_scores, dim=-1)
264
+ output = torch.matmul(attn_weights, value)
265
+
266
+ # Reshape back: (batch, seq_len, embed_dim)
267
+ output = output.transpose(1, 2).reshape(batch_size, seq_len, embed_dim)
268
+
269
+ return output
270
+ ```
271
+
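For head counts that are powers of two, the geometric sequence in `get_slopes_power_of_2` reduces to a closed form: head k (1-indexed) of n gets slope 2^(-8k/n). The sketch below checks that identity without PyTorch. `closed_form_slopes` is a hypothetical helper name, and `geometric_slopes` restates the `start`/`ratio` computation so the block is self-contained.

```python
import math

def geometric_slopes(n: int) -> list[float]:
    """Slopes from the start/ratio recurrence (n must be a power of two)."""
    start = 2 ** (-(2 ** -(math.log2(n) - 3)))
    return [start * (start ** i) for i in range(n)]

def closed_form_slopes(n: int) -> list[float]:
    """Equivalent closed form: slope_k = 2^(-8k/n) for k = 1..n."""
    return [2 ** (-8 * k / n) for k in range(1, n + 1)]

for n in (1, 2, 4, 8, 16, 32):
    pairs = zip(geometric_slopes(n), closed_form_slopes(n))
    assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in pairs)

print(closed_form_slopes(8))
```

The closed form makes it easy to see that the slopes always span from 2^(-8/n) down to 2^(-8), regardless of the number of heads.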
+ ### Slope Values
+
+ **Example slopes for 8 heads**:
+ ```python
+ slopes = get_alibi_slopes(8)
+ # Output: [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
+
+ # Each head has different slope
+ # → Different heads attend to different distance ranges
+ # → Head 1: Strong recency bias (slope=0.5)
+ # → Head 8: Weak recency bias (slope≈0.0039)
+ ```
+
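The effect of the slope on attention range can be seen by applying the bias to uniform content scores, so that only distance matters. This is a framework-free sketch: `recency_weights` is a hypothetical helper, and the two slopes are the first and last values that the geometric sequence in `get_alibi_slopes` yields for 8 heads.

```python
import math

def recency_weights(seq_len: int, slope: float) -> list[float]:
    """Softmax over keys for the last query when all content scores are equal."""
    query_pos = seq_len - 1
    # Only the ALiBi bias differs between keys: -slope * distance.
    logits = [-slope * (query_pos - j) for j in range(seq_len)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

steep = recency_weights(64, slope=0.5)
shallow = recency_weights(64, slope=1 / 256)
print(f"steep head, weight on 8 most recent keys:   {sum(steep[-8:]):.3f}")
print(f"shallow head, weight on 8 most recent keys: {sum(shallow[-8:]):.3f}")
```

The steep head concentrates almost all of its weight on the last few tokens, while the shallow head spreads attention nearly evenly across the window.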
285
+ ### Advantages
286
+
287
+ 1. **No position limit**: Works for any sequence length
288
+ 2. **Efficient**: 11% less memory than sinusoidal embeddings
289
+ 3. **Fast**: 11% faster training
290
+ 4. **Extrapolates well**: Train 1k, test 2k+ tokens
291
+ 5. **Simple**: No learned parameters for position
292
+
293
+ ### Disadvantages
294
+
295
+ 1. **Requires pre-training**: Can't retrofit existing models
296
+ 2. **Recency bias**: Always biases toward recent tokens (may not suit all tasks)
297
+
298
+ ## Position Interpolation
299
+
300
+ **Paper**: arXiv 2306.15595 (2023)
301
+ **Authors**: Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
302
+ **Title**: "Extending Context Window of Large Language Models via Positional Interpolation"
303
+
304
+ ### Core Idea
305
+
306
+ Instead of extrapolating positions beyond training range, interpolate within trained range.
307
+
308
+ ```
309
+ # Extrapolation (bad): positions [0, 1, 2, ..., 2048, 2049, ..., 32768]
310
+ # Positions > 2048 are out-of-distribution
311
+
312
+ # Interpolation (good): positions [0, 0.0625, 0.125, ..., 2048]
313
+ # All positions within [0, 2048] (in-distribution)
314
+ ```
315
+
316
+ ### Mathematical Formulation
317
+
318
+ **Original RoPE**:
319
+ ```
320
+ position_ids = [0, 1, 2, 3, ..., L-1]
321
+ ```
322
+
323
+ **Position Interpolation** (scale factor s):
324
+ ```
325
+ position_ids = [0, 1/s, 2/s, 3/s, ..., (L-1)/s]
326
+ ```
327
+
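The rescaling is simple enough to verify by hand. The sketch below maps a 32k-token sequence into the original 2k range with s = 16; `interpolate_positions` is a hypothetical helper used only for illustration.

```python
def interpolate_positions(seq_len: int, scale: float) -> list[float]:
    """Divide every integer position by the scale factor."""
    return [p / scale for p in range(seq_len)]

original_max = 2048
scaled = interpolate_positions(32768, scale=16.0)

print(scaled[:4])   # first few fractional positions
print(scaled[-1])   # largest position after interpolation
assert max(scaled) < original_max  # everything stays inside the trained range
```

Neighbouring tokens are now 1/16 of a position apart, which is the resolution loss that the short fine-tuning run has to compensate for.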
328
+ ### Implementation
329
+
330
+ ```python
331
+ class InterpolatedRoPE(nn.Module):
332
+ """RoPE with position interpolation."""
333
+
334
+ def __init__(self, dim, max_seq_len=2048, base=10000, scaling_factor=1.0):
335
+ super().__init__()
336
+ self.scaling_factor = scaling_factor
337
+
338
+ # Standard RoPE frequencies
339
+ inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
340
+ self.register_buffer("inv_freq", inv_freq)
341
+
342
+ def forward(self, seq_len, device):
343
+ # Position indices
344
+ t = torch.arange(seq_len, device=device).type_as(self.inv_freq)
345
+
346
+ # Interpolate positions
347
+ t = t / self.scaling_factor # KEY LINE
348
+
349
+ # Standard RoPE
350
+ freqs = torch.outer(t, self.inv_freq)
351
+ emb = torch.cat((freqs, freqs), dim=-1)
352
+ return emb.cos(), emb.sin()
353
+ ```
354
+
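One way to sanity-check the interpolation is to confirm that the rotation angles at an extended position equal the angles the original model saw at the corresponding scaled-down position. The sketch below does this without PyTorch. `rope_angles` is a hypothetical helper, and `dim=8` is chosen only to keep the example small.

```python
import math

def rope_angles(position: float, dim: int, base: float = 10000.0,
                scaling_factor: float = 1.0) -> list[float]:
    """Rotation angle for each frequency pair at a given position."""
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    return [(position / scaling_factor) * f for f in inv_freq]

# Position 32000 in the extended model vs. position 2000 in the original model.
extended = rope_angles(32000, dim=8, scaling_factor=16.0)
reference = rope_angles(2000, dim=8)

assert all(math.isclose(a, b) for a, b in zip(extended, reference))
print(extended)
```

Because only the position is divided, the frequencies themselves are untouched, which is why the change amounts to a single line in `forward`.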
355
+ ### Fine-tuning Requirements
356
+
357
+ **Minimal fine-tuning needed**:
358
+
359
+ ```python
360
+ # Extension: 2k → 32k (16× scale)
361
+ scaling_factor = 16.0
362
+
363
+ # Training config
364
+ training_args = {
365
+ "max_steps": 1000, # Only 1000 steps!
366
+ "learning_rate": 2e-5, # Small LR
367
+ "batch_size": 1,
368
+ "gradient_accumulation_steps": 16,
369
+ }
370
+
371
+ # Results: Near-perfect perplexity retention
372
+ ```
373
+
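For a sense of scale, the arithmetic below estimates how many tokens this configuration processes. It assumes a single device and that every sequence is packed to the full 32,768-token context. Neither assumption is stated in the configuration above, so treat the result as a rough estimate.

```python
batch_size = 1
gradient_accumulation_steps = 16
max_steps = 1000
seq_len = 32768  # assumed: sequences packed to the extended context length

sequences_per_step = batch_size * gradient_accumulation_steps
tokens_per_step = sequences_per_step * seq_len
total_tokens = tokens_per_step * max_steps

print(f"{tokens_per_step:,} tokens per optimizer step")
print(f"{total_tokens:,} tokens in total")
```

Under these assumptions the run sees roughly half a billion tokens, a small fraction of a pre-training budget.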
374
+ ### Theoretical Analysis
375
+
376
+ **Interpolation bound** (from paper):
377
+
378
+ Upper bound of interpolation error is ~600× smaller than extrapolation error.
379
+
380
+ ```
381
+ Extrapolation error: O(L^2) # Grows quadratically
382
+ Interpolation error: O(1/s) # Shrinks linearly with scale
383
+ ```
384
+
385
+ ### Results
386
+
387
+ **LLaMA models extended to 32k**:
388
+
389
+ | Model | Original Context | Extended Context | Fine-tune Steps | Perplexity |
390
+ |-------|-----------------|------------------|----------------|------------|
391
+ | LLaMA 7B | 2048 | 32768 | 1000 | 2.72 |
392
+ | LLaMA 13B | 2048 | 32768 | 1000 | 2.55 |
393
+ | LLaMA 33B | 2048 | 32768 | 1000 | 2.38 |
394
+ | LLaMA 65B | 2048 | 32768 | 1000 | 2.26 |
395
+
396
+ **Passkey retrieval**: 100% accuracy up to 32k tokens
397
+
398
+ ### Advantages
399
+
400
+ 1. **Minimal training**: 1000 steps sufficient
401
+ 2. **Stable**: Interpolation more stable than extrapolation
402
+ 3. **Simple**: One-line code change
403
+ 4. **Effective**: Works across all LLaMA sizes
404
+
405
+ ### Disadvantages
406
+
407
+ 1. **Limited extrapolation**: Can't go beyond trained range without fine-tuning
408
+ 2. **Information compression**: All positions compressed into trained range
409
+
410
+ ## Method Comparison
411
+
412
+ ### Training Requirements
413
+
414
+ | Method | Pre-training Needed | Fine-tuning Steps | Training Tokens |
415
+ |--------|---------------------|-------------------|-----------------|
416
+ | **ALiBi** | Yes (from scratch) | 0 | Full (100B+) |
417
+ | **Position Interpolation** | No | 1,000 | ~1B |
418
+ | **YaRN** | No | 400 | ~100M |
419
+ | **Linear RoPE Scaling** | No | 1,000-5,000 | ~1B |
420
+
421
+ ### Extrapolation Performance
422
+
423
+ **Test**: Train on 2k, test on 8k, 16k, 32k
424
+
425
+ | Method | 8k PPL | 16k PPL | 32k PPL | Extrapolation Quality |
426
+ |--------|--------|---------|---------|----------------------|
427
+ | **ALiBi** | 12.1 | 12.3 | 12.5 | Excellent |
428
+ | **YaRN** | 11.8 | 12.0 | 12.2 | Excellent |
429
+ | **Position Interpolation** | 12.5 | 13.2 | 14.8 | Poor |
430
+ | **Linear Scaling** | 13.1 | 15.2 | 19.4 | Poor |
431
+
432
+ ### Memory and Speed
433
+
434
+ | Method | Memory vs Baseline | Speed vs Baseline |
435
+ |--------|--------------------|--------------------|
436
+ | **ALiBi** | -11% | +11% |
437
+ | **Position Interpolation** | 0% | 0% |
438
+ | **YaRN** | 0% | -5% |
439
+ | **Linear Scaling** | 0% | 0% |
440
+
441
+ ### Use Case Recommendations
442
+
443
+ ```python
444
+ # New model from scratch → ALiBi
445
+ if training_from_scratch:
446
+ use_method = "ALiBi"
447
+
448
+ # Extending existing RoPE model with best quality → YaRN
449
+ elif need_sota_quality:
450
+ use_method = "YaRN"
451
+
452
+ # Quick extension with minimal compute → Position Interpolation
453
+ elif need_quick_solution:
454
+ use_method = "Position Interpolation"
455
+
456
+ # Moderate extension, simple implementation → Linear Scaling
457
+ else:
458
+ use_method = "Linear RoPE Scaling"
459
+ ```
460
+
461
+ ## Resources
462
+
463
+ - **YaRN Paper**: https://arxiv.org/abs/2309.00071
464
+ - **ALiBi Paper**: https://arxiv.org/abs/2108.12409
465
+ - **Position Interpolation Paper**: https://arxiv.org/abs/2306.15595
466
+ - **YaRN Implementation**: https://github.com/jquesnelle/yarn
467
+ - **ALiBi Implementation**: https://github.com/ofirpress/attention_with_linear_biases
468
+ - **Together AI Blog**: https://www.together.ai/blog/llama-2-7b-32k