@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,369 @@
+ # RWKV State Management
+
+ ## Understanding RWKV State
+
+ Unlike Transformers, whose KV cache grows with context length, RWKV maintains a **fixed-size recurrent state** that summarizes all previous context.
+
+ ### State Components
+
+ ```python
+ import torch
+
+ n_layers, d_model = 24, 2048  # Example dimensions (the RWKV-1.5B row in the table below)
+
+ state = {
+     'att_aa': torch.zeros(n_layers, d_model),      # Attention numerator accumulator
+     'att_ab': torch.zeros(n_layers, d_model),      # Attention denominator accumulator
+     'att_x_prev': torch.zeros(n_layers, d_model),  # Previous x for time-mixing
+     'ffn_x_prev': torch.zeros(n_layers, d_model),  # Previous x for channel-mixing
+ }
+ ```
+
+ **Total state size**: `4 × n_layers × d_model` values (2 bytes each in fp16)
+
+ | Model | Layers | d_model | State Values | State Size (fp16) |
+ |-------|--------|---------|--------------|-------------------|
+ | RWKV-169M | 12 | 768 | 36,864 | 74 KB |
+ | RWKV-430M | 24 | 1024 | 98,304 | 197 KB |
+ | RWKV-1.5B | 24 | 2048 | 196,608 | 393 KB |
+ | RWKV-3B | 32 | 2560 | 327,680 | 655 KB |
+ | RWKV-7B | 32 | 4096 | 524,288 | 1.05 MB |
+ | RWKV-14B | 40 | 5120 | 819,200 | 1.64 MB |
+
+ **Constant memory** regardless of context length!
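As a quick check of the size formula, the sketch below counts state values and converts them to bytes for a chosen precision. The function names and the bytes-per-value figures (2 for fp16, 4 for fp32) are illustrative assumptions, not part of the package.

```python
def state_values(n_layers: int, d_model: int, vectors_per_layer: int = 4) -> int:
    """Number of scalars in the recurrent state (4 vectors per layer, as listed above)."""
    return vectors_per_layer * n_layers * d_model


def state_bytes(n_layers: int, d_model: int, bytes_per_value: int = 2) -> int:
    """State size in bytes; bytes_per_value=2 assumes fp16, 4 assumes fp32."""
    return state_values(n_layers, d_model) * bytes_per_value


# Hypothetical usage with the RWKV-169M and RWKV-14B rows of the table
for name, layers, width in [("RWKV-169M", 12, 768), ("RWKV-14B", 40, 5120)]:
    print(name, state_values(layers, width), "values,",
          state_bytes(layers, width), "bytes in fp16")
```

Because neither function takes a sequence length, the result is the same for a 10-token prompt and a 100,000-token one.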
+
+ ## State Initialization
+
+ ### Zero State (Default)
+
+ ```python
+ from rwkv.model import RWKV
+
+ model = RWKV(model='/path/to/RWKV-4-Pile-1B5', strategy='cuda fp16')
+
+ # Start with zero state (no context)
+ state = None
+ out, state = model.forward(tokens, state)  # tokens: a list of token ids
+ ```
+
+ ### Warm State (Preloaded Context)
+
+ ```python
+ # Load context once
+ context = "The capital of France is Paris. The capital of Germany is Berlin."
+ context_tokens = tokenizer.encode(context)
+
+ # Process context to build state
+ state = None
+ for token in context_tokens:
+     _, state = model.forward([token], state)
+
+ # Now use warm state for queries
+ query = " The capital of Italy is"
+ query_tokens = tokenizer.encode(query)
+ out, state = model.forward(query_tokens, state)
+ # Model "remembers" Paris and Berlin examples!
+ ```
+
+ ### Shared State (Multi-turn Conversations)
+
+ ```python
+ # Conversation with persistent state
+ state = None
+
+ # Turn 1
+ user1 = "My name is Alice."
+ tokens1 = tokenizer.encode(user1)
+ _, state = model.forward(tokens1, state)
+
+ # Turn 2
+ user2 = "What is my name?"
+ tokens2 = tokenizer.encode(user2)
+ logits, state = model.forward(tokens2, state)
+ # `logits` scores the next token; decoding from here should yield "Alice" (state remembers!)
+ ```
+
+ ## State Update Rules
+
+ ### Time-Mixing State Update
+
+ ```python
+ import torch
+
+ def time_mixing_step(att_aa, att_ab, k_t, v_t, x_t, u, time_decay):
+     # att_aa / att_ab: numerator / denominator accumulated up to token t-1
+     e_k = torch.exp(k_t)
+
+     # Compute WKV; the current token gets the extra bonus exp(u)
+     wkv_t = (torch.exp(u) * e_k * v_t + att_aa) / (torch.exp(u) * e_k + att_ab)
+
+     # Update state for token t+1
+     w = -torch.exp(time_decay)  # w < 0, so exp(w) is a decay factor in (0, 1)
+     att_aa_next = torch.exp(w) * att_aa + e_k * v_t
+     att_ab_next = torch.exp(w) * att_ab + e_k
+     att_x_prev_next = x_t
+     return wkv_t, att_aa_next, att_ab_next, att_x_prev_next
+ ```
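To see why two accumulators are enough, the following sketch checks on scalars that the recurrent update reproduces an explicit decay-weighted sum over all earlier tokens. It assumes the RWKV-4 style rule in which each key enters through `exp(k)`; the helper names and toy values are hypothetical.

```python
import math


def recurrent_accumulators(ks, vs, w):
    """Run the state update token by token; w < 0 so exp(w) is the per-token decay."""
    aa, ab = 0.0, 0.0
    for k, v in zip(ks, vs):
        aa = math.exp(w) * aa + math.exp(k) * v
        ab = math.exp(w) * ab + math.exp(k)
    return aa, ab


def explicit_accumulators(ks, vs, w):
    """Same quantities written as a sum: token i is decayed (n - 1 - i) times."""
    n = len(ks)
    aa = sum(math.exp(w * (n - 1 - i) + k) * v for i, (k, v) in enumerate(zip(ks, vs)))
    ab = sum(math.exp(w * (n - 1 - i) + k) for i, k in enumerate(ks))
    return aa, ab


ks, vs, w = [0.1, -0.3, 0.5], [1.0, 2.0, -1.0], -0.5
print(recurrent_accumulators(ks, vs, w))
print(explicit_accumulators(ks, vs, w))
```

The two printed pairs agree up to floating-point rounding, which is what lets the model discard earlier tokens once they are folded into the state.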
+
+ **Effect of time_decay** (through `w = -exp(time_decay)`):
+ - **w = -0.01** (small decay): `exp(w) ≈ 0.99`, so the state decays slowly → long memory
+ - **w = -5.0** (large decay): `exp(w) ≈ 0.007`, so the state decays quickly → short memory
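The two bullet values can be made concrete by asking how much of an old contribution survives after n tokens, which is `exp(w * n)` when each update multiplies the accumulator by `exp(w)`. A minimal sketch; the function names are illustrative.

```python
import math


def retained_fraction(w: float, n_tokens: int) -> float:
    """Fraction of an old contribution left after n_tokens updates (w < 0)."""
    return math.exp(w * n_tokens)


def half_life_tokens(w: float) -> float:
    """Number of tokens after which a contribution has decayed to one half."""
    return math.log(2) / -w


for w in (-0.01, -5.0):
    print(f"w={w}: half-life {half_life_tokens(w):.2f} tokens, "
          f"retained after 100 tokens {retained_fraction(w, 100):.3g}")
```

Since the decay is learned per channel, a single layer can mix channels that track the last few tokens with channels that hold information for much longer.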
+
+ ### Channel-Mixing State Update
+
+ ```python
+ # Simply store previous x for next token (ffn_x_prev at t+1)
+ ffn_x_prev_next = x_t
+ ```
+
+ ## State Serialization
+
+ ### Save/Load State (PyTorch)
+
+ ```python
+ import torch
+
+ # Save conversation state
+ # Assumes `state` is ordered as in "State Components"; if your runtime
+ # returns another layout, pass that object to torch.save unchanged
+ state_dict = {
+     'att_aa': state[0],
+     'att_ab': state[1],
+     'att_x_prev': state[2],
+     'ffn_x_prev': state[3]
+ }
+ torch.save(state_dict, 'conversation_123.pt')
+
+ # Load state
+ loaded = torch.load('conversation_123.pt')
+ state = (loaded['att_aa'], loaded['att_ab'], loaded['att_x_prev'], loaded['ffn_x_prev'])
+
+ # Continue conversation
+ out, state = model.forward(new_tokens, state)
+ ```
+
+ ### State Compression (Optional)
+
+ ```python
+ # FP16 state (half the size of an FP32 state; the cast is lossy)
+ state_fp16 = tuple(s.half() for s in state)
+ torch.save(state_fp16, 'state_compressed.pt')
+
+ # Restore
+ state = tuple(s.float() for s in torch.load('state_compressed.pt'))
+ ```
+
+ ## Multi-Session State Management
+
+ ### Session State Store
+
+ ```python
+ class StateManager:
+     def __init__(self):
+         self.sessions = {}  # session_id -> state
+
+     def get_state(self, session_id):
+         return self.sessions.get(session_id, None)
+
+     def save_state(self, session_id, state):
+         self.sessions[session_id] = state
+
+     def clear_session(self, session_id):
+         if session_id in self.sessions:
+             del self.sessions[session_id]
+
+ # Usage
+ manager = StateManager()
+
+ # User 1 conversation
+ state1 = manager.get_state('user_1')
+ out1, state1 = model.forward(tokens1, state1)
+ manager.save_state('user_1', state1)
+
+ # User 2 conversation (independent state)
+ state2 = manager.get_state('user_2')
+ out2, state2 = model.forward(tokens2, state2)
+ manager.save_state('user_2', state2)
+ ```
+
+ ### State Expiration
+
+ ```python
+ import time
+
+ class StateManagerWithExpiry:
+     def __init__(self, expiry_seconds=3600):
+         self.sessions = {}  # session_id -> (state, timestamp)
+         self.expiry = expiry_seconds
+
+     def get_state(self, session_id):
+         if session_id in self.sessions:
+             state, timestamp = self.sessions[session_id]
+             if time.time() - timestamp < self.expiry:
+                 return state
+             del self.sessions[session_id]  # Expired
+         return None
+
+     def save_state(self, session_id, state):
+         self.sessions[session_id] = (state, time.time())
+ ```
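With lookup-time expiry, a session that is never requested again stays in memory indefinitely. A periodic sweep is one way to handle that. The following is a minimal sketch, assuming the same `(state, timestamp)` layout; `SweepingStateManager`, `purge_expired`, and the injectable `clock` argument are hypothetical additions, the last one included only so the behaviour can be demonstrated without waiting.

```python
import time


class SweepingStateManager:
    """Expiring session store with an explicit sweep over all sessions."""

    def __init__(self, expiry_seconds=3600, clock=time.monotonic):
        self.sessions = {}  # session_id -> (state, timestamp)
        self.expiry = expiry_seconds
        self.clock = clock  # callable returning the current time in seconds

    def save_state(self, session_id, state):
        self.sessions[session_id] = (state, self.clock())

    def purge_expired(self):
        """Delete every expired session and return how many were removed."""
        now = self.clock()
        expired = [
            sid for sid, (_, ts) in self.sessions.items()
            if now - ts >= self.expiry
        ]
        for sid in expired:
            del self.sessions[sid]
        return len(expired)


# A fake clock makes the example deterministic.
fake_now = [0.0]
manager = SweepingStateManager(expiry_seconds=10, clock=lambda: fake_now[0])
manager.save_state("old", "state_a")   # saved at t = 0
fake_now[0] = 8.0
manager.save_state("new", "state_b")   # saved at t = 8
fake_now[0] = 12.0
removed = manager.purge_expired()      # "old" is 12 s old, "new" is 4 s old
print(removed, list(manager.sessions))
```

In a server, `purge_expired` could be called from a background timer or every N requests; which is appropriate depends on the deployment.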
+
+ ## State Interpolation
+
+ ### Blending States
+
+ ```python
+ # Weighted average of two states (e.g., merging conversations)
+ def blend_states(state1, state2, alpha=0.5):
+     """Blend two states; alpha is the weight on state1, (1 - alpha) on state2."""
+     return tuple(
+         alpha * s1 + (1 - alpha) * s2
+         for s1, s2 in zip(state1, state2)
+     )
+
+ # Example: Blend Alice and Bob conversation contexts
+ state_blended = blend_states(state_alice, state_bob, alpha=0.7)
+ # 70% Alice context, 30% Bob context
+ ```
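To make the weighting concrete, here is a small worked example in which plain floats stand in for the state tensors (an assumption for illustration only; real states are tensors and the same arithmetic applies element-wise).

```python
import math


def blend_states(state1, state2, alpha=0.5):
    """Blend two states; alpha is the weight on state1."""
    return tuple(alpha * s1 + (1 - alpha) * s2 for s1, s2 in zip(state1, state2))


# Plain floats stand in for the state tensors.
state_alice = (1.0, 2.0)
state_bob = (3.0, 4.0)

blended = blend_states(state_alice, state_bob, alpha=0.7)
print(blended)  # approximately (1.6, 2.6)

# alpha = 1.0 reproduces the first state unchanged.
only_alice = blend_states(state_alice, state_bob, alpha=1.0)
print(only_alice)
```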
+
+ ### State Editing
+
+ ```python
+ # Manually edit state (advanced)
+ # Example: Reduce long-term memory influence
+
+ def decay_state(state, decay_factor=0.5):
+     """Scale down the accumulated attention state so older context weighs less."""
+     att_aa, att_ab, att_x_prev, ffn_x_prev = state
+     return (
+         att_aa * decay_factor,
+         att_ab * decay_factor,
+         att_x_prev,  # Keep recent x
+         ffn_x_prev,  # Keep recent x
+     )
+
+ # Usage
+ state = decay_state(state, decay_factor=0.3)  # History keeps 30% of its weight relative to new tokens
+ ```
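Scaling `att_aa` and `att_ab` by the same factor does not change their ratio, so the summary of past tokens stays the same; what changes is how strongly that summary competes with new tokens. The scalar sketch below illustrates this under a simplified readout that ignores per-channel decay and bonus terms. `wkv_output` is a hypothetical helper, not the model's actual kernel.

```python
def wkv_output(att_aa, att_ab, new_weight, new_value):
    """Simplified scalar readout: weighted average of history and one new token."""
    return (att_aa + new_weight * new_value) / (att_ab + new_weight)


att_aa, att_ab = 8.0, 4.0          # history average = 8 / 4 = 2.0
new_weight, new_value = 1.0, 10.0  # one incoming token

before = wkv_output(att_aa, att_ab, new_weight, new_value)
# (8 + 10) / (4 + 1) = 3.6, still close to the history average

decay_factor = 0.25
decayed_aa = att_aa * decay_factor
decayed_ab = att_ab * decay_factor
after = wkv_output(decayed_aa, decayed_ab, new_weight, new_value)
# (2 + 10) / (1 + 1) = 6.0, pulled toward the new token

print(decayed_aa / decayed_ab)  # history average is unchanged
print(before, after)
```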
+
+ ## Batch Inference with States
+
+ ### Independent Batch States
+
+ ```python
+ # Each sequence in batch has separate state
+ batch_size = 4
+ states = [None] * batch_size
+
+ for i, tokens in enumerate(batch_sequences):
+     out, states[i] = model.forward(tokens, states[i])
+ ```
+
+ ### Shared Prefix Optimization
+
+ ```python
+ # All sequences share common prefix (e.g., system prompt)
+ prefix = "You are a helpful assistant."
+ prefix_tokens = tokenizer.encode(prefix)
+
+ # Compute prefix state once
+ _, prefix_state = model.forward(prefix_tokens, None)
+
+ # Clone prefix state for each sequence so in-place updates cannot leak between them
+ states = [tuple(s.clone() for s in prefix_state) for _ in range(batch_size)]
+
+ # Process user queries (independent)
+ for i, user_query in enumerate(user_queries):
+     tokens = tokenizer.encode(user_query)
+     out, states[i] = model.forward(tokens, states[i])
+ ```
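Whether each sequence needs its own copy of the prefix state depends on whether the model updates state in place. The difference between repeating a reference and making real copies can be seen with ordinary Python objects; in this sketch a mutable list stands in for a state, which is an assumption for illustration.

```python
import copy

# A mutable list stands in for a state that might be updated in place.
prefix_state = [0.0, 0.0]

shared = [prefix_state] * 3                                # three references to one object
cloned = [copy.deepcopy(prefix_state) for _ in range(3)]   # three independent copies

shared[0][0] = 1.0   # in-place update intended for sequence 0 only
cloned[0][0] = 1.0

print(shared)  # every entry changed, because they are the same object
print(cloned)  # only the first entry changed
```

For tensors, `.clone()` plays the role of `copy.deepcopy` here.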
+
+ ## State Debugging
+
+ ### Inspect State Magnitudes
+
+ ```python
+ def inspect_state(state):
+     """Print state statistics for debugging."""
+     att_aa, att_ab, att_x_prev, ffn_x_prev = state
+
+     print("State magnitudes:")
+     print(f"  att_aa: mean={att_aa.abs().mean():.4f}, max={att_aa.abs().max():.4f}")
+     print(f"  att_ab: mean={att_ab.abs().mean():.4f}, max={att_ab.abs().max():.4f}")
+     print(f"  att_x_prev: mean={att_x_prev.abs().mean():.4f}, max={att_x_prev.abs().max():.4f}")
+     print(f"  ffn_x_prev: mean={ffn_x_prev.abs().mean():.4f}, max={ffn_x_prev.abs().max():.4f}")
+
+ # Usage
+ inspect_state(state)
+ ```
+
+ **Healthy ranges**:
+ - `att_aa`, `att_ab`: 0.1 - 10.0 (if much larger, may overflow)
+ - `att_x_prev`, `ffn_x_prev`: Similar to input embedding magnitude
+
+ ### State Divergence Check
+
+ ```python
+ def state_distance(state1, state2):
+     """Sum of per-tensor L2 distances between two states."""
+     return sum(
+         torch.dist(s1, s2).item()
+         for s1, s2 in zip(state1, state2)
+     )
+
+ # Example: Check if states diverged
+ distance = state_distance(state_alice, state_bob)
+ print(f"State distance: {distance:.2f}")
+ # Large distance → very different contexts
+ ```
+
+ ## Numerical Stability Considerations
+
+ ### Overflow Prevention
+
+ ```python
+ # Issue: att_aa and att_ab can grow without bound
+ # If att_aa > 1e10, numerical precision suffers
+
+ # Solution 1: Periodic normalization
+ # Dividing both by the same scale keeps the ratio att_aa / att_ab unchanged
+ scale = att_aa.abs().max()
+ if scale > 1e6:
+     att_aa = att_aa / scale
+     att_ab = att_ab / scale
+ ```
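Rescaling both accumulators preserves their ratio, but later token contributions are still added at their original scale, so the history would weigh less than before unless the applied scale is remembered. One possible approach, sketched here with scalars, is to track the accumulated scale in log space and apply it to new contributions. `rescale_pair` and `log_scale` are hypothetical names, and the readout is a simplified weighted average rather than the model's actual kernel.

```python
import math


def rescale_pair(att_aa, att_ab, log_scale, threshold=1e6):
    """Rescale both accumulators and record the applied factor in log_scale."""
    scale = abs(att_aa)
    if scale > threshold:
        return att_aa / scale, att_ab / scale, log_scale + math.log(scale)
    return att_aa, att_ab, log_scale


def readout(att_aa, att_ab, new_weight, new_value, log_scale=0.0):
    """Simplified readout; the new token's weight is brought to the stored scale."""
    w = new_weight * math.exp(-log_scale)
    return (att_aa + w * new_value) / (att_ab + w)


att_aa, att_ab = 6.0e9, 2.0e9
new_weight, new_value = 1.0e9, 10.0

reference = readout(att_aa, att_ab, new_weight, new_value)

small_aa, small_ab, log_scale = rescale_pair(att_aa, att_ab, 0.0)
rescaled = readout(small_aa, small_ab, new_weight, new_value, log_scale)

print(small_aa, small_ab)   # magnitudes are back near 1
print(reference, rescaled)  # same output up to rounding
```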
+
+ ### Underflow Prevention
+
+ ```python
+ # Issue: With a large negative time_decay, the state can underflow to 0
+
+ # Solution: Clip time_decay
+ time_decay = torch.clamp(time_decay, min=-8.0, max=-0.1)
+ # Ensures the state doesn't decay too fast
+ ```
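To get a feel for why the lower bound matters, the sketch below estimates how many steps it takes for a contribution to shrink below the smallest normal FP32 value. It assumes, for illustration only, that a contribution is multiplied by `exp(time_decay)` once per step; the exact parameterization of decay differs between RWKV versions. `steps_until_underflow` is a hypothetical helper.

```python
import math

FP32_MIN_NORMAL = 1.1754943508222875e-38  # smallest positive normal float32


def steps_until_underflow(time_decay, floor=FP32_MIN_NORMAL):
    """Smallest n with exp(time_decay) ** n < floor, for time_decay < 0."""
    return math.ceil(math.log(floor) / time_decay)


for td in (-0.1, -8.0, -20.0):
    print(f"time_decay={td}: below FP32 normal range after "
          f"{steps_until_underflow(td)} steps")
```

Under this assumption, a decay of -8.0 already erases a contribution within about a dozen steps, and anything more negative does so in only a handful.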
+
+ ## State vs KV Cache Comparison
+
+ ### Memory Usage (8K context)
+
+ | Model Type | Model Size | KV Cache Size | RWKV State Size |
+ |------------|------------|---------------|-----------------|
+ | Transformer | 1.3B | 4.1 GB | - |
+ | **RWKV** | **1.5B** | **-** | **196 KB** |
+ | Transformer | 7B | 21.3 GB | - |
+ | **RWKV** | **7B** | **-** | **524 KB** |
+
+ **RWKV advantage**: The state is more than 10,000× smaller than the KV cache (roughly 20,000× to 40,000× for the rows above).
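
The size ratio follows directly from the table. A quick arithmetic check, assuming binary units (1 KB = 1024 bytes, 1 GB = 1024³ bytes), which the table does not state:

```python
KB = 1024
GB = 1024 ** 3

# (KV cache bytes, RWKV state bytes), taken from the table above
rows = {
    "1.3B Transformer vs 1.5B RWKV": (4.1 * GB, 196 * KB),
    "7B Transformer vs 7B RWKV": (21.3 * GB, 524 * KB),
}

ratios = {name: kv / state for name, (kv, state) in rows.items()}
for name, ratio in ratios.items():
    print(f"{name}: KV cache is about {ratio:,.0f}x larger")
```

With decimal units the ratios come out slightly lower, but they stay in the same range.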
+
+ ### Information Retention
+
+ **KV Cache (Transformer)**:
+ - Perfect: Stores all previous keys and values
+ - Retrieval: Exact attention to any previous token
+ - Cost: O(n) memory growth with context length n
+
+ **RWKV State**:
+ - Lossy: Compressed representation of history
+ - Retrieval: Weighted blend of previous tokens (decay-based)
+ - Cost: O(1) constant memory
+
+ **Trade-off**: RWKV sacrifices perfect recall for constant memory.
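
The contrast can be illustrated with two toy containers: one keeps every key/value pair, the other keeps only two decayed running sums. Both classes are hypothetical illustrations of the trade-off, not the actual Transformer or RWKV data structures, and the fixed `decay` is an assumption.

```python
import math


class ExactHistory:
    """Keeps every (key, value) pair, like a KV cache: memory grows with length."""

    def __init__(self):
        self.pairs = []

    def update(self, key, value):
        self.pairs.append((key, value))

    def size(self):
        return len(self.pairs)


class DecayedSummary:
    """Keeps two running sums, like a recurrent state: memory stays constant."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.num = 0.0  # decayed sum of weight * value
        self.den = 0.0  # decayed sum of weights

    def update(self, key, value):
        weight = math.exp(key)
        self.num = self.decay * self.num + weight * value
        self.den = self.decay * self.den + weight

    def readout(self):
        return self.num / self.den

    def size(self):
        return 2


exact, summary = ExactHistory(), DecayedSummary()
for i in range(1000):
    exact.update(0.0, float(i))
    summary.update(0.0, float(i))

print(exact.size(), summary.size())  # 1000 stored pairs versus 2 numbers
print(exact.pairs[0])                # the first token is still recoverable exactly
print(summary.readout())             # only a blend dominated by recent tokens
```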
+
+ ## Resources
+
+ - State management examples: https://github.com/BlinkDL/ChatRWKV
+ - Wiki: https://wiki.rwkv.com/state-management
+ - Discord: https://discord.gg/bDSBUMeFpc (RWKV community)