@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
+++ package/bin/skills/mamba/references/architecture-details.md
@@ -0,0 +1,206 @@
# Mamba Architecture Details

## Selective State Space Mechanism

Mamba's core innovation is the **Selective SSM (S6)** layer, which makes the state space model's parameters input-dependent.

### How S6 Works

**Traditional SSMs** (non-selective):

```python
# Fixed A, B, C matrices shared across all inputs
h(t) = A * h(t-1) + B * x(t)  # State update
y(t) = C * h(t)               # Output
```

**Mamba's Selective SSM**:

```python
# Input-dependent parameters
B(t) = Linear_B(x(t))  # Selection mechanism
C(t) = Linear_C(x(t))  # Output projection
Δ(t) = Linear_Δ(x(t))  # Discretization step

# Selective state update
h(t) = discretize(A, Δ(t)) * h(t-1) + Δ(t) * B(t) * x(t)
y(t) = C(t) * h(t)
```

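The recurrence above can be sketched end to end in a few lines. This is a NumPy toy with a single scalar input channel, random weights standing in for `Linear_B`/`Linear_C`/`Linear_Δ`, a diagonal A, and the coarse discretization Ā = exp(Δ·A), B̄ ≈ Δ·B — a sketch of the mechanism, not the real fused implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 16, 32
A = -np.exp(rng.uniform(size=d_state))     # fixed, negative diagonal A

# Stand-ins for the learned projections Linear_B, Linear_C, Linear_Δ
w_B = rng.normal(size=d_state)
w_C = rng.normal(size=d_state)
w_dt = rng.normal()

x = rng.normal(size=seq_len)               # one scalar input channel
h = np.zeros(d_state)
y = np.zeros(seq_len)
for t in range(seq_len):
    B_t = w_B * x[t]                       # selection mechanism
    C_t = w_C * x[t]                       # output projection
    dt_t = np.log1p(np.exp(w_dt * x[t]))   # Δ(t) via softplus, so Δ > 0
    A_bar = np.exp(dt_t * A)               # discretize(A, Δ(t)), each entry in (0, 1)
    h = A_bar * h + dt_t * B_t * x[t]      # selective state update
    y[t] = C_t @ h                         # y(t) = C(t) · h(t)

print(y.shape)  # (32,)
```

Because A is negative and Δ is positive, every entry of `A_bar` lies in (0, 1), so the state decays at an input-controlled rate instead of blowing up.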
### Key Advantages

**1. Content-based reasoning**:
- Can selectively remember or forget based on the input
- Addresses traditional SSMs' weakness on discrete modalities (e.g. text)
- Example: remembers important tokens, forgets padding

**2. Input-dependent selection**:

```python
# Pseudocode: Mamba decides per token what to remember
if is_important(x(t)):
    Δ(t) = large_value  # Keep in state
else:
    Δ(t) = small_value  # Forget quickly
```

**3. No attention required**:
- Replaces O(n²) attention with O(n) state updates
- State dimension is constant (typically 16)

## Model Configuration

### Core Parameters

```python
from mamba_ssm import Mamba

model = Mamba(
    d_model=256,       # Hidden dimension (common values: 256, 512, 768, 1024, 2048)
    d_state=16,        # SSM state dimension (16 works well in the paper's ablations)
    d_conv=4,          # Local convolution width (4 is standard)
    expand=2,          # Expansion factor (typically 1.5-2.0)
    dt_rank="auto",    # Rank of Δ projection ("auto" = d_model / 16)
    dt_min=0.001,      # Min Δ init (controls forgetting rate)
    dt_max=0.1,        # Max Δ init
    dt_init="random",  # Δ initialization ("random" or "constant")
    dt_scale=1.0,      # Δ scaling factor
    conv_bias=True,    # Use bias in the convolution
    bias=False,        # Use bias in the linear projections
)
```

### Parameter Impact

**d_state** (SSM state dimension):
- Standard: 16 (best trade-off in the paper's ablations)
- Smaller (8): faster but less capacity
- Larger (32, 64): minimal quality improvement, roughly 2× slower

**expand** (block expansion):
- Standard: 2.0
- Typical range: 1.5-2.0
- Controls the inner dimension: d_inner = expand * d_model

**d_conv** (convolution width):
- Standard: 4
- Provides a local context window before the SSM
- Helps with positional information

**dt_rank** (Δ projection rank):
- "auto": d_model / 16 (recommended)
- Controls Δ parameter efficiency
- Lower rank = more efficient but less expressive

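The derived quantities above are easy to sanity-check. This is a hypothetical helper (not part of `mamba_ssm`) that just applies the formulas from this section:

```python
def derived_dims(d_model: int, d_state: int = 16, expand: float = 2.0):
    """Dimensions a Mamba block derives from its config (per this section)."""
    d_inner = int(expand * d_model)      # inner (expanded) dimension
    dt_rank = max(1, d_model // 16)      # "auto" rank of the Δ projection
    state_per_channel = d_inner * d_state  # recurrent SSM state entries
    return d_inner, dt_rank, state_per_channel

print(derived_dims(768))  # (1536, 48, 24576)
```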
## Mamba Block Structure

```python
import torch.nn as nn
from mamba_ssm import Mamba

# Mamba block (replaces the Transformer block).
# Schematic: RMSNorm, Embedding, LMHead, d_model, n_layers defined elsewhere.
class MambaBlock(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.norm = RMSNorm(d_model)
        self.mamba = Mamba(d_model, d_state=16, d_conv=4, expand=2)

    def forward(self, x):
        return x + self.mamba(self.norm(x))  # Pre-norm residual

# Full model (a stack of Mamba blocks)
model = nn.Sequential(
    Embedding(...),
    *[MambaBlock(d_model) for _ in range(n_layers)],
    RMSNorm(d_model),
    LMHead(...),
)
```

**Key differences from Transformers**:
- No multi-head attention (MHA)
- No feed-forward network (FFN)
- A single Mamba layer per block
- ~2× more layers than an equivalent Transformer

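The RMSNorm used in the pre-norm residual above is simple enough to sketch in full. A minimal NumPy version (with a learned gain `weight`, here set to ones):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    """RMSNorm: rescale x to unit root-mean-square, then apply a learned gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

d_model = 8
x = np.random.randn(2, d_model) * 5.0     # arbitrary input scale
y = rms_norm(x, weight=np.ones(d_model))
print(np.sqrt(np.mean(y * y, axis=-1)))   # ≈ 1.0 per row
```

Unlike LayerNorm, there is no mean subtraction and no bias, which is part of why the Mamba block is cheaper per layer.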
## Hardware-Aware Implementation

### Parallel Algorithm

Mamba uses a **scan-based parallel algorithm** for training:

```python
# Parallel mode (training):
# a fused GPU kernel computes the recurrence as a parallel scan
y = parallel_scan(A, B, C, x)  # O(n log n) parallel

# Sequential mode (inference):
# RNN-style recurrence with constant memory
h = 0
for x_t in sequence:
    h = A * h + B * x_t
    y_t = C * h
```

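The two modes compute the same thing. A toy check of that equivalence for a scalar linear recurrence h_t = a_t·h_{t-1} + b_t — a sketch of the idea only, not the fused CUDA kernel; the "parallel" form here uses cumulative products and prefix sums rather than a tree scan:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
a = rng.uniform(0.8, 0.99, n)   # input-dependent decay (like exp(Δ(t)·A))
b = rng.normal(size=n)          # input term (like Δ(t)·B(t)·x(t))

# Sequential mode: plain RNN-style loop
h_seq = np.zeros(n)
h = 0.0
for t in range(n):
    h = a[t] * h + b[t]
    h_seq[t] = h

# Parallel-friendly closed form: h_t = P_t * sum_{k<=t} b_k / P_k,
# where P_t = a_1 * ... * a_t — both factors are prefix operations.
P = np.cumprod(a)
h_par = P * np.cumsum(b / P)

print(np.allclose(h_seq, h_par))  # True
```

Prefix sums and products are exactly what GPUs parallelize well, which is why training can avoid the sequential loop.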
### Memory Efficiency

**Training**:
- Recomputes activations in the backward pass
- Similar strategy to FlashAttention
- Memory: O(batch_size * seq_len * d_model)

**Inference**:
- RNN-style sequential processing
- State size: O(d_model * d_state), constant in sequence length
- No KV cache needed (a major advantage)

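Back-of-envelope arithmetic makes the constant-state advantage concrete. Illustrative numbers for a ~1.4B-scale config; the Mamba figure counts only the d_inner × d_state SSM state per layer (ignoring the small convolution buffer), and both sides assume 2-byte (BF16) storage:

```python
def kv_cache_bytes(n_layers, seq_len, d_model, bytes_per=2):
    # Transformer: K and V per layer, per token
    return 2 * n_layers * seq_len * d_model * bytes_per

def mamba_state_bytes(n_layers, d_model, d_state=16, expand=2, bytes_per=2):
    # Mamba: fixed-size recurrent state per layer, independent of seq_len
    return n_layers * (expand * d_model) * d_state * bytes_per

kv = kv_cache_bytes(n_layers=48, seq_len=8192, d_model=2048)
ssm = mamba_state_bytes(n_layers=48, d_model=2048)
print(kv // 2**20, "MiB vs", ssm // 2**20, "MiB")  # 3072 MiB vs 6 MiB
```

The KV cache doubles every time the context doubles; the Mamba state does not change at all.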
### CUDA Kernel Optimizations

All of the following are fused into a single GPU kernel:

- Discretization (continuous → discrete A, B)
- SSM recurrence (parallel scan)
- Convolution (efficient 1D conv)

## Layer Count Scaling

Mamba models use **2× the layers** of comparable Transformers:

| Model | d_model | n_layers | Params |
|-------|---------|----------|--------|
| Mamba-130M | 768 | 24 | 130M |
| Mamba-370M | 1024 | 48 | 370M |
| Mamba-790M | 1536 | 48 | 790M |
| Mamba-1.4B | 2048 | 48 | 1.4B |
| Mamba-2.8B | 2560 | 64 | 2.8B |

**Why 2× layers?**
- Mamba blocks are simpler (no MHA, no FFN)
- ~50% fewer parameters per layer
- Doubling the layer count matches the Transformer compute budget

## Initialization Strategy

```python
import math

import torch

d_inner, d_state = 1536, 16    # example dimensions
dt_min, dt_max = 1e-3, 1e-1    # typical discretization-step range
dt_init_floor = 1e-4

# Δ (discretization step): log-uniform in [dt_min, dt_max]
dt = torch.exp(
    torch.rand(d_inner) * (math.log(dt_max) - math.log(dt_min))
    + math.log(dt_min)
).clamp(min=dt_init_floor)

# A (state transition) initialization
A = -torch.exp(torch.rand(d_inner, d_state))  # negative for stability

# B, C (input/output) initialization
B = torch.randn(d_inner, d_state)
C = torch.randn(d_inner, d_state)
```

**Critical for stability**:
- A must be negative (exponential decay)
- Δ in range [dt_min, dt_max]
- Random initialization helps diversity

## Resources

- Paper (Mamba-1): https://arxiv.org/abs/2312.00752
- Paper (Mamba-2): https://arxiv.org/abs/2405.21060
- GitHub: https://github.com/state-spaces/mamba
- Models: https://huggingface.co/state-spaces
- CUDA kernels: https://github.com/state-spaces/mamba/tree/main/csrc
# Mamba Performance Benchmarks

## Inference Speed Comparison

### Throughput (tokens/sec)

**Mamba-1.4B vs Transformer-1.3B** on a single A100 80GB:

| Sequence Length | Mamba-1.4B | Transformer-1.3B | Speedup |
|----------------|------------|------------------|---------|
| 512 | 8,300 | 6,200 | 1.3× |
| 1024 | 7,800 | 4,100 | 1.9× |
| 2048 | 7,200 | 2,300 | 3.1× |
| 4096 | 6,800 | 1,200 | 5.7× |
| 8192 | 6,400 | 600 | **10.7×** |
| 16384 | 6,100 | OOM | n/a |

**Key insight**: Speedup grows with sequence length (Mamba is O(n), the Transformer O(n²))

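A toy per-layer FLOP model makes the widening gap explicit. The constants are illustrative, and `d_state=16` with `expand=2` are assumptions, but the asymptotics are the point:

```python
def attn_layer_flops(n, d_model):
    # QK^T scores plus attention-weighted values: ~4 * n^2 * d_model
    return 4 * n * n * d_model

def ssm_layer_flops(n, d_model, d_state=16, expand=2):
    # selective scan touches each token once: linear in n
    return n * expand * d_model * d_state

# the attention/SSM cost ratio grows linearly with sequence length
r_512 = attn_layer_flops(512, 2048) / ssm_layer_flops(512, 2048)
r_8192 = attn_layer_flops(8192, 2048) / ssm_layer_flops(8192, 2048)
```

In this model the ratio is simply `n / 8`, which is why the measured speedup keeps climbing as sequences get longer.
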
### Latency (ms per token)

**Generation latency** (batch size 1, autoregressive):

| Model | First Token | Per Token | 100 Tokens Total |
|-------|-------------|-----------|------------------|
| Mamba-130M | 3 ms | 0.8 ms | 83 ms |
| Transformer-130M | 5 ms | 1.2 ms | 125 ms |
| Mamba-1.4B | 12 ms | 3.2 ms | 332 ms |
| Transformer-1.3B | 18 ms | 8.5 ms | 868 ms |
| Mamba-2.8B | 20 ms | 6.1 ms | 631 ms |
| Transformer-2.7B | 35 ms | 18.2 ms | 1855 ms |

**Mamba advantage**: Constant per-token latency regardless of context length

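The totals in the table follow from a simple identity: time to first token plus `n` decode steps at the steady per-token latency.

```python
def total_generation_ms(first_token_ms, per_token_ms, n_tokens):
    # prefill once, then n autoregressive decode steps
    return first_token_ms + per_token_ms * n_tokens

total_generation_ms(12, 3.2, 100)   # Mamba-1.4B row: ~332 ms
total_generation_ms(18, 8.5, 100)   # Transformer-1.3B row: ~868 ms
```

For the Transformer, `per_token_ms` itself grows with context length; for Mamba it stays flat.
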
## Memory Usage

### Training Memory (BF16, per GPU)

**Mamba-1.4B** training memory breakdown:

| Sequence Length | Activations | Gradients | Optimizer | Total | Total vs Transformer |
|----------------|-------------|-----------|-----------|-------|----------------------|
| 512 | 2.1 GB | 3.2 GB | 11.2 GB | 16.5 GB | 0.9× |
| 1024 | 3.8 GB | 3.2 GB | 11.2 GB | 18.2 GB | 0.6× |
| 2048 | 7.2 GB | 3.2 GB | 11.2 GB | 21.6 GB | 0.4× |
| 4096 | 14.1 GB | 3.2 GB | 11.2 GB | 28.5 GB | 0.25× |
| 8192 | 28.0 GB | 3.2 GB | 11.2 GB | 42.4 GB | 0.15× |

**Note**: The Transformer OOMs at 8K sequence length on a 40GB A100

### Inference Memory (FP16, batch size 1)

| Model | KV Cache (8K ctx) | State (Mamba) | Ratio |
|-------|------------------|---------------|-------|
| 130M | 2.1 GB | ~0 MB | ∞ |
| 370M | 5.2 GB | ~0 MB | ∞ |
| 1.4B | 19.7 GB | ~0 MB | ∞ |
| 2.8B | 38.4 GB | ~0 MB | ∞ |

**Mamba stores no KV cache** - its state is a few megabytes, constant in context length!

Actual Mamba state size:
- 130M: ~3 MB (d_model × d_state × n_layers = 768 × 16 × 24)
- 2.8B: ~13 MB (2560 × 16 × 64)

## Language Modeling Benchmarks

### Perplexity on Common Datasets

**Models trained on The Pile (300B tokens)**:

| Model | Params | Pile (val) | WikiText-103 | C4 | Lambada |
|-------|--------|------------|--------------|-----|---------|
| Pythia | 160M | 29.6 | 28.4 | 23.1 | 51.2 |
| **Mamba** | **130M** | **28.1** | **26.7** | **21.8** | **48.3** |
| Pythia | 410M | 18.3 | 17.6 | 16.2 | 32.1 |
| **Mamba** | **370M** | **16.7** | **16.2** | **15.1** | **28.4** |
| Pythia | 1.4B | 10.8 | 10.2 | 11.3 | 15.2 |
| **Mamba** | **1.4B** | **9.1** | **9.6** | **10.1** | **12.8** |
| Pythia | 2.8B | 8.3 | 7.9 | 9.2 | 10.6 |
| **Mamba** | **2.8B** | **7.4** | **7.2** | **8.3** | **9.1** |

**Mamba consistently outperforms** size-matched Pythia models, with roughly 5-15% lower perplexity across the table

### Zero-Shot Task Performance

**Mamba-2.8B vs Transformer-2.7B** on common benchmarks:

| Task | Mamba-2.8B | Transformer-2.7B | Delta |
|------|------------|------------------|-------|
| HellaSwag | 61.3 | 58.7 | +2.6 |
| PIQA | 78.1 | 76.4 | +1.7 |
| ARC-Easy | 68.2 | 65.9 | +2.3 |
| ARC-Challenge | 42.7 | 40.1 | +2.6 |
| WinoGrande | 64.8 | 62.3 | +2.5 |
| OpenBookQA | 43.2 | 41.8 | +1.4 |
| BoolQ | 71.4 | 68.2 | +3.2 |
| MMLU (5-shot) | 35.2 | 33.8 | +1.4 |

**Average improvement**: +2.2 points across benchmarks

## Audio Modeling Benchmarks

### SC09 (Speech Commands)

**Task**: Audio classification (10 classes)

| Model | Params | Accuracy | Inference (ms) |
|-------|--------|----------|----------------|
| Transformer | 8.2M | 96.2% | 18 ms |
| S4 | 6.1M | 97.1% | 8 ms |
| **Mamba** | **6.3M** | **98.4%** | **6 ms** |

### LJSpeech (Speech Generation)

**Task**: Text-to-speech quality (MOS score)

| Model | Params | MOS ↑ | RTF ↓ |
|-------|--------|-------|-------|
| Transformer | 12M | 3.82 | 0.45 |
| Conformer | 11M | 3.91 | 0.38 |
| **Mamba** | **10M** | **4.03** | **0.21** |

**RTF** (Real-Time Factor): Lower is better (0.21 ≈ 5× faster than real-time)

## Genomics Benchmarks

### Human Reference Genome (HG38)

**Task**: Next nucleotide prediction

| Model | Context Length | Perplexity | Throughput |
|-------|----------------|------------|------------|
| Transformer | 1024 | 3.21 | 1,200 bp/s |
| Hyena | 32768 | 2.87 | 8,500 bp/s |
| **Mamba** | **1M** | **2.14** | **45,000 bp/s** |

**Mamba handles million-length sequences** efficiently

## Scaling Laws

### Compute-Optimal Training

**FLOPs vs perplexity** (The Pile validation):

| Model Size | Training FLOPs | Mamba Perplexity | Transformer Perplexity |
|------------|----------------|------------------|------------------------|
| 130M | 6e19 | 28.1 | 29.6 |
| 370M | 3e20 | 16.7 | 18.3 |
| 790M | 8e20 | 12.3 | 13.9 |
| 1.4B | 2e21 | 9.1 | 10.8 |
| 2.8B | 6e21 | 7.4 | 8.3 |

**Compute efficiency**: Mamba reaches the same perplexity as a Transformer with roughly **0.8×** the training FLOPs

### Parameter Efficiency

**Perplexity 10.0 target** on The Pile:

| Model Type | Parameters Needed | Memory (inference) |
|------------|-------------------|-------------------|
| Transformer | 1.6B | 3.2 GB |
| **Mamba** | **1.1B** | **2.2 GB** |

**Mamba needs ~30% fewer parameters** for the same performance

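The inference-memory column is essentially weight storage at 2 bytes per parameter in fp16; activations and runtime buffers, ignored in this sketch, add a bit more on top.

```python
def fp16_weight_gb(n_params):
    # 2 bytes per parameter in fp16/bf16
    return n_params * 2 / 1e9

fp16_weight_gb(1.1e9)   # Mamba row: ~2.2 GB
fp16_weight_gb(1.6e9)   # Transformer row: ~3.2 GB
```
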
## Long-Range Arena (LRA)

**Task**: Long-context understanding benchmarks

| Task | Length | Transformer | S4 | Mamba |
|------|--------|-------------|-----|-------|
| ListOps | 2K | 36.4% | 59.6% | **61.2%** |
| Text | 4K | 64.3% | 86.8% | **88.1%** |
| Retrieval | 4K | 57.5% | 90.9% | **92.3%** |
| Image | 1K | 42.4% | 88.7% | **89.4%** |
| PathFinder | 1K | 71.4% | 86.1% | **87.8%** |
| Path-X | 16K | OOM | 88.3% | **91.2%** |

**Average**: Mamba 85.0%, S4 83.4%, Transformer 54.4%

## Training Throughput

### Tokens/sec During Training

**8× A100 80GB** cluster, BF16, different sequence lengths:

| Model | Seq Len 512 | Seq Len 2K | Seq Len 8K | Seq Len 32K |
|-------|-------------|------------|------------|-------------|
| Transformer-1.3B | 180K | 52K | OOM | OOM |
| **Mamba-1.4B** | **195K** | **158K** | **121K** | **89K** |
| Transformer-2.7B | 92K | 26K | OOM | OOM |
| **Mamba-2.8B** | **98K** | **81K** | **62K** | **45K** |

**Mamba scales to longer sequences** without OOM

## Hardware Utilization

### GPU Memory Bandwidth

**Mamba-1.4B** inference on different GPUs:

| GPU | Peak Memory BW | Tokens/sec | BW Utilization |
|-----|----------------|------------|----------------|
| A100 80GB | 2.0 TB/s | 6,800 | 85% |
| A100 40GB | 1.6 TB/s | 5,400 | 84% |
| V100 32GB | 900 GB/s | 3,100 | 86% |
| RTX 4090 | 1.0 TB/s | 3,600 | 90% |

**High utilization**: Mamba inference is memory-bandwidth bound (good!)

### Multi-GPU Scaling

**Mamba-2.8B** training throughput:

| GPUs | Tokens/sec | Scaling Efficiency |
|------|------------|-------------------|
| 1× A100 | 12,300 | 100% |
| 2× A100 | 23,800 | 97% |
| 4× A100 | 46,100 | 94% |
| 8× A100 | 89,400 | 91% |
| 16× A100 | 172,000 | 88% |

**Near-linear scaling** up to 16 GPUs

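Scaling efficiency here is throughput relative to perfect linear scaling from the single-GPU baseline:

```python
def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    # fraction of ideal n-GPU throughput actually achieved
    return throughput_n / (n_gpus * throughput_1)

round(scaling_efficiency(89_400, 12_300, 8) * 100)  # 8-GPU row: 91
```
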
## Cost Analysis

### Training Cost (USD)

**Training to The Pile perplexity 10.0** on cloud GPUs:

| Model | Cloud GPUs | Hours | Cost (A100) | Cost (H100) |
|-------|------------|-------|-------------|-------------|
| Transformer-1.6B | 8× A100 | 280 | $8,400 | $4,200 |
| **Mamba-1.1B** | **8× A100** | **180** | **$5,400** | **$2,700** |

**Savings**: 36% cost reduction vs Transformer

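The cost figures are consistent with a flat hourly GPU rate; the ~$3.75/GPU-hour A100 rate below is inferred from the table, not stated by the source.

```python
def training_cost_usd(n_gpus, hours, usd_per_gpu_hour):
    # total GPU-hours times the hourly rate
    return n_gpus * hours * usd_per_gpu_hour

training_cost_usd(8, 280, 3.75)  # Transformer row: 8400.0
training_cost_usd(8, 180, 3.75)  # Mamba row: 5400.0
```

Since the rate is fixed, the 36% saving is purely the reduction in GPU-hours.
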
### Inference Cost (USD/million tokens)

**API-style inference** (batch size 1, 2K context):

| Model | Latency | Cost/M tokens | Quality (perplexity) |
|-------|---------|---------------|---------------------|
| Transformer-1.3B | 8.5 ms/tok | $0.42 | 10.8 |
| **Mamba-1.4B** | **3.2 ms/tok** | **$0.18** | **9.1** |

**Mamba provides**: 2.6× faster, 57% cheaper, better quality

## Resources

- Benchmarks code: https://github.com/state-spaces/mamba/tree/main/benchmarks
- Paper (Mamba-1): https://arxiv.org/abs/2312.00752 (Section 4: Experiments)
- Paper (Mamba-2): https://arxiv.org/abs/2405.21060 (Section 5: Experiments)
- Pretrained models: https://huggingface.co/state-spaces