@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,337 @@
# GPTQ Calibration Guide

Complete guide to calibration data selection and the quantization process.

## Calibration Data Selection

### Why calibration matters

Calibration data is used to:
1. **Compute weight importance** (Hessian matrix)
2. **Minimize quantization error** for important weights
3. **Preserve model accuracy** after quantization

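The first point can be illustrated in miniature. GPTQ (following Optimal Brain Quantization) builds a Hessian H = 2·X·Xᵀ from calibration activations X and scores each weight by the error criterion w²/[H⁻¹]ᵢᵢ. The toy sketch below shows just that computation with random data; it is not AutoGPTQ's layer-wise solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: 4 input features, 128 calibration samples (features x samples)
X = rng.standard_normal((4, 128))
w = rng.standard_normal(4)  # weights of one output row

# Hessian of the layer-wise squared reconstruction error
H = 2.0 * X @ X.T
# Dampening stabilizes the inverse (cf. damp_percent in BaseQuantizeConfig)
H += 0.01 * np.mean(np.diag(H)) * np.eye(4)
H_inv = np.linalg.inv(H)

# OBQ saliency: expected error incurred by quantizing each weight
saliency = w**2 / np.diag(H_inv)
print(saliency.round(3))
```

Weights with high saliency are the "important" ones the calibration data identifies, which is why unrepresentative calibration text leads quantization to protect the wrong weights.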
**Impact**:
- Good calibration: <1.5% perplexity increase
- Poor calibration: 5-10% perplexity increase
- No calibration: Model may output gibberish

### Dataset size

**Recommended**:
- **128-256 samples** of 512 tokens each
- Total: 65K-131K tokens

**More is not always better**:
- <64 samples: Underfitting (poor quality)
- 128-256 samples: Sweet spot
- >512 samples: Diminishing returns, slower quantization

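The recommended token totals follow directly from the sample count and sequence length; a quick sanity check:

```python
def calibration_token_budget(num_samples: int, seq_len: int = 512) -> int:
    """Total calibration tokens: num_samples sequences of seq_len tokens."""
    return num_samples * seq_len

print(calibration_token_budget(128))  # 65536  (~65K)
print(calibration_token_budget(256))  # 131072 (~131K)
```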
### Dataset selection by domain

**General purpose models (GPT, Llama)**:
```python
from datasets import load_dataset

# C4 dataset (recommended for general models; the "en" config is required)
dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)
calibration_data = [
    tokenizer(example["text"], truncation=True, max_length=512)
    for example in dataset.take(128)
]
```

**Code models (CodeLlama, StarCoder)**:
```python
# The Stack dataset; filter to the target language *before* sampling,
# otherwise take(128) yields fewer than 128 usable examples
dataset = load_dataset("bigcode/the-stack", split="train", streaming=True)
python_only = dataset.filter(lambda example: example["lang"] == "Python")  # Or your target language
calibration_data = [
    tokenizer(example["content"], truncation=True, max_length=512)
    for example in python_only.take(128)
]
```

**Chat models**:
```python
# ShareGPT or Alpaca format
dataset = load_dataset("anon8231489123/ShareGPT_Vicuna_unfiltered", split="train")

calibration_data = []
for example in dataset.select(range(128)):
    # Format as a conversation (apply_chat_template expects role/content
    # messages; ShareGPT's from/value keys may need remapping first)
    conversation = tokenizer.apply_chat_template(
        example["conversations"],
        tokenize=True,
        truncation=True,
        max_length=512
    )
    calibration_data.append(conversation)
```

**Domain-specific (medical, legal)**:
```python
# Use domain-specific text (dataset name is a placeholder)
dataset = load_dataset("medical_dataset", split="train")
calibration_data = [
    tokenizer(example["text"], truncation=True, max_length=512)
    for example in dataset.take(256)  # More samples for niche domains
]
```

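When one model serves several domains, the calibration set can mix them. A simple round-robin interleave keeps each domain represented; this is a hypothetical helper (it assumes each source is already a list of tokenized samples):

```python
def mix_calibration(sources, total=128):
    """Round-robin interleave tokenized samples from several domain sources."""
    mixed = []
    i = 0
    while len(mixed) < total and any(i < len(s) for s in sources):
        for s in sources:
            if i < len(s) and len(mixed) < total:
                mixed.append(s[i])
        i += 1
    return mixed

# Stand-ins for tokenized samples from two domains
general = [[1, 2, 3]] * 100
code = [[4, 5, 6]] * 100
calibration_data = mix_calibration([general, code], total=128)
print(len(calibration_data))  # 128
```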
## Quantization Process

### Basic quantization

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer
from datasets import load_dataset

# 1. Load model
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    quantize_config=BaseQuantizeConfig(
        bits=4,
        group_size=128,
        desc_act=False
    )
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2. Prepare calibration data (quantize() expects full tokenizer output:
#    input_ids plus attention_mask)
dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)
calibration_data = [
    tokenizer(example["text"], truncation=True, max_length=512)
    for example in dataset.take(128)
]

# 3. Quantize
model.quantize(calibration_data)

# 4. Save
model.save_quantized("llama-2-7b-gptq")
```

**Time**: ~10-30 minutes for a 7B model on an A100

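After saving, the checkpoint can be loaded back with `from_quantized` for inference. A minimal sketch, assuming a CUDA device and the save directory from step 4 above:

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Load the quantized checkpoint produced by save_quantized()
model = AutoGPTQForCausalLM.from_quantized("llama-2-7b-gptq", device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-gptq")

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```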
### Advanced configuration

```python
config = BaseQuantizeConfig(
    bits=4,                # 3, 4, or 8 bits
    group_size=128,        # 32, 64, 128, or -1 (per-column)
    desc_act=False,        # Activation order (True = better accuracy, slower)
    damp_percent=0.01,     # Dampening (0.001-0.1, default 0.01)
    static_groups=False,   # Static quantization
    sym=True,              # Symmetric quantization
    true_sequential=True,  # Sequential quantization (more accurate)
    model_seqlen=2048      # Model sequence length
)
```

**Parameter tuning**:
- `damp_percent`: Lower = more accurate, slower. Try 0.005-0.02.
- `desc_act=True`: 0.5-1% better accuracy, 20-30% slower inference
- `group_size=32`: Better accuracy, slightly larger model

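To reason about the `group_size` size trade-off, a rough storage estimate helps. The sketch below uses an assumed overhead model (one 16-bit scale and one `bits`-bit zero point per group), not AutoGPTQ's exact on-disk layout:

```python
def gptq_weight_size_gb(n_params: float, bits: int, group_size: int) -> float:
    """Approximate quantized weight storage in GB.

    Packed weights: n_params * bits / 8 bytes.
    Per-group overhead (assumed): a 16-bit scale plus a bits-bit
    zero point for every group_size weights.
    """
    packed = n_params * bits / 8
    groups = n_params / group_size
    overhead = groups * (16 + bits) / 8
    return (packed + overhead) / 1e9

# 7B model at 4-bit
print(round(gptq_weight_size_gb(7e9, 4, 128), 2))  # 3.64
print(round(gptq_weight_size_gb(7e9, 4, 32), 2))   # 4.05 -- smaller groups cost more overhead
```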
### Multi-GPU quantization

```python
# Quantize on multiple GPUs (faster)
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    quantize_config=config,
    device_map="auto",  # Distribute across GPUs
    max_memory={0: "40GB", 1: "40GB"}
)

model.quantize(calibration_data)
```

## Quality Evaluation

### Perplexity testing

```python
from datasets import load_dataset
import torch

# Load test dataset
test_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
test_text = "\n\n".join(test_dataset["text"])

# Tokenize
encodings = tokenizer(test_text, return_tensors="pt")
max_length = 2048  # match the model's sequence length (model_seqlen above)

# Calculate perplexity over non-overlapping chunks
nlls = []
for i in range(0, encodings.input_ids.size(1), max_length):
    begin_loc = i
    end_loc = min(i + max_length, encodings.input_ids.size(1))
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to("cuda")

    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        nll = outputs.loss

    nlls.append(nll)

ppl = torch.exp(torch.stack(nlls).mean())
print(f"Perplexity: {ppl.item():.2f}")
```

**Quality targets**:
- <1.5% increase: Excellent
- 1.5-3% increase: Good
- 3-5% increase: Acceptable for some use cases
- >5% increase: Poor, redo calibration

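The targets above are relative to the FP16 baseline, so evaluate both models on the same text and compare. A small helper makes the comparison explicit (the perplexity values here are illustrative, not measured):

```python
def ppl_increase_pct(baseline_ppl: float, quantized_ppl: float) -> float:
    """Percent perplexity increase of the quantized model over the FP16 baseline."""
    return (quantized_ppl - baseline_ppl) / baseline_ppl * 100

delta = ppl_increase_pct(5.47, 5.53)  # illustrative values
print(f"{delta:.2f}%")  # ~1.10% -> "Excellent" per the targets above
```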
191
+ ### Benchmark evaluation
192
+
193
+ ```python
194
+ from lm_eval import evaluator
195
+
196
+ # Evaluate on standard benchmarks
197
+ results = evaluator.simple_evaluate(
198
+ model=model,
199
+ tasks=["hellaswag", "mmlu", "arc_challenge"],
200
+ num_fewshot=5
201
+ )
202
+
203
+ print(results["results"])
204
+
205
+ # Compare to baseline FP16 scores
206
+ ```
207

## Optimization Tips

### Improving accuracy

**1. Use more calibration samples**:
```python
# Try 256 or 512 samples
calibration_data = [... for example in dataset.take(256)]
```

**2. Use domain-specific data**:
```python
# Match your use case
if code_model:
    dataset = load_dataset("bigcode/the-stack")
elif chat_model:
    dataset = load_dataset("ShareGPT")
```
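Both tips reduce to "tokenize the right texts and keep N of them". A sketch of a calibration-set builder (the `prepare_calibration` helper and its thresholds are our own, not an AutoGPTQ API; a real run would pass the model's tokenizer instead of the toy one below):

```python
def prepare_calibration(samples, tokenize, n_samples=128, max_length=2048):
    """Tokenize up to n_samples raw strings into calibration examples.

    `tokenize` is any callable mapping a string to a list of token ids.
    """
    data = []
    for text in samples:
        ids = tokenize(text)[:max_length]
        if len(ids) < 32:  # skip near-empty samples, which hurt calibration
            continue
        data.append(ids)
        if len(data) >= n_samples:
            break
    return data

# Toy demo with a whitespace "tokenizer"
def fake_tokenize(s):
    return s.split()

corpus = ["one two three " * 20, "x", "alpha beta " * 40]
cal = prepare_calibration(corpus, fake_tokenize, n_samples=2)
print(len(cal))  # 2 (the one-token sample is skipped)
```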

**3. Enable activation reordering**:
```python
config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True  # Better accuracy, slower inference
)
```

**4. Use a smaller group size**:
```python
config = BaseQuantizeConfig(
    bits=4,
    group_size=32,  # vs 128
    desc_act=False
)
```

### Reducing quantization time

**1. Use fewer samples**:
```python
# 64-128 samples are usually sufficient
calibration_data = [... for example in dataset.take(64)]
```

**2. Disable activation reordering**:
```python
config = BaseQuantizeConfig(
    desc_act=False  # Faster quantization
)
```

**3. Use multi-GPU**:
```python
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    device_map="auto"  # Parallelize across GPUs
)
```

## Troubleshooting

### Poor quality after quantization

**Symptom**: >5% perplexity increase or gibberish output

**Solutions**:
1. **Check calibration data**:
   ```python
   # Verify the data is representative
   for sample in calibration_data[:5]:
       print(tokenizer.decode(sample))
   ```

2. **Try more samples**:
   ```python
   calibration_data = [... for example in dataset.take(256)]
   ```

3. **Use domain-specific data**:
   ```python
   # Match your model's use case
   dataset = load_dataset("domain_specific_dataset")
   ```

4. **Adjust dampening**:
   ```python
   config = BaseQuantizeConfig(damp_percent=0.005)  # Lower dampening
   ```

### Quantization OOM

**Solutions**:
1. **Reduce batch size**:
   ```python
   model.quantize(calibration_data, batch_size=1)  # Lowers peak memory
   ```

2. **Use CPU offloading**:
   ```python
   model = AutoGPTQForCausalLM.from_pretrained(
       model_name,
       device_map="auto",
       max_memory={"cpu": "100GB"}
   )
   ```

3. **Quantize on a larger GPU** or use multi-GPU

### Slow quantization

**Typical times** (7B model):
- Single A100: 10-15 minutes
- Single RTX 4090: 20-30 minutes
- CPU: 2-4 hours (not recommended)

**Speedup**:
- Use fewer samples (64 vs 256)
- Disable `desc_act`
- Use multi-GPU


## Best Practices

1. **Use the C4 dataset for general models** - well-balanced, diverse
2. **Match the domain** - code models need code data, chat models need conversations
3. **Start with 128 samples** - a good balance of speed and quality
4. **Test perplexity** - always verify quality before deployment
5. **Compare kernels** - try ExLlama, Marlin, and Triton for speed
6. **Save multiple versions** - try group_size 32, 128, and 256
7. **Document settings** - save quantize_config.json for reproducibility
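For practice 7: AutoGPTQ's `save_quantized()` already writes `quantize_config.json`, but recording the calibration settings alongside it helps reproducibility. A minimal sketch (the file name and fields here are our own convention, not part of the library):

```python
import json

def save_run_metadata(path, bits, group_size, desc_act, damp_percent,
                      calib_dataset, n_samples):
    """Record quantization settings next to the exported model."""
    meta = {
        "bits": bits,
        "group_size": group_size,
        "desc_act": desc_act,
        "damp_percent": damp_percent,
        "calibration_dataset": calib_dataset,
        "n_calibration_samples": n_samples,
    }
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)
    return meta

meta = save_run_metadata("quantize_run.json", 4, 128, False, 0.01, "c4", 128)
print(meta["group_size"])  # 128
```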
# GPTQ Integration Guide

Integration with transformers, PEFT, vLLM, and other frameworks.

## Transformers Integration

### Auto-detection
```python
from transformers import AutoModelForCausalLM

# Automatically detects and loads a GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-GPTQ",
    device_map="auto"
)
```

### Manual loading
```python
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-13B-GPTQ",
    device="cuda:0",
    use_exllama=True
)
```

## QLoRA Fine-Tuning

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model
from trl import SFTTrainer

# Load GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-70B-GPTQ",
    device_map="auto"
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

# Train (a 70B model on a single A100!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-4,
        num_train_epochs=3,
        output_dir="./results"
    )
)

trainer.train()
```
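As a back-of-envelope check on why this fits on one GPU: a rank-r LoRA adapter on a `d_in x d_out` linear layer adds `r * (d_in + d_out)` trainable parameters. Treating `q_proj` and `v_proj` as square 8192-wide matrices over 80 layers (a rough approximation of Llama-2-70B's shapes; grouped-query attention actually makes `v_proj` smaller):

```python
def lora_param_count(d_model, rank, n_layers, n_target_matrices=2):
    """Trainable params for square LoRA adapters: each adds rank * (d_in + d_out)."""
    per_matrix = rank * (d_model + d_model)
    return per_matrix * n_target_matrices * n_layers

params = lora_param_count(d_model=8192, rank=16, n_layers=80)
print(params)  # 41943040
print(f"{100 * params / 70e9:.3f}%")  # roughly 0.060% of a 70B base model
```

So only tens of millions of parameters train in FP16 while the 70B base stays frozen in 4-bit.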

## vLLM Integration

```python
from vllm import LLM, SamplingParams

# Load a GPTQ model in vLLM
llm = LLM(
    model="TheBloke/Llama-2-70B-GPTQ",
    quantization="gptq",
    dtype="float16",
    gpu_memory_utilization=0.95
)

# Generate
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=200
)

outputs = llm.generate(["Explain AI"], sampling_params)
```

## Text Generation Inference (TGI)

```bash
# Docker with GPTQ support
docker run --gpus all -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TheBloke/Llama-2-70B-GPTQ \
  --quantize gptq
```
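Once the container is up, generation requests go to TGI's `/generate` REST endpoint. A sketch of the request payload (the localhost URL matches the `-p 8080:80` mapping above; the prompt and parameters are just examples):

```python
import json

# Build a request body for TGI's POST /generate endpoint
payload = {
    "inputs": "Explain AI",
    "parameters": {
        "max_new_tokens": 200,
        "temperature": 0.7,
        "top_p": 0.9,
    },
}
body = json.dumps(payload)
print(body)

# In a real run:
# requests.post("http://localhost:8080/generate", json=payload).json()
```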

## LangChain Integration

```python
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-GPTQ")
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-GPTQ",
    device_map="auto"
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200)
llm = HuggingFacePipeline(pipeline=pipe)

# Use in LangChain
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

chain = LLMChain(llm=llm, prompt=PromptTemplate(...))
result = chain.run(input="...")
```
# GPTQ Troubleshooting Guide

Common issues and solutions for GPTQ quantization and inference.

## Installation Issues

### CUDA mismatch
```bash
# Check the CUDA version
nvcc --version
python -c "import torch; print(torch.version.cuda)"

# Install a matching build
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # CUDA 11.8
```

### Build errors
```bash
# Install without build isolation
pip install auto-gptq --no-build-isolation

# On Ubuntu, make sure the Python headers are present
sudo apt-get install python3-dev
```

## Runtime Issues

### Slow inference
```python
# Try different backends
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_exllama=True  # Fastest (try v1 or v2)
)

# Or Marlin (Ampere+ GPUs)
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_marlin=True
)
```

### OOM during inference
```python
# Reduce batch size: generate for one prompt at a time instead of batching
outputs = model.generate(**inputs)

# Use CPU offloading
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device_map="auto",
    max_memory={"cpu": "100GB"}
)

# Reduce context length
model.seqlen = 1024  # Instead of 2048
```

### Poor quality outputs
```python
# Requantize with better calibration:
# 1. Use more samples (256 instead of 128)
# 2. Use domain-specific data
# 3. Lower dampening: damp_percent=0.005
# 4. Enable desc_act=True
```

## Quantization Issues

### Very slow quantization
```bash
# Expected times (7B model):
# - A100: 10-15 min
# - RTX 4090: 20-30 min
# - CPU: 2-4 hours

# Speed up:
# 1. Use a GPU
# 2. Reduce samples (64 instead of 256)
# 3. Disable desc_act
# 4. Use multi-GPU
```

### Quantization crashes
```python
# Reduce memory usage
model = AutoGPTQForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    max_memory={"cpu": "100GB"}  # Offload to CPU
)

# Or quantize with a minimal batch size (slower but works)
model.quantize(calibration_data, batch_size=1)
```
# GRPO/RL Training Skill

**Expert-level guidance for Group Relative Policy Optimization with TRL**

## 📁 Skill Structure

```
grpo-rl-training/
├── SKILL.md                        # Main skill documentation (READ THIS FIRST)
├── README.md                       # This file
├── templates/
│   └── basic_grpo_training.py      # Production-ready training template
└── examples/
    └── reward_functions_library.py # 20+ reward function examples
```

## 🚀 Quick Start

1. **Read SKILL.md** - a comprehensive guide to all concepts and patterns
2. **Copy `templates/basic_grpo_training.py`** - start with working code
3. **Browse `examples/reward_functions_library.py`** - pick reward functions for your task
4. **Modify for your use case** - adapt the dataset, rewards, and config

## 💡 What's Inside

### SKILL.md (Main Documentation)
- Core GRPO concepts and algorithm fundamentals
- Complete implementation workflow (dataset → rewards → training → deployment)
- 10+ reward function examples with code
- Hyperparameter tuning guide
- Training insights (loss behavior, metrics, debugging)
- Troubleshooting guide
- Production best practices

### Templates
- **basic_grpo_training.py**: minimal, production-ready training script
  - Uses Qwen 2.5 1.5B Instruct
  - 3 reward functions (format + correctness)
  - LoRA for efficient training
  - Fully documented and ready to run

### Examples
- **reward_functions_library.py**: 20+ battle-tested reward functions
  - Correctness rewards (exact match, fuzzy match, numeric, code execution)
  - Format rewards (XML, JSON, strict/soft)
  - Length rewards (ideal length, min/max)
  - Style rewards (reasoning quality, citations, repetition penalty)
  - Combined rewards (multi-objective optimization)
  - Preset collections for common tasks

## 📖 Usage for Agents

When this skill is loaded in your agent's context:

1. **Always read SKILL.md first** before implementing
2. **Start simple** - use a length-based reward to validate the setup
3. **Build incrementally** - add one reward function at a time
4. **Reference the examples** - copy patterns from reward_functions_library.py
5. **Monitor training** - watch the reward metrics (not the loss!)
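The "start simple" advice in step 2 can be sketched as a minimal length-based reward in the shape TRL's `GRPOTrainer` expects (a callable taking `completions` plus keyword arguments and returning one float per completion; the 200-character target is an arbitrary assumption):

```python
def length_reward(completions, target_len=200, **kwargs):
    """Reward completions whose length is close to target_len characters.

    Returns one float per completion in [0, 1]: 1.0 at the target length,
    falling off linearly to 0.0 at zero or twice the target.
    """
    rewards = []
    for completion in completions:
        # Chat-style completions arrive as [{"role": ..., "content": ...}]
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(max(0.0, 1.0 - abs(len(text) - target_len) / target_len))
    return rewards

print(length_reward(["a" * 200, "a" * 100, ""]))  # [1.0, 0.5, 0.0]
```

If these scores look sensible on a handful of hand-written completions, the setup is wired correctly and you can move on to real rewards.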

## 🎯 Common Use Cases

| Task Type | Recommended Rewards | Template |
|-----------|---------------------|----------|
| Math reasoning | `MATH_REASONING_REWARDS` preset | basic_grpo_training.py |
| Code generation | `CODE_GENERATION_REWARDS` preset | Modify the dataset in the template |
| Summarization | `SUMMARIZATION_REWARDS` preset | Adjust prompts + rewards |
| Q&A | `QA_REWARDS` preset | Use fuzzy match + citations |

## ⚠️ Critical Reminders

- **Loss goes UP during training** - this is normal (it is KL divergence)
- **Use 3-5 reward functions** - single rewards often fail
- **Test rewards before training** - debug each function independently
- **Monitor reward_std** - it should stay > 0.1 (to avoid mode collapse)
- **Start with num_generations=4-8** - scale up if your GPU allows

## 🔗 External Resources

- [TRL Documentation](https://huggingface.co/docs/trl)
- [DeepSeek R1 Paper](https://arxiv.org/abs/2501.12948)
- [Open R1 Implementation](https://github.com/huggingface/open-r1)
- [Unsloth (2-3x faster)](https://docs.unsloth.ai/)

## 📝 Version

**v1.0.0** - Initial release (January 2025)

## 👨‍💻 Maintained By

Synthetic Sciences
For questions or improvements, see https://orchestra.com

---

**License:** MIT
**Last Updated:** January 2025