@synsci/cli-darwin-x64 1.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (373)
  1. package/bin/skills/accelerate/SKILL.md +332 -0
  2. package/bin/skills/accelerate/references/custom-plugins.md +453 -0
  3. package/bin/skills/accelerate/references/megatron-integration.md +489 -0
  4. package/bin/skills/accelerate/references/performance.md +525 -0
  5. package/bin/skills/audiocraft/SKILL.md +564 -0
  6. package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
  7. package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
  8. package/bin/skills/autogpt/SKILL.md +403 -0
  9. package/bin/skills/autogpt/references/advanced-usage.md +535 -0
  10. package/bin/skills/autogpt/references/troubleshooting.md +420 -0
  11. package/bin/skills/awq/SKILL.md +310 -0
  12. package/bin/skills/awq/references/advanced-usage.md +324 -0
  13. package/bin/skills/awq/references/troubleshooting.md +344 -0
  14. package/bin/skills/axolotl/SKILL.md +158 -0
  15. package/bin/skills/axolotl/references/api.md +5548 -0
  16. package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
  17. package/bin/skills/axolotl/references/index.md +15 -0
  18. package/bin/skills/axolotl/references/other.md +3563 -0
  19. package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
  20. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
  21. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
  22. package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
  23. package/bin/skills/bitsandbytes/SKILL.md +411 -0
  24. package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
  25. package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
  26. package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
  27. package/bin/skills/blip-2/SKILL.md +564 -0
  28. package/bin/skills/blip-2/references/advanced-usage.md +680 -0
  29. package/bin/skills/blip-2/references/troubleshooting.md +526 -0
  30. package/bin/skills/chroma/SKILL.md +406 -0
  31. package/bin/skills/chroma/references/integration.md +38 -0
  32. package/bin/skills/clip/SKILL.md +253 -0
  33. package/bin/skills/clip/references/applications.md +207 -0
  34. package/bin/skills/constitutional-ai/SKILL.md +290 -0
  35. package/bin/skills/crewai/SKILL.md +498 -0
  36. package/bin/skills/crewai/references/flows.md +438 -0
  37. package/bin/skills/crewai/references/tools.md +429 -0
  38. package/bin/skills/crewai/references/troubleshooting.md +480 -0
  39. package/bin/skills/deepspeed/SKILL.md +141 -0
  40. package/bin/skills/deepspeed/references/08.md +17 -0
  41. package/bin/skills/deepspeed/references/09.md +173 -0
  42. package/bin/skills/deepspeed/references/2020.md +378 -0
  43. package/bin/skills/deepspeed/references/2023.md +279 -0
  44. package/bin/skills/deepspeed/references/assets.md +179 -0
  45. package/bin/skills/deepspeed/references/index.md +35 -0
  46. package/bin/skills/deepspeed/references/mii.md +118 -0
  47. package/bin/skills/deepspeed/references/other.md +1191 -0
  48. package/bin/skills/deepspeed/references/tutorials.md +6554 -0
  49. package/bin/skills/dspy/SKILL.md +590 -0
  50. package/bin/skills/dspy/references/examples.md +663 -0
  51. package/bin/skills/dspy/references/modules.md +475 -0
  52. package/bin/skills/dspy/references/optimizers.md +566 -0
  53. package/bin/skills/faiss/SKILL.md +221 -0
  54. package/bin/skills/faiss/references/index_types.md +280 -0
  55. package/bin/skills/flash-attention/SKILL.md +367 -0
  56. package/bin/skills/flash-attention/references/benchmarks.md +215 -0
  57. package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
  58. package/bin/skills/gguf/SKILL.md +427 -0
  59. package/bin/skills/gguf/references/advanced-usage.md +504 -0
  60. package/bin/skills/gguf/references/troubleshooting.md +442 -0
  61. package/bin/skills/gptq/SKILL.md +450 -0
  62. package/bin/skills/gptq/references/calibration.md +337 -0
  63. package/bin/skills/gptq/references/integration.md +129 -0
  64. package/bin/skills/gptq/references/troubleshooting.md +95 -0
  65. package/bin/skills/grpo-rl-training/README.md +97 -0
  66. package/bin/skills/grpo-rl-training/SKILL.md +572 -0
  67. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
  68. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
  69. package/bin/skills/guidance/SKILL.md +572 -0
  70. package/bin/skills/guidance/references/backends.md +554 -0
  71. package/bin/skills/guidance/references/constraints.md +674 -0
  72. package/bin/skills/guidance/references/examples.md +767 -0
  73. package/bin/skills/hqq/SKILL.md +445 -0
  74. package/bin/skills/hqq/references/advanced-usage.md +528 -0
  75. package/bin/skills/hqq/references/troubleshooting.md +503 -0
  76. package/bin/skills/hugging-face-cli/SKILL.md +191 -0
  77. package/bin/skills/hugging-face-cli/references/commands.md +954 -0
  78. package/bin/skills/hugging-face-cli/references/examples.md +374 -0
  79. package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
  80. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
  81. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
  82. package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
  83. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
  84. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
  85. package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
  86. package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
  87. package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
  88. package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
  89. package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
  90. package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
  91. package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
  92. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
  93. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
  94. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
  95. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
  96. package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
  97. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
  98. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
  99. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
  100. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
  101. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
  102. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
  103. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
  104. package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
  105. package/bin/skills/hugging-face-jobs/index.html +216 -0
  106. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
  107. package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
  108. package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
  109. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
  110. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
  111. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
  112. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
  113. package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
  114. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
  115. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
  116. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
  117. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
  118. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
  119. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
  120. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
  121. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
  122. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
  123. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
  124. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
  125. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
  126. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
  127. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
  128. package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
  129. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
  130. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
  131. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
  132. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
  133. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
  134. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
  135. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
  136. package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
  137. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
  138. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
  139. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
  140. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
  141. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
  142. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
  143. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
  144. package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
  145. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
  146. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
  147. package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
  148. package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
  149. package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
  150. package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
  151. package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
  152. package/bin/skills/instructor/SKILL.md +740 -0
  153. package/bin/skills/instructor/references/examples.md +107 -0
  154. package/bin/skills/instructor/references/providers.md +70 -0
  155. package/bin/skills/instructor/references/validation.md +606 -0
  156. package/bin/skills/knowledge-distillation/SKILL.md +458 -0
  157. package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
  158. package/bin/skills/lambda-labs/SKILL.md +545 -0
  159. package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
  160. package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
  161. package/bin/skills/langchain/SKILL.md +480 -0
  162. package/bin/skills/langchain/references/agents.md +499 -0
  163. package/bin/skills/langchain/references/integration.md +562 -0
  164. package/bin/skills/langchain/references/rag.md +600 -0
  165. package/bin/skills/langsmith/SKILL.md +422 -0
  166. package/bin/skills/langsmith/references/advanced-usage.md +548 -0
  167. package/bin/skills/langsmith/references/troubleshooting.md +537 -0
  168. package/bin/skills/litgpt/SKILL.md +469 -0
  169. package/bin/skills/litgpt/references/custom-models.md +568 -0
  170. package/bin/skills/litgpt/references/distributed-training.md +451 -0
  171. package/bin/skills/litgpt/references/supported-models.md +336 -0
  172. package/bin/skills/litgpt/references/training-recipes.md +619 -0
  173. package/bin/skills/llama-cpp/SKILL.md +258 -0
  174. package/bin/skills/llama-cpp/references/optimization.md +89 -0
  175. package/bin/skills/llama-cpp/references/quantization.md +213 -0
  176. package/bin/skills/llama-cpp/references/server.md +125 -0
  177. package/bin/skills/llama-factory/SKILL.md +80 -0
  178. package/bin/skills/llama-factory/references/_images.md +23 -0
  179. package/bin/skills/llama-factory/references/advanced.md +1055 -0
  180. package/bin/skills/llama-factory/references/getting_started.md +349 -0
  181. package/bin/skills/llama-factory/references/index.md +19 -0
  182. package/bin/skills/llama-factory/references/other.md +31 -0
  183. package/bin/skills/llamaguard/SKILL.md +337 -0
  184. package/bin/skills/llamaindex/SKILL.md +569 -0
  185. package/bin/skills/llamaindex/references/agents.md +83 -0
  186. package/bin/skills/llamaindex/references/data_connectors.md +108 -0
  187. package/bin/skills/llamaindex/references/query_engines.md +406 -0
  188. package/bin/skills/llava/SKILL.md +304 -0
  189. package/bin/skills/llava/references/training.md +197 -0
  190. package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
  191. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  192. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  193. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  194. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  195. package/bin/skills/long-context/SKILL.md +536 -0
  196. package/bin/skills/long-context/references/extension_methods.md +468 -0
  197. package/bin/skills/long-context/references/fine_tuning.md +611 -0
  198. package/bin/skills/long-context/references/rope.md +402 -0
  199. package/bin/skills/mamba/SKILL.md +260 -0
  200. package/bin/skills/mamba/references/architecture-details.md +206 -0
  201. package/bin/skills/mamba/references/benchmarks.md +255 -0
  202. package/bin/skills/mamba/references/training-guide.md +388 -0
  203. package/bin/skills/megatron-core/SKILL.md +366 -0
  204. package/bin/skills/megatron-core/references/benchmarks.md +249 -0
  205. package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
  206. package/bin/skills/megatron-core/references/production-examples.md +473 -0
  207. package/bin/skills/megatron-core/references/training-recipes.md +547 -0
  208. package/bin/skills/miles/SKILL.md +315 -0
  209. package/bin/skills/miles/references/api-reference.md +141 -0
  210. package/bin/skills/miles/references/troubleshooting.md +352 -0
  211. package/bin/skills/mlflow/SKILL.md +704 -0
  212. package/bin/skills/mlflow/references/deployment.md +744 -0
  213. package/bin/skills/mlflow/references/model-registry.md +770 -0
  214. package/bin/skills/mlflow/references/tracking.md +680 -0
  215. package/bin/skills/modal/SKILL.md +341 -0
  216. package/bin/skills/modal/references/advanced-usage.md +503 -0
  217. package/bin/skills/modal/references/troubleshooting.md +494 -0
  218. package/bin/skills/model-merging/SKILL.md +539 -0
  219. package/bin/skills/model-merging/references/evaluation.md +462 -0
  220. package/bin/skills/model-merging/references/examples.md +428 -0
  221. package/bin/skills/model-merging/references/methods.md +352 -0
  222. package/bin/skills/model-pruning/SKILL.md +495 -0
  223. package/bin/skills/model-pruning/references/wanda.md +347 -0
  224. package/bin/skills/moe-training/SKILL.md +526 -0
  225. package/bin/skills/moe-training/references/architectures.md +432 -0
  226. package/bin/skills/moe-training/references/inference.md +348 -0
  227. package/bin/skills/moe-training/references/training.md +425 -0
  228. package/bin/skills/nanogpt/SKILL.md +290 -0
  229. package/bin/skills/nanogpt/references/architecture.md +382 -0
  230. package/bin/skills/nanogpt/references/data.md +476 -0
  231. package/bin/skills/nanogpt/references/training.md +564 -0
  232. package/bin/skills/nemo-curator/SKILL.md +383 -0
  233. package/bin/skills/nemo-curator/references/deduplication.md +87 -0
  234. package/bin/skills/nemo-curator/references/filtering.md +102 -0
  235. package/bin/skills/nemo-evaluator/SKILL.md +494 -0
  236. package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
  237. package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
  238. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
  239. package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
  240. package/bin/skills/nemo-guardrails/SKILL.md +297 -0
  241. package/bin/skills/nnsight/SKILL.md +436 -0
  242. package/bin/skills/nnsight/references/README.md +78 -0
  243. package/bin/skills/nnsight/references/api.md +344 -0
  244. package/bin/skills/nnsight/references/tutorials.md +300 -0
  245. package/bin/skills/openrlhf/SKILL.md +249 -0
  246. package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
  247. package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
  248. package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
  249. package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
  250. package/bin/skills/outlines/SKILL.md +652 -0
  251. package/bin/skills/outlines/references/backends.md +615 -0
  252. package/bin/skills/outlines/references/examples.md +773 -0
  253. package/bin/skills/outlines/references/json_generation.md +652 -0
  254. package/bin/skills/peft/SKILL.md +431 -0
  255. package/bin/skills/peft/references/advanced-usage.md +514 -0
  256. package/bin/skills/peft/references/troubleshooting.md +480 -0
  257. package/bin/skills/phoenix/SKILL.md +475 -0
  258. package/bin/skills/phoenix/references/advanced-usage.md +619 -0
  259. package/bin/skills/phoenix/references/troubleshooting.md +538 -0
  260. package/bin/skills/pinecone/SKILL.md +358 -0
  261. package/bin/skills/pinecone/references/deployment.md +181 -0
  262. package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
  263. package/bin/skills/pytorch-fsdp/references/index.md +7 -0
  264. package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
  265. package/bin/skills/pytorch-lightning/SKILL.md +346 -0
  266. package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
  267. package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
  268. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
  269. package/bin/skills/pyvene/SKILL.md +473 -0
  270. package/bin/skills/pyvene/references/README.md +73 -0
  271. package/bin/skills/pyvene/references/api.md +383 -0
  272. package/bin/skills/pyvene/references/tutorials.md +376 -0
  273. package/bin/skills/qdrant/SKILL.md +493 -0
  274. package/bin/skills/qdrant/references/advanced-usage.md +648 -0
  275. package/bin/skills/qdrant/references/troubleshooting.md +631 -0
  276. package/bin/skills/ray-data/SKILL.md +326 -0
  277. package/bin/skills/ray-data/references/integration.md +82 -0
  278. package/bin/skills/ray-data/references/transformations.md +83 -0
  279. package/bin/skills/ray-train/SKILL.md +406 -0
  280. package/bin/skills/ray-train/references/multi-node.md +628 -0
  281. package/bin/skills/rwkv/SKILL.md +260 -0
  282. package/bin/skills/rwkv/references/architecture-details.md +344 -0
  283. package/bin/skills/rwkv/references/rwkv7.md +386 -0
  284. package/bin/skills/rwkv/references/state-management.md +369 -0
  285. package/bin/skills/saelens/SKILL.md +386 -0
  286. package/bin/skills/saelens/references/README.md +70 -0
  287. package/bin/skills/saelens/references/api.md +333 -0
  288. package/bin/skills/saelens/references/tutorials.md +318 -0
  289. package/bin/skills/segment-anything/SKILL.md +500 -0
  290. package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
  291. package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
  292. package/bin/skills/sentence-transformers/SKILL.md +255 -0
  293. package/bin/skills/sentence-transformers/references/models.md +123 -0
  294. package/bin/skills/sentencepiece/SKILL.md +235 -0
  295. package/bin/skills/sentencepiece/references/algorithms.md +200 -0
  296. package/bin/skills/sentencepiece/references/training.md +304 -0
  297. package/bin/skills/sglang/SKILL.md +442 -0
  298. package/bin/skills/sglang/references/deployment.md +490 -0
  299. package/bin/skills/sglang/references/radix-attention.md +413 -0
  300. package/bin/skills/sglang/references/structured-generation.md +541 -0
  301. package/bin/skills/simpo/SKILL.md +219 -0
  302. package/bin/skills/simpo/references/datasets.md +478 -0
  303. package/bin/skills/simpo/references/hyperparameters.md +452 -0
  304. package/bin/skills/simpo/references/loss-functions.md +350 -0
  305. package/bin/skills/skypilot/SKILL.md +509 -0
  306. package/bin/skills/skypilot/references/advanced-usage.md +491 -0
  307. package/bin/skills/skypilot/references/troubleshooting.md +570 -0
  308. package/bin/skills/slime/SKILL.md +464 -0
  309. package/bin/skills/slime/references/api-reference.md +392 -0
  310. package/bin/skills/slime/references/troubleshooting.md +386 -0
  311. package/bin/skills/speculative-decoding/SKILL.md +467 -0
  312. package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
  313. package/bin/skills/speculative-decoding/references/medusa.md +350 -0
  314. package/bin/skills/stable-diffusion/SKILL.md +519 -0
  315. package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
  316. package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
  317. package/bin/skills/tensorboard/SKILL.md +629 -0
  318. package/bin/skills/tensorboard/references/integrations.md +638 -0
  319. package/bin/skills/tensorboard/references/profiling.md +545 -0
  320. package/bin/skills/tensorboard/references/visualization.md +620 -0
  321. package/bin/skills/tensorrt-llm/SKILL.md +187 -0
  322. package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
  323. package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
  324. package/bin/skills/tensorrt-llm/references/serving.md +470 -0
  325. package/bin/skills/tinker/SKILL.md +362 -0
  326. package/bin/skills/tinker/references/api-reference.md +168 -0
  327. package/bin/skills/tinker/references/getting-started.md +157 -0
  328. package/bin/skills/tinker/references/loss-functions.md +163 -0
  329. package/bin/skills/tinker/references/models-and-lora.md +139 -0
  330. package/bin/skills/tinker/references/recipes.md +280 -0
  331. package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
  332. package/bin/skills/tinker/references/rendering.md +243 -0
  333. package/bin/skills/tinker/references/supervised-learning.md +232 -0
  334. package/bin/skills/tinker-training-cost/SKILL.md +187 -0
  335. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
  336. package/bin/skills/torchforge/SKILL.md +433 -0
  337. package/bin/skills/torchforge/references/api-reference.md +327 -0
  338. package/bin/skills/torchforge/references/troubleshooting.md +409 -0
  339. package/bin/skills/torchtitan/SKILL.md +358 -0
  340. package/bin/skills/torchtitan/references/checkpoint.md +181 -0
  341. package/bin/skills/torchtitan/references/custom-models.md +258 -0
  342. package/bin/skills/torchtitan/references/float8.md +133 -0
  343. package/bin/skills/torchtitan/references/fsdp.md +126 -0
  344. package/bin/skills/transformer-lens/SKILL.md +346 -0
  345. package/bin/skills/transformer-lens/references/README.md +54 -0
  346. package/bin/skills/transformer-lens/references/api.md +362 -0
  347. package/bin/skills/transformer-lens/references/tutorials.md +339 -0
  348. package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
  349. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
  350. package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
  351. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
  352. package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
  353. package/bin/skills/unsloth/SKILL.md +80 -0
  354. package/bin/skills/unsloth/references/index.md +7 -0
  355. package/bin/skills/unsloth/references/llms-full.md +16799 -0
  356. package/bin/skills/unsloth/references/llms-txt.md +12044 -0
  357. package/bin/skills/unsloth/references/llms.md +82 -0
  358. package/bin/skills/verl/SKILL.md +391 -0
  359. package/bin/skills/verl/references/api-reference.md +301 -0
  360. package/bin/skills/verl/references/troubleshooting.md +391 -0
  361. package/bin/skills/vllm/SKILL.md +364 -0
  362. package/bin/skills/vllm/references/optimization.md +226 -0
  363. package/bin/skills/vllm/references/quantization.md +284 -0
  364. package/bin/skills/vllm/references/server-deployment.md +255 -0
  365. package/bin/skills/vllm/references/troubleshooting.md +447 -0
  366. package/bin/skills/weights-and-biases/SKILL.md +590 -0
  367. package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
  368. package/bin/skills/weights-and-biases/references/integrations.md +700 -0
  369. package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
  370. package/bin/skills/whisper/SKILL.md +317 -0
  371. package/bin/skills/whisper/references/languages.md +189 -0
  372. package/bin/synsc +0 -0
  373. package/package.json +10 -0
@@ -0,0 +1,405 @@
+ ---
+ name: evaluating-code-models
+ description: Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.
+ version: 1.0.0
+ author: Synthetic Sciences
+ license: MIT
+ tags: [Evaluation, Code Generation, HumanEval, MBPP, MultiPL-E, Pass@k, BigCode, Benchmarking, Code Models]
+ dependencies: [bigcode-evaluation-harness, transformers>=4.25.1, accelerate>=0.13.2, datasets>=2.6.1]
+ ---
+
+ # BigCode Evaluation Harness - Code Model Benchmarking
+
+ ## Quick Start
+
+ BigCode Evaluation Harness evaluates code generation models across 15+ benchmarks including HumanEval, MBPP, and MultiPL-E (18 languages).
+
+ **Installation**:
+ ```bash
+ git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
+ cd bigcode-evaluation-harness
+ pip install -e .
+ accelerate config
+ ```
+
+ **Evaluate on HumanEval**:
+ ```bash
+ accelerate launch main.py \
+   --model bigcode/starcoder2-7b \
+   --tasks humaneval \
+   --max_length_generation 512 \
+   --temperature 0.2 \
+   --n_samples 20 \
+   --batch_size 10 \
+   --allow_code_execution \
+   --save_generations
+ ```
+
+ **View available tasks**:
+ ```bash
+ python -c "from bigcode_eval.tasks import ALL_TASKS; print(ALL_TASKS)"
+ ```
+
+ ## Common Workflows
+
+ ### Workflow 1: Standard Code Benchmark Evaluation
+
+ Evaluate a model on core code benchmarks (HumanEval, MBPP, HumanEval+).
+
+ **Checklist**:
+ ```
+ Code Benchmark Evaluation:
+ - [ ] Step 1: Choose benchmark suite
+ - [ ] Step 2: Configure model and generation
+ - [ ] Step 3: Run evaluation with code execution
+ - [ ] Step 4: Analyze pass@k results
+ ```
+
+ **Step 1: Choose benchmark suite**
+
+ **Python code generation** (most common):
+ - **HumanEval**: 164 handwritten problems, function completion
+ - **HumanEval+**: Same 164 problems with 80× more tests (stricter)
+ - **MBPP**: 500 test problems (from a crowd-sourced set of 974), entry-level difficulty
+ - **MBPP+**: 399 curated problems with 35× more tests
+
+ **Multi-language** (18 languages):
+ - **MultiPL-E**: HumanEval/MBPP translated to C++, Java, JavaScript, Go, Rust, etc.
+
+ **Advanced**:
+ - **APPS**: 10,000 problems (introductory/interview/competition)
+ - **DS-1000**: 1,000 data science problems across 7 libraries
+
+ **Step 2: Configure model and generation**
+
+ ```bash
+ # Standard HuggingFace model
+ accelerate launch main.py \
+   --model bigcode/starcoder2-7b \
+   --tasks humaneval \
+   --max_length_generation 512 \
+   --temperature 0.2 \
+   --do_sample True \
+   --n_samples 200 \
+   --batch_size 50 \
+   --allow_code_execution
+
+ # Quantized model (4-bit)
+ accelerate launch main.py \
+   --model codellama/CodeLlama-34b-hf \
+   --tasks humaneval \
+   --load_in_4bit \
+   --max_length_generation 512 \
+   --allow_code_execution
+
+ # Custom/private model
+ accelerate launch main.py \
+   --model /path/to/my-code-model \
+   --tasks humaneval \
+   --trust_remote_code \
+   --use_auth_token \
+   --allow_code_execution
+ ```
+
+ **Step 3: Run evaluation**
+
+ ```bash
+ # Full evaluation with pass@k estimation (k=1,10,100)
+ accelerate launch main.py \
+   --model bigcode/starcoder2-7b \
+   --tasks humaneval \
+   --temperature 0.8 \
+   --n_samples 200 \
+   --batch_size 50 \
+   --allow_code_execution \
+   --save_generations \
+   --metric_output_path results/starcoder2-humaneval.json
+ ```
+
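The `pass@k` figures for `k=1,10,100` are commonly computed with the unbiased estimator from the Codex paper: for a problem with `n` samples of which `c` pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all problems. The sketch below is a standalone illustration of that formula, not the harness's own code; `pass_at_k` and `mean_pass_at_k` are hypothetical names, and the per-problem counts are made up for the demo.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct one.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimates over a benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Illustrative counts only: three problems, 200 samples each.
    counts = [0, 20, 200]
    for k in (1, 10, 100):
        print(f"pass@{k}: {mean_pass_at_k(counts, n=200, k=k):.3f}")
```

The estimator needs `k <= n`, which is why a run that reports pass@100 uses at least 100 samples per problem (200 in the command above).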
+ **Step 4: Analyze results**
+
+ Results in `results/starcoder2-humaneval.json`:
+ ```json
+ {
+   "humaneval": {
+     "pass@1": 0.354,
+     "pass@10": 0.521,
+     "pass@100": 0.689
+   },
+   "config": {
+     "model": "bigcode/starcoder2-7b",
+     "temperature": 0.8,
+     "n_samples": 200
+   }
+ }
+ ```
+
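To compare runs, the metrics file can be read back with a few lines of Python. This sketch assumes the layout shown in the results example (task names as top-level keys next to a `config` entry); a real file may carry additional fields. `load_results` and `summarize_results` are hypothetical helpers, not part of the harness.

```python
import json


def load_results(path: str) -> dict:
    """Read a metrics file written via --metric_output_path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_results(results: dict) -> list[tuple[str, str, float]]:
    """Return (task, metric, value) rows for every pass@k entry, ordered by k."""
    rows = []
    for task, metrics in results.items():
        # Skip the run configuration and anything that is not a metrics mapping.
        if task == "config" or not isinstance(metrics, dict):
            continue
        pass_metrics = {m: v for m, v in metrics.items() if m.startswith("pass@")}
        for metric in sorted(pass_metrics, key=lambda m: int(m.split("@")[1])):
            rows.append((task, metric, float(pass_metrics[metric])))
    return rows


if __name__ == "__main__":
    # Inline copy of the example values so the sketch runs without a file.
    example = {
        "humaneval": {"pass@1": 0.354, "pass@10": 0.521, "pass@100": 0.689},
        "config": {"model": "bigcode/starcoder2-7b", "temperature": 0.8, "n_samples": 200},
    }
    for task, metric, value in summarize_results(example):
        print(f"{task:<12}{metric:<10}{value:.1%}")
```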
137
+ ### Workflow 2: Multi-Language Evaluation (MultiPL-E)
138
+
139
+ Evaluate code generation across 18 programming languages.
140
+
141
+ **Checklist**:
142
+ ```
143
+ Multi-Language Evaluation:
144
+ - [ ] Step 1: Generate solutions (host machine)
145
+ - [ ] Step 2: Run evaluation in Docker (safe execution)
146
+ - [ ] Step 3: Compare across languages
147
+ ```
148
+
149
+ **Step 1: Generate solutions on host**
150
+
151
+ ```bash
152
+ # Generate without execution (safe)
153
+ accelerate launch main.py \
154
+ --model bigcode/starcoder2-7b \
155
+ --tasks multiple-py,multiple-js,multiple-java,multiple-cpp \
156
+ --max_length_generation 650 \
157
+ --temperature 0.8 \
158
+ --n_samples 50 \
159
+ --batch_size 50 \
160
+ --generation_only \
161
+ --save_generations \
162
+ --save_generations_path generations_multi.json
163
+ ```
164
+
165
+ **Step 2: Evaluate in Docker container**
166
+
167
+ ```bash
168
+ # Pull the MultiPL-E Docker image
169
+ docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
170
+
171
+ # Run evaluation inside container
172
+ docker run -v $(pwd)/generations_multi.json:/app/generations.json:ro \
173
+ -it evaluation-harness-multiple python3 main.py \
174
+ --model bigcode/starcoder2-7b \
175
+ --tasks multiple-py,multiple-js,multiple-java,multiple-cpp \
176
+ --load_generations_path /app/generations.json \
177
+ --allow_code_execution \
178
+ --n_samples 50
179
+ ```
180
+
181
+ **Supported languages**: Python, JavaScript, Java, C++, Go, Rust, TypeScript, C#, PHP, Ruby, Swift, Kotlin, Scala, Perl, Julia, Lua, R, Racket
182
+
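For the "compare across languages" step in the checklist, one possible approach is to rank the per-language scores from the metrics file. This is a minimal sketch under two assumptions: the metrics JSON has been copied out of the container, and it is keyed by task name (`multiple-py`, `multiple-js`, ...) in the same way the HumanEval example is keyed by `humaneval`. The helper name `rank_languages` and the scores below are placeholders for illustration, not real results.

```python
def rank_languages(metrics: dict, prefix: str = "multiple-") -> list[tuple[str, float]]:
    """Return (language, pass@1) pairs for MultiPL-E tasks, best score first."""
    rows = [
        (task[len(prefix):], scores["pass@1"])
        for task, scores in metrics.items()
        if task.startswith(prefix)  # ignores "config" and non-MultiPL-E tasks
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_markdown_table(rows: list[tuple[str, float]]) -> str:
    """Render the ranked rows as a Markdown table."""
    lines = ["| Language | pass@1 |", "|----------|--------|"]
    lines += [f"| {lang} | {score:.3f} |" for lang, score in rows]
    return "\n".join(lines)


# Placeholder scores (NOT real results); replace with json.load(...) of your metrics file.
placeholder_metrics = {
    "multiple-py": {"pass@1": 0.30},
    "multiple-js": {"pass@1": 0.25},
    "multiple-java": {"pass@1": 0.28},
    "config": {"model": "bigcode/starcoder2-7b"},
}

print(to_markdown_table(rank_languages(placeholder_metrics)))
```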
183
+ ### Workflow 3: Instruction-Tuned Model Evaluation
184
+
185
+ Evaluate chat/instruction-tuned models using the prompt format they were trained on.
186
+
187
+ **Checklist**:
188
+ ```
189
+ Instruction Model Evaluation:
190
+ - [ ] Step 1: Use instruction-tuned tasks
191
+ - [ ] Step 2: Configure instruction tokens
192
+ - [ ] Step 3: Run evaluation
193
+ ```
194
+
195
+ **Step 1: Choose instruction tasks**
196
+
197
+ - **instruct-humaneval**: HumanEval with instruction prompts
198
+ - **humanevalsynthesize-{lang}**: HumanEvalPack synthesis tasks
199
+
200
+ **Step 2: Configure instruction tokens**
201
+
202
+ ```bash
203
+ # For models with chat templates (e.g., CodeLlama-Instruct)
204
+ accelerate launch main.py \
205
+ --model codellama/CodeLlama-7b-Instruct-hf \
206
+ --tasks instruct-humaneval \
207
+ --instruction_tokens "<s>[INST],</s>,[/INST]" \
208
+ --max_length_generation 512 \
209
+ --allow_code_execution
210
+ ```
211
+
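The `--instruction_tokens` value is a comma-separated triple: a user token, an end-of-user token, and an assistant token. The sketch below shows how such a triple could wrap an instruction. The assembly order (user + instruction + end + assistant) is an assumption about how the harness builds the prompt, and `parse_instruction_tokens` and `wrap_prompt` are hypothetical helper names.

```python
def parse_instruction_tokens(spec: str) -> tuple[str, str, str]:
    """Split 'user,end,assistant' into its three tokens."""
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated tokens, got {len(parts)}")
    return parts[0], parts[1], parts[2]


def wrap_prompt(instruction: str, spec: str) -> str:
    """Wrap an instruction with the tokens (assumed order: user, text, end, assistant)."""
    user_token, end_token, assistant_token = parse_instruction_tokens(spec)
    return f"{user_token}{instruction}{end_token}{assistant_token}"


# Tokens taken from the command above.
print(wrap_prompt("Write a function that reverses a string.", "<s>[INST],</s>,[/INST]"))
```

Printing the assembled prompt like this is a quick way to check that the token order matches your model's chat template before launching a full run.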
212
+ **Step 3: HumanEvalPack for instruction models**
213
+
214
+ ```bash
215
+ # Test code synthesis in Python and JavaScript (HumanEvalPack covers 6 languages)
216
+ accelerate launch main.py \
217
+ --model codellama/CodeLlama-7b-Instruct-hf \
218
+ --tasks humanevalsynthesize-python,humanevalsynthesize-js \
219
+ --prompt instruct \
220
+ --max_length_generation 512 \
221
+ --allow_code_execution
222
+ ```
223
+
224
+ ### Workflow 4: Compare Multiple Models
225
+
226
+ Run the same benchmark suite on several models and compare their scores.
227
+
228
+ **Step 1: Create evaluation script**
229
+
230
+ ```bash
231
+ #!/bin/bash
232
+ # eval_models.sh
233
+
234
+ MODELS=(
235
+ "bigcode/starcoder2-7b"
236
+ "codellama/CodeLlama-7b-hf"
237
+ "deepseek-ai/deepseek-coder-6.7b-base"
238
+ )
239
+ TASKS="humaneval,mbpp"
240
+
241
+ for model in "${MODELS[@]}"; do
242
+ model_name=$(echo "$model" | tr '/' '-')
243
+ echo "Evaluating $model"
244
+
245
+ accelerate launch main.py \
246
+ --model "$model" \
247
+ --tasks $TASKS \
248
+ --temperature 0.2 \
249
+ --n_samples 20 \
250
+ --batch_size 20 \
251
+ --allow_code_execution \
252
+ --metric_output_path results/${model_name}.json
253
+ done
254
+ ```
255
+
256
+ **Step 2: Generate comparison table**
257
+
258
+ ```python
259
+ import json
260
+ import pandas as pd  # DataFrame.to_markdown() also requires the `tabulate` package
261
+
262
+ models = ["bigcode-starcoder2-7b", "codellama-CodeLlama-7b-hf", "deepseek-ai-deepseek-coder-6.7b-base"]
263
+ results = []
264
+
265
+ for model in models:
266
+ with open(f"results/{model}.json") as f:
267
+ data = json.load(f)
268
+ results.append({
269
+ "Model": model,
270
+ "HumanEval pass@1": f"{data['humaneval']['pass@1']:.3f}",
271
+ "MBPP pass@1": f"{data['mbpp']['pass@1']:.3f}"
272
+ })
273
+
274
+ df = pd.DataFrame(results)
275
+ print(df.to_markdown(index=False))
276
+ ```
277
+
278
+ ## When to Use vs Alternatives
279
+
280
+ **Use BigCode Evaluation Harness when:**
281
+ - Evaluating **code generation** models specifically
282
+ - Need **multi-language** evaluation (18 languages via MultiPL-E)
283
+ - Testing **functional correctness** with unit tests (pass@k)
284
+ - Benchmarking for **BigCode/HuggingFace leaderboards**
285
+ - Evaluating **fill-in-the-middle** (FIM) capabilities
286
+
287
+ **Use alternatives instead:**
288
+ - **lm-evaluation-harness**: General LLM benchmarks (MMLU, GSM8K, HellaSwag)
289
+ - **EvalPlus**: Stricter HumanEval+/MBPP+ with more test cases
290
+ - **SWE-bench**: Real-world GitHub issue resolution
291
+ - **LiveCodeBench**: Contamination-free, continuously updated problems
292
+ - **CodeXGLUE**: Code understanding tasks (clone detection, defect prediction)
293
+
294
+ ## Supported Benchmarks
295
+
296
+ | Benchmark | Problems | Languages | Metric | Use Case |
297
+ |-----------|----------|-----------|--------|----------|
298
+ | HumanEval | 164 | Python | pass@k | Standard code completion |
299
+ | HumanEval+ | 164 | Python | pass@k | Stricter evaluation (80× tests) |
300
+ | MBPP | 500 | Python | pass@k | Entry-level problems |
301
+ | MBPP+ | 399 | Python | pass@k | Stricter evaluation (35× tests) |
302
+ | MultiPL-E | 164×18 | 18 languages | pass@k | Multi-language evaluation |
303
+ | APPS | 10,000 | Python | pass@k | Competition-level |
304
+ | DS-1000 | 1,000 | Python | pass@k | Data science (pandas, numpy, etc.) |
305
+ | HumanEvalPack | 164×3×6 | 6 languages | pass@k | Synthesis/fix/explain |
306
+ | Mercury | 1,889 | Python | Efficiency | Computational efficiency |
307
+
308
+ ## Common Issues
309
+
310
+ **Issue: Different results than reported in papers**
311
+
312
+ Check these factors:
313
+ ```bash
314
+ # 1. Verify n_samples (papers typically draw 200 samples to estimate pass@k up to k=100)
315
+ --n_samples 200
316
+
317
+ # 2. Check temperature (0.2 is typical for pass@1; 0.8 for pass@10 and pass@100)
318
+ --temperature 0.8
319
+
320
+ # 3. Verify task name matches exactly
321
+ --tasks humaneval # Not "human_eval" or "HumanEval"
322
+
323
+ # 4. Check max_length_generation
324
+ --max_length_generation 512 # Increase for longer problems
325
+ ```
326
+
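The sample count matters because pass@k is commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021): with `n` samples per problem of which `c` pass, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below assumes this is the estimator behind the reported numbers; `pass_at_k` is an illustrative name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n (pass@100 needs n >= 100)")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 200 samples and 70 correct, pass@1 is simply the fraction correct.
print(round(pass_at_k(n=200, c=70, k=1), 3))
# pass@10 for the same problem is much higher than pass@1.
print(round(pass_at_k(n=200, c=70, k=10), 3))
```

The benchmark score is the mean of this value over all problems. Since `k` cannot exceed `n`, a run with `--n_samples 20` cannot report pass@100 at all, and small `n` makes the estimate noisier.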
327
+ **Issue: CUDA out of memory**
328
+
329
+ ```bash
330
+ # Use quantization
331
+ --load_in_8bit
332
+ # OR
333
+ --load_in_4bit
334
+
335
+ # Reduce batch size
336
+ --batch_size 1
337
+
338
+ # Set memory limit
339
+ --max_memory_per_gpu "20GiB"
340
+ ```
341
+
342
+ **Issue: Code execution hangs or times out**
343
+
344
+ Split generation from execution and run the untrusted generated code inside Docker:
345
+ ```bash
346
+ # Generate on host (no execution)
347
+ --generation_only --save_generations
348
+
349
+ # Evaluate in Docker
350
+ docker run ... --allow_code_execution --load_generations_path ...
351
+ ```
352
+
353
+ **Issue: Low scores on instruction models**
354
+
355
+ Ensure proper instruction formatting:
356
+ ```bash
357
+ # Use instruction-specific tasks
358
+ --tasks instruct-humaneval
359
+
360
+ # Set instruction tokens for your model
361
+ --instruction_tokens "<s>[INST],</s>,[/INST]"
362
+ ```
363
+
364
+ **Issue: MultiPL-E language failures**
365
+
366
+ Per-language toolchains are often missing on the host. Use the dedicated Docker image, which bundles them:
367
+ ```bash
368
+ docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
369
+ ```
370
+
371
+ ## Command Reference
372
+
373
+ | Argument | Default | Description |
374
+ |----------|---------|-------------|
375
+ | `--model` | - | HuggingFace model ID or local path |
376
+ | `--tasks` | - | Comma-separated task names |
377
+ | `--n_samples` | 1 | Samples per problem (use 200 when reporting pass@10/pass@100) |
378
+ | `--temperature` | 0.2 | Sampling temperature |
379
+ | `--max_length_generation` | 512 | Max tokens (prompt + generation) |
380
+ | `--batch_size` | 1 | Batch size per GPU |
381
+ | `--allow_code_execution` | False | Allow running generated code (required for tasks scored by unit tests) |
382
+ | `--generation_only` | False | Generate without evaluation |
383
+ | `--load_generations_path` | - | Load pre-generated solutions |
384
+ | `--save_generations` | False | Save generated code |
385
+ | `--metric_output_path` | evaluation_results.json | Output file for metrics |
386
+ | `--load_in_8bit` | False | 8-bit quantization |
387
+ | `--load_in_4bit` | False | 4-bit quantization |
388
+ | `--trust_remote_code` | False | Allow custom model code |
389
+ | `--precision` | fp32 | Model precision (fp32/fp16/bf16) |
390
+
391
+ ## Hardware Requirements
392
+
393
+ | Model Size | VRAM (fp16) | VRAM (4-bit) | Time (HumanEval, n=200) |
394
+ |------------|-------------|--------------|-------------------------|
395
+ | 7B | 14GB | 6GB | ~30 min (A100) |
396
+ | 13B | 26GB | 10GB | ~1 hour (A100) |
397
+ | 34B | 68GB | 20GB | ~2 hours (A100) |
398
+
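The fp16 column above matches a simple rule of thumb: weight memory is roughly parameter count times bytes per parameter. The sketch below reproduces that arithmetic; `weight_memory_gb` is a hypothetical helper, and it counts weights only. The 4-bit figures in the table are higher than the raw weight size, presumably because they include quantization overhead, activations, and the KV cache (an assumption, not something the table states).

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param


for size in (7, 13, 34):
    fp16 = weight_memory_gb(size, 16)
    int4 = weight_memory_gb(size, 4)
    print(f"{size}B: fp16 ~{fp16:.1f} GB, 4-bit weights ~{int4:.1f} GB")
```

Leave headroom beyond these numbers when choosing `--batch_size` and `--max_length_generation`, since both increase memory use during generation.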
399
+ ## Resources
400
+
401
+ - **GitHub**: https://github.com/bigcode-project/bigcode-evaluation-harness
402
+ - **Documentation**: https://github.com/bigcode-project/bigcode-evaluation-harness/tree/main/docs
403
+ - **BigCode Leaderboard**: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
404
+ - **HumanEval Dataset**: https://huggingface.co/datasets/openai/openai_humaneval
405
+ - **MultiPL-E**: https://github.com/nuprl/MultiPL-E