@synsci/cli-darwin-x64 1.1.70 → 1.1.72

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (339)
  1. package/bin/skills/citation-management/SKILL.md +1109 -0
  2. package/bin/skills/citation-management/assets/bibtex_template.bib +264 -0
  3. package/bin/skills/citation-management/assets/citation_checklist.md +386 -0
  4. package/bin/skills/citation-management/references/bibtex_formatting.md +908 -0
  5. package/bin/skills/citation-management/references/citation_validation.md +794 -0
  6. package/bin/skills/citation-management/references/google_scholar_search.md +725 -0
  7. package/bin/skills/citation-management/references/metadata_extraction.md +870 -0
  8. package/bin/skills/citation-management/references/pubmed_search.md +839 -0
  9. package/bin/skills/citation-management/scripts/doi_to_bibtex.py +182 -0
  10. package/bin/skills/citation-management/scripts/extract_metadata.py +570 -0
  11. package/bin/skills/citation-management/scripts/format_bibtex.py +349 -0
  12. package/bin/skills/citation-management/scripts/search_google_scholar.py +251 -0
  13. package/bin/skills/citation-management/scripts/search_pubmed.py +348 -0
  14. package/bin/skills/citation-management/scripts/validate_citations.py +494 -0
  15. package/bin/skills/clinical-decision-support/README.md +129 -0
  16. package/bin/skills/clinical-decision-support/SKILL.md +506 -0
  17. package/bin/skills/clinical-decision-support/assets/biomarker_report_template.tex +380 -0
  18. package/bin/skills/clinical-decision-support/assets/clinical_pathway_template.tex +222 -0
  19. package/bin/skills/clinical-decision-support/assets/cohort_analysis_template.tex +359 -0
  20. package/bin/skills/clinical-decision-support/assets/color_schemes.tex +149 -0
  21. package/bin/skills/clinical-decision-support/assets/example_gbm_cohort.md +208 -0
  22. package/bin/skills/clinical-decision-support/assets/recommendation_strength_guide.md +328 -0
  23. package/bin/skills/clinical-decision-support/assets/treatment_recommendation_template.tex +529 -0
  24. package/bin/skills/clinical-decision-support/references/biomarker_classification.md +719 -0
  25. package/bin/skills/clinical-decision-support/references/clinical_decision_algorithms.md +604 -0
  26. package/bin/skills/clinical-decision-support/references/evidence_synthesis.md +840 -0
  27. package/bin/skills/clinical-decision-support/references/outcome_analysis.md +640 -0
  28. package/bin/skills/clinical-decision-support/references/patient_cohort_analysis.md +427 -0
  29. package/bin/skills/clinical-decision-support/references/treatment_recommendations.md +521 -0
  30. package/bin/skills/clinical-decision-support/scripts/biomarker_classifier.py +383 -0
  31. package/bin/skills/clinical-decision-support/scripts/build_decision_tree.py +417 -0
  32. package/bin/skills/clinical-decision-support/scripts/create_cohort_tables.py +509 -0
  33. package/bin/skills/clinical-decision-support/scripts/generate_survival_analysis.py +441 -0
  34. package/bin/skills/clinical-decision-support/scripts/validate_cds_document.py +326 -0
  35. package/bin/skills/clinical-reports/IMPLEMENTATION_SUMMARY.md +641 -0
  36. package/bin/skills/clinical-reports/README.md +236 -0
  37. package/bin/skills/clinical-reports/SKILL.md +1127 -0
  38. package/bin/skills/clinical-reports/assets/case_report_template.md +352 -0
  39. package/bin/skills/clinical-reports/assets/clinical_trial_csr_template.md +353 -0
  40. package/bin/skills/clinical-reports/assets/clinical_trial_sae_template.md +359 -0
  41. package/bin/skills/clinical-reports/assets/consult_note_template.md +305 -0
  42. package/bin/skills/clinical-reports/assets/discharge_summary_template.md +453 -0
  43. package/bin/skills/clinical-reports/assets/hipaa_compliance_checklist.md +395 -0
  44. package/bin/skills/clinical-reports/assets/history_physical_template.md +305 -0
  45. package/bin/skills/clinical-reports/assets/lab_report_template.md +309 -0
  46. package/bin/skills/clinical-reports/assets/pathology_report_template.md +249 -0
  47. package/bin/skills/clinical-reports/assets/quality_checklist.md +338 -0
  48. package/bin/skills/clinical-reports/assets/radiology_report_template.md +318 -0
  49. package/bin/skills/clinical-reports/assets/soap_note_template.md +253 -0
  50. package/bin/skills/clinical-reports/references/case_report_guidelines.md +570 -0
  51. package/bin/skills/clinical-reports/references/clinical_trial_reporting.md +693 -0
  52. package/bin/skills/clinical-reports/references/data_presentation.md +530 -0
  53. package/bin/skills/clinical-reports/references/diagnostic_reports_standards.md +629 -0
  54. package/bin/skills/clinical-reports/references/medical_terminology.md +588 -0
  55. package/bin/skills/clinical-reports/references/patient_documentation.md +744 -0
  56. package/bin/skills/clinical-reports/references/peer_review_standards.md +585 -0
  57. package/bin/skills/clinical-reports/references/regulatory_compliance.md +577 -0
  58. package/bin/skills/clinical-reports/scripts/check_deidentification.py +332 -0
  59. package/bin/skills/clinical-reports/scripts/compliance_checker.py +78 -0
  60. package/bin/skills/clinical-reports/scripts/extract_clinical_data.py +97 -0
  61. package/bin/skills/clinical-reports/scripts/format_adverse_events.py +97 -0
  62. package/bin/skills/clinical-reports/scripts/generate_report_template.py +149 -0
  63. package/bin/skills/clinical-reports/scripts/terminology_validator.py +126 -0
  64. package/bin/skills/clinical-reports/scripts/validate_case_report.py +323 -0
  65. package/bin/skills/clinical-reports/scripts/validate_trial_report.py +88 -0
  66. package/bin/skills/fireworks-ai/SKILL.md +665 -0
  67. package/bin/skills/generate-image/SKILL.md +178 -0
  68. package/bin/skills/generate-image/scripts/generate_image.py +254 -0
  69. package/bin/skills/groq/SKILL.md +347 -0
  70. package/bin/skills/hypothesis-generation/SKILL.md +293 -0
  71. package/bin/skills/hypothesis-generation/assets/FORMATTING_GUIDE.md +672 -0
  72. package/bin/skills/hypothesis-generation/assets/hypothesis_generation.sty +307 -0
  73. package/bin/skills/hypothesis-generation/assets/hypothesis_report_template.tex +572 -0
  74. package/bin/skills/hypothesis-generation/references/experimental_design_patterns.md +329 -0
  75. package/bin/skills/hypothesis-generation/references/hypothesis_quality_criteria.md +198 -0
  76. package/bin/skills/hypothesis-generation/references/literature_search_strategies.md +622 -0
  77. package/bin/skills/latex-posters/README.md +417 -0
  78. package/bin/skills/latex-posters/SKILL.md +1602 -0
  79. package/bin/skills/latex-posters/assets/baposter_template.tex +257 -0
  80. package/bin/skills/latex-posters/assets/beamerposter_template.tex +244 -0
  81. package/bin/skills/latex-posters/assets/poster_quality_checklist.md +358 -0
  82. package/bin/skills/latex-posters/assets/tikzposter_template.tex +251 -0
  83. package/bin/skills/latex-posters/references/latex_poster_packages.md +745 -0
  84. package/bin/skills/latex-posters/references/poster_content_guide.md +748 -0
  85. package/bin/skills/latex-posters/references/poster_design_principles.md +806 -0
  86. package/bin/skills/latex-posters/references/poster_layout_design.md +900 -0
  87. package/bin/skills/latex-posters/scripts/review_poster.sh +214 -0
  88. package/bin/skills/literature-review/SKILL.md +641 -0
  89. package/bin/skills/literature-review/assets/review_template.md +412 -0
  90. package/bin/skills/literature-review/references/citation_styles.md +166 -0
  91. package/bin/skills/literature-review/references/database_strategies.md +455 -0
  92. package/bin/skills/literature-review/scripts/generate_pdf.py +184 -0
  93. package/bin/skills/literature-review/scripts/search_databases.py +310 -0
  94. package/bin/skills/literature-review/scripts/verify_citations.py +218 -0
  95. package/bin/skills/market-research-reports/SKILL.md +904 -0
  96. package/bin/skills/market-research-reports/assets/FORMATTING_GUIDE.md +428 -0
  97. package/bin/skills/market-research-reports/assets/market_report_template.tex +1380 -0
  98. package/bin/skills/market-research-reports/assets/market_research.sty +564 -0
  99. package/bin/skills/market-research-reports/references/data_analysis_patterns.md +548 -0
  100. package/bin/skills/market-research-reports/references/report_structure_guide.md +999 -0
  101. package/bin/skills/market-research-reports/references/visual_generation_guide.md +1077 -0
  102. package/bin/skills/market-research-reports/scripts/generate_market_visuals.py +472 -0
  103. package/bin/skills/markitdown/INSTALLATION_GUIDE.md +318 -0
  104. package/bin/skills/markitdown/LICENSE.txt +22 -0
  105. package/bin/skills/markitdown/OPENROUTER_INTEGRATION.md +359 -0
  106. package/bin/skills/markitdown/QUICK_REFERENCE.md +309 -0
  107. package/bin/skills/markitdown/README.md +184 -0
  108. package/bin/skills/markitdown/SKILL.md +486 -0
  109. package/bin/skills/markitdown/SKILL_SUMMARY.md +307 -0
  110. package/bin/skills/markitdown/assets/example_usage.md +463 -0
  111. package/bin/skills/markitdown/references/api_reference.md +399 -0
  112. package/bin/skills/markitdown/references/file_formats.md +542 -0
  113. package/bin/skills/markitdown/scripts/batch_convert.py +195 -0
  114. package/bin/skills/markitdown/scripts/convert_literature.py +262 -0
  115. package/bin/skills/markitdown/scripts/convert_with_ai.py +224 -0
  116. package/bin/skills/ml-paper-writing/SKILL.md +937 -0
  117. package/bin/skills/ml-paper-writing/references/checklists.md +361 -0
  118. package/bin/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  119. package/bin/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  120. package/bin/skills/ml-paper-writing/references/sources.md +159 -0
  121. package/bin/skills/ml-paper-writing/references/writing-guide.md +476 -0
  122. package/bin/skills/ml-paper-writing/templates/README.md +251 -0
  123. package/bin/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  124. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  125. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  126. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  127. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  128. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  129. package/bin/skills/ml-paper-writing/templates/acl/README.md +50 -0
  130. package/bin/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  131. package/bin/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  132. package/bin/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  133. package/bin/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  134. package/bin/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  135. package/bin/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  136. package/bin/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  137. package/bin/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  138. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  139. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  140. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  141. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  142. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  143. package/bin/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  144. package/bin/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  145. package/bin/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  146. package/bin/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  147. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  148. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  149. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  150. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  151. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  152. package/bin/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  153. package/bin/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  154. package/bin/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  155. package/bin/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  156. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  157. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  158. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  159. package/bin/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  160. package/bin/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  161. package/bin/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  162. package/bin/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  163. package/bin/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  164. package/bin/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  165. package/bin/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  166. package/bin/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  167. package/bin/skills/paper-2-web/SKILL.md +491 -0
  168. package/bin/skills/paper-2-web/references/installation.md +141 -0
  169. package/bin/skills/paper-2-web/references/paper2poster.md +346 -0
  170. package/bin/skills/paper-2-web/references/paper2video.md +305 -0
  171. package/bin/skills/paper-2-web/references/paper2web.md +187 -0
  172. package/bin/skills/paper-2-web/references/usage_examples.md +436 -0
  173. package/bin/skills/peer-review/SKILL.md +702 -0
  174. package/bin/skills/peer-review/references/calibration_guidelines.md +196 -0
  175. package/bin/skills/peer-review/references/common_issues.md +552 -0
  176. package/bin/skills/peer-review/references/paper_mechanics.md +269 -0
  177. package/bin/skills/peer-review/references/reporting_standards.md +290 -0
  178. package/bin/skills/peer-review/references/scoring_rubric.md +239 -0
  179. package/bin/skills/pptx-posters/SKILL.md +410 -0
  180. package/bin/skills/pptx-posters/assets/poster_html_template.html +257 -0
  181. package/bin/skills/pptx-posters/assets/poster_quality_checklist.md +358 -0
  182. package/bin/skills/pptx-posters/references/poster_content_guide.md +748 -0
  183. package/bin/skills/pptx-posters/references/poster_design_principles.md +806 -0
  184. package/bin/skills/pptx-posters/references/poster_layout_design.md +900 -0
  185. package/bin/skills/research-grants/README.md +285 -0
  186. package/bin/skills/research-grants/SKILL.md +938 -0
  187. package/bin/skills/research-grants/assets/budget_justification_template.md +453 -0
  188. package/bin/skills/research-grants/assets/nih_specific_aims_template.md +166 -0
  189. package/bin/skills/research-grants/assets/nsf_project_summary_template.md +92 -0
  190. package/bin/skills/research-grants/references/broader_impacts.md +392 -0
  191. package/bin/skills/research-grants/references/darpa_guidelines.md +636 -0
  192. package/bin/skills/research-grants/references/doe_guidelines.md +586 -0
  193. package/bin/skills/research-grants/references/nih_guidelines.md +851 -0
  194. package/bin/skills/research-grants/references/nsf_guidelines.md +570 -0
  195. package/bin/skills/research-grants/references/specific_aims_guide.md +458 -0
  196. package/bin/skills/research-lookup/README.md +156 -0
  197. package/bin/skills/research-lookup/SKILL.md +606 -0
  198. package/bin/skills/research-lookup/examples.py +174 -0
  199. package/bin/skills/research-lookup/lookup.py +187 -0
  200. package/bin/skills/research-lookup/research_lookup.py +483 -0
  201. package/bin/skills/research-lookup/scripts/research_lookup.py +483 -0
  202. package/bin/skills/scholar-evaluation/SKILL.md +289 -0
  203. package/bin/skills/scholar-evaluation/references/evaluation_framework.md +663 -0
  204. package/bin/skills/scholar-evaluation/scripts/calculate_scores.py +366 -0
  205. package/bin/skills/scientific-critical-thinking/SKILL.md +566 -0
  206. package/bin/skills/scientific-critical-thinking/references/common_biases.md +364 -0
  207. package/bin/skills/scientific-critical-thinking/references/evidence_hierarchy.md +484 -0
  208. package/bin/skills/scientific-critical-thinking/references/experimental_design.md +496 -0
  209. package/bin/skills/scientific-critical-thinking/references/logical_fallacies.md +478 -0
  210. package/bin/skills/scientific-critical-thinking/references/scientific_method.md +169 -0
  211. package/bin/skills/scientific-critical-thinking/references/statistical_pitfalls.md +506 -0
  212. package/bin/skills/scientific-schematics/QUICK_REFERENCE.md +207 -0
  213. package/bin/skills/scientific-schematics/README.md +327 -0
  214. package/bin/skills/scientific-schematics/SKILL.md +615 -0
  215. package/bin/skills/scientific-schematics/example_usage.sh +89 -0
  216. package/bin/skills/scientific-schematics/references/best_practices.md +559 -0
  217. package/bin/skills/scientific-schematics/scripts/generate_schematic.py +135 -0
  218. package/bin/skills/scientific-schematics/scripts/generate_schematic_ai.py +807 -0
  219. package/bin/skills/scientific-schematics/test_ai_generation.py +243 -0
  220. package/bin/skills/scientific-slides/SKILL.md +942 -0
  221. package/bin/skills/scientific-slides/assets/timing_guidelines.md +597 -0
  222. package/bin/skills/scientific-slides/references/data_visualization_slides.md +708 -0
  223. package/bin/skills/scientific-slides/references/presentation_structure.md +642 -0
  224. package/bin/skills/scientific-slides/references/slide_design_principles.md +849 -0
  225. package/bin/skills/scientific-slides/references/talk_types_guide.md +687 -0
  226. package/bin/skills/scientific-slides/references/visual_review_workflow.md +775 -0
  227. package/bin/skills/scientific-slides/scripts/generate_slide_image.py +143 -0
  228. package/bin/skills/scientific-slides/scripts/generate_slide_image_ai.py +748 -0
  229. package/bin/skills/scientific-slides/scripts/pdf_to_images.py +201 -0
  230. package/bin/skills/scientific-slides/scripts/slides_to_pdf.py +220 -0
  231. package/bin/skills/scientific-slides/scripts/validate_presentation.py +367 -0
  232. package/bin/skills/scientific-writing/SKILL.md +714 -0
  233. package/bin/skills/scientific-writing/assets/REPORT_FORMATTING_GUIDE.md +574 -0
  234. package/bin/skills/scientific-writing/assets/scientific_report.sty +606 -0
  235. package/bin/skills/scientific-writing/assets/scientific_report_template.tex +449 -0
  236. package/bin/skills/scientific-writing/references/citation_styles.md +720 -0
  237. package/bin/skills/scientific-writing/references/figures_tables.md +806 -0
  238. package/bin/skills/scientific-writing/references/imrad_structure.md +686 -0
  239. package/bin/skills/scientific-writing/references/professional_report_formatting.md +664 -0
  240. package/bin/skills/scientific-writing/references/reporting_guidelines.md +748 -0
  241. package/bin/skills/scientific-writing/references/writing_principles.md +824 -0
  242. package/bin/skills/tinker/SKILL.md +2 -3
  243. package/bin/skills/together-ai/SKILL.md +722 -0
  244. package/bin/skills/treatment-plans/README.md +488 -0
  245. package/bin/skills/treatment-plans/SKILL.md +1579 -0
  246. package/bin/skills/treatment-plans/assets/STYLING_QUICK_REFERENCE.md +185 -0
  247. package/bin/skills/treatment-plans/assets/chronic_disease_management_plan.tex +665 -0
  248. package/bin/skills/treatment-plans/assets/general_medical_treatment_plan.tex +547 -0
  249. package/bin/skills/treatment-plans/assets/medical_treatment_plan.sty +222 -0
  250. package/bin/skills/treatment-plans/assets/mental_health_treatment_plan.tex +774 -0
  251. package/bin/skills/treatment-plans/assets/one_page_treatment_plan.tex +193 -0
  252. package/bin/skills/treatment-plans/assets/pain_management_plan.tex +799 -0
  253. package/bin/skills/treatment-plans/assets/perioperative_care_plan.tex +753 -0
  254. package/bin/skills/treatment-plans/assets/quality_checklist.md +471 -0
  255. package/bin/skills/treatment-plans/assets/rehabilitation_treatment_plan.tex +756 -0
  256. package/bin/skills/treatment-plans/references/goal_setting_frameworks.md +411 -0
  257. package/bin/skills/treatment-plans/references/intervention_guidelines.md +507 -0
  258. package/bin/skills/treatment-plans/references/regulatory_compliance.md +476 -0
  259. package/bin/skills/treatment-plans/references/specialty_specific_guidelines.md +655 -0
  260. package/bin/skills/treatment-plans/references/treatment_plan_standards.md +485 -0
  261. package/bin/skills/treatment-plans/scripts/check_completeness.py +322 -0
  262. package/bin/skills/treatment-plans/scripts/generate_template.py +233 -0
  263. package/bin/skills/treatment-plans/scripts/timeline_generator.py +385 -0
  264. package/bin/skills/treatment-plans/scripts/validate_treatment_plan.py +369 -0
  265. package/bin/skills/unsloth/SKILL.md +565 -47
  266. package/bin/skills/unsloth/docs/advanced-rl.md +222 -0
  267. package/bin/skills/unsloth/docs/chat-templates.md +141 -0
  268. package/bin/skills/unsloth/docs/datasets.md +489 -0
  269. package/bin/skills/unsloth/docs/docker-extended.md +99 -0
  270. package/bin/skills/unsloth/docs/dynamic-ggufs-2.0.md +116 -0
  271. package/bin/skills/unsloth/docs/dynamic-ggufs-aider.md +118 -0
  272. package/bin/skills/unsloth/docs/faq.md +91 -0
  273. package/bin/skills/unsloth/docs/fp16-vs-bf16.md +61 -0
  274. package/bin/skills/unsloth/docs/fp8-rl.md +224 -0
  275. package/bin/skills/unsloth/docs/glm-4.7-flash.md +997 -0
  276. package/bin/skills/unsloth/docs/inference-deployment-overview.md +17 -0
  277. package/bin/skills/unsloth/docs/inference.md +27 -0
  278. package/bin/skills/unsloth/docs/installation-docker.md +155 -0
  279. package/bin/skills/unsloth/docs/installation-pip.md +148 -0
  280. package/bin/skills/unsloth/docs/kernels-packing.md +190 -0
  281. package/bin/skills/unsloth/docs/kimi-k2.5.md +634 -0
  282. package/bin/skills/unsloth/docs/lm-studio.md +235 -0
  283. package/bin/skills/unsloth/docs/lora-hot-swapping.md +75 -0
  284. package/bin/skills/unsloth/docs/lora-hyperparameters.md +363 -0
  285. package/bin/skills/unsloth/docs/memory-efficient-rl.md +267 -0
  286. package/bin/skills/unsloth/docs/model-selection.md +70 -0
  287. package/bin/skills/unsloth/docs/models.md +532 -0
  288. package/bin/skills/unsloth/docs/multi-gpu-ddp.md +90 -0
  289. package/bin/skills/unsloth/docs/notebooks.md +223 -0
  290. package/bin/skills/unsloth/docs/overview.md +110 -0
  291. package/bin/skills/unsloth/docs/qwen3-coder-next-extended.md +900 -0
  292. package/bin/skills/unsloth/docs/qwen3-coder-next.md +900 -0
  293. package/bin/skills/unsloth/docs/requirements.md +45 -0
  294. package/bin/skills/unsloth/docs/reward-hacking.md +25 -0
  295. package/bin/skills/unsloth/docs/saving-to-gguf.md +138 -0
  296. package/bin/skills/unsloth/docs/saving-to-ollama.md +46 -0
  297. package/bin/skills/unsloth/docs/sglang-guide.md +278 -0
  298. package/bin/skills/unsloth/docs/speculative-decoding.md +70 -0
  299. package/bin/skills/unsloth/docs/tool-calling.md +334 -0
  300. package/bin/skills/unsloth/docs/troubleshooting-faq.md +204 -0
  301. package/bin/skills/unsloth/docs/troubleshooting-inference.md +26 -0
  302. package/bin/skills/unsloth/docs/tts-fine-tuning.md +149 -0
  303. package/bin/skills/unsloth/docs/tutorial-grpo.md +273 -0
  304. package/bin/skills/unsloth/docs/tutorial-llama3-ollama.md +356 -0
  305. package/bin/skills/unsloth/docs/vision-fine-tuning.md +135 -0
  306. package/bin/skills/unsloth/docs/vision-rl.md +170 -0
  307. package/bin/skills/unsloth/docs/vllm-engine-arguments.md +43 -0
  308. package/bin/skills/unsloth/docs/vllm-guide.md +98 -0
  309. package/bin/skills/venue-templates/SKILL.md +686 -0
  310. package/bin/skills/venue-templates/assets/examples/cell_summary_example.md +247 -0
  311. package/bin/skills/venue-templates/assets/examples/medical_structured_abstract.md +313 -0
  312. package/bin/skills/venue-templates/assets/examples/nature_abstract_examples.md +213 -0
  313. package/bin/skills/venue-templates/assets/examples/neurips_introduction_example.md +245 -0
  314. package/bin/skills/venue-templates/assets/grants/nih_specific_aims.tex +235 -0
  315. package/bin/skills/venue-templates/assets/grants/nsf_proposal_template.tex +375 -0
  316. package/bin/skills/venue-templates/assets/journals/nature_article.tex +171 -0
  317. package/bin/skills/venue-templates/assets/journals/neurips_article.tex +283 -0
  318. package/bin/skills/venue-templates/assets/journals/plos_one.tex +317 -0
  319. package/bin/skills/venue-templates/assets/posters/beamerposter_academic.tex +311 -0
  320. package/bin/skills/venue-templates/references/cell_press_style.md +483 -0
  321. package/bin/skills/venue-templates/references/conferences_formatting.md +564 -0
  322. package/bin/skills/venue-templates/references/cs_conference_style.md +463 -0
  323. package/bin/skills/venue-templates/references/grants_requirements.md +787 -0
  324. package/bin/skills/venue-templates/references/journals_formatting.md +486 -0
  325. package/bin/skills/venue-templates/references/medical_journal_styles.md +535 -0
  326. package/bin/skills/venue-templates/references/ml_conference_style.md +556 -0
  327. package/bin/skills/venue-templates/references/nature_science_style.md +405 -0
  328. package/bin/skills/venue-templates/references/posters_guidelines.md +628 -0
  329. package/bin/skills/venue-templates/references/reviewer_expectations.md +417 -0
  330. package/bin/skills/venue-templates/references/venue_writing_styles.md +321 -0
  331. package/bin/skills/venue-templates/scripts/customize_template.py +195 -0
  332. package/bin/skills/venue-templates/scripts/query_template.py +266 -0
  333. package/bin/skills/venue-templates/scripts/validate_format.py +250 -0
  334. package/bin/synsc +0 -0
  335. package/package.json +1 -1
  336. package/bin/skills/unsloth/references/index.md +0 -7
  337. package/bin/skills/unsloth/references/llms-full.md +0 -16799
  338. package/bin/skills/unsloth/references/llms-txt.md +0 -12044
  339. package/bin/skills/unsloth/references/llms.md +0 -82
@@ -0,0 +1,45 @@
+ # Unsloth Requirements
+
+ ## System Requirements
+
+ * **Operating System**: Works on Linux and [Windows](https://docs.unsloth.ai/get-started/install-and-update/windows-installation)
+ * Supports NVIDIA GPUs from 2018 onward, including [Blackwell RTX 50](https://unsloth.ai/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth) and [DGX Spark](https://unsloth.ai/docs/blog/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth)
+ * [fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth](https://unsloth.ai/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth "mention")
+ * [fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth](https://unsloth.ai/docs/blog/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth "mention")
+ * Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20 & 50, A100, H100, L40, etc.). [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070 and 1080 work, but are slow.
+ * The official [Unsloth Docker image](https://hub.docker.com/r/unsloth/unsloth) `unsloth/unsloth` is available on Docker Hub
+ * [docker](https://unsloth.ai/docs/get-started/install/docker "mention")
+ * Unsloth works on [AMD](https://unsloth.ai/docs/get-started/fine-tuning-for-beginners/broken-reference) and [Intel](https://github.com/unslothai/unsloth/pull/2621) GPUs! Apple Silicon (MLX) support is in the works.
+ * If you have different versions of torch, transformers, etc., `pip install unsloth` automatically installs the latest versions of those libraries, so you don't need to worry about version compatibility.
+ * Your device should have `xformers`, `torch`, `bitsandbytes`, and `triton` support.
+
+ {% hint style="info" %}
+ Python 3.13 is now supported!
+ {% endhint %}
+
+ ## Fine-tuning VRAM requirements
+
+ How much GPU memory do I need for LLM fine-tuning with Unsloth?
+
+ {% hint style="info" %}
+ A common cause of OOM (out-of-memory) errors is setting the batch size too high. Set it to 1, 2, or 3 to use less VRAM.
+
+ **For context length benchmarks, see** [**here**](https://unsloth.ai/docs/basics/unsloth-benchmarks#context-length-benchmarks)**.**
+ {% endhint %}
+
+ Check this table for VRAM requirements, sorted by model parameters and fine-tuning method. QLoRA uses 4-bit precision; LoRA uses 16-bit. Keep in mind that some models require more VRAM, so these numbers are absolute minimums:
+
+ | Model parameters | QLoRA (4-bit) VRAM | LoRA (16-bit) VRAM |
+ | ---------------- | ------------------ | ------------------ |
+ | 3B | 3.5 GB | 8 GB |
+ | 7B | 5 GB | 19 GB |
+ | 8B | 6 GB | 22 GB |
+ | 9B | 6.5 GB | 24 GB |
+ | 11B | 7.5 GB | 29 GB |
+ | 14B | 8.5 GB | 33 GB |
+ | 27B | 22 GB | 64 GB |
+ | 32B | 26 GB | 76 GB |
+ | 40B | 30 GB | 96 GB |
+ | 70B | 41 GB | 164 GB |
+ | 81B | 48 GB | 192 GB |
+ | 90B | 53 GB | 212 GB |
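The VRAM table above can be expressed as a small lookup helper. This is a sketch only: the thresholds are taken directly from the table, and the helper name `min_vram` is illustrative, not part of the Unsloth API.

```python
# Minimum VRAM (GB) for Unsloth fine-tuning, keyed by billions of
# parameters, as (QLoRA 4-bit, LoRA 16-bit) pairs from the table above.
MIN_VRAM_GB = {
    3: (3.5, 8), 7: (5, 19), 8: (6, 22), 9: (6.5, 24),
    11: (7.5, 29), 14: (8.5, 33), 27: (22, 64), 32: (26, 76),
    40: (30, 96), 70: (41, 164), 81: (48, 192), 90: (53, 212),
}

def min_vram(params_b: float, method: str = "qlora") -> float:
    """Return the table's minimum VRAM (GB) for the smallest row
    that can hold a model of `params_b` billion parameters."""
    for size in sorted(MIN_VRAM_GB):
        if params_b <= size:
            qlora, lora = MIN_VRAM_GB[size]
            return qlora if method == "qlora" else lora
    raise ValueError(f"No table entry covers {params_b}B parameters")
```

For example, `min_vram(7)` returns the 5 GB QLoRA minimum for a 7B model; remember these are floor values, and some architectures need more.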
@@ -0,0 +1,25 @@
1
+ # RL Reward Hacking
2
+
3
+ The ultimate goal of RL is to maximize some reward (say speed, revenue, some metric). But RL can **cheat.** When the RL algorithm learns a trick or exploits something to increase the reward, without actually doing the task at end, this is called "**Reward Hacking**".
4
+
5
+ It's the reason models learn to modify unit tests to pass coding challenges, and these are critical blockers for real world deployment. Some other good examples are from [Wikipedia](https://en.wikipedia.org/wiki/Reward_hacking).
6
+
7
+ <div align="center"><figure><img src="https://i.pinimg.com/originals/55/e0/1b/55e01b94a9c5546b61b59ae300811c83.gif" alt="" width="188"><figcaption></figcaption></figure></div>
8
+
9
+ **Can you counter reward hacking? Yes!** In our [free gpt-oss RL notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-\(20B\)-GRPO.ipynb) we explore how to counter reward hacking in a code generation setting and showcase tangible solutions to common error modes. We saw the model edit the timing function, outsource to other libraries, cache the results, and outright cheat. After countering, the result is our model generates genuinely optimized matrix multiplication kernels, not clever cheats.
10
+
11
+ ## :trophy: Reward Hacking Overview
12
+
13
+ Some common examples of reward hacking during RL include:
14
+
15
+ #### Laziness
16
+
17
+ RL learns to use Numpy, Torch, other libraries, which calls optimized CUDA kernels. We can stop the RL algorithm from calling optimized code by inspecting if the generated code imports other non standard Python libraries.
18
+
19
+ #### Caching & Cheating
20
+
21
+ RL learns to cache the result of the output and RL learns to find the actual output by inspecting Python global variables.
22
+
23
+ We can stop the RL algorithm from using cached data by wiping the cache with a large fake matrix. We also have to benchmark carefully with multiple loops and turns.
+
+ #### Cheating
@@ -0,0 +1,138 @@
+ # Saving to GGUF
+
+ ## Locally
+
+ To save to GGUF locally, use the following:
+
+ ```python
+ model.save_pretrained_gguf("directory", tokenizer, quantization_method = "q4_k_m")
+ model.save_pretrained_gguf("directory", tokenizer, quantization_method = "q8_0")
+ model.save_pretrained_gguf("directory", tokenizer, quantization_method = "f16")
+ ```
+
+ To push to the Hugging Face Hub:
+
+ ```python
+ model.push_to_hub_gguf("hf_username/directory", tokenizer, quantization_method = "q4_k_m")
+ model.push_to_hub_gguf("hf_username/directory", tokenizer, quantization_method = "q8_0")
+ ```
+
+ All supported quantization options for `quantization_method` are listed below:
+
+ ```python
+ # https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp#L19
+ # From https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html
+ ALLOWED_QUANTS = \
+ {
+     "not_quantized"  : "Recommended. Fast conversion. Slow inference, big files.",
+     "fast_quantized" : "Recommended. Fast conversion. OK inference, OK file size.",
+     "quantized"      : "Recommended. Slow conversion. Fast inference, small files.",
+     "f32"     : "Not recommended. Retains 100% accuracy, but super slow and memory hungry.",
+     "f16"     : "Fastest conversion + retains 100% accuracy. Slow and memory hungry.",
+     "q8_0"    : "Fast conversion. High resource use, but generally acceptable.",
+     "q4_k_m"  : "Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K",
+     "q5_k_m"  : "Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K",
+     "q2_k"    : "Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors.",
+     "q3_k_l"  : "Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K",
+     "q3_k_m"  : "Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K",
+     "q3_k_s"  : "Uses Q3_K for all tensors",
+     "q4_0"    : "Original quant method, 4-bit.",
+     "q4_1"    : "Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.",
+     "q4_k_s"  : "Uses Q4_K for all tensors",
+     "q4_k"    : "alias for q4_k_m",
+     "q5_k"    : "alias for q5_k_m",
+     "q5_0"    : "Higher accuracy, higher resource usage and slower inference.",
+     "q5_1"    : "Even higher accuracy, resource usage and slower inference.",
+     "q5_k_s"  : "Uses Q5_K for all tensors",
+     "q6_k"    : "Uses Q8_K for all tensors",
+     "iq2_xxs" : "2.06 bpw quantization",
+     "iq2_xs"  : "2.31 bpw quantization",
+     "iq3_xxs" : "3.06 bpw quantization",
+     "q3_k_xs" : "3-bit extra small quantization",
+ }
+ ```
+
+ ## Manual Saving
+
+ First save your model to 16-bit:
+
+ ```python
+ model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")
+ ```
+
+ Then, in a terminal, run:
+
+ ```bash
+ apt-get update
+ apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
+ git clone https://github.com/ggml-org/llama.cpp
+ cmake llama.cpp -B llama.cpp/build \
+     -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
+ cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
+ cp llama.cpp/build/bin/llama-* llama.cpp
+
+ python llama.cpp/convert_hf_to_gguf.py FOLDER --outfile OUTPUT --outtype f16
+ ```
+
+ Or follow the steps at <https://rentry.org/llama-cpp-conversions#merging-loras-into-a-model> using the model name "merged_model" to merge to GGUF.
+
+ ### Running in Unsloth works well, but after exporting & running on other platforms, the results are poor
+
+ You might sometimes encounter an issue where your model runs and produces good results in Unsloth, but when you use it on another platform like Ollama or vLLM, the results are poor, or you get gibberish, endless/infinite generations, or repeated outputs.
+
+ * The most common cause is an **incorrect chat template**. It's essential to use the SAME chat template that was used when training the model in Unsloth when you later run it in another framework, such as llama.cpp or Ollama. When inferencing from a saved model, it's crucial to apply the correct template.
+ * You must use the correct `eos` token. If not, you might get gibberish on longer generations.
+ * It might also be that your inference engine adds an unnecessary "start of sequence" token (or, conversely, lacks a required one), so check both hypotheses!
+ * **Use our conversational notebooks to force the chat template - this will fix most issues.**
+   * Qwen-3 14B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(14B)-Reasoning-Conversational.ipynb)
+   * Gemma-3 4B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb)
+   * Llama-3.2 3B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)
+   * Phi-4 14B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb)
+   * Mistral v0.3 7B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb)
+ * **More notebooks in our [notebooks docs](https://unsloth.ai/docs/get-started/unsloth-notebooks)**
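To see why a template mismatch produces gibberish, it helps to compare the exact strings two renderings produce. A toy illustration (the renderer below merely mimics a Llama-3-style format and is hand-written for this sketch, not pulled from any tokenizer):

```python
# Toy renderer mimicking a Llama-3-style chat template (illustrative only).
BOS, EOT = "<|begin_of_text|>", "<|eot_id|>"

def render(messages, add_bos=True):
    out = BOS if add_bos else ""
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}{EOT}"
    return out

msgs = [{"role": "user", "content": "Hi!"}]
trained  = render(msgs, add_bos=True)   # what the model saw during finetuning
deployed = render(msgs, add_bos=False)  # an engine that forgets the BOS token
print(trained == deployed)  # False: one missing token shifts every position after it
```

The model was only ever trained on the first form, so even a single missing or duplicated special token at inference time puts it out of distribution.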
+
+ ### Saving to GGUF / vLLM 16bit crashes
+
+ You can try reducing the maximum GPU usage during saving by changing `maximum_memory_usage`.
+
+ The default is `model.save_pretrained(..., maximum_memory_usage = 0.75)`. Reduce it to, say, 0.5 to cap usage at 50% of peak GPU memory, or go lower. This can reduce OOM crashes during saving.
+
+ ### How do I manually save to GGUF?
+
+ First save your model to 16-bit via:
+
+ ```python
+ model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")
+ ```
+
+ Compile llama.cpp from source as below:
+
+ ```bash
+ apt-get update
+ apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
+ git clone https://github.com/ggml-org/llama.cpp
+ cmake llama.cpp -B llama.cpp/build \
+     -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
+ cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
+ cp llama.cpp/build/bin/llama-* llama.cpp
+ ```
+
+ Then convert the model to F16:
+
+ ```bash
+ python llama.cpp/convert_hf_to_gguf.py merged_model \
+     --outfile model-F16.gguf --outtype f16 \
+     --split-max-size 50G
+ ```
+
+ ```bash
+ # For BF16:
+ python llama.cpp/convert_hf_to_gguf.py merged_model \
+     --outfile model-BF16.gguf --outtype bf16 \
+     --split-max-size 50G
+
+ # For Q8_0:
+ python llama.cpp/convert_hf_to_gguf.py merged_model \
+     --outfile model-Q8_0.gguf --outtype q8_0 \
+     --split-max-size 50G
+ ```
@@ -0,0 +1,46 @@
+ # Saving to Ollama
+
+ See our [Tutorial: How to Finetune Llama-3 and Use in Ollama](tutorial-llama3-ollama.md) for the complete process of saving to [Ollama](https://github.com/ollama/ollama).
+
+ ### Saving on Google Colab
+
+ You can save the finetuned model as a small 100MB file called a LoRA adapter. You can also push to the Hugging Face Hub if you want to upload your model! Remember to get a Hugging Face token via <https://huggingface.co/settings/tokens> and add your token!
+
+ After saving the model, we can again use Unsloth to run the model itself! Use `FastLanguageModel` again to call it for inference!
+
+ ### Exporting to Ollama
+
+ Finally we can export our finetuned model to Ollama itself! First we have to install Ollama in the Colab notebook.
+
+ Then we export the finetuned model to llama.cpp's GGUF formats.
+
+ Remember to change `False` to `True` for only 1 row, not every row, or else you'll be waiting a very long time! We normally suggest setting the first row to `True`, so we can quickly export the finetuned model to `Q8_0` format (8-bit quantization). We also allow you to export to a whole list of quantization methods, a popular one being `q4_k_m`.
+
+ Head over to <https://github.com/ggerganov/llama.cpp> to learn more about GGUF. We also have manual instructions for exporting to GGUF here: <https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf>
+
+ You will see a long list of text - please wait 5 to 10 minutes!
+
+ ### Automatic `Modelfile` creation
+
+ The trick Unsloth provides is that we automatically create a `Modelfile`, which Ollama requires! This is just a list of settings, and it includes the chat template we used for the finetuning process! You can also print the generated `Modelfile`.
+
+ We then ask Ollama to create an Ollama-compatible model, by using the `Modelfile`.
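For reference, a `Modelfile` is plain text. A hand-written minimal sketch (every value below is made up for illustration; Unsloth generates the real one, including the exact chat template from your finetune):

```python
# Illustrative only: Unsloth writes the real Modelfile with the exact finetune template.
modelfile = (
    'FROM ./unsloth.Q8_0.gguf\n'
    'TEMPLATE """<|user|>{{ .Prompt }}<|assistant|>"""\n'
    'PARAMETER stop "<|eot_id|>"\n'
    'PARAMETER temperature 0.7\n'
)
print(modelfile)
```

Given such a file on disk, you would register it with `ollama create my_model -f Modelfile` and then run `ollama run my_model`.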
+
+ ### Ollama Inference
+
+ We can now call the model for inference via the Ollama server itself, which runs on your own local machine / in the background of the free Colab notebook.
+
+ ### Running in Unsloth works well, but after exporting & running on Ollama, the results are poor
+
+ You might sometimes encounter an issue where your model runs and produces good results in Unsloth, but when you use it on another platform like Ollama, the results are poor, or you get gibberish, endless/infinite generations, or repeated outputs.
+
+ * The most common cause is an **incorrect chat template**. It's essential to use the SAME chat template that was used when training the model in Unsloth when you later run it in another framework, such as llama.cpp or Ollama. When inferencing from a saved model, it's crucial to apply the correct template.
+ * You must use the correct `eos` token. If not, you might get gibberish on longer generations.
+ * It might also be that your inference engine adds an unnecessary "start of sequence" token (or, conversely, lacks a required one), so check both hypotheses!
+ * **Use our conversational notebooks to force the chat template - this will fix most issues.**
+   * Qwen-3 14B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(14B)-Reasoning-Conversational.ipynb)
+   * Gemma-3 4B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb)
+   * Llama-3.2 3B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)
+   * Phi-4 14B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb)
+   * Mistral v0.3 7B Conversational notebook [Open in Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb)
+ * **More notebooks in our [notebooks docs](https://unsloth.ai/docs/get-started/unsloth-notebooks)**
@@ -0,0 +1,278 @@
+ # SGLang Deployment & Inference Guide
+
+ You can serve any LLM or fine-tuned model via [SGLang](https://github.com/sgl-project/sglang) for low-latency, high-throughput inference. SGLang supports text and image/video model inference on any GPU setup, with support for some GGUFs.
+
+ ### Installing SGLang
+
+ To install SGLang and Unsloth on NVIDIA GPUs, use the below in a virtual environment (which won't break your other Python libraries):
+
+ ```bash
+ # OPTIONAL: use a virtual environment
+ python -m venv unsloth_env
+ source unsloth_env/bin/activate
+
+ # Install Rust, outlines-core, then SGLang
+ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+ source $HOME/.cargo/env && sudo apt-get install -y pkg-config libssl-dev
+ pip install --upgrade pip && pip install uv
+ uv pip install "sglang" && uv pip install unsloth
+ ```
+
+ For **Docker** setups run:
+
+ ```bash
+ docker run --gpus all \
+     --shm-size 32g \
+     -p 30000:30000 \
+     -v ~/.cache/huggingface:/root/.cache/huggingface \
+     --env "HF_TOKEN=<secret>" \
+     --ipc=host \
+     lmsysorg/sglang:latest \
+     python3 -m sglang.launch_server --model-path unsloth/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
+ ```
+
+ ### Debugging SGLang installation issues
+
+ If you see the below, update Rust and outlines-core as specified above:
+
+ ```
+ hint: This usually indicates a problem with the package or the build environment.
+ help: `outlines-core` (v0.1.26) was included because `sglang` (v0.5.5.post2) depends on `outlines` (v0.1.11) which depends on `outlines-core`
+ ```
+
+ If you see a FlashInfer issue like the below:
+
+ ```
+ /home/daniel/.cache/flashinfer/...batch_prefill_ragged_kernel_mask_1.cu:1:10: fatal error: flashinfer/attention/prefill.cuh: No such file or directory
+ ```
+
+ remove the FlashInfer cache via `rm -rf .cache/flashinfer` and also `rm -rf ~/.cache/flashinfer`.
+
+ ### Deploying SGLang models
+
+ To deploy any model, for example [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct), run the following in a separate terminal:
+
+ ```bash
+ python3 -m sglang.launch_server \
+     --model-path unsloth/Llama-3.2-1B-Instruct \
+     --host 0.0.0.0 --port 30000
+ ```
+
+ You can then use the OpenAI chat completions library to call the model (in another terminal or using tmux):
+
+ ```python
+ # Install openai via pip install openai
+ from openai import OpenAI
+ openai_client = OpenAI(
+     base_url = "http://0.0.0.0:30000/v1",
+     api_key = "sk-no-key-required",
+ )
+ completion = openai_client.chat.completions.create(
+     model = "unsloth/Llama-3.2-1B-Instruct",
+     messages = [{"role": "user", "content": "What is 2+2?"}],
+ )
+ print(completion.choices[0].message.content)
+ ```
+
+ And you will get `2 + 2 = 4.`
+
+ ### Deploying Unsloth finetunes in SGLang
+
+ After fine-tuning or using our notebooks, you can save or deploy your models directly through SGLang within a single workflow. An example Unsloth finetuning script:
+
+ ```python
+ from unsloth import FastLanguageModel
+ import torch
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name = "unsloth/gpt-oss-20b",
+     max_seq_length = 2048,
+     load_in_4bit = True,
+ )
+ model = FastLanguageModel.get_peft_model(model)
+ ```
+
+ **To save to 16-bit for SGLang, use:**
+
+ ```python
+ model.save_pretrained_merged("finetuned_model", tokenizer, save_method = "merged_16bit")
+ ## OR to upload to HuggingFace:
+ model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")
+ ```
+
+ **To save just the LoRA adapters**, either use:
+
+ ```python
+ model.save_pretrained("finetuned_model")
+ tokenizer.save_pretrained("finetuned_model")
+ ```
+
+ Or use our built-in function to do that:
+
+ ```python
+ model.save_pretrained_merged("model", tokenizer, save_method = "lora")
+ ## OR to upload to HuggingFace:
+ model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")
+ ```
+
+ ### gpt-oss-20b: Unsloth & SGLang Deployment Guide
+
+ Below is a step-by-step tutorial with instructions for training gpt-oss-20b with Unsloth and deploying it with SGLang. It includes performance benchmarks across multiple quantization formats.
+
+ #### Step 1: Unsloth Fine-tuning and Exporting Formats
+
+ After training, you can export the model in multiple formats:
+
+ ```python
+ model.save_pretrained_merged(
+     "finetuned_model",
+     tokenizer,
+     save_method = "merged_16bit",
+ )
+ ## For gpt-oss specific mxfp4 conversions:
+ model.save_pretrained_merged(
+     "finetuned_model",
+     tokenizer,
+     save_method = "mxfp4", # (ONLY FOR gpt-oss, otherwise choose "merged_16bit")
+ )
+ ```
+
+ #### Step 2: Deployment with SGLang
+
+ We saved our gpt-oss finetune to the folder "finetuned_model", so in a new terminal we can launch the finetuned model as an inference endpoint with SGLang:
+
+ ```bash
+ python -m sglang.launch_server \
+     --model-path finetuned_model \
+     --host 0.0.0.0 --port 30002
+ ```
+
+ You might have to wait a bit on `Capturing batches (bs=1 avail_mem=20.84 GB):` !
+
+ #### Step 3: Calling the inference endpoint
+
+ To call the inference endpoint, first launch a new terminal. We can then call the model like below:
+
+ ```python
+ from openai import OpenAI
+ openai_client = OpenAI(
+     base_url = "http://0.0.0.0:30002/v1",
+     api_key = "sk-no-key-required",
+ )
+ completion = openai_client.chat.completions.create(
+     model = "finetuned_model",
+     messages = [{"role": "user", "content": "What is 2+2?"}],
+ )
+ print(completion.choices[0].message.content)
+
+ ## OUTPUT ##
+ # 2 + 2 equals 4.
+ ```
+
+ ### FP8 Online Quantization
+
+ To deploy models with FP8 online quantization, which gives 30 to 50% more throughput and 50% less memory usage (supporting 2x longer context lengths), run SGLang with:
+
+ ```bash
+ python -m sglang.launch_server \
+     --model-path unsloth/Llama-3.2-1B-Instruct \
+     --host 0.0.0.0 --port 30002 \
+     --quantization fp8 \
+     --kv-cache-dtype fp8_e4m3
+ ```
+
+ You can also use `--kv-cache-dtype fp8_e5m2`, which has a larger dynamic range and might fix FP8 inference issues if you see them. Or use our pre-quantized float8 quants listed at <https://huggingface.co/unsloth/models?search=-fp8>
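The KV-cache half of the memory claim is easy to sanity-check: the cache stores K and V for every layer, KV head, head dimension and position, so halving the bytes per element halves the cache. A rough calculator (the shape numbers below are assumptions for a 1B-class model, not measured values):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    # 2x for storing both K and V at every layer, head and position
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

fp16_size = kv_cache_bytes(16, 8, 64, 16384, 2)  # assumed 1B-class shape, fp16 cache
fp8_size  = kv_cache_bytes(16, 8, 64, 16384, 1)  # same shape, fp8 cache
print(fp16_size // 2**20, fp8_size // 2**20)  # 512 256 (MiB): fp8 halves the cache
```

Since the KV cache is what bounds context length at a fixed memory budget, halving it is what permits roughly 2x longer contexts.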
186
+
187
+ ### Benchmarking SGLang
188
+
189
+ Below is some code you can run to test the performance speed of your finetuned model:
190
+
191
+ ```bash
192
+ python -m sglang.launch_server \
193
+ --model-path finetuned_model \
194
+ --host 0.0.0.0 --port 30002
195
+ ```
196
+
197
+ Then in another terminal or via tmux:
198
+
199
+ ```bash
200
+ # Batch Size=8, Input=1024, Output=1024
201
+ python -m sglang.bench_one_batch_server \
202
+ --model finetuned_model \
203
+ --base-url http://0.0.0.0:30002 \
204
+ --batch-size 8 \
205
+ --input-len 1024 \
206
+ --output-len 1024
207
+ ```
208
+
209
+ We used a B200x1 GPU with gpt-oss-20b and got the below results (~2,500 tokens throughput)
210
+
211
+ | Batch/Input/Output | TTFT (s) | ITL (s) | Input Throughput | Output Throughput |
212
+ | --- | --- | --- | --- | --- |
213
+ | 8/1024/1024 | 0.40 | 3.59 | 20,718.95 | 2,562.87 |
214
+ | 8/8192/1024 | 0.42 | 3.74 | 154,459.01 | 2,473.84 |
215
+
216
+ See <https://docs.sglang.ai/advanced_features/server_arguments.html> for server arguments for SGLang.
+
+ ### SGLang Interactive Offline Mode
+
+ You can also use SGLang in offline mode (i.e. not as a server) inside an interactive Python environment.
+
+ ```python
+ import sglang as sgl
+ engine = sgl.Engine(model_path = "unsloth/Qwen3-0.6B", random_seed = 42)
+
+ prompt = "Today is a sunny day and I like"
+ sampling_params = {"temperature": 0, "max_new_tokens": 256}
+ outputs = engine.generate(prompt, sampling_params)["text"]
+ print(outputs)
+ engine.shutdown()
+ ```
+
+ ### GGUFs in SGLang
+
+ SGLang also, interestingly, supports GGUFs! **Qwen3 MoE support is still under construction, but most dense models (Llama 3, Qwen 3, Mistral etc.) are supported.**
+
+ First install the latest gguf Python package via:
+
+ ```bash
+ pip install -e "git+https://github.com/ggml-org/llama.cpp.git#egg=gguf&subdirectory=gguf-py"
+ ```
+
+ Then, for example in SGLang's offline mode, you can do:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ model_path = hf_hub_download(
+     "unsloth/Qwen3-32B-GGUF",
+     filename = "Qwen3-32B-UD-Q4_K_XL.gguf",
+ )
+ import sglang as sgl
+ engine = sgl.Engine(model_path = model_path, random_seed = 42)
+
+ prompt = "Today is a sunny day and I like"
+ sampling_params = {"temperature": 0, "max_new_tokens": 256}
+ outputs = engine.generate(prompt, sampling_params)["text"]
+ print(outputs)
+ engine.shutdown()
+ ```
+
+ ### High throughput GGUF serving with SGLang
+
+ First download the specific GGUF file, as below:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ hf_hub_download("unsloth/Qwen3-32B-GGUF", filename="Qwen3-32B-UD-Q4_K_XL.gguf", local_dir=".")
+ ```
+
+ Then serve the specific file `Qwen3-32B-UD-Q4_K_XL.gguf`, set `--served-model-name unsloth/Qwen3-32B`, and point `--tokenizer-path` at the HuggingFace-compatible tokenizer:
+
+ ```bash
+ python -m sglang.launch_server \
+     --model-path Qwen3-32B-UD-Q4_K_XL.gguf \
+     --host 0.0.0.0 --port 30002 \
+     --served-model-name unsloth/Qwen3-32B \
+     --tokenizer-path unsloth/Qwen3-32B
+ ```
@@ -0,0 +1,70 @@
+ # Speculative Decoding
+
+ ## Speculative Decoding in llama.cpp, llama-server
+
+ Speculative decoding in llama.cpp can be enabled in `llama-cli` and `llama-server` via the `--model-draft` argument. Note you need a draft model, which is generally a smaller model, and it must use the same tokenizer.
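The idea: the small draft model proposes several tokens cheaply, and the large target model verifies them in a single pass, keeping the longest agreeing prefix. A greedy toy sketch with stand-in "models" (everything below is illustrative; llama.cpp's real implementation verifies sampled probabilities, not just greedy picks):

```python
def speculative_step(target_next, draft_next, prefix, k=4):
    """One round of greedy speculative decoding (toy version).

    target_next/draft_next map a token sequence to its next token.
    Returns the tokens accepted this round: the draft tokens the target
    agrees with, plus one token chosen by the target itself."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    draft, seq = [], list(prefix)
    for _ in range(k):
        tok = draft_next(seq)
        draft.append(tok)
        seq.append(tok)
    # 2) The target verifies: keep the longest agreeing prefix, then
    #    append the target's own next token (so we always gain >= 1 token).
    accepted, seq = [], list(prefix)
    for tok in draft:
        if target_next(seq) != tok:
            break
        accepted.append(tok)
        seq.append(tok)
    accepted.append(target_next(seq))
    return accepted

# Toy "models": the draft agrees with the target for the first two steps only.
target = lambda seq: len(seq) % 10
drafty = lambda seq: len(seq) % 10 if len(seq) < 3 else 99
print(speculative_step(target, drafty, prefix=[0], k=4))  # [1, 2, 3]
```

The output is always exactly what the target alone would have produced; the speedup comes from the target checking k draft tokens in one batched forward pass instead of k sequential ones, which is why a draft model that often agrees with the target matters.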
+
+ ### Spec Decoding for GLM 4.7
+
+ ```python
+ # !pip install huggingface_hub hf_transfer
+ import os
+ os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0" # hf_transfer can get rate limited, so set to 0 to disable
+ from huggingface_hub import snapshot_download
+ snapshot_download(
+     repo_id = "unsloth/GLM-4.7-GGUF",
+     local_dir = "unsloth/GLM-4.7-GGUF",
+     allow_patterns = ["*UD-Q2_K_XL*"], # Dynamic 2-bit. Use "*UD-TQ1_0*" for Dynamic 1-bit
+ )
+ snapshot_download(
+     repo_id = "unsloth/GLM-4.5-Air-GGUF",
+     local_dir = "unsloth/GLM-4.5-Air-GGUF",
+     allow_patterns = ["*UD-Q4_K_XL*"], # Dynamic 4-bit. Use "*UD-TQ1_0*" for Dynamic 1-bit
+ )
+ ```
+
+ ```bash
+ ./llama.cpp/llama-cli \
+     --model unsloth/GLM-4.7-GGUF/UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf \
+     --threads -1 \
+     --fit on \
+     --prio 3 \
+     --temp 1.0 \
+     --top-p 0.95 \
+     --ctx-size 16384 \
+     --jinja
+ ```
+
+ With speculative decoding using a draft model:
+
+ ```bash
+ ./llama.cpp/llama-cli \
+     --model unsloth/GLM-4.7-GGUF/UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf \
+     --model-draft unsloth/GLM-4.5-Air-GGUF/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf \
+     --threads -1 \
+     --fit on \
+     --prio 3 \
+     --temp 1.0 \
+     --top-p 0.95 \
+     --ctx-size 16384 \
+     --ctx-size-draft 16384 \
+     --jinja \
+     --device CUDA0 \
+     --device-draft CUDA0,CUDA1
+ ```
+
+ Using llama-server:
+
+ ```bash
+ ./llama.cpp/llama-server \
+     --model unsloth/GLM-4.7-GGUF/UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf \
+     --alias "unsloth/GLM-4.7" \
+     --threads -1 \
+     --fit on \
+     --prio 3 \
+     --temp 1.0 \
+     --top-p 0.95 \
+     --ctx-size 16384 \
+     --port 8001 \
+     --jinja
+ ```