@synsci/cli-darwin-x64-baseline 1.1.71 → 1.1.72

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (339) hide show
  1. package/bin/skills/citation-management/SKILL.md +1109 -0
  2. package/bin/skills/citation-management/assets/bibtex_template.bib +264 -0
  3. package/bin/skills/citation-management/assets/citation_checklist.md +386 -0
  4. package/bin/skills/citation-management/references/bibtex_formatting.md +908 -0
  5. package/bin/skills/citation-management/references/citation_validation.md +794 -0
  6. package/bin/skills/citation-management/references/google_scholar_search.md +725 -0
  7. package/bin/skills/citation-management/references/metadata_extraction.md +870 -0
  8. package/bin/skills/citation-management/references/pubmed_search.md +839 -0
  9. package/bin/skills/citation-management/scripts/doi_to_bibtex.py +182 -0
  10. package/bin/skills/citation-management/scripts/extract_metadata.py +570 -0
  11. package/bin/skills/citation-management/scripts/format_bibtex.py +349 -0
  12. package/bin/skills/citation-management/scripts/search_google_scholar.py +251 -0
  13. package/bin/skills/citation-management/scripts/search_pubmed.py +348 -0
  14. package/bin/skills/citation-management/scripts/validate_citations.py +494 -0
  15. package/bin/skills/clinical-decision-support/README.md +129 -0
  16. package/bin/skills/clinical-decision-support/SKILL.md +506 -0
  17. package/bin/skills/clinical-decision-support/assets/biomarker_report_template.tex +380 -0
  18. package/bin/skills/clinical-decision-support/assets/clinical_pathway_template.tex +222 -0
  19. package/bin/skills/clinical-decision-support/assets/cohort_analysis_template.tex +359 -0
  20. package/bin/skills/clinical-decision-support/assets/color_schemes.tex +149 -0
  21. package/bin/skills/clinical-decision-support/assets/example_gbm_cohort.md +208 -0
  22. package/bin/skills/clinical-decision-support/assets/recommendation_strength_guide.md +328 -0
  23. package/bin/skills/clinical-decision-support/assets/treatment_recommendation_template.tex +529 -0
  24. package/bin/skills/clinical-decision-support/references/biomarker_classification.md +719 -0
  25. package/bin/skills/clinical-decision-support/references/clinical_decision_algorithms.md +604 -0
  26. package/bin/skills/clinical-decision-support/references/evidence_synthesis.md +840 -0
  27. package/bin/skills/clinical-decision-support/references/outcome_analysis.md +640 -0
  28. package/bin/skills/clinical-decision-support/references/patient_cohort_analysis.md +427 -0
  29. package/bin/skills/clinical-decision-support/references/treatment_recommendations.md +521 -0
  30. package/bin/skills/clinical-decision-support/scripts/biomarker_classifier.py +383 -0
  31. package/bin/skills/clinical-decision-support/scripts/build_decision_tree.py +417 -0
  32. package/bin/skills/clinical-decision-support/scripts/create_cohort_tables.py +509 -0
  33. package/bin/skills/clinical-decision-support/scripts/generate_survival_analysis.py +441 -0
  34. package/bin/skills/clinical-decision-support/scripts/validate_cds_document.py +326 -0
  35. package/bin/skills/clinical-reports/IMPLEMENTATION_SUMMARY.md +641 -0
  36. package/bin/skills/clinical-reports/README.md +236 -0
  37. package/bin/skills/clinical-reports/SKILL.md +1127 -0
  38. package/bin/skills/clinical-reports/assets/case_report_template.md +352 -0
  39. package/bin/skills/clinical-reports/assets/clinical_trial_csr_template.md +353 -0
  40. package/bin/skills/clinical-reports/assets/clinical_trial_sae_template.md +359 -0
  41. package/bin/skills/clinical-reports/assets/consult_note_template.md +305 -0
  42. package/bin/skills/clinical-reports/assets/discharge_summary_template.md +453 -0
  43. package/bin/skills/clinical-reports/assets/hipaa_compliance_checklist.md +395 -0
  44. package/bin/skills/clinical-reports/assets/history_physical_template.md +305 -0
  45. package/bin/skills/clinical-reports/assets/lab_report_template.md +309 -0
  46. package/bin/skills/clinical-reports/assets/pathology_report_template.md +249 -0
  47. package/bin/skills/clinical-reports/assets/quality_checklist.md +338 -0
  48. package/bin/skills/clinical-reports/assets/radiology_report_template.md +318 -0
  49. package/bin/skills/clinical-reports/assets/soap_note_template.md +253 -0
  50. package/bin/skills/clinical-reports/references/case_report_guidelines.md +570 -0
  51. package/bin/skills/clinical-reports/references/clinical_trial_reporting.md +693 -0
  52. package/bin/skills/clinical-reports/references/data_presentation.md +530 -0
  53. package/bin/skills/clinical-reports/references/diagnostic_reports_standards.md +629 -0
  54. package/bin/skills/clinical-reports/references/medical_terminology.md +588 -0
  55. package/bin/skills/clinical-reports/references/patient_documentation.md +744 -0
  56. package/bin/skills/clinical-reports/references/peer_review_standards.md +585 -0
  57. package/bin/skills/clinical-reports/references/regulatory_compliance.md +577 -0
  58. package/bin/skills/clinical-reports/scripts/check_deidentification.py +332 -0
  59. package/bin/skills/clinical-reports/scripts/compliance_checker.py +78 -0
  60. package/bin/skills/clinical-reports/scripts/extract_clinical_data.py +97 -0
  61. package/bin/skills/clinical-reports/scripts/format_adverse_events.py +97 -0
  62. package/bin/skills/clinical-reports/scripts/generate_report_template.py +149 -0
  63. package/bin/skills/clinical-reports/scripts/terminology_validator.py +126 -0
  64. package/bin/skills/clinical-reports/scripts/validate_case_report.py +323 -0
  65. package/bin/skills/clinical-reports/scripts/validate_trial_report.py +88 -0
  66. package/bin/skills/fireworks-ai/SKILL.md +665 -0
  67. package/bin/skills/generate-image/SKILL.md +178 -0
  68. package/bin/skills/generate-image/scripts/generate_image.py +254 -0
  69. package/bin/skills/groq/SKILL.md +347 -0
  70. package/bin/skills/hypothesis-generation/SKILL.md +293 -0
  71. package/bin/skills/hypothesis-generation/assets/FORMATTING_GUIDE.md +672 -0
  72. package/bin/skills/hypothesis-generation/assets/hypothesis_generation.sty +307 -0
  73. package/bin/skills/hypothesis-generation/assets/hypothesis_report_template.tex +572 -0
  74. package/bin/skills/hypothesis-generation/references/experimental_design_patterns.md +329 -0
  75. package/bin/skills/hypothesis-generation/references/hypothesis_quality_criteria.md +198 -0
  76. package/bin/skills/hypothesis-generation/references/literature_search_strategies.md +622 -0
  77. package/bin/skills/latex-posters/README.md +417 -0
  78. package/bin/skills/latex-posters/SKILL.md +1602 -0
  79. package/bin/skills/latex-posters/assets/baposter_template.tex +257 -0
  80. package/bin/skills/latex-posters/assets/beamerposter_template.tex +244 -0
  81. package/bin/skills/latex-posters/assets/poster_quality_checklist.md +358 -0
  82. package/bin/skills/latex-posters/assets/tikzposter_template.tex +251 -0
  83. package/bin/skills/latex-posters/references/latex_poster_packages.md +745 -0
  84. package/bin/skills/latex-posters/references/poster_content_guide.md +748 -0
  85. package/bin/skills/latex-posters/references/poster_design_principles.md +806 -0
  86. package/bin/skills/latex-posters/references/poster_layout_design.md +900 -0
  87. package/bin/skills/latex-posters/scripts/review_poster.sh +214 -0
  88. package/bin/skills/literature-review/SKILL.md +641 -0
  89. package/bin/skills/literature-review/assets/review_template.md +412 -0
  90. package/bin/skills/literature-review/references/citation_styles.md +166 -0
  91. package/bin/skills/literature-review/references/database_strategies.md +455 -0
  92. package/bin/skills/literature-review/scripts/generate_pdf.py +184 -0
  93. package/bin/skills/literature-review/scripts/search_databases.py +310 -0
  94. package/bin/skills/literature-review/scripts/verify_citations.py +218 -0
  95. package/bin/skills/market-research-reports/SKILL.md +904 -0
  96. package/bin/skills/market-research-reports/assets/FORMATTING_GUIDE.md +428 -0
  97. package/bin/skills/market-research-reports/assets/market_report_template.tex +1380 -0
  98. package/bin/skills/market-research-reports/assets/market_research.sty +564 -0
  99. package/bin/skills/market-research-reports/references/data_analysis_patterns.md +548 -0
  100. package/bin/skills/market-research-reports/references/report_structure_guide.md +999 -0
  101. package/bin/skills/market-research-reports/references/visual_generation_guide.md +1077 -0
  102. package/bin/skills/market-research-reports/scripts/generate_market_visuals.py +472 -0
  103. package/bin/skills/markitdown/INSTALLATION_GUIDE.md +318 -0
  104. package/bin/skills/markitdown/LICENSE.txt +22 -0
  105. package/bin/skills/markitdown/OPENROUTER_INTEGRATION.md +359 -0
  106. package/bin/skills/markitdown/QUICK_REFERENCE.md +309 -0
  107. package/bin/skills/markitdown/README.md +184 -0
  108. package/bin/skills/markitdown/SKILL.md +486 -0
  109. package/bin/skills/markitdown/SKILL_SUMMARY.md +307 -0
  110. package/bin/skills/markitdown/assets/example_usage.md +463 -0
  111. package/bin/skills/markitdown/references/api_reference.md +399 -0
  112. package/bin/skills/markitdown/references/file_formats.md +542 -0
  113. package/bin/skills/markitdown/scripts/batch_convert.py +195 -0
  114. package/bin/skills/markitdown/scripts/convert_literature.py +262 -0
  115. package/bin/skills/markitdown/scripts/convert_with_ai.py +224 -0
  116. package/bin/skills/ml-paper-writing/SKILL.md +937 -0
  117. package/bin/skills/ml-paper-writing/references/checklists.md +361 -0
  118. package/bin/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  119. package/bin/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  120. package/bin/skills/ml-paper-writing/references/sources.md +159 -0
  121. package/bin/skills/ml-paper-writing/references/writing-guide.md +476 -0
  122. package/bin/skills/ml-paper-writing/templates/README.md +251 -0
  123. package/bin/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  124. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  125. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  126. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  127. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  128. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  129. package/bin/skills/ml-paper-writing/templates/acl/README.md +50 -0
  130. package/bin/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  131. package/bin/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  132. package/bin/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  133. package/bin/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  134. package/bin/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  135. package/bin/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  136. package/bin/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  137. package/bin/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  138. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  139. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  140. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  141. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  142. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  143. package/bin/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  144. package/bin/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  145. package/bin/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  146. package/bin/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  147. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  148. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  149. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  150. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  151. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  152. package/bin/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  153. package/bin/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  154. package/bin/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  155. package/bin/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  156. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  157. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  158. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  159. package/bin/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  160. package/bin/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  161. package/bin/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  162. package/bin/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  163. package/bin/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  164. package/bin/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  165. package/bin/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  166. package/bin/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  167. package/bin/skills/paper-2-web/SKILL.md +491 -0
  168. package/bin/skills/paper-2-web/references/installation.md +141 -0
  169. package/bin/skills/paper-2-web/references/paper2poster.md +346 -0
  170. package/bin/skills/paper-2-web/references/paper2video.md +305 -0
  171. package/bin/skills/paper-2-web/references/paper2web.md +187 -0
  172. package/bin/skills/paper-2-web/references/usage_examples.md +436 -0
  173. package/bin/skills/peer-review/SKILL.md +702 -0
  174. package/bin/skills/peer-review/references/calibration_guidelines.md +196 -0
  175. package/bin/skills/peer-review/references/common_issues.md +552 -0
  176. package/bin/skills/peer-review/references/paper_mechanics.md +269 -0
  177. package/bin/skills/peer-review/references/reporting_standards.md +290 -0
  178. package/bin/skills/peer-review/references/scoring_rubric.md +239 -0
  179. package/bin/skills/pptx-posters/SKILL.md +410 -0
  180. package/bin/skills/pptx-posters/assets/poster_html_template.html +257 -0
  181. package/bin/skills/pptx-posters/assets/poster_quality_checklist.md +358 -0
  182. package/bin/skills/pptx-posters/references/poster_content_guide.md +748 -0
  183. package/bin/skills/pptx-posters/references/poster_design_principles.md +806 -0
  184. package/bin/skills/pptx-posters/references/poster_layout_design.md +900 -0
  185. package/bin/skills/research-grants/README.md +285 -0
  186. package/bin/skills/research-grants/SKILL.md +938 -0
  187. package/bin/skills/research-grants/assets/budget_justification_template.md +453 -0
  188. package/bin/skills/research-grants/assets/nih_specific_aims_template.md +166 -0
  189. package/bin/skills/research-grants/assets/nsf_project_summary_template.md +92 -0
  190. package/bin/skills/research-grants/references/broader_impacts.md +392 -0
  191. package/bin/skills/research-grants/references/darpa_guidelines.md +636 -0
  192. package/bin/skills/research-grants/references/doe_guidelines.md +586 -0
  193. package/bin/skills/research-grants/references/nih_guidelines.md +851 -0
  194. package/bin/skills/research-grants/references/nsf_guidelines.md +570 -0
  195. package/bin/skills/research-grants/references/specific_aims_guide.md +458 -0
  196. package/bin/skills/research-lookup/README.md +156 -0
  197. package/bin/skills/research-lookup/SKILL.md +606 -0
  198. package/bin/skills/research-lookup/examples.py +174 -0
  199. package/bin/skills/research-lookup/lookup.py +187 -0
  200. package/bin/skills/research-lookup/research_lookup.py +483 -0
  201. package/bin/skills/research-lookup/scripts/research_lookup.py +483 -0
  202. package/bin/skills/scholar-evaluation/SKILL.md +289 -0
  203. package/bin/skills/scholar-evaluation/references/evaluation_framework.md +663 -0
  204. package/bin/skills/scholar-evaluation/scripts/calculate_scores.py +366 -0
  205. package/bin/skills/scientific-critical-thinking/SKILL.md +566 -0
  206. package/bin/skills/scientific-critical-thinking/references/common_biases.md +364 -0
  207. package/bin/skills/scientific-critical-thinking/references/evidence_hierarchy.md +484 -0
  208. package/bin/skills/scientific-critical-thinking/references/experimental_design.md +496 -0
  209. package/bin/skills/scientific-critical-thinking/references/logical_fallacies.md +478 -0
  210. package/bin/skills/scientific-critical-thinking/references/scientific_method.md +169 -0
  211. package/bin/skills/scientific-critical-thinking/references/statistical_pitfalls.md +506 -0
  212. package/bin/skills/scientific-schematics/QUICK_REFERENCE.md +207 -0
  213. package/bin/skills/scientific-schematics/README.md +327 -0
  214. package/bin/skills/scientific-schematics/SKILL.md +615 -0
  215. package/bin/skills/scientific-schematics/example_usage.sh +89 -0
  216. package/bin/skills/scientific-schematics/references/best_practices.md +559 -0
  217. package/bin/skills/scientific-schematics/scripts/generate_schematic.py +135 -0
  218. package/bin/skills/scientific-schematics/scripts/generate_schematic_ai.py +807 -0
  219. package/bin/skills/scientific-schematics/test_ai_generation.py +243 -0
  220. package/bin/skills/scientific-slides/SKILL.md +942 -0
  221. package/bin/skills/scientific-slides/assets/timing_guidelines.md +597 -0
  222. package/bin/skills/scientific-slides/references/data_visualization_slides.md +708 -0
  223. package/bin/skills/scientific-slides/references/presentation_structure.md +642 -0
  224. package/bin/skills/scientific-slides/references/slide_design_principles.md +849 -0
  225. package/bin/skills/scientific-slides/references/talk_types_guide.md +687 -0
  226. package/bin/skills/scientific-slides/references/visual_review_workflow.md +775 -0
  227. package/bin/skills/scientific-slides/scripts/generate_slide_image.py +143 -0
  228. package/bin/skills/scientific-slides/scripts/generate_slide_image_ai.py +748 -0
  229. package/bin/skills/scientific-slides/scripts/pdf_to_images.py +201 -0
  230. package/bin/skills/scientific-slides/scripts/slides_to_pdf.py +220 -0
  231. package/bin/skills/scientific-slides/scripts/validate_presentation.py +367 -0
  232. package/bin/skills/scientific-writing/SKILL.md +714 -0
  233. package/bin/skills/scientific-writing/assets/REPORT_FORMATTING_GUIDE.md +574 -0
  234. package/bin/skills/scientific-writing/assets/scientific_report.sty +606 -0
  235. package/bin/skills/scientific-writing/assets/scientific_report_template.tex +449 -0
  236. package/bin/skills/scientific-writing/references/citation_styles.md +720 -0
  237. package/bin/skills/scientific-writing/references/figures_tables.md +806 -0
  238. package/bin/skills/scientific-writing/references/imrad_structure.md +686 -0
  239. package/bin/skills/scientific-writing/references/professional_report_formatting.md +664 -0
  240. package/bin/skills/scientific-writing/references/reporting_guidelines.md +748 -0
  241. package/bin/skills/scientific-writing/references/writing_principles.md +824 -0
  242. package/bin/skills/tinker/SKILL.md +2 -3
  243. package/bin/skills/together-ai/SKILL.md +722 -0
  244. package/bin/skills/treatment-plans/README.md +488 -0
  245. package/bin/skills/treatment-plans/SKILL.md +1579 -0
  246. package/bin/skills/treatment-plans/assets/STYLING_QUICK_REFERENCE.md +185 -0
  247. package/bin/skills/treatment-plans/assets/chronic_disease_management_plan.tex +665 -0
  248. package/bin/skills/treatment-plans/assets/general_medical_treatment_plan.tex +547 -0
  249. package/bin/skills/treatment-plans/assets/medical_treatment_plan.sty +222 -0
  250. package/bin/skills/treatment-plans/assets/mental_health_treatment_plan.tex +774 -0
  251. package/bin/skills/treatment-plans/assets/one_page_treatment_plan.tex +193 -0
  252. package/bin/skills/treatment-plans/assets/pain_management_plan.tex +799 -0
  253. package/bin/skills/treatment-plans/assets/perioperative_care_plan.tex +753 -0
  254. package/bin/skills/treatment-plans/assets/quality_checklist.md +471 -0
  255. package/bin/skills/treatment-plans/assets/rehabilitation_treatment_plan.tex +756 -0
  256. package/bin/skills/treatment-plans/references/goal_setting_frameworks.md +411 -0
  257. package/bin/skills/treatment-plans/references/intervention_guidelines.md +507 -0
  258. package/bin/skills/treatment-plans/references/regulatory_compliance.md +476 -0
  259. package/bin/skills/treatment-plans/references/specialty_specific_guidelines.md +655 -0
  260. package/bin/skills/treatment-plans/references/treatment_plan_standards.md +485 -0
  261. package/bin/skills/treatment-plans/scripts/check_completeness.py +322 -0
  262. package/bin/skills/treatment-plans/scripts/generate_template.py +233 -0
  263. package/bin/skills/treatment-plans/scripts/timeline_generator.py +385 -0
  264. package/bin/skills/treatment-plans/scripts/validate_treatment_plan.py +369 -0
  265. package/bin/skills/unsloth/SKILL.md +565 -47
  266. package/bin/skills/unsloth/docs/advanced-rl.md +222 -0
  267. package/bin/skills/unsloth/docs/chat-templates.md +141 -0
  268. package/bin/skills/unsloth/docs/datasets.md +489 -0
  269. package/bin/skills/unsloth/docs/docker-extended.md +99 -0
  270. package/bin/skills/unsloth/docs/dynamic-ggufs-2.0.md +116 -0
  271. package/bin/skills/unsloth/docs/dynamic-ggufs-aider.md +118 -0
  272. package/bin/skills/unsloth/docs/faq.md +91 -0
  273. package/bin/skills/unsloth/docs/fp16-vs-bf16.md +61 -0
  274. package/bin/skills/unsloth/docs/fp8-rl.md +224 -0
  275. package/bin/skills/unsloth/docs/glm-4.7-flash.md +997 -0
  276. package/bin/skills/unsloth/docs/inference-deployment-overview.md +17 -0
  277. package/bin/skills/unsloth/docs/inference.md +27 -0
  278. package/bin/skills/unsloth/docs/installation-docker.md +155 -0
  279. package/bin/skills/unsloth/docs/installation-pip.md +148 -0
  280. package/bin/skills/unsloth/docs/kernels-packing.md +190 -0
  281. package/bin/skills/unsloth/docs/kimi-k2.5.md +634 -0
  282. package/bin/skills/unsloth/docs/lm-studio.md +235 -0
  283. package/bin/skills/unsloth/docs/lora-hot-swapping.md +75 -0
  284. package/bin/skills/unsloth/docs/lora-hyperparameters.md +363 -0
  285. package/bin/skills/unsloth/docs/memory-efficient-rl.md +267 -0
  286. package/bin/skills/unsloth/docs/model-selection.md +70 -0
  287. package/bin/skills/unsloth/docs/models.md +532 -0
  288. package/bin/skills/unsloth/docs/multi-gpu-ddp.md +90 -0
  289. package/bin/skills/unsloth/docs/notebooks.md +223 -0
  290. package/bin/skills/unsloth/docs/overview.md +110 -0
  291. package/bin/skills/unsloth/docs/qwen3-coder-next-extended.md +900 -0
  292. package/bin/skills/unsloth/docs/qwen3-coder-next.md +900 -0
  293. package/bin/skills/unsloth/docs/requirements.md +45 -0
  294. package/bin/skills/unsloth/docs/reward-hacking.md +25 -0
  295. package/bin/skills/unsloth/docs/saving-to-gguf.md +138 -0
  296. package/bin/skills/unsloth/docs/saving-to-ollama.md +46 -0
  297. package/bin/skills/unsloth/docs/sglang-guide.md +278 -0
  298. package/bin/skills/unsloth/docs/speculative-decoding.md +70 -0
  299. package/bin/skills/unsloth/docs/tool-calling.md +334 -0
  300. package/bin/skills/unsloth/docs/troubleshooting-faq.md +204 -0
  301. package/bin/skills/unsloth/docs/troubleshooting-inference.md +26 -0
  302. package/bin/skills/unsloth/docs/tts-fine-tuning.md +149 -0
  303. package/bin/skills/unsloth/docs/tutorial-grpo.md +273 -0
  304. package/bin/skills/unsloth/docs/tutorial-llama3-ollama.md +356 -0
  305. package/bin/skills/unsloth/docs/vision-fine-tuning.md +135 -0
  306. package/bin/skills/unsloth/docs/vision-rl.md +170 -0
  307. package/bin/skills/unsloth/docs/vllm-engine-arguments.md +43 -0
  308. package/bin/skills/unsloth/docs/vllm-guide.md +98 -0
  309. package/bin/skills/venue-templates/SKILL.md +686 -0
  310. package/bin/skills/venue-templates/assets/examples/cell_summary_example.md +247 -0
  311. package/bin/skills/venue-templates/assets/examples/medical_structured_abstract.md +313 -0
  312. package/bin/skills/venue-templates/assets/examples/nature_abstract_examples.md +213 -0
  313. package/bin/skills/venue-templates/assets/examples/neurips_introduction_example.md +245 -0
  314. package/bin/skills/venue-templates/assets/grants/nih_specific_aims.tex +235 -0
  315. package/bin/skills/venue-templates/assets/grants/nsf_proposal_template.tex +375 -0
  316. package/bin/skills/venue-templates/assets/journals/nature_article.tex +171 -0
  317. package/bin/skills/venue-templates/assets/journals/neurips_article.tex +283 -0
  318. package/bin/skills/venue-templates/assets/journals/plos_one.tex +317 -0
  319. package/bin/skills/venue-templates/assets/posters/beamerposter_academic.tex +311 -0
  320. package/bin/skills/venue-templates/references/cell_press_style.md +483 -0
  321. package/bin/skills/venue-templates/references/conferences_formatting.md +564 -0
  322. package/bin/skills/venue-templates/references/cs_conference_style.md +463 -0
  323. package/bin/skills/venue-templates/references/grants_requirements.md +787 -0
  324. package/bin/skills/venue-templates/references/journals_formatting.md +486 -0
  325. package/bin/skills/venue-templates/references/medical_journal_styles.md +535 -0
  326. package/bin/skills/venue-templates/references/ml_conference_style.md +556 -0
  327. package/bin/skills/venue-templates/references/nature_science_style.md +405 -0
  328. package/bin/skills/venue-templates/references/posters_guidelines.md +628 -0
  329. package/bin/skills/venue-templates/references/reviewer_expectations.md +417 -0
  330. package/bin/skills/venue-templates/references/venue_writing_styles.md +321 -0
  331. package/bin/skills/venue-templates/scripts/customize_template.py +195 -0
  332. package/bin/skills/venue-templates/scripts/query_template.py +266 -0
  333. package/bin/skills/venue-templates/scripts/validate_format.py +250 -0
  334. package/bin/synsc +0 -0
  335. package/package.json +1 -1
  336. package/bin/skills/unsloth/references/index.md +0 -7
  337. package/bin/skills/unsloth/references/llms-full.md +0 -16799
  338. package/bin/skills/unsloth/references/llms-txt.md +0 -12044
  339. package/bin/skills/unsloth/references/llms.md +0 -82
@@ -0,0 +1,90 @@
1
+ # Multi-GPU Fine-tuning with Distributed Data Parallel (DDP)
2
+
3
+ Let's assume we have multiple GPUs, and we want to fine-tune a model using all of them! To do so, the most straightforward strategy is to use Distributed Data Parallel (DDP), which creates one copy of the model on each GPU device, feeding each copy distinct samples from the dataset during training and aggregating their contributions to weight updates per optimizer step.
4
+
5
+ Why would we want to do this? Well, as we add more GPUs into the training process, we scale the number of samples our models train on per step, making each gradient update more stable and increasing our training throughput dramatically with each added GPU.
6
+
7
+ Here's a step-by-step guide on how to do this using Unsloth's command-line interface (CLI)!
8
+
9
+ **Note:** Unsloth DDP will work with any of your training scripts, not just via our CLI! More details below.
10
+
11
+ ## Install Unsloth from source
12
+
13
+ We'll clone Unsloth from GitHub and install it. Please consider using a virtual environment; we like to use `uv venv --python 3.12 && source .venv/bin/activate`, but any virtual environment creation tooling will do.
14
+
15
+ ```bash
16
+ git clone https://github.com/unslothai/unsloth.git
17
+ cd unsloth
18
+ pip install .
19
+ ```
20
+
21
+ ## Choose target model and dataset for finetuning
22
+
23
+ In this demo, we will fine-tune [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) chat dataset. This is a Supervised Fine-Tuning (SFT) workload that is commonly used when attempting to adapt a base model to a desired conversational style, or improve the model's performance on a downstream task.
24
+
25
+ ## Use the Unsloth CLI!
26
+
27
+ The CLI provides options for model, LoRA, and training configuration:
28
+
29
+ ```bash
30
+ $ python unsloth-cli.py --help
31
+ usage: unsloth-cli.py [-h] [--model_name MODEL_NAME] [--max_seq_length MAX_SEQ_LENGTH]
32
+ [--dtype DTYPE] [--load_in_4bit] [--dataset DATASET]
33
+ [--r R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT]
34
+ [--bias BIAS] [--use_gradient_checkpointing USE_GRADIENT_CHECKPOINTING]
35
+ ...
36
+
37
+ Model Options:
38
+ --model_name MODEL_NAME Model name to load
39
+ --max_seq_length MAX_SEQ_LENGTH Maximum sequence length (default 2048)
40
+
41
+ LoRA Options:
42
+ --r R Rank for LoRA (default 16, common: 8, 16, 32, 64, 128)
43
+ --lora_alpha LORA_ALPHA LoRA alpha parameter (default 16)
44
+
45
+ Training Options:
46
+ --per_device_train_batch_size Batch size per device (default 2)
47
+ --gradient_accumulation_steps Gradient accumulation steps (default 4)
48
+ ```
49
+
50
+ For multi-GPU training (DDP), we use the [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html) launcher, which allows you to spin up multiple distributed training processes in single-node or multi-node settings.
51
+
52
+ ### Starting the training run
53
+
54
+ ```bash
55
+ # required:
56
+ # --model_name
57
+ # --dataset
58
+ # optional; experiment with these:
59
+ # --learning_rate, --max_seq_length, --per_device_train_batch_size, --gradient_accumulation_steps, --max_steps
60
+ # to save the model at the end of training:
61
+ # --save_model
62
+
63
+ torchrun --nproc_per_node=2 unsloth-cli.py \
64
+ --model_name=Qwen/Qwen3-8B \
65
+ --dataset=yahma/alpaca-cleaned \
66
+ --learning_rate=2e-5 \
67
+ --max_seq_length=2048 \
68
+ --per_device_train_batch_size=1 \
69
+ --gradient_accumulation_steps=4 \
70
+ --max_steps=1000 \
71
+ --save_model
72
+ ```
73
+
74
+ If you have more GPUs, you may set `--nproc_per_node` accordingly to utilize them.
75
+
76
+ **Note:** You can use the `torchrun` launcher with any of your Unsloth training scripts, including the [scripts](https://github.com/unslothai/notebooks/tree/main/python_scripts) converted from our free Colab notebooks, and DDP will be auto-enabled when training with >1 GPU!
77
+
78
+ ## Training metrics
79
+
80
+ We ran a few short rank-16 LoRA fine-tunes on [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset to demonstrate the improved training throughput when using DDP training with multiple GPUs.
81
+
82
+ ### Training Loss
83
+
84
+ The loss curves match in scale and trend between single GPU (pink) and multi-GPU DDP (blue), but are a bit different since the multi-GPU training processes twice as much training data per step. This results in a slightly different training curve with less variability on a step-by-step basis.
85
+
86
+ ### Training Progress
87
+
88
+ The multi-GPU DDP training progresses through an epoch of the training data in half as many steps as single GPU training. This is because each GPU can process a distinct batch (of size `per_device_train_batch_size`) per step. However, the per-step timing for DDP training is slightly slower due to distributed communication for the model weight updates. As you increase the number of GPUs, the training throughput will continue to increase ~linearly (but with a small, but increasing penalty for the distributed comms).
89
+
90
+ These same loss and training epoch progress behaviors hold for QLoRA fine-tunes, in which we loaded the base models in 4-bit precision in order to save additional GPU memory. This is particularly useful for training large models on limited amounts of GPU VRAM.
@@ -0,0 +1,223 @@
1
+ # Unsloth Notebooks
2
+
3
+ Train your own model with our notebooks, powered by free GPU compute. Click Run all (or save locally), add your dataset, train, and deploy. You can use any model in the notebooks.
4
+
5
+ <a href="#grpo-reasoning-rl" class="button secondary">GRPO (RL)</a><a href="#text-to-speech-tts" class="button secondary">Text-to-speech</a><a href="#vision-multimodal" class="button secondary">Vision</a><a href="#embedding-models" class="button secondary">Embedding</a><a href="#kaggle-notebooks" class="button secondary">Kaggle</a>
6
+
7
+ Also see our GitHub repo for our notebooks: [github.com/unslothai/notebooks](https://github.com/unslothai/notebooks/)
8
+
9
+ ## Colab notebooks
10
+
11
+ ### Standard notebooks:
12
+
13
+ * [**gpt-oss (20b)**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-\(20B\)-Fine-tuning.ipynb) • [Inference](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/GPT_OSS_MXFP4_\(20B\)-Inference.ipynb) • [Fine-tuning](https://colab.research.google.com/gitunslothai/notebooks/blob/main/nb/gpt-oss-\(20B\)-Fine-tuning.ipynb)
14
+ * [EmbeddingGemma (300M)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/EmbeddingGemma_\(300M\).ipynb) - new
15
+ * [Qwen3 (14B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(14B\)-Reasoning-Conversational.ipynb) • [**Qwen3-VL (8B)**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_VL_\(8B\)-Vision.ipynb)
16
+ * [**Qwen3-2507-4B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(4B\)-Instruct.ipynb) • [Thinking](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(4B\)-Thinking.ipynb) • [Instruct](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(4B\)-Instruct.ipynb)
17
+ * [Gemma 3 (4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\).ipynb) • [Text](https://colab.research.google.com/github/unslothai/notebooks/blob/maima3_\(4B\).ipynb) • [Vision](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\)-Vision.ipynb) • [270M](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(270M\).ipynb) • [**FunctionGemma**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\).ipynb)
18
+ * [Gemma 3n (E4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_\(4B\)-Conversational.ipynb) • [Text](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_\(4B\)-Conversational.ipynb) • [Vision](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_\(4B\)-Vision.ipynb) • [Audio](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_\(4B\)-Audio.ipynb)
19
+ * [**Mistral Ministral 3**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_VL_\(3B\)_Vision.ipynb)
20
+ * [**DeepSeek-OCR 2**](https://co.google.com/github/unslothai/notebooks/blob/main/nb/Deepseek_OCR_2_\(3B\).ipynb)&#x20;
21
+ * [IBM Granite-4.0-H](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Granite4.0.ipynb)
22
+ * [Phi-4 (14B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb)
23
+ * [Llama 3.1 (8B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_\(8B\)-Alpaca.ipynb) • [Llama 3.2 (1B + 3B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_\(1B_and_3B\)-Conversational.ipynb)
24
+
25
+ ### GRPO (Reasoning RL):
26
+
27
+ * [gpt-oss-20b](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-\(20B\)-GRPO.ipynb) (automatic kernels creation)
28
+ * [Mistral Ministral 3](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_\(3B\)_Reinforcement_Learning_Sudoku_Game.ipynb) (solving sodoku) - new
29
+ * [Qwen3-8B - **FP8**](https://colab.research.google.com/github/unslothai/notebks/blob/main/nb/Qwen3_8B_FP8_GRPO.ipynb) (L4) - new
30
+ * [Llama-3.2-1B - **FP8**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama_FP8_GRPO.ipynb) (L4) - new
31
+ * [gpt-oss-20b](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt_oss_\(20B\)_Reinforcement_Learning_2048_Game.ipynb) (auto win 2048 game)&#x20;
32
+ * [Qwen3-VL (8B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_VL_\(8B\)-Vision-GRPO.ipynb) - Vision GSPO
33
+ * [Qwen3 (4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(4B\)-GRPO.ipynb) - Advanced GRPO LoRA
34
+ * [Gemma 3 (4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\)-Vision-GRPO.ipynb) - Vision GSPO
35
+ * [gpt-oss-20b](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_\(20B\)_Reinforcement_Learning_2048_Game.ipynb) (2048 OpenEnv example)
36
+ * [DeepSeek-R1-0528-Qwen3 (8B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/DeepSeek_R1_0528_Qwen3_\(8B\)_GRPO.ipynb) (for multilingual usecase)
37
+ * [Gemma 3 (1B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(1B\)-GRPO.ipynb)
38
+ * [Llama 3.2 (3B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Advanced_Llama3_2_\(3B\)_GRPO_LoRA.ipynb) - Advanced GRPO LoRA
39
+ * [Llama 3.1 (8B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_\(8B\)-GRPO.ipynb)
40
+ * [Phi-4 (14B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4_\(14B\)-GRPO.ipynb)
41
+ * [Mistral v0.3 (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-GRPO.ipynb)
42
+ * [NeMo Gym Multi Environment ](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/NeMo-Gym-Multi-Environment.ipynb)(Multiple environments)
43
+
44
+ ### Text-to-Speech (TTS):
45
+
46
+ * [Sesame-CSM (1B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Sesame_CSM_\(1B\)-TTS.ipynb)
47
+ * [Orpheus-TTS (3B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Orpheus_\(3B\)-TTS.ipynb)
48
+ * [Whisper Large V3](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Whisper.ipynb) - Speech-to-Text (STT)
49
+ * [Llasa-TTS (1B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llasa_TTS_\(1B\).ipynb)
50
+ * [Spark-TTS (0.5B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Spark_TTS_\(0_5B\).ipynb)
51
+ * [Oute-TTS (1B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Oute_TTS_\(1B\).ipynb)
52
+
53
+ **Speech-to-Text (SST):**
54
+
55
+ * [Whisper-Large-V3](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Whisper.ipynb)
56
+ * [Gemma 3n (E4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_\(4B\)-Audio.ipynb) - Audio
57
+
58
+ ### Vision (Multimodal):
59
+
60
+ * [**Qwen3-VL (8B)**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_VL_\(8B\)-Vision.ipynb)
61
+ * [**Mistral Ministral 3**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_VL_\(3B\)_Vision.ipynb)
62
+ * [**DeepSeek-OCR**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Deepseek_OCR_\(3B\).ipynb)
63
+ * [**Paddle-OCR (1B)**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Paddle_OCR_\(1B\)_Vision.ipynb)
64
+ * [Gemma 3n (E4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_\(4B\)-Vision.ipynb)
65
+ * [Gemma 3 (4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\)-Vision.ipynb)
66
+ * [Llama 3.2 Vision (11B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_\(11B\)-Vision.ipynb)
67
+ * [Qwen2.5-VL (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_VL_\(7B\)-Vision.ipynb)
68
+ * [Pixtral (12B) 2409](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Pixtral_\(12B\)-Vision.ipynb)
69
+ * [Qwen3-VL](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_VL_\(8B\)-Vision-GRPO.ipynb) - Vision GSPO - new
70
+ * [Qwen2.5-VL](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_5_7B_VL_GRPO.ipynb) - Vision GSPO
71
+ * [Gemma 3 (4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\)-Vision-GRPO.ipynb) - Vision GSPO
72
+
73
+ ### Embedding models:
74
+
75
+ * [EmbeddingGemma (300M)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/EmbeddingGemma_\(300M\).ipynb) - new
76
+ * [Qwen3-Embedding 4B](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_Embedding_\(4B\).ipynb) - new
77
+ * [Qwen3-Embedding 0.6B](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_Embedding_\(0_6B\).ipynb) - new
78
+ * [BGE M3](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/BGE_M3.ipynb) - new
79
+ * [ModernBERT-large](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/bert_classification.ipynb) - new
80
+ * [All-MiniLM-L6-v2](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/All_MiniLM_L6_v2.ipynb) - new
81
+ * [GTE ModernBert](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/ModernBert.ipynb) - new
82
+
83
+ ### Large LLMs:
84
+
85
+ **Notebooks for large models:** These exceed Colab’s free 15 GB VRAM tier. With Colab’s new 80 GB GPUs, you can fine-tune 120B parameter models.
86
+
87
+ {% hint style="info" %}
88
+ Colab subscription or credits are required. We **don't** earn anything from these notebooks.
89
+ {% endhint %}
90
+
91
+ * [gpt-oss-20b (500K context)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt_oss_\(20B\)_500K_Context_Fine_tuning.ipynb) - new
92
+ * [GLM-4.7-Flash](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/GLM_Flash_A100\(80GB\).ipynb) - new
93
+ * [Qwen3-30B-A3B](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_MoE.ipynb) -
94
+ * [Nemotron-3-Nano-30B-A3B LoRA notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Nemotron-3-Nano-30B-A3B_A100.ipynb)&#x20;
95
+ * [NeMo Gym Sudoku GRPO notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/NeMo-Gym-Sudoku.ipynb)
96
+ * [NeMo Gym Multi Environment GRPO Notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/NeMo-Gym-Multi-Environment.ipynb)
97
+ * [gpt-oss-120b](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-\(120B\)_A100-Fine-tuning.ipynb)
98
+ * [Qwen3 (32B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(32B\)_A100-Reasoning-Conversational.ipynb)
99
+ * [Llama 3.3 (70B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.3_\(70B\)_A100-Conversational.ipynb)
100
+ * [Gemma 3 (27B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(27B\)_A100-Conversational.ipynb)&#x20;
101
+ * [Baidu ERNIE 4.5 VL (28B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/ERNIE_4_5_VL_28B_A3B_PT_Vision.ipynb) - new
102
+
103
+ ### Other important notebooks:
104
+
105
+ * [**Customer support agent**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Granite4.0.ipynb)
106
+ * [Mistral Ministral 3](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_\(3B\)_Reinforcement_Learning_Sudoku_Game.ipynb) - new (solving sodoku)
107
+ * [Deploy on LM Studio ](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\)-LMStudio.ipynb)- new
108
+ * [Quantization-Aware Training](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(4B\)_Instruct-QAT.ipynb) (QAT) - new
109
+ * [Phone Deployment ](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(0_6B\)-Phone_Deployment.ipynb)- new
110
+ * [Reason before **Tool Calling** notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\).ipynb) - new
111
+ * [Mobile Actions notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\)-Mobile-Actions.ipynb) - new
112
+ * [**Automatic Kernel Creation**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-\(20B\)-GRPO.ipynb) with RL
113
+ * [**ModernBERT-large**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/bert_classification.ipynb) **- new** Aug 19
114
+ * [**Synthetic Data Generation Llama 3.2 (3B)**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Meta_Synthetic_Data_Llama3_2_\(3B\).ipynb)
115
+ * [gpt-oss-20b (500K context)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt_oss_\(20B\)_500K_Context_Fine_tuning.ipynb) - new (A100)
116
+ * [**Tool Calling**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_Coder_\(1.5B\)-Tool_Calling.ipynb)
117
+ * [Mistral v0.3 Instruct (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-Conversational.ipynb)
118
+ * [Ollama](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_\(8B\)-Ollama.ipynb)
119
+ * [ORPO](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_\(8B\)-ORPO.ipynb)
120
+ * [Continued Pretraining](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-CPT.ipynb)
121
+ * [DPO Zephyr](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Zephyr_\(7B\)-DPO.ipynb)
122
+ * [***Inference only***](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_\(8B\)-Inference.ipynb)
123
+ * [Llama 3 (8B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_\(8B\)-Alpaca.ipynb)
124
+
125
+ ### Specific use-case notebooks:
126
+
127
+ * [Phone Deployment ](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(0_6B\)-Phone_Deployment.ipynb)- new
128
+ * [Deploy on LM Studio ](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\)-LMStudio.ipynb)- new
129
+ * [Reason before **Tool Calling**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\).ipynb) - new
130
+ * [Mobile Actions](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/FunctionGemma_\(270M\)-Mobile-Actions.ipynb) - new
131
+ * [**Customer support agent**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Granite4.0.ipynb)
132
+ * [Quantization-Aware Training](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(4B\)_Instruct-QAT.ipynb) (QAT) - new
133
+ * [**Automatic Kernel Creation**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-\(20B\)-GRPO.ipynb) with RL **- new**
134
+ * [DPO Zephyr](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Zephyr_\(7B\)-DPO.ipynb)
135
+ * [BERT - Text Classification](https://colab.research.google.com/github/timothelaborie/text_classification_scripts/blob/main/unsloth_classification.ipynb) - (AutoModelForSequenceClassification)
136
+ * [Ollama](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_\(8B\)-Ollama.ipynb)
137
+ * [**Tool Calling**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_Coder_\(1.5B\)-Tool_Calling.ipynb)
138
+ * [Continued Pretraining (CPT)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-CPT.ipynb)
139
+ * [Multiple Datasets](https://colab.research.google.com/drive/1njCCbE1YVal9xC83hjdo2hiGItpY_D6t?usp=sharing) by Flail
140
+ * [KTO](https://colab.research.google.com/drive/1MRgGtLWuZX4ypSfGguFgC-IblTvO2ivM?usp=sharing) by Jeffrey
141
+ * [Inference chat UI](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Unsloth_Studio.ipynb)
142
+ * [Conversational](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_\(1B_and_3B\)-Conversational.ipynb)
143
+ * [ChatML](https://colab.research.google.com/drive/15F1xyn8497_dUbxZP4zWmPZ3PJx1Oymv?usp=sharing)
144
+ * [Text Completion](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_\(7B\)-Text_Completion.ipynb)
145
+
146
+ ### Rest of notebooks:
147
+
148
+ * [Qwen2.5 (3B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_\(3B\)-GRPO.ipynb)
149
+ * [Gemma 2 (9B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_\(9B\)-Alpaca.ipynb)
150
+ * [Mistral NeMo (12B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_Nemo_\(12B\)-Alpaca.ipynb)
151
+ * [Phi-3.5 (mini)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb)
152
+ * [Phi-3 (medium)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3_Medium-Conversational.ipynb)
153
+ * [Gemma 2 (2B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_\(2B\)-Alpaca.ipynb)
154
+ * [Qwen 2.5 Coder (14B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_Coder_\(14B\)-Conversational.ipynb)
155
+ * [Mistral Small (22B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_Small_\(22B\)-Alpaca.ipynb)
156
+ * [TinyLlama](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/TinyLlama_\(1.1B\)-Alpaca.ipynb)
157
+ * [CodeGemma (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/CodeGemma_\(7B\)-Conversational.ipynb)
158
+ * [Mistral v0.3 (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-Alpaca.ipynb)
159
+ * [Qwen2 (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_\(7B\)-Alpaca.ipynb)
160
+
161
+ ## Kaggle notebooks
162
+
163
+ #### Standard notebooks:
164
+
165
+ * [**gpt-oss (20B)**](https://www.kaggle.com/notebooks/welcome?src=https://github.com/unslothai/notebooks/blob/main/nb/Kaggle-gpt-oss-\(20B\)-Fine-tuning.ipynb\&accelerator=nvidiaTeslaT4) **- new**
166
+ * [Gemma 3n (E4B)](https://www.kaggle.com/code/danielhanchen/gemma-3n-4b-multimodal-finetuning-inference)
167
+ * [Qwen3 (14B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Qwen3_\(14B\).ipynb)
168
+ * [Magistral-2509 (24B)](https://www.kaggle.com/notebooks/welcome?src=https://github.com/unslothai/notebooks/blob/main/nb/Kaggle-Magistral_\(24B\)-Reasoning-Conversational.ipynb\&accelerator=nvidiaTeslaT4) - new
169
+ * [Gemma 3 (4B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Gemma3_\(4B\).ipynb)
170
+ * [Phi-4 (14B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Phi_4-Conversational.ipynb)
171
+ * [Llama 3.1 (8B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3.1_\(8B\)-Alpaca.ipynb)
172
+ * [Llama 3.2 (1B + 3B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3.2_\(1B_and_3B\)-Conversational.ipynb)
173
+ * [Qwen 2.5 (7B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Qwen2.5_\(7B\)-Alpaca.ipynb)
174
+
175
+ #### GRPO (Reasoning) notebooks:
176
+
177
+ * [**Qwen2.5-VL**](https://www.kaggle.com/notebooks/welcome?src=https://github.com/unslothai/notebooks/blob/main/nb/Kaggle-Qwen2_5_7B_VL_GRPO.ipynb\&accelerator=nvidiaTeslaT4) - Vision GRPO - new
178
+ * [Qwen3 (4B)](https://www.kaggle.com/notebooks/welcome?src=https://github.com/unslothai/notebooks/blob/main/nb/Kaggle-Qwen3_\(4B\)-GRPO.ipynb\&accelerator=nvidiaTeslaT4)
179
+ * [Gemma 3 (1B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Gemma3_\(1B\)-GRPO.ipynb)
180
+ * [Llama 3.1 (8B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3.1_\(8B\)-GRPO.ipynb)
181
+ * [Phi-4 (14B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Phi_4_\(14B\)-GRPO.ipynb)
182
+ * [Qwen 2.5 (3B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Qwen2.5_\(3B\)-GRPO.ipynb)
183
+
184
+ #### Text-to-Speech (TTS) notebooks:
185
+
186
+ * [Sesame-CSM (1B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Sesame_CSM_\(1B\)-TTS.ipynb)
187
+ * [Orpheus-TTS (3B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Orpheus_\(3B\)-TTS.ipynb)
188
+ * [Whisper Large V3](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Whisper.ipynb) – Speech-to-Text
189
+ * [Llasa-TTS (1B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llasa_TTS_\(1B\).ipynb)
190
+ * [Spark-TTS (0.5B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Spark_TTS_\(0_5B\).ipynb)
191
+ * [Oute-TTS (1B)](https://w.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Oute_TTS_\(1B\).ipynb)
192
+
193
+ #### Vision (Multimodal) notebooks:
194
+
195
+ * [Llama 3.2 Vision (11B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3.2_\(11B\)-Vision.ipynb)
196
+ * [Qwen 2.5-VL (7B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Qwen2.5_VL_\(7B\)-Vision.ipynb)
197
+ * [Pixtral (12B) 2409](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Pixtral_\(12B\)-Vision.ipynb)
198
+
199
+ #### Specific use-case notebooks:
200
+
201
+ * [Tool Calling](https://www.kaggle.com/notebooks/welcome?src=https://github.com/unslothai/notebooks/blob/main/nb/Kaggle-Qwen2.5_Coder_\(1.5B\)-Tool_Calling.ipynb\&accelerator=nvidiaTeslaT4)
202
+ * [ORPO](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3_\(8B\)-ORPO.ipynb)
203
+ * [Continued Pretraining](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Mistral_v0.3_\(7B\)-CPT.ipynb)
204
+ * [DPO Zephyr](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Zephyr_\(7B\)-DPO.ipynb)
205
+ * [Inference only](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3.1_\(8B\)-Inference.ipynb)
206
+ * [Ollama](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Llama3_\(8B\)-Ollama.ipynb)
207
+ * [Text Completion](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Mistral_\(7B\)-Text_Completion.ipynb)
208
+ * [CodeForces-cot (Reasoning)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-CodeForces-cot-Finetune_for_Reasoning_on_CodeForces.ipynb)
209
+ * [Unsloth Studio (chat UI)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Unsloth_Studio.ipynb)
210
+
211
+ #### Rest of notebooks:
212
+
213
+ * [Gemma 2 (9B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Gemma2_\(9B\)-Alpaca.ipynb)
214
+ * [Gemma 2 (2B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Gemma2_\(2B\)-Alpaca.ipynb)
215
+ * [CodeGemma (7B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-CodeGemma_\(7B\)-Conversational.ipynb)
216
+ * [Mistral NeMo (12B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Mistral_Nemo_\(12B\)-Alpaca.ipynb)
217
+ * [Mistral Small (22B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-Mistral_Small_\(22B\)-Alpaca.ipynb)
218
+ * [TinyLlama (1.1B)](https://www.kaggle.com/notebooks/welcome?src=https%3A%2F%2Fgithub.com%2Funslothai/notebooks/blob/main/nb/Kaggle-TinyLlama_\(1.1B\)-Alpaca.ipynb)
219
+
220
+ To view a complete list of all our Kaggle notebooks, [click here](https://github.com/unslothai/notebooks#-kaggle-notebooks).
221
+
222
+ {% hint style="info" %}
223
+ Feel free to contribute to the notebooks by visiting our [repo](https://github.com/unslothai/notebooks)!
@@ -0,0 +1,110 @@
1
+ # Unsloth Docs
2
+ Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.
3
+
4
+ At Unsloth, our mission is to make AI as accurate and accessible as possible. Train and deploy DeepSeek, gpt-oss, Llama, TTS, Qwen, Gemma LLMs 2x faster with 70% less VRAM.
5
+
6
+ Our docs will guide you through running & training your own model locally.
7
+
8
+ Get started
9
+ Our GitHub
10
+
11
+ Cover
12
+ Faster MoE is here!
13
+
14
+ Train MoE LLMs 12x faster with less VRAM.
15
+
16
+ Cover
17
+ GLM-5
18
+
19
+ Run the new SOTA open model.
20
+
21
+ Cover
22
+ Qwen3-Coder-Next
23
+
24
+ Run & fine-tune the new 80B coding model.
25
+
26
+ Cover
27
+ Kimi K2.5
28
+
29
+ Run the SOTA open model locally.
30
+
31
+ Cover
32
+ GLM-4.7-Flash
33
+
34
+ Run & fine-tune the powerful 30B model.
35
+
36
+ Cover
37
+ Embedding Fine-tuning
38
+
39
+ You can now train embedding models!
40
+
41
+ 🧬
42
+ Fine-tuning Guide
43
+ 📒
44
+ Unsloth Notebooks
45
+ 🔮
46
+ All Our Models
47
+ 🚀
48
+ LLM Tutorials Directory
49
+ 🦥 Why Unsloth?
50
+ We directly collab with teams behind gpt-oss, Qwen3, Llama 4, Mistral, Gemma 1–3 and Phi-4, where we’ve fixed critical bugs that greatly improvecy.
51
+
52
+ Unsloth streamlines local training, evaluation, and deployment with Ollama, llama.cpp and vLLM.
53
+
54
+ Unsloth is the only training framework to support models like: vision, TTS, embedding, RL while remaining customizable with flexible chat templates, dataset formatting and ready-to-use notebooks.
55
+
56
+ ⭐ Key Features
57
+ Supports full-finetuning, pretraining, 4-bit, 16-bit and 8-bit training.
58
+
59
+ Supports all types models: TTS, embedding, multimodal, and more.
60
+
61
+ Most efficient reinforcement learning (RL) library, using 80% less VRAM. Supports GRPO, GSPO etc.
62
+
63
+ 0% loss in accuracy - no quantization or approximation methods - all exact.
64
+
65
+ MultiGPU works already but a much better version is coming!
66
+
67
+ Unsloth supports Linux, Windows, WSL, NVIDIA and AMD & Intel. See: Unsloth Requirements
68
+
69
+ Quickstart
70
+ Install locally with pip (recommended) for Linux or WSL devices:
71
+
72
+
73
+ Copy
74
+ pip install unsloth
75
+ Use our official Docker image: unsloth/unsloth. Read our Docker guide.
76
+
77
+ For Windows install instructions, see here.
78
+
79
+ 📥
80
+ Installation
81
+ Neleases
82
+ What is Fine-tuning and RL? Why?
83
+ Fine-tuning an LLM customizes its behavior, enhances domain knowledge, and optimizes performance for specific tasks. By fine-tuning a pre-trained model (e.g. Llama-3.1-8B) on a dataset, you can:
84
+
85
+ Update Knowledge: Introduce new domain-specific information.
86
+
87
+ Customize Behavior: Adjust the model’s tone, personality, or response style.
88
+
89
+ Optimize for Tasks: Improve accuracy and relevance for specific use cases.
90
+
91
+ Reinforcement Learning (RL) is where an "agent" learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties.
92
+
93
+ Action: What the model generates (e.g. a sentence).
94
+
95
+ Reward: A signal indicating how good or bad the model's action was (e.g. did the response follow instructions? was it helpful?).
96
+
97
+ Environment: The scenario or task the model is working on (e.g. answering a user’s question).
98
+
99
+ Example fine-tuning or RL use-cases:
100
+
101
+ Enables LLMs to predict if a headline impacts a company positively or negatively.n use historical customer interactions for more accurate and custom responses.
102
+
103
+ Fine-tune LLM on legal texts for contract analysis, case law research, and compliance.
104
+
105
+ You can think of a fine-tuned model as a specialized agent designed to do specific tasks more effectively and efficiently. Fine-tuning can replicate all of RAG's capabilities, but not vice versa.
106
+
107
+ 🤔
108
+ FAQ + Is Fine-tuning Right For Me?
109
+ 💡
110
+ Reinforcement Learning Guide