@synsci/cli-darwin-x64 1.1.97 → 1.1.99

Files changed (1549)
  1. package/bin/synsc +0 -0
  2. package/package.json +1 -1
  3. package/bin/skills/accelerate/SKILL.md +0 -332
  4. package/bin/skills/accelerate/references/custom-plugins.md +0 -453
  5. package/bin/skills/accelerate/references/megatron-integration.md +0 -489
  6. package/bin/skills/accelerate/references/performance.md +0 -525
  7. package/bin/skills/adaptyv/SKILL.md +0 -114
  8. package/bin/skills/adaptyv/reference/api_reference.md +0 -308
  9. package/bin/skills/adaptyv/reference/examples.md +0 -913
  10. package/bin/skills/adaptyv/reference/experiments.md +0 -360
  11. package/bin/skills/adaptyv/reference/protein_optimization.md +0 -637
  12. package/bin/skills/aeon/SKILL.md +0 -374
  13. package/bin/skills/aeon/references/anomaly_detection.md +0 -154
  14. package/bin/skills/aeon/references/classification.md +0 -144
  15. package/bin/skills/aeon/references/clustering.md +0 -123
  16. package/bin/skills/aeon/references/datasets_benchmarking.md +0 -387
  17. package/bin/skills/aeon/references/distances.md +0 -256
  18. package/bin/skills/aeon/references/forecasting.md +0 -140
  19. package/bin/skills/aeon/references/networks.md +0 -289
  20. package/bin/skills/aeon/references/regression.md +0 -118
  21. package/bin/skills/aeon/references/segmentation.md +0 -163
  22. package/bin/skills/aeon/references/similarity_search.md +0 -187
  23. package/bin/skills/aeon/references/transformations.md +0 -246
  24. package/bin/skills/alphafold-database/SKILL.md +0 -513
  25. package/bin/skills/alphafold-database/references/api_reference.md +0 -423
  26. package/bin/skills/anndata/SKILL.md +0 -400
  27. package/bin/skills/anndata/references/best_practices.md +0 -525
  28. package/bin/skills/anndata/references/concatenation.md +0 -396
  29. package/bin/skills/anndata/references/data_structure.md +0 -314
  30. package/bin/skills/anndata/references/io_operations.md +0 -404
  31. package/bin/skills/anndata/references/manipulation.md +0 -516
  32. package/bin/skills/arboreto/SKILL.md +0 -243
  33. package/bin/skills/arboreto/references/algorithms.md +0 -138
  34. package/bin/skills/arboreto/references/basic_inference.md +0 -151
  35. package/bin/skills/arboreto/references/distributed_computing.md +0 -242
  36. package/bin/skills/arboreto/scripts/basic_grn_inference.py +0 -97
  37. package/bin/skills/astropy/SKILL.md +0 -331
  38. package/bin/skills/astropy/references/coordinates.md +0 -273
  39. package/bin/skills/astropy/references/cosmology.md +0 -307
  40. package/bin/skills/astropy/references/fits.md +0 -396
  41. package/bin/skills/astropy/references/tables.md +0 -489
  42. package/bin/skills/astropy/references/time.md +0 -404
  43. package/bin/skills/astropy/references/units.md +0 -178
  44. package/bin/skills/astropy/references/wcs_and_other_modules.md +0 -373
  45. package/bin/skills/audiocraft/SKILL.md +0 -564
  46. package/bin/skills/audiocraft/references/advanced-usage.md +0 -666
  47. package/bin/skills/audiocraft/references/troubleshooting.md +0 -504
  48. package/bin/skills/autogpt/SKILL.md +0 -403
  49. package/bin/skills/autogpt/references/advanced-usage.md +0 -535
  50. package/bin/skills/autogpt/references/troubleshooting.md +0 -420
  51. package/bin/skills/awq/SKILL.md +0 -310
  52. package/bin/skills/awq/references/advanced-usage.md +0 -324
  53. package/bin/skills/awq/references/troubleshooting.md +0 -344
  54. package/bin/skills/axolotl/SKILL.md +0 -158
  55. package/bin/skills/axolotl/references/api.md +0 -5548
  56. package/bin/skills/axolotl/references/dataset-formats.md +0 -1029
  57. package/bin/skills/axolotl/references/index.md +0 -15
  58. package/bin/skills/axolotl/references/other.md +0 -3563
  59. package/bin/skills/benchling-integration/SKILL.md +0 -480
  60. package/bin/skills/benchling-integration/references/api_endpoints.md +0 -883
  61. package/bin/skills/benchling-integration/references/authentication.md +0 -379
  62. package/bin/skills/benchling-integration/references/sdk_reference.md +0 -774
  63. package/bin/skills/bigcode-evaluation-harness/SKILL.md +0 -405
  64. package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +0 -393
  65. package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +0 -424
  66. package/bin/skills/bigcode-evaluation-harness/references/issues.md +0 -394
  67. package/bin/skills/biopython/SKILL.md +0 -443
  68. package/bin/skills/biopython/references/advanced.md +0 -577
  69. package/bin/skills/biopython/references/alignment.md +0 -362
  70. package/bin/skills/biopython/references/blast.md +0 -455
  71. package/bin/skills/biopython/references/databases.md +0 -484
  72. package/bin/skills/biopython/references/phylogenetics.md +0 -566
  73. package/bin/skills/biopython/references/sequence_io.md +0 -285
  74. package/bin/skills/biopython/references/structure.md +0 -564
  75. package/bin/skills/biorxiv-database/SKILL.md +0 -483
  76. package/bin/skills/biorxiv-database/references/api_reference.md +0 -280
  77. package/bin/skills/biorxiv-database/scripts/biorxiv_search.py +0 -445
  78. package/bin/skills/bioservices/SKILL.md +0 -361
  79. package/bin/skills/bioservices/references/identifier_mapping.md +0 -685
  80. package/bin/skills/bioservices/references/services_reference.md +0 -636
  81. package/bin/skills/bioservices/references/workflow_patterns.md +0 -811
  82. package/bin/skills/bioservices/scripts/batch_id_converter.py +0 -347
  83. package/bin/skills/bioservices/scripts/compound_cross_reference.py +0 -378
  84. package/bin/skills/bioservices/scripts/pathway_analysis.py +0 -309
  85. package/bin/skills/bioservices/scripts/protein_analysis_workflow.py +0 -408
  86. package/bin/skills/bitsandbytes/SKILL.md +0 -411
  87. package/bin/skills/bitsandbytes/references/memory-optimization.md +0 -521
  88. package/bin/skills/bitsandbytes/references/qlora-training.md +0 -521
  89. package/bin/skills/bitsandbytes/references/quantization-formats.md +0 -447
  90. package/bin/skills/blip-2/SKILL.md +0 -564
  91. package/bin/skills/blip-2/references/advanced-usage.md +0 -680
  92. package/bin/skills/blip-2/references/troubleshooting.md +0 -526
  93. package/bin/skills/brenda-database/SKILL.md +0 -719
  94. package/bin/skills/brenda-database/references/api_reference.md +0 -537
  95. package/bin/skills/brenda-database/scripts/brenda_queries.py +0 -844
  96. package/bin/skills/brenda-database/scripts/brenda_visualization.py +0 -772
  97. package/bin/skills/brenda-database/scripts/enzyme_pathway_builder.py +0 -1053
  98. package/bin/skills/cellxgene-census/SKILL.md +0 -511
  99. package/bin/skills/cellxgene-census/references/census_schema.md +0 -182
  100. package/bin/skills/cellxgene-census/references/common_patterns.md +0 -351
  101. package/bin/skills/chembl-database/SKILL.md +0 -389
  102. package/bin/skills/chembl-database/references/api_reference.md +0 -272
  103. package/bin/skills/chembl-database/scripts/example_queries.py +0 -278
  104. package/bin/skills/chroma/SKILL.md +0 -406
  105. package/bin/skills/chroma/references/integration.md +0 -38
  106. package/bin/skills/cirq/SKILL.md +0 -346
  107. package/bin/skills/cirq/references/building.md +0 -307
  108. package/bin/skills/cirq/references/experiments.md +0 -572
  109. package/bin/skills/cirq/references/hardware.md +0 -515
  110. package/bin/skills/cirq/references/noise.md +0 -515
  111. package/bin/skills/cirq/references/simulation.md +0 -350
  112. package/bin/skills/cirq/references/transformation.md +0 -416
  113. package/bin/skills/citation-management/SKILL.md +0 -1109
  114. package/bin/skills/citation-management/assets/bibtex_template.bib +0 -264
  115. package/bin/skills/citation-management/assets/citation_checklist.md +0 -386
  116. package/bin/skills/citation-management/references/bibtex_formatting.md +0 -908
  117. package/bin/skills/citation-management/references/citation_validation.md +0 -794
  118. package/bin/skills/citation-management/references/google_scholar_search.md +0 -725
  119. package/bin/skills/citation-management/references/metadata_extraction.md +0 -870
  120. package/bin/skills/citation-management/references/pubmed_search.md +0 -839
  121. package/bin/skills/citation-management/scripts/doi_to_bibtex.py +0 -182
  122. package/bin/skills/citation-management/scripts/extract_metadata.py +0 -570
  123. package/bin/skills/citation-management/scripts/format_bibtex.py +0 -349
  124. package/bin/skills/citation-management/scripts/search_google_scholar.py +0 -251
  125. package/bin/skills/citation-management/scripts/search_pubmed.py +0 -348
  126. package/bin/skills/citation-management/scripts/validate_citations.py +0 -494
  127. package/bin/skills/clinical-decision-support/README.md +0 -129
  128. package/bin/skills/clinical-decision-support/SKILL.md +0 -506
  129. package/bin/skills/clinical-decision-support/assets/biomarker_report_template.tex +0 -380
  130. package/bin/skills/clinical-decision-support/assets/clinical_pathway_template.tex +0 -222
  131. package/bin/skills/clinical-decision-support/assets/cohort_analysis_template.tex +0 -359
  132. package/bin/skills/clinical-decision-support/assets/color_schemes.tex +0 -149
  133. package/bin/skills/clinical-decision-support/assets/example_gbm_cohort.md +0 -208
  134. package/bin/skills/clinical-decision-support/assets/recommendation_strength_guide.md +0 -328
  135. package/bin/skills/clinical-decision-support/assets/treatment_recommendation_template.tex +0 -529
  136. package/bin/skills/clinical-decision-support/references/biomarker_classification.md +0 -719
  137. package/bin/skills/clinical-decision-support/references/clinical_decision_algorithms.md +0 -604
  138. package/bin/skills/clinical-decision-support/references/evidence_synthesis.md +0 -840
  139. package/bin/skills/clinical-decision-support/references/outcome_analysis.md +0 -640
  140. package/bin/skills/clinical-decision-support/references/patient_cohort_analysis.md +0 -427
  141. package/bin/skills/clinical-decision-support/references/treatment_recommendations.md +0 -521
  142. package/bin/skills/clinical-decision-support/scripts/biomarker_classifier.py +0 -383
  143. package/bin/skills/clinical-decision-support/scripts/build_decision_tree.py +0 -417
  144. package/bin/skills/clinical-decision-support/scripts/create_cohort_tables.py +0 -509
  145. package/bin/skills/clinical-decision-support/scripts/generate_survival_analysis.py +0 -441
  146. package/bin/skills/clinical-decision-support/scripts/validate_cds_document.py +0 -326
  147. package/bin/skills/clinical-reports/IMPLEMENTATION_SUMMARY.md +0 -641
  148. package/bin/skills/clinical-reports/README.md +0 -236
  149. package/bin/skills/clinical-reports/SKILL.md +0 -1127
  150. package/bin/skills/clinical-reports/assets/case_report_template.md +0 -352
  151. package/bin/skills/clinical-reports/assets/clinical_trial_csr_template.md +0 -353
  152. package/bin/skills/clinical-reports/assets/clinical_trial_sae_template.md +0 -359
  153. package/bin/skills/clinical-reports/assets/consult_note_template.md +0 -305
  154. package/bin/skills/clinical-reports/assets/discharge_summary_template.md +0 -453
  155. package/bin/skills/clinical-reports/assets/hipaa_compliance_checklist.md +0 -395
  156. package/bin/skills/clinical-reports/assets/history_physical_template.md +0 -305
  157. package/bin/skills/clinical-reports/assets/lab_report_template.md +0 -309
  158. package/bin/skills/clinical-reports/assets/pathology_report_template.md +0 -249
  159. package/bin/skills/clinical-reports/assets/quality_checklist.md +0 -338
  160. package/bin/skills/clinical-reports/assets/radiology_report_template.md +0 -318
  161. package/bin/skills/clinical-reports/assets/soap_note_template.md +0 -253
  162. package/bin/skills/clinical-reports/references/case_report_guidelines.md +0 -570
  163. package/bin/skills/clinical-reports/references/clinical_trial_reporting.md +0 -693
  164. package/bin/skills/clinical-reports/references/data_presentation.md +0 -530
  165. package/bin/skills/clinical-reports/references/diagnostic_reports_standards.md +0 -629
  166. package/bin/skills/clinical-reports/references/medical_terminology.md +0 -588
  167. package/bin/skills/clinical-reports/references/patient_documentation.md +0 -744
  168. package/bin/skills/clinical-reports/references/peer_review_standards.md +0 -585
  169. package/bin/skills/clinical-reports/references/regulatory_compliance.md +0 -577
  170. package/bin/skills/clinical-reports/scripts/check_deidentification.py +0 -332
  171. package/bin/skills/clinical-reports/scripts/compliance_checker.py +0 -78
  172. package/bin/skills/clinical-reports/scripts/extract_clinical_data.py +0 -97
  173. package/bin/skills/clinical-reports/scripts/format_adverse_events.py +0 -97
  174. package/bin/skills/clinical-reports/scripts/generate_report_template.py +0 -149
  175. package/bin/skills/clinical-reports/scripts/terminology_validator.py +0 -126
  176. package/bin/skills/clinical-reports/scripts/validate_case_report.py +0 -323
  177. package/bin/skills/clinical-reports/scripts/validate_trial_report.py +0 -88
  178. package/bin/skills/clinicaltrials-database/SKILL.md +0 -507
  179. package/bin/skills/clinicaltrials-database/references/api_reference.md +0 -358
  180. package/bin/skills/clinicaltrials-database/scripts/query_clinicaltrials.py +0 -215
  181. package/bin/skills/clinpgx-database/SKILL.md +0 -638
  182. package/bin/skills/clinpgx-database/references/api_reference.md +0 -757
  183. package/bin/skills/clinpgx-database/scripts/query_clinpgx.py +0 -518
  184. package/bin/skills/clinvar-database/SKILL.md +0 -362
  185. package/bin/skills/clinvar-database/references/api_reference.md +0 -227
  186. package/bin/skills/clinvar-database/references/clinical_significance.md +0 -218
  187. package/bin/skills/clinvar-database/references/data_formats.md +0 -358
  188. package/bin/skills/clip/SKILL.md +0 -253
  189. package/bin/skills/clip/references/applications.md +0 -207
  190. package/bin/skills/cobrapy/SKILL.md +0 -463
  191. package/bin/skills/cobrapy/references/api_quick_reference.md +0 -655
  192. package/bin/skills/cobrapy/references/workflows.md +0 -593
  193. package/bin/skills/colab-finetuning/SKILL.md +0 -153
  194. package/bin/skills/colab-finetuning/references/bridge-setup.md +0 -68
  195. package/bin/skills/colab-finetuning/references/gpu-tiers.md +0 -54
  196. package/bin/skills/colab-finetuning/references/troubleshooting.md +0 -79
  197. package/bin/skills/constitutional-ai/SKILL.md +0 -290
  198. package/bin/skills/cosmic-database/SKILL.md +0 -336
  199. package/bin/skills/cosmic-database/references/cosmic_data_reference.md +0 -220
  200. package/bin/skills/cosmic-database/scripts/download_cosmic.py +0 -231
  201. package/bin/skills/crewai/SKILL.md +0 -498
  202. package/bin/skills/crewai/references/flows.md +0 -438
  203. package/bin/skills/crewai/references/tools.md +0 -429
  204. package/bin/skills/crewai/references/troubleshooting.md +0 -480
  205. package/bin/skills/dask/SKILL.md +0 -456
  206. package/bin/skills/dask/references/arrays.md +0 -497
  207. package/bin/skills/dask/references/bags.md +0 -468
  208. package/bin/skills/dask/references/best-practices.md +0 -277
  209. package/bin/skills/dask/references/dataframes.md +0 -368
  210. package/bin/skills/dask/references/futures.md +0 -541
  211. package/bin/skills/dask/references/schedulers.md +0 -504
  212. package/bin/skills/datacommons-client/SKILL.md +0 -255
  213. package/bin/skills/datacommons-client/references/getting_started.md +0 -417
  214. package/bin/skills/datacommons-client/references/node.md +0 -250
  215. package/bin/skills/datacommons-client/references/observation.md +0 -185
  216. package/bin/skills/datacommons-client/references/resolve.md +0 -246
  217. package/bin/skills/datamol/SKILL.md +0 -706
  218. package/bin/skills/datamol/references/conformers_module.md +0 -131
  219. package/bin/skills/datamol/references/core_api.md +0 -130
  220. package/bin/skills/datamol/references/descriptors_viz.md +0 -195
  221. package/bin/skills/datamol/references/fragments_scaffolds.md +0 -174
  222. package/bin/skills/datamol/references/io_module.md +0 -109
  223. package/bin/skills/datamol/references/reactions_data.md +0 -218
  224. package/bin/skills/deepchem/SKILL.md +0 -597
  225. package/bin/skills/deepchem/references/api_reference.md +0 -303
  226. package/bin/skills/deepchem/references/workflows.md +0 -491
  227. package/bin/skills/deepchem/scripts/graph_neural_network.py +0 -338
  228. package/bin/skills/deepchem/scripts/predict_solubility.py +0 -224
  229. package/bin/skills/deepchem/scripts/transfer_learning.py +0 -375
  230. package/bin/skills/deepspeed/SKILL.md +0 -141
  231. package/bin/skills/deepspeed/references/08.md +0 -17
  232. package/bin/skills/deepspeed/references/09.md +0 -173
  233. package/bin/skills/deepspeed/references/2020.md +0 -378
  234. package/bin/skills/deepspeed/references/2023.md +0 -279
  235. package/bin/skills/deepspeed/references/assets.md +0 -179
  236. package/bin/skills/deepspeed/references/index.md +0 -35
  237. package/bin/skills/deepspeed/references/mii.md +0 -118
  238. package/bin/skills/deepspeed/references/other.md +0 -1191
  239. package/bin/skills/deepspeed/references/tutorials.md +0 -6554
  240. package/bin/skills/deeptools/SKILL.md +0 -531
  241. package/bin/skills/deeptools/assets/quick_reference.md +0 -58
  242. package/bin/skills/deeptools/references/effective_genome_sizes.md +0 -116
  243. package/bin/skills/deeptools/references/normalization_methods.md +0 -410
  244. package/bin/skills/deeptools/references/tools_reference.md +0 -533
  245. package/bin/skills/deeptools/references/workflows.md +0 -474
  246. package/bin/skills/deeptools/scripts/validate_files.py +0 -195
  247. package/bin/skills/deeptools/scripts/workflow_generator.py +0 -454
  248. package/bin/skills/denario/SKILL.md +0 -215
  249. package/bin/skills/denario/references/examples.md +0 -494
  250. package/bin/skills/denario/references/installation.md +0 -213
  251. package/bin/skills/denario/references/llm_configuration.md +0 -265
  252. package/bin/skills/denario/references/research_pipeline.md +0 -471
  253. package/bin/skills/diffdock/SKILL.md +0 -483
  254. package/bin/skills/diffdock/assets/batch_template.csv +0 -4
  255. package/bin/skills/diffdock/assets/custom_inference_config.yaml +0 -90
  256. package/bin/skills/diffdock/references/confidence_and_limitations.md +0 -182
  257. package/bin/skills/diffdock/references/parameters_reference.md +0 -163
  258. package/bin/skills/diffdock/references/workflows_examples.md +0 -392
  259. package/bin/skills/diffdock/scripts/analyze_results.py +0 -334
  260. package/bin/skills/diffdock/scripts/prepare_batch_csv.py +0 -254
  261. package/bin/skills/diffdock/scripts/setup_check.py +0 -278
  262. package/bin/skills/dnanexus-integration/SKILL.md +0 -383
  263. package/bin/skills/dnanexus-integration/references/app-development.md +0 -247
  264. package/bin/skills/dnanexus-integration/references/configuration.md +0 -646
  265. package/bin/skills/dnanexus-integration/references/data-operations.md +0 -400
  266. package/bin/skills/dnanexus-integration/references/job-execution.md +0 -412
  267. package/bin/skills/dnanexus-integration/references/python-sdk.md +0 -523
  268. package/bin/skills/document-skills/docx/LICENSE.txt +0 -30
  269. package/bin/skills/document-skills/docx/SKILL.md +0 -233
  270. package/bin/skills/document-skills/docx/docx-js.md +0 -350
  271. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +0 -1499
  272. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +0 -146
  273. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +0 -1085
  274. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +0 -11
  275. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +0 -3081
  276. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +0 -23
  277. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +0 -185
  278. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +0 -287
  279. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +0 -1676
  280. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +0 -28
  281. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +0 -144
  282. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +0 -174
  283. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +0 -25
  284. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +0 -18
  285. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +0 -59
  286. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +0 -56
  287. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +0 -195
  288. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +0 -582
  289. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +0 -25
  290. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +0 -4439
  291. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +0 -570
  292. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +0 -509
  293. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +0 -12
  294. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +0 -108
  295. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +0 -96
  296. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +0 -3646
  297. package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +0 -116
  298. package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +0 -42
  299. package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +0 -50
  300. package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +0 -49
  301. package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +0 -33
  302. package/bin/skills/document-skills/docx/ooxml/schemas/mce/mc.xsd +0 -75
  303. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2010.xsd +0 -560
  304. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2012.xsd +0 -67
  305. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2018.xsd +0 -14
  306. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd +0 -20
  307. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd +0 -13
  308. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +0 -4
  309. package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd +0 -8
  310. package/bin/skills/document-skills/docx/ooxml/scripts/pack.py +0 -159
  311. package/bin/skills/document-skills/docx/ooxml/scripts/unpack.py +0 -29
  312. package/bin/skills/document-skills/docx/ooxml/scripts/validate.py +0 -69
  313. package/bin/skills/document-skills/docx/ooxml/scripts/validation/__init__.py +0 -15
  314. package/bin/skills/document-skills/docx/ooxml/scripts/validation/base.py +0 -951
  315. package/bin/skills/document-skills/docx/ooxml/scripts/validation/docx.py +0 -274
  316. package/bin/skills/document-skills/docx/ooxml/scripts/validation/pptx.py +0 -315
  317. package/bin/skills/document-skills/docx/ooxml/scripts/validation/redlining.py +0 -279
  318. package/bin/skills/document-skills/docx/ooxml.md +0 -610
  319. package/bin/skills/document-skills/docx/scripts/__init__.py +0 -1
  320. package/bin/skills/document-skills/docx/scripts/document.py +0 -1276
  321. package/bin/skills/document-skills/docx/scripts/templates/comments.xml +0 -3
  322. package/bin/skills/document-skills/docx/scripts/templates/commentsExtended.xml +0 -3
  323. package/bin/skills/document-skills/docx/scripts/templates/commentsExtensible.xml +0 -3
  324. package/bin/skills/document-skills/docx/scripts/templates/commentsIds.xml +0 -3
  325. package/bin/skills/document-skills/docx/scripts/templates/people.xml +0 -3
  326. package/bin/skills/document-skills/docx/scripts/utilities.py +0 -374
  327. package/bin/skills/document-skills/pdf/LICENSE.txt +0 -30
  328. package/bin/skills/document-skills/pdf/SKILL.md +0 -330
  329. package/bin/skills/document-skills/pdf/forms.md +0 -205
  330. package/bin/skills/document-skills/pdf/reference.md +0 -612
  331. package/bin/skills/document-skills/pdf/scripts/check_bounding_boxes.py +0 -70
  332. package/bin/skills/document-skills/pdf/scripts/check_bounding_boxes_test.py +0 -226
  333. package/bin/skills/document-skills/pdf/scripts/check_fillable_fields.py +0 -12
  334. package/bin/skills/document-skills/pdf/scripts/convert_pdf_to_images.py +0 -35
  335. package/bin/skills/document-skills/pdf/scripts/create_validation_image.py +0 -41
  336. package/bin/skills/document-skills/pdf/scripts/extract_form_field_info.py +0 -152
  337. package/bin/skills/document-skills/pdf/scripts/fill_fillable_fields.py +0 -114
  338. package/bin/skills/document-skills/pdf/scripts/fill_pdf_form_with_annotations.py +0 -108
  339. package/bin/skills/document-skills/pptx/LICENSE.txt +0 -30
  340. package/bin/skills/document-skills/pptx/SKILL.md +0 -520
  341. package/bin/skills/document-skills/pptx/html2pptx.md +0 -625
  342. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +0 -1499
  343. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +0 -146
  344. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +0 -1085
  345. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +0 -11
  346. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +0 -3081
  347. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +0 -23
  348. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +0 -185
  349. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +0 -287
  350. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +0 -1676
  351. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +0 -28
  352. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +0 -144
  353. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +0 -174
  354. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +0 -25
  355. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +0 -18
  356. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +0 -59
  357. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +0 -56
  358. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +0 -195
  359. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +0 -582
  360. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +0 -25
  361. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +0 -4439
  362. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +0 -570
  363. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +0 -509
  364. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +0 -12
  365. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +0 -108
  366. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +0 -96
  367. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +0 -3646
  368. package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +0 -116
  369. package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +0 -42
  370. package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +0 -50
  371. package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +0 -49
  372. package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +0 -33
  373. package/bin/skills/document-skills/pptx/ooxml/schemas/mce/mc.xsd +0 -75
  374. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2010.xsd +0 -560
  375. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2012.xsd +0 -67
  376. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2018.xsd +0 -14
  377. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cex-2018.xsd +0 -20
  378. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cid-2016.xsd +0 -13
  379. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +0 -4
  380. package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-symex-2015.xsd +0 -8
  381. package/bin/skills/document-skills/pptx/ooxml/scripts/pack.py +0 -159
  382. package/bin/skills/document-skills/pptx/ooxml/scripts/unpack.py +0 -29
  383. package/bin/skills/document-skills/pptx/ooxml/scripts/validate.py +0 -69
  384. package/bin/skills/document-skills/pptx/ooxml/scripts/validation/__init__.py +0 -15
  385. package/bin/skills/document-skills/pptx/ooxml/scripts/validation/base.py +0 -951
  386. package/bin/skills/document-skills/pptx/ooxml/scripts/validation/docx.py +0 -274
  387. package/bin/skills/document-skills/pptx/ooxml/scripts/validation/pptx.py +0 -315
  388. package/bin/skills/document-skills/pptx/ooxml/scripts/validation/redlining.py +0 -279
  389. package/bin/skills/document-skills/pptx/ooxml.md +0 -427
  390. package/bin/skills/document-skills/pptx/scripts/html2pptx.js +0 -979
  391. package/bin/skills/document-skills/pptx/scripts/inventory.py +0 -1020
  392. package/bin/skills/document-skills/pptx/scripts/rearrange.py +0 -231
  393. package/bin/skills/document-skills/pptx/scripts/replace.py +0 -385
  394. package/bin/skills/document-skills/pptx/scripts/thumbnail.py +0 -450
  395. package/bin/skills/document-skills/xlsx/LICENSE.txt +0 -30
  396. package/bin/skills/document-skills/xlsx/SKILL.md +0 -325
  397. package/bin/skills/document-skills/xlsx/recalc.py +0 -178
  398. package/bin/skills/drugbank-database/SKILL.md +0 -190
  399. package/bin/skills/drugbank-database/references/chemical-analysis.md +0 -590
  400. package/bin/skills/drugbank-database/references/data-access.md +0 -242
  401. package/bin/skills/drugbank-database/references/drug-queries.md +0 -386
  402. package/bin/skills/drugbank-database/references/interactions.md +0 -425
  403. package/bin/skills/drugbank-database/references/targets-pathways.md +0 -518
  404. package/bin/skills/drugbank-database/scripts/drugbank_helper.py +0 -350
  405. package/bin/skills/dspy/SKILL.md +0 -590
  406. package/bin/skills/dspy/references/examples.md +0 -663
  407. package/bin/skills/dspy/references/modules.md +0 -475
  408. package/bin/skills/dspy/references/optimizers.md +0 -566
  409. package/bin/skills/ena-database/SKILL.md +0 -204
  410. package/bin/skills/ena-database/references/api_reference.md +0 -490
  411. package/bin/skills/ensembl-database/SKILL.md +0 -311
  412. package/bin/skills/ensembl-database/references/api_endpoints.md +0 -346
  413. package/bin/skills/ensembl-database/scripts/ensembl_query.py +0 -427
  414. package/bin/skills/esm/SKILL.md +0 -306
  415. package/bin/skills/esm/references/esm-c-api.md +0 -583
  416. package/bin/skills/esm/references/esm3-api.md +0 -452
  417. package/bin/skills/esm/references/forge-api.md +0 -657
  418. package/bin/skills/esm/references/workflows.md +0 -685
  419. package/bin/skills/etetoolkit/SKILL.md +0 -623
  420. package/bin/skills/etetoolkit/references/api_reference.md +0 -583
  421. package/bin/skills/etetoolkit/references/visualization.md +0 -783
  422. package/bin/skills/etetoolkit/references/workflows.md +0 -774
  423. package/bin/skills/etetoolkit/scripts/quick_visualize.py +0 -214
  424. package/bin/skills/etetoolkit/scripts/tree_operations.py +0 -229
  425. package/bin/skills/exploratory-data-analysis/SKILL.md +0 -446
  426. package/bin/skills/exploratory-data-analysis/assets/report_template.md +0 -196
  427. package/bin/skills/exploratory-data-analysis/references/bioinformatics_genomics_formats.md +0 -664
  428. package/bin/skills/exploratory-data-analysis/references/chemistry_molecular_formats.md +0 -664
  429. package/bin/skills/exploratory-data-analysis/references/general_scientific_formats.md +0 -518
  430. package/bin/skills/exploratory-data-analysis/references/microscopy_imaging_formats.md +0 -620
  431. package/bin/skills/exploratory-data-analysis/references/proteomics_metabolomics_formats.md +0 -517
  432. package/bin/skills/exploratory-data-analysis/references/spectroscopy_analytical_formats.md +0 -633
  433. package/bin/skills/exploratory-data-analysis/scripts/eda_analyzer.py +0 -547
  434. package/bin/skills/faiss/SKILL.md +0 -221
  435. package/bin/skills/faiss/references/index_types.md +0 -280
  436. package/bin/skills/fda-database/SKILL.md +0 -518
  437. package/bin/skills/fda-database/references/animal_veterinary.md +0 -377
  438. package/bin/skills/fda-database/references/api_basics.md +0 -687
  439. package/bin/skills/fda-database/references/devices.md +0 -632
  440. package/bin/skills/fda-database/references/drugs.md +0 -468
  441. package/bin/skills/fda-database/references/foods.md +0 -374
  442. package/bin/skills/fda-database/references/other.md +0 -472
  443. package/bin/skills/fda-database/scripts/fda_examples.py +0 -335
  444. package/bin/skills/fda-database/scripts/fda_query.py +0 -440
  445. package/bin/skills/fireworks-ai/SKILL.md +0 -665
  446. package/bin/skills/flash-attention/SKILL.md +0 -367
  447. package/bin/skills/flash-attention/references/benchmarks.md +0 -215
  448. package/bin/skills/flash-attention/references/transformers-integration.md +0 -293
  449. package/bin/skills/flowio/SKILL.md +0 -608
  450. package/bin/skills/flowio/references/api_reference.md +0 -372
  451. package/bin/skills/fluidsim/SKILL.md +0 -349
  452. package/bin/skills/fluidsim/references/advanced_features.md +0 -398
  453. package/bin/skills/fluidsim/references/installation.md +0 -68
  454. package/bin/skills/fluidsim/references/output_analysis.md +0 -283
  455. package/bin/skills/fluidsim/references/parameters.md +0 -198
  456. package/bin/skills/fluidsim/references/simulation_workflow.md +0 -172
  457. package/bin/skills/fluidsim/references/solvers.md +0 -94
  458. package/bin/skills/fred-economic-data/SKILL.md +0 -433
  459. package/bin/skills/fred-economic-data/references/api_basics.md +0 -212
  460. package/bin/skills/fred-economic-data/references/categories.md +0 -442
  461. package/bin/skills/fred-economic-data/references/geofred.md +0 -588
  462. package/bin/skills/fred-economic-data/references/releases.md +0 -642
  463. package/bin/skills/fred-economic-data/references/series.md +0 -584
  464. package/bin/skills/fred-economic-data/references/sources.md +0 -423
  465. package/bin/skills/fred-economic-data/references/tags.md +0 -485
  466. package/bin/skills/fred-economic-data/scripts/fred_examples.py +0 -354
  467. package/bin/skills/fred-economic-data/scripts/fred_query.py +0 -590
  468. package/bin/skills/gene-database/SKILL.md +0 -179
  469. package/bin/skills/gene-database/references/api_reference.md +0 -404
  470. package/bin/skills/gene-database/references/common_workflows.md +0 -428
  471. package/bin/skills/gene-database/scripts/batch_gene_lookup.py +0 -298
  472. package/bin/skills/gene-database/scripts/fetch_gene_data.py +0 -277
  473. package/bin/skills/gene-database/scripts/query_gene.py +0 -251
  474. package/bin/skills/generate-image/SKILL.md +0 -178
  475. package/bin/skills/generate-image/scripts/generate_image.py +0 -254
  476. package/bin/skills/geniml/SKILL.md +0 -318
  477. package/bin/skills/geniml/references/bedspace.md +0 -127
  478. package/bin/skills/geniml/references/consensus_peaks.md +0 -238
  479. package/bin/skills/geniml/references/region2vec.md +0 -90
  480. package/bin/skills/geniml/references/scembed.md +0 -197
  481. package/bin/skills/geniml/references/utilities.md +0 -385
  482. package/bin/skills/geo-database/SKILL.md +0 -815
  483. package/bin/skills/geo-database/references/geo_reference.md +0 -829
  484. package/bin/skills/geopandas/SKILL.md +0 -251
  485. package/bin/skills/geopandas/references/crs-management.md +0 -243
  486. package/bin/skills/geopandas/references/data-io.md +0 -165
  487. package/bin/skills/geopandas/references/data-structures.md +0 -70
  488. package/bin/skills/geopandas/references/geometric-operations.md +0 -221
  489. package/bin/skills/geopandas/references/spatial-analysis.md +0 -184
  490. package/bin/skills/geopandas/references/visualization.md +0 -243
  491. package/bin/skills/get-available-resources/SKILL.md +0 -277
  492. package/bin/skills/get-available-resources/scripts/detect_resources.py +0 -401
  493. package/bin/skills/gget/SKILL.md +0 -871
  494. package/bin/skills/gget/references/database_info.md +0 -300
  495. package/bin/skills/gget/references/module_reference.md +0 -467
  496. package/bin/skills/gget/references/workflows.md +0 -814
  497. package/bin/skills/gget/scripts/batch_sequence_analysis.py +0 -191
  498. package/bin/skills/gget/scripts/enrichment_pipeline.py +0 -235
  499. package/bin/skills/gget/scripts/gene_analysis.py +0 -161
  500. package/bin/skills/gguf/SKILL.md +0 -427
  501. package/bin/skills/gguf/references/advanced-usage.md +0 -504
  502. package/bin/skills/gguf/references/troubleshooting.md +0 -442
  503. package/bin/skills/gptq/SKILL.md +0 -450
  504. package/bin/skills/gptq/references/calibration.md +0 -337
  505. package/bin/skills/gptq/references/integration.md +0 -129
  506. package/bin/skills/gptq/references/troubleshooting.md +0 -95
  507. package/bin/skills/groq/SKILL.md +0 -347
  508. package/bin/skills/grpo-rl-training/README.md +0 -97
  509. package/bin/skills/grpo-rl-training/SKILL.md +0 -572
  510. package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +0 -393
  511. package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +0 -228
  512. package/bin/skills/gtars/SKILL.md +0 -285
  513. package/bin/skills/gtars/references/cli.md +0 -222
  514. package/bin/skills/gtars/references/coverage.md +0 -172
  515. package/bin/skills/gtars/references/overlap.md +0 -156
  516. package/bin/skills/gtars/references/python-api.md +0 -211
  517. package/bin/skills/gtars/references/refget.md +0 -147
  518. package/bin/skills/gtars/references/tokenizers.md +0 -103
  519. package/bin/skills/guidance/SKILL.md +0 -572
  520. package/bin/skills/guidance/references/backends.md +0 -554
  521. package/bin/skills/guidance/references/constraints.md +0 -674
  522. package/bin/skills/guidance/references/examples.md +0 -767
  523. package/bin/skills/gwas-database/SKILL.md +0 -608
  524. package/bin/skills/gwas-database/references/api_reference.md +0 -793
  525. package/bin/skills/histolab/SKILL.md +0 -678
  526. package/bin/skills/histolab/references/filters_preprocessing.md +0 -514
  527. package/bin/skills/histolab/references/slide_management.md +0 -172
  528. package/bin/skills/histolab/references/tile_extraction.md +0 -421
  529. package/bin/skills/histolab/references/tissue_masks.md +0 -251
  530. package/bin/skills/histolab/references/visualization.md +0 -547
  531. package/bin/skills/hmdb-database/SKILL.md +0 -196
  532. package/bin/skills/hmdb-database/references/hmdb_data_fields.md +0 -267
  533. package/bin/skills/hqq/SKILL.md +0 -445
  534. package/bin/skills/hqq/references/advanced-usage.md +0 -528
  535. package/bin/skills/hqq/references/troubleshooting.md +0 -503
  536. package/bin/skills/hugging-face-cli/SKILL.md +0 -191
  537. package/bin/skills/hugging-face-cli/references/commands.md +0 -954
  538. package/bin/skills/hugging-face-cli/references/examples.md +0 -374
  539. package/bin/skills/hugging-face-datasets/SKILL.md +0 -547
  540. package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +0 -239
  541. package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +0 -196
  542. package/bin/skills/hugging-face-datasets/examples/training_examples.json +0 -176
  543. package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +0 -522
  544. package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +0 -844
  545. package/bin/skills/hugging-face-datasets/templates/chat.json +0 -55
  546. package/bin/skills/hugging-face-datasets/templates/classification.json +0 -62
  547. package/bin/skills/hugging-face-datasets/templates/completion.json +0 -51
  548. package/bin/skills/hugging-face-datasets/templates/custom.json +0 -75
  549. package/bin/skills/hugging-face-datasets/templates/qa.json +0 -54
  550. package/bin/skills/hugging-face-datasets/templates/tabular.json +0 -81
  551. package/bin/skills/hugging-face-evaluation/SKILL.md +0 -656
  552. package/bin/skills/hugging-face-evaluation/examples/.env.example +0 -7
  553. package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +0 -382
  554. package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +0 -141
  555. package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +0 -135
  556. package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +0 -50
  557. package/bin/skills/hugging-face-evaluation/requirements.txt +0 -20
  558. package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +0 -1374
  559. package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +0 -104
  560. package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +0 -317
  561. package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +0 -303
  562. package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +0 -98
  563. package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +0 -331
  564. package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +0 -206
  565. package/bin/skills/hugging-face-jobs/SKILL.md +0 -1040
  566. package/bin/skills/hugging-face-jobs/index.html +0 -216
  567. package/bin/skills/hugging-face-jobs/references/hardware_guide.md +0 -336
  568. package/bin/skills/hugging-face-jobs/references/hub_saving.md +0 -352
  569. package/bin/skills/hugging-face-jobs/references/token_usage.md +0 -546
  570. package/bin/skills/hugging-face-jobs/references/troubleshooting.md +0 -475
  571. package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +0 -718
  572. package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +0 -546
  573. package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +0 -587
  574. package/bin/skills/hugging-face-model-trainer/SKILL.md +0 -710
  575. package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +0 -296
  576. package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +0 -283
  577. package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +0 -364
  578. package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +0 -371
  579. package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +0 -189
  580. package/bin/skills/hugging-face-model-trainer/references/training_methods.md +0 -150
  581. package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +0 -203
  582. package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +0 -282
  583. package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +0 -424
  584. package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +0 -417
  585. package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +0 -150
  586. package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +0 -106
  587. package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +0 -89
  588. package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +0 -122
  589. package/bin/skills/hugging-face-paper-publisher/SKILL.md +0 -627
  590. package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +0 -327
  591. package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +0 -216
  592. package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +0 -508
  593. package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +0 -299
  594. package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +0 -358
  595. package/bin/skills/hugging-face-paper-publisher/templates/modern.md +0 -319
  596. package/bin/skills/hugging-face-paper-publisher/templates/standard.md +0 -201
  597. package/bin/skills/hugging-face-tool-builder/SKILL.md +0 -115
  598. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +0 -57
  599. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +0 -40
  600. package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +0 -57
  601. package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +0 -230
  602. package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +0 -96
  603. package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +0 -188
  604. package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +0 -171
  605. package/bin/skills/hugging-face-trackio/.claude-plugin/plugin.json +0 -19
  606. package/bin/skills/hugging-face-trackio/SKILL.md +0 -65
  607. package/bin/skills/hugging-face-trackio/references/logging_metrics.md +0 -206
  608. package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +0 -223
  609. package/bin/skills/huggingface-tokenizers/SKILL.md +0 -516
  610. package/bin/skills/huggingface-tokenizers/references/algorithms.md +0 -653
  611. package/bin/skills/huggingface-tokenizers/references/integration.md +0 -637
  612. package/bin/skills/huggingface-tokenizers/references/pipeline.md +0 -723
  613. package/bin/skills/huggingface-tokenizers/references/training.md +0 -565
  614. package/bin/skills/hypogenic/SKILL.md +0 -655
  615. package/bin/skills/hypogenic/references/config_template.yaml +0 -150
  616. package/bin/skills/hypothesis-generation/SKILL.md +0 -293
  617. package/bin/skills/hypothesis-generation/assets/FORMATTING_GUIDE.md +0 -672
  618. package/bin/skills/hypothesis-generation/assets/hypothesis_generation.sty +0 -307
  619. package/bin/skills/hypothesis-generation/assets/hypothesis_report_template.tex +0 -572
  620. package/bin/skills/hypothesis-generation/references/experimental_design_patterns.md +0 -329
  621. package/bin/skills/hypothesis-generation/references/hypothesis_quality_criteria.md +0 -198
  622. package/bin/skills/hypothesis-generation/references/literature_search_strategies.md +0 -622
  623. package/bin/skills/imaging-data-commons/SKILL.md +0 -1182
  624. package/bin/skills/imaging-data-commons/references/bigquery_guide.md +0 -556
  625. package/bin/skills/imaging-data-commons/references/cli_guide.md +0 -272
  626. package/bin/skills/imaging-data-commons/references/cloud_storage_guide.md +0 -333
  627. package/bin/skills/imaging-data-commons/references/dicomweb_guide.md +0 -399
  628. package/bin/skills/infographics/SKILL.md +0 -563
  629. package/bin/skills/infographics/references/color_palettes.md +0 -496
  630. package/bin/skills/infographics/references/design_principles.md +0 -636
  631. package/bin/skills/infographics/references/infographic_types.md +0 -907
  632. package/bin/skills/infographics/scripts/generate_infographic.py +0 -234
  633. package/bin/skills/infographics/scripts/generate_infographic_ai.py +0 -1290
  634. package/bin/skills/instructor/SKILL.md +0 -740
  635. package/bin/skills/instructor/references/examples.md +0 -107
  636. package/bin/skills/instructor/references/providers.md +0 -70
  637. package/bin/skills/instructor/references/validation.md +0 -606
  638. package/bin/skills/iso-13485-certification/SKILL.md +0 -680
  639. package/bin/skills/iso-13485-certification/assets/templates/procedures/CAPA-procedure-template.md +0 -453
  640. package/bin/skills/iso-13485-certification/assets/templates/procedures/document-control-procedure-template.md +0 -567
  641. package/bin/skills/iso-13485-certification/assets/templates/quality-manual-template.md +0 -521
  642. package/bin/skills/iso-13485-certification/references/gap-analysis-checklist.md +0 -568
  643. package/bin/skills/iso-13485-certification/references/iso-13485-requirements.md +0 -610
  644. package/bin/skills/iso-13485-certification/references/mandatory-documents.md +0 -606
  645. package/bin/skills/iso-13485-certification/references/quality-manual-guide.md +0 -688
  646. package/bin/skills/iso-13485-certification/scripts/gap_analyzer.py +0 -440
  647. package/bin/skills/kegg-database/SKILL.md +0 -377
  648. package/bin/skills/kegg-database/references/kegg_reference.md +0 -326
  649. package/bin/skills/kegg-database/scripts/kegg_api.py +0 -251
  650. package/bin/skills/knowledge-distillation/SKILL.md +0 -458
  651. package/bin/skills/knowledge-distillation/references/minillm.md +0 -334
  652. package/bin/skills/labarchive-integration/SKILL.md +0 -268
  653. package/bin/skills/labarchive-integration/references/api_reference.md +0 -342
  654. package/bin/skills/labarchive-integration/references/authentication_guide.md +0 -357
  655. package/bin/skills/labarchive-integration/references/integrations.md +0 -425
  656. package/bin/skills/labarchive-integration/scripts/entry_operations.py +0 -334
  657. package/bin/skills/labarchive-integration/scripts/notebook_operations.py +0 -269
  658. package/bin/skills/labarchive-integration/scripts/setup_config.py +0 -205
  659. package/bin/skills/lambda-labs/SKILL.md +0 -545
  660. package/bin/skills/lambda-labs/references/advanced-usage.md +0 -611
  661. package/bin/skills/lambda-labs/references/troubleshooting.md +0 -530
  662. package/bin/skills/lamindb/SKILL.md +0 -390
  663. package/bin/skills/lamindb/references/annotation-validation.md +0 -513
  664. package/bin/skills/lamindb/references/core-concepts.md +0 -380
  665. package/bin/skills/lamindb/references/data-management.md +0 -433
  666. package/bin/skills/lamindb/references/integrations.md +0 -642
  667. package/bin/skills/lamindb/references/ontologies.md +0 -497
  668. package/bin/skills/lamindb/references/setup-deployment.md +0 -733
  669. package/bin/skills/langchain/SKILL.md +0 -480
  670. package/bin/skills/langchain/references/agents.md +0 -499
  671. package/bin/skills/langchain/references/integration.md +0 -562
  672. package/bin/skills/langchain/references/rag.md +0 -600
  673. package/bin/skills/langsmith/SKILL.md +0 -422
  674. package/bin/skills/langsmith/references/advanced-usage.md +0 -548
  675. package/bin/skills/langsmith/references/troubleshooting.md +0 -537
  676. package/bin/skills/latchbio-integration/SKILL.md +0 -353
  677. package/bin/skills/latchbio-integration/references/data-management.md +0 -427
  678. package/bin/skills/latchbio-integration/references/resource-configuration.md +0 -429
  679. package/bin/skills/latchbio-integration/references/verified-workflows.md +0 -487
  680. package/bin/skills/latchbio-integration/references/workflow-creation.md +0 -254
  681. package/bin/skills/latex-posters/README.md +0 -417
  682. package/bin/skills/latex-posters/SKILL.md +0 -1602
  683. package/bin/skills/latex-posters/assets/baposter_template.tex +0 -257
  684. package/bin/skills/latex-posters/assets/beamerposter_template.tex +0 -244
  685. package/bin/skills/latex-posters/assets/poster_quality_checklist.md +0 -358
  686. package/bin/skills/latex-posters/assets/tikzposter_template.tex +0 -251
  687. package/bin/skills/latex-posters/references/latex_poster_packages.md +0 -745
  688. package/bin/skills/latex-posters/references/poster_content_guide.md +0 -748
  689. package/bin/skills/latex-posters/references/poster_design_principles.md +0 -806
  690. package/bin/skills/latex-posters/references/poster_layout_design.md +0 -900
  691. package/bin/skills/latex-posters/scripts/review_poster.sh +0 -214
  692. package/bin/skills/literature-review/SKILL.md +0 -641
  693. package/bin/skills/literature-review/assets/review_template.md +0 -412
  694. package/bin/skills/literature-review/references/citation_styles.md +0 -166
  695. package/bin/skills/literature-review/references/database_strategies.md +0 -455
  696. package/bin/skills/literature-review/scripts/generate_pdf.py +0 -184
  697. package/bin/skills/literature-review/scripts/search_databases.py +0 -310
  698. package/bin/skills/literature-review/scripts/verify_citations.py +0 -218
  699. package/bin/skills/litgpt/SKILL.md +0 -469
  700. package/bin/skills/litgpt/references/custom-models.md +0 -568
  701. package/bin/skills/litgpt/references/distributed-training.md +0 -451
  702. package/bin/skills/litgpt/references/supported-models.md +0 -336
  703. package/bin/skills/litgpt/references/training-recipes.md +0 -619
  704. package/bin/skills/llama-cpp/SKILL.md +0 -258
  705. package/bin/skills/llama-cpp/references/optimization.md +0 -89
  706. package/bin/skills/llama-cpp/references/quantization.md +0 -213
  707. package/bin/skills/llama-cpp/references/server.md +0 -125
  708. package/bin/skills/llama-factory/SKILL.md +0 -80
  709. package/bin/skills/llama-factory/references/_images.md +0 -23
  710. package/bin/skills/llama-factory/references/advanced.md +0 -1055
  711. package/bin/skills/llama-factory/references/getting_started.md +0 -349
  712. package/bin/skills/llama-factory/references/index.md +0 -19
  713. package/bin/skills/llama-factory/references/other.md +0 -31
  714. package/bin/skills/llamaguard/SKILL.md +0 -337
  715. package/bin/skills/llamaindex/SKILL.md +0 -569
  716. package/bin/skills/llamaindex/references/agents.md +0 -83
  717. package/bin/skills/llamaindex/references/data_connectors.md +0 -108
  718. package/bin/skills/llamaindex/references/query_engines.md +0 -406
  719. package/bin/skills/llava/SKILL.md +0 -304
  720. package/bin/skills/llava/references/training.md +0 -197
  721. package/bin/skills/llm-as-judge-evaluation/SKILL.md +0 -385
  722. package/bin/skills/llm-as-judge-evaluation/references/pairwise-comparison.md +0 -95
  723. package/bin/skills/llm-as-judge-evaluation/references/scoring-rubrics.md +0 -169
  724. package/bin/skills/lm-evaluation-harness/SKILL.md +0 -490
  725. package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +0 -490
  726. package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +0 -488
  727. package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +0 -602
  728. package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +0 -519
  729. package/bin/skills/long-context/SKILL.md +0 -536
  730. package/bin/skills/long-context/references/extension_methods.md +0 -468
  731. package/bin/skills/long-context/references/fine_tuning.md +0 -611
  732. package/bin/skills/long-context/references/rope.md +0 -402
  733. package/bin/skills/mamba/SKILL.md +0 -260
  734. package/bin/skills/mamba/references/architecture-details.md +0 -206
  735. package/bin/skills/mamba/references/benchmarks.md +0 -255
  736. package/bin/skills/mamba/references/training-guide.md +0 -388
  737. package/bin/skills/market-research-reports/SKILL.md +0 -904
  738. package/bin/skills/market-research-reports/assets/FORMATTING_GUIDE.md +0 -428
  739. package/bin/skills/market-research-reports/assets/market_report_template.tex +0 -1380
  740. package/bin/skills/market-research-reports/assets/market_research.sty +0 -564
  741. package/bin/skills/market-research-reports/references/data_analysis_patterns.md +0 -548
  742. package/bin/skills/market-research-reports/references/report_structure_guide.md +0 -999
  743. package/bin/skills/market-research-reports/references/visual_generation_guide.md +0 -1077
  744. package/bin/skills/market-research-reports/scripts/generate_market_visuals.py +0 -472
  745. package/bin/skills/markitdown/INSTALLATION_GUIDE.md +0 -318
  746. package/bin/skills/markitdown/LICENSE.txt +0 -22
  747. package/bin/skills/markitdown/OPENROUTER_INTEGRATION.md +0 -359
  748. package/bin/skills/markitdown/QUICK_REFERENCE.md +0 -309
  749. package/bin/skills/markitdown/README.md +0 -184
  750. package/bin/skills/markitdown/SKILL.md +0 -486
  751. package/bin/skills/markitdown/SKILL_SUMMARY.md +0 -307
  752. package/bin/skills/markitdown/assets/example_usage.md +0 -463
  753. package/bin/skills/markitdown/references/api_reference.md +0 -399
  754. package/bin/skills/markitdown/references/file_formats.md +0 -542
  755. package/bin/skills/markitdown/scripts/batch_convert.py +0 -195
  756. package/bin/skills/markitdown/scripts/convert_literature.py +0 -262
  757. package/bin/skills/markitdown/scripts/convert_with_ai.py +0 -224
  758. package/bin/skills/matchms/SKILL.md +0 -203
  759. package/bin/skills/matchms/references/filtering.md +0 -288
  760. package/bin/skills/matchms/references/importing_exporting.md +0 -416
  761. package/bin/skills/matchms/references/similarity.md +0 -380
  762. package/bin/skills/matchms/references/workflows.md +0 -647
  763. package/bin/skills/matlab/SKILL.md +0 -376
  764. package/bin/skills/matlab/references/data-import-export.md +0 -479
  765. package/bin/skills/matlab/references/executing-scripts.md +0 -444
  766. package/bin/skills/matlab/references/graphics-visualization.md +0 -579
  767. package/bin/skills/matlab/references/mathematics.md +0 -553
  768. package/bin/skills/matlab/references/matrices-arrays.md +0 -349
  769. package/bin/skills/matlab/references/octave-compatibility.md +0 -544
  770. package/bin/skills/matlab/references/programming.md +0 -672
  771. package/bin/skills/matlab/references/python-integration.md +0 -433
  772. package/bin/skills/matplotlib/SKILL.md +0 -361
  773. package/bin/skills/matplotlib/references/api_reference.md +0 -412
  774. package/bin/skills/matplotlib/references/common_issues.md +0 -563
  775. package/bin/skills/matplotlib/references/plot_types.md +0 -476
  776. package/bin/skills/matplotlib/references/styling_guide.md +0 -589
  777. package/bin/skills/matplotlib/scripts/plot_template.py +0 -401
  778. package/bin/skills/matplotlib/scripts/style_configurator.py +0 -409
  779. package/bin/skills/medchem/SKILL.md +0 -406
  780. package/bin/skills/medchem/references/api_guide.md +0 -600
  781. package/bin/skills/medchem/references/rules_catalog.md +0 -604
  782. package/bin/skills/medchem/scripts/filter_molecules.py +0 -418
  783. package/bin/skills/megatron-core/SKILL.md +0 -366
  784. package/bin/skills/megatron-core/references/benchmarks.md +0 -249
  785. package/bin/skills/megatron-core/references/parallelism-guide.md +0 -404
  786. package/bin/skills/megatron-core/references/production-examples.md +0 -473
  787. package/bin/skills/megatron-core/references/training-recipes.md +0 -547
  788. package/bin/skills/metabolomics-workbench-database/SKILL.md +0 -259
  789. package/bin/skills/metabolomics-workbench-database/references/api_reference.md +0 -494
  790. package/bin/skills/miles/SKILL.md +0 -315
  791. package/bin/skills/miles/references/api-reference.md +0 -141
  792. package/bin/skills/miles/references/troubleshooting.md +0 -352
  793. package/bin/skills/ml-paper-writing/SKILL.md +0 -937
  794. package/bin/skills/ml-paper-writing/references/checklists.md +0 -361
  795. package/bin/skills/ml-paper-writing/references/citation-workflow.md +0 -562
  796. package/bin/skills/ml-paper-writing/references/reviewer-guidelines.md +0 -367
  797. package/bin/skills/ml-paper-writing/references/sources.md +0 -159
  798. package/bin/skills/ml-paper-writing/references/writing-guide.md +0 -476
  799. package/bin/skills/ml-paper-writing/templates/README.md +0 -251
  800. package/bin/skills/ml-paper-writing/templates/aaai2026/README.md +0 -534
  801. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +0 -144
  802. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +0 -952
  803. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +0 -111
  804. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +0 -1493
  805. package/bin/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +0 -315
  806. package/bin/skills/ml-paper-writing/templates/acl/README.md +0 -50
  807. package/bin/skills/ml-paper-writing/templates/acl/acl.sty +0 -312
  808. package/bin/skills/ml-paper-writing/templates/acl/acl_latex.tex +0 -377
  809. package/bin/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +0 -101
  810. package/bin/skills/ml-paper-writing/templates/acl/acl_natbib.bst +0 -1940
  811. package/bin/skills/ml-paper-writing/templates/acl/anthology.bib.txt +0 -26
  812. package/bin/skills/ml-paper-writing/templates/acl/custom.bib +0 -70
  813. package/bin/skills/ml-paper-writing/templates/acl/formatting.md +0 -326
  814. package/bin/skills/ml-paper-writing/templates/colm2025/README.md +0 -3
  815. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +0 -11
  816. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +0 -1440
  817. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  818. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +0 -218
  819. package/bin/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +0 -305
  820. package/bin/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +0 -485
  821. package/bin/skills/ml-paper-writing/templates/colm2025/math_commands.tex +0 -508
  822. package/bin/skills/ml-paper-writing/templates/colm2025/natbib.sty +0 -1246
  823. package/bin/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +0 -485
  824. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +0 -24
  825. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +0 -1440
  826. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  827. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +0 -246
  828. package/bin/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +0 -414
  829. package/bin/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +0 -508
  830. package/bin/skills/ml-paper-writing/templates/iclr2026/natbib.sty +0 -1246
  831. package/bin/skills/ml-paper-writing/templates/icml2026/algorithm.sty +0 -79
  832. package/bin/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +0 -201
  833. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.bib +0 -75
  834. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  835. package/bin/skills/ml-paper-writing/templates/icml2026/example_paper.tex +0 -662
  836. package/bin/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +0 -864
  837. package/bin/skills/ml-paper-writing/templates/icml2026/icml2026.bst +0 -1443
  838. package/bin/skills/ml-paper-writing/templates/icml2026/icml2026.sty +0 -767
  839. package/bin/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  840. package/bin/skills/ml-paper-writing/templates/neurips2025/Makefile +0 -36
  841. package/bin/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +0 -53
  842. package/bin/skills/ml-paper-writing/templates/neurips2025/main.tex +0 -38
  843. package/bin/skills/ml-paper-writing/templates/neurips2025/neurips.sty +0 -382
  844. package/bin/skills/mlflow/SKILL.md +0 -704
  845. package/bin/skills/mlflow/references/deployment.md +0 -744
  846. package/bin/skills/mlflow/references/model-registry.md +0 -770
  847. package/bin/skills/mlflow/references/tracking.md +0 -680
  848. package/bin/skills/modal/SKILL.md +0 -418
  849. package/bin/skills/modal/references/advanced-patterns.md +0 -695
  850. package/bin/skills/modal/references/examples-catalog.md +0 -423
  851. package/bin/skills/modal/references/troubleshooting.md +0 -494
  852. package/bin/skills/modal-research-gpu/SKILL.md +0 -238
  853. package/bin/skills/model-economics/SKILL.md +0 -238
  854. package/bin/skills/model-merging/SKILL.md +0 -539
  855. package/bin/skills/model-merging/references/evaluation.md +0 -462
  856. package/bin/skills/model-merging/references/examples.md +0 -428
  857. package/bin/skills/model-merging/references/methods.md +0 -352
  858. package/bin/skills/model-pruning/SKILL.md +0 -495
  859. package/bin/skills/model-pruning/references/wanda.md +0 -347
  860. package/bin/skills/moe-training/SKILL.md +0 -526
  861. package/bin/skills/moe-training/references/architectures.md +0 -432
  862. package/bin/skills/moe-training/references/inference.md +0 -348
  863. package/bin/skills/moe-training/references/training.md +0 -425
  864. package/bin/skills/molfeat/SKILL.md +0 -511
  865. package/bin/skills/molfeat/references/api_reference.md +0 -428
  866. package/bin/skills/molfeat/references/available_featurizers.md +0 -333
  867. package/bin/skills/molfeat/references/examples.md +0 -723
  868. package/bin/skills/nanogpt/SKILL.md +0 -290
  869. package/bin/skills/nanogpt/references/architecture.md +0 -382
  870. package/bin/skills/nanogpt/references/data.md +0 -476
  871. package/bin/skills/nanogpt/references/training.md +0 -564
  872. package/bin/skills/nemo-curator/SKILL.md +0 -383
  873. package/bin/skills/nemo-curator/references/deduplication.md +0 -87
  874. package/bin/skills/nemo-curator/references/filtering.md +0 -102
  875. package/bin/skills/nemo-evaluator/SKILL.md +0 -494
  876. package/bin/skills/nemo-evaluator/references/adapter-system.md +0 -340
  877. package/bin/skills/nemo-evaluator/references/configuration.md +0 -447
  878. package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +0 -315
  879. package/bin/skills/nemo-evaluator/references/execution-backends.md +0 -361
  880. package/bin/skills/nemo-guardrails/SKILL.md +0 -297
  881. package/bin/skills/networkx/SKILL.md +0 -437
  882. package/bin/skills/networkx/references/algorithms.md +0 -383
  883. package/bin/skills/networkx/references/generators.md +0 -378
  884. package/bin/skills/networkx/references/graph-basics.md +0 -283
  885. package/bin/skills/networkx/references/io.md +0 -441
  886. package/bin/skills/networkx/references/visualization.md +0 -529
  887. package/bin/skills/neurokit2/SKILL.md +0 -356
  888. package/bin/skills/neurokit2/references/bio_module.md +0 -417
  889. package/bin/skills/neurokit2/references/complexity.md +0 -715
  890. package/bin/skills/neurokit2/references/ecg_cardiac.md +0 -355
  891. package/bin/skills/neurokit2/references/eda.md +0 -497
  892. package/bin/skills/neurokit2/references/eeg.md +0 -506
  893. package/bin/skills/neurokit2/references/emg.md +0 -408
  894. package/bin/skills/neurokit2/references/eog.md +0 -407
  895. package/bin/skills/neurokit2/references/epochs_events.md +0 -471
  896. package/bin/skills/neurokit2/references/hrv.md +0 -480
  897. package/bin/skills/neurokit2/references/ppg.md +0 -413
  898. package/bin/skills/neurokit2/references/rsp.md +0 -510
  899. package/bin/skills/neurokit2/references/signal_processing.md +0 -648
  900. package/bin/skills/neuropixels-analysis/SKILL.md +0 -350
  901. package/bin/skills/neuropixels-analysis/assets/analysis_template.py +0 -271
  902. package/bin/skills/neuropixels-analysis/references/AI_CURATION.md +0 -345
  903. package/bin/skills/neuropixels-analysis/references/ANALYSIS.md +0 -392
  904. package/bin/skills/neuropixels-analysis/references/AUTOMATED_CURATION.md +0 -358
  905. package/bin/skills/neuropixels-analysis/references/MOTION_CORRECTION.md +0 -323
  906. package/bin/skills/neuropixels-analysis/references/PREPROCESSING.md +0 -273
  907. package/bin/skills/neuropixels-analysis/references/QUALITY_METRICS.md +0 -359
  908. package/bin/skills/neuropixels-analysis/references/SPIKE_SORTING.md +0 -339
  909. package/bin/skills/neuropixels-analysis/references/api_reference.md +0 -415
  910. package/bin/skills/neuropixels-analysis/references/plotting_guide.md +0 -454
  911. package/bin/skills/neuropixels-analysis/references/standard_workflow.md +0 -385
  912. package/bin/skills/neuropixels-analysis/scripts/compute_metrics.py +0 -178
  913. package/bin/skills/neuropixels-analysis/scripts/explore_recording.py +0 -168
  914. package/bin/skills/neuropixels-analysis/scripts/export_to_phy.py +0 -79
  915. package/bin/skills/neuropixels-analysis/scripts/neuropixels_pipeline.py +0 -432
  916. package/bin/skills/neuropixels-analysis/scripts/preprocess_recording.py +0 -122
  917. package/bin/skills/neuropixels-analysis/scripts/run_sorting.py +0 -98
  918. package/bin/skills/nnsight/SKILL.md +0 -436
  919. package/bin/skills/nnsight/references/README.md +0 -78
  920. package/bin/skills/nnsight/references/api.md +0 -344
  921. package/bin/skills/nnsight/references/tutorials.md +0 -300
  922. package/bin/skills/offer-k-dense-web/SKILL.md +0 -21
  923. package/bin/skills/omero-integration/SKILL.md +0 -251
  924. package/bin/skills/omero-integration/references/advanced.md +0 -631
  925. package/bin/skills/omero-integration/references/connection.md +0 -369
  926. package/bin/skills/omero-integration/references/data_access.md +0 -544
  927. package/bin/skills/omero-integration/references/image_processing.md +0 -665
  928. package/bin/skills/omero-integration/references/metadata.md +0 -688
  929. package/bin/skills/omero-integration/references/rois.md +0 -648
  930. package/bin/skills/omero-integration/references/scripts.md +0 -637
  931. package/bin/skills/omero-integration/references/tables.md +0 -532
  932. package/bin/skills/openalex-database/SKILL.md +0 -494
  933. package/bin/skills/openalex-database/references/api_guide.md +0 -371
  934. package/bin/skills/openalex-database/references/common_queries.md +0 -381
  935. package/bin/skills/openalex-database/scripts/openalex_client.py +0 -337
  936. package/bin/skills/openalex-database/scripts/query_helpers.py +0 -306
  937. package/bin/skills/openrlhf/SKILL.md +0 -249
  938. package/bin/skills/openrlhf/references/algorithm-comparison.md +0 -404
  939. package/bin/skills/openrlhf/references/custom-rewards.md +0 -530
  940. package/bin/skills/openrlhf/references/hybrid-engine.md +0 -287
  941. package/bin/skills/openrlhf/references/multi-node-training.md +0 -454
  942. package/bin/skills/opentargets-database/SKILL.md +0 -373
  943. package/bin/skills/opentargets-database/references/api_reference.md +0 -249
  944. package/bin/skills/opentargets-database/references/evidence_types.md +0 -306
  945. package/bin/skills/opentargets-database/references/target_annotations.md +0 -401
  946. package/bin/skills/opentargets-database/scripts/query_opentargets.py +0 -403
  947. package/bin/skills/opentrons-integration/SKILL.md +0 -573
  948. package/bin/skills/opentrons-integration/references/api_reference.md +0 -366
  949. package/bin/skills/opentrons-integration/scripts/basic_protocol_template.py +0 -67
  950. package/bin/skills/opentrons-integration/scripts/pcr_setup_template.py +0 -154
  951. package/bin/skills/opentrons-integration/scripts/serial_dilution_template.py +0 -96
  952. package/bin/skills/outlines/SKILL.md +0 -652
  953. package/bin/skills/outlines/references/backends.md +0 -615
  954. package/bin/skills/outlines/references/examples.md +0 -773
  955. package/bin/skills/outlines/references/json_generation.md +0 -652
  956. package/bin/skills/paper-2-web/SKILL.md +0 -491
  957. package/bin/skills/paper-2-web/references/installation.md +0 -141
  958. package/bin/skills/paper-2-web/references/paper2poster.md +0 -346
  959. package/bin/skills/paper-2-web/references/paper2video.md +0 -305
  960. package/bin/skills/paper-2-web/references/paper2web.md +0 -187
  961. package/bin/skills/paper-2-web/references/usage_examples.md +0 -436
  962. package/bin/skills/pathml/SKILL.md +0 -166
  963. package/bin/skills/pathml/references/data_management.md +0 -742
  964. package/bin/skills/pathml/references/graphs.md +0 -653
  965. package/bin/skills/pathml/references/image_loading.md +0 -448
  966. package/bin/skills/pathml/references/machine_learning.md +0 -725
  967. package/bin/skills/pathml/references/multiparametric.md +0 -686
  968. package/bin/skills/pathml/references/preprocessing.md +0 -722
  969. package/bin/skills/pdb-database/SKILL.md +0 -309
  970. package/bin/skills/pdb-database/references/api_reference.md +0 -617
  971. package/bin/skills/peer-review/SKILL.md +0 -702
  972. package/bin/skills/peer-review/references/calibration_guidelines.md +0 -196
  973. package/bin/skills/peer-review/references/common_issues.md +0 -552
  974. package/bin/skills/peer-review/references/paper_mechanics.md +0 -269
  975. package/bin/skills/peer-review/references/reporting_standards.md +0 -290
  976. package/bin/skills/peer-review/references/scoring_rubric.md +0 -239
  977. package/bin/skills/peft/SKILL.md +0 -431
  978. package/bin/skills/peft/references/advanced-usage.md +0 -514
  979. package/bin/skills/peft/references/troubleshooting.md +0 -480
  980. package/bin/skills/pennylane/SKILL.md +0 -226
  981. package/bin/skills/pennylane/references/advanced_features.md +0 -667
  982. package/bin/skills/pennylane/references/devices_backends.md +0 -596
  983. package/bin/skills/pennylane/references/getting_started.md +0 -227
  984. package/bin/skills/pennylane/references/optimization.md +0 -671
  985. package/bin/skills/pennylane/references/quantum_chemistry.md +0 -567
  986. package/bin/skills/pennylane/references/quantum_circuits.md +0 -437
  987. package/bin/skills/pennylane/references/quantum_ml.md +0 -571
  988. package/bin/skills/perplexity-search/SKILL.md +0 -448
  989. package/bin/skills/perplexity-search/assets/.env.example +0 -16
  990. package/bin/skills/perplexity-search/references/model_comparison.md +0 -386
  991. package/bin/skills/perplexity-search/references/openrouter_setup.md +0 -454
  992. package/bin/skills/perplexity-search/references/search_strategies.md +0 -258
  993. package/bin/skills/perplexity-search/scripts/perplexity_search.py +0 -277
  994. package/bin/skills/perplexity-search/scripts/setup_env.py +0 -171
  995. package/bin/skills/phoenix/SKILL.md +0 -475
  996. package/bin/skills/phoenix/references/advanced-usage.md +0 -619
  997. package/bin/skills/phoenix/references/troubleshooting.md +0 -538
  998. package/bin/skills/pinecone/SKILL.md +0 -358
  999. package/bin/skills/pinecone/references/deployment.md +0 -181
  1000. package/bin/skills/plotly/SKILL.md +0 -267
  1001. package/bin/skills/plotly/references/chart-types.md +0 -488
  1002. package/bin/skills/plotly/references/export-interactivity.md +0 -453
  1003. package/bin/skills/plotly/references/graph-objects.md +0 -302
  1004. package/bin/skills/plotly/references/layouts-styling.md +0 -457
  1005. package/bin/skills/plotly/references/plotly-express.md +0 -213
  1006. package/bin/skills/polars/SKILL.md +0 -387
  1007. package/bin/skills/polars/references/best_practices.md +0 -649
  1008. package/bin/skills/polars/references/core_concepts.md +0 -378
  1009. package/bin/skills/polars/references/io_guide.md +0 -557
  1010. package/bin/skills/polars/references/operations.md +0 -602
  1011. package/bin/skills/polars/references/pandas_migration.md +0 -417
  1012. package/bin/skills/polars/references/transformations.md +0 -549
  1013. package/bin/skills/pptx-posters/SKILL.md +0 -410
  1014. package/bin/skills/pptx-posters/assets/poster_html_template.html +0 -257
  1015. package/bin/skills/pptx-posters/assets/poster_quality_checklist.md +0 -358
  1016. package/bin/skills/pptx-posters/references/poster_content_guide.md +0 -748
  1017. package/bin/skills/pptx-posters/references/poster_design_principles.md +0 -806
  1018. package/bin/skills/pptx-posters/references/poster_layout_design.md +0 -900
  1019. package/bin/skills/prime-intellect-lab/README.md +0 -69
  1020. package/bin/skills/prime-intellect-lab/SKILL.md +0 -598
  1021. package/bin/skills/prime-intellect-lab/templates/basic_rl_training.toml +0 -82
  1022. package/bin/skills/protocolsio-integration/SKILL.md +0 -421
  1023. package/bin/skills/protocolsio-integration/references/additional_features.md +0 -387
  1024. package/bin/skills/protocolsio-integration/references/authentication.md +0 -100
  1025. package/bin/skills/protocolsio-integration/references/discussions.md +0 -225
  1026. package/bin/skills/protocolsio-integration/references/file_manager.md +0 -412
  1027. package/bin/skills/protocolsio-integration/references/protocols_api.md +0 -294
  1028. package/bin/skills/protocolsio-integration/references/workspaces.md +0 -293
  1029. package/bin/skills/pubchem-database/SKILL.md +0 -574
  1030. package/bin/skills/pubchem-database/references/api_reference.md +0 -440
  1031. package/bin/skills/pubchem-database/scripts/bioactivity_query.py +0 -367
  1032. package/bin/skills/pubchem-database/scripts/compound_search.py +0 -297
  1033. package/bin/skills/pubmed-database/SKILL.md +0 -460
  1034. package/bin/skills/pubmed-database/references/api_reference.md +0 -298
  1035. package/bin/skills/pubmed-database/references/common_queries.md +0 -453
  1036. package/bin/skills/pubmed-database/references/search_syntax.md +0 -436
  1037. package/bin/skills/pufferlib/SKILL.md +0 -436
  1038. package/bin/skills/pufferlib/references/environments.md +0 -508
  1039. package/bin/skills/pufferlib/references/integration.md +0 -621
  1040. package/bin/skills/pufferlib/references/policies.md +0 -653
  1041. package/bin/skills/pufferlib/references/training.md +0 -360
  1042. package/bin/skills/pufferlib/references/vectorization.md +0 -557
  1043. package/bin/skills/pufferlib/scripts/env_template.py +0 -340
  1044. package/bin/skills/pufferlib/scripts/train_template.py +0 -239
  1045. package/bin/skills/pydeseq2/SKILL.md +0 -559
  1046. package/bin/skills/pydeseq2/references/api_reference.md +0 -228
  1047. package/bin/skills/pydeseq2/references/workflow_guide.md +0 -582
  1048. package/bin/skills/pydeseq2/scripts/run_deseq2_analysis.py +0 -353
  1049. package/bin/skills/pydicom/SKILL.md +0 -434
  1050. package/bin/skills/pydicom/references/common_tags.md +0 -228
  1051. package/bin/skills/pydicom/references/transfer_syntaxes.md +0 -352
  1052. package/bin/skills/pydicom/scripts/anonymize_dicom.py +0 -137
  1053. package/bin/skills/pydicom/scripts/dicom_to_image.py +0 -172
  1054. package/bin/skills/pydicom/scripts/extract_metadata.py +0 -173
  1055. package/bin/skills/pyhealth/SKILL.md +0 -491
  1056. package/bin/skills/pyhealth/references/datasets.md +0 -178
  1057. package/bin/skills/pyhealth/references/medical_coding.md +0 -284
  1058. package/bin/skills/pyhealth/references/models.md +0 -594
  1059. package/bin/skills/pyhealth/references/preprocessing.md +0 -638
  1060. package/bin/skills/pyhealth/references/tasks.md +0 -379
  1061. package/bin/skills/pyhealth/references/training_evaluation.md +0 -648
  1062. package/bin/skills/pylabrobot/SKILL.md +0 -185
  1063. package/bin/skills/pylabrobot/references/analytical-equipment.md +0 -464
  1064. package/bin/skills/pylabrobot/references/hardware-backends.md +0 -480
  1065. package/bin/skills/pylabrobot/references/liquid-handling.md +0 -403
  1066. package/bin/skills/pylabrobot/references/material-handling.md +0 -620
  1067. package/bin/skills/pylabrobot/references/resources.md +0 -489
  1068. package/bin/skills/pylabrobot/references/visualization.md +0 -532
  1069. package/bin/skills/pymatgen/SKILL.md +0 -691
  1070. package/bin/skills/pymatgen/references/analysis_modules.md +0 -530
  1071. package/bin/skills/pymatgen/references/core_classes.md +0 -318
  1072. package/bin/skills/pymatgen/references/io_formats.md +0 -469
  1073. package/bin/skills/pymatgen/references/materials_project_api.md +0 -517
  1074. package/bin/skills/pymatgen/references/transformations_workflows.md +0 -591
  1075. package/bin/skills/pymatgen/scripts/phase_diagram_generator.py +0 -233
  1076. package/bin/skills/pymatgen/scripts/structure_analyzer.py +0 -266
  1077. package/bin/skills/pymatgen/scripts/structure_converter.py +0 -169
  1078. package/bin/skills/pymc/SKILL.md +0 -572
  1079. package/bin/skills/pymc/assets/hierarchical_model_template.py +0 -333
  1080. package/bin/skills/pymc/assets/linear_regression_template.py +0 -241
  1081. package/bin/skills/pymc/references/distributions.md +0 -320
  1082. package/bin/skills/pymc/references/sampling_inference.md +0 -424
  1083. package/bin/skills/pymc/references/workflows.md +0 -526
  1084. package/bin/skills/pymc/scripts/model_comparison.py +0 -387
  1085. package/bin/skills/pymc/scripts/model_diagnostics.py +0 -350
  1086. package/bin/skills/pymoo/SKILL.md +0 -571
  1087. package/bin/skills/pymoo/references/algorithms.md +0 -180
  1088. package/bin/skills/pymoo/references/constraints_mcdm.md +0 -417
  1089. package/bin/skills/pymoo/references/operators.md +0 -345
  1090. package/bin/skills/pymoo/references/problems.md +0 -265
  1091. package/bin/skills/pymoo/references/visualization.md +0 -353
  1092. package/bin/skills/pymoo/scripts/custom_problem_example.py +0 -181
  1093. package/bin/skills/pymoo/scripts/decision_making_example.py +0 -161
  1094. package/bin/skills/pymoo/scripts/many_objective_example.py +0 -72
  1095. package/bin/skills/pymoo/scripts/multi_objective_example.py +0 -63
  1096. package/bin/skills/pymoo/scripts/single_objective_example.py +0 -59
  1097. package/bin/skills/pyopenms/SKILL.md +0 -217
  1098. package/bin/skills/pyopenms/references/data_structures.md +0 -497
  1099. package/bin/skills/pyopenms/references/feature_detection.md +0 -410
  1100. package/bin/skills/pyopenms/references/file_io.md +0 -349
  1101. package/bin/skills/pyopenms/references/identification.md +0 -422
  1102. package/bin/skills/pyopenms/references/metabolomics.md +0 -482
  1103. package/bin/skills/pyopenms/references/signal_processing.md +0 -433
  1104. package/bin/skills/pysam/SKILL.md +0 -265
  1105. package/bin/skills/pysam/references/alignment_files.md +0 -280
  1106. package/bin/skills/pysam/references/common_workflows.md +0 -520
  1107. package/bin/skills/pysam/references/sequence_files.md +0 -407
  1108. package/bin/skills/pysam/references/variant_files.md +0 -365
  1109. package/bin/skills/pytdc/SKILL.md +0 -460
  1110. package/bin/skills/pytdc/references/datasets.md +0 -246
  1111. package/bin/skills/pytdc/references/oracles.md +0 -400
  1112. package/bin/skills/pytdc/references/utilities.md +0 -684
  1113. package/bin/skills/pytdc/scripts/benchmark_evaluation.py +0 -327
  1114. package/bin/skills/pytdc/scripts/load_and_split_data.py +0 -214
  1115. package/bin/skills/pytdc/scripts/molecular_generation.py +0 -404
  1116. package/bin/skills/pytorch-fsdp/SKILL.md +0 -126
  1117. package/bin/skills/pytorch-fsdp/references/index.md +0 -7
  1118. package/bin/skills/pytorch-fsdp/references/other.md +0 -4249
  1119. package/bin/skills/pytorch-lightning/SKILL.md +0 -346
  1120. package/bin/skills/pytorch-lightning/references/callbacks.md +0 -436
  1121. package/bin/skills/pytorch-lightning/references/distributed.md +0 -490
  1122. package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +0 -556
  1123. package/bin/skills/pyvene/SKILL.md +0 -473
  1124. package/bin/skills/pyvene/references/README.md +0 -73
  1125. package/bin/skills/pyvene/references/api.md +0 -383
  1126. package/bin/skills/pyvene/references/tutorials.md +0 -376
  1127. package/bin/skills/qdrant/SKILL.md +0 -493
  1128. package/bin/skills/qdrant/references/advanced-usage.md +0 -648
  1129. package/bin/skills/qdrant/references/troubleshooting.md +0 -631
  1130. package/bin/skills/qiskit/SKILL.md +0 -275
  1131. package/bin/skills/qiskit/references/algorithms.md +0 -607
  1132. package/bin/skills/qiskit/references/backends.md +0 -433
  1133. package/bin/skills/qiskit/references/circuits.md +0 -197
  1134. package/bin/skills/qiskit/references/patterns.md +0 -533
  1135. package/bin/skills/qiskit/references/primitives.md +0 -277
  1136. package/bin/skills/qiskit/references/setup.md +0 -99
  1137. package/bin/skills/qiskit/references/transpilation.md +0 -286
  1138. package/bin/skills/qiskit/references/visualization.md +0 -415
  1139. package/bin/skills/qutip/SKILL.md +0 -318
  1140. package/bin/skills/qutip/references/advanced.md +0 -555
  1141. package/bin/skills/qutip/references/analysis.md +0 -523
  1142. package/bin/skills/qutip/references/core_concepts.md +0 -293
  1143. package/bin/skills/qutip/references/time_evolution.md +0 -348
  1144. package/bin/skills/qutip/references/visualization.md +0 -431
  1145. package/bin/skills/ray-data/SKILL.md +0 -326
  1146. package/bin/skills/ray-data/references/integration.md +0 -82
  1147. package/bin/skills/ray-data/references/transformations.md +0 -83
  1148. package/bin/skills/ray-train/SKILL.md +0 -406
  1149. package/bin/skills/ray-train/references/multi-node.md +0 -628
  1150. package/bin/skills/rdkit/SKILL.md +0 -780
  1151. package/bin/skills/rdkit/references/api_reference.md +0 -432
  1152. package/bin/skills/rdkit/references/descriptors_reference.md +0 -595
  1153. package/bin/skills/rdkit/references/smarts_patterns.md +0 -668
  1154. package/bin/skills/rdkit/scripts/molecular_properties.py +0 -243
  1155. package/bin/skills/rdkit/scripts/similarity_search.py +0 -297
  1156. package/bin/skills/rdkit/scripts/substructure_filter.py +0 -386
  1157. package/bin/skills/reactome-database/SKILL.md +0 -278
  1158. package/bin/skills/reactome-database/references/api_reference.md +0 -465
  1159. package/bin/skills/reactome-database/scripts/reactome_query.py +0 -286
  1160. package/bin/skills/research-grants/README.md +0 -285
  1161. package/bin/skills/research-grants/SKILL.md +0 -938
  1162. package/bin/skills/research-grants/assets/budget_justification_template.md +0 -453
  1163. package/bin/skills/research-grants/assets/nih_specific_aims_template.md +0 -166
  1164. package/bin/skills/research-grants/assets/nsf_project_summary_template.md +0 -92
  1165. package/bin/skills/research-grants/references/broader_impacts.md +0 -392
  1166. package/bin/skills/research-grants/references/darpa_guidelines.md +0 -636
  1167. package/bin/skills/research-grants/references/doe_guidelines.md +0 -586
  1168. package/bin/skills/research-grants/references/nih_guidelines.md +0 -851
  1169. package/bin/skills/research-grants/references/nsf_guidelines.md +0 -570
  1170. package/bin/skills/research-grants/references/specific_aims_guide.md +0 -458
  1171. package/bin/skills/research-lookup/README.md +0 -156
  1172. package/bin/skills/research-lookup/SKILL.md +0 -606
  1173. package/bin/skills/research-lookup/examples.py +0 -174
  1174. package/bin/skills/research-lookup/lookup.py +0 -187
  1175. package/bin/skills/research-lookup/research_lookup.py +0 -483
  1176. package/bin/skills/research-lookup/scripts/research_lookup.py +0 -483
  1177. package/bin/skills/rowan/SKILL.md +0 -427
  1178. package/bin/skills/rowan/references/api_reference.md +0 -413
  1179. package/bin/skills/rowan/references/molecule_handling.md +0 -429
  1180. package/bin/skills/rowan/references/proteins_and_organization.md +0 -499
  1181. package/bin/skills/rowan/references/rdkit_native.md +0 -438
  1182. package/bin/skills/rowan/references/results_interpretation.md +0 -481
  1183. package/bin/skills/rowan/references/workflow_types.md +0 -591
  1184. package/bin/skills/rwkv/SKILL.md +0 -260
  1185. package/bin/skills/rwkv/references/architecture-details.md +0 -344
  1186. package/bin/skills/rwkv/references/rwkv7.md +0 -386
  1187. package/bin/skills/rwkv/references/state-management.md +0 -369
  1188. package/bin/skills/saelens/SKILL.md +0 -386
  1189. package/bin/skills/saelens/references/README.md +0 -70
  1190. package/bin/skills/saelens/references/api.md +0 -333
  1191. package/bin/skills/saelens/references/tutorials.md +0 -318
  1192. package/bin/skills/scanpy/SKILL.md +0 -386
  1193. package/bin/skills/scanpy/assets/analysis_template.py +0 -295
  1194. package/bin/skills/scanpy/references/api_reference.md +0 -251
  1195. package/bin/skills/scanpy/references/plotting_guide.md +0 -352
  1196. package/bin/skills/scanpy/references/standard_workflow.md +0 -206
  1197. package/bin/skills/scanpy/scripts/qc_analysis.py +0 -200
  1198. package/bin/skills/scholar-evaluation/SKILL.md +0 -289
  1199. package/bin/skills/scholar-evaluation/references/evaluation_framework.md +0 -663
  1200. package/bin/skills/scholar-evaluation/scripts/calculate_scores.py +0 -366
  1201. package/bin/skills/scientific-brainstorming/SKILL.md +0 -191
  1202. package/bin/skills/scientific-brainstorming/references/brainstorming_methods.md +0 -326
  1203. package/bin/skills/scientific-critical-thinking/SKILL.md +0 -566
  1204. package/bin/skills/scientific-critical-thinking/references/common_biases.md +0 -364
  1205. package/bin/skills/scientific-critical-thinking/references/evidence_hierarchy.md +0 -484
  1206. package/bin/skills/scientific-critical-thinking/references/experimental_design.md +0 -496
  1207. package/bin/skills/scientific-critical-thinking/references/logical_fallacies.md +0 -478
  1208. package/bin/skills/scientific-critical-thinking/references/scientific_method.md +0 -169
  1209. package/bin/skills/scientific-critical-thinking/references/statistical_pitfalls.md +0 -506
  1210. package/bin/skills/scientific-schematics/QUICK_REFERENCE.md +0 -207
  1211. package/bin/skills/scientific-schematics/README.md +0 -327
  1212. package/bin/skills/scientific-schematics/SKILL.md +0 -615
  1213. package/bin/skills/scientific-schematics/example_usage.sh +0 -89
  1214. package/bin/skills/scientific-schematics/references/best_practices.md +0 -559
  1215. package/bin/skills/scientific-schematics/scripts/generate_schematic.py +0 -135
  1216. package/bin/skills/scientific-schematics/scripts/generate_schematic_ai.py +0 -837
  1217. package/bin/skills/scientific-schematics/test_ai_generation.py +0 -243
  1218. package/bin/skills/scientific-slides/SKILL.md +0 -942
  1219. package/bin/skills/scientific-slides/assets/timing_guidelines.md +0 -597
  1220. package/bin/skills/scientific-slides/references/data_visualization_slides.md +0 -708
  1221. package/bin/skills/scientific-slides/references/presentation_structure.md +0 -642
  1222. package/bin/skills/scientific-slides/references/slide_design_principles.md +0 -849
  1223. package/bin/skills/scientific-slides/references/talk_types_guide.md +0 -687
  1224. package/bin/skills/scientific-slides/references/visual_review_workflow.md +0 -775
  1225. package/bin/skills/scientific-slides/scripts/generate_slide_image.py +0 -143
  1226. package/bin/skills/scientific-slides/scripts/generate_slide_image_ai.py +0 -748
  1227. package/bin/skills/scientific-slides/scripts/pdf_to_images.py +0 -201
  1228. package/bin/skills/scientific-slides/scripts/slides_to_pdf.py +0 -220
  1229. package/bin/skills/scientific-slides/scripts/validate_presentation.py +0 -367
  1230. package/bin/skills/scientific-visualization/SKILL.md +0 -779
  1231. package/bin/skills/scientific-visualization/assets/color_palettes.py +0 -197
  1232. package/bin/skills/scientific-visualization/assets/nature.mplstyle +0 -63
  1233. package/bin/skills/scientific-visualization/assets/presentation.mplstyle +0 -61
  1234. package/bin/skills/scientific-visualization/assets/publication.mplstyle +0 -68
  1235. package/bin/skills/scientific-visualization/references/color_palettes.md +0 -348
  1236. package/bin/skills/scientific-visualization/references/journal_requirements.md +0 -320
  1237. package/bin/skills/scientific-visualization/references/matplotlib_examples.md +0 -620
  1238. package/bin/skills/scientific-visualization/references/publication_guidelines.md +0 -205
  1239. package/bin/skills/scientific-visualization/scripts/figure_export.py +0 -343
  1240. package/bin/skills/scientific-visualization/scripts/style_presets.py +0 -416
  1241. package/bin/skills/scientific-writing/SKILL.md +0 -714
  1242. package/bin/skills/scientific-writing/assets/REPORT_FORMATTING_GUIDE.md +0 -574
  1243. package/bin/skills/scientific-writing/assets/scientific_report.sty +0 -606
  1244. package/bin/skills/scientific-writing/assets/scientific_report_template.tex +0 -449
  1245. package/bin/skills/scientific-writing/references/citation_styles.md +0 -720
  1246. package/bin/skills/scientific-writing/references/figures_tables.md +0 -806
  1247. package/bin/skills/scientific-writing/references/imrad_structure.md +0 -686
  1248. package/bin/skills/scientific-writing/references/professional_report_formatting.md +0 -664
  1249. package/bin/skills/scientific-writing/references/reporting_guidelines.md +0 -748
  1250. package/bin/skills/scientific-writing/references/writing_principles.md +0 -824
  1251. package/bin/skills/scikit-bio/SKILL.md +0 -437
  1252. package/bin/skills/scikit-bio/references/api_reference.md +0 -749
  1253. package/bin/skills/scikit-learn/SKILL.md +0 -521
  1254. package/bin/skills/scikit-learn/references/model_evaluation.md +0 -592
  1255. package/bin/skills/scikit-learn/references/pipelines_and_composition.md +0 -612
  1256. package/bin/skills/scikit-learn/references/preprocessing.md +0 -606
  1257. package/bin/skills/scikit-learn/references/quick_reference.md +0 -433
  1258. package/bin/skills/scikit-learn/references/supervised_learning.md +0 -378
  1259. package/bin/skills/scikit-learn/references/unsupervised_learning.md +0 -505
  1260. package/bin/skills/scikit-learn/scripts/classification_pipeline.py +0 -257
  1261. package/bin/skills/scikit-learn/scripts/clustering_analysis.py +0 -386
  1262. package/bin/skills/scikit-survival/SKILL.md +0 -399
  1263. package/bin/skills/scikit-survival/references/competing-risks.md +0 -397
  1264. package/bin/skills/scikit-survival/references/cox-models.md +0 -182
  1265. package/bin/skills/scikit-survival/references/data-handling.md +0 -494
  1266. package/bin/skills/scikit-survival/references/ensemble-models.md +0 -327
  1267. package/bin/skills/scikit-survival/references/evaluation-metrics.md +0 -378
  1268. package/bin/skills/scikit-survival/references/svm-models.md +0 -411
  1269. package/bin/skills/scvi-tools/SKILL.md +0 -190
  1270. package/bin/skills/scvi-tools/references/differential-expression.md +0 -581
  1271. package/bin/skills/scvi-tools/references/models-atac-seq.md +0 -321
  1272. package/bin/skills/scvi-tools/references/models-multimodal.md +0 -367
  1273. package/bin/skills/scvi-tools/references/models-scrna-seq.md +0 -330
  1274. package/bin/skills/scvi-tools/references/models-spatial.md +0 -438
  1275. package/bin/skills/scvi-tools/references/models-specialized.md +0 -408
  1276. package/bin/skills/scvi-tools/references/theoretical-foundations.md +0 -438
  1277. package/bin/skills/scvi-tools/references/workflows.md +0 -546
  1278. package/bin/skills/seaborn/SKILL.md +0 -673
  1279. package/bin/skills/seaborn/references/examples.md +0 -822
  1280. package/bin/skills/seaborn/references/function_reference.md +0 -770
  1281. package/bin/skills/seaborn/references/objects_interface.md +0 -964
  1282. package/bin/skills/segment-anything/SKILL.md +0 -500
  1283. package/bin/skills/segment-anything/references/advanced-usage.md +0 -589
  1284. package/bin/skills/segment-anything/references/troubleshooting.md +0 -484
  1285. package/bin/skills/sentence-transformers/SKILL.md +0 -255
  1286. package/bin/skills/sentence-transformers/references/models.md +0 -123
  1287. package/bin/skills/sentencepiece/SKILL.md +0 -235
  1288. package/bin/skills/sentencepiece/references/algorithms.md +0 -200
  1289. package/bin/skills/sentencepiece/references/training.md +0 -304
  1290. package/bin/skills/sglang/SKILL.md +0 -442
  1291. package/bin/skills/sglang/references/deployment.md +0 -490
  1292. package/bin/skills/sglang/references/radix-attention.md +0 -413
  1293. package/bin/skills/sglang/references/structured-generation.md +0 -541
  1294. package/bin/skills/shap/SKILL.md +0 -566
  1295. package/bin/skills/shap/references/explainers.md +0 -339
  1296. package/bin/skills/shap/references/plots.md +0 -507
  1297. package/bin/skills/shap/references/theory.md +0 -449
  1298. package/bin/skills/shap/references/workflows.md +0 -605
  1299. package/bin/skills/simpo/SKILL.md +0 -219
  1300. package/bin/skills/simpo/references/datasets.md +0 -478
  1301. package/bin/skills/simpo/references/hyperparameters.md +0 -452
  1302. package/bin/skills/simpo/references/loss-functions.md +0 -350
  1303. package/bin/skills/simpy/SKILL.md +0 -429
  1304. package/bin/skills/simpy/references/events.md +0 -374
  1305. package/bin/skills/simpy/references/monitoring.md +0 -475
  1306. package/bin/skills/simpy/references/process-interaction.md +0 -424
  1307. package/bin/skills/simpy/references/real-time.md +0 -395
  1308. package/bin/skills/simpy/references/resources.md +0 -275
  1309. package/bin/skills/simpy/scripts/basic_simulation_template.py +0 -193
  1310. package/bin/skills/simpy/scripts/resource_monitor.py +0 -345
  1311. package/bin/skills/skypilot/SKILL.md +0 -509
  1312. package/bin/skills/skypilot/references/advanced-usage.md +0 -491
  1313. package/bin/skills/skypilot/references/troubleshooting.md +0 -570
  1314. package/bin/skills/slime/SKILL.md +0 -464
  1315. package/bin/skills/slime/references/api-reference.md +0 -392
  1316. package/bin/skills/slime/references/troubleshooting.md +0 -386
  1317. package/bin/skills/speculative-decoding/SKILL.md +0 -467
  1318. package/bin/skills/speculative-decoding/references/lookahead.md +0 -309
  1319. package/bin/skills/speculative-decoding/references/medusa.md +0 -350
  1320. package/bin/skills/stable-baselines3/SKILL.md +0 -299
  1321. package/bin/skills/stable-baselines3/references/algorithms.md +0 -333
  1322. package/bin/skills/stable-baselines3/references/callbacks.md +0 -556
  1323. package/bin/skills/stable-baselines3/references/custom_environments.md +0 -526
  1324. package/bin/skills/stable-baselines3/references/vectorized_envs.md +0 -568
  1325. package/bin/skills/stable-baselines3/scripts/custom_env_template.py +0 -314
  1326. package/bin/skills/stable-baselines3/scripts/evaluate_agent.py +0 -245
  1327. package/bin/skills/stable-baselines3/scripts/train_rl_agent.py +0 -165
  1328. package/bin/skills/stable-diffusion/SKILL.md +0 -519
  1329. package/bin/skills/stable-diffusion/references/advanced-usage.md +0 -716
  1330. package/bin/skills/stable-diffusion/references/troubleshooting.md +0 -555
  1331. package/bin/skills/statistical-analysis/SKILL.md +0 -632
  1332. package/bin/skills/statistical-analysis/references/assumptions_and_diagnostics.md +0 -369
  1333. package/bin/skills/statistical-analysis/references/bayesian_statistics.md +0 -661
  1334. package/bin/skills/statistical-analysis/references/effect_sizes_and_power.md +0 -581
  1335. package/bin/skills/statistical-analysis/references/reporting_standards.md +0 -469
  1336. package/bin/skills/statistical-analysis/references/test_selection_guide.md +0 -129
  1337. package/bin/skills/statistical-analysis/scripts/assumption_checks.py +0 -539
  1338. package/bin/skills/statsmodels/SKILL.md +0 -614
  1339. package/bin/skills/statsmodels/references/discrete_choice.md +0 -669
  1340. package/bin/skills/statsmodels/references/glm.md +0 -619
  1341. package/bin/skills/statsmodels/references/linear_models.md +0 -447
  1342. package/bin/skills/statsmodels/references/stats_diagnostics.md +0 -859
  1343. package/bin/skills/statsmodels/references/time_series.md +0 -716
  1344. package/bin/skills/string-database/SKILL.md +0 -534
  1345. package/bin/skills/string-database/references/string_reference.md +0 -455
  1346. package/bin/skills/string-database/scripts/string_api.py +0 -369
  1347. package/bin/skills/sympy/SKILL.md +0 -500
  1348. package/bin/skills/sympy/references/advanced-topics.md +0 -635
  1349. package/bin/skills/sympy/references/code-generation-printing.md +0 -599
  1350. package/bin/skills/sympy/references/core-capabilities.md +0 -348
  1351. package/bin/skills/sympy/references/matrices-linear-algebra.md +0 -526
  1352. package/bin/skills/sympy/references/physics-mechanics.md +0 -592
  1353. package/bin/skills/tensorboard/SKILL.md +0 -629
  1354. package/bin/skills/tensorboard/references/integrations.md +0 -638
  1355. package/bin/skills/tensorboard/references/profiling.md +0 -545
  1356. package/bin/skills/tensorboard/references/visualization.md +0 -620
  1357. package/bin/skills/tensorpool/SKILL.md +0 -519
  1358. package/bin/skills/tensorrt-llm/SKILL.md +0 -187
  1359. package/bin/skills/tensorrt-llm/references/multi-gpu.md +0 -298
  1360. package/bin/skills/tensorrt-llm/references/optimization.md +0 -242
  1361. package/bin/skills/tensorrt-llm/references/serving.md +0 -470
  1362. package/bin/skills/tinker/SKILL.md +0 -466
  1363. package/bin/skills/tinker/references/api-reference.md +0 -168
  1364. package/bin/skills/tinker/references/dpo-and-preference.md +0 -174
  1365. package/bin/skills/tinker/references/evaluations.md +0 -183
  1366. package/bin/skills/tinker/references/getting-started.md +0 -157
  1367. package/bin/skills/tinker/references/loss-functions.md +0 -163
  1368. package/bin/skills/tinker/references/models-and-lora.md +0 -148
  1369. package/bin/skills/tinker/references/recipes.md +0 -326
  1370. package/bin/skills/tinker/references/reinforcement-learning.md +0 -357
  1371. package/bin/skills/tinker/references/rendering.md +0 -255
  1372. package/bin/skills/tinker/references/supervised-learning.md +0 -256
  1373. package/bin/skills/tinker-training-cost/SKILL.md +0 -187
  1374. package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +0 -123
  1375. package/bin/skills/together-ai/SKILL.md +0 -722
  1376. package/bin/skills/torch_geometric/SKILL.md +0 -676
  1377. package/bin/skills/torch_geometric/references/datasets_reference.md +0 -574
  1378. package/bin/skills/torch_geometric/references/layers_reference.md +0 -485
  1379. package/bin/skills/torch_geometric/references/transforms_reference.md +0 -679
  1380. package/bin/skills/torch_geometric/scripts/benchmark_model.py +0 -309
  1381. package/bin/skills/torch_geometric/scripts/create_gnn_template.py +0 -529
  1382. package/bin/skills/torch_geometric/scripts/visualize_graph.py +0 -313
  1383. package/bin/skills/torchdrug/SKILL.md +0 -450
  1384. package/bin/skills/torchdrug/references/core_concepts.md +0 -565
  1385. package/bin/skills/torchdrug/references/datasets.md +0 -380
  1386. package/bin/skills/torchdrug/references/knowledge_graphs.md +0 -320
  1387. package/bin/skills/torchdrug/references/models_architectures.md +0 -541
  1388. package/bin/skills/torchdrug/references/molecular_generation.md +0 -352
  1389. package/bin/skills/torchdrug/references/molecular_property_prediction.md +0 -169
  1390. package/bin/skills/torchdrug/references/protein_modeling.md +0 -272
  1391. package/bin/skills/torchdrug/references/retrosynthesis.md +0 -436
  1392. package/bin/skills/torchforge/SKILL.md +0 -433
  1393. package/bin/skills/torchforge/references/api-reference.md +0 -327
  1394. package/bin/skills/torchforge/references/troubleshooting.md +0 -409
  1395. package/bin/skills/torchtitan/SKILL.md +0 -358
  1396. package/bin/skills/torchtitan/references/checkpoint.md +0 -181
  1397. package/bin/skills/torchtitan/references/custom-models.md +0 -258
  1398. package/bin/skills/torchtitan/references/float8.md +0 -133
  1399. package/bin/skills/torchtitan/references/fsdp.md +0 -126
  1400. package/bin/skills/training-data-pipeline/SKILL.md +0 -427
  1401. package/bin/skills/training-data-pipeline/references/data-quality.md +0 -136
  1402. package/bin/skills/training-data-pipeline/references/frontier-distillation.md +0 -129
  1403. package/bin/skills/training-data-pipeline/references/production-data-formatting.md +0 -126
  1404. package/bin/skills/transformer-lens/SKILL.md +0 -346
  1405. package/bin/skills/transformer-lens/references/README.md +0 -54
  1406. package/bin/skills/transformer-lens/references/api.md +0 -362
  1407. package/bin/skills/transformer-lens/references/tutorials.md +0 -339
  1408. package/bin/skills/transformers/SKILL.md +0 -164
  1409. package/bin/skills/transformers/references/generation.md +0 -467
  1410. package/bin/skills/transformers/references/models.md +0 -361
  1411. package/bin/skills/transformers/references/pipelines.md +0 -335
  1412. package/bin/skills/transformers/references/tokenizers.md +0 -447
  1413. package/bin/skills/transformers/references/training.md +0 -500
  1414. package/bin/skills/treatment-plans/README.md +0 -488
  1415. package/bin/skills/treatment-plans/SKILL.md +0 -1579
  1416. package/bin/skills/treatment-plans/assets/STYLING_QUICK_REFERENCE.md +0 -185
  1417. package/bin/skills/treatment-plans/assets/chronic_disease_management_plan.tex +0 -665
  1418. package/bin/skills/treatment-plans/assets/general_medical_treatment_plan.tex +0 -547
  1419. package/bin/skills/treatment-plans/assets/medical_treatment_plan.sty +0 -222
  1420. package/bin/skills/treatment-plans/assets/mental_health_treatment_plan.tex +0 -774
  1421. package/bin/skills/treatment-plans/assets/one_page_treatment_plan.tex +0 -193
  1422. package/bin/skills/treatment-plans/assets/pain_management_plan.tex +0 -799
  1423. package/bin/skills/treatment-plans/assets/perioperative_care_plan.tex +0 -753
  1424. package/bin/skills/treatment-plans/assets/quality_checklist.md +0 -471
  1425. package/bin/skills/treatment-plans/assets/rehabilitation_treatment_plan.tex +0 -756
  1426. package/bin/skills/treatment-plans/references/goal_setting_frameworks.md +0 -411
  1427. package/bin/skills/treatment-plans/references/intervention_guidelines.md +0 -507
  1428. package/bin/skills/treatment-plans/references/regulatory_compliance.md +0 -476
  1429. package/bin/skills/treatment-plans/references/specialty_specific_guidelines.md +0 -655
  1430. package/bin/skills/treatment-plans/references/treatment_plan_standards.md +0 -485
  1431. package/bin/skills/treatment-plans/scripts/check_completeness.py +0 -322
  1432. package/bin/skills/treatment-plans/scripts/generate_template.py +0 -233
  1433. package/bin/skills/treatment-plans/scripts/timeline_generator.py +0 -385
  1434. package/bin/skills/treatment-plans/scripts/validate_treatment_plan.py +0 -369
  1435. package/bin/skills/trl-fine-tuning/SKILL.md +0 -455
  1436. package/bin/skills/trl-fine-tuning/references/dpo-variants.md +0 -227
  1437. package/bin/skills/trl-fine-tuning/references/online-rl.md +0 -82
  1438. package/bin/skills/trl-fine-tuning/references/reward-modeling.md +0 -122
  1439. package/bin/skills/trl-fine-tuning/references/sft-training.md +0 -168
  1440. package/bin/skills/umap-learn/SKILL.md +0 -479
  1441. package/bin/skills/umap-learn/references/api_reference.md +0 -532
  1442. package/bin/skills/uniprot-database/SKILL.md +0 -195
  1443. package/bin/skills/uniprot-database/references/api_examples.md +0 -413
  1444. package/bin/skills/uniprot-database/references/api_fields.md +0 -275
  1445. package/bin/skills/uniprot-database/references/id_mapping_databases.md +0 -285
  1446. package/bin/skills/uniprot-database/references/query_syntax.md +0 -256
  1447. package/bin/skills/uniprot-database/scripts/uniprot_client.py +0 -341
  1448. package/bin/skills/unsloth/SKILL.md +0 -635
  1449. package/bin/skills/unsloth/docs/advanced-rl.md +0 -222
  1450. package/bin/skills/unsloth/docs/chat-templates.md +0 -141
  1451. package/bin/skills/unsloth/docs/datasets.md +0 -489
  1452. package/bin/skills/unsloth/docs/docker-extended.md +0 -99
  1453. package/bin/skills/unsloth/docs/dynamic-ggufs-2.0.md +0 -116
  1454. package/bin/skills/unsloth/docs/dynamic-ggufs-aider.md +0 -118
  1455. package/bin/skills/unsloth/docs/faq.md +0 -91
  1456. package/bin/skills/unsloth/docs/fp16-vs-bf16.md +0 -61
  1457. package/bin/skills/unsloth/docs/fp8-rl.md +0 -224
  1458. package/bin/skills/unsloth/docs/glm-4.7-flash.md +0 -997
  1459. package/bin/skills/unsloth/docs/inference-deployment-overview.md +0 -17
  1460. package/bin/skills/unsloth/docs/inference.md +0 -27
  1461. package/bin/skills/unsloth/docs/installation-docker.md +0 -155
  1462. package/bin/skills/unsloth/docs/installation-pip.md +0 -148
  1463. package/bin/skills/unsloth/docs/kernels-packing.md +0 -190
  1464. package/bin/skills/unsloth/docs/kimi-k2.5.md +0 -634
  1465. package/bin/skills/unsloth/docs/lm-studio.md +0 -235
  1466. package/bin/skills/unsloth/docs/lora-hot-swapping.md +0 -75
  1467. package/bin/skills/unsloth/docs/lora-hyperparameters.md +0 -363
  1468. package/bin/skills/unsloth/docs/memory-efficient-rl.md +0 -267
  1469. package/bin/skills/unsloth/docs/model-selection.md +0 -70
  1470. package/bin/skills/unsloth/docs/models.md +0 -532
  1471. package/bin/skills/unsloth/docs/multi-gpu-ddp.md +0 -90
  1472. package/bin/skills/unsloth/docs/notebooks.md +0 -223
  1473. package/bin/skills/unsloth/docs/overview.md +0 -110
  1474. package/bin/skills/unsloth/docs/qwen3-coder-next-extended.md +0 -900
  1475. package/bin/skills/unsloth/docs/qwen3-coder-next.md +0 -900
  1476. package/bin/skills/unsloth/docs/requirements.md +0 -45
  1477. package/bin/skills/unsloth/docs/reward-hacking.md +0 -25
  1478. package/bin/skills/unsloth/docs/saving-to-gguf.md +0 -138
  1479. package/bin/skills/unsloth/docs/saving-to-ollama.md +0 -46
  1480. package/bin/skills/unsloth/docs/sglang-guide.md +0 -278
  1481. package/bin/skills/unsloth/docs/speculative-decoding.md +0 -70
  1482. package/bin/skills/unsloth/docs/tool-calling.md +0 -334
  1483. package/bin/skills/unsloth/docs/troubleshooting-faq.md +0 -204
  1484. package/bin/skills/unsloth/docs/troubleshooting-inference.md +0 -26
  1485. package/bin/skills/unsloth/docs/tts-fine-tuning.md +0 -149
  1486. package/bin/skills/unsloth/docs/tutorial-grpo.md +0 -273
  1487. package/bin/skills/unsloth/docs/tutorial-llama3-ollama.md +0 -356
  1488. package/bin/skills/unsloth/docs/vision-fine-tuning.md +0 -135
  1489. package/bin/skills/unsloth/docs/vision-rl.md +0 -170
  1490. package/bin/skills/unsloth/docs/vllm-engine-arguments.md +0 -43
  1491. package/bin/skills/unsloth/docs/vllm-guide.md +0 -98
  1492. package/bin/skills/uspto-database/SKILL.md +0 -607
  1493. package/bin/skills/uspto-database/references/additional_apis.md +0 -394
  1494. package/bin/skills/uspto-database/references/patentsearch_api.md +0 -266
  1495. package/bin/skills/uspto-database/references/peds_api.md +0 -212
  1496. package/bin/skills/uspto-database/references/trademark_api.md +0 -358
  1497. package/bin/skills/uspto-database/scripts/patent_search.py +0 -290
  1498. package/bin/skills/uspto-database/scripts/peds_client.py +0 -285
  1499. package/bin/skills/uspto-database/scripts/trademark_client.py +0 -311
  1500. package/bin/skills/vaex/SKILL.md +0 -182
  1501. package/bin/skills/vaex/references/core_dataframes.md +0 -367
  1502. package/bin/skills/vaex/references/data_processing.md +0 -555
  1503. package/bin/skills/vaex/references/io_operations.md +0 -703
  1504. package/bin/skills/vaex/references/machine_learning.md +0 -728
  1505. package/bin/skills/vaex/references/performance.md +0 -571
  1506. package/bin/skills/vaex/references/visualization.md +0 -613
  1507. package/bin/skills/venue-templates/SKILL.md +0 -686
  1508. package/bin/skills/venue-templates/assets/examples/cell_summary_example.md +0 -247
  1509. package/bin/skills/venue-templates/assets/examples/medical_structured_abstract.md +0 -313
  1510. package/bin/skills/venue-templates/assets/examples/nature_abstract_examples.md +0 -213
  1511. package/bin/skills/venue-templates/assets/examples/neurips_introduction_example.md +0 -245
  1512. package/bin/skills/venue-templates/assets/grants/nih_specific_aims.tex +0 -235
  1513. package/bin/skills/venue-templates/assets/grants/nsf_proposal_template.tex +0 -375
  1514. package/bin/skills/venue-templates/assets/journals/nature_article.tex +0 -171
  1515. package/bin/skills/venue-templates/assets/journals/neurips_article.tex +0 -283
  1516. package/bin/skills/venue-templates/assets/journals/plos_one.tex +0 -317
  1517. package/bin/skills/venue-templates/assets/posters/beamerposter_academic.tex +0 -311
  1518. package/bin/skills/venue-templates/references/cell_press_style.md +0 -483
  1519. package/bin/skills/venue-templates/references/conferences_formatting.md +0 -564
  1520. package/bin/skills/venue-templates/references/cs_conference_style.md +0 -463
  1521. package/bin/skills/venue-templates/references/grants_requirements.md +0 -787
  1522. package/bin/skills/venue-templates/references/journals_formatting.md +0 -486
  1523. package/bin/skills/venue-templates/references/medical_journal_styles.md +0 -535
  1524. package/bin/skills/venue-templates/references/ml_conference_style.md +0 -556
  1525. package/bin/skills/venue-templates/references/nature_science_style.md +0 -405
  1526. package/bin/skills/venue-templates/references/posters_guidelines.md +0 -628
  1527. package/bin/skills/venue-templates/references/reviewer_expectations.md +0 -417
  1528. package/bin/skills/venue-templates/references/venue_writing_styles.md +0 -321
  1529. package/bin/skills/venue-templates/scripts/customize_template.py +0 -195
  1530. package/bin/skills/venue-templates/scripts/query_template.py +0 -266
  1531. package/bin/skills/venue-templates/scripts/validate_format.py +0 -250
  1532. package/bin/skills/verl/SKILL.md +0 -391
  1533. package/bin/skills/verl/references/api-reference.md +0 -301
  1534. package/bin/skills/verl/references/troubleshooting.md +0 -391
  1535. package/bin/skills/vllm/SKILL.md +0 -364
  1536. package/bin/skills/vllm/references/optimization.md +0 -226
  1537. package/bin/skills/vllm/references/quantization.md +0 -284
  1538. package/bin/skills/vllm/references/server-deployment.md +0 -255
  1539. package/bin/skills/vllm/references/troubleshooting.md +0 -447
  1540. package/bin/skills/weights-and-biases/SKILL.md +0 -590
  1541. package/bin/skills/weights-and-biases/references/artifacts.md +0 -584
  1542. package/bin/skills/weights-and-biases/references/integrations.md +0 -700
  1543. package/bin/skills/weights-and-biases/references/sweeps.md +0 -847
  1544. package/bin/skills/whisper/SKILL.md +0 -317
  1545. package/bin/skills/whisper/references/languages.md +0 -189
  1546. package/bin/skills/zarr-python/SKILL.md +0 -779
  1547. package/bin/skills/zarr-python/references/api_reference.md +0 -515
  1548. package/bin/skills/zinc-database/SKILL.md +0 -404
  1549. package/bin/skills/zinc-database/references/api_reference.md +0 -692
@@ -1,141 +0,0 @@
- ---
- name: deepspeed
- description: Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
- version: 1.0.0
- author: Synthetic Sciences
- license: MIT
- tags: [DeepSpeed, Distributed Training, ZeRO, Pipeline Parallelism, Mixed Precision, Optimization, Microsoft, Large-Scale Training, FP16, FP8]
- dependencies: [deepspeed, torch, transformers, accelerate]
- ---
-
- # Deepspeed Skill
-
- Comprehensive assistance with deepspeed development, generated from official documentation.
-
- ## When to Use This Skill
-
- This skill should be triggered when:
- - Working with deepspeed
- - Asking about deepspeed features or APIs
- - Implementing deepspeed solutions
- - Debugging deepspeed code
- - Learning deepspeed best practices
-
- ## Quick Reference
-
- ### Common Patterns
-
- **Pattern 1:** DeepNVMe. This tutorial will show how to use DeepNVMe for data transfers between persistent storage and tensors residing in host or device memory. DeepNVMe improves the performance and efficiency of I/O operations in Deep Learning applications through powerful optimizations built on Non-Volatile Memory Express (NVMe) Solid State Drives (SSDs), Linux Asynchronous I/O (libaio), and NVIDIA Magnum IO™ GPUDirect® Storage (GDS). Requirements: Ensure your environment is properly configured to use DeepNVMe. First, you need to install DeepSpeed version >= 0.15.0. Next, ensure that the DeepNVMe operators are available in the DeepSpeed installation. The async_io operator is required for any DeepNVMe functionality, while the gds operator is required only for GDS functionality. You can confirm availability of each operator by inspecting the output of ds_report to check that the compatible status is [OKAY]. Below is a snippet of ds_report output confirming the availability of both async_io and gds operators. If the async_io operator is unavailable, you will need to install the appropriate libaio library binaries for your Linux flavor. For example, Ubuntu users will need to run apt install libaio-dev. In general, you should carefully inspect ds_report output for helpful tips such as the following: [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
To enable the gds operator, you will need to install NVIDIA GDS by consulting the appropriate guide for bare-metal systems or Azure VMs (coming soon). Creating DeepNVMe Handles: DeepNVMe functionality can be accessed through two abstractions: aio_handle and gds_handle. The aio_handle is usable on both host and device tensors, while gds_handle works only on CUDA tensors but is more efficient. The first step to use DeepNVMe is to create a desired handle. aio_handle requires the async_io operator, while gds_handle requires both the async_io and gds operators. The following snippets illustrate aio_handle and gds_handle creation respectively. ### Create aio_handle from deepspeed.ops.op_builder import AsyncIOBuilder aio_handle = AsyncIOBuilder().load().aio_handle() ### Create gds_handle from deepspeed.ops.op_builder import GDSBuilder gds_handle = GDSBuilder().load().gds_handle() For simplicity, the above examples illustrate handle creation using default parameters. We expect handles created with default parameters to provide good performance in most environments. However, you can see below for advanced handle creation. Using DeepNVMe Handles: aio_handle and gds_handle provide identical APIs for storing tensors to files or loading tensors from files. A common feature of these APIs is that they take a tensor and a file path as arguments for the desired I/O operation. For best performance, pinned device or host tensors should be used for I/O operations (see Pinned Tensors for details). For brevity, this tutorial will use aio_handle for illustration, but keep in mind that gds_handle works similarly. You can see the available APIs in a Python shell via tab completion on an aio_handle object. This is illustrated using tab completion of `h.`: >python Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> h = AsyncIOBuilder().load().aio_handle() >>> h.
h.async_pread( h.free_cpu_locked_tensor( h.get_overlap_events( h.get_single_submit( h.new_cpu_locked_tensor( h.pwrite( h.sync_pread( h.wait( h.async_pwrite( h.get_block_size( h.get_queue_depth( h.get_intra_op_parallelism( h.pread( h.read( h.sync_pwrite( h.write( The APIs of interest for performing I/O operations are those named with pread and pwrite substrings. For brevity, we will focus on the file write APIs, namely sync_pwrite, async_pwrite, and pwrite. We will discuss only sync_pwrite and async_pwrite below because they are specializations of pwrite. Blocking File Write: sync_pwrite provides the standard blocking semantics of Python file write. The example below illustrates using sync_pwrite to store a 1GB CUDA tensor to a local NVMe file. >>> import os >>> os.path.isfile('/local_nvme/test_1GB.pt') False >>> import torch >>> t=torch.empty(1024**3, dtype=torch.uint8).cuda() >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> h = AsyncIOBuilder().load().aio_handle() >>> h.sync_pwrite(t,'/local_nvme/test_1GB.pt') >>> os.path.isfile('/local_nvme/test_1GB.pt') True >>> os.path.getsize('/local_nvme/test_1GB.pt') 1073741824 Non-Blocking File Write: An important DeepNVMe optimization is the non-blocking I/O semantics, which enables Python threads to overlap computations with I/O operations. async_pwrite provides the non-blocking semantics for file writes. The Python thread can later use wait() to synchronize with the I/O operation. async_pwrite can also be used to submit multiple back-to-back non-blocking I/O operations, which can then be blocked on later using a single wait(). The example below illustrates using async_pwrite to store a 1GB CUDA tensor to a local NVMe file.
>>> import os >>> os.path.isfile('/local_nvme/test_1GB.pt') False >>> import torch >>> t=torch.empty(1024**3, dtype=torch.uint8).cuda() >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> h = AsyncIOBuilder().load().aio_handle() >>> h.async_pwrite(t,'/local_nvme/test_1GB.pt') >>> h.wait() 1 >>> os.path.isfile('/local_nvme/test_1GB.pt') True >>> os.path.getsize('/local_nvme/test_1GB.pt') 1073741824 Warning for non-blocking I/O operations: To avoid data races and corruption, .wait() must be carefully used to serialize the writing of source tensors and the reading of destination tensors. For example, the following update of t during a non-blocking file write is unsafe and could corrupt /local_nvme/test_1GB.pt. >>> t=torch.empty(1024**3, dtype=torch.uint8).cuda() >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> h = AsyncIOBuilder().load().aio_handle() >>> h.async_pwrite(t,'/local_nvme/test_1GB.pt') >>> t += 1 # <--- Data race; avoid by preceding with `h.wait()` Similar safety problems apply to reading the destination tensor of a non-blocking file read without .wait() synchronization. Parallel File Write: An important DeepNVMe optimization is the ability to parallelize individual I/O operations. This optimization is enabled by specifying the desired parallelism degree when constructing a DeepNVMe handle. Subsequent I/O operations with that handle are automatically parallelized over the requested number of host or device threads, as appropriate. I/O parallelism is composable with either the blocking or non-blocking I/O APIs. The example below illustrates 4-way parallelism of a file write using async_pwrite. Note the use of the intra_op_parallelism argument to specify the desired parallelism degree in handle creation.
>>> import os >>> os.path.isfile('/local_nvme/test_1GB.pt') False >>> import torch >>> t=torch.empty(1024**3, dtype=torch.uint8).cuda() >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> h = AsyncIOBuilder().load().aio_handle(intra_op_parallelism=4) >>> h.async_pwrite(t,'/local_nvme/test_1GB.pt') >>> h.wait() 1 >>> os.path.isfile('/local_nvme/test_1GB.pt') True >>> os.path.getsize('/local_nvme/test_1GB.pt') 1073741824 Pinned Tensors: A key part of DeepNVMe optimizations is using direct memory access (DMA) for I/O operations, which requires that the host or device tensor be pinned. To pin host tensors, you can use mechanisms provided by PyTorch or DeepSpeed Accelerators. The following example illustrates writing a pinned CPU tensor to a local NVMe file. >>> import os >>> os.path.isfile('/local_nvme/test_1GB.pt') False >>> import torch >>> t=torch.empty(1024**3, dtype=torch.uint8).pin_memory() >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> h = AsyncIOBuilder().load().aio_handle() >>> h.async_pwrite(t,'/local_nvme/test_1GB.pt') >>> h.wait() 1 >>> os.path.isfile('/local_nvme/test_1GB.pt') True >>> os.path.getsize('/local_nvme/test_1GB.pt') 1073741824 On the other hand, gds_handle provides new_pinned_device_tensor() and pin_device_tensor() functions for pinning CUDA tensors. The following example illustrates writing a pinned CUDA tensor to a local NVMe file. >>> import os >>> os.path.isfile('/local_nvme/test_1GB.pt') False >>> import torch >>> t=torch.empty(1024**3, dtype=torch.uint8).cuda() >>> from deepspeed.ops.op_builder import GDSBuilder >>> h = GDSBuilder().load().gds_handle() >>> h.pin_device_tensor(t) >>> h.async_pwrite(t,'/local_nvme/test_1GB.pt') >>> h.wait() 1 >>> os.path.isfile('/local_nvme/test_1GB.pt') True >>> os.path.getsize('/local_nvme/test_1GB.pt') 1073741824 >>> h.unpin_device_tensor(t) Putting it together: We hope that the above material helps you to get started with DeepNVMe.
You can also use the following links to see DeepNVMe usage in real-world Deep Learning applications. Parameter swapper in ZeRO-Inference and ZeRO-Infinity. Optimizer swapper in ZeRO-Infinity. Gradient swapper in ZeRO-Infinity. Simple file read and write operations. Acknowledgements: This tutorial has been significantly improved by feedback from Guanhua Wang, Masahiro Tanaka, and Stas Bekman. Appendix. Advanced Handle Creation: Achieving peak I/O performance with DeepNVMe requires careful configuration of handle creation. In particular, the parameters of aio_handle and gds_handle constructors are performance-critical because they determine how efficiently DeepNVMe interacts with the underlying storage subsystem (i.e., libaio, GDS, PCIe, and SSD). For convenience we make it possible to create handles using default parameter values, which will provide decent performance in most scenarios. However, squeezing out all available performance in your environment will likely require tuning the constructor parameters, namely block_size, queue_depth, single_submit, overlap_events, and intra_op_parallelism. The aio_handle constructor parameters and default values are illustrated below: >>> from deepspeed.ops.op_builder import AsyncIOBuilder >>> help(AsyncIOBuilder().load().aio_handle()) Help on aio_handle in module async_io object: class aio_handle(pybind11_builtins.pybind11_object) | Method resolution order: | aio_handle | pybind11_builtins.pybind11_object | builtins.object | | Methods defined here: | | __init__(...) | __init__(self: async_io.aio_handle, block_size: int = 1048576, queue_depth: int = 128, single_submit: bool = False, overlap_events: bool = False, intra_op_parallelism: int = 1) -> None | | AIO handle constructor Performance Tuning: As discussed earlier, achieving peak DeepNVMe performance for a target workload or environment requires using optimally configured aio_handle or gds_handle handles.
For configuration convenience, we provide a utility called ds_nvme_tune to automate the discovery of optimal DeepNVMe configurations. ds_nvme_tune automatically explores a user-specified or default configuration space and recommends the option that provides the best read and write performance. Below is an example usage of ds_nvme_tune to tune aio_handle data transfers between GPU memory and a local NVMe SSD mounted on /local_nvme. This example used the default configuration space of ds_nvme_tune for tuning.

$ ds_nvme_tune --nvme_dir /local_nvme --gpu
Running DeepNVMe performance tuning on ['/local_nvme/']
Best performance (GB/sec): read = 3.69, write = 3.18
{
   "aio": {
      "single_submit": "false",
      "overlap_events": "true",
      "intra_op_parallelism": 8,
      "queue_depth": 32,
      "block_size": 1048576
   }
}

The above tuning was executed on a Lambda workstation equipped with two NVIDIA A6000-48GB GPUs, 252GB of DRAM, and a CS3040 NVMe 2TB SSD with peak read and write speeds of 5.6 GB/s and 4.3 GB/s respectively. The tuning required about four and a half minutes. Based on the results, one can expect to achieve read and write transfer speeds of 3.69 GB/sec and 3.18 GB/sec respectively by using an aio_handle configured as below.

>>> from deepspeed.ops.op_builder import AsyncIOBuilder
>>> h = AsyncIOBuilder().load().aio_handle(block_size=1048576, queue_depth=32, single_submit=False, overlap_events=True, intra_op_parallelism=8)

The full command line options of ds_nvme_tune can be obtained via the normal -h or --help.

usage: ds_nvme_tune [-h] --nvme_dir NVME_DIR [NVME_DIR ...] [--sweep_config SWEEP_CONFIG] [--no_read] [--no_write] [--io_size IO_SIZE] [--gpu] [--gds] [--flush_page_cache] [--log_dir LOG_DIR] [--loops LOOPS] [--verbose]

options:
  -h, --help            show this help message and exit
  --nvme_dir NVME_DIR [NVME_DIR ...]
                        Directory in which to perform I/O tests. A writeable directory on a NVMe device.
  --sweep_config SWEEP_CONFIG
                        Performance sweep configuration json file.
  --no_read             Disable read performance measurements.
  --no_write            Disable write performance measurements.
  --io_size IO_SIZE     Number of I/O bytes to read/write for performance measurements.
  --gpu                 Test tensor transfers between GPU device and NVME device.
  --gds                 Run the sweep over NVIDIA GPUDirectStorage operator
  --flush_page_cache    Page cache will not be flushed and reported read speeds may be higher than actual ***Requires sudo access***.
  --log_dir LOG_DIR     Output directory for performance log files. Default is ./_aio_bench_logs
  --loops LOOPS         Count of operation repetitions
  --verbose             Print debugging information.

DeepNVMe APIs

For convenience, we provide a listing and brief descriptions of the DeepNVMe APIs.

General I/O APIs

The following functions are used for I/O operations with both aio_handle and gds_handle.

| Function | Description |
| --- | --- |
| async_pread | Non-blocking file read into tensor |
| sync_pread | Blocking file read into tensor |
| pread | File read with blocking and non-blocking options |
| async_pwrite | Non-blocking file write from tensor |
| sync_pwrite | Blocking file write from tensor |
| pwrite | File write with blocking and non-blocking options |
| wait | Wait for non-blocking I/O operations to complete |

GDS-specific APIs

The following functions are available only for gds_handle.

| Function | Description |
| --- | --- |
| new_pinned_device_tensor | Allocate and pin a device tensor |
| free_pinned_device_tensor | Unpin and free a device tensor |
| pin_device_tensor | Pin a device tensor |
| unpin_device_tensor | Unpin a device tensor |

Handle Settings APIs

The following APIs can be used to probe handle configuration.

| Function | Description |
| --- | --- |
| get_queue_depth | Return queue depth setting |
| get_single_submit | Return whether single_submit is enabled |
| get_intra_op_parallelism | Return I/O parallelism degree |
| get_block_size | Return I/O block size setting |
| get_overlap_events | Return whether overlap_events is enabled |

Updated: November 5, 2025
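The ds_nvme_tune report in the Performance Tuning section prints its boolean settings as the strings "true" and "false", while the aio_handle constructor takes Python booleans. The following is a minimal sketch of translating the report into constructor keyword arguments. The helper name aio_kwargs_from_tune is hypothetical and not part of DeepSpeed; the JSON text is copied from the example output above.

```python
import json

# Tuning report copied from the ds_nvme_tune example output.
TUNE_OUTPUT = """
{
  "aio": {
    "single_submit": "false",
    "overlap_events": "true",
    "intra_op_parallelism": 8,
    "queue_depth": 32,
    "block_size": 1048576
  }
}
"""


def aio_kwargs_from_tune(text):
    """Hypothetical helper: convert the "aio" section into keyword arguments."""
    section = json.loads(text)["aio"]
    kwargs = {}
    for key, value in section.items():
        # The report encodes booleans as the strings "true" / "false".
        if isinstance(value, str) and value.lower() in ("true", "false"):
            value = value.lower() == "true"
        kwargs[key] = value
    return kwargs


kwargs = aio_kwargs_from_tune(TUNE_OUTPUT)
print(kwargs)

# With DeepSpeed installed, the handle could then be created as:
#   from deepspeed.ops.op_builder import AsyncIOBuilder
#   h = AsyncIOBuilder().load().aio_handle(**kwargs)
```

Keeping the conversion separate from handle creation means the parsed settings can be inspected or logged on machines where the async_io op is not built.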
-
- ```
- libaio
- ```
-
- **Pattern 2:** Mixture of Experts for NLG models

Contents
1. Installation
2. Training NLG+MoE models
2.1. Changes to the model
2.2. Pre-training the Standard MoE model
2.3. Pre-training the PR-MoE model
2.4. Training MoS with reduced model size

In this tutorial, we introduce how to apply DeepSpeed Mixture of Experts (MoE) to NLG models, which reduces the training cost by 5 times and reduces the MoE model size by 3 times (details in our Blog). We use the GPT-3 like models in the Megatron-LM framework as the example. Before reading this tutorial, we recommend first reading the tutorials about Mixture of Experts and Megatron-LM GPT pre-training.

1. Installation

You need to install DeepSpeed v0.6.0 or higher to use the MoE feature. The MoE for NLG model examples are in the Megatron-DeepSpeed repo under the MoE folder.

2. Training NLG+MoE models

2.1. Changes to the model

To apply MoE to the GPT-style model, we made several changes in the Megatron framework, mostly in megatron/model/ where we add the MoE layers into the model.

2.2. Pre-training the Standard MoE model

We provide example training scripts under examples_deepspeed/MoE which we used to perform the experiments in our Blog. There are a few new hyperparameters for the standard MoE model:

* --num-experts: the number of experts per MoE layer. In our experiments we set it to 128. A larger number of experts tends to provide better convergence, but with diminishing returns.
* --moe-expert-parallel-size: degree of the MoE expert parallelism. In other words, there will be num-experts/moe-expert-parallel-size experts on each GPU. Thus --moe-expert-parallel-size should be no more than either the number of GPUs or --num-experts.
* --moe-loss-coeff: scaling coefficient for adding MoE loss to model loss. In our experiments we find that 0.01 is a good setting.
* --moe-train-capacity-factor, --moe-eval-capacity-factor, --moe-min-capacity: these configs determine how many tokens a single expert can handle.
Larger numbers could lead to better convergence, but would also lead to slower training since the load would be more unbalanced on different experts.
* --disable-moe-token-dropping: this completely removes the limit on how many tokens a single expert can handle. For the same reason as above, we only recommend using this during inference/eval.

2.3. Pre-training the PR-MoE model

PR-MoE, standing for Pyramid-Residual-MoE, is a newly designed MoE model that improves parameter efficiency by up to 3x compared to standard MoE. Please see our Blog for more details. We provide example training scripts under examples_deepspeed/MoE. There are a few different hyperparameters for the PR-MoE model compared to standard MoE:

* --num-experts: Instead of providing a single number, to enable Pyramid-MoE, you need to provide a list, whose length is the same as the number of MoE layers. We suggest using more experts in the later stages (close to the output) of the model.
* --mlp-type: chosen from [standard, residual]. When it is residual, Residual-MoE is enabled.

In addition to the new hyperparameters above for standard MoE and PR-MoE, for NLG+MoE models we found that it’s helpful to lower the learning rate and increase the learning rate decay duration compared to the base dense model. Details of our tuning can be found in the example training scripts.

Regarding training data, we are not able to release our internal data but any public data for Megatron-LM pre-training can be directly used to train MoE models (with the caveat that it might not provide the exact same model quality as in our experiments). For example, we evaluated The Pile dataset (pile.eleuther.ai, github.com/EleutherAI/the-pile) for both dense and MoE models. Table 1 below shows that this public data provides evaluation results similar to those of our internal data.
| Model | Size and data | LAMBADA: completion prediction | PIQA: commonsense reasoning | BoolQ: reading comprehension | RACE-h: reading comprehension | TriviaQA: question answering | WebQs: question answering |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dense NLG | 350M, internal data | 0.5203 | 0.6931 | 0.5364 | 0.3177 | 0.0321 | 0.0157 |
| Dense NLG | 350M, public Pile | 0.5106 | 0.6589 | 0.5933 | 0.3196 | 0.0257 | 0.0064 |
| Standard MoE NLG | 350M+MoE-128, internal data | 0.6270 | 0.7459 | 0.6046 | 0.3560 | 0.1658 | 0.0517 |
| Standard MoE NLG | 350M+MoE-128, public Pile | 0.6128 | 0.7323 | 0.6040 | 0.3349 | 0.1111 | 0.0335 |
| PR-MoE NLG | 350M+MoE-128, internal data | 0.6365 | 0.7399 | 0.5988 | 0.3569 | 0.1630 | 0.0473 |
| PR-MoE + MoS NLG | 350M+MoE-128, internal data | 0.6346 | 0.7334 | 0.5807 | 0.3483 | 0.1369 | 0.0522 |

Table 1: Zero-shot evaluation results (last six columns) for different dense and MoE NLG models. All zero-shot evaluation results use the accuracy metric.

2.4. Training MoS with reduced model size

MoS, standing for Mixture-of-Students, is a staged distillation-based technique for compressing large MoE models. MoS further reduces the model size by 12.5%, leading to up to 3.7x model size reduction when combined with PR-MoE over the standard MoE. The reduced model size helps reduce the latency and cost during inference. To train an MoS model, one needs to specify a few additional parameters. We will use PR-MoE as an example:

* --mos: This enables Mixture-of-Students via knowledge distillation.
* --load-teacher: This specifies the path to the teacher model checkpoint. This is a mandatory argument for using MoS, and the teacher model checkpoint can be obtained by training either a standard MoE or the PR-MoE.
* --num-layers-teacher, --hidden-size-teacher, --num-experts-teacher: In addition to the teacher model checkpoint path, we also need to specify the model architecture of the teacher model such as its number of layers, hidden dimension size, and the number of experts per MoE layer.
In the case of PR-MoE, we also need to provide a list of experts for the teacher model, where we remove a few expert layers from the teacher model.

In addition to the new parameters above, we observe that using the teacher PR-MoE during the entire training process may adversely impact the final student model accuracy. In our experiments, we use a staged distillation method: we stop distillation early in the training process (e.g., after 400K steps) and optimize only against the standard language modeling loss for the rest of the training.

We provide example training scripts under examples_deepspeed/MoE. Details of our parameter settings can be found in the example training scripts. The performance results of MoS can be seen in our blog post and our paper.

Updated: November 5, 2025
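The relationship between --num-experts and --moe-expert-parallel-size described in section 2.2 can be checked before launching a job. The sketch below is illustrative only: the function name experts_per_gpu is hypothetical, and the requirement that the expert count divide evenly is an assumption made for the example rather than something stated in the tutorial.

```python
def experts_per_gpu(num_experts, expert_parallel_size, num_gpus):
    """Return how many experts of one MoE layer are placed on each GPU."""
    # From the tutorial: the expert-parallel degree should be no more than
    # the number of GPUs and no more than the number of experts.
    if expert_parallel_size > min(num_gpus, num_experts):
        raise ValueError(
            "--moe-expert-parallel-size must not exceed the number of GPUs "
            "or --num-experts"
        )
    # Assumption for this sketch: experts are spread evenly across GPUs.
    if num_experts % expert_parallel_size != 0:
        raise ValueError("num_experts is not divisible by expert_parallel_size")
    return num_experts // expert_parallel_size


# 128 experts per layer (the value used in the tutorial's experiments),
# spread over an expert-parallel group of 64 GPUs.
print(experts_per_gpu(num_experts=128, expert_parallel_size=64, num_gpus=64))
```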
-
- ```
- megatron/model/
- ```
-
- **Pattern 3:** MoS, standing for Mixture-of-Students, is a staged distillation-based technique for compressing large MoE models. MoS further reduces the model size by 12.5%, leading to up to 3.7x model size reduction when combined with PR-MoE over the standard MoE. The reduced model size helps reduce the latency and cost during inference. To train an MoS model, one needs to specify a few additional parameters. We will use PR-MoE as an example:
-
- ```
- --mos
- ```
-
- **Pattern 4:** Learning Rate Range Test

Contents
* Learning Rate Range Test (LRRT)
* Prerequisites
* LRRT Parameters
* Required Model Configuration Changes
* PyTorch
* Example: Tuning for Large Batch Sizes

This tutorial shows how to perform learning rate range tests in PyTorch.

Learning Rate Range Test (LRRT)

Learning rate range test (LRRT) is a method for discovering the largest learning rate values that can be used to train a model without divergence. Data scientists are often interested in this information because large learning rates lead to faster model convergence than small learning rates. Moreover, large learning rates are crucial in learning rate schedules such as CLR and 1Cycle, which are used to train effectively with large batch sizes. DeepSpeed provides LRRT for model training in PyTorch frameworks.

Prerequisites

To use DeepSpeed’s LRRT, you must satisfy the following two conditions:

* Integrate DeepSpeed into your training script using the Getting Started guide.
* Add the parameters to configure LRRT to the parameters of your model. The LRRT parameters are defined below.

LRRT Parameters

LRRT works by linearly increasing the learning rate by a predefined amount, at predefined intervals. Thus, LRRT is a form of learning rate schedule because it defines how and when the learning rate should change during model training.
To configure LRRT, you will need to set these parameters:

* lr_range_test_min_lr: The initial learning rate for training (float)
* lr_range_test_step_size: The interval for scaling up learning rate, defined in training steps (integer)
* lr_range_test_step_rate: The scaling factor for increasing learning rate (float)
* lr_range_test_staircase: If true, learning rate is changed every lr_range_test_step_size training steps, otherwise learning rate is changed at every training step (boolean)

Required Model Configuration Changes

We will illustrate the required model configuration changes using an example LRRT schedule that:

* Starts training with an initial learning rate of 0.0001
* Uses a scaling rate of 5
* Uses a scaling interval of 200 training steps
* Scales learning rate at every training step, i.e., does not use staircase

PyTorch

For PyTorch models, LRRT is implemented as a learning rate scheduler, a feature that is available in PyTorch versions 1.0.1 and newer. Thus, you can add a "scheduler" entry of type "LRRangeTest" into your model configuration as illustrated below:

"scheduler": {
    "type": "LRRangeTest",
    "params": {
        "lr_range_test_min_lr": 0.0001,
        "lr_range_test_step_size": 200,
        "lr_range_test_step_rate": 5,
        "lr_range_test_staircase": false
    }
}

Example: Tuning for Large Batch Sizes

We illustrate how LRRT can benefit data scientists with a snippet of our experience of tuning an internal production model to converge efficiently on larger batch sizes, as we scaled from one GPU (batch size 512) to four GPUs (batch size 2048). Our goal was to train the model with the larger batch size to match the performance of the smaller batch size using the same number of data samples. The challenge here is the well-known problem of slow convergence of large batch size training. Our approach was to use a 1Cycle schedule in DeepSpeed to tackle this problem, and we used LRRT to configure the schedule.
In the plots below, we illustrate using LRRT to discover the maximum learning rates for effective training with batch size 2048. The plot on the left shows the impact of large learning rates on validation loss over the first 9000 batches of training. The plot on the right shows the learning rate values during the same period of training. Using grid search we discover that the best fixed learning rate for the batch size 2048 is 0.0002. The blue line (lr=0.0002) represents training with this fixed learning rate. We compare the two LRRT schedules with this fixed learning rate. The orange (lr_range_test_step_rate=5) and gray (lr_range_test_step_rate=50) lines represent training with similar LRRT schedules that differ only in lr_range_test_step_rate values. Although the LRRT schedules start from the same base learning rate, the gray line’s learning rate grows about 10 times faster than the orange line. Also, the learning rates of the LRRT schedules had grown larger than that of the blue line in the presented data points. We subsequently refer to the gray line as “fast growing”, and the orange line as “slow growing” LRRT schedules respectively. We make the following observations from this small example. Larger learning rates clearly benefit model performance, up to some point. The fast growing LRRT schedule achieves validation loss of 0.46 after 3000 batches, which the fixed learning rate does not achieve with 9000 batches. The slow growing LRRT does not match that score until after 6000 batches, however it maintains an increasing performance advantage over the fixed learning rate. There is an upper bound on learning rate values that are useful for training the model. The fast growing LRRT schedule hits this boundary quickly and diverges, while the slow growing LRRT will later diverge for the same reason. LRRT helped us discover these boundaries quickly, using less than 2% of the training data. 
These boundaries are useful information for constructing learning rate schedules.

These observations from LRRT helped us to configure the learning rate boundaries and the cycle span for a 1Cycle schedule that solves the problem, as shown below.

"OneCycle": {
    "cycle_min_lr": 0.002,
    "cycle_max_lr": 0.005,
    "cycle_first_step_size": 2000,
    "cycle_second_step_size": 2000,
    ...
}

In our experience these are the four most critical parameters of 1Cycle schedules.

* We chose to use the slower LRRT schedule (lr_range_test_step_rate=5) to set cycle_min_lr because it achieves the best loss and the faster schedule diverges fairly quickly.
* We set cycle_max_lr to 0.005 even though the plot shows that performance was still improving at a slightly higher learning rate. This is because we observed that if we wait until the maximum learning rate, the model could be at the point of divergence and impossible to recover.
* Since it takes 8000 batches for the learning rate to become 0.005, we set cycle_first_step_size (and cycle_second_step_size) to 2000, which is the number of steps that it takes for four GPUs to process 8000 batches.

We hope this brief example sparks your imagination on using LRRT for your own unique tuning challenges.

Updated: November 5, 2025
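To make the LRRT parameters concrete, the following sketch computes a linearly growing learning rate from the four settings described under LRRT Parameters. The function name lrrt_lr and the exact formula (initial rate multiplied by one plus the step rate times the elapsed interval count) are assumptions chosen to match the description of a linear increase; DeepSpeed's own scheduler may differ in details such as step indexing.

```python
import math


def lrrt_lr(step, min_lr, step_size, step_rate, staircase=False):
    """Illustrative LRRT schedule: learning rate after `step` training steps."""
    # Number of scaling intervals elapsed so far (fractional when continuous).
    interval = step / step_size
    if staircase:
        # Staircase mode only changes the rate once per full interval.
        interval = math.floor(interval)
    return min_lr * (1 + step_rate * interval)


# Settings from the example configuration: min_lr=0.0001, step_size=200, step_rate=5.
for step in (0, 100, 200, 400):
    print(step, lrrt_lr(step, 0.0001, 200, 5))

# Step-size arithmetic from the 1Cycle example: four GPUs process 8000
# batches in 8000 / 4 = 2000 steps.
cycle_first_step_size = 8000 // 4
print(cycle_first_step_size)
```

With staircase=True the same call holds the rate constant inside each 200-step interval and jumps at the interval boundaries.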
-
- ```
- lr_range_test_min_lr
- ```
-
- **Pattern 5:** Training Overview and Features Contents Overview Distributed, Effective, and Efficient Training with Ease Speed Memory efficiency Scalability Communication efficiency Data efficiency Supporting long sequence length Fast convergence for effectiveness Good Usability Features Distributed Training with Mixed Precision Mixed Precision Training Single-GPU, Multi-GPU, and Multi-Node Training Pipeline Parallelism Model Parallelism Support for Custom Model Parallelism Integration with Megatron-LM The Zero Redundancy Optimizer Optimizer State and Gradient Partitioning Activation Partitioning Constant Buffer Optimization (CBO) Contiguous Memory Optimization (CMO) ZeRO-Offload Additional Memory and Bandwidth Optimizations Smart Gradient Accumulation Communication Overlapping Training Features Simplified training API Activation Checkpointing API Gradient Clipping Automatic loss scaling with mixed precision Training Optimizers 1-bit Adam, 0/1 Adam and 1-bit LAMB optimizers with up to 26x less communication Fused Adam optimizer and arbitrary torch.optim.Optimizer CPU-Adam: High-Performance vectorized implementation of Adam Memory bandwidth optimized FP16 Optimizer Large Batch Training with LAMB Optimizer Memory-Efficient Training with ZeRO Optimizer Training Agnostic Checkpointing Advanced parameter search Learning Rate Range Test 1Cycle Learning Rate Schedule Simplified Data Loader Data Efficiency Curriculum Learning Performance Analysis and Debugging Wall Clock Breakdown Timing Activation Checkpoint Functions Flops Profiler Autotuning Monitor Communication Logging Sparse Attention Mixture of Experts (MoE) Overview Training advanced deep learning models is challenging. Beyond model design, model scientists also need to set up the state-of-the-art training techniques such as distributed training, mixed precision, gradient accumulation, and checkpointing. Yet still, scientists may not achieve the desired system performance and convergence rate. 
Large model sizes are even more challenging: a large model easily runs out of memory with pure data parallelism and it is difficult to use model parallelism. DeepSpeed addresses these challenges to accelerate model development and training.

Distributed, Effective, and Efficient Training with Ease

The DeepSpeed API is a lightweight wrapper on PyTorch. This means that you can use everything you love in PyTorch without learning a new platform. In addition, DeepSpeed manages all of the boilerplate state-of-the-art training techniques, such as distributed training, mixed precision, gradient accumulation, and checkpoints so that you can focus on your model development. Most importantly, you can leverage the distinctive efficiency and effectiveness benefits of DeepSpeed to boost speed and scale with just a few lines of code changes to your PyTorch models.

Speed

DeepSpeed achieves high performance and fast convergence through a combination of efficiency optimizations on compute/communication/memory/IO and effectiveness optimizations on advanced hyperparameter tuning and optimizers. For example:

* DeepSpeed trains BERT-large to parity in 44 mins using 1024 V100 GPUs (64 DGX-2 boxes) and in 2.4 hours using 256 GPUs (16 DGX-2 boxes).

BERT-large Training Times

| Devices | Source | Training Time |
| --- | --- | --- |
| 1024 V100 GPUs | DeepSpeed | 44 min |
| 256 V100 GPUs | DeepSpeed | 2.4 hr |
| 64 V100 GPUs | DeepSpeed | 8.68 hr |
| 16 V100 GPUs | DeepSpeed | 33.22 hr |

BERT code and tutorials will be available soon.

* DeepSpeed trains GPT2 (1.5 billion parameters) 3.75x faster than state-of-art, NVIDIA Megatron on Azure GPUs. Read more: GPT tutorial

Memory efficiency

DeepSpeed provides memory-efficient data parallelism and enables training models without model parallelism. For example, DeepSpeed can train models with up to 13 billion parameters on a single GPU. In comparison, existing frameworks (e.g., PyTorch’s Distributed Data Parallel) run out of memory with 1.4 billion parameter models.
DeepSpeed reduces the training memory footprint through a novel solution called Zero Redundancy Optimizer (ZeRO). Unlike basic data parallelism where memory states are replicated across data-parallel processes, ZeRO partitions model states and gradients to save significant memory. Furthermore, it also reduces activation memory and fragmented memory. The current implementation (ZeRO-2) reduces memory by up to 8x relative to the state-of-art. You can read more about ZeRO in our paper, and in our blog posts related to ZeRO-1 and ZeRO-2. With this impressive memory reduction, early adopters of DeepSpeed have already produced a language model (LM) with over 17B parameters called Turing-NLG, establishing a new SOTA in the LM category. For model scientists with limited GPU resources, ZeRO-Offload leverages both CPU and GPU memory for training large models. Using a machine with a single GPU, our users can run models of up to 13 billion parameters without running out of memory, 10x bigger than the existing approaches, while obtaining competitive throughput. This feature democratizes multi-billion-parameter model training and opens the window for many deep learning practitioners to explore bigger and better models. Scalability DeepSpeed supports efficient data parallelism, model parallelism, pipeline parallelism and their combinations, which we call 3D parallelism. 3D parallelism of DeepSpeed provides system support to run models with trillions of parameters, read more in our press-release and tutorial. DeepSpeed can run large models more efficiently, up to 10x faster for models with various sizes spanning 1.5B to hundred billion. More specifically, the data parallelism powered by ZeRO is complementary and can be combined with different types of model parallelism. It allows DeepSpeed to fit models using lower degree of model parallelism and higher batch size, offering significant performance gains compared to using model parallelism alone. 
Read more: ZeRO paper, and GPT tutorial. The figure depicts system throughput improvements of DeepSpeed (combining ZeRO-powered data parallelism with model parallelism of NVIDIA Megatron-LM) over using Megatron-LM alone.

Communication efficiency

Pipeline parallelism of DeepSpeed reduces communication volume during distributed training, which allows users to train multi-billion-parameter models 2–7x faster on clusters with limited network bandwidth. 1-bit Adam, 0/1 Adam and 1-bit LAMB reduce communication volume by up to 26x while achieving similar convergence efficiency to Adam, allowing for scaling to different types of GPU clusters and networks. Read more: 1-bit Adam blog post, 1-bit Adam tutorial, 0/1 Adam tutorial, 1-bit LAMB tutorial.

Data efficiency

DeepSpeed Data Efficiency Library provides efficient data sampling via curriculum learning and efficient data routing via random layerwise token dropping. The composed solution enables up to 2x data and 2x time saving during GPT-3/BERT pretraining and GPT/ViT finetuning, or further improves model quality under the same data/time. See more in the tutorial.

Supporting long sequence length

DeepSpeed offers sparse attention kernels, an instrumental technology to support long sequences of model inputs, whether for text, image, or sound. Compared with the classic dense Transformers, it powers an order-of-magnitude longer input sequence and obtains up to 6x faster execution with comparable accuracy. It also outperforms state-of-the-art sparse implementations with 1.5–3x faster execution. Furthermore, our sparse kernels support efficient execution of flexible sparse formats and empower users to innovate on their custom sparse structures. Read more here.

Fast convergence for effectiveness

DeepSpeed supports advanced hyperparameter tuning and large batch size optimizers such as LAMB. These improve the effectiveness of model training and reduce the number of samples required to converge to the desired accuracy. Read more: Tuning tutorial.
Good Usability

Only a few lines of code changes are needed to enable a PyTorch model to use DeepSpeed and ZeRO. Compared to current model parallelism libraries, DeepSpeed does not require a code redesign or model refactoring. It also does not put limitations on model dimensions (such as number of attention heads, hidden sizes, and others), batch size, or any other training parameters. For models of up to 13 billion parameters, you can use ZeRO-powered data parallelism conveniently without requiring model parallelism, while in contrast, standard data parallelism will run out of memory for models with more than 1.4 billion parameters. In addition, DeepSpeed conveniently supports flexible combination of ZeRO-powered data parallelism with custom model parallelisms, such as tensor slicing of NVIDIA’s Megatron-LM.

Features

Below we provide a brief feature list; see our detailed feature overview for descriptions and usage.

* Distributed Training with Mixed Precision
  * 16-bit mixed precision
  * Single-GPU/Multi-GPU/Multi-Node
* Model Parallelism
  * Support for Custom Model Parallelism
  * Integration with Megatron-LM
* Pipeline Parallelism
  * 3D Parallelism
* The Zero Redundancy Optimizer
  * Optimizer State and Gradient Partitioning
  * Activation Partitioning
  * Constant Buffer Optimization
  * Contiguous Memory Optimization
* ZeRO-Offload
  * Leverage both CPU/GPU memory for model training
  * Support 10B model training on a single GPU
* Ultra-fast dense transformer kernels
* Sparse attention
  * Memory- and compute-efficient sparse kernels
  * Support 10x longer sequences than dense
  * Flexible support for different sparse structures
* 1-bit Adam, 0/1 Adam and 1-bit LAMB
  * Custom communication collective
  * Up to 26x communication volume saving
* Additional Memory and Bandwidth Optimizations
  * Smart Gradient Accumulation
  * Communication/Computation Overlap
* Training Features
  * Simplified training API
  * Gradient Clipping
  * Automatic loss scaling with mixed precision
* Training Optimizers
  * Fused Adam optimizer and arbitrary torch.optim.Optimizer
  * Memory bandwidth optimized FP16 Optimizer
  * Large Batch Training with LAMB Optimizer
  * Memory efficient Training with ZeRO Optimizer
  * CPU-Adam
* Training Agnostic Checkpointing
* Advanced Parameter Search
  * Learning Rate Range Test
  * 1Cycle Learning Rate Schedule
* Simplified Data Loader
* Data Efficiency
  * Efficient data sampling via curriculum learning and efficient data routing via random layerwise token dropping
  * Up to 2x data and 2x time saving during GPT-3/BERT pretraining and GPT/ViT finetuning
  * Or further improve model quality under the same data/time
* Curriculum Learning
  * A curriculum learning-based data pipeline that presents easier or simpler examples earlier during training
  * Stable and 3.3x faster GPT-2 pre-training with 8x/4x larger batch size/learning rate while maintaining token-wise convergence speed
  * Complementary to many other DeepSpeed features
  * Note that the Data Efficiency Library above provides more general curriculum learning support. This legacy curriculum learning feature is still supported but we recommend using the Data Efficiency Library.
* Progressive Layer Dropping
  * Efficient and robust compressed training
  * Up to 2.5x convergence speedup for pre-training
* Performance Analysis and Debugging
* Mixture of Experts (MoE)

Feature Overview

Distributed Training with Mixed Precision

Mixed Precision Training

Enable 16-bit (FP16) training in the deepspeed_config JSON.

"fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "consecutive_hysteresis": false,
    "min_loss_scale": 1
}

Single-GPU, Multi-GPU, and Multi-Node Training

Easily switch between single-GPU, single-node multi-GPU, or multi-node multi-GPU execution by specifying resources with a hostfile.

deepspeed --hostfile=<hostfile> \
    <client_entry.py> <client args> \
    --deepspeed --deepspeed_config ds_config.json

The script <client_entry.py> will execute on the resources specified in <hostfile>.
Pipeline Parallelism DeepSpeed provides pipeline parallelism for memory- and communication- efficient training. DeepSpeed supports a hybrid combination of data, model, and pipeline parallelism and has scaled to over one trillion parameters using 3D parallelism. Pipeline parallelism can also improve communication efficiency and has accelerated training by up to 7x on low-bandwidth clusters. Model Parallelism Support for Custom Model Parallelism DeepSpeed supports all forms of model parallelism including tensor slicing based approaches such as the Megatron-LM. It does so by only requiring the model parallelism framework to provide a model parallelism unit (mpu) that implements a few bookkeeping functionalities: mpu.get_model_parallel_rank() mpu.get_model_parallel_group() mpu.get_model_parallel_world_size() mpu.get_data_parallel_rank() mpu.get_data_parallel_group() mpu.get_data_parallel_world_size() Integration with Megatron-LM DeepSpeed is fully compatible with Megatron. Please see the Megatron-LM tutorial for details. The Zero Redundancy Optimizer The Zero Redundancy Optimizer (ZeRO) is at the heart of DeepSpeed and enables large model training at a scale that is simply not possible with model parallelism alone. When enabled, ZeRO allows training models with over 13 billion parameters without any model parallelism, and up to 200 billion parameter models with model parallelism on current generation hardware. For more details see the ZeRO paper, GPT tutorial on integration with DeepSpeed. Optimizer State and Gradient Partitioning Optimizer State and Gradient Partitioning in ZeRO reduces the memory consumption of the model states (optimizer states, gradients and parameters) by 8x compared to standard data parallelism by partitioning these states across data parallel process instead of replicating them. Activation Partitioning Activation Partitioning is a memory optimization in ZeRO that can reduce the memory consumed by activations during model parallel training (MP). 
In MP, certain activations may be required by all MP processes, resulting in a replication of activations across MP GPUs. Activation Partitioning stores these activations in a partitioned state once they are used for computation in the forward propagation. These activations are allgathered right before they are needed again during the backward propagation. By storing activations in a partitioned state, ZeRO in DeepSpeed can reduce the activation memory footprint proportional to the MP degree.

Constant Buffer Optimization (CBO)

CBO enables high network and memory throughput while restricting memory usage to a constant size. For memory- and network-bound operations such as normalization or allreduce collectives, the performance depends on the size of the operand. Simply fusing all operands into a single large operand can enable great throughput at the expense of unnecessary memory overhead. CBO in DeepSpeed fuses smaller operands into a buffer of approximately a pre-defined size, large enough to achieve great performance without the unnecessary memory overhead.

Contiguous Memory Optimization (CMO)

CMO reduces memory fragmentation during training, preventing out-of-memory errors due to lack of contiguous memory. Memory fragmentation is a result of interleaving between short-lived and long-lived memory objects. During the forward propagation, activation checkpoints are long lived but the activations that are recomputed are short lived. Similarly, during the backward computation, the activation gradients are short lived while the parameter gradients are long lived. CMO transfers activation checkpoints and parameter gradients to contiguous buffers, preventing memory fragmentation.

ZeRO-Offload

ZeRO-Offload pushes the boundary of the maximum model size that can be trained efficiently using minimal GPU resources, by exploiting computational and memory resources on both GPUs and their host CPUs.
It allows training up to 13-billion-parameter models on a single NVIDIA V100 GPU, 10x larger than the state-of-the-art, while retaining high training throughput of over 30 teraflops per GPU. For more details see the ZeRO-Offload release blog, and tutorial on integration with DeepSpeed. Additional Memory and Bandwidth Optimizations Smart Gradient Accumulation Gradient accumulation allows running a larger batch size with limited memory by breaking an effective batch into several sequential micro-batches, and averaging the parameter gradients across these micro-batches. Furthermore, instead of averaging the gradients of each micro-batch across all GPUs, the gradients are averaged locally during each step of the sequence, and a single allreduce is done at the end of the sequence to produce the averaged gradients for the effective batch across all GPUs. This strategy significantly reduces the communication involved compared with averaging globally for each micro-batch, especially when the number of micro-batches per effective batch is large. Communication Overlapping During back propagation, DeepSpeed can overlap the communication required for averaging parameter gradients that have already been computed with the ongoing gradient computation. This computation-communication overlap allows DeepSpeed to achieve higher throughput even at modest batch sizes. Training Features Simplified training API The DeepSpeed core API consists of just a handful of methods: initialization: initialize training: backward and step argument parsing: add_config_arguments checkpointing: load_checkpoint and save_checkpoint DeepSpeed supports most of the features described in this document through these APIs, along with a deepspeed_config JSON file for enabling and disabling the features. Please see the core API doc for more details. 
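The smart gradient accumulation strategy described above can be checked with a small numerical sketch. This is an illustration only: plain Python numbers stand in for gradients, the function names are hypothetical, and no DeepSpeed API is involved. It assumes every GPU processes the same number of equally sized micro-batches, in which case averaging locally and communicating once gives the same result as communicating after every micro-batch.

```python
# Hypothetical sketch: compare two ways of averaging gradients over
# several GPUs and several micro-batches. grads[gpu][micro_batch] is a
# scalar stand-in for one gradient value.

def mean(values):
    return sum(values) / len(values)

def average_globally_per_micro_batch(grads):
    """Average across GPUs after every micro-batch (one allreduce each)."""
    num_micro = len(grads[0])
    per_micro = [mean([gpu[m] for gpu in grads]) for m in range(num_micro)]
    return mean(per_micro), num_micro  # (averaged gradient, allreduce count)

def average_locally_then_once(grads):
    """Average locally on each GPU, then do a single allreduce at the end."""
    local = [mean(gpu) for gpu in grads]
    return mean(local), 1  # (averaged gradient, allreduce count)

# Two GPUs, four micro-batches each (made-up values).
grads = [[0.5, 1.5, 2.5, 3.5],
         [1.0, 2.0, 3.0, 4.0]]

naive_grad, naive_calls = average_globally_per_micro_batch(grads)
smart_grad, smart_calls = average_locally_then_once(grads)
print(naive_grad, naive_calls)
print(smart_grad, smart_calls)
```

Both approaches produce the same averaged gradient here, but the second needs one simulated allreduce instead of one per micro-batch, which is where the communication saving comes from.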
Activation Checkpointing API DeepSpeed’s Activation Checkpointing API supports activation checkpoint partitioning, CPU checkpointing, and contiguous memory optimizations, while also allowing layerwise profiling. Please see the core API doc for more details. Gradient Clipping { "gradient_clipping": 1.0 } DeepSpeed handles gradient clipping under the hood based on the max gradient norm specified by the user. Please see the core API doc for more details. Automatic loss scaling with mixed precision DeepSpeed internally handles loss scaling for mixed precision training. The parameters for loss scaling can be specified in the deepspeed_config JSON file. Please see the core API doc for more details. Training Optimizers 1-bit Adam, 0/1 Adam and 1-bit LAMB optimizers with up to 26x less communication DeepSpeed has three communication-efficient optimizers called 1-bit Adam, 0/1 Adam and 1-bit LAMB. They offer the same convergence as Adam/LAMB and incur up to 26x less communication, which enables up to 6.6x higher throughput for BERT-Large pretraining and up to 2.7x higher throughput for SQuAD fine-tuning on bandwidth-limited clusters. For more details on usage and performance, please refer to the 1-bit Adam tutorial, 1-bit Adam blog post, 0/1 Adam tutorial and 1-bit LAMB tutorial. For technical details, please refer to the 1-bit Adam paper, 0/1 Adam paper and 1-bit LAMB paper. Fused Adam optimizer and arbitrary torch.optim.Optimizer With DeepSpeed, the user can choose to use a high-performance implementation of Adam from NVIDIA, or any training optimizer that extends torch’s torch.optim.Optimizer class. CPU-Adam: High-Performance vectorized implementation of Adam We introduce an efficient implementation of the Adam optimizer on CPU that improves the parameter-update performance by nearly an order of magnitude. We use the AVX SIMD instructions on Intel-x86 architecture for the CPU-Adam implementation. We support both AVX-512 and AVX-2 instruction sets. 
DeepSpeed uses AVX-2 by default, which can be switched to AVX-512 by setting the build flag DS_BUILD_AVX512 to 1 when installing DeepSpeed. Using AVX-512, we observe 5.1x to 6.5x speedups over torch-adam for model sizes between 1 and 10 billion parameters. Memory bandwidth optimized FP16 Optimizer Mixed precision training is handled by the DeepSpeed FP16 Optimizer. This optimizer not only handles FP16 training but is also highly efficient. The performance of weight update is primarily dominated by the memory bandwidth, and the achieved memory bandwidth is dependent on the size of the input operands. The FP16 Optimizer is designed to maximize the achievable memory bandwidth by merging all the parameters of the model into a single large buffer, and applying the weight updates in a single kernel, allowing it to achieve high memory bandwidth. Large Batch Training with LAMB Optimizer DeepSpeed makes it easy to train with large batch sizes by enabling the LAMB Optimizer. For more details on LAMB, see the LAMB paper. Memory-Efficient Training with ZeRO Optimizer DeepSpeed can train models with up to 13 billion parameters without model parallelism, and models with up to 200 billion parameters with 16-way model parallelism. This leap in model size is possible through the memory efficiency achieved via the ZeRO Optimizer. For more details see the ZeRO paper. Training Agnostic Checkpointing DeepSpeed can simplify checkpointing for you regardless of whether you are using data parallel training, model parallel training, mixed-precision training, a mix of these three, or using the ZeRO optimizer to enable larger model sizes. Please see the Getting Started guide and the core API doc for more details. Advanced parameter search DeepSpeed supports multiple Learning Rate Schedules to enable faster convergence for large batch scaling. Learning Rate Range Test Please refer to the Learning Rate Range Test tutorial. 
1Cycle Learning Rate Schedule Please refer to the 1Cycle Learning Rate Schedule tutorial. Simplified Data Loader DeepSpeed abstracts away data parallelism and model parallelism from the user when it comes to data loading. Users simply provide a PyTorch dataset, and the DeepSpeed data loader can automatically handle batch creation appropriately. Data Efficiency Please refer to the Data Efficiency tutorial. Curriculum Learning Please refer to the Curriculum Learning tutorial. Note that the Data Efficiency Library above provides more general curriculum learning support. This legacy curriculum learning feature is still supported but we recommend using the Data Efficiency Library. Performance Analysis and Debugging DeepSpeed provides a set of tools for performance analysis and debugging. Wall Clock Breakdown DeepSpeed provides a detailed breakdown of the time spent in different parts of the training. This can be enabled by setting the following in the deepspeed_config file. { "wall_clock_breakdown": true } Timing Activation Checkpoint Functions When activation checkpointing is enabled, profiling the forward and backward time of each checkpoint function can be enabled in the deepspeed_config file. { "activation_checkpointing": { "profile": true } } Flops Profiler The DeepSpeed flops profiler measures the time, flops and parameters of a PyTorch model and shows which modules or layers are the bottleneck. When used with the DeepSpeed runtime, the flops profiler can be configured in the deepspeed_config file as follows: { "flops_profiler": { "enabled": true, "profile_step": 1, "module_depth": -1, "top_modules": 3, "detailed": true } } The flops profiler can also be used as a standalone package. Please refer to the Flops Profiler tutorial for more details. Autotuning The DeepSpeed Autotuner uses model information, system information, and heuristics to efficiently tune ZeRO stage, micro batch size, and other ZeRO configurations. 
Using the autotuning feature requires no code change from DeepSpeed users. While "autotuning": {"enabled": true} is the minimum required to enable autotuning, there are other parameters users can define to configure the autotuning process. The major parameters and their default values in the autotuning configuration are shown below. Please refer to the Autotuning tutorial for more details. { "autotuning": { "enabled": true, "results_dir": null, "exps_dir": null, "overwrite": false, "metric": "throughput", "num_nodes": null, "num_gpus": null, "start_profile_step": 3, "end_profile_step": 5, "fast": true, "num_tuning_micro_batch_sizes": 3, "tuner_type": "model_based", "tuner_early_stopping": 5, "tuner_num_trials": 50, "arg_mappings": null } } Monitor The DeepSpeed Monitor logs live training metrics to one or more monitoring backends, including PyTorch’s TensorBoard, WandB, or simply to CSV files. The Monitor can be configured with one or more backends in the deepspeed_config file as follows: { "tensorboard": { "enabled": true, "output_path": "output/ds_logs/", "job_name": "train_bert" }, "wandb": { "enabled": true, "team": "my_team", "group": "my_group", "project": "my_project" }, "csv_monitor": { "enabled": true, "output_path": "output/ds_logs/", "job_name": "train_bert" } } The Monitor can also be added to client code to log custom metrics. Please refer to the Monitor tutorial for more details. Communication Logging DeepSpeed provides logging of all communication operations launched within deepspeed.comm. The communication logger can be configured in the deepspeed_config file as follows: { "comms_logger": { "enabled": true, "verbose": false, "prof_all": true, "debug": false } } Client code can then print a summary with a call to deepspeed.comm.log_summary(). For more details and example usage, see the Communication Logging tutorial. 
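Because the deepspeed_config file must be strict JSON, one way to avoid syntax slips in hand-written sections is to build the configuration as a Python dictionary and serialize it. The sketch below does this for the Monitor and communication-logging sections using only the keys and values listed above; the variable names are illustrative assumptions, and nothing here calls DeepSpeed itself.

```python
import json

# Monitor and communication-logging sections, using the keys shown above.
# All values are the sample values from the text, not recommendations.
ds_config = {
    "tensorboard": {
        "enabled": True,
        "output_path": "output/ds_logs/",
        "job_name": "train_bert",
    },
    "wandb": {
        "enabled": True,
        "team": "my_team",
        "group": "my_group",
        "project": "my_project",
    },
    "csv_monitor": {
        "enabled": True,
        "output_path": "output/ds_logs/",
        "job_name": "train_bert",
    },
    "comms_logger": {
        "enabled": True,
        "verbose": False,
        "prof_all": True,
        "debug": False,
    },
}

# json.dumps always emits valid JSON (no trailing commas, lowercase booleans).
config_text = json.dumps(ds_config, indent=2)

# Round-trip to confirm the text parses back to the same structure.
parsed = json.loads(config_text)
print(sorted(parsed))
```

The resulting text can be written to whatever file is passed as the DeepSpeed configuration.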
Sparse Attention DeepSpeed offers sparse attention to support long sequences. Please refer to the Sparse Attention tutorial. --deepspeed_sparse_attention "sparse_attention": { "mode": "fixed", "block": 16, "different_layout_per_head": true, "num_local_blocks": 4, "num_global_blocks": 1, "attention": "bidirectional", "horizontal_global_attention": false, "num_different_global_patterns": 4 } Mixture of Experts (MoE) To learn more about training Mixture of Experts (MoE) models with DeepSpeed, see our tutorial for more details.
53
-
54
- ```
55
- torch.optim.Optimizer
56
- ```
57
-
58
- **Pattern 6:** Flops Profiler Contents Overview Flops Measurement Multi-GPU, Multi-node, Data Parallelism, and Model Parallelism Usage Usage With the DeepSpeed Runtime Example: Megatron-LM Usage Outside the DeepSpeed Runtime In Model Inference Example: AlexNet Example: Bert In Model Training Workflow Example Training Workflow In this tutorial, we introduce the DeepSpeed Flops Profiler and provide examples of its usage. Overview Effective use of hardware resources is critical to good performance, but performance inefficiencies in existing implementations for large-scale model training and inference are often hard to spot and attribute to specific module components. DeepSpeed Flops Profiler helps users easily measure both the model training/inference speed (latency, throughput) and efficiency (floating-point operations per second, i.e., FLOPS) of a model and its submodules, with an eye towards eliminating inefficiencies in existing implementations. 
Below is an example output for BERT-Large(NVIDIA) on an A100 GPU with batch size 80: -------------------------- DeepSpeed Flops Profiler -------------------------- Profile Summary at step 10: Notations: data parallel size (dp_size), model parallel size(mp_size), number of parameters (params), number of multiply-accumulate operations(MACs), number of floating-point operations (flops), floating-point operations per second (FLOPS), fwd latency (forward propagation latency), bwd latency (backward propagation latency), step (weights update latency), iter latency (sum of fwd, bwd and step latency) world size: 1 data parallel size: 1 model parallel size: 1 batch size per GPU: 80 params per gpu: 336.23 M params of model = params per GPU * mp_size: 336.23 M fwd MACs per GPU: 3139.93 G fwd flops per GPU: 6279.86 G fwd flops of model = fwd flops per GPU * mp_size: 6279.86 G fwd latency: 76.67 ms bwd latency: 108.02 ms fwd FLOPS per GPU = fwd flops per GPU / fwd latency: 81.9 TFLOPS bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency: 116.27 TFLOPS fwd+bwd FLOPS per GPU = 3 * fwd flops per GPU / (fwd+bwd latency): 102.0 TFLOPS step latency: 34.09 us iter latency: 184.73 ms samples/second: 433.07 ----------------------------- Aggregated Profile per GPU ----------------------------- Top modules in terms of params, MACs or fwd latency at different model depths: depth 0: params - {'BertForPreTrainingPreLN': '336.23 M'} MACs - {'BertForPreTrainingPreLN': '3139.93 GMACs'} fwd latency - {'BertForPreTrainingPreLN': '76.39 ms'} depth 1: params - {'BertModel': '335.15 M', 'BertPreTrainingHeads': '32.34 M'} MACs - {'BertModel': '3092.96 GMACs', 'BertPreTrainingHeads': '46.97 GMACs'} fwd latency - {'BertModel': '34.29 ms', 'BertPreTrainingHeads': '3.23 ms'} depth 2: params - {'BertEncoder': '302.31 M', 'BertLMPredictionHead': '32.34 M'} MACs - {'BertEncoder': '3092.88 GMACs', 'BertLMPredictionHead': '46.97 GMACs'} fwd latency - {'BertEncoder': '33.45 ms', 'BertLMPredictionHead': '2.61 
ms'} depth 3: params - {'ModuleList': '302.31 M', 'Embedding': '31.79 M', 'Linear': '31.26 M'} MACs - {'ModuleList': '3092.88 GMACs', 'Linear': '36.23 GMACs'} fwd latency - {'ModuleList': '33.11 ms', 'BertPredictionHeadTransform': '1.83 ms''} depth 4: params - {'BertLayer': '302.31 M', 'LinearActivation': '1.05 M''} MACs - {'BertLayer': '3092.88 GMACs', 'LinearActivation': '10.74 GMACs'} fwd latency - {'BertLayer': '33.11 ms', 'LinearActivation': '1.43 ms'} depth 5: params - {'BertAttention': '100.76 M', 'BertIntermediate': '100.76 M'} MACs - {'BertAttention': '1031.3 GMACs', 'BertIntermediate': '1030.79 GMACs'} fwd latency - {'BertAttention': '19.83 ms', 'BertOutput': '4.38 ms'} depth 6: params - {'LinearActivation': '100.76 M', 'Linear': '100.69 M'} MACs - {'LinearActivation': '1030.79 GMACs', 'Linear': '1030.79 GMACs'} fwd latency - {'BertSelfAttention': '16.29 ms', 'LinearActivation': '3.48 ms'} ------------------------------ Detailed Profile per GPU ------------------------------ Each module profile is listed after its name in the following order: params, percentage of total params, MACs, percentage of total MACs, fwd latency, percentage of total fwd latency, fwd FLOPS BertForPreTrainingPreLN( 336.23 M, 100.00% Params, 3139.93 GMACs, 100.00% MACs, 76.39 ms, 100.00% latency, 82.21 TFLOPS, (bert): BertModel( 335.15 M, 99.68% Params, 3092.96 GMACs, 98.50% MACs, 34.29 ms, 44.89% latency, 180.4 TFLOPS, (embeddings): BertEmbeddings(...) (encoder): BertEncoder( 302.31 M, 89.91% Params, 3092.88 GMACs, 98.50% MACs, 33.45 ms, 43.79% latency, 184.93 TFLOPS, (FinalLayerNorm): FusedLayerNorm(...) 
(layer): ModuleList( 302.31 M, 89.91% Params, 3092.88 GMACs, 98.50% MACs, 33.11 ms, 43.35% latency, 186.8 TFLOPS, (0): BertLayer( 12.6 M, 3.75% Params, 128.87 GMACs, 4.10% MACs, 1.29 ms, 1.69% latency, 199.49 TFLOPS, (attention): BertAttention( 4.2 M, 1.25% Params, 42.97 GMACs, 1.37% MACs, 833.75 us, 1.09% latency, 103.08 TFLOPS, (self): BertSelfAttention( 3.15 M, 0.94% Params, 32.23 GMACs, 1.03% MACs, 699.04 us, 0.92% latency, 92.22 TFLOPS, (query): Linear(1.05 M, 0.31% Params, 10.74 GMACs, 0.34% MACs, 182.39 us, 0.24% latency, 117.74 TFLOPS,...) (key): Linear(1.05 M, 0.31% Params, 10.74 GMACs, 0.34% MACs, 57.22 us, 0.07% latency, 375.3 TFLOPS,...) (value): Linear(1.05 M, 0.31% Params, 10.74 GMACs, 0.34% MACs, 53.17 us, 0.07% latency, 403.91 TFLOPS,...) (dropout): Dropout(...) (softmax): Softmax(...) ) (output): BertSelfOutput( 1.05 M, 0.31% Params, 10.74 GMACs, 0.34% MACs, 114.68 us, 0.15% latency, 187.26 TFLOPS, (dense): Linear(1.05 M, 0.31% Params, 10.74 GMACs, 0.34% MACs, 64.13 us, 0.08% latency, 334.84 TFLOPS, ...) (dropout): Dropout(...) ) ) (PreAttentionLayerNorm): FusedLayerNorm(...) (PostAttentionLayerNorm): FusedLayerNorm(...) (intermediate): BertIntermediate( 4.2 M, 1.25% Params, 42.95 GMACs, 1.37% MACs, 186.68 us, 0.24% latency, 460.14 TFLOPS, (dense_act): LinearActivation(4.2 M, 1.25% Params, 42.95 GMACs, 1.37% MACs, 175.0 us, 0.23% latency, 490.86 TFLOPS,...) ) (output): BertOutput( 4.2 M, 1.25% Params, 42.95 GMACs, 1.37% MACs, 116.83 us, 0.15% latency, 735.28 TFLOPS, (dense): Linear(4.2 M, 1.25% Params, 42.95 GMACs, 1.37% MACs, 65.57 us, 0.09% latency, 1310.14 TFLOPS,...) (dropout): Dropout(...) ) ) ... (23): BertLayer(...) ) ) (pooler): BertPooler(...) ) (cls): BertPreTrainingHeads(...) ) ------------------------------------------------------------------------------ In the summary profile, the DeepSpeed Flops Profiler outputs the number of parameters, floating-point operations (flops), FLOPS, latency, and throughput in samples/second of the model. 
This profile shows how much performance gap (compared to the peak hardware performance) the current model execution has and helps users tune the training or inference setup (e.g., hyperparameters, data parallelism, model parallelism, system configurations, etc.) for better performance. The DeepSpeed Flops Profiler also measures significant modules at different model depths (aggregated profile) and module-specific profile in the model architecture (detailed profile). Using these profiles, DeepSpeed users can understand how each layer or submodule contributes to the overall model complexity/performance. Then users can adjust or refactor the model design to improve performance. For example, using the profiler, DeepSpeed users can quantitatively tell if stacking smaller layers is lighter or more performant than having bigger ones. The aggregated and detailed profiles also allow users to quickly identify bottleneck modules. In the BERT-Large example above, using the DeepSpeed Flops Profiler, we find that BertLayer is the most significant layer and contains quite a few dropout, softmax, and layer norm along with linear modules. These modules are not heavy in flops and would trigger many GPU kernel invocations and create excessive read/write requests to memory. The pattern shown in the detailed profile suggests this is a perfect match for kernel fusion, and we developed fused transformer-kernels to reduce data movement (see DeepSpeedBert). After applying our optimizations, we see a 25% improvement in FLOPS per GPU and overall training samples/second in the DeepSpeed Flops Profiler output. The DeepSpeed Flops Profiler can be used with the DeepSpeed runtime without any user code change or be used independently from DeepSpeed as a standalone package. When using DeepSpeed for model training, the profiler can be enabled in the DeepSpeed configuration file. As a standalone package, the profiler API can be used in both training and inference code. 
The DeepSpeed profiler is still under active development and includes just initial features. Stay connected for more exciting features to be added soon. Flops Measurement Similar to existing flops calculation tools or methods, the DeepSpeed Flops Profiler measures the flops of the forward pass of a module, and the flops of the backward pass are estimated as twice those of the forward pass. Unlike the PyTorch profiler, which calculates the flops of PyTorch operators, the DeepSpeed Flops Profiler measures the flops within modules in a model and provides more insights to the users about the model execution. The flops estimation is partly inspired by ptflops, with the major difference being that the DeepSpeed Flops Profiler not only supports flops computation directly at module level, but can also capture torch.nn.functional invoked in a module to estimate the flops. Thus the DeepSpeed Flops Profiler allows for customized modules in the model, e.g., ParallelTransformerLayer, ParallelSelfAttention, RowParallelLinear, etc. in Megatron-LM. This is in contrast to ptflops, which requires users to write customized flops calculation functions for each customized module. Multi-GPU, Multi-node, Data Parallelism, and Model Parallelism The DeepSpeed Flops Profiler outputs the per GPU profile as well as the world size, data parallel size, and model parallel size. For models running on multi-GPU or multi-node, only a change of the model parallelism (e.g., --model-parallel-size in Megatron-LM) affects the number of flops and parameters profiled, i.e., model_parallel_size * flops = total_flops and model_parallel_size * parameters = total_parameters. The data parallel size or world size (related to the number of GPUs or nodes) does not affect the per GPU profile. Usage The DeepSpeed Flops Profiler can be used with the DeepSpeed runtime or as a standalone package. 
When using DeepSpeed for model training, the profiler can be configured in the deepspeed configuration file without user code changes. To use the flops profiler outside the DeepSpeed runtime, install DeepSpeed and import the flops_profiler package to use the APIs directly. Examples of each usage are given below. Usage With the DeepSpeed Runtime When using DeepSpeed for model training, the profiler can be configured in the deepspeed configuration file. No explicit API calls are needed to use the profiler. The profiler can be enabled by adding the following field to deepspeed’s configuration json file. Refer to flops profiler for details. { "flops_profiler": { "enabled": true, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } } Example: Megatron-LM For information on running Megatron-LM with DeepSpeed, please refer to our tutorial Megatron-LM. An example output of 12-layer Megatron-LM model (hidden_size = 8192, num_attention_heads = 32, batch_size = 1024, seq_length = 1024) is shown below. 
-------------------------- DeepSpeed Flops Profiler -------------------------- Profile Summary at step 10: Notations: data parallel size (dp_size), model parallel size(mp_size), number of parameters (params), number of multiply-accumulate operations(MACs), number of floating-point operations (flops), floating-point operations per second (FLOPS), fwd latency (forward propagation latency), bwd latency (backward propagation latency), step (weights update latency), iter latency (sum of fwd, bwd and step latency) world size: 1 data parallel size: 1 model parallel size: 1 batch size per GPU: 1024 params per gpu: 1.29 M params of model = params per GPU * mp_size: 1.29 M fwd MACs per GPU: 41271.95 G fwd flops per GPU: 82543.9 G fwd flops of model = fwd flops per GPU * mp_size: 82543.9 G fwd latency: 1.89 s bwd latency: 5.38 s fwd FLOPS per GPU = fwd flops per GPU / fwd latency: 43.68 TFLOPS bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency: 30.7 TFLOPS fwd+bwd FLOPS per GPU = 3 * fwd flops per GPU / (fwd+bwd latency): 34.07 TFLOPS step latency: 34.12 s iter latency: 41.39 s samples/second: 24.74 ----------------------------- Aggregated Profile per GPU ----------------------------- Top 1 modules in terms of params, MACs or fwd latency at different model depths: depth 0: params - {'GPT2Model': '1.29 M'} MACs - {'GPT2Model': '41271.95 GMACs'} fwd latency - {'GPT2Model': '1.84 s'} depth 1: params - {'TransformerLanguageModel': '1.29 M'} MACs - {'TransformerLanguageModel': '39584.03 GMACs'} fwd latency - {'TransformerLanguageModel': '1.83 s'} depth 2: params - {'ParallelTransformer': '1.29 M'} MACs - {'ParallelTransformer': '39584.03 GMACs'} fwd latency - {'ParallelTransformer': '1.81 s'} depth 3: params - {'ModuleList': '1.28 M'} MACs - {'ModuleList': '39584.03 GMACs'} fwd latency - {'ModuleList': '1.3 s'} depth 4: params - {'ParallelTransformerLayerPart2': '688.15 k'} MACs - {'ParallelTransformerLayerPart2': '26388.28 GMACs'} fwd latency - 
{'ParallelTransformerLayerPart2': '865.73 ms'} depth 5: params - {'ParallelMLP': '491.54 k'} MACs - {'ParallelMLP': '26388.28 GMACs'} fwd latency - {'ParallelMLP': '849.4 ms'} ------------------------------ Detailed Profile per GPU ------------------------------ Each module profile is listed after its name in the following order: params, percentage of total params, MACs, percentage of total MACs, fwd latency, percentage of total fwd latency, fwd FLOPS Note: 1. A module can have torch.nn.module or torch.nn.functional to compute logits (e.g. CrossEntropyLoss). They are not counted as submodules, thus not to be printed out. However they make up the difference between a parent's MACs(or latency) and the sum of its submodules'. 1. Number of floating-point operations is a theoretical estimation, thus FLOPS computed using that could be larger than the maximum system throughput. 2. The fwd latency listed in the top module's profile is directly captured at the module forward function in PyTorch, thus it's less than the fwd latency shown above which is captured in DeepSpeed. 
GPT2Model( 1.29 M, 100.00% Params, 41271.95 GMACs, 100.00% MACs, 1.84 s, 100.00% latency, 44.78 TFLOPS, (language_model): TransformerLanguageModel( 1.29 M, 100.00% Params, 39584.03 GMACs, 95.91% MACs, 1.83 s, 99.11% latency, 43.34 TFLOPS, (embedding): Embedding( 2, 0.00% Params, 0 MACs, 0.00% MACs, 18.1 ms, 0.98% latency, 0.0 FLOPS, (word_embeddings): VocabParallelEmbedding(1, 0.00% Params, 0 MACs, 0.00% MACs, 164.75 us, 0.01% latency, 0.0 FLOPS, ) (position_embeddings): Embedding(1, 0.00% Params, 0 MACs, 0.00% MACs, 489.23 us, 0.03% latency, 0.0 FLOPS, 1024, 8192) (embedding_dropout): Dropout(0, 0.00% Params, 0 MACs, 0.00% MACs, 93.94 us, 0.01% latency, 0.0 FLOPS, p=0.1, inplace=False) ) (transformer): ParallelTransformer( 1.29 M, 100.00% Params, 39584.03 GMACs, 95.91% MACs, 1.81 s, 98.11% latency, 43.78 TFLOPS, (layers): ModuleList( 1.28 M, 98.73% Params, 39584.03 GMACs, 95.91% MACs, 1.3 s, 70.66% latency, 60.79 TFLOPS, (0): ParallelTransformerLayerPart1( 49.15 k, 3.80% Params, 1099.65 GMACs, 2.66% MACs, 23.5 ms, 1.27% latency, 93.6 TFLOPS, (input_layernorm): FusedLayerNorm(16.38 k, 1.27% Params, 0 MACs, 0.00% MACs, 128.75 us, 0.01% latency, 0.0 FLOPS, torch.Size([8192]), eps=1e-05, elementwise_affine=True) (attention): ParallelSelfAttention( 32.77 k, 2.53% Params, 1099.65 GMACs, 2.66% MACs, 22.8 ms, 1.24% latency, 96.46 TFLOPS, (query_key_value): ColumnParallelLinear(24.58 k, 1.90% Params, 824.63 GMACs, 2.00% MACs, 8.93 ms, 0.48% latency, 184.7 TFLOPS, ) (scale_mask_softmax): FusedScaleMaskSoftmax(0, 0.00% Params, 134.22 MMACs, 0.00% MACs, 151.16 us, 0.01% latency, 1.78 TFLOPS, ) (attention_dropout): Dropout(0, 0.00% Params, 0 MACs, 0.00% MACs, 79.63 us, 0.00% latency, 0.0 FLOPS, p=0.1, inplace=False) (dense): RowParallelLinear(8.19 k, 0.63% Params, 274.88 GMACs, 0.67% MACs, 2.67 ms, 0.14% latency, 205.81 TFLOPS, ) ) ) (1): ParallelTransformerLayerPart2( 57.35 k, 4.43% Params, 2199.02 GMACs, 5.33% MACs, 77.53 ms, 4.21% latency, 56.73 TFLOPS, 
(post_attention_layernorm): FusedLayerNorm(16.38 k, 1.27% Params, 0 MACs, 0.00% MACs, 116.11 us, 0.01% latency, 0.0 FLOPS, torch.Size([8192]), eps=1e-05, elementwise_affine=True) (mlp): ParallelMLP( 40.96 k, 3.16% Params, 2199.02 GMACs, 5.33% MACs, 76.19 ms, 4.13% latency, 57.72 TFLOPS, (dense_h_to_4h): ColumnParallelLinear(32.77 k, 2.53% Params, 1099.51 GMACs, 2.66% MACs, 10.79 ms, 0.59% latency, 203.81 TFLOPS, ) (dense_4h_to_h): RowParallelLinear(8.19 k, 0.63% Params, 1099.51 GMACs, 2.66% MACs, 14.38 ms, 0.78% latency, 152.95 TFLOPS, ) ) ) ... (23): ParallelTransformerLayerPart2(...) ) (final_layernorm): FusedLayerNorm(16.38 k, 1.27% Params, 0 MACs, 0.00% MACs, 110.86 us, 0.01% latency, 0.0 FLOPS, torch.Size([8192]), eps=1e-05, elementwise_affine=True) ) ) ) ------------------------------------------------------------------------------ Usage Outside the DeepSpeed Runtime The profiler can be used as a standalone package outside of the DeepSpeed runtime. One can simply install DeepSpeed and import the flops_profiler package to use the APIs directly. Refer to installation of DeepSpeed for installing DeepSpeed. In Model Inference To profile a trained model in inference, use the get_model_profile function. Examples are given below. Example: AlexNet The following example shows how to profile AlexNet using the DeepSpeed flops profiler. import torchvision.models as models import torch from deepspeed.profiling.flops_profiler import get_model_profile from deepspeed.accelerator import get_accelerator with get_accelerator().device(0): model = models.alexnet() batch_size = 256 flops, macs, params = get_model_profile(model=model, # model input_shape=(batch_size, 3, 224, 224), # input shape to the model. If specified, the model takes a tensor with this shape as the only positional argument. args=None, # list of positional arguments to the model. kwargs=None, # dictionary of keyword arguments to the model. 
print_profile=True, # prints the model graph with the measured profile attached to each module detailed=True, # print the detailed profile module_depth=-1, # depth into the nested modules, with -1 being the inner most modules top_modules=1, # the number of top modules to print aggregated profile warm_up=10, # the number of warm-ups before measuring the time of each module as_string=True, # print raw numbers (e.g. 1000) or as human-readable strings (e.g. 1k) output_file=None, # path to the output file. If None, the profiler prints to stdout. ignore_modules=None) # the list of modules to ignore in the profiling Example: Bert from functools import partial import torch from transformers import BertForSequenceClassification, BertTokenizer from deepspeed.profiling.flops_profiler import get_model_profile from deepspeed.accelerator import get_accelerator def bert_input_constructor(batch_size, seq_len, tokenizer): fake_seq = "" for _ in range(seq_len - 2): # ignore the two special tokens [CLS] and [SEP] fake_seq += tokenizer.pad_token inputs = tokenizer([fake_seq] * batch_size, padding=True, truncation=True, return_tensors="pt") labels = torch.tensor([1] * batch_size) inputs = dict(inputs) inputs.update({"labels": labels}) return inputs with get_accelerator().device(0): tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') model = BertForSequenceClassification.from_pretrained('bert-base-uncased') batch_size = 4 seq_len = 128 enable_profile = True if enable_profile: flops, macs, params = get_model_profile( model, kwargs=bert_input_constructor(batch_size, seq_len, tokenizer), print_profile=True, detailed=True, ) else: inputs = bert_input_constructor(batch_size, seq_len, tokenizer) outputs = model(**inputs) In Model Training Workflow To profile model forward in a training workflow, use the FlopsProfiler class. 
The FlopsProfiler class provides the following methods: start_profile() - starts profiling get_total_flops(as_string=False) - returns the total number of floating-point operations in the model get_total_macs(as_string=False) - returns the total number of MACs in the model get_total_params(as_string=False) - returns the total number of parameters in the model print_model_profile(profile_step=1, module_depth=-1, top_modules=3, detailed=True, output_file=None) - prints the model profile stop_profile() - stops profiling. This stops the flops counting in the model. end_profile() - cleans up. This cleans up the profile attributes added to the model during the profiling. This should be invoked at the end of the profiling and AFTER get_total_flops, get_total_params or print_model_profile. Example Training Workflow Below is an example of this usage in a typical training workflow. from deepspeed.profiling.flops_profiler import FlopsProfiler model = Model() prof = FlopsProfiler(model) profile_step = 5 print_profile = True for step, batch in enumerate(data_loader): # start profiling at training step "profile_step" if step == profile_step: prof.start_profile() # forward() method loss = model(batch) # end profiling and print output if step == profile_step: # if using multi nodes, check global_rank == 0 as well prof.stop_profile() flops = prof.get_total_flops() macs = prof.get_total_macs() params = prof.get_total_params() if print_profile: prof.print_model_profile(profile_step=profile_step) prof.end_profile() # runs backpropagation loss.backward() # weight update optimizer.step() Updated: November 5, 2025
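The FLOPS figures in the profile summaries above follow directly from the formulas printed next to them. As a rough check, the sketch below recomputes them from the BERT-Large summary values (forward flops and latencies) with plain arithmetic; the helper name `tflops` is made up for this illustration and is not part of the profiler API.

```python
# Recompute the summary FLOPS from the BERT-Large example above.
# Inputs are copied from the printed summary; the backward pass is
# estimated as 2x the forward flops, as the profiler itself assumes.

def tflops(flops, seconds):
    """Convert an operation count and a latency into TFLOPS."""
    return flops / seconds / 1e12

fwd_flops = 6279.86e9      # "fwd flops per GPU: 6279.86 G"
fwd_latency = 76.67e-3     # "fwd latency: 76.67 ms"
bwd_latency = 108.02e-3    # "bwd latency: 108.02 ms"

fwd = tflops(fwd_flops, fwd_latency)
bwd = tflops(2 * fwd_flops, bwd_latency)
fwd_bwd = tflops(3 * fwd_flops, fwd_latency + bwd_latency)

print(f"fwd FLOPS per GPU:     {fwd:.2f} TFLOPS")
print(f"bwd FLOPS per GPU:     {bwd:.2f} TFLOPS")
print(f"fwd+bwd FLOPS per GPU: {fwd_bwd:.2f} TFLOPS")
```

The results agree with the reported 81.9, 116.27 and 102.0 TFLOPS up to rounding of the printed inputs. Comparing such figures with the hardware peak is how the "performance gap" mentioned in the overview can be estimated.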
59
-
60
- ```
61
- 80
62
- ```
63
-
64
- **Pattern 7:** DeepSpeed Configuration JSON Contents Batch Size Related Parameters Optimizer Parameters Scheduler Parameters Communication options FP16 training options BFLOAT16 training options Automatic mixed precision (AMP) training options Gradient Clipping ZeRO Optimizations for FP16 Training Parameter offloading Optimizer offloading Asynchronous I/O Logging Autotuning Flops Profiler Activation Checkpointing Sparse Attention Data Efficiency Curriculum Learning Monitoring Module Elastic Training Config (V0.1 and V0.2) Communication Logging Compression Layer Reduction Weight Quantization Activation Quantization Sparse Pruning Row Pruning Head Pruning Channel Pruning Checkpoint options Data Type options Batch Size Related Parameters Note: train_batch_size must be equal to train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs. For simplicity, you can choose to only specify two of the three parameters, the last one will be inferred automatically by DeepSpeed. train_batch_size: [integer] Value Example The effective training batch size. This is the amount of data samples that leads to one step of model update. train_batch_size is aggregated by the batch size that a single GPU processes in one forward/backward pass (a.k.a., train_micro_batch_size_per_gpu), the gradient accumulation steps (a.k.a., gradient_accumulation_steps), and the number of GPUs. Can be omitted if both train_micro_batch_size_per_gpu and gradient_accumulation_steps are provided. 32 train_micro_batch_size_per_gpu: [integer] Description Default Batch size to be processed by one GPU in one step (without gradient accumulation). Can be omitted if both train_batch_size and gradient_accumulation_steps are provided. train_batch_size value gradient_accumulation_steps: [integer] Description Default Number of training steps to accumulate gradients before averaging and applying them. 
This feature is sometimes useful to improve scalability since it results in less frequent communication of gradients between steps. Another impact of this feature is the ability to train with larger batch sizes per GPU. Can be omitted if both train_batch_size and train_micro_batch_size_per_gpu are provided. 1 Optimizer Parameters optimizer: [dictionary] Fields Value Example type The optimizer name. DeepSpeed natively supports Adam, AdamW, OneBitAdam, Lamb, and OneBitLamb optimizers (See here for details) and will import other optimizers from torch. "Adam" params Dictionary of parameters to instantiate optimizer. The parameter names must match the optimizer constructor signature (e.g., for Adam). {"lr": 0.001, "eps": 1e-8} Example of optimizer with Adam "optimizer": { "type": "Adam", "params": { "lr": 0.001, "betas": [ 0.8, 0.999 ], "eps": 1e-8, "weight_decay": 3e-7 } } The Adam optimizer also supports the following two params keys/values in addition to the standard parameters from torch.optim.Adam: “params” key Description Default torch_adam Use torch’s implementation of Adam instead of our fused Adam implementation false adam_w_mode Apply decoupled weight decay (also known as AdamW) instead of L2 regularization true Another example of optimizer with 1-bit Adam specific parameters is as follows.
"optimizer": { "type": "OneBitAdam", "params": { "lr": 0.001, "betas": [ 0.8, 0.999 ], "eps": 1e-8, "weight_decay": 3e-7, "freeze_step": 400, "cuda_aware": false, "comm_backend_name": "nccl" } } The 1-bit Adam optimizer supports the following three params keys/values in addition to the standard Adam (learn more in our tutorial): “params” key Description Default freeze_step Number of warm up steps before 1-bit compression gets applied to the communication 100000 cuda_aware To indicate that the underlying MPI library supports CUDA-Aware communication false comm_backend_name To indicate which backend implementation to use “nccl” A variant optimizer for 1-bit Adam is 0/1 Adam, which further optimizes 1-bit Adam via adaptive variance freezing and 1-bit synchronization over optimizer states. "optimizer": { "type": "ZeroOneAdam", "params": { "lr": 1e-3, "weight_decay": 0.01, "bias_correction": false, "var_freeze_step": 1000, "var_update_scaler": 16, "local_step_scaler": 1000, "local_step_clipper": 16, "cuda_aware": false, "comm_backend_name": "nccl" } } 0/1 Adam supports the following params key/values in addition to standard Adam (learn more in our tutorial.) 
“params” key Description Default var_freeze_step The latest step to update the variance 100000 var_update_scaler The interval to update the variance 16 local_step_scaler The interval to scale the local steps interval according to the learning rate policy 32678 local_step_clipper The largest interval for local steps with learning rate policy 16 cuda_aware To indicate that the underlying MPI library supports CUDA-Aware communication false comm_backend_name To indicate which backend implementation to use “nccl” Another example of optimizer with 1-bit LAMB "optimizer": { "type": "OneBitLamb", "params": { "lr": 11e-3, "weight_decay": 0.01, "bias_correction": false, "max_coeff": 0.3, "min_coeff": 0.01, "freeze_step": 1000, "cuda_aware": false, "comm_backend_name": "nccl", "coeff_beta": 0.9, "factor_max": 4.0, "factor_min": 0.5, "factor_threshold": 0.1 } } The 1-bit LAMB optimizer supports the following params keys/values in addition to the standard LAMB (learn more in our tutorial): “params” key Description Default max_coeff Scaling coefficient upper bound for original LAMB algorithm and 1-bit LAMB’s warmup stage 10.0 min_coeff Scaling coefficient lower bound for original LAMB algorithm and 1-bit LAMB’s warmup stage 0.01 freeze_step Number of warm up steps before 1-bit compression gets applied to the communication 100000 cuda_aware To indicate that the underlying MPI library supports CUDA-Aware communication false comm_backend_name To indicate which backend implementation to use “nccl” coeff_beta Coefficient used for computing running averages of lamb coefficient 0.9 factor_max Maximum value of scaling factor to the frozen lamb coefficient during compression stage 4.0 factor_min Minimum value of scaling factor to the frozen lamb coefficient during compression stage 0.5 factor_threshold Threshold of how much the scaling factor can fluctuate between steps 0.1 Scheduler Parameters DeepSpeed calls the step() method of the scheduler at every training step when 
model_engine.step() is executed. scheduler: [dictionary] Fields Value Example type The scheduler name. See here for the list of supported schedulers. "WarmupLR" params Dictionary of parameters to instantiate scheduler. The parameter names should match scheduler constructor signature. {"warmup_min_lr": 0, "warmup_max_lr": 0.001} Example of scheduler "scheduler": { "type": "WarmupLR", "params": { "warmup_min_lr": 0, "warmup_max_lr": 0.001, "warmup_num_steps": 1000 } } Communication options communication_data_type: [string] Description Default During gradient averaging, perform communication with the selected data type. By default it is determined by the selected regime None prescale_gradients: [boolean] Description Default Scale gradients before doing allreduce false gradient_predivide_factor: [float] Description Default Before gradient averaging, predivide gradients by a specified factor; this can sometimes help with fp16 stability when scaling to large numbers of GPUs 1.0 sparse_gradients: [boolean] Description Default Enable sparse compression of torch.nn.Embedding gradients. This feature is essentially deprecated as we don’t see use cases for it as much anymore. It should be noted that this feature is not compatible with torch.sparse related features. false FP16 training options Note: this mode cannot be combined with the amp mode described below. fp16: [dictionary] Description Default Configuration for using mixed precision/FP16 training that leverages NVIDIA’s Apex package. An example, including the available dictionary keys is illustrated below. NOTE: this does not use Apex’s AMP mode that allows for more flexibility in mixed precision training modes, this mode is similar to AMP’s O2 mode. Please see AMP support below if you want to use more complex mixed precision modes. If you want to use ZeRO (currently) you must use this mode.
None "fp16": { "enabled": true, "auto_cast": false, "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "consecutive_hysteresis": false, "min_loss_scale": 1 } fp16:enabled: [boolean] Description Default enabled is a fp16 parameter indicating whether or not FP16 training is enabled. false fp16:auto_cast: [boolean] Description Default auto_cast automatically casts inputs to fp16 false fp16:loss_scale: [float] Description Default loss_scale is a fp16 parameter representing the loss scaling value for FP16 training. The default value of 0.0 results in dynamic loss scaling, otherwise the value will be used for static fixed loss scaling. 0.0 fp16:initial_scale_power: [integer] Description Default initial_scale_power is a fp16 parameter representing the power of the initial dynamic loss scale value. The actual loss scale is computed as 2^initial_scale_power. 16 fp16:loss_scale_window: [integer] Description Default loss_scale_window is a fp16 parameter representing the window over which to raise/lower the dynamic loss scale value. 1000 fp16:hysteresis: [integer] Description Default hysteresis is a fp16 parameter representing the delay shift in dynamic loss scaling. 2 fp16:consecutive_hysteresis: [boolean] Description Default consecutive_hysteresis is a fp16 parameter representing whether to refill the hysteresis if we reach an iteration that doesn’t overflow false fp16:min_loss_scale: [integer] Description Default min_loss_scale is a fp16 parameter representing the minimum dynamic loss scale value. 1 BFLOAT16 training options Note: this mode cannot be combined with the amp mode described below. Note: this mode cannot be combined with the fp16 mode described above. bf16: [dictionary] Description Default Configuration for using bfloat16 floating-point format as an alternative to FP16. BFLOAT16 requires hardware support (e.g., NVIDIA A100). An example, including the available dictionary keys is illustrated below.
Training with bfloat16 does not require loss scaling. None "bf16": { "enabled": true } bf16:enabled: [boolean] Description Default enabled indicates whether BFLOAT16 training is enabled. false Automatic mixed precision (AMP) training options Note: this mode cannot be combined with the fp16 mode described above. In addition this mode is not currently compatible with ZeRO. amp: [dictionary] Description Default Configuration for using automatic mixed precision (AMP) training that leverages NVIDIA’s Apex AMP package. An example, including the available dictionary keys is illustrated below. Is not compatible with fp16 mode above or ZeRO. Any parameters outside of “enabled” will be passed to AMP’s initialize call, see the API and descriptions here at the apex.amp.initialize documentation. None "amp": { "enabled": true, ... "opt_level": "O1", ... } amp:enabled: [boolean] Description Default enabled is an amp parameter indicating whether or not AMP training is enabled. false amp params: [various] Description Default Any parameters outside of “enabled” will be passed to AMP’s initialize call, see the API and descriptions here at the apex.amp.initialize documentation. None Gradient Clipping gradient_clipping: [float] Description Default Enable gradient clipping with value 1.0 ZeRO Optimizations for FP16 Training Enabling and configuring ZeRO memory optimizations "zero_optimization": { "stage": [0|1|2|3], "allgather_partitions": [true|false], "allgather_bucket_size": 5e8, "overlap_comm": false, "reduce_scatter": [true|false], "reduce_bucket_size": 5e8, "contiguous_gradients" : [true|false], "offload_param": { ... }, "offload_optimizer": { ... 
}, "stage3_max_live_parameters" : 1e9, "stage3_max_reuse_distance" : 1e9, "stage3_prefetch_bucket_size" : 5e8, "stage3_param_persistence_threshold" : 1e6, "sub_group_size" : 1e12, "elastic_checkpoint" : [true|false], "stage3_gather_16bit_weights_on_model_save": [true|false], "ignore_unused_parameters": [true|false], "round_robin_gradients": [true|false], "zero_hpz_partition_size": 1, "zero_quantized_weights": [true|false], "zero_quantized_gradients": [true|false], "log_trace_cache_warnings": [true|false], } zero_optimization: [dictionary] Description Default Enable ZeRO memory optimizations, compatible with FP16/BF16/FP32 and the Adam optimizer. false stage: [integer] Description Default Chooses different stages of ZeRO Optimizer. Stage 0, 1, 2, and 3 refer to disabled, optimizer state partitioning, and optimizer+gradient state partitioning, and optimizer+gradient+parameter partitioning, respectively. 0 allgather_partitions: [boolean] Description Default Chooses between allgather collective or a series of broadcast collectives to gather updated parameters from all the GPUs at the end of each step true allgather_bucket_size: [integer] Description Default Number of elements allgathered at a time. Limits the memory required for the allgather for large model sizes 5e8 overlap_comm: [boolean] Description Default Attempts to overlap the reduction of the gradients with backward computation false reduce_scatter: [boolean] Description Default Uses reduce or reduce scatter instead of allreduce to average gradients true reduce_bucket_size: [integer] Description Default Number of elements reduced/allreduced at a time. Limits the memory required for the allgather for large model sizes 5e8 contiguous_gradients: [boolean] Description Default Copies the gradients to a contiguous buffer as they are produced. Avoids memory fragmentation during backward pass. 
True load_from_fp32_weights: [boolean] Description Default Initialize fp32 master weights from fp32 copies in checkpoint (no precision loss) or from model’s fp16 copies (with precision loss). This can be used to initialize optimizer state even when checkpoint is missing optimizer state. True grad_hooks: [boolean] Description Default For use with ZeRO stage 1, enable backward hooks to reduce gradients during the backward pass or wait until the end of the backward pass. True round_robin_gradients: [boolean] Description Default Stage 1 and 2 optimization for CPU offloading that parallelizes gradient copying to CPU memory among ranks by fine-grained gradient partitioning. Performance benefit grows with gradient accumulation steps (more copying between optimizer steps) or GPU count (increased parallelism). False offload_param: [dictionary] Description Default Enable offloading of model parameters to CPU or NVMe. This frees up GPU memory for larger models or batch sizes. Valid only with stage 3. See here for more details. False offload_optimizer: [dictionary] Description Default Enable offloading of optimizer state to CPU or NVMe, and optimizer computation to CPU. This frees up GPU memory for larger models or batch sizes. Valid for ZeRO stage 1, 2, 3. See here for more details. False stage3_max_live_parameters: [integer] Description Default The maximum number of parameters resident per GPU before releasing. Smaller values use less memory, but perform more communication. 1e9 stage3_max_reuse_distance: [integer] Description Default Do not release a parameter if it will be reused within this threshold of parameters. Smaller values use less memory, but perform more communication. 1e9 stage3_prefetch_bucket_size: [integer] Description Default The size of the fixed buffer for prefetching parameters. Smaller values use less memory, but can increase stalls due to communication. 
5e8 stage3_param_persistence_threshold: [integer] Description Default Do not partition parameters smaller than this threshold. Smaller values use less memory, but can greatly increase communication (especially latency-bound messages). 1e5 stage3_gather_16bit_weights_on_model_save: [boolean] Description Default Consolidate the weights before saving the model by save_16bit_model(). Since the weights are partitioned across GPUs, they aren’t part of state_dict, so this function automatically gathers the weights when this option is enabled and then saves the fp16 model weights. False stage3_module_granularity_threshold: [integer] Description Default The granularity of a module is determined by the ratio of parameter_count / (1 + descendant_count). ZeRO3 classifies modules with a granularity below the threshold as fine-grained, treating them as integral units during parameter fetching. This reduces host and communication overhead from separate hooks. 0 zero_hpz_partition_size: [integer] Description Default Number of ranks in hierarchical partitioning ZeRO (hpZ) secondary tensor group of ZeRO++, default is 1 meaning no hpZ, ideal is number of ranks (gpus) per node. 1 zero_quantized_weights: [boolean] Description Default Boolean indicating whether to enable communication efficient quantized weights of ZeRO++. False zero_quantized_gradients: [boolean] Description Default Boolean indicating whether to enable communication efficient quantized gradients of ZeRO++. False log_trace_cache_warnings: [boolean] Description Default Log warnings from trace cache optimization of parameter sharding, such as cache invalidation events. False cpu_offload: [boolean] Deprecated: cpu_offload is deprecated and will be removed in the future, please use offload_optimizer instead. Description Default Enable offloading of optimizer memory and computation to CPU.
This frees up GPU memory for larger models or batch sizes. Valid with stage 1 and 2. False Parameter offloading Enabling and configuring ZeRO optimization of parameter offloading to CPU/NVMe. Available only with ZeRO stage 3. Note that if the value of “device” is not specified or not supported, an assertion will be triggered. "offload_param": { "device": "[cpu|nvme]", "nvme_path": "/local_nvme", "pin_memory": [true|false], "buffer_count": 5, "buffer_size": 1e8, "max_in_cpu": 1e9 } device: [string] Description Default Device memory to offload model parameters. Supported options are cpu and nvme. cpu nvme_path: [string] Description Default Filesystem path for NVMe device for parameter offloading. /local_nvme pin_memory: [boolean] Description Default Offload to page-locked CPU memory. This could boost throughput at the cost of extra memory overhead. false buffer_count: [integer] Description Default Number of buffers in buffer pool for parameter offloading to NVMe. 5 buffer_size: [integer] Description Default Size of buffers in buffer pool for parameter offloading to NVMe. 1e8 max_in_cpu: [integer] Description Default Number of parameter elements to maintain in CPU memory when offloading to NVMe is enabled. 1e9 Optimizer offloading Enabling and configuring ZeRO optimization of offloading optimizer computation to CPU and state to CPU/NVMe. CPU offloading is available with ZeRO stage 1, 2, 3. NVMe offloading is available only with ZeRO stage 3. Note that if the value of “device” is not specified or not supported, an assertion will be triggered. "offload_optimizer": { "device": "[cpu|nvme]", "nvme_path": "/local_nvme", "pin_memory": [true|false], "ratio": 0.3, "buffer_count": 4, "fast_init": false } device: [string] Description Default Device memory to offload optimizer state. Supported options are cpu and nvme. Optimizer computation is offloaded to CPU regardless of device option.
cpu nvme_path: [string] Description Default Filesystem path for NVMe device for optimizer state offloading. /local_nvme pin_memory: [boolean] Description Default Offload to page-locked CPU memory. This could boost throughput at the cost of extra memory overhead. false ratio: [float] Description Default the ratio of parameters updating (i.e. optimizer step) on CPU side. 1 buffer_count: [integer] Description Default Number of buffers in buffer pool for optimizer state offloading to NVMe. This should be at least the number of states maintained per parameter by the optimizer. For example, Adam optimizer has 4 states (parameter, gradient, momentum, and variance). 4 fast_init: [boolean] Description Default Enable fast optimizer initialization when offloading to NVMe. false Asynchronous I/O Configuring the asynchronous I/O module for offloading parameter and optimizer states to persistent (NVMe) storage. This module uses Linux native asynchronous I/O (libaio). "aio": { "block_size": 1048576, "queue_depth": 8, "thread_count": 1, "single_submit": false, "overlap_events": true } block_size: [integer] Description Default I/O block size in bytes. 1048576 queue_depth: [integer] Description Default I/O queue depth. 8 thread_count: [integer] Description Default Intra-request parallelism for each read/write submitted by a user thread. 1 single_submit: [boolean] Description Default Submit requests to storage device as multiple individual requests as opposed to one block of requests. false overlap_events: [boolean] Description Default Submit requests to storage device in an overlapped fashion without waiting for completion of earlier requests. true ignore_unused_parameters: [boolean] Description Default Unused parameters in modules may be unexpected in static networks, but could be normal in dynamic networks. This controls whether or not training should terminate with an error message when unused parameters are detected. 
This is set to True by default, which means unused parameters are ignored and training continues. Currently this is used only in stage 2. True Logging steps_per_print: [integer] Description Default Print progress report every N training steps. The report includes the number of training steps, number of skipped optimizer updates (likely due to overflows in mixed-precision training), current learning rate, and current momentum. 10 wall_clock_breakdown: [boolean] Description Default Enable timing of the latency of forward/backward/update training phases false dump_state: [boolean] Description Default Print out state information of DeepSpeed object after initialization false Autotuning { "autotuning": { "enabled": false, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": false, "metric": "throughput", "start_profile_step": 3, "end_profile_step": 5, "fast": true, "max_train_batch_size": null, "mp_size": 1, "num_tuning_micro_batch_sizes": 3, "tuner_type": "model_based", "tuner_early_stopping": 5, "tuner_num_trials": 50, "arg_mappings": null } } enabled: [boolean] Description Default Enables the autotuner. false results_dir: [string] Description Default Path to the autotuning experiment results directory. The default appears in the working directory from which DeepSpeed was launched. “autotuning_results” exps_dir: [string] Description Default Path to the autotuning experiment descriptions directory. The default appears in the working directory from which DeepSpeed was launched. “autotuning_exps” overwrite: [boolean] Description Default Whether to run autotuning experiments whose results already exist. Setting it to true would overwrite the existing result. false metric: [string] Description Default The performance metric to use for ranking autotuning experiments. latency, throughput, and FLOPS are currently supported, referring to training step latency, training samples per second, and floating-point operations per second achieved per GPU respectively.
throughput start_profile_step: [integer] Description Default The global training step at which to start profiling in an autotuning experiment. Note that warm-up is needed for accurate performance measurement. 3 end_profile_step: [integer] Description Default The global training step at which to end profiling in an autotuning experiment. Must not be less than start_profile_step. 5 fast: [boolean] Description Default Enables fast-model autotuning where only ZeRO stages and micro-batch sizes per GPU are tuned. true max_train_batch_size: [int] Description Default The maximum train batch size (global effective batch size) for the model training. null mp_size: [int] Description Default Model parallelism degree. 1 num_tuning_micro_batch_sizes: [integer] Description Default The number of micro-batch sizes to explore. 3 tuner_type: [string] Description Default The algorithm defines the order of autotuning space exploration within a ZeRO stage. model_based tuner_early_stopping: [integer] Description Default The number of experiments to run beyond the current best experiment. If no better experiment is found within that number, the Autotuner stops the exploration. 5 tuner_num_trials: [integer] Description Default The maximum number of experiments to explore in the tuning space within a ZeRO stage. 50 Flops Profiler { "flops_profiler": { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } } enabled: [boolean] Description Default Enables the flops profiler. This also enables wall_clock_breakdown. false profile_step: [integer] Description Default The global training step at which to profile. Note that warm-up steps are needed for accurate time measurement. 1 module_depth: [integer] Description Default The depth of the model at which to print the aggregated module information. When set to -1, it prints information from the top module to the innermost modules (the maximum depth).
-1 top_modules: [integer] Description Default Limits the aggregated profile output to the number of top modules specified. 1 detailed: [boolean] Description Default Whether to print the detailed model profile. true output_file: [string] Description Default Path to the output file. If None, the profiler prints to stdout. null Activation Checkpointing "activation_checkpointing": { "partition_activations": false, "cpu_checkpointing": false, "contiguous_memory_optimization": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } partition_activations: [boolean] Description Default Enables partition activation when used with model parallelism false cpu_checkpointing: [boolean] Description Default Offloads partitioned activations to CPU if partition_activations is enabled false contiguous_memory_optimization: [boolean] Description Default Copies partitioned activations so that they are contiguous in memory false number_checkpoints: [integer] Description Default Total number of activation checkpoints used to allocate memory buffer for contiguous_memory_optimization None synchronize_checkpoint_boundary: [boolean] Description Default Inserts get_accelerator().synchronize() at each checkpoint boundary. false profile: [boolean] Description Default Logs the forward and backward time for each checkpoint function false Sparse Attention sparse_attention: [dictionary] Fields Value Example mode A string determining sparsity structure type. DeepSpeed currently supports "dense", "fixed", "bigbird", "bslongformer", and "variable". "fixed" block An integer determining the block size. The current implementation of sparse self-attention is based on blocked sparse matrices, and this parameter defines the size of such blocks (block x block). 16 different_layout_per_head A boolean determining if each head should be assigned a different sparsity layout; this will be satisfied based on availability.
false num_local_blocks An integer determining the number of blocks in each local attention window; only used in "fixed" mode. 4 num_global_blocks An integer determining how many consecutive blocks in a local window are used as the representative of the window for global attention; used in "fixed" and "bigbird" modes. 1 attention A string determining attention type. Attention can be "unidirectional", as in autoregressive models, in which tokens attend only to tokens that appear before them in the context; in that case the upper triangular part of the attention matrix is empty. Or it can be "bidirectional", as in BERT, in which tokens can attend to any other tokens before or after them; then the upper triangular part of the attention matrix is a mirror of the lower triangular part. Used in "fixed" and "variable" modes. "bidirectional" horizontal_global_attention A boolean determining if blocks that are global representatives of a local window also attend to all other blocks. This is valid only if attention type is "bidirectional". Looking at the attention matrix, that means global attention not only includes the vertical blocks, but also horizontal blocks; used in "fixed" and "variable" modes. false num_different_global_patterns An integer determining the number of different global attention layouts. While global attention can be fixed by which block/s are representative of any local window, since there are multiple heads, each head can use a different global representative; used only in "fixed" mode. 4 num_random_blocks An integer determining the number of random blocks in each block row; used in "variable" and "bigbird" modes. 0 local_window_blocks A list of integers determining the number of blocks in each local attention window. It assumes the first number determines the number of blocks in the first local window, the second the second window, …, and the last number determines the number of blocks in the remaining local windows; only used in "variable" mode.
[4] global_block_indices A list of integers determining which blocks are considered as global attention. Given indices, determine the blocks that all other token blocks attend to and they attend to all other token blocks. Notice that if global_block_end_indices parameter is set, this parameter is used as starting index of each global window; used in "variable" and "bslongformer" modes. [0] global_block_end_indices A list of integers determining end indices of global window blocks. By default this is not used. But if it is set, it must have the same size as the global_block_indices parameter, and combining these two parameters, for each index i, blocks from global_block_indices[i] to global_block_end_indices[i], exclusive, are considered as global attention; used in "variable" and "bslongformer" modes. None num_sliding_window_blocks An integer determining the number of blocks in sliding local attention window; used in "bigbird" and "bslongformer" modes. 3 Example of sparse_attention "sparse_attention": { "mode": "fixed", "block": 16, "different_layout_per_head": true, "num_local_blocks": 4, "num_global_blocks": 1, "attention": "bidirectional", "horizontal_global_attention": false, "num_different_global_patterns": 4, "num_random_blocks": 0, "local_window_blocks": [4], "global_block_indices": [0], "global_block_end_indices": null, "num_sliding_window_blocks": 3 } Data Efficiency DeepSpeed Data Efficiency Library includes two techniques: curriculum learning and random layerwise token dropping (random-LTD). Read more about how to use the DeepSpeed Data Efficiency Library in our tutorial.
"data_efficiency": { "enabled": true, "seed": 1234, "data_routing": { "enabled": true, "random_ltd":{ "enabled": true, "total_layer_num": 24, "random_ltd_layer_num": 22, "random_ltd_layer_id": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22], "model_mask_name": "attention_mask", "model_type": "decoder", "hidden_state_order": "seq_batch_dim", "random_ltd_schedule": { "min_value": 128, "max_value": 2048, "schedule_type":"fixed_linear", "schedule_config": { "require_steps": 200000, "seq_per_step": 16 } } } }, "data_sampling": { "enabled": true, "num_epochs": 1, "num_workers": 0, "curriculum_learning": { "enabled": true, "data_cluster_path": "/path/to/data_clusters", "curriculum_metrics": { "vocabularyrarity": { "index_to_sample_path": "/path/to/index_to_sample", "index_to_metric_path": "/path/to/index_to_metric", "difficulty_type": "percentile", "clustering_type": "schedule_based", "min_difficulty": 1, "max_difficulty": 100, "schedule_type": "fixed_root", "schedule_config": { "total_curriculum_step": 110000, "difficulty_step": 1, "root_degree": 2 } } } } } } data_efficiency: [dictionary] Fields Value Default enabled: [boolean] Enable data efficiency or not. false seed: [integer] Random seed for data sampling. 1234 data_routing: [dictionary] Configs for data routing techniques. N/A data_sampling: [dictionary] Configs for data sampling techniques. N/A data_routing: [dictionary] Fields Value Default enabled: [boolean] Enable data routing techniques or not. false random_ltd: [dictionary] Configs for random-LTD technique. N/A data_sampling: [dictionary] Fields Value Default enabled: [boolean] Enable data sampling techniques or not. false num_epochs: [integer] At most how many epochs of the original dataset will be iterated. 1000 num_workers: [integer] Data loader number of workers. 0 curriculum_learning: [dictionary] Configs for curriculum learning technique. N/A random_ltd: [dictionary] Fields Value Default enabled: [boolean] Enable random-LTD technique or not.
false total_layer_num: [integer] The number of layers (i.e., the depth) of the pretraining/fine-tuning model. N/A random_ltd_layer_num: [integer] The number of layers to which random-LTD will be applied. N/A random_ltd_layer_id: [list] The exact layer_id values to which random-LTD will be applied. The length of this list must be the same as random_ltd_layer_num. N/A model_mask_name: [str] The variable name of the attention_mask. Different libraries have different names, such as att_mask. For huggingface model, it’s named “attention_mask”. Users need to check the forward function in the original model files. If the attention mask input in the original model’s forward function is not a keyword/named argument (e.g., attention_mask=None), users need to change it to a keyword/named argument and provide that keyword as model_mask_name. N/A model_type: [str] Users need to identify whether the model is decoder or encoder. Currently we only support these two. N/A hidden_state_order: [str] Users need to know the input order of the hidden state tensor. Normally, it’s batch, sequence and then the hidden dimension, which is batch_seq_dim. Sometimes the batch and sequence dimensions are swapped, which is seq_batch_dim. Currently, we support these two. N/A random_ltd_schedule: [dictionary] The schedule of the effective sequence length after token dropping. It’s a linear function where random-LTD gradually drops fewer tokens and increases effective sequence length. N/A min_value: [integer] The initial effective sequence length (after token dropping) at step/iteration 0. N/A max_value: [integer] The max effective sequence length (usually the case without any token dropping). Usually this is set as baseline’s seqlen. N/A schedule_type: [str] The sequence length follows a linear increasing function starting from min_value and reaching max_value. We currently only support this type. N/A schedule_config: [dictionary] Configs for the linear increasing function. 
N/A require_steps: [integer] How many iterations will be needed to reach max_value from min_value. N/A seq_per_step: [integer] At any time, the effective sequence length is a multiple of this seq_per_step. Set this to a multiple of 8 (for FP16 data) or 16 (for INT8 data) to enable NVIDIA Tensor Core acceleration. N/A curriculum_learning: [dictionary] Fields Value Default enabled: [boolean] Enable curriculum learning technique or not. false data_cluster_path: [str] Path to directory where curriculum learning will store the indexes of data samples within the same difficulty ranges. N/A curriculum_metrics: [dictionary] This dictionary includes all desired curriculum metrics and their configs. Each metric will be a separate sub-dictionary, where the key is the metric name and the values are configs below. N/A index_to_sample_path: [str] Path to the index_to_sample file generated during offline data analysis. Note that data analysis will generate two kinds of index_to_sample files: The metric_name_index_to_sample_percentile_merged file is a concatenated index for perf improvement, but it only works when you set difficulty_type=percentile. If you use difficulty_type=value, you need to change this to use the metric_name_index_to_sample file. N/A index_to_metric_path: [str] Path to the index_to_metric file generated during offline data analysis. N/A difficulty_type: [str] During training, how to increase the max accepted difficulty. Currently supports value (increase by absolute value) and percentile (increase by difficulty percentile). N/A clustering_type: [str] Currently supports schedule_based (cluster data based on the difficulty schedule (pacing function) below) and single_cluster (no clustering required and probably CL is achieved by data postprocessing, such as sequence length truncation). N/A min_difficulty: [integer] Starting difficulty at first step. When difficulty_type=value the min_difficulty is an absolute difficulty value. 
When difficulty_type=percentile the min_difficulty is a difficulty percentile value. N/A max_difficulty: [integer] Final max difficulty. When difficulty_type=value the max_difficulty is an absolute difficulty value. When difficulty_type=percentile the max_difficulty is a difficulty percentile value. N/A schedule_type: [str] The difficulty schedule (pacing function) that defines how the max accepted difficulty increases from min_difficulty to max_difficulty during training. Currently supports fixed_linear, fixed_root, fixed_discrete, and custom. N/A schedule_config: [dictionary] Configs for the pacing function. When schedule_type=custom this dictionary is not necessary. Instead the user needs to provide a callback function (via the set_custom_curriculum_learning_schedule API in deepspeed/runtime/engine.py) which will update the max accepted difficulty during training. All configs below belong to schedule_config. N/A total_curriculum_step: [integer] How many steps the curriculum learning takes to go from min difficulty to max difficulty. Used by fixed_linear and fixed_root schedule. N/A difficulty_step: [integer] The max accepted difficulty level determined every step must be a multiple of this difficulty_step. This is used to ensure the use of NVIDIA Tensor Core acceleration (requires multiple of 8 (FP16) or 16 (INT8)). Used by fixed_linear and fixed_root schedule. N/A root_degree: [integer] The degree of the root function. Degree of 2 means square root and degree of 3 means cube root. Degree of 1 is equivalent to linear. Used by fixed_root schedule. N/A difficulty: [list] List of max accepted difficulty levels to be used during schedule. Used by fixed_discrete schedule. N/A max_step: [list] List of which step to change max accepted difficulty level. Used by fixed_discrete schedule. N/A Curriculum Learning Note: On 12/12/2022, we released DeepSpeed Data Efficiency Library which provides more general curriculum learning support. 
This legacy curriculum learning feature below is still supported but we recommend using the Data Efficiency Library. "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 8, "max_difficulty": 1024, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 40000, "difficulty_step": 8 } } enabled: [boolean] Description Default Set to true to enable curriculum learning false curriculum_type: [string] Description Default Type of curriculum difficulty metric. Currently supports seqlen. N/A min_difficulty: [integer] Description Default The starting difficulty level N/A max_difficulty: [integer] Description Default The ending difficulty level N/A schedule_type: [string] Description Default Type of curriculum schedule. Currently supports fixed_linear, fixed_root, and fixed_discrete. N/A total_curriculum_step: [integer] Description Default Total number of steps for the curriculum learning. One of the schedule_config when the fixed_linear and fixed_root schedule_type are used. N/A difficulty_step: [integer] Description Default At any time, the curriculum learning difficulty must be a multiple of this difficulty_step. Set this to a multiple of 8 (for FP16 data) or 16 (for INT8 data) to enable NVIDIA Tensor Core acceleration. One of the schedule_config when the fixed_linear and fixed_root schedule_type are used. N/A root_degree: [integer] Description Default Root degree of the curriculum schedule function. One of the schedule_config when the fixed_root schedule_type is used. N/A difficulty: [list of integer] Description Default List of difficulty levels to be used during schedule. One of the schedule_config when the fixed_discrete schedule_type is used. N/A max_step: [list of integer] Description Default List of which step to change difficulty level. One of the schedule_config when the fixed_discrete schedule_type is used. N/A Monitoring Module Note: DeepSpeed logs to TensorBoard through PyTorch. 
Logging to TensorBoard requires that the tensorboard package is installed (read more in the PyTorch documentation). Note: Logging to WandB requires that the wandb package is installed (read more in the WandB documentation). Note: Logging to Comet requires that the comet_ml package is installed (read more in the Comet documentation). DeepSpeed’s Monitor module can log training details into a Tensorboard-compatible file, to WandB, to Comet or to simple CSV files. Below is an overview of what DeepSpeed will log automatically. Field Description Conditions Train/Samples/train_loss The training loss. None Train/Samples/lr The learning rate during training. None Train/Samples/loss_scale The loss scale when training using fp16. fp16 must be enabled. Train/Eigenvalues/ModelBlockParam_{i} Eigenvalues per param block. eigenvalue must be enabled. Train/Samples/elapsed_time_ms_forward The global duration of the forward pass. flops_profiler.enabled or wall_clock_breakdown. Train/Samples/elapsed_time_ms_backward The global duration of the backward pass. flops_profiler.enabled or wall_clock_breakdown. Train/Samples/elapsed_time_ms_backward_inner The backward time that does not include the gradient reduction time. This differs from the full backward time only when gradient reduction is not overlapped; if it is overlapped, the inner time should be about the same as the entire backward time. flops_profiler.enabled or wall_clock_breakdown. Train/Samples/elapsed_time_ms_backward_allreduce The global duration of the allreduce operation. flops_profiler.enabled or wall_clock_breakdown. Train/Samples/elapsed_time_ms_step The optimizer step time. flops_profiler.enabled or wall_clock_breakdown. tensorboard: [dictionary] Fields Value Default enabled Whether logging to Tensorboard is enabled. false output_path Path to where the Tensorboard logs will be written. If None, the output path is set under the training script’s launching path. null job_name Name for the current job. 
This will become a new directory inside output_path. "DeepSpeedJobName" Example of tensorboard configuration: "tensorboard": { "enabled": true, "output_path": "output/ds_logs/", "job_name": "train_bert" } wandb: [dictionary] Fields Value Default enabled Whether logging to WandB is enabled. false group Name for the WandB group. This can be used to group together runs. None team Name for the WandB team. None project Name for the WandB project. deepspeed Example of wandb configuration: "wandb": { "enabled": true, "group": "my_group", "team": "my_team", "project": "my_project" } comet: [dictionary] Fields Value Default enabled Whether logging to Comet is enabled. false workspace Comet workspace name. None project Comet project name. None samples_log_interval Metrics will be submitted to Comet after processing every samples_log_interval samples. 100 experiment_name The name for comet experiment to be used for logging. None api_key Comet API key. It’s not recommended to save the Comet API Key in code. None experiment_key The key for comet experiment to be used for logging. Must be an alphanumeric string whose length is between 32 and 50 characters. None online If True, the data will be logged to Comet server, otherwise it will be stored locally in offline experiment. Default is True. None mode Control how the Comet experiment is started. “get”: Continue logging to an existing experiment identified by the experiment_key value. “create”: Always creates a new experiment, useful for HPO sweeps. “get_or_create” (default): Starts a fresh experiment if required, or persists logging to an existing one. None Example of comet configuration: "comet": { "enabled": true, "workspace": "my_workspace", "project": "my_project", "samples_log_interval": 50, "experiment_name": "llama-fine-tuning", "experiment_key": "0c4a1c4a90664f2a8084e600b19a9d7", "online": false, "mode": "get" } csv_monitor: [dictionary] Fields Value Default enabled Whether logging to local CSV files is enabled. 
false output_path Path to where the csv files will be written. If None, the output path is set under the training script’s launching path. null job_name Name for the current job. This will become a new directory inside output_path. "DeepSpeedJobName" Example of csv_monitor configuration: "csv_monitor": { "enabled": true, "output_path": "output/ds_logs/", "job_name": "train_bert" } Elastic Training Config (V0.1 and V0.2) "elasticity": { "enabled": true, "max_train_batch_size": 2000, "micro_batch_sizes": [2,4,6], "min_gpus": 1, "max_gpus": 10000, "min_time": 0, "version": 0.2, "ignore_non_elastic_batch_info": false, "num_gpus_per_node": 1, "model_parallel_size": MODEL_PARALLEL_SIZE } Field Description Default enabled Enables computation of global batch size in elastic training. false max_train_batch_size Max acceptable batch size that can be used in training. 2000 micro_batch_sizes Acceptable micro batch sizes, same as train_micro_batch_size_per_gpu [2,4,6] min_gpus Min number of GPUs to search over when computing highly composite batch size in v0.1 and v0.2. 1 max_gpus Max number of GPUs to search over when computing highly composite batch size in v0.1 and v0.2. 10000 min_time Minimum running time (minutes) before the scheduler will scale again (only used in v0.1). 0 implies it’s unknown 0 prefer_large_batch When finding a suitable batch size, attempt to find one that is closest to the max train batch size given. true version Version of elastic logic to use. 0.2 ignore_non_elastic_batch_info Ignore all batch info provided outside the elastic config. To reduce confusion, we require all batch related info to be given in elastic config only. false num_gpus_per_node Number of GPUs per node. 
This information is used by v0.2 to support model-parallel training (only used by v0.2) 1 model_parallel_size Tensor or model parallel size (only used by v0.2) 1 Communication Logging DeepSpeed provides a flexible communication logging tool which can automatically detect and record communication operations launched via deepspeed.comm. NOTE: All logging communication calls are synchronized in order to provide accurate timing information. This may hamper performance if your model heavily uses asynchronous communication operations. Once the logs are populated, they can be summarized with deepspeed.comm.log_summary(). For more detail and example usage, see the tutorial comms_logger: [dictionary] Fields Value Default enabled Whether communication logging is enabled. false verbose Whether to immediately print every communication operation false prof_all Whether to profile all operations. true debug Appends the caller function to each communication operation’s log_name. false prof_ops A list of communication operations to log (only the specified ops will be profiled). [] Example of recommended comms_logger configuration: "comms_logger": { "enabled": true, "verbose": false, "prof_all": true, "debug": false } Example of comms_logger configuration for logging specific operations only: "comms_logger": { "enabled": true, "verbose": false, "prof_all": false, "debug": false, "prof_ops": ["all_reduce", "all_gather"] } Compression Note: Compression has seven different components, including layer reduction, weight quantization, activation quantization, sparse pruning, row pruning, head pruning, and channel pruning. We explain them one by one with simple json examples. Read more about how to use the DeepSpeed Compression library in our tutorial. 
Layer Reduction Note: Layer reduction works much better when using knowledge distillation (learn more in our tutorial): "compression_training": { "layer_reduction": { "enabled": true, "keep_number_layer": 5, "module_name_prefix": "bert.encoder.layer", "teacher_layer": [ 2, 4, 6, 8, 10 ], "other_module_name": [ "bert.pooler", "bert.embeddings", "classifier" ] } } layer_reduction: [dictionary] Fields Value Default enabled: [boolean] Enable layer reduction or not. false keep_number_layer: [list] The number of layers in the model to be kept. N/A module_name_prefix: [str] The (uniform) name prefix of the model’s modules whose associated weight parameters are to be reinitialized. N/A teacher_layer: [list] The layers whose weight parameters are to be reinitialized. The length of the list equals ‘keep_number_layer’. N/A other_module_name: [list] The names of modules whose associated weight parameters are to be reinitialized. It is a complement or alternative to module_name_prefix. For instance, “other_module_name”: [“bert.encoder.layer.2”,“bert.encoder.layer.4”] is equivalent to “module_name_prefix”:“bert.encoder.layer” and “teacher_layer”: [2,4]. N/A Weight Quantization "compression_training": { "weight_quantization": { "shared_parameters":{ "enabled": true, "quantizer_kernel": false, "schedule_offset": 0, "quantize_groups": 1, "quantize_verbose": false, "quantization_type": "symmetric", "rounding": "nearest", "quantize_weight_in_forward": false, "fp16_mixed_quantize":{ "enabled": false, "quantize_change_ratio": 0.001 } }, "different_groups":{ "wq1": { "params": { "start_bits": 8, "target_bits": 8, "quantization_period": 50 }, "modules": [ "attention.self", "intermediate" ] }, "wq2": { "params": { "start_bits": 4, "target_bits": 4, "quantization_period": 50 }, "modules": [ "attention.output" ] } } } } shared_parameters: [dictionary] Shared parameters for all weight quantization groups. 
Fields Value Default enabled: [boolean] Enable weight quantization or not. false quantizer_kernel: [boolean] Use DeepSpeed quantization kernel for >=4 bit quantization. This can only be enabled when using DeepSpeed FP16 optimizer. false schedule_offset: [integer] Enable weight quantization after scheduled steps (can be treated as warmup steps). 0 quantize_groups: [integer] Split the weight matrix into this number of groups, each with its own scaling factor. 1 quantize_verbose: [boolean] Print the quantization related logs. false quantization_type: [string] Choose the quantization algorithm, symmetric or asymmetric. "symmetric" rounding: [string] Rounding algorithm associated with quantization, nearest or stochastic. "nearest" quantize_weight_in_forward: [boolean] Quantize weight in optimizer or forward step; must be set to true for FP32 optimizer training. false fp16_mixed_quantize: [dictionary] Use a value that mixes the FP16 value and the quantized value. N/A enabled: [boolean] Whether fp16 mixed quantization is enabled. false quantize_change_ratio: [float] Initial quantize value ratio, will gradually increase to 1. 0.001 different_groups: [dictionary] Different quantization sets, this is used for different quantization parameters. In this example, we give two different sets. In practice, you can choose the number of sets based on your requirements. Fields Value Default params: [dictionary] start_bits: [integer] Quantization starting bits, will gradually reduce to target bits. 8 target_bits: [integer] Quantization target bits, must be <= start_bits. 8 quantization_period: [integer] For every n steps, the quantization bits will be reduced by 1. 1 modules: [list] Scope of weight parameters associated to the params setting. 
"All Linear and CONV2D layers" Activation Quantization "compression_training": { "activation_quantization": { "shared_parameters":{ "enabled": true, "quantization_type": "asymmetric", "range_calibration": "dynamic", "schedule_offset": 50 }, "different_groups":{ "aq1": { "params": { "bits": 8 }, "modules": [ "attention.output" ] } } } shared_parameters: [dictionary] Shared parameters for all activation quantization groups. Fields Value Default enabled: [boolean] Enable activation quantization or not. false quantization_type: [string] Choose the quantization algorithm, symmetric or asymmetric. "symmetric" range_calibration: [string] Using dynamic (per token or per image) or static (fixed min/max using momentum) for inference. "static" schedule_offset: [integer] Enable activation quantization after scheduled steps (can be treated as warmup steps). 0 different_groups: [dictionary] Different quantization sets, this is used for different quantization parameters. In this example, we give one set. In practice, you can choose the number of sets based on your requirements. Fields Value Default params: [dictionary] bits: [integer] Number of bits used for activation target bits, need to be >= 4. 8 modules: [list] Scope of weight parameters associated to the params setting. "All Linear and CONV2D layers" Sparse Pruning "compression_training": { "sparse_pruning":{ "shared_parameters":{ "enabled": true, "schedule_offset": 30, "method": "l1" }, "different_groups":{ "sp1": { "params": { "dense_ratio": 0.5 }, "modules": [ "attention.self" ] } } } } "compression_training": { "sparse_pruning":{ "shared_parameters":{ "enabled": true, "schedule_offset": 30, "schedule_offset_end": 90, "schedule_offset_stride": 15, "method": "snip_momentum", "block_pattern": "4x1", "dense_ratio": 0.4, "excluded_modules": ['classifier', 'pooler'] }, "different_groups":{ } } } shared_parameters: [dictionary] Shared parameters for all sparse pruning groups. 
Fields Value Default enabled: [boolean] Enable sparse pruning or not. false schedule_offset: [integer] Enable sparse pruning after scheduled steps (can be treated as warmup steps). 0 schedule_offset_end: [integer] Disable sparse pruning after scheduled steps, mandatory for snip_momentum. 0 schedule_offset_stride: [integer] The stride of pruning on training steps, mandatory for snip_momentum. "1" method: [string] Choose different pruning methods, l1 (static, magnitude based), topk (dynamic, learnable) or snip_momentum (structured pruning). "l1" block_pattern: [string] Choose different structured pruning block patterns, NxM or N:M (N and M are integers). For instance, “4x1” or “2:4” are common block patterns, mandatory for snip_momentum. "4x1" dense_ratio: [float] Used to get the targeted global sparsity ratio, mandatory for snip_momentum. "0.1" excluded_modules: [list] Excluded pruning scope on some special modules like output layer. [] different_groups: [dictionary] Different pruning sets, this is used for different pruning parameters. In this example, we give one set. In practice, you can choose the number of sets based on your requirements. Note that for the snip_momentum method, you can leave it empty. Fields Value Default params: [dictionary] dense_ratio: [float] The percentage of weights to keep after pruning. 0.5 modules: [list] Scope of weight parameters associated to the params setting. "All Linear and CONV2D layers" Row Pruning Note: Row Pruning is a feature designed for two back-to-back linear layers (e.g., Feed Forward Network in Transformers). As such, we suggest using row pruning for the first linear layer (i.e., the intermediate.dense layer for BERT). Reducing the row dimension of this matrix can help reduce the column dimension of the follow-up matrix (i.e., layer.\\w+.output.dense layer for BERT). It should work for other linear layers as well. 
"compression_training": { "row_pruning":{ "shared_parameters":{ "enabled": true, "schedule_offset": 20, "method": "topk" }, "different_groups":{ "rp1": { "params": { "dense_ratio": 0.5 }, "modules": [ "intermediate.dense" ], "related_modules":[ ["layer.\\w+.output.dense"] ] } } } } shared_parameters: [dictionary] Shared parameters for all row pruning groups. Fields Value Default enabled: [boolean] Enable row pruning or not. false schedule_offset: [integer] Enable row pruning after scheduled steps (can be treated as warmup steps). 0 method: [string] Choose different pruning methods, l1 (static, magnitude based) or topk (dynamic, learnable). "l1" different_groups: [dictionary] Different pruning sets, this is used for different pruning parameters. In this example, we give one set. In practice, you can choose the number of sets based on your requirements. Fields Value Default params: [dictionary] dense_ratio: [float] The percentage of weights to keep after pruning. 0.5 modules: [list] Scope of weight parameters associated to the params setting. "All Linear and CONV2D layers" related_modules: [list[list]] Related module to the row pruned module, which can be performed column pruning. None Head Pruning Note: Head Pruning is a feature designed for two attention layers (e.g., Multi Head Attention in Transformers). For now, it can only be applied to output matrix of the Transformer (i.e., attention.output.dense in BERT). Pruning the output matrix can lead to the pruning of Query/Key/Value matrix as well. "compression_training": { "head_pruning":{ "shared_parameters":{ "enabled": true, "schedule_offset": 10, "method": "topk", "num_heads": 12 }, "different_groups":{ "rp1": { "params": { "dense_ratio": 0.5 }, "modules": [ "attention.output.dense" ], "related_modules":[ ["self.query", "self.key", "self.value"] ] } } } } shared_parameters: [dictionary] Shared parameters for all head pruning groups. Fields Value Default enabled: [boolean] Enable head pruning or not. 
false schedule_offset: [integer] Enable head pruning after scheduled steps (can be treated as warmup steps). 0 method: [string] Choose different pruning methods. For now, we only support topk (dynamic, learnable). "topk" num_heads: [int] Number of heads (must be provided by user). N/A different_groups: [dictionary] Different pruning sets, this is used for different pruning parameters. In this example, we give one set. In practice, you can choose the number of sets based on your requirements. Fields Value Default params: [dictionary] dense_ratio: [float] The percentage of weights to keep after pruning. 0.5 modules: [list] Scope of weight parameters associated to the params setting. "All Linear and CONV2D layers" related_modules: [list[list]] Related modules (usually Q/K/V) to the head-pruned module (i.e., the output matrix). For now, this feature only works for BERT. None Channel Pruning Note: Channel Pruning is a feature designed for two back-to-back CONV2d layers (e.g., residual connection in ResNet). As such, we suggest using channel pruning for the first CONV2d layer. Reducing the number of output channels of this layer can help reduce the number of input channels of the follow-up layer. It should work for other CONV2d layers as well. "compression_training": { "channel_pruning":{ "shared_parameters":{ "enabled": true, "schedule_offset": 0, "method": "topk" }, "different_groups":{ "cp1": { "params": { "dense_ratio": 0.5 }, "modules": [ "layer....conv1" ], "related_modules": [ ["layer....conv2", "layer....bn1"] ] } } } } shared_parameters: [dictionary] Shared parameters for all channel pruning groups. Fields Value Default enabled: [boolean] Enable channel pruning or not. false schedule_offset: [integer] Enable channel pruning after scheduled steps (can be treated as warmup steps). 0 method: [string] Choose different pruning methods, l1 (static, magnitude based) or topk (dynamic, learnable). 
"l1" different_groups: [dictionary] Different pruning sets, this is used for different pruning parameters. In this example, we give one set. In practice, you can choose the number of sets based on your requirements. Fields Value Default params: [dictionary] dense_ratio: [float] The percentage of weights to keep after pruning. 0.5 modules: [list] Scope of weight parameters associated to the params setting. "All CONV2D layers" related_modules: [list[list]] Related module to the channel pruned module. None Checkpoint options "checkpoint": { "tag_validation"="Warn", "load_universal"=false, "use_node_local_storage"=false, "parallel_write":{ "pipeline_stage": false } } tag_validation: [“Ignore” “Warn” “Fail”] Description Default Enables level of checking to ensure checkpoint tags are consistent across all ranks. Useful when restoring with different world sizes. “Warn” load_universal: [boolean] Description Default Load the latest checkpoint for all. false use_node_local_storage: [boolean] Description Default If true DeepSpeed will store model parameter states and checkpoint states based on local rank allowing checkpoints to be loaded without access to a shared filesystem. false pipeline_stage: [boolean] Description Default Use pipeline stages to parallelize the writing of checkpoints. false Data Type options "data_types": { "grad_accum_dtype"=["fp32"|"fp16"|"bf16"] } } grad_accum_dtype: [“fp32” “fp16” “bf16”] Description Default Specifies the data type in which to do gradient accumulation. If None the default is to match the model type. None
65
-
66
- ```
67
- 32
68
- ```
69
-
70
- **Pattern 8:** Monitor In this tutorial, we introduce the DeepSpeed Monitor and provide examples of its usage. Overview Monitoring model and system metrics during training is vital to ensure hardware resources are fully utilized. The DeepSpeed Monitor enables live logging of metrics through one or more monitoring backends such as PyTorch’s TensorBoard, WandB, Comet and simple CSV files. [Figures: live monitoring views for TensorBoard, WandB, and Comet] Usage The DeepSpeed Monitor is configured within the deepspeed configuration file. DeepSpeed will automatically monitor key training metrics, including those tracked with the wall_clock_breakdown configuration option. In addition, users can log their own custom events and metrics. Automatic Monitoring When using DeepSpeed for model training, the Monitor can be configured in the DeepSpeed configuration file. No explicit API calls are needed to use the Monitor. The Monitor can be enabled by adding the following fields to DeepSpeed’s configuration json file. Refer to Monitoring for details. { "tensorboard": { "enabled": true, "output_path": "output/ds_logs/", "job_name": "train_bert" }, "wandb": { "enabled": true, "team": "my_team", "group": "my_group", "project": "my_project" }, "comet": { "enabled": true, "project": "my_project", "experiment_name": "my_experiment" }, "csv_monitor": { "enabled": true, "output_path": "output/ds_logs/", "job_name": "train_bert" } } DeepSpeed will automatically log to all available and enabled monitoring backends listed in the config, and will generate live monitoring views such as those listed above. Custom Monitoring In addition to automatic monitoring, users can log their own custom metrics in client scripts. 
Currently, there are two ways to initialize Monitor objects: (1, recommended) Create a MonitorMaster(ds_config.monitor_config) object, which automatically initializes all monitor backends present in the DeepSpeed configuration. (2) Create a specific TensorBoardMonitor(ds_config.monitor_config), WandbMonitor(ds_config.monitor_config), or csvMonitor(ds_config.monitor_config) object, which will only initialize a specific monitor backend present in the DeepSpeed configuration. The steps to create a custom monitor are as follows: Step 1: Add an import for your desired Monitor. Step 2: Initialize the monitor with the DeepSpeed config’s monitor_config. Step 3: Create a list of one or more 3-tuples in the format [("label", value, ds_engine.global_samples), ...]*. Step 4: Call monitor.write_events on the list from step 3. * Note - Some Monitor backends don’t support mixed sample values. Be sure to use your DeepSpeed engine object’s global_samples attribute in each 3-tuple. For example usage, see the following modified DeepSpeedExamples/cifar example: # Step 1: Import monitor (and DeepSpeed config, if needed) import time from deepspeed.monitor.monitor import MonitorMaster from deepspeed.runtime.config import DeepSpeedConfig # Step 2: Initialize monitor with DeepSpeed config (get DeepSpeed config object, if needed) ds_config = DeepSpeedConfig("ds_config.json") monitor = MonitorMaster(ds_config.monitor_config) for epoch in range(2): running_loss = 0.0 for i, data in enumerate(trainloader): pre = time.time() inputs, labels = data[0].to(model_engine.local_rank), data[1].to( model_engine.local_rank) if fp16: inputs = inputs.half() outputs = model_engine(inputs) loss = criterion(outputs, labels) model_engine.backward(loss) model_engine.step() post = time.time() # Step 3: Create list of 3-tuple records (single entry in this case) events = [("Time per step", post-pre, model_engine.global_samples)] # Step 4: Call monitor.write_events on the list from step 3 monitor.write_events(events) Updated: November 5, 2025
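
The four custom-monitoring steps above can be tried without a GPU or a DeepSpeed install by swapping in a stand-in object. In the sketch below, `InMemoryMonitor` and `timed_step` are hypothetical names introduced only for illustration; the only thing assumed about the real monitor is what the text states, namely that it exposes `write_events` and accepts a list of `(label, value, global_samples)` 3-tuples. The `samples_seen` counter plays the role of `model_engine.global_samples`.

```python
import time


class InMemoryMonitor:
    """Hypothetical stand-in for a monitor backend (not a DeepSpeed class)."""

    def __init__(self):
        self.records = []

    def write_events(self, event_list):
        # Each event is a 3-tuple: (label, value, global sample count).
        for label, value, samples in event_list:
            self.records.append((label, float(value), int(samples)))


def timed_step(monitor, step_fn, global_samples):
    """Run one training step and log its wall-clock duration."""
    pre = time.time()
    step_fn()
    post = time.time()
    # Step 3: build the list of 3-tuples (a single entry here).
    events = [("Time per step", post - pre, global_samples)]
    # Step 4: hand the list to the monitor.
    monitor.write_events(events)


monitor = InMemoryMonitor()
samples_seen = 0
batch_size = 4  # assumed batch size, for illustration only
for _ in range(3):
    samples_seen += batch_size
    # A trivial computation stands in for forward/backward/step.
    timed_step(monitor, lambda: sum(range(1000)), samples_seen)

print([samples for _, _, samples in monitor.records])
```

Using one consistent sample counter for every event keeps the x-axis aligned across metrics, which matters for backends that do not support mixed sample values.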
71
-
72
- ```
73
- wall_clock_breakdown
74
- ```
75
-
76
- ### Example Code Patterns
77
-
78
- **Example 1** (python):
79
- ```python
80
- ### Create aio_handle
81
- from deepspeed.ops.op_builder import AsyncIOBuilder
82
- aio_handle = AsyncIOBuilder().load().aio_handle()
83
- ```
84
-
85
- ## Reference Files
86
-
87
- This skill includes comprehensive documentation in `references/`:
88
-
89
- - **08.md** - 08 documentation
90
- - **09.md** - 09 documentation
91
- - **2020.md** - 2020 documentation
92
- - **2023.md** - 2023 documentation
93
- - **assets.md** - Assets documentation
94
- - **mii.md** - Mii documentation
95
- - **other.md** - Other documentation
96
- - **tutorials.md** - Tutorials documentation
97
-
98
- Use `view` to read specific reference files when detailed information is needed.
99
-
100
- ## Working with This Skill
101
-
102
- ### For Beginners
103
- Start with the getting_started or tutorials reference files for foundational concepts.
104
-
105
- ### For Specific Features
106
- Use the appropriate category reference file (api, guides, etc.) for detailed information.
107
-
108
- ### For Code Examples
109
- The quick reference section above contains common patterns extracted from the official docs.
110
-
111
- ## Resources
112
-
113
- ### references/
114
- Organized documentation extracted from official sources. These files contain:
115
- - Detailed explanations
116
- - Code examples with language annotations
117
- - Links to original documentation
118
- - Table of contents for quick navigation
119
-
120
- ### scripts/
121
- Add helper scripts here for common automation tasks.
122
-
123
- ### assets/
124
- Add templates, boilerplate, or example projects here.
125
-
126
- ## Notes
127
-
128
- - This skill was automatically generated from official documentation
129
- - Reference files preserve the structure and examples from source docs
130
- - Code examples include language detection for better syntax highlighting
131
- - Quick reference patterns are extracted from common usage examples in the docs
132
-
133
- ## Updating
134
-
135
- To refresh this skill with updated documentation:
136
- 1. Re-run the scraper with the same configuration
137
- 2. The skill will be rebuilt with the latest information
138
-
139
-
140
-
141
-
@@ -1,17 +0,0 @@
1
- # Deepspeed - 08
2
-
3
- **Pages:** 1
4
-
5
- ---
6
-
7
- ## DeepSpeed powers 8x larger MoE model training with high performance
8
-
9
- **URL:** https://www.deepspeed.ai/2021/08/17/deepspeed-moe.html
10
-
11
- **Contents:**
12
- - DeepSpeed powers 8x larger MoE model training with high performance
13
- - Contents
14
-
15
- Updated: August 17, 2021
16
-
17
- ---