scientific-writer 2.2.1__py3-none-any.whl → 2.2.3__py3-none-any.whl
Potentially problematic release.
This version of scientific-writer might be problematic.
- scientific_writer/.claude/WRITER.md +748 -0
- scientific_writer/.claude/settings.local.json +30 -0
- scientific_writer/.claude/skills/citation-management/SKILL.md +1046 -0
- scientific_writer/.claude/skills/citation-management/assets/bibtex_template.bib +264 -0
- scientific_writer/.claude/skills/citation-management/assets/citation_checklist.md +386 -0
- scientific_writer/.claude/skills/citation-management/references/bibtex_formatting.md +908 -0
- scientific_writer/.claude/skills/citation-management/references/citation_validation.md +794 -0
- scientific_writer/.claude/skills/citation-management/references/google_scholar_search.md +725 -0
- scientific_writer/.claude/skills/citation-management/references/metadata_extraction.md +870 -0
- scientific_writer/.claude/skills/citation-management/references/pubmed_search.md +839 -0
- scientific_writer/.claude/skills/citation-management/scripts/doi_to_bibtex.py +204 -0
- scientific_writer/.claude/skills/citation-management/scripts/extract_metadata.py +569 -0
- scientific_writer/.claude/skills/citation-management/scripts/format_bibtex.py +349 -0
- scientific_writer/.claude/skills/citation-management/scripts/search_google_scholar.py +282 -0
- scientific_writer/.claude/skills/citation-management/scripts/search_pubmed.py +398 -0
- scientific_writer/.claude/skills/citation-management/scripts/validate_citations.py +497 -0
- scientific_writer/.claude/skills/clinical-reports/IMPLEMENTATION_SUMMARY.md +641 -0
- scientific_writer/.claude/skills/clinical-reports/README.md +236 -0
- scientific_writer/.claude/skills/clinical-reports/SKILL.md +1088 -0
- scientific_writer/.claude/skills/clinical-reports/assets/case_report_template.md +352 -0
- scientific_writer/.claude/skills/clinical-reports/assets/clinical_trial_csr_template.md +353 -0
- scientific_writer/.claude/skills/clinical-reports/assets/clinical_trial_sae_template.md +359 -0
- scientific_writer/.claude/skills/clinical-reports/assets/consult_note_template.md +305 -0
- scientific_writer/.claude/skills/clinical-reports/assets/discharge_summary_template.md +453 -0
- scientific_writer/.claude/skills/clinical-reports/assets/hipaa_compliance_checklist.md +395 -0
- scientific_writer/.claude/skills/clinical-reports/assets/history_physical_template.md +305 -0
- scientific_writer/.claude/skills/clinical-reports/assets/lab_report_template.md +309 -0
- scientific_writer/.claude/skills/clinical-reports/assets/pathology_report_template.md +249 -0
- scientific_writer/.claude/skills/clinical-reports/assets/quality_checklist.md +338 -0
- scientific_writer/.claude/skills/clinical-reports/assets/radiology_report_template.md +318 -0
- scientific_writer/.claude/skills/clinical-reports/assets/soap_note_template.md +253 -0
- scientific_writer/.claude/skills/clinical-reports/references/case_report_guidelines.md +570 -0
- scientific_writer/.claude/skills/clinical-reports/references/clinical_trial_reporting.md +693 -0
- scientific_writer/.claude/skills/clinical-reports/references/data_presentation.md +530 -0
- scientific_writer/.claude/skills/clinical-reports/references/diagnostic_reports_standards.md +629 -0
- scientific_writer/.claude/skills/clinical-reports/references/medical_terminology.md +588 -0
- scientific_writer/.claude/skills/clinical-reports/references/patient_documentation.md +744 -0
- scientific_writer/.claude/skills/clinical-reports/references/peer_review_standards.md +585 -0
- scientific_writer/.claude/skills/clinical-reports/references/regulatory_compliance.md +577 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/check_deidentification.py +346 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/compliance_checker.py +78 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/extract_clinical_data.py +102 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/format_adverse_events.py +103 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/generate_report_template.py +163 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/terminology_validator.py +133 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/validate_case_report.py +334 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/validate_trial_report.py +89 -0
- scientific_writer/.claude/skills/document-skills/docx/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/docx/SKILL.md +197 -0
- scientific_writer/.claude/skills/document-skills/docx/docx-js.md +350 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/mce/mc.xsd +75 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2010.xsd +560 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2012.xsd +67 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2018.xsd +14 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd +20 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd +13 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd +8 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/pack.py +159 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/unpack.py +29 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validate.py +69 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/__init__.py +15 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/base.py +951 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/docx.py +274 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/pptx.py +315 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/redlining.py +279 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml.md +610 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/__init__.py +1 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/document.py +1276 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/comments.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/commentsExtended.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/commentsExtensible.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/commentsIds.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/people.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/utilities.py +374 -0
- scientific_writer/.claude/skills/document-skills/pdf/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/pdf/SKILL.md +294 -0
- scientific_writer/.claude/skills/document-skills/pdf/forms.md +205 -0
- scientific_writer/.claude/skills/document-skills/pdf/reference.md +612 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/check_bounding_boxes.py +70 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/check_bounding_boxes_test.py +226 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/check_fillable_fields.py +12 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/convert_pdf_to_images.py +35 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/create_validation_image.py +41 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/extract_form_field_info.py +152 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/fill_fillable_fields.py +114 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/fill_pdf_form_with_annotations.py +108 -0
- scientific_writer/.claude/skills/document-skills/pptx/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/pptx/SKILL.md +484 -0
- scientific_writer/.claude/skills/document-skills/pptx/html2pptx.md +625 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/mce/mc.xsd +75 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2010.xsd +560 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2012.xsd +67 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2018.xsd +14 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cex-2018.xsd +20 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cid-2016.xsd +13 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-symex-2015.xsd +8 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/pack.py +159 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/unpack.py +29 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validate.py +69 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/__init__.py +15 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/base.py +951 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/docx.py +274 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/pptx.py +315 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/redlining.py +279 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml.md +427 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/html2pptx.js +979 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/inventory.py +1020 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/rearrange.py +231 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/replace.py +385 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/thumbnail.py +450 -0
- scientific_writer/.claude/skills/document-skills/xlsx/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/xlsx/SKILL.md +289 -0
- scientific_writer/.claude/skills/document-skills/xlsx/recalc.py +178 -0
- scientific_writer/.claude/skills/hypothesis-generation/SKILL.md +155 -0
- scientific_writer/.claude/skills/hypothesis-generation/assets/hypothesis_output_template.md +302 -0
- scientific_writer/.claude/skills/hypothesis-generation/references/experimental_design_patterns.md +327 -0
- scientific_writer/.claude/skills/hypothesis-generation/references/hypothesis_quality_criteria.md +196 -0
- scientific_writer/.claude/skills/hypothesis-generation/references/literature_search_strategies.md +505 -0
- scientific_writer/.claude/skills/latex-posters/README.md +417 -0
- scientific_writer/.claude/skills/latex-posters/SKILL.md +919 -0
- scientific_writer/.claude/skills/latex-posters/assets/baposter_template.tex +257 -0
- scientific_writer/.claude/skills/latex-posters/assets/beamerposter_template.tex +244 -0
- scientific_writer/.claude/skills/latex-posters/assets/poster_quality_checklist.md +358 -0
- scientific_writer/.claude/skills/latex-posters/assets/tikzposter_template.tex +251 -0
- scientific_writer/.claude/skills/latex-posters/references/latex_poster_packages.md +745 -0
- scientific_writer/.claude/skills/latex-posters/references/poster_content_guide.md +748 -0
- scientific_writer/.claude/skills/latex-posters/references/poster_design_principles.md +806 -0
- scientific_writer/.claude/skills/latex-posters/references/poster_layout_design.md +900 -0
- scientific_writer/.claude/skills/latex-posters/scripts/review_poster.sh +214 -0
- scientific_writer/.claude/skills/literature-review/SKILL.md +546 -0
- scientific_writer/.claude/skills/literature-review/assets/review_template.md +412 -0
- scientific_writer/.claude/skills/literature-review/references/citation_styles.md +166 -0
- scientific_writer/.claude/skills/literature-review/references/database_strategies.md +381 -0
- scientific_writer/.claude/skills/literature-review/scripts/generate_pdf.py +176 -0
- scientific_writer/.claude/skills/literature-review/scripts/search_databases.py +303 -0
- scientific_writer/.claude/skills/literature-review/scripts/verify_citations.py +222 -0
- scientific_writer/.claude/skills/markitdown/INSTALLATION_GUIDE.md +318 -0
- scientific_writer/.claude/skills/markitdown/LICENSE.txt +22 -0
- scientific_writer/.claude/skills/markitdown/OPENROUTER_INTEGRATION.md +359 -0
- scientific_writer/.claude/skills/markitdown/QUICK_REFERENCE.md +309 -0
- scientific_writer/.claude/skills/markitdown/README.md +184 -0
- scientific_writer/.claude/skills/markitdown/SKILL.md +450 -0
- scientific_writer/.claude/skills/markitdown/SKILL_SUMMARY.md +307 -0
- scientific_writer/.claude/skills/markitdown/assets/example_usage.md +463 -0
- scientific_writer/.claude/skills/markitdown/references/api_reference.md +399 -0
- scientific_writer/.claude/skills/markitdown/references/file_formats.md +542 -0
- scientific_writer/.claude/skills/markitdown/scripts/batch_convert.py +228 -0
- scientific_writer/.claude/skills/markitdown/scripts/convert_literature.py +283 -0
- scientific_writer/.claude/skills/markitdown/scripts/convert_with_ai.py +243 -0
- scientific_writer/.claude/skills/paper-2-web/SKILL.md +455 -0
- scientific_writer/.claude/skills/paper-2-web/references/installation.md +141 -0
- scientific_writer/.claude/skills/paper-2-web/references/paper2poster.md +346 -0
- scientific_writer/.claude/skills/paper-2-web/references/paper2video.md +305 -0
- scientific_writer/.claude/skills/paper-2-web/references/paper2web.md +187 -0
- scientific_writer/.claude/skills/paper-2-web/references/usage_examples.md +436 -0
- scientific_writer/.claude/skills/peer-review/SKILL.md +375 -0
- scientific_writer/.claude/skills/peer-review/references/common_issues.md +552 -0
- scientific_writer/.claude/skills/peer-review/references/reporting_standards.md +290 -0
- scientific_writer/.claude/skills/research-grants/README.md +285 -0
- scientific_writer/.claude/skills/research-grants/SKILL.md +896 -0
- scientific_writer/.claude/skills/research-grants/assets/budget_justification_template.md +453 -0
- scientific_writer/.claude/skills/research-grants/assets/nih_specific_aims_template.md +166 -0
- scientific_writer/.claude/skills/research-grants/assets/nsf_project_summary_template.md +92 -0
- scientific_writer/.claude/skills/research-grants/references/broader_impacts.md +392 -0
- scientific_writer/.claude/skills/research-grants/references/darpa_guidelines.md +636 -0
- scientific_writer/.claude/skills/research-grants/references/doe_guidelines.md +586 -0
- scientific_writer/.claude/skills/research-grants/references/nih_guidelines.md +851 -0
- scientific_writer/.claude/skills/research-grants/references/nsf_guidelines.md +570 -0
- scientific_writer/.claude/skills/research-grants/references/specific_aims_guide.md +458 -0
- scientific_writer/.claude/skills/research-lookup/README.md +116 -0
- scientific_writer/.claude/skills/research-lookup/SKILL.md +443 -0
- scientific_writer/.claude/skills/research-lookup/examples.py +174 -0
- scientific_writer/.claude/skills/research-lookup/lookup.py +93 -0
- scientific_writer/.claude/skills/research-lookup/research_lookup.py +335 -0
- scientific_writer/.claude/skills/research-lookup/scripts/research_lookup.py +261 -0
- scientific_writer/.claude/skills/scholar-evaluation/SKILL.md +254 -0
- scientific_writer/.claude/skills/scholar-evaluation/references/evaluation_framework.md +663 -0
- scientific_writer/.claude/skills/scholar-evaluation/scripts/calculate_scores.py +378 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/SKILL.md +530 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/common_biases.md +364 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/evidence_hierarchy.md +484 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/experimental_design.md +496 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/logical_fallacies.md +478 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/scientific_method.md +169 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/statistical_pitfalls.md +506 -0
- scientific_writer/.claude/skills/scientific-schematics/SKILL.md +2035 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/block_diagram_template.tex +199 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/circuit_template.tex +159 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/flowchart_template.tex +161 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/pathway_template.tex +162 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/tikz_styles.tex +422 -0
- scientific_writer/.claude/skills/scientific-schematics/references/best_practices.md +562 -0
- scientific_writer/.claude/skills/scientific-schematics/references/diagram_types.md +637 -0
- scientific_writer/.claude/skills/scientific-schematics/references/python_libraries.md +791 -0
- scientific_writer/.claude/skills/scientific-schematics/references/tikz_guide.md +734 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/circuit_generator.py +307 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/compile_tikz.py +292 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/generate_flowchart.py +281 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/pathway_diagram.py +406 -0
- scientific_writer/.claude/skills/scientific-writing/SKILL.md +443 -0
- scientific_writer/.claude/skills/scientific-writing/references/citation_styles.md +720 -0
- scientific_writer/.claude/skills/scientific-writing/references/figures_tables.md +806 -0
- scientific_writer/.claude/skills/scientific-writing/references/imrad_structure.md +658 -0
- scientific_writer/.claude/skills/scientific-writing/references/reporting_guidelines.md +748 -0
- scientific_writer/.claude/skills/scientific-writing/references/writing_principles.md +824 -0
- scientific_writer/.claude/skills/treatment-plans/README.md +483 -0
- scientific_writer/.claude/skills/treatment-plans/SKILL.md +817 -0
- scientific_writer/.claude/skills/treatment-plans/assets/chronic_disease_management_plan.tex +636 -0
- scientific_writer/.claude/skills/treatment-plans/assets/general_medical_treatment_plan.tex +616 -0
- scientific_writer/.claude/skills/treatment-plans/assets/mental_health_treatment_plan.tex +745 -0
- scientific_writer/.claude/skills/treatment-plans/assets/pain_management_plan.tex +770 -0
- scientific_writer/.claude/skills/treatment-plans/assets/perioperative_care_plan.tex +724 -0
- scientific_writer/.claude/skills/treatment-plans/assets/quality_checklist.md +471 -0
- scientific_writer/.claude/skills/treatment-plans/assets/rehabilitation_treatment_plan.tex +727 -0
- scientific_writer/.claude/skills/treatment-plans/references/goal_setting_frameworks.md +411 -0
- scientific_writer/.claude/skills/treatment-plans/references/intervention_guidelines.md +507 -0
- scientific_writer/.claude/skills/treatment-plans/references/regulatory_compliance.md +476 -0
- scientific_writer/.claude/skills/treatment-plans/references/specialty_specific_guidelines.md +607 -0
- scientific_writer/.claude/skills/treatment-plans/references/treatment_plan_standards.md +456 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/check_completeness.py +318 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/generate_template.py +244 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/timeline_generator.py +369 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/validate_treatment_plan.py +367 -0
- scientific_writer/.claude/skills/venue-templates/SKILL.md +590 -0
- scientific_writer/.claude/skills/venue-templates/assets/grants/nih_specific_aims.tex +235 -0
- scientific_writer/.claude/skills/venue-templates/assets/grants/nsf_proposal_template.tex +375 -0
- scientific_writer/.claude/skills/venue-templates/assets/journals/nature_article.tex +171 -0
- scientific_writer/.claude/skills/venue-templates/assets/journals/neurips_article.tex +283 -0
- scientific_writer/.claude/skills/venue-templates/assets/journals/plos_one.tex +317 -0
- scientific_writer/.claude/skills/venue-templates/assets/posters/beamerposter_academic.tex +311 -0
- scientific_writer/.claude/skills/venue-templates/references/conferences_formatting.md +564 -0
- scientific_writer/.claude/skills/venue-templates/references/grants_requirements.md +787 -0
- scientific_writer/.claude/skills/venue-templates/references/journals_formatting.md +486 -0
- scientific_writer/.claude/skills/venue-templates/references/posters_guidelines.md +628 -0
- scientific_writer/.claude/skills/venue-templates/scripts/customize_template.py +206 -0
- scientific_writer/.claude/skills/venue-templates/scripts/query_template.py +260 -0
- scientific_writer/.claude/skills/venue-templates/scripts/validate_format.py +255 -0
- scientific_writer/__init__.py +1 -1
- scientific_writer/api.py +9 -5
- scientific_writer/cli.py +9 -5
- scientific_writer/core.py +28 -5
- {scientific_writer-2.2.1.dist-info → scientific_writer-2.2.3.dist-info}/METADATA +1 -1
- scientific_writer-2.2.3.dist-info/RECORD +312 -0
- scientific_writer-2.2.1.dist-info/RECORD +0 -11
- {scientific_writer-2.2.1.dist-info → scientific_writer-2.2.3.dist-info}/WHEEL +0 -0
- {scientific_writer-2.2.1.dist-info → scientific_writer-2.2.3.dist-info}/entry_points.txt +0 -0
- {scientific_writer-2.2.1.dist-info → scientific_writer-2.2.3.dist-info}/licenses/LICENSE +0 -0
@@ -0,0 +1,450 @@
---
name: markitdown
description: "Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more."
license: MIT
source: https://github.com/microsoft/markitdown
---

# MarkItDown - File to Markdown Conversion

## Overview

MarkItDown is a Python tool developed by Microsoft for converting various file formats to Markdown. It's particularly useful for converting documents into LLM-friendly text format, as Markdown is token-efficient and well-understood by modern language models.

**Key Benefits**:
- Convert documents to clean, structured Markdown
- Token-efficient format for LLM processing
- Supports 15+ file formats
- Optional AI-enhanced image descriptions
- OCR for images and scanned documents
- Speech transcription for audio files

## Supported Formats

| Format | Description | Notes |
|--------|-------------|-------|
| **PDF** | Portable Document Format | Full text extraction |
| **DOCX** | Microsoft Word | Tables, formatting preserved |
| **PPTX** | PowerPoint | Slides with notes |
| **XLSX** | Excel spreadsheets | Tables and data |
| **Images** | JPEG, PNG, GIF, WebP | EXIF metadata + OCR |
| **Audio** | WAV, MP3 | Metadata + transcription |
| **HTML** | Web pages | Clean conversion |
| **CSV** | Comma-separated values | Table format |
| **JSON** | JSON data | Structured representation |
| **XML** | XML documents | Structured format |
| **ZIP** | Archive files | Iterates contents |
| **EPUB** | E-books | Full text extraction |
| **YouTube** | Video URLs | Fetch transcriptions |
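
The format table above can be turned into a quick pre-flight check before handing files to a converter. The following is only a sketch: the extension set and the helper names `looks_supported` and `partition_by_support` are assumptions inferred from the table, not MarkItDown's own detection logic, which may accept more or fewer file types.

```python
from pathlib import Path

# Assumption: extensions inferred from the "Supported Formats" table,
# not taken from MarkItDown's internal converter registry.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx",
    ".jpg", ".jpeg", ".png", ".gif", ".webp",
    ".wav", ".mp3",
    ".html", ".csv", ".json", ".xml", ".zip", ".epub",
}


def looks_supported(path):
    """Return True if the file extension appears in the table (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_by_support(paths):
    """Split paths into (likely supported, likely unsupported) lists, preserving order."""
    supported = [p for p in paths if looks_supported(p)]
    skipped = [p for p in paths if not looks_supported(p)]
    return supported, skipped
```

A check like this only filters by file name; YouTube URLs and files with missing or misleading extensions would need separate handling.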

## Quick Start

### Installation

```bash
# Install with all features
pip install 'markitdown[all]'

# Or from source
git clone https://github.com/microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
```

### Command-Line Usage

```bash
# Basic conversion
markitdown document.pdf > output.md

# Specify output file
markitdown document.pdf -o output.md

# Pipe content
cat document.pdf | markitdown > output.md

# Enable plugins
markitdown --list-plugins   # List available plugins
markitdown --use-plugins document.pdf -o output.md
```
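
When the CLI is driven from a Python script, building the argument list in one place keeps quoting problems away. This is a sketch under the assumption that the `markitdown` executable is on `PATH`; `build_markitdown_command` and `run_markitdown` are hypothetical helper names, and only the `-o` and `--use-plugins` flags shown above are used.

```python
import subprocess
from typing import List, Optional


def build_markitdown_command(
    input_path: str,
    output_path: Optional[str] = None,
    use_plugins: bool = False,
) -> List[str]:
    """Build an argv list for the markitdown CLI using the flags shown above."""
    cmd = ["markitdown"]
    if use_plugins:
        cmd.append("--use-plugins")
    cmd.append(input_path)
    if output_path is not None:
        cmd.extend(["-o", output_path])
    return cmd


def run_markitdown(input_path: str, output_path: str) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_markitdown_command(input_path, output_path), check=True)


print(build_markitdown_command("document.pdf", "output.md", use_plugins=True))
```

Passing a list to `subprocess.run` avoids shell interpolation, so file names containing spaces need no extra quoting.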

### Python API

```python
from markitdown import MarkItDown

# Basic usage
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)

# Convert from stream
with open("document.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
    print(result.text_content)
```

## Advanced Features

### 1. AI-Enhanced Image Descriptions

Use LLMs via OpenRouter to generate detailed image descriptions (for PPTX and image files):

```python
from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenRouter client (OpenAI-compatible API)
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",  # recommended for scientific vision
    llm_prompt="Describe this image in detail for scientific documentation"
)

result = md.convert("presentation.pptx")
print(result.text_content)
```
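
Hardcoding the API key, as the placeholder string above might suggest, is easy to leak into version control. A common alternative is to read it from the environment. This is a minimal sketch: the variable name `OPENROUTER_API_KEY` and the helper `get_api_key` are assumptions for illustration, not names required by OpenRouter or MarkItDown.

```python
import os


def get_api_key(env_var: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key
```

The returned value can then be passed as `api_key=get_api_key()` when constructing the client.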

### 2. Azure Document Intelligence

For enhanced PDF conversion with Microsoft Document Intelligence:

```bash
# Command line
markitdown document.pdf -o output.md -d -e "<document_intelligence_endpoint>"
```

```python
# Python API
from markitdown import MarkItDown

md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("complex_document.pdf")
print(result.text_content)
```

### 3. Plugin System

MarkItDown supports third-party plugins that extend its functionality:

```bash
# List installed plugins
markitdown --list-plugins

# Enable plugins
markitdown --use-plugins file.pdf -o output.md
```

To find plugins, search GitHub for the hashtag `#markitdown-plugin`.

## Optional Dependencies

Install only the dependencies for the file formats you need:

```bash
# Install specific formats
pip install 'markitdown[pdf, docx, pptx]'

# All available options:
# [all]                   - All optional dependencies
# [pptx]                  - PowerPoint files
# [docx]                  - Word documents
# [xlsx]                  - Excel spreadsheets
# [xls]                   - Older Excel files
# [pdf]                   - PDF documents
# [outlook]               - Outlook messages
# [az-doc-intel]          - Azure Document Intelligence
# [audio-transcription]   - WAV and MP3 transcription
# [youtube-transcription] - YouTube video transcription
```
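
For a project with a known set of input files, the required extras can be derived from the file extensions. The sketch below is illustrative only: `EXTRA_FOR_EXTENSION` and `pip_command_for` are hypothetical names, and the extension-to-extra mapping is an assumption based on the option list above (for example, that `.msg` files correspond to the `outlook` extra).

```python
from pathlib import Path
from typing import Iterable

# Assumption: illustrative mapping covering only the extras listed above
# that are tied to a file type.
EXTRA_FOR_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".msg": "outlook",
    ".wav": "audio-transcription",
    ".mp3": "audio-transcription",
}


def pip_command_for(files: Iterable[str]) -> str:
    """Build a pip command that installs only the extras these files need."""
    extras = sorted(
        {
            EXTRA_FOR_EXTENSION[Path(f).suffix.lower()]
            for f in files
            if Path(f).suffix.lower() in EXTRA_FOR_EXTENSION
        }
    )
    if not extras:
        return "pip install markitdown"
    return f"pip install 'markitdown[{','.join(extras)}]'"


print(pip_command_for(["paper.pdf", "notes.docx", "talk.mp3", "data.csv"]))
```

Files with no matching extra, such as the CSV in the demo, are ignored, which assumes the base install is enough for them.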

## Common Use Cases

### 1. Convert Scientific Papers to Markdown

```python
from markitdown import MarkItDown

md = MarkItDown()

# Convert PDF paper
result = md.convert("research_paper.pdf")
# Write as UTF-8 so non-ASCII characters survive on every platform
with open("paper.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)
```

### 2. Extract Data from Excel for Analysis

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("data.xlsx")

# Result will be in Markdown table format
print(result.text_content)
```

### 3. Process Multiple Documents

```python
from markitdown import MarkItDown
from pathlib import Path

md = MarkItDown()

# Process all PDFs in a directory
pdf_dir = Path("papers/")
output_dir = Path("markdown_output/")
output_dir.mkdir(exist_ok=True)

for pdf_file in pdf_dir.glob("*.pdf"):
    result = md.convert(str(pdf_file))
    output_file = output_dir / f"{pdf_file.stem}.md"
    output_file.write_text(result.text_content, encoding="utf-8")
    print(f"Converted: {pdf_file.name}")
```
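
When a folder is converted repeatedly, it may be worth skipping PDFs whose Markdown output is already up to date. The following sketch only plans the work and does not call MarkItDown. `pending_conversions` is a hypothetical helper, and comparing modification times is an assumed freshness rule that may not suit every workflow.

```python
from pathlib import Path
from typing import List, Tuple


def pending_conversions(input_dir: Path, output_dir: Path) -> List[Tuple[Path, Path]]:
    """List (source, target) pairs whose Markdown is missing or older than the PDF."""
    pending = []
    for src in sorted(input_dir.rglob("*.pdf")):
        # Mirror the input folder layout under output_dir.
        dst = output_dir / src.relative_to(input_dir).with_suffix(".md")
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            pending.append((src, dst))
    return pending
```

For each returned pair, create `dst.parent` with `mkdir(parents=True, exist_ok=True)` and write `md.convert(str(src)).text_content` to `dst`.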

### 4. Convert PowerPoint with AI Descriptions

```python
from markitdown import MarkItDown
from openai import OpenAI

# Use OpenRouter for access to multiple AI models
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",  # recommended for presentations
    llm_prompt="Describe this slide image in detail, focusing on key visual elements and data"
)

result = md.convert("presentation.pptx")
with open("presentation.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)
```

### 5. Batch Convert with Different Formats

```python
from markitdown import MarkItDown
from pathlib import Path

md = MarkItDown()

# Files to convert
files = [
    "document.pdf",
    "spreadsheet.xlsx",
    "presentation.pptx",
    "notes.docx"
]

for file in files:
    try:
        result = md.convert(file)
        output = Path(file).stem + ".md"
        with open(output, "w", encoding="utf-8") as f:
            f.write(result.text_content)
        print(f"✓ Converted {file}")
    except Exception as e:
        print(f"✗ Error converting {file}: {e}")
```

### 6. Extract YouTube Video Transcription

```python
from markitdown import MarkItDown

md = MarkItDown()

# Convert YouTube video to transcript
result = md.convert("https://www.youtube.com/watch?v=VIDEO_ID")
print(result.text_content)
```
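
When URLs come from user input, a quick check that the link has the `watch?v=` form used above can catch typos before any network request is made. The helper below is a hypothetical sketch that handles only that URL form. Other YouTube link styles are deliberately not covered, and whether MarkItDown accepts them is not addressed here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the `v` query parameter of a youtube.com watch URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        return None
    if parsed.path != "/watch":
        return None
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


print(youtube_video_id("https://www.youtube.com/watch?v=VIDEO_ID"))
```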

## Docker Usage

```bash
# Build image
docker build -t markitdown:latest .

# Run conversion
docker run --rm -i markitdown:latest < ~/document.pdf > output.md
```

## Best Practices

### 1. Choose the Right Conversion Method

- **Simple documents**: Use basic `MarkItDown()`
- **Complex PDFs**: Use Azure Document Intelligence
- **Visual content**: Enable AI image descriptions
- **Scanned documents**: Ensure OCR dependencies are installed

### 2. Handle Errors Gracefully

```python
from markitdown import MarkItDown

md = MarkItDown()

try:
    result = md.convert("document.pdf")
    print(result.text_content)
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Conversion error: {e}")
```
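
Conversions that depend on a network service, such as AI image descriptions or YouTube transcripts, may fail transiently. One possible pattern is a small retry wrapper. This is a sketch under stated assumptions: `with_retries` is a hypothetical helper, the linear backoff is arbitrary, and treating every exception other than `FileNotFoundError` as retryable is a simplification.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted.

    FileNotFoundError is re-raised immediately, since retrying cannot fix it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except FileNotFoundError:
            raise
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff between tries
    raise AssertionError("unreachable")
```

A call might look like `with_retries(lambda: md.convert("presentation.pptx"))`, where `md` is the converter from the example above.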

### 3. Process Large Files Efficiently

```python
from markitdown import MarkItDown

md = MarkItDown()

# For large files, use streaming
with open("large_file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")

# Process in chunks or save directly
with open("output.md", "w", encoding="utf-8") as out:
    out.write(result.text_content)
```

### 4. Optimize for Token Efficiency

Markdown output is already token-efficient, but you can:
- Remove excessive whitespace
- Consolidate similar sections
- Strip metadata if not needed

```python
from markitdown import MarkItDown
import re

md = MarkItDown()
result = md.convert("document.pdf")

# Clean up extra whitespace
clean_text = re.sub(r'\n{3,}', '\n\n', result.text_content)
clean_text = clean_text.strip()

print(clean_text)
```
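
The cleanup above can be extended to trailing spaces and paired with a rough size estimate to see how much was saved. This is a sketch: `compact_markdown` and `rough_token_count` are hypothetical helpers, and the four-characters-per-token ratio is a coarse heuristic, not the behaviour of any particular tokenizer.

```python
import re


def compact_markdown(text: str) -> str:
    """Strip trailing spaces and collapse runs of blank lines to a single one."""
    lines = [line.rstrip() for line in text.splitlines()]
    joined = "\n".join(lines)
    return re.sub(r"\n{3,}", "\n\n", joined).strip()


def rough_token_count(text: str) -> int:
    """Very rough estimate: about four characters per token (a heuristic)."""
    return max(1, len(text) // 4) if text else 0


raw = "# Title   \n\n\n\nBody text.  \n   \n\n"
clean = compact_markdown(raw)
print(repr(clean))
print(rough_token_count(raw), "->", rough_token_count(clean))
```

Note that stripping trailing spaces also removes Markdown hard line breaks (two spaces at the end of a line), which is usually acceptable for LLM input but changes rendering.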

## Integration with Scientific Workflows

### Convert Literature for Review

```python
from markitdown import MarkItDown
from pathlib import Path

md = MarkItDown()

# Convert all papers in literature folder
papers_dir = Path("literature/pdfs")
output_dir = Path("literature/markdown")
output_dir.mkdir(exist_ok=True)

for paper in papers_dir.glob("*.pdf"):
    result = md.convert(str(paper))

    # Save with metadata
    output_file = output_dir / f"{paper.stem}.md"
    content = f"# {paper.stem}\n\n"
    content += f"**Source**: {paper.name}\n\n"
    content += "---\n\n"
    content += result.text_content

    output_file.write_text(content, encoding="utf-8")

# For AI-enhanced conversion with figures
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md_ai = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt="Describe scientific figures with technical precision"
)
```

### Extract Tables for Analysis

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("data_tables.xlsx")

# Markdown tables can be parsed or used directly
print(result.text_content)
```
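
To go from the Markdown output to rows that can be analysed, a pipe table can be parsed with a few lines of standard-library code. This is a minimal sketch, assuming a header row, a dash separator row, and no escaped pipes inside cells. `parse_markdown_table` is a hypothetical helper, and it reads only the first table it finds.

```python
from typing import Dict, List


def _split_row(line: str) -> List[str]:
    """Split one pipe-delimited table row into trimmed cell strings."""
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Parse the first pipe table in `markdown` into a list of row dicts."""
    block: List[str] = []
    for ln in markdown.splitlines():
        if ln.strip().startswith("|"):
            block.append(ln)
        elif block:
            break  # first table ended
    if len(block) < 2:
        return []
    header = _split_row(block[0])
    # block[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, _split_row(line))) for line in block[2:]]


sample = "## Sheet1\n| gene | count |\n| --- | --- |\n| BRCA1 | 12 |\n| TP53 | 7 |\n"
print(parse_markdown_table(sample))
```

All values come back as strings; convert numeric columns explicitly before analysis.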

## Troubleshooting

### Common Issues

1. **Missing dependencies**: Install feature-specific packages
   ```bash
   pip install 'markitdown[pdf]'  # For PDF support
   ```

2. **Binary file errors**: Ensure files are opened in binary mode
   ```python
   with open("file.pdf", "rb") as f:  # Note the "rb"
       result = md.convert_stream(f, file_extension=".pdf")
   ```

3. **OCR not working**: Install tesseract
   ```bash
   # macOS
   brew install tesseract

   # Ubuntu
   sudo apt-get install tesseract-ocr
   ```
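
A script can check for external tools up front rather than failing midway through a batch. The sketch below uses `shutil.which` to look for executables on `PATH`. `missing_tools` is a hypothetical helper, and the assumption that the OCR binary is named `tesseract` follows the install commands above.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


print(missing_tools(["tesseract"]))
```

An empty list means every named tool was found; otherwise the list names what still needs installing.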

## Performance Considerations

- **PDF files**: Large PDFs may take time; consider page ranges if supported
- **Image OCR**: OCR processing is CPU-intensive
- **Audio transcription**: Requires additional compute resources
- **AI image descriptions**: Requires API calls (costs may apply)

## Next Steps

- See `references/api_reference.md` for complete API documentation
- Check `references/file_formats.md` for format-specific details
- Review `scripts/batch_convert.py` for automation examples
- Explore `scripts/convert_with_ai.py` for AI-enhanced conversions

## Resources

- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
- **PyPI**: https://pypi.org/project/markitdown/
- **OpenRouter**: https://openrouter.ai (for AI-enhanced conversions)
- **OpenRouter API Keys**: https://openrouter.ai/keys
- **OpenRouter Models**: https://openrouter.ai/models
- **MCP Server**: markitdown-mcp (for Claude Desktop integration)
- **Plugin Development**: See `packages/markitdown-sample-plugin`