scientific-writer 2.1.1__py3-none-any.whl → 2.2.2__py3-none-any.whl
Potentially problematic release.
This version of scientific-writer might be problematic.
- scientific_writer/.claude/settings.local.json +30 -0
- scientific_writer/.claude/skills/citation-management/SKILL.md +1046 -0
- scientific_writer/.claude/skills/citation-management/assets/bibtex_template.bib +264 -0
- scientific_writer/.claude/skills/citation-management/assets/citation_checklist.md +386 -0
- scientific_writer/.claude/skills/citation-management/references/bibtex_formatting.md +908 -0
- scientific_writer/.claude/skills/citation-management/references/citation_validation.md +794 -0
- scientific_writer/.claude/skills/citation-management/references/google_scholar_search.md +725 -0
- scientific_writer/.claude/skills/citation-management/references/metadata_extraction.md +870 -0
- scientific_writer/.claude/skills/citation-management/references/pubmed_search.md +839 -0
- scientific_writer/.claude/skills/citation-management/scripts/doi_to_bibtex.py +204 -0
- scientific_writer/.claude/skills/citation-management/scripts/extract_metadata.py +569 -0
- scientific_writer/.claude/skills/citation-management/scripts/format_bibtex.py +349 -0
- scientific_writer/.claude/skills/citation-management/scripts/search_google_scholar.py +282 -0
- scientific_writer/.claude/skills/citation-management/scripts/search_pubmed.py +398 -0
- scientific_writer/.claude/skills/citation-management/scripts/validate_citations.py +497 -0
- scientific_writer/.claude/skills/clinical-reports/IMPLEMENTATION_SUMMARY.md +641 -0
- scientific_writer/.claude/skills/clinical-reports/README.md +236 -0
- scientific_writer/.claude/skills/clinical-reports/SKILL.md +1088 -0
- scientific_writer/.claude/skills/clinical-reports/assets/case_report_template.md +352 -0
- scientific_writer/.claude/skills/clinical-reports/assets/clinical_trial_csr_template.md +353 -0
- scientific_writer/.claude/skills/clinical-reports/assets/clinical_trial_sae_template.md +359 -0
- scientific_writer/.claude/skills/clinical-reports/assets/consult_note_template.md +305 -0
- scientific_writer/.claude/skills/clinical-reports/assets/discharge_summary_template.md +453 -0
- scientific_writer/.claude/skills/clinical-reports/assets/hipaa_compliance_checklist.md +395 -0
- scientific_writer/.claude/skills/clinical-reports/assets/history_physical_template.md +305 -0
- scientific_writer/.claude/skills/clinical-reports/assets/lab_report_template.md +309 -0
- scientific_writer/.claude/skills/clinical-reports/assets/pathology_report_template.md +249 -0
- scientific_writer/.claude/skills/clinical-reports/assets/quality_checklist.md +338 -0
- scientific_writer/.claude/skills/clinical-reports/assets/radiology_report_template.md +318 -0
- scientific_writer/.claude/skills/clinical-reports/assets/soap_note_template.md +253 -0
- scientific_writer/.claude/skills/clinical-reports/references/case_report_guidelines.md +570 -0
- scientific_writer/.claude/skills/clinical-reports/references/clinical_trial_reporting.md +693 -0
- scientific_writer/.claude/skills/clinical-reports/references/data_presentation.md +530 -0
- scientific_writer/.claude/skills/clinical-reports/references/diagnostic_reports_standards.md +629 -0
- scientific_writer/.claude/skills/clinical-reports/references/medical_terminology.md +588 -0
- scientific_writer/.claude/skills/clinical-reports/references/patient_documentation.md +744 -0
- scientific_writer/.claude/skills/clinical-reports/references/peer_review_standards.md +585 -0
- scientific_writer/.claude/skills/clinical-reports/references/regulatory_compliance.md +577 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/check_deidentification.py +346 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/compliance_checker.py +78 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/extract_clinical_data.py +102 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/format_adverse_events.py +103 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/generate_report_template.py +163 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/terminology_validator.py +133 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/validate_case_report.py +334 -0
- scientific_writer/.claude/skills/clinical-reports/scripts/validate_trial_report.py +89 -0
- scientific_writer/.claude/skills/document-skills/docx/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/docx/SKILL.md +197 -0
- scientific_writer/.claude/skills/document-skills/docx/docx-js.md +350 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/mce/mc.xsd +75 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2010.xsd +560 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2012.xsd +67 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2018.xsd +14 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd +20 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd +13 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd +8 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/pack.py +159 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/unpack.py +29 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validate.py +69 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/__init__.py +15 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/base.py +951 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/docx.py +274 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/pptx.py +315 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml/scripts/validation/redlining.py +279 -0
- scientific_writer/.claude/skills/document-skills/docx/ooxml.md +610 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/__init__.py +1 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/document.py +1276 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/comments.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/commentsExtended.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/commentsExtensible.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/commentsIds.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/templates/people.xml +3 -0
- scientific_writer/.claude/skills/document-skills/docx/scripts/utilities.py +374 -0
- scientific_writer/.claude/skills/document-skills/pdf/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/pdf/SKILL.md +294 -0
- scientific_writer/.claude/skills/document-skills/pdf/forms.md +205 -0
- scientific_writer/.claude/skills/document-skills/pdf/reference.md +612 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/check_bounding_boxes.py +70 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/check_bounding_boxes_test.py +226 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/check_fillable_fields.py +12 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/convert_pdf_to_images.py +35 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/create_validation_image.py +41 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/extract_form_field_info.py +152 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/fill_fillable_fields.py +114 -0
- scientific_writer/.claude/skills/document-skills/pdf/scripts/fill_pdf_form_with_annotations.py +108 -0
- scientific_writer/.claude/skills/document-skills/pptx/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/pptx/SKILL.md +484 -0
- scientific_writer/.claude/skills/document-skills/pptx/html2pptx.md +625 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/mce/mc.xsd +75 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2010.xsd +560 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2012.xsd +67 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2018.xsd +14 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cex-2018.xsd +20 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cid-2016.xsd +13 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-symex-2015.xsd +8 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/pack.py +159 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/unpack.py +29 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validate.py +69 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/__init__.py +15 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/base.py +951 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/docx.py +274 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/pptx.py +315 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml/scripts/validation/redlining.py +279 -0
- scientific_writer/.claude/skills/document-skills/pptx/ooxml.md +427 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/html2pptx.js +979 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/inventory.py +1020 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/rearrange.py +231 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/replace.py +385 -0
- scientific_writer/.claude/skills/document-skills/pptx/scripts/thumbnail.py +450 -0
- scientific_writer/.claude/skills/document-skills/xlsx/LICENSE.txt +30 -0
- scientific_writer/.claude/skills/document-skills/xlsx/SKILL.md +289 -0
- scientific_writer/.claude/skills/document-skills/xlsx/recalc.py +178 -0
- scientific_writer/.claude/skills/hypothesis-generation/SKILL.md +155 -0
- scientific_writer/.claude/skills/hypothesis-generation/assets/hypothesis_output_template.md +302 -0
- scientific_writer/.claude/skills/hypothesis-generation/references/experimental_design_patterns.md +327 -0
- scientific_writer/.claude/skills/hypothesis-generation/references/hypothesis_quality_criteria.md +196 -0
- scientific_writer/.claude/skills/hypothesis-generation/references/literature_search_strategies.md +505 -0
- scientific_writer/.claude/skills/latex-posters/README.md +417 -0
- scientific_writer/.claude/skills/latex-posters/SKILL.md +919 -0
- scientific_writer/.claude/skills/latex-posters/assets/baposter_template.tex +257 -0
- scientific_writer/.claude/skills/latex-posters/assets/beamerposter_template.tex +244 -0
- scientific_writer/.claude/skills/latex-posters/assets/poster_quality_checklist.md +358 -0
- scientific_writer/.claude/skills/latex-posters/assets/tikzposter_template.tex +251 -0
- scientific_writer/.claude/skills/latex-posters/references/latex_poster_packages.md +745 -0
- scientific_writer/.claude/skills/latex-posters/references/poster_content_guide.md +748 -0
- scientific_writer/.claude/skills/latex-posters/references/poster_design_principles.md +806 -0
- scientific_writer/.claude/skills/latex-posters/references/poster_layout_design.md +900 -0
- scientific_writer/.claude/skills/latex-posters/scripts/review_poster.sh +214 -0
- scientific_writer/.claude/skills/literature-review/SKILL.md +546 -0
- scientific_writer/.claude/skills/literature-review/assets/review_template.md +412 -0
- scientific_writer/.claude/skills/literature-review/references/citation_styles.md +166 -0
- scientific_writer/.claude/skills/literature-review/references/database_strategies.md +381 -0
- scientific_writer/.claude/skills/literature-review/scripts/generate_pdf.py +176 -0
- scientific_writer/.claude/skills/literature-review/scripts/search_databases.py +303 -0
- scientific_writer/.claude/skills/literature-review/scripts/verify_citations.py +222 -0
- scientific_writer/.claude/skills/markitdown/INSTALLATION_GUIDE.md +318 -0
- scientific_writer/.claude/skills/markitdown/LICENSE.txt +22 -0
- scientific_writer/.claude/skills/markitdown/OPENROUTER_INTEGRATION.md +359 -0
- scientific_writer/.claude/skills/markitdown/QUICK_REFERENCE.md +309 -0
- scientific_writer/.claude/skills/markitdown/README.md +184 -0
- scientific_writer/.claude/skills/markitdown/SKILL.md +450 -0
- scientific_writer/.claude/skills/markitdown/SKILL_SUMMARY.md +307 -0
- scientific_writer/.claude/skills/markitdown/assets/example_usage.md +463 -0
- scientific_writer/.claude/skills/markitdown/references/api_reference.md +399 -0
- scientific_writer/.claude/skills/markitdown/references/file_formats.md +542 -0
- scientific_writer/.claude/skills/markitdown/scripts/batch_convert.py +228 -0
- scientific_writer/.claude/skills/markitdown/scripts/convert_literature.py +283 -0
- scientific_writer/.claude/skills/markitdown/scripts/convert_with_ai.py +243 -0
- scientific_writer/.claude/skills/paper-2-web/SKILL.md +455 -0
- scientific_writer/.claude/skills/paper-2-web/references/installation.md +141 -0
- scientific_writer/.claude/skills/paper-2-web/references/paper2poster.md +346 -0
- scientific_writer/.claude/skills/paper-2-web/references/paper2video.md +305 -0
- scientific_writer/.claude/skills/paper-2-web/references/paper2web.md +187 -0
- scientific_writer/.claude/skills/paper-2-web/references/usage_examples.md +436 -0
- scientific_writer/.claude/skills/peer-review/SKILL.md +375 -0
- scientific_writer/.claude/skills/peer-review/references/common_issues.md +552 -0
- scientific_writer/.claude/skills/peer-review/references/reporting_standards.md +290 -0
- scientific_writer/.claude/skills/research-grants/README.md +285 -0
- scientific_writer/.claude/skills/research-grants/SKILL.md +896 -0
- scientific_writer/.claude/skills/research-grants/assets/budget_justification_template.md +453 -0
- scientific_writer/.claude/skills/research-grants/assets/nih_specific_aims_template.md +166 -0
- scientific_writer/.claude/skills/research-grants/assets/nsf_project_summary_template.md +92 -0
- scientific_writer/.claude/skills/research-grants/references/broader_impacts.md +392 -0
- scientific_writer/.claude/skills/research-grants/references/darpa_guidelines.md +636 -0
- scientific_writer/.claude/skills/research-grants/references/doe_guidelines.md +586 -0
- scientific_writer/.claude/skills/research-grants/references/nih_guidelines.md +851 -0
- scientific_writer/.claude/skills/research-grants/references/nsf_guidelines.md +570 -0
- scientific_writer/.claude/skills/research-grants/references/specific_aims_guide.md +458 -0
- scientific_writer/.claude/skills/research-lookup/README.md +116 -0
- scientific_writer/.claude/skills/research-lookup/SKILL.md +443 -0
- scientific_writer/.claude/skills/research-lookup/examples.py +174 -0
- scientific_writer/.claude/skills/research-lookup/lookup.py +93 -0
- scientific_writer/.claude/skills/research-lookup/research_lookup.py +335 -0
- scientific_writer/.claude/skills/research-lookup/scripts/research_lookup.py +261 -0
- scientific_writer/.claude/skills/scholar-evaluation/SKILL.md +254 -0
- scientific_writer/.claude/skills/scholar-evaluation/references/evaluation_framework.md +663 -0
- scientific_writer/.claude/skills/scholar-evaluation/scripts/calculate_scores.py +378 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/SKILL.md +530 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/common_biases.md +364 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/evidence_hierarchy.md +484 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/experimental_design.md +496 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/logical_fallacies.md +478 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/scientific_method.md +169 -0
- scientific_writer/.claude/skills/scientific-critical-thinking/references/statistical_pitfalls.md +506 -0
- scientific_writer/.claude/skills/scientific-schematics/SKILL.md +2035 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/block_diagram_template.tex +199 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/circuit_template.tex +159 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/flowchart_template.tex +161 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/pathway_template.tex +162 -0
- scientific_writer/.claude/skills/scientific-schematics/assets/tikz_styles.tex +422 -0
- scientific_writer/.claude/skills/scientific-schematics/references/best_practices.md +562 -0
- scientific_writer/.claude/skills/scientific-schematics/references/diagram_types.md +637 -0
- scientific_writer/.claude/skills/scientific-schematics/references/python_libraries.md +791 -0
- scientific_writer/.claude/skills/scientific-schematics/references/tikz_guide.md +734 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/circuit_generator.py +307 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/compile_tikz.py +292 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/generate_flowchart.py +281 -0
- scientific_writer/.claude/skills/scientific-schematics/scripts/pathway_diagram.py +406 -0
- scientific_writer/.claude/skills/scientific-writing/SKILL.md +443 -0
- scientific_writer/.claude/skills/scientific-writing/references/citation_styles.md +720 -0
- scientific_writer/.claude/skills/scientific-writing/references/figures_tables.md +806 -0
- scientific_writer/.claude/skills/scientific-writing/references/imrad_structure.md +658 -0
- scientific_writer/.claude/skills/scientific-writing/references/reporting_guidelines.md +748 -0
- scientific_writer/.claude/skills/scientific-writing/references/writing_principles.md +824 -0
- scientific_writer/.claude/skills/treatment-plans/README.md +483 -0
- scientific_writer/.claude/skills/treatment-plans/SKILL.md +817 -0
- scientific_writer/.claude/skills/treatment-plans/assets/chronic_disease_management_plan.tex +636 -0
- scientific_writer/.claude/skills/treatment-plans/assets/general_medical_treatment_plan.tex +616 -0
- scientific_writer/.claude/skills/treatment-plans/assets/mental_health_treatment_plan.tex +745 -0
- scientific_writer/.claude/skills/treatment-plans/assets/pain_management_plan.tex +770 -0
- scientific_writer/.claude/skills/treatment-plans/assets/perioperative_care_plan.tex +724 -0
- scientific_writer/.claude/skills/treatment-plans/assets/quality_checklist.md +471 -0
- scientific_writer/.claude/skills/treatment-plans/assets/rehabilitation_treatment_plan.tex +727 -0
- scientific_writer/.claude/skills/treatment-plans/references/goal_setting_frameworks.md +411 -0
- scientific_writer/.claude/skills/treatment-plans/references/intervention_guidelines.md +507 -0
- scientific_writer/.claude/skills/treatment-plans/references/regulatory_compliance.md +476 -0
- scientific_writer/.claude/skills/treatment-plans/references/specialty_specific_guidelines.md +607 -0
- scientific_writer/.claude/skills/treatment-plans/references/treatment_plan_standards.md +456 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/check_completeness.py +318 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/generate_template.py +244 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/timeline_generator.py +369 -0
- scientific_writer/.claude/skills/treatment-plans/scripts/validate_treatment_plan.py +367 -0
- scientific_writer/.claude/skills/venue-templates/SKILL.md +590 -0
- scientific_writer/.claude/skills/venue-templates/assets/grants/nih_specific_aims.tex +235 -0
- scientific_writer/.claude/skills/venue-templates/assets/grants/nsf_proposal_template.tex +375 -0
- scientific_writer/.claude/skills/venue-templates/assets/journals/nature_article.tex +171 -0
- scientific_writer/.claude/skills/venue-templates/assets/journals/neurips_article.tex +283 -0
- scientific_writer/.claude/skills/venue-templates/assets/journals/plos_one.tex +317 -0
- scientific_writer/.claude/skills/venue-templates/assets/posters/beamerposter_academic.tex +311 -0
- scientific_writer/.claude/skills/venue-templates/references/conferences_formatting.md +564 -0
- scientific_writer/.claude/skills/venue-templates/references/grants_requirements.md +787 -0
- scientific_writer/.claude/skills/venue-templates/references/journals_formatting.md +486 -0
- scientific_writer/.claude/skills/venue-templates/references/posters_guidelines.md +628 -0
- scientific_writer/.claude/skills/venue-templates/scripts/customize_template.py +206 -0
- scientific_writer/.claude/skills/venue-templates/scripts/query_template.py +260 -0
- scientific_writer/.claude/skills/venue-templates/scripts/validate_format.py +255 -0
- scientific_writer/CLAUDE.md +748 -0
- scientific_writer/__init__.py +2 -2
- scientific_writer/api.py +14 -7
- scientific_writer/cli.py +12 -7
- scientific_writer/core.py +27 -5
- {scientific_writer-2.1.1.dist-info → scientific_writer-2.2.2.dist-info}/METADATA +5 -1
- scientific_writer-2.2.2.dist-info/RECORD +312 -0
- scientific_writer-2.1.1.dist-info/RECORD +0 -11
- {scientific_writer-2.1.1.dist-info → scientific_writer-2.2.2.dist-info}/WHEEL +0 -0
- {scientific_writer-2.1.1.dist-info → scientific_writer-2.2.2.dist-info}/entry_points.txt +0 -0
- {scientific_writer-2.1.1.dist-info → scientific_writer-2.2.2.dist-info}/licenses/LICENSE +0 -0
@@ -0,0 +1,542 @@

# File Format Support

This document provides detailed information about each file format supported by MarkItDown.

## Document Formats

### PDF (.pdf)

**Capabilities**:
- Text extraction
- Table detection
- Metadata extraction
- OCR for scanned documents (with dependencies)

**Dependencies**:
```bash
pip install 'markitdown[pdf]'
```

**Best For**:
- Scientific papers
- Reports
- Books
- Forms

**Limitations**:
- Formatting of complex layouts may not be preserved exactly
- Scanned PDFs require OCR setup
- Some PDF features (annotations, forms) may not convert

**Example**:
```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("research_paper.pdf")
print(result.text_content)
```

**Enhanced with Azure Document Intelligence**:
```python
from markitdown import MarkItDown

md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
result = md.convert("complex_layout.pdf")
```
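
The limitation that scanned PDFs require OCR setup can be checked for after conversion. The following is a minimal sketch, not part of MarkItDown: the helper names `looks_like_scanned_pdf` and `convert_pdf` and the `min_chars` threshold of 200 are assumptions chosen for illustration. It flags a PDF whose conversion produced almost no visible text, which often indicates a scan without a text layer.

```python
def looks_like_scanned_pdf(text_content: str, min_chars: int = 200) -> bool:
    """Return True when extraction yielded too little text for a text-based PDF.

    Hypothetical heuristic: the threshold is an assumption, tune it per corpus.
    """
    # Count only non-whitespace characters so pages of blank lines do not pass.
    visible = sum(1 for ch in text_content if not ch.isspace())
    return visible < min_chars


def convert_pdf(path: str) -> str:
    """Convert a PDF and warn when the result looks like a scan without OCR."""
    # Imported lazily so the heuristic above works without markitdown installed.
    from markitdown import MarkItDown

    result = MarkItDown().convert(path)
    if looks_like_scanned_pdf(result.text_content):
        print(f"warning: {path} produced very little text; it may need OCR")
    return result.text_content
```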

---

### Microsoft Word (.docx)

**Capabilities**:
- Text extraction
- Table conversion
- Heading hierarchy
- List formatting
- Basic text formatting (bold, italic)

**Dependencies**:
```bash
pip install 'markitdown[docx]'
```

**Best For**:
- Research papers
- Reports
- Documentation
- Manuscripts

**Preserved Elements**:
- Headings (converted to Markdown headers)
- Tables (converted to Markdown tables)
- Lists (bulleted and numbered)
- Basic formatting (bold, italic)
- Paragraphs

**Example**:
```python
# Reuses the `md` instance created in the PDF example.
result = md.convert("manuscript.docx")
```

---

### PowerPoint (.pptx)

**Capabilities**:
- Slide content extraction
- Speaker notes
- Table extraction
- Image descriptions (with AI)

**Dependencies**:
```bash
pip install 'markitdown[pptx]'
```

**Best For**:
- Presentations
- Lecture slides
- Conference talks

**Output Format**:
```markdown
# Slide 1: Title

Content from slide 1...

**Notes**: Speaker notes appear here

---

# Slide 2: Next Topic

...
```
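
If the converted output follows the layout shown above, it can be split back into per-slide chunks for downstream processing. The following is a minimal sketch under that assumption: `split_slides` is a hypothetical helper, not part of MarkItDown, and the `# Slide N:` heading pattern is taken from the sample output, so adjust it if your version produces different headings.

```python
import re

# Hypothetical helper, not part of MarkItDown.
# Assumes every slide begins with a "# Slide <number>:" heading, as in the
# sample output above.
SLIDE_HEADING = re.compile(r"^# Slide (\d+):[ \t]*(.*)$", re.MULTILINE)


def split_slides(markdown_text):
    """Return a list of (slide_number, title, body) tuples."""
    matches = list(SLIDE_HEADING.finditer(markdown_text))
    slides = []
    for index, match in enumerate(matches):
        # The body runs from the end of this heading to the next heading.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(markdown_text)
        body = markdown_text[match.end():end].strip()
        # Drop the trailing "---" separator between slides, if present.
        if body.endswith("---"):
            body = body[:-3].rstrip()
        slides.append((int(match.group(1)), match.group(2).strip(), body))
    return slides


sample = """# Slide 1: Title

Content from slide 1...

**Notes**: Speaker notes appear here

---

# Slide 2: Next Topic

More content
"""

for number, title, body in split_slides(sample):
    print(number, title, len(body))
```

Returning tuples keeps the sketch free of dependencies; a dataclass would work equally well if the slides are passed around further.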

**With AI Image Descriptions**:
```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("presentation.pptx")
```

---

### Excel (.xlsx, .xls)

**Capabilities**:
- Sheet extraction
- Table formatting
- Data preservation
- Formula values (calculated)

**Dependencies**:
```bash
pip install 'markitdown[xlsx]'  # Modern Excel
pip install 'markitdown[xls]'   # Legacy Excel
```

**Best For**:
- Data tables
- Research data
- Statistical results
- Experimental data

**Output Format**:
```markdown
# Sheet: Results

| Sample | Control | Treatment | P-value |
|--------|---------|-----------|---------|
| 1      | 10.2    | 12.5      | 0.023   |
| 2      | 9.8     | 11.9      | 0.031   |
```
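
A converted sheet in the format above can be read back into Python records when the values are needed programmatically. This is a minimal sketch, assuming a simple pipe-delimited table with no escaped `|` characters inside cells; `parse_markdown_table` is a hypothetical helper, not a MarkItDown function, and all values are returned as strings.

```python
def parse_markdown_table(table_text):
    """Parse a simple pipe-delimited Markdown table into a list of dicts."""
    # Keep only table rows; headings such as "# Sheet: Results" are skipped.
    lines = [
        line.strip()
        for line in table_text.splitlines()
        if line.strip().startswith("|")
    ]
    if len(lines) < 2:
        return []

    def split_row(line):
        # Remove the outer pipes, then split the remaining cells.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at lines[2].
    return [dict(zip(header, split_row(line))) for line in lines[2:]]


sample = """# Sheet: Results

| Sample | Control | Treatment | P-value |
|--------|---------|-----------|---------|
| 1 | 10.2 | 12.5 | 0.023 |
| 2 | 9.8 | 11.9 | 0.031 |
"""

for row in parse_markdown_table(sample):
    print(row["Sample"], float(row["P-value"]))
```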

**Example**:
```python
result = md.convert("experimental_data.xlsx")
```

---

## Image Formats

### Images (.jpg, .jpeg, .png, .gif, .webp)

**Capabilities**:
- EXIF metadata extraction
- OCR text extraction
- AI-powered image descriptions

**Dependencies**:
```bash
pip install 'markitdown[all]'  # Includes image support
```

**Best For**:
- Scanned documents
- Charts and graphs
- Scientific diagrams
- Photographs with text

**Output Without AI**:
```markdown


**EXIF Data**:
- Camera: Canon EOS 5D
- Date: 2024-01-15
- Resolution: 4000x3000
```

**Output With AI**:
```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this scientific diagram in detail"
)
result = md.convert("graph.png")
```

**OCR for Text Extraction**:
Requires Tesseract OCR:
```bash
# macOS
brew install tesseract

# Ubuntu
sudo apt-get install tesseract-ocr
```

---

## Audio Formats

### Audio (.wav, .mp3)

**Capabilities**:
- Metadata extraction
- Speech-to-text transcription
- Duration and technical info

**Dependencies**:
```bash
pip install 'markitdown[audio-transcription]'
```

**Best For**:
- Lecture recordings
- Interviews
- Podcasts
- Meeting recordings

**Output Format**:
```markdown
# Audio: interview.mp3

**Metadata**:
- Duration: 45:32
- Bitrate: 320kbps
- Sample Rate: 44100Hz

**Transcription**:
[Transcribed text appears here...]
```

**Example**:
```python
result = md.convert("lecture.mp3")
```

---

## Web Formats

### HTML (.html, .htm)

**Capabilities**:
- Clean HTML to Markdown conversion
- Link preservation
- Table conversion
- List formatting

**Best For**:
- Web pages
- Documentation
- Blog posts
- Online articles

**Output Format**: Clean Markdown with preserved links and structure

**Example**:
```python
result = md.convert("webpage.html")
```

---

### YouTube URLs

**Capabilities**:
- Fetch video transcriptions
- Extract video metadata
- Caption download

**Dependencies**:
```bash
pip install 'markitdown[youtube-transcription]'
```

**Best For**:
- Educational videos
- Lectures
- Talks
- Tutorials

**Example**:
```python
result = md.convert("https://www.youtube.com/watch?v=VIDEO_ID")
```

---

## Data Formats

### CSV (.csv)

**Capabilities**:
- Automatic table conversion
- Delimiter detection
- Header preservation

**Output Format**: Markdown tables

**Example**:
```python
result = md.convert("data.csv")
```

**Output**:
```markdown
| Column1 | Column2 | Column3 |
|---------|---------|---------|
| Value1  | Value2  | Value3  |
```
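
To make the three capabilities concrete, the following sketch shows one way delimiter detection, header preservation, and table conversion can fit together using only the standard library. It is an illustration, not MarkItDown's implementation: `csv_to_markdown` is a hypothetical name, and the candidate delimiter set passed to `csv.Sniffer` is an assumption.

```python
import csv
import io


def csv_to_markdown(csv_text):
    """Convert CSV text to a Markdown table, treating the first row as the header."""
    # Sniffer guesses the delimiter from a sample; fall back to a plain comma
    # dialect when it cannot decide.
    try:
        dialect = csv.Sniffer().sniff(csv_text[:1024], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel
    rows = list(csv.reader(io.StringIO(csv_text), dialect))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        # Pad short rows and trim long ones so every line matches the header.
        cells = (row + [""] * len(header))[: len(header)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(csv_to_markdown("Column1;Column2;Column3\nValue1;Value2;Value3\n"))
```

Cells containing a literal `|` would need escaping before being joined; the sketch leaves that out for brevity.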

---

### JSON (.json)

**Capabilities**:
- Structured representation
- Pretty formatting
- Nested data visualization

**Best For**:
- API responses
- Configuration files
- Data exports

**Example**:
```python
result = md.convert("data.json")
```

---

### XML (.xml)

**Capabilities**:
- Structure preservation
- Attribute extraction
- Formatted output

**Best For**:
- Configuration files
- Data interchange
- Structured documents

**Example**:
```python
result = md.convert("config.xml")
```

---

## Archive Formats

### ZIP (.zip)

**Capabilities**:
- Iterates through archive contents
- Converts each file individually
- Maintains directory structure in output

**Best For**:
- Document collections
- Project archives
- Batch conversions

**Output Format**:
```markdown
# Archive: documents.zip

## File: document1.pdf
[Content from document1.pdf...]

---

## File: document2.docx
[Content from document2.docx...]
```

**Example**:
```python
result = md.convert("archive.zip")
```
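
The iterate-and-convert behaviour described above can be sketched with the standard `zipfile` module. This is an illustration under stated assumptions rather than MarkItDown's own code: `zip_to_markdown` and the `convert_member` callback are hypothetical names, and the callback stands in for whatever per-file conversion is used.

```python
import io
import zipfile


def zip_to_markdown(zip_bytes, archive_name, convert_member):
    """Build a Markdown document with one section per archive member.

    convert_member is a caller-supplied function (hypothetical here) that
    takes (member_name, data_bytes) and returns Markdown text.
    """
    sections = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content of their own
            text = convert_member(info.filename, archive.read(info))
            # info.filename keeps the path inside the archive, which is how
            # the directory structure shows up in the output.
            sections.append(f"## File: {info.filename}\n{text}")
    return f"# Archive: {archive_name}\n\n" + "\n\n---\n\n".join(sections)


# Build a small archive in memory to exercise the sketch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo_archive:
    demo_archive.writestr("notes/a.txt", "first file")
    demo_archive.writestr("b.txt", "second file")

markdown = zip_to_markdown(
    buffer.getvalue(),
    "documents.zip",
    lambda name, data: data.decode("utf-8"),
)
print(markdown)
```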

---

## E-book Formats

### EPUB (.epub)

**Capabilities**:
- Full text extraction
- Chapter structure
- Metadata extraction

**Best For**:
- E-books
- Digital publications
- Long-form content

**Output Format**: Markdown with preserved chapter structure

**Example**:
```python
result = md.convert("book.epub")
```

---

## Other Formats

### Outlook Messages (.msg)

**Capabilities**:
- Email content extraction
- Attachment listing
- Metadata (from, to, subject, date)

**Dependencies**:
```bash
pip install 'markitdown[outlook]'
```

**Best For**:
- Email archives
- Communication records

**Example**:
```python
result = md.convert("message.msg")
```

---

## Format-Specific Tips

### PDF Best Practices

1. **Use Azure Document Intelligence for complex layouts**:
   ```python
   md = MarkItDown(docintel_endpoint="endpoint_url")
   ```

2. **For scanned PDFs, ensure OCR is set up**:
   ```bash
   brew install tesseract  # macOS
   ```

3. **Split very large PDFs before conversion** for better performance

### PowerPoint Best Practices

1. **Use AI for visual content**:
   ```python
   md = MarkItDown(llm_client=client, llm_model="gpt-4o")
   ```

2. **Check speaker notes** - they're included in output

3. **Complex animations won't be captured** - static content only

### Excel Best Practices

1. **Large spreadsheets** may take time to convert

2. **Formulas are converted to their calculated values**

3. **Multiple sheets** are all included in output

4. **Charts become text descriptions** (use AI for better descriptions)

### Image Best Practices

1. **Use AI for meaningful descriptions**:
   ```python
   md = MarkItDown(
       llm_client=client,
       llm_model="gpt-4o",
       llm_prompt="Describe this scientific figure in detail"
   )
   ```

2. **For text-heavy images, ensure OCR dependencies** are installed

3. **High-resolution images** may take longer to process

### Audio Best Practices

1. **Clear audio** produces better transcriptions

2. **Long recordings** may take significant time

3. **Consider splitting long audio files** for faster processing
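
For uncompressed `.wav` input, splitting can be done with the standard `wave` module alone. The sketch below is one possible approach, not something MarkItDown provides: `split_wav` is a hypothetical helper, and `.mp3` files would need a separate tool or library to split.

```python
import os
import tempfile
import wave


def split_wav(source_path, chunk_seconds, output_prefix):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Returns the list of chunk file paths that were written.
    """
    chunk_paths = []
    with wave.open(source_path, "rb") as source:
        params = source.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = f"{output_prefix}_{index:03d}.wav"
            with wave.open(chunk_path, "wb") as chunk:
                # Copy channel count, sample width and frame rate; the frame
                # count in the header is corrected when the data is written.
                chunk.setparams(params)
                chunk.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


# Demonstration on three seconds of generated silence (mono, 16-bit, 8 kHz).
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "lecture.wav")
    with wave.open(demo_path, "wb") as demo:
        demo.setnchannels(1)
        demo.setsampwidth(2)
        demo.setframerate(8000)
        demo.writeframes(b"\x00\x00" * 8000 * 3)
    parts = split_wav(demo_path, 1, os.path.join(workdir, "lecture_part"))
    print(len(parts), "chunks written")
```

Each chunk can then be passed to `md.convert(...)` separately, at the cost of possibly cutting a word at a chunk boundary.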

---

## Unsupported Formats

If you need to convert an unsupported format:

1. **Create a custom converter** (see `api_reference.md`)
2. **Look for plugins** on GitHub (#markitdown-plugin)
3. **Pre-convert to supported format** (e.g., convert .rtf to .docx)
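
Before choosing one of these options for a batch of files, it can help to sort the batch by whether each extension appears in this document. A minimal sketch follows; `DOCUMENTED_EXTENSIONS` is assembled by hand from the headings above and is an assumption for illustration, not a list exported by MarkItDown, and `partition_by_support` is a hypothetical helper.

```python
from pathlib import Path

# Extensions named in this document's section headings. Hand-maintained for
# illustration; MarkItDown itself may accept more or fewer.
DOCUMENTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls",
    ".jpg", ".jpeg", ".png", ".gif", ".webp",
    ".wav", ".mp3", ".html", ".htm",
    ".csv", ".json", ".xml", ".zip", ".epub", ".msg",
}


def partition_by_support(paths):
    """Split paths into (supported, unsupported) lists by file extension."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in DOCUMENTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ready, needs_work = partition_by_support(["paper.PDF", "notes.rtf", "draft.docx"])
print([p.name for p in ready], [p.name for p in needs_work])
```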

---

## Format Detection

MarkItDown automatically detects format from:

1. **File extension** (primary method)
2. **MIME type** (fallback)
3. **File signature** (magic bytes, fallback)
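
The fallback order above can be illustrated with a short standard-library sketch. This is not MarkItDown's detection code: `guess_extension` and the small `SIGNATURES` table are hypothetical and cover only a few well-known magic-byte prefixes.

```python
import mimetypes
from pathlib import Path

# A few well-known file signatures (magic bytes). Illustrative only.
SIGNATURES = {
    b"%PDF-": ".pdf",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF8": ".gif",
}


def guess_extension(path, mime_type=None):
    """Guess an extension: file name first, then MIME type, then signature."""
    # 1. File extension (primary method).
    suffix = Path(path).suffix.lower()
    if suffix:
        return suffix
    # 2. MIME type, if the caller knows it (for example from an HTTP header).
    if mime_type:
        guessed = mimetypes.guess_extension(mime_type)
        if guessed:
            return guessed
    # 3. File signature: compare the first bytes against known prefixes.
    with open(path, "rb") as handle:
        head = handle.read(16)
    for signature, extension in SIGNATURES.items():
        if head.startswith(signature):
            return extension
    return None
```

Signature checks are last because several formats share a prefix; `.docx`, `.xlsx`, `.pptx`, and `.epub` files are all ZIP containers, so their leading bytes alone do not distinguish them.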

**Override detection**:
```python
# Force specific format
result = md.convert("file_without_extension", file_extension=".pdf")

# With streams
with open("file", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
```