natural-pdf 0.1.15__tar.gz → 0.1.16__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- natural_pdf-0.1.16/.pre-commit-config.yaml +12 -0
- {natural_pdf-0.1.15/natural_pdf.egg-info → natural_pdf-0.1.16}/PKG-INFO +27 -45
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/installation/index.md +6 -32
- natural_pdf-0.1.16/docs/layout-analysis/index.ipynb +961 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/layout-analysis/index.md +32 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/ocr/index.md +4 -9
- natural_pdf-0.1.16/docs/tutorials/01-loading-and-extraction.ipynb +328 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/01-loading-and-extraction.md +0 -4
- natural_pdf-0.1.16/docs/tutorials/02-finding-elements.ipynb +352 -0
- natural_pdf-0.1.16/docs/tutorials/03-extracting-blocks.ipynb +159 -0
- natural_pdf-0.1.16/docs/tutorials/04-table-extraction.ipynb +579 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/04-table-extraction.md +22 -1
- natural_pdf-0.1.16/docs/tutorials/05-excluding-content.ipynb +8402 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/06-document-qa.ipynb +30 -38
- natural_pdf-0.1.16/docs/tutorials/07-layout-analysis.ipynb +630 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/07-layout-analysis.md +21 -6
- natural_pdf-0.1.16/docs/tutorials/07-working-with-regions.ipynb +477 -0
- natural_pdf-0.1.16/docs/tutorials/08-spatial-navigation.ipynb +520 -0
- natural_pdf-0.1.16/docs/tutorials/09-section-extraction.ipynb +2270 -0
- natural_pdf-0.1.16/docs/tutorials/10-form-field-extraction.ipynb +496 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/11-enhanced-table-processing.ipynb +6 -6
- natural_pdf-0.1.16/docs/tutorials/12-ocr-integration.ipynb +4129 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/12-ocr-integration.md +30 -28
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/13-semantic-search.ipynb +176 -184
- natural_pdf-0.1.16/docs/tutorials/14-categorizing-documents.ipynb +2142 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/__init__.py +31 -0
- natural_pdf-0.1.16/natural_pdf/analyzers/layout/gemini.py +265 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/layout_manager.py +9 -5
- natural_pdf-0.1.16/natural_pdf/analyzers/layout/layout_options.py +179 -0
- natural_pdf-0.1.16/natural_pdf/analyzers/layout/paddle.py +450 -0
- natural_pdf-0.1.16/natural_pdf/analyzers/layout/table_structure_utils.py +78 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/shape_detection_mixin.py +770 -405
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/classification/mixin.py +2 -8
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/collections/pdf_collection.py +25 -30
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/highlighting_service.py +47 -32
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/page.py +117 -75
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/pdf.py +19 -22
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/base.py +9 -9
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/collections.py +105 -50
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/region.py +200 -126
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/paddleocr.py +38 -13
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/__init__.py +3 -3
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/collections.py +303 -132
- natural_pdf-0.1.16/natural_pdf/flows/element.py +527 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/flow.py +33 -16
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/region.py +142 -79
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_doctr.py +37 -4
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_easyocr.py +23 -3
- natural_pdf-0.1.16/natural_pdf/ocr/engine_paddle.py +409 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_surya.py +8 -3
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_manager.py +75 -76
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_options.py +52 -87
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/__init__.py +25 -12
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/lancedb_search_service.py +91 -54
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/numpy_search_service.py +86 -65
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/searchable_mixin.py +2 -2
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/selectors/parser.py +125 -81
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/finetune/fine_tune_paddleocr.md +30 -20
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/widgets/__init__.py +1 -1
- natural_pdf-0.1.16/natural_pdf/widgets/viewer.py +522 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16/natural_pdf.egg-info}/PKG-INFO +27 -45
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf.egg-info/SOURCES.txt +12 -1
- natural_pdf-0.1.16/natural_pdf.egg-info/requires.txt +79 -0
- natural_pdf-0.1.16/noxfile.py +87 -0
- natural_pdf-0.1.16/pdfs/image.png +0 -0
- natural_pdf-0.1.16/pdfs/image.png.pdf +0 -0
- natural_pdf-0.1.16/pdfs/red.pdf +0 -0
- natural_pdf-0.1.16/pdfs/tiny-ocr-2.pdf +0 -0
- natural_pdf-0.1.16/pdfs/tiny-ocr-3.pdf +0 -0
- natural_pdf-0.1.16/pdfs/tiny-ocr-small.jpg +0 -0
- natural_pdf-0.1.16/pdfs/tiny-ocr-wide.jpg +0 -0
- natural_pdf-0.1.16/pdfs/tiny-ocr.pdf +0 -0
- natural_pdf-0.1.16/pdfs/tiny.pdf +0 -0
- natural_pdf-0.1.16/pdfs/word-counter.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pyproject.toml +29 -55
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/conftest.py +19 -12
- natural_pdf-0.1.16/tests/exporters/test_paddleocr_exporter.py +78 -0
- natural_pdf-0.1.16/tests/test_core/test_containment_geometry.py +35 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_elements.py +61 -55
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_loading.py +12 -11
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_spatial.py +101 -69
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_text_extraction.py +27 -26
- natural_pdf-0.1.16/tests/test_optional_deps.py +173 -0
- natural_pdf-0.1.15/docs/layout-analysis/index.ipynb +0 -897
- natural_pdf-0.1.15/docs/tutorials/01-loading-and-extraction.ipynb +0 -3089
- natural_pdf-0.1.15/docs/tutorials/02-finding-elements.ipynb +0 -375
- natural_pdf-0.1.15/docs/tutorials/03-extracting-blocks.ipynb +0 -167
- natural_pdf-0.1.15/docs/tutorials/04-table-extraction.ipynb +0 -217
- natural_pdf-0.1.15/docs/tutorials/05-excluding-content.ipynb +0 -8410
- natural_pdf-0.1.15/docs/tutorials/07-layout-analysis.ipynb +0 -280
- natural_pdf-0.1.15/docs/tutorials/07-working-with-regions.ipynb +0 -485
- natural_pdf-0.1.15/docs/tutorials/08-spatial-navigation.ipynb +0 -528
- natural_pdf-0.1.15/docs/tutorials/09-section-extraction.ipynb +0 -2482
- natural_pdf-0.1.15/docs/tutorials/10-form-field-extraction.ipynb +0 -504
- natural_pdf-0.1.15/docs/tutorials/12-ocr-integration.ipynb +0 -3565
- natural_pdf-0.1.15/docs/tutorials/14-categorizing-documents.ipynb +0 -2150
- natural_pdf-0.1.15/natural_pdf/analyzers/layout/gemini.py +0 -290
- natural_pdf-0.1.15/natural_pdf/analyzers/layout/layout_options.py +0 -109
- natural_pdf-0.1.15/natural_pdf/analyzers/layout/paddle.py +0 -297
- natural_pdf-0.1.15/natural_pdf/flows/element.py +0 -382
- natural_pdf-0.1.15/natural_pdf/ocr/engine_paddle.py +0 -158
- natural_pdf-0.1.15/natural_pdf/widgets/frontend/viewer.js +0 -88
- natural_pdf-0.1.15/natural_pdf/widgets/viewer.py +0 -766
- natural_pdf-0.1.15/natural_pdf.egg-info/requires.txt +0 -107
- natural_pdf-0.1.15/noxfile.py +0 -109
- natural_pdf-0.1.15/tests/exporters/test_paddleocr_exporter.py +0 -140
- natural_pdf-0.1.15/tests/test_core/test_containment_geometry.py +0 -26
- natural_pdf-0.1.15/tests/test_optional_deps.py +0 -259
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/analysis_framework.mdc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/coding-style.mdc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/edit-md-instead-of-ipynb.mdc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/minimal-comments.mdc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/natural-pdf-overview.mdc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/user-friendly-library-code.mdc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.github/workflows/docs.yml +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.gitignore +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/01-execute_notebooks.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/02-run_all_tutorials.sh +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/CLAUDE.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/LICENSE +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/MANIFEST.in +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/README.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/audit_packaging.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/check_run_md.sh +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/api/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/favicon.png +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/favicon.svg +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/javascripts/custom.js +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/logo.svg +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/sample-screen.png +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/social-preview.png +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/social-preview.svg +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/stylesheets/custom.css +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/categorizing-documents/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/data-extraction/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/document-qa/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/document-qa/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/element-selection/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/element-selection/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/finetuning/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/interactive-widget/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/interactive-widget/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/loops-and-groups/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/loops-and-groups/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/pdf-navigation/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/pdf-navigation/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/reflowing-pages/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/reflowing-pages/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/regions/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/regions/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tables/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tables/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-analysis/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-analysis/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-extraction/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-extraction/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/02-finding-elements.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/03-extracting-blocks.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/05-excluding-content.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/06-document-qa.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/07-working-with-regions.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/08-spatial-navigation.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/09-section-extraction.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/10-form-field-extraction.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/11-enhanced-table-processing.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/13-semantic-search.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/14-categorizing-documents.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/visual-debugging/index.ipynb +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/visual-debugging/index.md +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/visual-debugging/region.png +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/mkdocs.yml +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/base.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/docling.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/layout_analyzer.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/pdfplumber_table_finder.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/surya.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/tatr.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/yolo.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/text_options.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/text_structure.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/utils.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/classification/manager.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/classification/results.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/collections/mixins.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/element_manager.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/line.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/rect.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/text.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/export/mixin.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/base.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/data/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/data/pdf.ttf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/data/sRGB.icc +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/hocr.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/hocr_font.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/original_pdf.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/searchable_pdf.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/extraction/manager.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/extraction/mixin.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/extraction/result.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_factory.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/utils.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/qa/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/qa/document_qa.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/search_options.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/search_service_protocol.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/selectors/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/css/style.css +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/index.html +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/js/app.js +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/words.txt +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/__init__.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/debug.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/highlighting.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/identifiers.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/locks.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/packaging.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/reading_order.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/text_extraction.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/visualization.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf.egg-info/dependency_links.txt +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf.egg-info/top_level.txt +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/.gitkeep +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/01-practice.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/0500000US42001.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/0500000US42007.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/2014 Statistics.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/2019 Statistics.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/30.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/Atlanta_Public_Schools_GA_sample.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/anexo_edital_6604_1743480-table.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/cia-doc.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/geometry.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/multicolumn.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/needs-ocr.pdf +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/publish.sh +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/sample-screen.png +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/setup.cfg +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_loading_original.py +0 -0
- {natural_pdf-0.1.15 → natural_pdf-0.1.16}/uv.lock +0 -0
```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: natural-pdf
-Version: 0.1.
+Version: 0.1.16
 Summary: A more intuitive interface for working with PDFs
 Author-email: Jonathan Soma <jonathan.soma@gmail.com>
 License-Expression: MIT
```
```diff
@@ -12,6 +12,7 @@ Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: pdfplumber
+Requires-Dist: colormath2
 Requires-Dist: pillow
 Requires-Dist: colour
 Requires-Dist: numpy
```
```diff
@@ -21,47 +22,31 @@ Requires-Dist: pydantic
 Requires-Dist: jenkspy
 Requires-Dist: pikepdf>=9.7.0
 Requires-Dist: scipy
-
-Requires-Dist:
-
-Requires-Dist:
-Requires-Dist:
-
-Requires-Dist: paddlepaddle; extra == "paddle"
-Requires-Dist: paddleocr; extra == "paddle"
-Provides-Extra: layout-yolo
-Requires-Dist: doclayout_yolo; extra == "layout-yolo"
-Requires-Dist: natural-pdf[core-ml]; extra == "layout-yolo"
-Provides-Extra: surya
-Requires-Dist: surya-ocr; extra == "surya"
-Requires-Dist: natural-pdf[core-ml]; extra == "surya"
-Provides-Extra: doctr
-Requires-Dist: python-doctr[torch]; extra == "doctr"
-Requires-Dist: natural-pdf[core-ml]; extra == "doctr"
-Provides-Extra: docling
-Requires-Dist: docling; extra == "docling"
-Requires-Dist: natural-pdf[core-ml]; extra == "docling"
-Provides-Extra: llm
-Requires-Dist: openai>=1.0; extra == "llm"
+Requires-Dist: torch
+Requires-Dist: torchvision
+Requires-Dist: transformers[sentencepiece]<=4.34.1
+Requires-Dist: huggingface_hub>=0.29.3
+Requires-Dist: sentence-transformers
+Requires-Dist: timm
 Provides-Extra: test
 Requires-Dist: pytest; extra == "test"
+Requires-Dist: pytest-xdist; extra == "test"
+Requires-Dist: setuptools; extra == "test"
 Provides-Extra: search
 Requires-Dist: lancedb; extra == "search"
 Requires-Dist: pyarrow; extra == "search"
 Provides-Extra: favorites
 Requires-Dist: natural-pdf[deskew]; extra == "favorites"
-Requires-Dist: natural-pdf[llm]; extra == "favorites"
-Requires-Dist: natural-pdf[surya]; extra == "favorites"
-Requires-Dist: natural-pdf[easyocr]; extra == "favorites"
-Requires-Dist: natural-pdf[layout_yolo]; extra == "favorites"
 Requires-Dist: natural-pdf[ocr-export]; extra == "favorites"
-Requires-Dist: natural-pdf[viewer]; extra == "favorites"
 Requires-Dist: natural-pdf[search]; extra == "favorites"
+Requires-Dist: ipywidgets; extra == "favorites"
+Requires-Dist: surya-ocr; extra == "favorites"
 Provides-Extra: dev
 Requires-Dist: black; extra == "dev"
 Requires-Dist: isort; extra == "dev"
 Requires-Dist: mypy; extra == "dev"
 Requires-Dist: pytest; extra == "dev"
+Requires-Dist: pytest-xdist; extra == "dev"
 Requires-Dist: nox; extra == "dev"
 Requires-Dist: nox-uv; extra == "dev"
 Requires-Dist: build; extra == "dev"
```
```diff
@@ -71,31 +56,28 @@ Requires-Dist: nbformat; extra == "dev"
 Requires-Dist: jupytext; extra == "dev"
 Requires-Dist: nbclient; extra == "dev"
 Requires-Dist: ipykernel; extra == "dev"
+Requires-Dist: pre-commit; extra == "dev"
+Requires-Dist: setuptools; extra == "dev"
 Provides-Extra: deskew
 Requires-Dist: deskew>=1.5; extra == "deskew"
 Requires-Dist: img2pdf; extra == "deskew"
+Provides-Extra: addons
+Requires-Dist: surya-ocr; extra == "addons"
+Requires-Dist: doclayout_yolo; extra == "addons"
+Requires-Dist: paddlepaddle>=3.0.0; extra == "addons"
+Requires-Dist: paddleocr>=3.0.0; extra == "addons"
+Requires-Dist: ipywidgets>=7.0.0; extra == "addons"
+Requires-Dist: easyocr; extra == "addons"
+Requires-Dist: surya-ocr; extra == "addons"
+Requires-Dist: doclayout_yolo; extra == "addons"
+Requires-Dist: python-doctr[torch]; extra == "addons"
+Requires-Dist: docling; extra == "addons"
 Provides-Extra: all
-Requires-Dist: natural-pdf[viewer]; extra == "all"
-Requires-Dist: natural-pdf[easyocr]; extra == "all"
-Requires-Dist: natural-pdf[paddle]; extra == "all"
-Requires-Dist: natural-pdf[layout_yolo]; extra == "all"
-Requires-Dist: natural-pdf[surya]; extra == "all"
-Requires-Dist: natural-pdf[doctr]; extra == "all"
 Requires-Dist: natural-pdf[ocr-export]; extra == "all"
-Requires-Dist: natural-pdf[docling]; extra == "all"
-Requires-Dist: natural-pdf[llm]; extra == "all"
-Requires-Dist: natural-pdf[core-ml]; extra == "all"
 Requires-Dist: natural-pdf[deskew]; extra == "all"
 Requires-Dist: natural-pdf[test]; extra == "all"
 Requires-Dist: natural-pdf[search]; extra == "all"
-
-Requires-Dist: torch; extra == "core-ml"
-Requires-Dist: torchvision; extra == "core-ml"
-Requires-Dist: transformers[sentencepiece]; extra == "core-ml"
-Requires-Dist: huggingface_hub; extra == "core-ml"
-Requires-Dist: sentence-transformers; extra == "core-ml"
-Requires-Dist: numpy; extra == "core-ml"
-Requires-Dist: timm; extra == "core-ml"
+Requires-Dist: natural-pdf[addons]; extra == "all"
 Provides-Extra: ocr-export
 Requires-Dist: pikepdf; extra == "ocr-export"
 Provides-Extra: export-extras
```
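The 0.1.16 metadata moves torch, torchvision, transformers, huggingface_hub, sentence-transformers and timm out of the removed `core-ml` extra and into the base requirements, and folds the per-engine extras (`layout-yolo`, `surya`, `doctr`, `docling`, ...) into a new `addons` extra. To see how `Requires-Dist` lines map onto extras, here is a minimal stdlib-only sketch that groups them by their `extra == "..."` marker. The function name `group_requirements` is hypothetical, and the parser only handles the simple marker shape used in this file.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the simple marker form used in this file: extra == "name"
EXTRA_MARKER = re.compile(r'extra\s*==\s*"([^"]+)"')


def group_requirements(pkg_info: str) -> Dict[str, List[str]]:
    """Group Requires-Dist entries in PKG-INFO text by their extra.

    Entries without an extra marker go under the key "" (base install).
    Empty or truncated "Requires-Dist:" lines are skipped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in pkg_info.splitlines():
        if not line.startswith("Requires-Dist:"):
            continue  # ignore Provides-Extra and other headers
        spec = line[len("Requires-Dist:"):].strip()
        requirement, _, marker = spec.partition(";")
        requirement = requirement.strip()
        if not requirement:
            continue
        match = EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else ""].append(requirement)
    return dict(groups)


# A few lines copied from the 0.1.16 metadata above
sample = """\
Requires-Dist: torch
Requires-Dist: timm
Provides-Extra: addons
Requires-Dist: easyocr; extra == "addons"
Requires-Dist: docling; extra == "addons"
Provides-Extra: all
Requires-Dist: natural-pdf[addons]; extra == "all"
"""

groups = group_requirements(sample)
print(groups[""])        # ['torch', 'timm']
print(groups["addons"])  # ['easyocr', 'docling']
```

On a machine where the package is installed, `importlib.metadata.requires("natural-pdf")` returns the same requirement strings (without the `Requires-Dist:` prefix). The regex here only understands the bare `extra == "..."` form; for full PEP 508 markers, the `packaging` library's `Requirement` class is the more robust choice.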
````diff
@@ -12,50 +12,24 @@ pip install natural-pdf
 
 But! If you want to recognize text, do page layout analysis, document q-and-a or other things, you can install optional dependencies.
 
-```bash
-# Install deskewing, OCR (surya and easyocr),
-# layout analysis (yolo), and interactive browsing
-pip install natural-pdf[favorites]
-
-# Install **everything**
-pip install natural-pdf[all]
-```
-
-
-### Optional Dependencies
-
 Natural PDF has modular dependencies for different features. Install them based on your needs:
 
 ```bash
-# Interactive PDF viewer
-pip install natural-pdf[viewer]
-
 # Deskewing
 pip install natural-pdf[deskew]
 
-#
-pip install natural-pdf[easyocr]
-pip install natural-pdf[surya]
-pip install natural-pdf[paddle]
-pip install natural-pdf[doctr]
-
-# Layout analysis
-pip install natural-pdf[surya]
-pip install natural-pdf[docling]
-pip install natural-pdf[layout_yolo]
-pip install natural-pdf[paddle]
-
-# AI stuff
-pip install natural-pdf[core-ml]
+# LLM features (OpenAI)
 pip install natural-pdf[llm]
 
 # Semantic search
-pip install natural-pdf[
+pip install natural-pdf[search]
 
-# Install everything
-pip install natural-pdf[
+# Install everything in the 'favorites' collection
+pip install natural-pdf[favorites]
 ```
 
+Other OCR and layout analysis engines like `surya`, `easyocr`, `paddle`, `doctr`, and `docling` can be installed via `pip` as needed. The library will provide you with an error message and installation command if you try to use an engine that isn't installed.
+
 ## Your First PDF Extraction
 
 Here's a quick example to make sure everything is working:
````
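The added installation paragraph says natural-pdf reports an error message and an install command when an engine is missing. To check ahead of time which engines are importable in the current environment, a standalone sketch using only `importlib.util.find_spec` could look like this. It is not the library's own mechanism; the `missing_engines` helper is hypothetical, and the engine-to-module mapping (for example, that `python-doctr` imports as `doctr` and `surya-ocr` as `surya`, and that checking `paddleocr` is enough for the paddle engine) is an assumption.

```python
from importlib.util import find_spec
from typing import Dict, Tuple

# Assumed mapping: engine name from the docs -> (pip requirement, import name).
# These pairs are illustrative guesses, not read from natural-pdf itself.
ENGINES: Dict[str, Tuple[str, str]] = {
    "surya": ("surya-ocr", "surya"),
    "easyocr": ("easyocr", "easyocr"),
    "paddle": ("paddleocr", "paddleocr"),
    "doctr": ("python-doctr[torch]", "doctr"),
    "docling": ("docling", "docling"),
}


def missing_engines(engines: Dict[str, Tuple[str, str]] = ENGINES) -> Dict[str, str]:
    """Return {engine: install command} for engines whose module isn't importable.

    find_spec() locates a top-level module without importing it, so this
    check stays cheap even for heavy packages.
    """
    missing = {}
    for name, (requirement, module) in engines.items():
        if find_spec(module) is None:
            # Quote the requirement so shells like zsh don't glob the brackets
            missing[name] = f'pip install "{requirement}"'
    return missing


for engine, command in missing_engines().items():
    print(f"{engine} is not installed; try: {command}")
```

Quoting matters in practice: an unquoted `natural-pdf[favorites]` is treated as a glob pattern by zsh, so `pip install "natural-pdf[favorites]"` is the portable spelling.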