natural-pdf 0.1.15__tar.gz → 0.1.16__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (249)
  1. natural_pdf-0.1.16/.pre-commit-config.yaml +12 -0
  2. {natural_pdf-0.1.15/natural_pdf.egg-info → natural_pdf-0.1.16}/PKG-INFO +27 -45
  3. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/installation/index.md +6 -32
  4. natural_pdf-0.1.16/docs/layout-analysis/index.ipynb +961 -0
  5. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/layout-analysis/index.md +32 -0
  6. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/ocr/index.md +4 -9
  7. natural_pdf-0.1.16/docs/tutorials/01-loading-and-extraction.ipynb +328 -0
  8. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/01-loading-and-extraction.md +0 -4
  9. natural_pdf-0.1.16/docs/tutorials/02-finding-elements.ipynb +352 -0
  10. natural_pdf-0.1.16/docs/tutorials/03-extracting-blocks.ipynb +159 -0
  11. natural_pdf-0.1.16/docs/tutorials/04-table-extraction.ipynb +579 -0
  12. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/04-table-extraction.md +22 -1
  13. natural_pdf-0.1.16/docs/tutorials/05-excluding-content.ipynb +8402 -0
  14. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/06-document-qa.ipynb +30 -38
  15. natural_pdf-0.1.16/docs/tutorials/07-layout-analysis.ipynb +630 -0
  16. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/07-layout-analysis.md +21 -6
  17. natural_pdf-0.1.16/docs/tutorials/07-working-with-regions.ipynb +477 -0
  18. natural_pdf-0.1.16/docs/tutorials/08-spatial-navigation.ipynb +520 -0
  19. natural_pdf-0.1.16/docs/tutorials/09-section-extraction.ipynb +2270 -0
  20. natural_pdf-0.1.16/docs/tutorials/10-form-field-extraction.ipynb +496 -0
  21. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/11-enhanced-table-processing.ipynb +6 -6
  22. natural_pdf-0.1.16/docs/tutorials/12-ocr-integration.ipynb +4129 -0
  23. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/12-ocr-integration.md +30 -28
  24. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/13-semantic-search.ipynb +176 -184
  25. natural_pdf-0.1.16/docs/tutorials/14-categorizing-documents.ipynb +2142 -0
  26. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/__init__.py +31 -0
  27. natural_pdf-0.1.16/natural_pdf/analyzers/layout/gemini.py +265 -0
  28. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/layout_manager.py +9 -5
  29. natural_pdf-0.1.16/natural_pdf/analyzers/layout/layout_options.py +179 -0
  30. natural_pdf-0.1.16/natural_pdf/analyzers/layout/paddle.py +450 -0
  31. natural_pdf-0.1.16/natural_pdf/analyzers/layout/table_structure_utils.py +78 -0
  32. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/shape_detection_mixin.py +770 -405
  33. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/classification/mixin.py +2 -8
  34. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/collections/pdf_collection.py +25 -30
  35. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/highlighting_service.py +47 -32
  36. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/page.py +117 -75
  37. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/pdf.py +19 -22
  38. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/base.py +9 -9
  39. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/collections.py +105 -50
  40. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/region.py +200 -126
  41. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/paddleocr.py +38 -13
  42. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/__init__.py +3 -3
  43. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/collections.py +303 -132
  44. natural_pdf-0.1.16/natural_pdf/flows/element.py +527 -0
  45. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/flow.py +33 -16
  46. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/flows/region.py +142 -79
  47. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_doctr.py +37 -4
  48. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_easyocr.py +23 -3
  49. natural_pdf-0.1.16/natural_pdf/ocr/engine_paddle.py +409 -0
  50. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_surya.py +8 -3
  51. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_manager.py +75 -76
  52. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_options.py +52 -87
  53. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/__init__.py +25 -12
  54. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/lancedb_search_service.py +91 -54
  55. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/numpy_search_service.py +86 -65
  56. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/searchable_mixin.py +2 -2
  57. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/selectors/parser.py +125 -81
  58. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/finetune/fine_tune_paddleocr.md +30 -20
  59. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/widgets/__init__.py +1 -1
  60. natural_pdf-0.1.16/natural_pdf/widgets/viewer.py +522 -0
  61. {natural_pdf-0.1.15 → natural_pdf-0.1.16/natural_pdf.egg-info}/PKG-INFO +27 -45
  62. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf.egg-info/SOURCES.txt +12 -1
  63. natural_pdf-0.1.16/natural_pdf.egg-info/requires.txt +79 -0
  64. natural_pdf-0.1.16/noxfile.py +87 -0
  65. natural_pdf-0.1.16/pdfs/image.png +0 -0
  66. natural_pdf-0.1.16/pdfs/image.png.pdf +0 -0
  67. natural_pdf-0.1.16/pdfs/red.pdf +0 -0
  68. natural_pdf-0.1.16/pdfs/tiny-ocr-2.pdf +0 -0
  69. natural_pdf-0.1.16/pdfs/tiny-ocr-3.pdf +0 -0
  70. natural_pdf-0.1.16/pdfs/tiny-ocr-small.jpg +0 -0
  71. natural_pdf-0.1.16/pdfs/tiny-ocr-wide.jpg +0 -0
  72. natural_pdf-0.1.16/pdfs/tiny-ocr.pdf +0 -0
  73. natural_pdf-0.1.16/pdfs/tiny.pdf +0 -0
  74. natural_pdf-0.1.16/pdfs/word-counter.pdf +0 -0
  75. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pyproject.toml +29 -55
  76. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/conftest.py +19 -12
  77. natural_pdf-0.1.16/tests/exporters/test_paddleocr_exporter.py +78 -0
  78. natural_pdf-0.1.16/tests/test_core/test_containment_geometry.py +35 -0
  79. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_elements.py +61 -55
  80. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_loading.py +12 -11
  81. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_spatial.py +101 -69
  82. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_core/test_text_extraction.py +27 -26
  83. natural_pdf-0.1.16/tests/test_optional_deps.py +173 -0
  84. natural_pdf-0.1.15/docs/layout-analysis/index.ipynb +0 -897
  85. natural_pdf-0.1.15/docs/tutorials/01-loading-and-extraction.ipynb +0 -3089
  86. natural_pdf-0.1.15/docs/tutorials/02-finding-elements.ipynb +0 -375
  87. natural_pdf-0.1.15/docs/tutorials/03-extracting-blocks.ipynb +0 -167
  88. natural_pdf-0.1.15/docs/tutorials/04-table-extraction.ipynb +0 -217
  89. natural_pdf-0.1.15/docs/tutorials/05-excluding-content.ipynb +0 -8410
  90. natural_pdf-0.1.15/docs/tutorials/07-layout-analysis.ipynb +0 -280
  91. natural_pdf-0.1.15/docs/tutorials/07-working-with-regions.ipynb +0 -485
  92. natural_pdf-0.1.15/docs/tutorials/08-spatial-navigation.ipynb +0 -528
  93. natural_pdf-0.1.15/docs/tutorials/09-section-extraction.ipynb +0 -2482
  94. natural_pdf-0.1.15/docs/tutorials/10-form-field-extraction.ipynb +0 -504
  95. natural_pdf-0.1.15/docs/tutorials/12-ocr-integration.ipynb +0 -3565
  96. natural_pdf-0.1.15/docs/tutorials/14-categorizing-documents.ipynb +0 -2150
  97. natural_pdf-0.1.15/natural_pdf/analyzers/layout/gemini.py +0 -290
  98. natural_pdf-0.1.15/natural_pdf/analyzers/layout/layout_options.py +0 -109
  99. natural_pdf-0.1.15/natural_pdf/analyzers/layout/paddle.py +0 -297
  100. natural_pdf-0.1.15/natural_pdf/flows/element.py +0 -382
  101. natural_pdf-0.1.15/natural_pdf/ocr/engine_paddle.py +0 -158
  102. natural_pdf-0.1.15/natural_pdf/widgets/frontend/viewer.js +0 -88
  103. natural_pdf-0.1.15/natural_pdf/widgets/viewer.py +0 -766
  104. natural_pdf-0.1.15/natural_pdf.egg-info/requires.txt +0 -107
  105. natural_pdf-0.1.15/noxfile.py +0 -109
  106. natural_pdf-0.1.15/tests/exporters/test_paddleocr_exporter.py +0 -140
  107. natural_pdf-0.1.15/tests/test_core/test_containment_geometry.py +0 -26
  108. natural_pdf-0.1.15/tests/test_optional_deps.py +0 -259
  109. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/analysis_framework.mdc +0 -0
  110. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/coding-style.mdc +0 -0
  111. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/edit-md-instead-of-ipynb.mdc +0 -0
  112. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/minimal-comments.mdc +0 -0
  113. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/natural-pdf-overview.mdc +0 -0
  114. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.cursor/rules/user-friendly-library-code.mdc +0 -0
  115. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.github/workflows/docs.yml +0 -0
  116. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/.gitignore +0 -0
  117. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/01-execute_notebooks.py +0 -0
  118. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/02-run_all_tutorials.sh +0 -0
  119. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/CLAUDE.md +0 -0
  120. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/LICENSE +0 -0
  121. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/MANIFEST.in +0 -0
  122. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/README.md +0 -0
  123. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/audit_packaging.py +0 -0
  124. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/check_run_md.sh +0 -0
  125. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/api/index.md +0 -0
  126. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/favicon.png +0 -0
  127. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/favicon.svg +0 -0
  128. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/javascripts/custom.js +0 -0
  129. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/logo.svg +0 -0
  130. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/sample-screen.png +0 -0
  131. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/social-preview.png +0 -0
  132. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/social-preview.svg +0 -0
  133. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/assets/stylesheets/custom.css +0 -0
  134. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/categorizing-documents/index.md +0 -0
  135. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/data-extraction/index.md +0 -0
  136. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/document-qa/index.ipynb +0 -0
  137. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/document-qa/index.md +0 -0
  138. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/element-selection/index.ipynb +0 -0
  139. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/element-selection/index.md +0 -0
  140. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/finetuning/index.md +0 -0
  141. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/index.md +0 -0
  142. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/interactive-widget/index.ipynb +0 -0
  143. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/interactive-widget/index.md +0 -0
  144. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/loops-and-groups/index.ipynb +0 -0
  145. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/loops-and-groups/index.md +0 -0
  146. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/pdf-navigation/index.ipynb +0 -0
  147. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/pdf-navigation/index.md +0 -0
  148. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/reflowing-pages/index.ipynb +0 -0
  149. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/reflowing-pages/index.md +0 -0
  150. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/regions/index.ipynb +0 -0
  151. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/regions/index.md +0 -0
  152. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tables/index.ipynb +0 -0
  153. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tables/index.md +0 -0
  154. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-analysis/index.ipynb +0 -0
  155. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-analysis/index.md +0 -0
  156. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-extraction/index.ipynb +0 -0
  157. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/text-extraction/index.md +0 -0
  158. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/02-finding-elements.md +0 -0
  159. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/03-extracting-blocks.md +0 -0
  160. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/05-excluding-content.md +0 -0
  161. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/06-document-qa.md +0 -0
  162. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/07-working-with-regions.md +0 -0
  163. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/08-spatial-navigation.md +0 -0
  164. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/09-section-extraction.md +0 -0
  165. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/10-form-field-extraction.md +0 -0
  166. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/11-enhanced-table-processing.md +0 -0
  167. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/13-semantic-search.md +0 -0
  168. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/tutorials/14-categorizing-documents.md +0 -0
  169. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/visual-debugging/index.ipynb +0 -0
  170. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/visual-debugging/index.md +0 -0
  171. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/docs/visual-debugging/region.png +0 -0
  172. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/mkdocs.yml +0 -0
  173. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/__init__.py +0 -0
  174. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/__init__.py +0 -0
  175. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/base.py +0 -0
  176. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/docling.py +0 -0
  177. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/layout_analyzer.py +0 -0
  178. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/pdfplumber_table_finder.py +0 -0
  179. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/surya.py +0 -0
  180. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/tatr.py +0 -0
  181. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/yolo.py +0 -0
  182. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/text_options.py +0 -0
  183. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/text_structure.py +0 -0
  184. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/analyzers/utils.py +0 -0
  185. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/classification/manager.py +0 -0
  186. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/classification/results.py +0 -0
  187. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/collections/mixins.py +0 -0
  188. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/__init__.py +0 -0
  189. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/core/element_manager.py +0 -0
  190. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/__init__.py +0 -0
  191. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/line.py +0 -0
  192. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/rect.py +0 -0
  193. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/elements/text.py +0 -0
  194. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/export/mixin.py +0 -0
  195. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/__init__.py +0 -0
  196. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/base.py +0 -0
  197. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/data/__init__.py +0 -0
  198. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/data/pdf.ttf +0 -0
  199. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/data/sRGB.icc +0 -0
  200. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/hocr.py +0 -0
  201. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/hocr_font.py +0 -0
  202. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/original_pdf.py +0 -0
  203. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/exporters/searchable_pdf.py +0 -0
  204. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/extraction/manager.py +0 -0
  205. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/extraction/mixin.py +0 -0
  206. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/extraction/result.py +0 -0
  207. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/__init__.py +0 -0
  208. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/engine.py +0 -0
  209. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_factory.py +0 -0
  210. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/ocr/utils.py +0 -0
  211. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/qa/__init__.py +0 -0
  212. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/qa/document_qa.py +0 -0
  213. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/search_options.py +0 -0
  214. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/search/search_service_protocol.py +0 -0
  215. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/selectors/__init__.py +0 -0
  216. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/__init__.py +0 -0
  217. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/css/style.css +0 -0
  218. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/index.html +0 -0
  219. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/js/app.js +0 -0
  220. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/templates/spa/words.txt +0 -0
  221. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/__init__.py +0 -0
  222. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/debug.py +0 -0
  223. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/highlighting.py +0 -0
  224. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/identifiers.py +0 -0
  225. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/locks.py +0 -0
  226. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/packaging.py +0 -0
  227. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/reading_order.py +0 -0
  228. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/text_extraction.py +0 -0
  229. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf/utils/visualization.py +0 -0
  230. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf.egg-info/dependency_links.txt +0 -0
  231. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/natural_pdf.egg-info/top_level.txt +0 -0
  232. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/.gitkeep +0 -0
  233. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/01-practice.pdf +0 -0
  234. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/0500000US42001.pdf +0 -0
  235. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/0500000US42007.pdf +0 -0
  236. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/2014 Statistics.pdf +0 -0
  237. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/2019 Statistics.pdf +0 -0
  238. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/30.pdf +0 -0
  239. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/Atlanta_Public_Schools_GA_sample.pdf +0 -0
  240. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/anexo_edital_6604_1743480-table.pdf +0 -0
  241. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/cia-doc.pdf +0 -0
  242. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/geometry.pdf +0 -0
  243. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/multicolumn.pdf +0 -0
  244. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/pdfs/needs-ocr.pdf +0 -0
  245. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/publish.sh +0 -0
  246. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/sample-screen.png +0 -0
  247. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/setup.cfg +0 -0
  248. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/tests/test_loading_original.py +0 -0
  249. {natural_pdf-0.1.15 → natural_pdf-0.1.16}/uv.lock +0 -0
@@ -0,0 +1,12 @@
+ repos:
+   - repo: https://github.com/psf/black
+     rev: 24.4.2
+     hooks:
+       - id: black
+         language_version: python3
+   - repo: https://github.com/pycqa/isort
+     rev: 5.13.2
+     hooks:
+       - id: isort
+         name: isort (python)
+         language_version: python3
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: natural-pdf
- Version: 0.1.15
+ Version: 0.1.16
  Summary: A more intuitive interface for working with PDFs
  Author-email: Jonathan Soma <jonathan.soma@gmail.com>
  License-Expression: MIT
@@ -12,6 +12,7 @@ Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: pdfplumber
+ Requires-Dist: colormath2
  Requires-Dist: pillow
  Requires-Dist: colour
  Requires-Dist: numpy
@@ -21,47 +22,31 @@ Requires-Dist: pydantic
  Requires-Dist: jenkspy
  Requires-Dist: pikepdf>=9.7.0
  Requires-Dist: scipy
- Provides-Extra: viewer
- Requires-Dist: ipywidgets<9.0.0,>=7.0.0; extra == "viewer"
- Provides-Extra: easyocr
- Requires-Dist: easyocr; extra == "easyocr"
- Requires-Dist: natural-pdf[core-ml]; extra == "easyocr"
- Provides-Extra: paddle
- Requires-Dist: paddlepaddle; extra == "paddle"
- Requires-Dist: paddleocr; extra == "paddle"
- Provides-Extra: layout-yolo
- Requires-Dist: doclayout_yolo; extra == "layout-yolo"
- Requires-Dist: natural-pdf[core-ml]; extra == "layout-yolo"
- Provides-Extra: surya
- Requires-Dist: surya-ocr; extra == "surya"
- Requires-Dist: natural-pdf[core-ml]; extra == "surya"
- Provides-Extra: doctr
- Requires-Dist: python-doctr[torch]; extra == "doctr"
- Requires-Dist: natural-pdf[core-ml]; extra == "doctr"
- Provides-Extra: docling
- Requires-Dist: docling; extra == "docling"
- Requires-Dist: natural-pdf[core-ml]; extra == "docling"
- Provides-Extra: llm
- Requires-Dist: openai>=1.0; extra == "llm"
+ Requires-Dist: torch
+ Requires-Dist: torchvision
+ Requires-Dist: transformers[sentencepiece]<=4.34.1
+ Requires-Dist: huggingface_hub>=0.29.3
+ Requires-Dist: sentence-transformers
+ Requires-Dist: timm
  Provides-Extra: test
  Requires-Dist: pytest; extra == "test"
+ Requires-Dist: pytest-xdist; extra == "test"
+ Requires-Dist: setuptools; extra == "test"
  Provides-Extra: search
  Requires-Dist: lancedb; extra == "search"
  Requires-Dist: pyarrow; extra == "search"
  Provides-Extra: favorites
  Requires-Dist: natural-pdf[deskew]; extra == "favorites"
- Requires-Dist: natural-pdf[llm]; extra == "favorites"
- Requires-Dist: natural-pdf[surya]; extra == "favorites"
- Requires-Dist: natural-pdf[easyocr]; extra == "favorites"
- Requires-Dist: natural-pdf[layout_yolo]; extra == "favorites"
  Requires-Dist: natural-pdf[ocr-export]; extra == "favorites"
- Requires-Dist: natural-pdf[viewer]; extra == "favorites"
  Requires-Dist: natural-pdf[search]; extra == "favorites"
+ Requires-Dist: ipywidgets; extra == "favorites"
+ Requires-Dist: surya-ocr; extra == "favorites"
  Provides-Extra: dev
  Requires-Dist: black; extra == "dev"
  Requires-Dist: isort; extra == "dev"
  Requires-Dist: mypy; extra == "dev"
  Requires-Dist: pytest; extra == "dev"
+ Requires-Dist: pytest-xdist; extra == "dev"
  Requires-Dist: nox; extra == "dev"
  Requires-Dist: nox-uv; extra == "dev"
  Requires-Dist: build; extra == "dev"
@@ -71,31 +56,28 @@ Requires-Dist: nbformat; extra == "dev"
  Requires-Dist: jupytext; extra == "dev"
  Requires-Dist: nbclient; extra == "dev"
  Requires-Dist: ipykernel; extra == "dev"
+ Requires-Dist: pre-commit; extra == "dev"
+ Requires-Dist: setuptools; extra == "dev"
  Provides-Extra: deskew
  Requires-Dist: deskew>=1.5; extra == "deskew"
  Requires-Dist: img2pdf; extra == "deskew"
+ Provides-Extra: addons
+ Requires-Dist: surya-ocr; extra == "addons"
+ Requires-Dist: doclayout_yolo; extra == "addons"
+ Requires-Dist: paddlepaddle>=3.0.0; extra == "addons"
+ Requires-Dist: paddleocr>=3.0.0; extra == "addons"
+ Requires-Dist: ipywidgets>=7.0.0; extra == "addons"
+ Requires-Dist: easyocr; extra == "addons"
+ Requires-Dist: surya-ocr; extra == "addons"
+ Requires-Dist: doclayout_yolo; extra == "addons"
+ Requires-Dist: python-doctr[torch]; extra == "addons"
+ Requires-Dist: docling; extra == "addons"
  Provides-Extra: all
- Requires-Dist: natural-pdf[viewer]; extra == "all"
- Requires-Dist: natural-pdf[easyocr]; extra == "all"
- Requires-Dist: natural-pdf[paddle]; extra == "all"
- Requires-Dist: natural-pdf[layout_yolo]; extra == "all"
- Requires-Dist: natural-pdf[surya]; extra == "all"
- Requires-Dist: natural-pdf[doctr]; extra == "all"
  Requires-Dist: natural-pdf[ocr-export]; extra == "all"
- Requires-Dist: natural-pdf[docling]; extra == "all"
- Requires-Dist: natural-pdf[llm]; extra == "all"
- Requires-Dist: natural-pdf[core-ml]; extra == "all"
  Requires-Dist: natural-pdf[deskew]; extra == "all"
  Requires-Dist: natural-pdf[test]; extra == "all"
  Requires-Dist: natural-pdf[search]; extra == "all"
- Provides-Extra: core-ml
- Requires-Dist: torch; extra == "core-ml"
- Requires-Dist: torchvision; extra == "core-ml"
- Requires-Dist: transformers[sentencepiece]; extra == "core-ml"
- Requires-Dist: huggingface_hub; extra == "core-ml"
- Requires-Dist: sentence-transformers; extra == "core-ml"
- Requires-Dist: numpy; extra == "core-ml"
- Requires-Dist: timm; extra == "core-ml"
+ Requires-Dist: natural-pdf[addons]; extra == "all"
  Provides-Extra: ocr-export
  Requires-Dist: pikepdf; extra == "ocr-export"
  Provides-Extra: export-extras
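
The PKG-INFO hunks above are easier to read once the `Requires-Dist` lines are grouped by their `extra == "..."` marker: requirements with no marker are installed unconditionally, which is how `torch`, `timm`, and the others that used to sit under the `core-ml` extra appear in 0.1.16. The following is a minimal sketch of that grouping, not part of the package. The helper name `group_by_extra` and the `SAMPLE` text are illustrative assumptions, and the regex only handles the simple double-quoted marker form seen in this diff.

```python
import re
from collections import defaultdict

# A few Requires-Dist lines copied from the 0.1.16 side of the hunks above.
SAMPLE = """\
Requires-Dist: torch
Requires-Dist: timm
Requires-Dist: pytest-xdist; extra == "test"
Requires-Dist: surya-ocr; extra == "addons"
Requires-Dist: natural-pdf[addons]; extra == "all"
"""

# Matches only the simple form  extra == "name"  used in this PKG-INFO.
EXTRA_MARKER = re.compile(r'extra\s*==\s*"([^"]+)"')


def group_by_extra(pkg_info_text):
    """Map each extra name to its requirements; the key None holds unconditional ones."""
    groups = defaultdict(list)
    for line in pkg_info_text.splitlines():
        if not line.startswith("Requires-Dist:"):
            continue  # skip Provides-Extra and every other metadata field
        spec = line[len("Requires-Dist:"):].strip()
        requirement, _, marker = spec.partition(";")
        match = EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else None].append(requirement.strip())
    return dict(groups)


groups = group_by_extra(SAMPLE)
for extra, requirements in groups.items():
    print(extra or "(base install)", "->", ", ".join(requirements))
```

Running the same function over the full old and new PKG-INFO files and comparing the `None` groups would list exactly which requirements moved into the base install.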
@@ -12,50 +12,24 @@ pip install natural-pdf
 
  But! If you want to recognize text, do page layout analysis, document q-and-a or other things, you can install optional dependencies.
 
- ```bash
- # Install deskewing, OCR (surya and easyocr),
- # layout analysis (yolo), and interactive browsing
- pip install natural-pdf[favorites]
-
- # Install **everything**
- pip install natural-pdf[all]
- ```
-
-
- ### Optional Dependencies
-
  Natural PDF has modular dependencies for different features. Install them based on your needs:
 
  ```bash
- # Interactive PDF viewer
- pip install natural-pdf[viewer]
-
  # Deskewing
  pip install natural-pdf[deskew]
 
- # OCR options
- pip install natural-pdf[easyocr]
- pip install natural-pdf[surya]
- pip install natural-pdf[paddle]
- pip install natural-pdf[doctr]
-
- # Layout analysis
- pip install natural-pdf[surya]
- pip install natural-pdf[docling]
- pip install natural-pdf[layout_yolo]
- pip install natural-pdf[paddle]
-
- # AI stuff
- pip install natural-pdf[core-ml]
+ # LLM features (OpenAI)
  pip install natural-pdf[llm]
 
  # Semantic search
- pip install natural-pdf[core-ml]
+ pip install natural-pdf[search]
 
- # Install everything
- pip install natural-pdf[all]
+ # Install everything in the 'favorites' collection
+ pip install natural-pdf[favorites]
  ```
 
+ Other OCR and layout analysis engines like `surya`, `easyocr`, `paddle`, `doctr`, and `docling` can be installed via `pip` as needed. The library will provide you with an error message and installation command if you try to use an engine that isn't installed.
+
  ## Your First PDF Extraction
 
  Here's a quick example to make sure everything is working:
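
As a side note on the paragraph added in this hunk about engines that are installed on demand: the behavior it describes (report a missing engine together with an install command) can be approximated with the standard library alone. The sketch below is an illustration, not natural-pdf's own implementation. The `ENGINES` table and the `install_hint` helper are hypothetical names; the import names are assumed from each project's usual top-level module, and the pip requirement names are taken from the `addons` extra in the PKG-INFO hunk.

```python
from importlib.util import find_spec

# Hypothetical table: engine name -> (assumed import name, pip requirements).
ENGINES = {
    "easyocr": ("easyocr", ["easyocr"]),
    "surya": ("surya", ["surya-ocr"]),
    "paddle": ("paddleocr", ["paddlepaddle", "paddleocr"]),
    "doctr": ("doctr", ["python-doctr[torch]"]),
    "docling": ("docling", ["docling"]),
}


def install_hint(engine):
    """Return None if the engine's module can be found, else a pip command to suggest."""
    module_name, requirements = ENGINES[engine]
    if find_spec(module_name) is not None:
        return None  # module is present; nothing to install
    # Quote each requirement so extras like [torch] survive shells that glob brackets.
    quoted = " ".join(f'"{req}"' for req in requirements)
    return f"pip install {quoted}"


for name in ENGINES:
    hint = install_hint(name)
    print(f"{name}: {'available' if hint is None else hint}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy packages.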