natural-pdf 0.1.14__tar.gz → 0.1.16__tar.gz

Files changed (250)
  1. natural_pdf-0.1.16/.pre-commit-config.yaml +12 -0
  2. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/CLAUDE.md +1 -1
  3. {natural_pdf-0.1.14/natural_pdf.egg-info → natural_pdf-0.1.16}/PKG-INFO +27 -45
  4. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/api/index.md +1 -1
  5. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/installation/index.md +6 -32
  6. natural_pdf-0.1.16/docs/layout-analysis/index.ipynb +961 -0
  7. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/layout-analysis/index.md +33 -1
  8. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/ocr/index.md +4 -9
  9. natural_pdf-0.1.16/docs/tables/index.ipynb +665 -0
  10. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tables/index.md +1 -1
  11. natural_pdf-0.1.16/docs/tutorials/01-loading-and-extraction.ipynb +328 -0
  12. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/01-loading-and-extraction.md +0 -4
  13. natural_pdf-0.1.16/docs/tutorials/02-finding-elements.ipynb +352 -0
  14. natural_pdf-0.1.16/docs/tutorials/03-extracting-blocks.ipynb +159 -0
  15. natural_pdf-0.1.16/docs/tutorials/04-table-extraction.ipynb +579 -0
  16. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/04-table-extraction.md +22 -1
  17. natural_pdf-0.1.16/docs/tutorials/05-excluding-content.ipynb +8402 -0
  18. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/06-document-qa.ipynb +28 -28
  19. natural_pdf-0.1.16/docs/tutorials/07-layout-analysis.ipynb +630 -0
  20. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/07-layout-analysis.md +21 -6
  21. natural_pdf-0.1.16/docs/tutorials/07-working-with-regions.ipynb +477 -0
  22. natural_pdf-0.1.16/docs/tutorials/08-spatial-navigation.ipynb +520 -0
  23. natural_pdf-0.1.16/docs/tutorials/09-section-extraction.ipynb +2270 -0
  24. natural_pdf-0.1.16/docs/tutorials/10-form-field-extraction.ipynb +496 -0
  25. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/11-enhanced-table-processing.ipynb +6 -6
  26. natural_pdf-0.1.16/docs/tutorials/12-ocr-integration.ipynb +4129 -0
  27. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/12-ocr-integration.md +30 -28
  28. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/13-semantic-search.ipynb +173 -173
  29. natural_pdf-0.1.16/docs/tutorials/14-categorizing-documents.ipynb +2142 -0
  30. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/__init__.py +31 -0
  31. natural_pdf-0.1.16/natural_pdf/analyzers/layout/gemini.py +265 -0
  32. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/layout_manager.py +9 -5
  33. natural_pdf-0.1.16/natural_pdf/analyzers/layout/layout_options.py +179 -0
  34. natural_pdf-0.1.16/natural_pdf/analyzers/layout/paddle.py +450 -0
  35. natural_pdf-0.1.16/natural_pdf/analyzers/layout/table_structure_utils.py +78 -0
  36. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/shape_detection_mixin.py +770 -405
  37. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/classification/mixin.py +2 -8
  38. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/collections/pdf_collection.py +25 -30
  39. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/core/highlighting_service.py +47 -32
  40. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/core/page.py +226 -70
  41. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/core/pdf.py +19 -22
  42. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/base.py +9 -9
  43. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/collections.py +105 -50
  44. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/region.py +320 -113
  45. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/paddleocr.py +38 -13
  46. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/flows/__init__.py +3 -3
  47. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/flows/collections.py +303 -132
  48. natural_pdf-0.1.16/natural_pdf/flows/element.py +527 -0
  49. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/flows/flow.py +33 -16
  50. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/flows/region.py +142 -79
  51. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_doctr.py +37 -4
  52. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_easyocr.py +23 -3
  53. natural_pdf-0.1.16/natural_pdf/ocr/engine_paddle.py +409 -0
  54. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/engine_surya.py +8 -3
  55. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_manager.py +75 -76
  56. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_options.py +52 -87
  57. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/search/__init__.py +25 -12
  58. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/search/lancedb_search_service.py +91 -54
  59. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/search/numpy_search_service.py +86 -65
  60. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/search/searchable_mixin.py +2 -2
  61. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/selectors/parser.py +125 -81
  62. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/templates/finetune/fine_tune_paddleocr.md +30 -20
  63. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/widgets/__init__.py +1 -1
  64. natural_pdf-0.1.16/natural_pdf/widgets/viewer.py +522 -0
  65. {natural_pdf-0.1.14 → natural_pdf-0.1.16/natural_pdf.egg-info}/PKG-INFO +27 -45
  66. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf.egg-info/SOURCES.txt +13 -1
  67. natural_pdf-0.1.16/natural_pdf.egg-info/requires.txt +79 -0
  68. natural_pdf-0.1.16/noxfile.py +87 -0
  69. natural_pdf-0.1.16/pdfs/30.pdf +0 -0
  70. natural_pdf-0.1.16/pdfs/image.png +0 -0
  71. natural_pdf-0.1.16/pdfs/image.png.pdf +0 -0
  72. natural_pdf-0.1.16/pdfs/red.pdf +0 -0
  73. natural_pdf-0.1.16/pdfs/tiny-ocr-2.pdf +0 -0
  74. natural_pdf-0.1.16/pdfs/tiny-ocr-3.pdf +0 -0
  75. natural_pdf-0.1.16/pdfs/tiny-ocr-small.jpg +0 -0
  76. natural_pdf-0.1.16/pdfs/tiny-ocr-wide.jpg +0 -0
  77. natural_pdf-0.1.16/pdfs/tiny-ocr.pdf +0 -0
  78. natural_pdf-0.1.16/pdfs/tiny.pdf +0 -0
  79. natural_pdf-0.1.16/pdfs/word-counter.pdf +0 -0
  80. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pyproject.toml +29 -55
  81. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/tests/conftest.py +19 -12
  82. natural_pdf-0.1.16/tests/exporters/test_paddleocr_exporter.py +78 -0
  83. natural_pdf-0.1.16/tests/test_core/test_containment_geometry.py +35 -0
  84. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/tests/test_core/test_elements.py +61 -55
  85. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/tests/test_core/test_loading.py +12 -11
  86. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/tests/test_core/test_spatial.py +101 -69
  87. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/tests/test_core/test_text_extraction.py +27 -26
  88. natural_pdf-0.1.16/tests/test_optional_deps.py +173 -0
  89. natural_pdf-0.1.14/docs/layout-analysis/index.ipynb +0 -1974
  90. natural_pdf-0.1.14/docs/tables/index.ipynb +0 -662
  91. natural_pdf-0.1.14/docs/tutorials/01-loading-and-extraction.ipynb +0 -3081
  92. natural_pdf-0.1.14/docs/tutorials/02-finding-elements.ipynb +0 -352
  93. natural_pdf-0.1.14/docs/tutorials/03-extracting-blocks.ipynb +0 -159
  94. natural_pdf-0.1.14/docs/tutorials/04-table-extraction.ipynb +0 -209
  95. natural_pdf-0.1.14/docs/tutorials/05-excluding-content.ipynb +0 -8402
  96. natural_pdf-0.1.14/docs/tutorials/07-layout-analysis.ipynb +0 -272
  97. natural_pdf-0.1.14/docs/tutorials/07-working-with-regions.ipynb +0 -477
  98. natural_pdf-0.1.14/docs/tutorials/08-spatial-navigation.ipynb +0 -520
  99. natural_pdf-0.1.14/docs/tutorials/09-section-extraction.ipynb +0 -2474
  100. natural_pdf-0.1.14/docs/tutorials/10-form-field-extraction.ipynb +0 -496
  101. natural_pdf-0.1.14/docs/tutorials/12-ocr-integration.ipynb +0 -3448
  102. natural_pdf-0.1.14/docs/tutorials/14-categorizing-documents.ipynb +0 -2142
  103. natural_pdf-0.1.14/natural_pdf/analyzers/layout/gemini.py +0 -290
  104. natural_pdf-0.1.14/natural_pdf/analyzers/layout/layout_options.py +0 -109
  105. natural_pdf-0.1.14/natural_pdf/analyzers/layout/paddle.py +0 -297
  106. natural_pdf-0.1.14/natural_pdf/flows/element.py +0 -382
  107. natural_pdf-0.1.14/natural_pdf/ocr/engine_paddle.py +0 -158
  108. natural_pdf-0.1.14/natural_pdf/widgets/frontend/viewer.js +0 -88
  109. natural_pdf-0.1.14/natural_pdf/widgets/viewer.py +0 -766
  110. natural_pdf-0.1.14/natural_pdf.egg-info/requires.txt +0 -107
  111. natural_pdf-0.1.14/noxfile.py +0 -109
  112. natural_pdf-0.1.14/tests/exporters/test_paddleocr_exporter.py +0 -140
  113. natural_pdf-0.1.14/tests/test_core/test_containment_geometry.py +0 -26
  114. natural_pdf-0.1.14/tests/test_optional_deps.py +0 -259
  115. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.cursor/rules/analysis_framework.mdc +0 -0
  116. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.cursor/rules/coding-style.mdc +0 -0
  117. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.cursor/rules/edit-md-instead-of-ipynb.mdc +0 -0
  118. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.cursor/rules/minimal-comments.mdc +0 -0
  119. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.cursor/rules/natural-pdf-overview.mdc +0 -0
  120. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.cursor/rules/user-friendly-library-code.mdc +0 -0
  121. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.github/workflows/docs.yml +0 -0
  122. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/.gitignore +0 -0
  123. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/01-execute_notebooks.py +0 -0
  124. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/02-run_all_tutorials.sh +0 -0
  125. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/LICENSE +0 -0
  126. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/MANIFEST.in +0 -0
  127. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/README.md +0 -0
  128. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/audit_packaging.py +0 -0
  129. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/check_run_md.sh +0 -0
  130. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/favicon.png +0 -0
  131. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/favicon.svg +0 -0
  132. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/javascripts/custom.js +0 -0
  133. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/logo.svg +0 -0
  134. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/sample-screen.png +0 -0
  135. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/social-preview.png +0 -0
  136. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/social-preview.svg +0 -0
  137. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/assets/stylesheets/custom.css +0 -0
  138. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/categorizing-documents/index.md +0 -0
  139. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/data-extraction/index.md +0 -0
  140. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/document-qa/index.ipynb +0 -0
  141. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/document-qa/index.md +0 -0
  142. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/element-selection/index.ipynb +0 -0
  143. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/element-selection/index.md +0 -0
  144. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/finetuning/index.md +0 -0
  145. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/index.md +0 -0
  146. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/interactive-widget/index.ipynb +0 -0
  147. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/interactive-widget/index.md +0 -0
  148. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/loops-and-groups/index.ipynb +0 -0
  149. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/loops-and-groups/index.md +0 -0
  150. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/pdf-navigation/index.ipynb +0 -0
  151. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/pdf-navigation/index.md +0 -0
  152. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/reflowing-pages/index.ipynb +0 -0
  153. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/reflowing-pages/index.md +0 -0
  154. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/regions/index.ipynb +0 -0
  155. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/regions/index.md +0 -0
  156. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/text-analysis/index.ipynb +0 -0
  157. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/text-analysis/index.md +0 -0
  158. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/text-extraction/index.ipynb +0 -0
  159. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/text-extraction/index.md +0 -0
  160. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/02-finding-elements.md +0 -0
  161. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/03-extracting-blocks.md +0 -0
  162. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/05-excluding-content.md +0 -0
  163. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/06-document-qa.md +0 -0
  164. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/07-working-with-regions.md +0 -0
  165. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/08-spatial-navigation.md +0 -0
  166. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/09-section-extraction.md +0 -0
  167. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/10-form-field-extraction.md +0 -0
  168. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/11-enhanced-table-processing.md +0 -0
  169. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/13-semantic-search.md +0 -0
  170. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/tutorials/14-categorizing-documents.md +0 -0
  171. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/visual-debugging/index.ipynb +0 -0
  172. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/visual-debugging/index.md +0 -0
  173. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/docs/visual-debugging/region.png +0 -0
  174. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/mkdocs.yml +0 -0
  175. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/__init__.py +0 -0
  176. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/__init__.py +0 -0
  177. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/base.py +0 -0
  178. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/docling.py +0 -0
  179. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/layout_analyzer.py +0 -0
  180. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/pdfplumber_table_finder.py +0 -0
  181. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/surya.py +0 -0
  182. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/tatr.py +0 -0
  183. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/layout/yolo.py +0 -0
  184. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/text_options.py +0 -0
  185. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/text_structure.py +0 -0
  186. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/analyzers/utils.py +0 -0
  187. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/classification/manager.py +0 -0
  188. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/classification/results.py +0 -0
  189. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/collections/mixins.py +0 -0
  190. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/core/__init__.py +0 -0
  191. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/core/element_manager.py +0 -0
  192. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/__init__.py +0 -0
  193. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/line.py +0 -0
  194. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/rect.py +0 -0
  195. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/elements/text.py +0 -0
  196. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/export/mixin.py +0 -0
  197. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/__init__.py +0 -0
  198. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/base.py +0 -0
  199. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/data/__init__.py +0 -0
  200. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/data/pdf.ttf +0 -0
  201. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/data/sRGB.icc +0 -0
  202. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/hocr.py +0 -0
  203. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/hocr_font.py +0 -0
  204. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/original_pdf.py +0 -0
  205. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/exporters/searchable_pdf.py +0 -0
  206. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/extraction/manager.py +0 -0
  207. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/extraction/mixin.py +0 -0
  208. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/extraction/result.py +0 -0
  209. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/__init__.py +0 -0
  210. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/engine.py +0 -0
  211. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/ocr_factory.py +0 -0
  212. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/ocr/utils.py +0 -0
  213. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/qa/__init__.py +0 -0
  214. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/qa/document_qa.py +0 -0
  215. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/search/search_options.py +0 -0
  216. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/search/search_service_protocol.py +0 -0
  217. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/selectors/__init__.py +0 -0
  218. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/templates/__init__.py +0 -0
  219. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/templates/spa/css/style.css +0 -0
  220. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/templates/spa/index.html +0 -0
  221. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/templates/spa/js/app.js +0 -0
  222. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/templates/spa/words.txt +0 -0
  223. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/__init__.py +0 -0
  224. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/debug.py +0 -0
  225. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/highlighting.py +0 -0
  226. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/identifiers.py +0 -0
  227. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/locks.py +0 -0
  228. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/packaging.py +0 -0
  229. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/reading_order.py +0 -0
  230. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/text_extraction.py +0 -0
  231. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf/utils/visualization.py +0 -0
  232. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf.egg-info/dependency_links.txt +0 -0
  233. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/natural_pdf.egg-info/top_level.txt +0 -0
  234. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/.gitkeep +0 -0
  235. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/01-practice.pdf +0 -0
  236. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/0500000US42001.pdf +0 -0
  237. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/0500000US42007.pdf +0 -0
  238. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/2014 Statistics.pdf +0 -0
  239. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/2019 Statistics.pdf +0 -0
  240. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/Atlanta_Public_Schools_GA_sample.pdf +0 -0
  241. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/anexo_edital_6604_1743480-table.pdf +0 -0
  242. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/cia-doc.pdf +0 -0
  243. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/geometry.pdf +0 -0
  244. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/multicolumn.pdf +0 -0
  245. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/pdfs/needs-ocr.pdf +0 -0
  246. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/publish.sh +0 -0
  247. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/sample-screen.png +0 -0
  248. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/setup.cfg +0 -0
  249. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/tests/test_loading_original.py +0 -0
  250. {natural_pdf-0.1.14 → natural_pdf-0.1.16}/uv.lock +0 -0
@@ -0,0 +1,12 @@
+ repos:
+   - repo: https://github.com/psf/black
+     rev: 24.4.2
+     hooks:
+       - id: black
+         language_version: python3
+   - repo: https://github.com/pycqa/isort
+     rev: 5.13.2
+     hooks:
+       - id: isort
+         name: isort (python)
+         language_version: python3
@@ -865,7 +865,7 @@ data = table.extract_table()
  
  # Or explicitly specify the method to use
  data_tatr = table.extract_table(method='tatr') # Uses detected table structure
- data_plumber = table.extract_table(method='plumber') # Uses pdfplumber's algorithm
+ data_plumber = table.extract_table(method='pdfplumber') # Uses pdfplumber's algorithm
  
  # Work with table components directly
  rows = page.find_all('region[type=table-row][model=tatr]')
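The hunk above changes the documented method string from `'plumber'` to `'pdfplumber'`. Whether 0.1.16 still accepts the old spelling is not shown in this diff, so code written against the 0.1.14 docs may want to normalize the name before calling `extract_table`. A minimal sketch, assuming a hypothetical helper `normalize_table_method` and alias table that are not part of natural-pdf:

```python
from typing import Optional

# Hypothetical alias table: maps the spelling documented in 0.1.14
# to the spelling documented in 0.1.16. Not part of natural-pdf.
LEGACY_METHOD_ALIASES = {"plumber": "pdfplumber"}


def normalize_table_method(method: Optional[str]) -> Optional[str]:
    """Return the 0.1.16-documented spelling for a table extraction method.

    None is passed through unchanged so the library can pick its default.
    Unknown names are also passed through so the library can reject them.
    """
    if method is None:
        return None
    return LEGACY_METHOD_ALIASES.get(method, method)
```

A caller would then write `table.extract_table(method=normalize_table_method(user_method))`, where `user_method` is whatever string the calling code already holds.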
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: natural-pdf
- Version: 0.1.14
+ Version: 0.1.16
  Summary: A more intuitive interface for working with PDFs
  Author-email: Jonathan Soma <jonathan.soma@gmail.com>
  License-Expression: MIT
@@ -12,6 +12,7 @@ Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: pdfplumber
+ Requires-Dist: colormath2
  Requires-Dist: pillow
  Requires-Dist: colour
  Requires-Dist: numpy
@@ -21,47 +22,31 @@ Requires-Dist: pydantic
  Requires-Dist: jenkspy
  Requires-Dist: pikepdf>=9.7.0
  Requires-Dist: scipy
- Provides-Extra: viewer
- Requires-Dist: ipywidgets<9.0.0,>=7.0.0; extra == "viewer"
- Provides-Extra: easyocr
- Requires-Dist: easyocr; extra == "easyocr"
- Requires-Dist: natural-pdf[core-ml]; extra == "easyocr"
- Provides-Extra: paddle
- Requires-Dist: paddlepaddle; extra == "paddle"
- Requires-Dist: paddleocr; extra == "paddle"
- Provides-Extra: layout-yolo
- Requires-Dist: doclayout_yolo; extra == "layout-yolo"
- Requires-Dist: natural-pdf[core-ml]; extra == "layout-yolo"
- Provides-Extra: surya
- Requires-Dist: surya-ocr; extra == "surya"
- Requires-Dist: natural-pdf[core-ml]; extra == "surya"
- Provides-Extra: doctr
- Requires-Dist: python-doctr[torch]; extra == "doctr"
- Requires-Dist: natural-pdf[core-ml]; extra == "doctr"
- Provides-Extra: docling
- Requires-Dist: docling; extra == "docling"
- Requires-Dist: natural-pdf[core-ml]; extra == "docling"
- Provides-Extra: llm
- Requires-Dist: openai>=1.0; extra == "llm"
+ Requires-Dist: torch
+ Requires-Dist: torchvision
+ Requires-Dist: transformers[sentencepiece]<=4.34.1
+ Requires-Dist: huggingface_hub>=0.29.3
+ Requires-Dist: sentence-transformers
+ Requires-Dist: timm
  Provides-Extra: test
  Requires-Dist: pytest; extra == "test"
+ Requires-Dist: pytest-xdist; extra == "test"
+ Requires-Dist: setuptools; extra == "test"
  Provides-Extra: search
  Requires-Dist: lancedb; extra == "search"
  Requires-Dist: pyarrow; extra == "search"
  Provides-Extra: favorites
  Requires-Dist: natural-pdf[deskew]; extra == "favorites"
- Requires-Dist: natural-pdf[llm]; extra == "favorites"
- Requires-Dist: natural-pdf[surya]; extra == "favorites"
- Requires-Dist: natural-pdf[easyocr]; extra == "favorites"
- Requires-Dist: natural-pdf[layout_yolo]; extra == "favorites"
  Requires-Dist: natural-pdf[ocr-export]; extra == "favorites"
- Requires-Dist: natural-pdf[viewer]; extra == "favorites"
  Requires-Dist: natural-pdf[search]; extra == "favorites"
+ Requires-Dist: ipywidgets; extra == "favorites"
+ Requires-Dist: surya-ocr; extra == "favorites"
  Provides-Extra: dev
  Requires-Dist: black; extra == "dev"
  Requires-Dist: isort; extra == "dev"
  Requires-Dist: mypy; extra == "dev"
  Requires-Dist: pytest; extra == "dev"
+ Requires-Dist: pytest-xdist; extra == "dev"
  Requires-Dist: nox; extra == "dev"
  Requires-Dist: nox-uv; extra == "dev"
  Requires-Dist: build; extra == "dev"
@@ -71,31 +56,28 @@ Requires-Dist: nbformat; extra == "dev"
  Requires-Dist: jupytext; extra == "dev"
  Requires-Dist: nbclient; extra == "dev"
  Requires-Dist: ipykernel; extra == "dev"
+ Requires-Dist: pre-commit; extra == "dev"
+ Requires-Dist: setuptools; extra == "dev"
  Provides-Extra: deskew
  Requires-Dist: deskew>=1.5; extra == "deskew"
  Requires-Dist: img2pdf; extra == "deskew"
+ Provides-Extra: addons
+ Requires-Dist: surya-ocr; extra == "addons"
+ Requires-Dist: doclayout_yolo; extra == "addons"
+ Requires-Dist: paddlepaddle>=3.0.0; extra == "addons"
+ Requires-Dist: paddleocr>=3.0.0; extra == "addons"
+ Requires-Dist: ipywidgets>=7.0.0; extra == "addons"
+ Requires-Dist: easyocr; extra == "addons"
+ Requires-Dist: surya-ocr; extra == "addons"
+ Requires-Dist: doclayout_yolo; extra == "addons"
+ Requires-Dist: python-doctr[torch]; extra == "addons"
+ Requires-Dist: docling; extra == "addons"
  Provides-Extra: all
- Requires-Dist: natural-pdf[viewer]; extra == "all"
- Requires-Dist: natural-pdf[easyocr]; extra == "all"
- Requires-Dist: natural-pdf[paddle]; extra == "all"
- Requires-Dist: natural-pdf[layout_yolo]; extra == "all"
- Requires-Dist: natural-pdf[surya]; extra == "all"
- Requires-Dist: natural-pdf[doctr]; extra == "all"
  Requires-Dist: natural-pdf[ocr-export]; extra == "all"
- Requires-Dist: natural-pdf[docling]; extra == "all"
- Requires-Dist: natural-pdf[llm]; extra == "all"
- Requires-Dist: natural-pdf[core-ml]; extra == "all"
  Requires-Dist: natural-pdf[deskew]; extra == "all"
  Requires-Dist: natural-pdf[test]; extra == "all"
  Requires-Dist: natural-pdf[search]; extra == "all"
- Provides-Extra: core-ml
- Requires-Dist: torch; extra == "core-ml"
- Requires-Dist: torchvision; extra == "core-ml"
- Requires-Dist: transformers[sentencepiece]; extra == "core-ml"
- Requires-Dist: huggingface_hub; extra == "core-ml"
- Requires-Dist: sentence-transformers; extra == "core-ml"
- Requires-Dist: numpy; extra == "core-ml"
- Requires-Dist: timm; extra == "core-ml"
+ Requires-Dist: natural-pdf[addons]; extra == "all"
  Provides-Extra: ocr-export
  Requires-Dist: pikepdf; extra == "ocr-export"
  Provides-Extra: export-extras
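The PKG-INFO hunks above move `torch`, `torchvision`, `transformers`, `huggingface_hub`, `sentence-transformers` and `timm` from the `core-ml` extra into the unconditional requirements, and replace the per-engine extras with a single `addons` extra. One way to see that regrouping at a glance is to bucket the `Requires-Dist` lines by extra. The sketch below is illustrative only: `group_requirements` is a hypothetical helper, and it handles only the simple `; extra == "name"` marker form that appears in this PKG-INFO, not arbitrary environment markers.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the trailing marker that ties a requirement to an extra,
# e.g. '; extra == "addons"'. Only this simple form is handled.
_EXTRA_MARKER = re.compile(r';\s*extra\s*==\s*"([^"]+)"\s*$')


def group_requirements(metadata_lines: List[str]) -> Dict[str, List[str]]:
    """Group Requires-Dist entries by extra; core requirements go under ''."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in metadata_lines:
        if not line.startswith("Requires-Dist:"):
            continue  # skip Provides-Extra and other metadata fields
        requirement = line[len("Requires-Dist:"):].strip()
        match = _EXTRA_MARKER.search(requirement)
        if match:
            extra = match.group(1)
            requirement = requirement[: match.start()].strip()
        else:
            extra = ""  # no marker: installed unconditionally
        groups[extra].append(requirement)
    return dict(groups)


# A few lines taken from the 0.1.16 side of the hunks above.
sample = [
    "Provides-Extra: addons",
    "Requires-Dist: torch",
    'Requires-Dist: paddleocr>=3.0.0; extra == "addons"',
    'Requires-Dist: natural-pdf[addons]; extra == "all"',
]
grouped = group_requirements(sample)
```

Running the same function over the 0.1.14 and 0.1.16 metadata and comparing the two dictionaries shows which packages changed bucket between releases.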
@@ -107,7 +107,7 @@ class Region:
  | `save_image(path, resolution=72, crop_only=False)` | Save an image of just the region | `path`: Path to save image<br>`resolution`: Image resolution in DPI<br>`crop_only`: Whether to exclude border | `None` |
  | `get_sections(start_elements, end_elements=None, boundary_inclusion='start')` | Get sections within the region | `start_elements`: Elements marking section starts<br>`end_elements`: Elements marking section ends<br>`boundary_inclusion`: How to include boundaries | `list[Region]` |
  | `ask(question, min_confidence=0.0, model=None, debug=False)` | Ask a question about the region content | `question`: Question to ask<br>`min_confidence`: Minimum confidence threshold<br>`model`: Optional model name or path<br>`debug`: Whether to save debug files | `dict`: Result with answer and metadata |
- | `extract_table(method=None, table_settings=None, use_ocr=False)` | Extract table data from the region | `method`: Extraction method ("plumber", "tatr")<br>`table_settings`: Custom settings for extraction<br>`use_ocr`: Whether to use OCR text | `list`: Table data as rows and columns |
+ | `extract_table(method=None, table_settings=None, use_ocr=False)` | Extract table data from the region | `method`: Extraction method ("pdfplumber", "tatr")<br>`table_settings`: Custom settings for extraction<br>`use_ocr`: Whether to use OCR text | `list`: Table data as rows and columns |
  | `intersects(other)` | Check if this region intersects with another | `other`: Another region | `bool`: True if regions intersect |
  | `contains(x, y)` | Check if a point is within the region | `x`: X coordinate<br>`y`: Y coordinate | `bool`: True if point is in region |
  
@@ -12,50 +12,24 @@ pip install natural-pdf
  
  But! If you want to recognize text, do page layout analysis, document q-and-a or other things, you can install optional dependencies.
  
- ```bash
- # Install deskewing, OCR (surya and easyocr),
- # layout analysis (yolo), and interactive browsing
- pip install natural-pdf[favorites]
-
- # Install **everything**
- pip install natural-pdf[all]
- ```
-
-
- ### Optional Dependencies
-
  Natural PDF has modular dependencies for different features. Install them based on your needs:
  
  ```bash
- # Interactive PDF viewer
- pip install natural-pdf[viewer]
-
  # Deskewing
  pip install natural-pdf[deskew]
  
- # OCR options
- pip install natural-pdf[easyocr]
- pip install natural-pdf[surya]
- pip install natural-pdf[paddle]
- pip install natural-pdf[doctr]
-
- # Layout analysis
- pip install natural-pdf[surya]
- pip install natural-pdf[docling]
- pip install natural-pdf[layout_yolo]
- pip install natural-pdf[paddle]
-
- # AI stuff
- pip install natural-pdf[core-ml]
+ # LLM features (OpenAI)
  pip install natural-pdf[llm]
  
  # Semantic search
- pip install natural-pdf[core-ml]
+ pip install natural-pdf[search]
  
- # Install everything
- pip install natural-pdf[all]
+ # Install everything in the 'favorites' collection
+ pip install natural-pdf[favorites]
  ```
  
+ Other OCR and layout analysis engines like `surya`, `easyocr`, `paddle`, `doctr`, and `docling` can be installed via `pip` as needed. The library will provide you with an error message and installation command if you try to use an engine that isn't installed.
+
  ## Your First PDF Extraction
  
  Here's a quick example to make sure everything is working: