natural-pdf 0.1.1__tar.gz → 0.1.2__tar.gz

Files changed (243)
  1. natural_pdf-0.1.2/PKG-INFO +124 -0
  2. natural_pdf-0.1.2/README.md +81 -0
  3. natural_pdf-0.1.2/docs/assets/sample-screen.png +0 -0
  4. natural_pdf-0.1.2/docs/index.md +170 -0
  5. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/installation/index.md +1 -2
  6. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/regions/index.md +3 -8
  7. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/05-excluding-content.md +13 -10
  8. natural_pdf-0.1.2/natural_pdf/analyzers/layout/layout_analyzer.py +255 -0
  9. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/layout_manager.py +9 -6
  10. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/layout_options.py +2 -4
  11. natural_pdf-0.1.2/natural_pdf/analyzers/layout/surya.py +259 -0
  12. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/region.py +52 -25
  13. natural_pdf-0.1.2/natural_pdf.egg-info/PKG-INFO +124 -0
  14. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf.egg-info/SOURCES.txt +2 -0
  15. natural_pdf-0.1.2/notebooks/Examples.ipynb +1293 -0
  16. natural_pdf-0.1.2/sample-screen.png +0 -0
  17. natural_pdf-0.1.1/PKG-INFO +0 -295
  18. natural_pdf-0.1.1/README.md +0 -252
  19. natural_pdf-0.1.1/docs/index.md +0 -299
  20. natural_pdf-0.1.1/natural_pdf/analyzers/layout/layout_analyzer.py +0 -166
  21. natural_pdf-0.1.1/natural_pdf/analyzers/layout/surya.py +0 -151
  22. natural_pdf-0.1.1/natural_pdf.egg-info/PKG-INFO +0 -295
  23. natural_pdf-0.1.1/notebooks/Examples.ipynb +0 -1166
  24. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/.github/workflows/docs.yml +0 -0
  25. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/.gitignore +0 -0
  26. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/CLAUDE.md +0 -0
  27. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/LICENSE +0 -0
  28. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/MANIFEST.in +0 -0
  29. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/check_run_md.sh +0 -0
  30. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/api/index.md +0 -0
  31. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/favicon.png +0 -0
  32. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/favicon.svg +0 -0
  33. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/javascripts/custom.js +0 -0
  34. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/logo.svg +0 -0
  35. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/social-preview.png +0 -0
  36. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/social-preview.svg +0 -0
  37. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/assets/stylesheets/custom.css +0 -0
  38. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/document-qa/index.ipynb +0 -0
  39. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/document-qa/index.md +0 -0
  40. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/element-selection/index.ipynb +0 -0
  41. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/element-selection/index.md +0 -0
  42. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/interactive-widget/index.ipynb +0 -0
  43. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/interactive-widget/index.md +0 -0
  44. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/layout-analysis/index.ipynb +0 -0
  45. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/layout-analysis/index.md +0 -0
  46. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/ocr/index.md +0 -0
  47. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/pdf-navigation/index.ipynb +0 -0
  48. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/pdf-navigation/index.md +0 -0
  49. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/regions/index.ipynb +0 -0
  50. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tables/index.ipynb +0 -0
  51. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tables/index.md +0 -0
  52. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/text-analysis/index.ipynb +0 -0
  53. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/text-analysis/index.md +0 -0
  54. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/text-extraction/index.ipynb +0 -0
  55. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/text-extraction/index.md +0 -0
  56. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/01-loading-and-extraction.ipynb +0 -0
  57. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/01-loading-and-extraction.md +0 -0
  58. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/02-finding-elements.ipynb +0 -0
  59. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/02-finding-elements.md +0 -0
  60. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/03-extracting-blocks.ipynb +0 -0
  61. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/03-extracting-blocks.md +0 -0
  62. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/04-table-extraction.ipynb +0 -0
  63. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/04-table-extraction.md +0 -0
  64. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/05-excluding-content.ipynb +0 -0
  65. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/06-document-qa.ipynb +0 -0
  66. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/06-document-qa.md +0 -0
  67. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/07-layout-analysis.ipynb +0 -0
  68. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/07-layout-analysis.md +0 -0
  69. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/07-working-with-regions.ipynb +0 -0
  70. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/07-working-with-regions.md +0 -0
  71. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/08-spatial-navigation.ipynb +0 -0
  72. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/08-spatial-navigation.md +0 -0
  73. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/09-section-extraction.ipynb +0 -0
  74. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/09-section-extraction.md +0 -0
  75. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/10-form-field-extraction.ipynb +0 -0
  76. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/10-form-field-extraction.md +0 -0
  77. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/11-enhanced-table-processing.ipynb +0 -0
  78. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/11-enhanced-table-processing.md +0 -0
  79. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/12-ocr-integration.ipynb +0 -0
  80. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/tutorials/12-ocr-integration.md +0 -0
  81. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/visual-debugging/index.ipynb +0 -0
  82. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/visual-debugging/index.md +0 -0
  83. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/docs/visual-debugging/region.png +0 -0
  84. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/execute_notebooks.py +0 -0
  85. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/mkdocs.yml +0 -0
  86. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/__init__.py +0 -0
  87. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/__init__.py +0 -0
  88. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/__init__.py +0 -0
  89. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/base.py +0 -0
  90. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/docling.py +0 -0
  91. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/paddle.py +0 -0
  92. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/tatr.py +0 -0
  93. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/layout/yolo.py +0 -0
  94. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/text_options.py +0 -0
  95. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/text_structure.py +0 -0
  96. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/analyzers/utils.py +0 -0
  97. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/core/__init__.py +0 -0
  98. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/core/element_manager.py +0 -0
  99. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/core/highlighting_service.py +0 -0
  100. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/core/page.py +0 -0
  101. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/core/pdf.py +0 -0
  102. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/__init__.py +0 -0
  103. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/base.py +0 -0
  104. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/collections.py +0 -0
  105. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/line.py +0 -0
  106. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/rect.py +0 -0
  107. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/elements/text.py +0 -0
  108. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/__init__.py +0 -0
  109. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/engine.py +0 -0
  110. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/engine_easyocr.py +0 -0
  111. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/engine_paddle.py +0 -0
  112. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/engine_surya.py +0 -0
  113. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/ocr_manager.py +0 -0
  114. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/ocr/ocr_options.py +0 -0
  115. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/qa/__init__.py +0 -0
  116. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/qa/document_qa.py +0 -0
  117. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/selectors/__init__.py +0 -0
  118. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/selectors/parser.py +0 -0
  119. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/templates/__init__.py +0 -0
  120. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/templates/ocr_debug.html +0 -0
  121. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/utils/__init__.py +0 -0
  122. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/utils/highlighting.py +0 -0
  123. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/utils/reading_order.py +0 -0
  124. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/utils/visualization.py +0 -0
  125. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/widgets/__init__.py +0 -0
  126. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/widgets/frontend/viewer.js +0 -0
  127. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf/widgets/viewer.py +0 -0
  128. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf.egg-info/dependency_links.txt +0 -0
  129. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf.egg-info/requires.txt +0 -0
  130. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/natural_pdf.egg-info/top_level.txt +0 -0
  131. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/all_detected_regions.png +0 -0
  132. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/all_elements.png +0 -0
  133. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/basic_highlighting.png +0 -0
  134. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/chainable_layout.png +0 -0
  135. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/chained_analysis.png +0 -0
  136. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/color_names.png +0 -0
  137. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/color_names_with_boxes.png +0 -0
  138. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/conf_display_highlight_all.png +0 -0
  139. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/conf_display_highlight_layout.png +0 -0
  140. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/conf_display_layout_only.png +0 -0
  141. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/confidence_color_coded.png +0 -0
  142. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/debug_page_image.png +0 -0
  143. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/detected_table.png +0 -0
  144. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/dimension_analysis.txt +0 -0
  145. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/direct_ocr_debug.png +0 -0
  146. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/easyocr_debug_input.png +0 -0
  147. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/easyocr_results.png +0 -0
  148. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/easyocr_test_input.png +0 -0
  149. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/exclusion_optimization_regions.png +0 -0
  150. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/explicit_confidence_display.png +0 -0
  151. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/footer_overlap_test.png +0 -0
  152. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_all.png +0 -0
  153. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_all_styles.png +0 -0
  154. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_all_with_all_layouts.png +0 -0
  155. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_all_with_attrs.png +0 -0
  156. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_all_with_yolo.png +0 -0
  157. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_by_confidence.png +0 -0
  158. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_color_test_1.png +0 -0
  159. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_color_test_2.png +0 -0
  160. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_color_test_3.png +0 -0
  161. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_color_test_4.png +0 -0
  162. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_layout_method.png +0 -0
  163. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_multiple.png +0 -0
  164. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_no_attrs.png +0 -0
  165. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_region.png +0 -0
  166. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_single.png +0 -0
  167. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_specific_types.png +0 -0
  168. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_specific_types_with_boxes.png +0 -0
  169. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_specific_types_with_tables.png +0 -0
  170. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test.png +0 -0
  171. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test_colors.png +0 -0
  172. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test_individual.png +0 -0
  173. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test_individual_annotated.png +0 -0
  174. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test_individual_with_structure.png +0 -0
  175. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test_individual_with_structure_yolo.png +0 -0
  176. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_test_individual_with_tables.png +0 -0
  177. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/highlight_with_attrs.png +0 -0
  178. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_conf_default.png +0 -0
  179. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_conf_high.png +0 -0
  180. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_detection.png +0 -0
  181. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_fix_test.png +0 -0
  182. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_fix_test2.png +0 -0
  183. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_fix_test3.png +0 -0
  184. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/layout_fix_test4.png +0 -0
  185. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/model_comparison.png +0 -0
  186. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/multiple_attributes_display.png +0 -0
  187. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_confidence_visualization.png +0 -0
  188. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_debug.png +0 -0
  189. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_debug_page.html +0 -0
  190. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_highlight_all_test.png +0 -0
  191. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_highlight_test.png +0 -0
  192. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_highlighted.png +0 -0
  193. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_simplified.png +0 -0
  194. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_threshold_comparison.png +0 -0
  195. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_visualization_clean.png +0 -0
  196. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_visualization_highlights.png +0 -0
  197. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/ocr_visualization_text.png +0 -0
  198. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddle_layout_detection.png +0 -0
  199. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddle_layout_polygons.png +0 -0
  200. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddle_layout_sources.png +0 -0
  201. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddle_layout_with_text.png +0 -0
  202. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddle_layout_without_text.png +0 -0
  203. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddleocr_highlights.png +0 -0
  204. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddleocr_results.png +0 -0
  205. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/paddleocr_test_input.png +0 -0
  206. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/page_1_for_ocr.png +0 -0
  207. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/page_4_for_ocr.png +0 -0
  208. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/region_exclusion_test.png +0 -0
  209. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/region_management_test.png +0 -0
  210. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/region_ocr_cropped.png +0 -0
  211. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/region_ocr_debug.png +0 -0
  212. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/region_ocr_full_page.png +0 -0
  213. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/region_ocr_highlighted.png +0 -0
  214. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/spatial_navigation.png +0 -0
  215. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/standard_highlight_all.png +0 -0
  216. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/table_no_ocr.csv +0 -0
  217. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/table_structure.png +0 -0
  218. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/table_structure_detail.png +0 -0
  219. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/table_with_ocr.csv +0 -0
  220. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/tatr_cells_test.png +0 -0
  221. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/tatr_ocr_table_test.png +0 -0
  222. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/tatr_regions.png +0 -0
  223. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/tatr_regions.txt +0 -0
  224. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/text_styles.png +0 -0
  225. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/titles_only.png +0 -0
  226. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/width_1200px.png +0 -0
  227. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/width_800px.png +0 -0
  228. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/width_default.png +0 -0
  229. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/width_with_scale.png +0 -0
  230. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/yolo_regions.png +0 -0
  231. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/output/yolo_regions.txt +0 -0
  232. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/.gitkeep +0 -0
  233. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/01-practice.pdf +0 -0
  234. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/0500000US42001.pdf +0 -0
  235. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/0500000US42007.pdf +0 -0
  236. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/2014 Statistics.pdf +0 -0
  237. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/2019 Statistics.pdf +0 -0
  238. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/Atlanta_Public_Schools_GA_sample.pdf +0 -0
  239. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pdfs/needs-ocr.pdf +0 -0
  240. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/publish.sh +0 -0
  241. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/pyproject.toml +0 -0
  242. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/run_all_tutorials.sh +0 -0
  243. {natural_pdf-0.1.1 → natural_pdf-0.1.2}/setup.cfg +0 -0
@@ -0,0 +1,124 @@
+ Metadata-Version: 2.4
+ Name: natural-pdf
+ Version: 0.1.2
+ Summary: A more intuitive interface for working with PDFs
+ Author-email: Jonathan Soma <jonathan.soma@gmail.com>
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/jsoma/natural-pdf
+ Project-URL: Repository, https://github.com/jsoma/natural-pdf
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.7
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pdfplumber>=0.7.0
+ Requires-Dist: Pillow>=8.0.0
+ Requires-Dist: colour>=0.1.5
+ Requires-Dist: numpy>=1.20.0
+ Requires-Dist: urllib3>=1.26.0
+ Requires-Dist: torch>=2.0.0
+ Requires-Dist: torchvision>=0.15.0
+ Requires-Dist: transformers>=4.30.0
+ Requires-Dist: huggingface_hub>=0.19.0
+ Provides-Extra: interactive
+ Requires-Dist: ipywidgets<9.0.0,>=7.0.0; extra == "interactive"
+ Provides-Extra: easyocr
+ Requires-Dist: easyocr; extra == "easyocr"
+ Provides-Extra: paddle
+ Requires-Dist: paddlepaddle; extra == "paddle"
+ Requires-Dist: paddleocr; extra == "paddle"
+ Provides-Extra: layout-yolo
+ Requires-Dist: doclayout_yolo; extra == "layout-yolo"
+ Provides-Extra: surya
+ Requires-Dist: surya-ocr; extra == "surya"
+ Provides-Extra: qa
+ Provides-Extra: all
+ Requires-Dist: ipywidgets<9.0.0,>=7.0.0; extra == "all"
+ Requires-Dist: easyocr; extra == "all"
+ Requires-Dist: paddlepaddle; extra == "all"
+ Requires-Dist: paddleocr; extra == "all"
+ Requires-Dist: doclayout_yolo; extra == "all"
+ Requires-Dist: surya-ocr; extra == "all"
+ Dynamic: license-file
+
+ # Natural PDF
+
+ A friendly library for working with PDFs, built on top of [pdfplumber](https://github.com/jsvine/pdfplumber).
+
+ Natural PDF lets you find and extract content from PDFs using simple code that makes sense.
+
+ - [Complete documentation here](https://jsoma.github.io/natural-pdf)
+ - [Live demos here](https://colab.research.google.com/github/jsoma/natural-pdf/)
+
+ <div style="max-width: 400px; margin: auto"><a href="sample-screen.png"><img src="sample-screen.png"></a></div>
+
+ ## Installation
+
+ ```bash
+ pip install natural-pdf
+ ```
+
+ For optional features like specific OCR engines, layout analysis models, or the interactive Jupyter widget, you can install extras:
+
+ ```bash
+ # Example: Install with EasyOCR support
+ pip install natural-pdf[easyocr]
+ pip install natural-pdf[surya]
+ pip install natural-pdf[paddle]
+
+ # Example: Install with interactive viewer support
+ pip install natural-pdf[interactive]
+
+ # Install everything
+ pip install natural-pdf[all]
+ ```
+
+ See the [installation guide](https://jsoma.github.io/natural-pdf/installation/) for more details on extras.
+
+ ## Quick Start
+
+ ```python
+ from natural_pdf import PDF
+
+ # Open a PDF
+ pdf = PDF('document.pdf')
+ page = pdf.pages[0]
+
+ # Find elements using CSS-like selectors
+ heading = page.find('text:contains("Summary"):bold')
+
+ # Extract content below the heading
+ content = heading.below().extract_text()
+ print("Content below Summary:", content[:100] + "...")
+
+ # Exclude headers/footers automatically (example)
+ # You might define these based on common text or position
+ page.add_exclusion(page.find('text:contains("CONFIDENTIAL")').above())
+ page.add_exclusion(page.find_all('line')[-1].below())
+
+ # Extract clean text from the page
+ clean_text = page.extract_text()
+ print("\nClean page text:", clean_text[:200] + "...")
+
+ # Highlight the heading and view the page
+ heading.highlight(color='red')
+ page.to_image()
+ ```
+
+ And as a fun bonus, `page.viewer()` will provide an interactive method to explore the PDF.
+
+ ## Key Features
+
+ Natural PDF offers a range of features for working with PDFs:
+
+ * **CSS-like Selectors:** Find elements using intuitive query strings (`page.find('text:bold')`).
+ * **Spatial Navigation:** Select content relative to other elements (`heading.below()`, `element.select_until(...)`).
+ * **Text & Table Extraction:** Get clean text or structured table data, automatically handling exclusions.
+ * **OCR Integration:** Extract text from scanned documents using engines like EasyOCR, PaddleOCR, or Surya.
+ * **Layout Analysis:** Detect document structures (titles, paragraphs, tables) using AI models.
+ * **Document QA:** Ask natural language questions about your document's content.
+ * **Visual Debugging:** Highlight elements and use an interactive viewer or save images to understand your selections.
+
+ ## Learn More
+
+ Dive deeper into the features and explore advanced usage in the [**Complete Documentation**](https://jsoma.github.io/natural-pdf).
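
Reviewer note on the PKG-INFO hunk above: the metadata headers use an RFC 822-style layout, so the standard library's `email.parser` can read them. The following is a minimal sketch, not part of the package, that groups `Requires-Dist` entries by extra. The `PKG_INFO` string is an abbreviated copy of the headers in this hunk, `extras_map` is a hypothetical helper name, and the marker handling is deliberately naive (it only recognises the `extra == "name"` form seen here).

```python
from email.parser import Parser

# Abbreviated copy of the metadata headers from the hunk above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: natural-pdf
Version: 0.1.2
Requires-Dist: pdfplumber>=0.7.0
Requires-Dist: torch>=2.0.0
Provides-Extra: easyocr
Requires-Dist: easyocr; extra == "easyocr"
Provides-Extra: paddle
Requires-Dist: paddlepaddle; extra == "paddle"
Requires-Dist: paddleocr; extra == "paddle"
Provides-Extra: qa
"""


def extras_map(pkg_info_text):
    """Split Requires-Dist entries into base requirements and per-extra lists."""
    msg = Parser().parsestr(pkg_info_text)
    # Start from the declared extras so that empty ones still show up.
    extras = {name: [] for name in msg.get_all("Provides-Extra", [])}
    base = []
    for req in msg.get_all("Requires-Dist", []):
        spec, _, marker = req.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            # Naive marker handling: take the quoted name after '=='.
            extra = marker.split("==", 1)[1].strip().strip("\"'")
            extras.setdefault(extra, []).append(spec.strip())
        else:
            base.append(spec.strip())
    return base, extras


base, extras = extras_map(PKG_INFO)
print("base:", base)
for name, reqs in extras.items():
    print(f"[{name}]", reqs)
```

Run against the headers in this hunk, a grouping like this makes it easy to see that the `qa` extra is declared but lists no requirements of its own. For real marker evaluation, the `packaging` library's `Requirement` class is the more robust route.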
@@ -0,0 +1,81 @@
+ # Natural PDF
+
+ A friendly library for working with PDFs, built on top of [pdfplumber](https://github.com/jsvine/pdfplumber).
+
+ Natural PDF lets you find and extract content from PDFs using simple code that makes sense.
+
+ - [Complete documentation here](https://jsoma.github.io/natural-pdf)
+ - [Live demos here](https://colab.research.google.com/github/jsoma/natural-pdf/)
+
+ <div style="max-width: 400px; margin: auto"><a href="sample-screen.png"><img src="sample-screen.png"></a></div>
+
+ ## Installation
+
+ ```bash
+ pip install natural-pdf
+ ```
+
+ For optional features like specific OCR engines, layout analysis models, or the interactive Jupyter widget, you can install extras:
+
+ ```bash
+ # Example: Install with EasyOCR support
+ pip install natural-pdf[easyocr]
+ pip install natural-pdf[surya]
+ pip install natural-pdf[paddle]
+
+ # Example: Install with interactive viewer support
+ pip install natural-pdf[interactive]
+
+ # Install everything
+ pip install natural-pdf[all]
+ ```
+
+ See the [installation guide](https://jsoma.github.io/natural-pdf/installation/) for more details on extras.
+
+ ## Quick Start
+
+ ```python
+ from natural_pdf import PDF
+
+ # Open a PDF
+ pdf = PDF('document.pdf')
+ page = pdf.pages[0]
+
+ # Find elements using CSS-like selectors
+ heading = page.find('text:contains("Summary"):bold')
+
+ # Extract content below the heading
+ content = heading.below().extract_text()
+ print("Content below Summary:", content[:100] + "...")
+
+ # Exclude headers/footers automatically (example)
+ # You might define these based on common text or position
+ page.add_exclusion(page.find('text:contains("CONFIDENTIAL")').above())
+ page.add_exclusion(page.find_all('line')[-1].below())
+
+ # Extract clean text from the page
+ clean_text = page.extract_text()
+ print("\nClean page text:", clean_text[:200] + "...")
+
+ # Highlight the heading and view the page
+ heading.highlight(color='red')
+ page.to_image()
+ ```
+
+ And as a fun bonus, `page.viewer()` will provide an interactive method to explore the PDF.
+
+ ## Key Features
+
+ Natural PDF offers a range of features for working with PDFs:
+
+ * **CSS-like Selectors:** Find elements using intuitive query strings (`page.find('text:bold')`).
+ * **Spatial Navigation:** Select content relative to other elements (`heading.below()`, `element.select_until(...)`).
+ * **Text & Table Extraction:** Get clean text or structured table data, automatically handling exclusions.
+ * **OCR Integration:** Extract text from scanned documents using engines like EasyOCR, PaddleOCR, or Surya.
+ * **Layout Analysis:** Detect document structures (titles, paragraphs, tables) using AI models.
+ * **Document QA:** Ask natural language questions about your document's content.
+ * **Visual Debugging:** Highlight elements and use an interactive viewer or save images to understand your selections.
+
+ ## Learn More
+
+ Dive deeper into the features and explore advanced usage in the [**Complete Documentation**](https://jsoma.github.io/natural-pdf).
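
Reviewer note on the README hunk above: the Quick Start chains `.above()` directly onto `page.find(...)`. If `find()` returns `None` when nothing matches (an assumption here, not confirmed by this diff), that chain would raise `AttributeError` on pages without a "CONFIDENTIAL" marker. Below is a minimal sketch of a guard, using only the `find`, `above`, and `add_exclusion` calls that appear in the README. `add_exclusion_above` is a hypothetical helper, and the `_Fake*` classes are stand-ins so the sketch runs without a PDF; neither is part of natural-pdf.

```python
def add_exclusion_above(page, selector):
    """Exclude everything above the first match of `selector`, if there is one.

    Returns True when an exclusion was added, False when nothing matched.
    """
    marker = page.find(selector)
    if marker is None:  # assumption: find() yields None when nothing matches
        return False
    page.add_exclusion(marker.above())
    return True


# Stand-in objects so the sketch runs without a PDF; not part of natural-pdf.
class _FakeElement:
    def above(self):
        return "region-above"


class _FakePage:
    def __init__(self, known):
        self.known = known
        self.exclusions = []

    def find(self, selector):
        return _FakeElement() if selector in self.known else None

    def add_exclusion(self, region):
        self.exclusions.append(region)


page = _FakePage(known={'text:contains("CONFIDENTIAL")'})
added = add_exclusion_above(page, 'text:contains("CONFIDENTIAL")')
skipped = add_exclusion_above(page, 'text:contains("DRAFT")')
print(added, skipped, page.exclusions)
```

With a real page object, the same helper would be called as `add_exclusion_above(page, 'text:contains("CONFIDENTIAL")')`, leaving the page untouched when the marker text is absent.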
@@ -0,0 +1,170 @@
+ # Natural PDF
+
+ A friendly library for working with PDFs, built on top of [pdfplumber](https://github.com/jsvine/pdfplumber).
+
+ Natural PDF lets you find and extract content from PDFs using simple code that makes sense.
+
+ - [Live demo here](https://colab.research.google.com/github/jsoma/natural-pdf/blob/main/notebooks/Examples.ipynb)
+
+ <div style="max-width: 400px; margin: auto"><a href="assets/sample-screen.png"><img src="assets/sample-screen.png"></a></div>
+
+ ## Installation
+
+ ```
+ pip install natural_pdf
+ # All the extras
+ pip install "natural_pdf[all]"
+ ```
+
+ ## Quick Example
+
+ ```python
+ from natural_pdf import PDF
+
+ pdf = PDF('document.pdf')
+ page = pdf.pages[0]
+
+ # Find the title and get content below it
+ title = page.find('text:contains("Summary"):bold')
+ content = title.below().extract_text()
+
+ # Exclude everything above 'CONFIDENTIAL' and below last line on page
+ page.add_exclusion(page.find('text:contains("CONFIDENTIAL")').above())
+ page.add_exclusion(page.find_all('line')[-1].below())
+
+ # Get the clean text without header/footer
+ clean_text = page.extract_text()
+ ```
+
+ ## Key Features
+
+ Here are a few highlights of what you can do:
+
+ ### Find Elements with Selectors
+
+ Use CSS-like selectors to find text, shapes, and more.
+
+ ```python
+ # Find bold text containing "Revenue"
+ page.find('text:contains("Revenue"):bold').extract_text()
+
+ # Find all large text
+ page.find_all('text[size>=12]').extract_text()
+ ```
+
+ [Learn more about selectors →](element-selection/index.ipynb)
+
+ ### Navigate Spatially
+
+ Move around the page relative to elements, not just coordinates.
+
+ ```python
+ # Extract text below a specific heading
+ intro_text = page.find('text:contains("Introduction")').below().extract_text()
+
+ # Extract text from one heading to the next
+ methods_text = page.find('text:contains("Methods")').below(
+     until='text:contains("Results")'
+ ).extract_text()
+ ```
+
+ [Explore more navigation methods →](pdf-navigation/index.ipynb)
+
+ ### Extract Clean Text
+
+ Easily extract text content, automatically handling common page elements like headers and footers (if exclusions are set).
+
+ ```python
+ # Extract all text from the page (respecting exclusions)
+ page_text = page.extract_text()
+
+ # Extract text from a specific region
+ some_region = page.find(...)
+ region_text = some_region.extract_text()
+ ```
+
+ [Learn about text extraction →](text-extraction/index.ipynb)
+ [Learn about exclusion zones →](regions/index.ipynb#exclusion-zones)
+
+ ### Apply OCR
+
+ Extract text from scanned documents using various OCR engines.
+
+ ```python
+ # Apply OCR using the default engine
+ ocr_elements = page.apply_ocr()
+
+ # Extract text (will use OCR results if available)
+ text = page.extract_text()
+ ```
+
+ [Explore OCR options →](ocr/index.md)
+
+ ### Analyze Document Layout
+
+ Use AI models to detect document structures like titles, paragraphs, and tables.
+
+ ```python
+ # Detect document structure
+ page.analyze_layout()
+
+ # Highlight titles and tables
+ page.find_all('region[type=title]').highlight(color="purple")
+ page.find_all('region[type=table]').highlight(color="blue")
+
+ # Extract data from the first table
+ table_data = page.find('region[type=table]').extract_table()
+ ```
+
+ [Learn about layout models →](layout-analysis/index.ipynb)
+ [Working with tables? →](tables/index.ipynb)
+
+ ### Document Question Answering
+
+ Ask natural language questions directly to your documents.
+
+ ```python
+ # Ask a question
+ result = pdf.ask("What was the company's revenue in 2022?")
+ if result.get("found", False):
+     print(f"Answer: {result['answer']}")
+ ```
+
+ [Learn about Document QA →](document-qa/index.ipynb)
+
+ ### Visualize Your Work
+
+ Debug and understand your extractions visually.
+
+ ```python
+ # Highlight headings
+ page.find_all('text[size>=14]').highlight(color="red", label="Headings")
+
+ # Launch the interactive viewer (Jupyter)
+ # Requires: pip install natural-pdf[interactive]
+ page.viewer()
+
+ # Or save an image
+ # page.save_image("highlighted.png")
+ ```
+
+ [See more visualization options →](visual-debugging/index.ipynb)
+
+ ## Documentation Topics
+
+ Choose what you want to learn about:
+
+ ### Task-based Guides
+ - [Getting Started](installation/index.md): Install the library and run your first extraction
+ - [PDF Navigation](pdf-navigation/index.ipynb): Open PDFs and work with pages
+ - [Element Selection](element-selection/index.ipynb): Find text and other elements using selectors
+ - [Text Extraction](text-extraction/index.ipynb): Extract clean text from documents
+ - [Regions](regions/index.ipynb): Work with specific areas of a page
163
+ - [Visual Debugging](visual-debugging/index.ipynb): See what you're extracting
164
+ - [OCR](ocr/index.md): Extract text from scanned documents
165
+ - [Layout Analysis](layout-analysis/index.ipynb): Detect document structure
166
+ - [Tables](tables/index.ipynb): Extract tabular data
167
+ - [Document QA](document-qa/index.ipynb): Ask questions to your documents
168
+
169
+ ### Reference
170
+ - [API Reference](api/index.md): Complete library reference
@@ -57,8 +57,7 @@ print(text)
57
57
 
58
58
  # Find something specific
59
59
  title = page.find('text:bold')
60
- if title:
61
- print(f"Found title: {title.text}")
60
+ print(f"Found title: {title.text}")
62
61
  ```
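This hunk removes the `if title:` guard around the `print` call. Below is a minimal sketch of keeping the lookup safe, assuming `page.find()` returns `None` when nothing matches. The removed guard suggests that behaviour, but this diff does not confirm it. `describe_title` is a hypothetical helper, not part of natural-pdf.

```python
from typing import Any, Optional


def describe_title(element: Optional[Any]) -> str:
    """Return a printable summary for a lookup result that may be None."""
    if element is None:
        # Assumption: a failed lookup yields None rather than raising
        return "No bold text found on this page"
    # `.text` is the attribute used in the snippet above
    return f"Found title: {element.text}"
```

With a real page this would be called as `print(describe_title(page.find('text:bold')))`.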
63
62
 
64
63
  ## What's Next?
@@ -221,12 +221,6 @@ print(f"Original text: {len(full_text)} chars\nText with exclusion: {len(text_wi
221
221
  print(f"Difference: {len(full_text) - len(text_with_exclusion)} chars excluded")
222
222
  ```
223
223
 
224
- ```python
225
- # Temporarily bypass exclusions if needed
226
- text_ignoring_exclusion = full_page_region.extract_text(use_exclusions=False)
227
- print(f"Text ignoring exclusions: {len(text_ignoring_exclusion)} chars (should match original)")
228
- ```
229
-
230
224
  ```python
231
225
  # When done with this page, clear exclusions
232
226
  page.clear_exclusions()
@@ -253,10 +247,11 @@ pdf.add_exclusion(
253
247
 
254
248
  # PDF-level exclusions are used whenever you extract text
255
249
  # Let's try on the first three pages
256
- for i in range(min(3, len(pdf.pages))):
250
+ for page in pdf.pages[:3]:
257
251
  page_i = pdf.pages[i]
258
252
  text = page_i.extract_text()
259
- print(f"Page {i+1}: {len(text)} characters after exclusions")
253
+ text_original = page_i.extract_text(use_exclusions=False)
254
+ print(f"Page {page.number} – Before: {len(text_original)} After: {len(text)}")
260
255
  ```
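In this hunk the loop header becomes `for page in pdf.pages[:3]`, while the unchanged body still indexes `pdf.pages[i]`, so `i` is no longer defined. A self-consistent version might look like the sketch below. It uses only the calls visible in the hunk (`extract_text()`, `extract_text(use_exclusions=False)` and `page.number`). `exclusion_report` is a hypothetical helper name.

```python
from typing import Any, Iterable, List, Tuple


def exclusion_report(pages: Iterable[Any]) -> List[Tuple[int, int, int]]:
    """Collect (page number, chars before exclusions, chars after) per page."""
    rows = []
    for page in pages:
        after = page.extract_text()  # exclusions are applied by default
        before = page.extract_text(use_exclusions=False)
        rows.append((page.number, len(before), len(after)))
    return rows
```

Usage would be along the lines of `for number, before, after in exclusion_report(pdf.pages[:3]): print(f"Page {number}: before {before}, after {after}")`. This keeps a single loop variable and avoids the stale index.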
261
256
 
262
257
  ```python
@@ -23,12 +23,12 @@ page = pdf.pages[0]
23
23
  full_text_unfiltered = page.extract_text()
24
24
 
25
25
  # Show the last 200 characters (likely containing footer text)
26
- "Unfiltered text (last 200 chars): " + full_text_unfiltered[-200:]
26
+ full_text_unfiltered[-200:]
27
27
  ```
28
28
 
29
29
  ## Approach 1: Excluding a Fixed Area
30
30
 
31
- A simple way to exclude headers or footers is to define a fixed region based on page coordinates. Let's exclude the bottom 50 points of the page.
31
+ A simple way to exclude headers or footers is to define a fixed region based on page coordinates. Let's exclude the bottom 200 pixels of the page.
32
32
 
33
33
  ```python
34
34
  from natural_pdf import PDF
@@ -36,26 +36,29 @@ from natural_pdf import PDF
36
36
  pdf_url = "https://github.com/jsoma/natural-pdf/raw/refs/heads/main/pdfs/0500000US42007.pdf"
37
37
  pdf = PDF(pdf_url)
38
38
 
39
- # Define the exclusion region directly using a lambda function
40
- footer_height = 50
39
+ # Define the exclusion region on every page using a lambda function
40
+ footer_height = 200
41
41
  pdf.add_exclusion(
42
42
  lambda page: page.region(top=page.height - footer_height),
43
- label="Bottom 50pt Footer"
43
+ label="Bottom 200pt Footer"
44
44
  )
45
45
 
46
46
  # Now extract text from the first page again, exclusions are active by default
47
47
  page = pdf.pages[0]
48
- filtered_text = page.extract_text() # use_exclusions=True is default
49
-
50
- # Show the last 200 chars with footer area excluded
51
- "Fixed Area Excluded (last 200 chars): " + filtered_text[-200:]
52
48
 
53
49
  # Visualize the excluded area
54
50
  footer_region_viz = page.region(top=page.height - footer_height)
55
- footer_region_viz.show(label="Excluded Footer Area")
51
+ footer_region_viz.highlight(label="Excluded Footer Area")
56
52
  page.to_image()
57
53
  ```
58
54
 
55
+ ```python
56
+ filtered_text = page.extract_text() # use_exclusions=True is default
57
+
58
+ # Show the last 200 chars with footer area excluded
59
+ filtered_text[-200:]
60
+ ```
61
+
59
62
  This method is simple but might cut off content if the footer height varies or content extends lower on some pages.
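The arithmetic behind `page.region(top=page.height - footer_height)` can be checked in isolation. This sketch assumes page coordinates are in PDF points (1/72 inch) measured from the top edge. The label in the hunk says "pt" while the prose says "pixels", so the unit is an assumption here. `footer_band` is a hypothetical helper, not a natural-pdf function.

```python
from typing import Tuple


def footer_band(page_height: float, footer_height: float) -> Tuple[float, float]:
    """Return (top, bottom) of a footer strip, measured from the top of the page."""
    if not 0 <= footer_height <= page_height:
        raise ValueError("footer_height must be between 0 and page_height")
    return (page_height - footer_height, page_height)


# A US Letter page is 792 pt tall, so a 200 pt footer starts 592 pt from the top
top, bottom = footer_band(792, 200)
```

On a Letter-sized page a 200 pt strip covers roughly a quarter of the page height, which is why a fixed band can cut into body content when the real footer is shorter.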
60
63
 
61
64
  ## Approach 2: Excluding Based on Elements