PyPI - pdfalyzer - Versions diffs - 1.17.6__tar.gz → 1.17.9__tar.gz - Mend

pdfalyzer 1.17.6__tar.gz → 1.17.9__tar.gz


Files changed (52)

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/CHANGELOG.md RENAMED

@@ -1,5 +1,14 @@
 # NEXT RELEASE
+### 1.17.9
+* Broaden exception handling in `FontInfo` extraction
+### 1.17.8
+* Handle `AttributeError` in `FontInfo` extraction
+### 1.17.7
+* Bump `pypdf` to 6.1.3 (fixes [#31](https://github.com/michelcrypt4d4mus/pdfalyzer/issues/31)), `PyMuPDF` to 1.26.5
 ### 1.17.6
 * Better handling for errors resulting from bugs in PyPDF
 * Properly close file handle when pdfalyzing is complete

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: pdfalyzer
-Version: 1.17.6
+Version: 1.17.9
 Summary: Analyze PDFs with colors (and YARA). Visualize a PDF's inner tree-like data structure, check it against a library of YARA rules, force decodes of suspicious font binaries, and more.
 Home-page: https://github.com/michelcrypt4d4mus/pdfalyzer
 License: GPL-3.0-or-later
@@ -22,9 +22,9 @@ Classifier: Topic :: Artistic Software
 Classifier: Topic :: Scientific/Engineering :: Visualization
 Classifier: Topic :: Security
 Provides-Extra: extract
-Requires-Dist: PyMuPDF (>=1.26.4,<2.0.0) ; extra == "extract"
+Requires-Dist: PyMuPDF (>=1.26.5,<2.0.0) ; extra == "extract"
 Requires-Dist: anytree (>=2.13,<3.0)
-Requires-Dist: pypdf (>=6.0.0,<7.0.0)
+Requires-Dist: pypdf (>=6.1.3,<7.0.0)
 Requires-Dist: pytesseract (>=0.3.13,<0.4.0) ; extra == "extract"
 Requires-Dist: yaralyzer (>=1.0.9,<2.0.0)
 Project-URL: Changelog, https://github.com/michelcrypt4d4mus/pdfalyzer/blob/master/CHANGELOG.md
@@ -67,9 +67,8 @@ If you're looking for one of these things this may be the tool for you.
 ### What It Don't Do
 This tool is mostly for examining/working with a PDF's data and logical structure. As such it doesn't have much to offer as far as extracting text, rendering[^3], writing, etc. etc.
-If you suspect you are dealing with a malcious PDF you can safely run `pdfalyze` on it; embedded javascript etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
+If you suspect you are dealing with a malcious PDF you can safely run `pdfalyze` on it. Embedded javascript and `/OpenAction` nodes etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
--------------
 # Installation
 #### All Platforms
@@ -99,7 +98,6 @@ brew install pdfalyzer
    sudo apt-get install build-essential libssl-dev libffi-dev rustc
    ```
--------------
 # Usage
@@ -115,7 +113,7 @@ If you provide none of the flags in the `ANALYSIS SELECTION` section of the `--h
 The `--streams` output is the one used to hunt for patterns in the embedded bytes and can be _extremely_ verbose depending on the `--quote-char` options chosen (or not chosen) and contents of the PDF. [The Yaralyzer](https://github.com/michelcrypt4d4mus/yaralyzer) handles this task; if you want to hunt for patterns in the bytes other than bytes surrounded by backticks/frontslashes/brackets/quotes/etc. you may want to use The Yaralyzer directly. As The Yaralyzer is a prequisite for The Pdfalyzer you may already have the `yaralyze` command installed and available.
-### Setting Command Line Options Permanently With A `.pdfalyzer` File
+#### Setting Command Line Options Permanently With A `.pdfalyzer` File
 When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfalyzer` in these places in this order:
 1. the current directory
@@ -123,12 +121,9 @@ When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfa
 If it finds a `.pdfalyzer` file in either such place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example) which doubles as an example file you can copy into place and edit to your needs. Handy if you find yourself typing the same command line options over and over again.
-### Environment Variables
+#### Environment Variables
 Even if you don't configure your own `.pdfalyzer` file you may still glean some insight from reading the descriptions of the various variables in [`.pdfalyzer.example`](.pdfalyzer.example); there's a little more exposition there than in the output of `pdfalyze -h`.
-### Colors And Themes
-Run `pdfalyzer_show_color_theme` to see the color theme employed.
 ### Guarantees
 Warnings will be printed if any PDF object ID between 1 and the `/Size` reported by the PDF itself could not be successfully placed in the tree. If you do not get any warnings then all[^2] of the inner PDF objects should be seen in the output.
@@ -136,7 +131,22 @@ Warnings will be printed if any PDF object ID between 1 and the `/Size` reported
 [BUFFERZONE Team](https://bufferzonesecurity.com) posted [an excellent example](https://bufferzonesecurity.com/the-beginners-guide-to-adobe-pdf-malware-reverse-engineering-part-1/) of how one might use The Pdfalyzer in tandem with [Didier Stevens' PDF tools](#installing-didier-stevenss-pdf-analysis-tools) to investigate a potentially malicious PDF (archived in [the `doc/` dir in this repo](./doc/) if the link rots).
-## Use As A Code Library
+## Included Command Line Tools
+The Pdfalyzer comes with a few command line tools for doing stuff with PDFs:
+* `combine_pdfs` - Combines multiple PDFs into a single PDF. Run `combine_pdfs --help` for more info.
+* `extract_pdf_pages` - Extracts page ranges (e.g. "10-25") from a PDF and writes them to a new PDF. Run `extract_pdf_pages --help` for more info.
+* `extract_pdf_text` - Extracts text from a PDF, including applying OCR to all embedded images. Run `extract_pdf_text --help` for more info.
+* `pdfalyzer_show_color_theme` - Run to see the color theme employed in Pdfalyzer's output.
+Running `extract_pdf_text` requires that you install The Pdfalyzer's optional dependencies:
+```bash
+pipx install pdfalyzer[extract]
+```
+## As A Python Library
 For info about setting up a dev environment see [Contributing](#contributing) below.
 At its core The Pdfalyzer is taking PDF internal objects gathered by [PyPDF](https://github.com/py-pdf/pypdf) and wrapping them in [AnyTree](https://github.com/c0fec0de/anytree)'s `NodeMixin` class.  Given that things like searching the tree or accessing internal PDF properties will be done through those packages' code it may be helpful to review their documentation.
@@ -247,20 +257,6 @@ Things like, say, a hidden binary `/F` (PDF instruction meaning "URL") followed
 # PDF Resources
-## Included PDF Tools
-The Pdfalyzer comes with a few command line tools:
-* `combine_pdfs` - Combines multiple PDFs into a single PDF. Run `combine_pdfs --help` for more info.
-* `extract_pdf_pages` - Extracts page ranges (e.g. "10-25") from a PDF and writes them to a new PDF. Run `extract_pdf_pages --help` for more info.
-* `extract_pdf_text` - Extracts text from a PDF, including applying OCR to all embedded images. Run `extract_pdf_text --help` for more info.
-Running `extract_pdf_text` requires that you install The Pdfalyzer's optional dependencies:
-```bash
-pipx install pdfalyzer[extract]
-```
 ## 3rd Party PDF Tools
 ### Installing Didier Stevens's PDF Analysis Tools
 Stevens's tools provide comprehensive info about the contents of a PDF, are guaranteed not to trigger the rendering of any malicious content (especially `pdfid.py`), and have been battle tested for well over a decade. It would probably be a good idea to analyze your PDF with his tools before you start working with this one.

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/README.md RENAMED

@@ -33,9 +33,8 @@ If you're looking for one of these things this may be the tool for you.
 ### What It Don't Do
 This tool is mostly for examining/working with a PDF's data and logical structure. As such it doesn't have much to offer as far as extracting text, rendering[^3], writing, etc. etc.
-If you suspect you are dealing with a malcious PDF you can safely run `pdfalyze` on it; embedded javascript etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
+If you suspect you are dealing with a malcious PDF you can safely run `pdfalyze` on it. Embedded javascript and `/OpenAction` nodes etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
--------------
 # Installation
 #### All Platforms
@@ -65,7 +64,6 @@ brew install pdfalyzer
    sudo apt-get install build-essential libssl-dev libffi-dev rustc
    ```
--------------
 # Usage
@@ -81,7 +79,7 @@ If you provide none of the flags in the `ANALYSIS SELECTION` section of the `--h
 The `--streams` output is the one used to hunt for patterns in the embedded bytes and can be _extremely_ verbose depending on the `--quote-char` options chosen (or not chosen) and contents of the PDF. [The Yaralyzer](https://github.com/michelcrypt4d4mus/yaralyzer) handles this task; if you want to hunt for patterns in the bytes other than bytes surrounded by backticks/frontslashes/brackets/quotes/etc. you may want to use The Yaralyzer directly. As The Yaralyzer is a prequisite for The Pdfalyzer you may already have the `yaralyze` command installed and available.
-### Setting Command Line Options Permanently With A `.pdfalyzer` File
+#### Setting Command Line Options Permanently With A `.pdfalyzer` File
 When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfalyzer` in these places in this order:
 1. the current directory
@@ -89,12 +87,9 @@ When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfa
 If it finds a `.pdfalyzer` file in either such place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example) which doubles as an example file you can copy into place and edit to your needs. Handy if you find yourself typing the same command line options over and over again.
-### Environment Variables
+#### Environment Variables
 Even if you don't configure your own `.pdfalyzer` file you may still glean some insight from reading the descriptions of the various variables in [`.pdfalyzer.example`](.pdfalyzer.example); there's a little more exposition there than in the output of `pdfalyze -h`.
-### Colors And Themes
-Run `pdfalyzer_show_color_theme` to see the color theme employed.
 ### Guarantees
 Warnings will be printed if any PDF object ID between 1 and the `/Size` reported by the PDF itself could not be successfully placed in the tree. If you do not get any warnings then all[^2] of the inner PDF objects should be seen in the output.
@@ -102,7 +97,22 @@ Warnings will be printed if any PDF object ID between 1 and the `/Size` reported
 [BUFFERZONE Team](https://bufferzonesecurity.com) posted [an excellent example](https://bufferzonesecurity.com/the-beginners-guide-to-adobe-pdf-malware-reverse-engineering-part-1/) of how one might use The Pdfalyzer in tandem with [Didier Stevens' PDF tools](#installing-didier-stevenss-pdf-analysis-tools) to investigate a potentially malicious PDF (archived in [the `doc/` dir in this repo](./doc/) if the link rots).
-## Use As A Code Library
+## Included Command Line Tools
+The Pdfalyzer comes with a few command line tools for doing stuff with PDFs:
+* `combine_pdfs` - Combines multiple PDFs into a single PDF. Run `combine_pdfs --help` for more info.
+* `extract_pdf_pages` - Extracts page ranges (e.g. "10-25") from a PDF and writes them to a new PDF. Run `extract_pdf_pages --help` for more info.
+* `extract_pdf_text` - Extracts text from a PDF, including applying OCR to all embedded images. Run `extract_pdf_text --help` for more info.
+* `pdfalyzer_show_color_theme` - Run to see the color theme employed in Pdfalyzer's output.
+Running `extract_pdf_text` requires that you install The Pdfalyzer's optional dependencies:
+```bash
+pipx install pdfalyzer[extract]
+```
+## As A Python Library
 For info about setting up a dev environment see [Contributing](#contributing) below.
 At its core The Pdfalyzer is taking PDF internal objects gathered by [PyPDF](https://github.com/py-pdf/pypdf) and wrapping them in [AnyTree](https://github.com/c0fec0de/anytree)'s `NodeMixin` class.  Given that things like searching the tree or accessing internal PDF properties will be done through those packages' code it may be helpful to review their documentation.
@@ -213,20 +223,6 @@ Things like, say, a hidden binary `/F` (PDF instruction meaning "URL") followed
 # PDF Resources
-## Included PDF Tools
-The Pdfalyzer comes with a few command line tools:
-* `combine_pdfs` - Combines multiple PDFs into a single PDF. Run `combine_pdfs --help` for more info.
-* `extract_pdf_pages` - Extracts page ranges (e.g. "10-25") from a PDF and writes them to a new PDF. Run `extract_pdf_pages --help` for more info.
-* `extract_pdf_text` - Extracts text from a PDF, including applying OCR to all embedded images. Run `extract_pdf_text --help` for more info.
-Running `extract_pdf_text` requires that you install The Pdfalyzer's optional dependencies:
-```bash
-pipx install pdfalyzer[extract]
-```
 ## 3rd Party PDF Tools
 ### Installing Didier Stevens's PDF Analysis Tools
 Stevens's tools provide comprehensive info about the contents of a PDF, are guaranteed not to trigger the rendering of any malicious content (especially `pdfid.py`), and have been battle tested for well over a decade. It would probably be a good idea to analyze your PDF with his tools before you start working with this one.

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/helpers/image_helper.py RENAMED

@@ -2,7 +2,7 @@ from typing import Optional
 from yaralyzer.output.rich_console import console
-from pdfalyzer.helpers.rich_text_helper import warning_text
+from pdfalyzer.helpers.rich_text_helper import print_warning
 def ocr_text(image: "Image.Image", image_name: str) -> Optional[str]:  # noqa F821
@@ -15,10 +15,10 @@ def ocr_text(image: "Image.Image", image_name: str) -> Optional[str]:  # noqa F8
         text = pytesseract.image_to_string(image)
     except pytesseract.pytesseract.TesseractError:
         console.print_exception()
-        console.print(warning_text(f"Tesseract OCR failure '{image_name}'! No OCR text extracted..."))
+        print_warning(f"Tesseract OCR failure '{image_name}'! No OCR text extracted...")
     except OSError as e:
         if 'truncated' in str(e):
-            console.print(warning_text(f"Truncated image file '{image_name}'!"))
+            print_warning(f"Truncated image file '{image_name}'!")
         else:
             console.print_exception()
             console.print(f"Error while extracting '{image_name}'!", style='bright_red')

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/helpers/rich_text_helper.py RENAMED

@@ -21,26 +21,9 @@ pdfalyzer_console = Console(color_system='256')
 stderr_console = Console(color_system='256', file=stderr)
-def print_highlighted(msg: Union[str, Text], **kwargs) -> None:
-    """Print 'msg' with Rich highlighting."""
-    pdfalyzer_console.print(msg, highlight=True, **kwargs)
-def quoted_text(
-    _string: str,
-    style: str = '',
-    quote_char_style: str = 'white',
-    quote_char: str = "'"
-) -> Text:
-    """Wrap _string in 'quote_char'. Style 'quote_char' with 'quote_char_style'."""
-    quote_char_txt = Text(quote_char, style=quote_char_style)
-    txt = quote_char_txt + Text(_string, style=style) + quote_char_txt
-    txt.justify = 'center'
-    return txt
-def indented_bullet(msg: Union[str, Text], style: Optional[str] = None) -> Text:
-    return Text('  ') + bullet_text(msg, style)
+def attention_getting_panel(text: Text, title: str, style: str = 'white on red') -> Padding:
+    p = Panel(text, padding=(2), title=title, style=style)
+    return Padding(p, pad=(1, 10, 2, 10))
 def bullet_text(msg: Union[str, Text], style: Optional[str] = None) -> Text:
@@ -50,6 +33,23 @@ def bullet_text(msg: Union[str, Text], style: Optional[str] = None) -> Text:
     return Text(ARROW_BULLET).append(msg)
+def comma_join_txt(text_objs: List[Text]) -> Text:
+    return Text(", ").join(text_objs)
+def error_text(text: Union[str, Text]) -> Text:
+    msg = Text('').append(f"ERROR", style='bright_red').append(": ")
+    if isinstance(text, Text):
+        return msg + text
+    else:
+        return msg.append(text)
+def indented_bullet(msg: Union[str, Text], style: Optional[str] = None) -> Text:
+    return Text('  ') + bullet_text(msg, style)
 def mild_warning(msg: str) -> None:
     console.print(indented_bullet(Text(msg, style='mild_warning')))
@@ -67,10 +67,6 @@ def node_label(idnum: int, label: str, pdf_object: PdfObject, underline: bool =
     return text
-def comma_join_txt(text_objs: List[Text]) -> Text:
-    return Text(", ").join(text_objs)
 def number_and_pct(_number: int, total: int, digits: int = 1) -> Text:
     """Return e.g. '8 (80%)'."""
     return Text(str(_number), style='bright_white').append_text(pct_txt(_number, total, digits))
@@ -82,28 +78,37 @@ def pct_txt(_number: int, total: int, digits: int = 1) -> Text:
     return Text(f"({pct}%)", style='blue')
-def warning_text(text: Union[str, Text]) -> Text:
-    msg = Text('').append(f"WARNING", style='bright_yellow').append(": ")
+def print_error(text: Union[str, Text]) -> Text:
+    console.line()
+    console.print(error_text(text))
-    if isinstance(text, Text):
-        return msg + text
-    else:
-        return msg.append(text)
+def print_highlighted(msg: Union[str, Text], **kwargs) -> None:
+    """Print 'msg' with Rich highlighting."""
+    pdfalyzer_console.print(msg, highlight=True, **kwargs)
-def error_text(text: Union[str, Text]) -> Text:
-    msg = Text('').append(f"ERROR", style='bright_red').append(": ")
-    if isinstance(text, Text):
-        return msg + text
-    else:
-        return msg.append(text)
+def print_warning(text: Union[str, Text]) -> None:
+    console.print(_warning_text(text))
-def attention_getting_panel(text: Text, title: str, style: str = 'white on red') -> Padding:
-    p = Panel(text, padding=(2), title=title, style=style)
-    return Padding(p, pad=(1, 10, 2, 10))
+def quoted_text(
+    _string: str,
+    style: str = '',
+    quote_char_style: str = 'white',
+    quote_char: str = "'"
+) -> Text:
+    """Wrap _string in 'quote_char'. Style 'quote_char' with 'quote_char_style'."""
+    quote_char_txt = Text(quote_char, style=quote_char_style)
+    txt = quote_char_txt + Text(_string, style=style) + quote_char_txt
+    txt.justify = 'center'
+    return txt
-def print_error(text: Union[str, Text]) -> Text:
-    console.print(error_text(text))
+def _warning_text(text: Union[str, Text]) -> Text:
+    msg = Text('').append(f"WARNING", style='bright_yellow').append(": ")
+    if isinstance(text, Text):
+        return msg + text
+    else:
+        return msg.append(text)

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/output/pdfalyzer_presenter.py RENAMED

@@ -20,6 +20,7 @@ from pdfalyzer.binary.binary_scanner import BinaryScanner
 from pdfalyzer.config import PdfalyzerConfig
 from pdfalyzer.decorators.pdf_tree_node import DECODE_FAILURE_LEN
 from pdfalyzer.detection.yaralyzer_helper import get_bytes_yaralyzer, get_file_yaralyzer
+from pdfalyzer.helpers.rich_text_helper import print_error
 from pdfalyzer.helpers.string_helper import pp
 from pdfalyzer.output.layout import (print_fatal_error_panel, print_section_header, print_section_subheader,
      print_section_sub_subheader)
@@ -27,12 +28,19 @@ from pdfalyzer.output.tables.decoding_stats_table import build_decoding_stats_ta
 from pdfalyzer.output.tables.pdf_node_rich_table import generate_rich_tree, get_symlink_representation
 from pdfalyzer.output.tables.stream_objects_table import stream_objects_table
 from pdfalyzer.pdfalyzer import Pdfalyzer
-# from pdfalyzer.util.adobe_strings import *
 INTERNAL_YARA_ERROR_MSG = "Internal YARA error! YARA's error codes can be checked here: https://github.com/VirusTotal/yara/blob/master/libyara/include/yara/error.h"  # noqa: E501
 class PdfalyzerPresenter:
+    """
+    Handles formatting of console text output for Pdfalyzer class.
+    Attributes:
+        pdfalyzer (Pdfalyzer): Pdfalyzer for a given PDF file
+        yaralyzer (Yaralyzer): Yaralyzer for a given PDF file
+    """
     def __init__(self, pdfalyzer: Pdfalyzer):
         self.pdfalyzer = pdfalyzer
         self.yaralyzer = get_file_yaralyzer(self.pdfalyzer.pdf_path)
@@ -83,6 +91,9 @@ class PdfalyzerPresenter:
         """Print informatin about all fonts that appear in this PDF."""
         print_section_header(f'{len(self.pdfalyzer.font_infos)} fonts found in {self.pdfalyzer.pdf_basename}')
+        if self.pdfalyzer.font_info_extraction_error:
+            print_error(f"Failed to extract font information (error: {self.pdfalyzer.font_info_extraction_error})")
         for font_info in [fi for fi in self.pdfalyzer.font_infos if font_idnum is None or font_idnum == fi.idnum]:
             font_info.print_summary()

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/pdfalyzer.py RENAMED

@@ -19,6 +19,7 @@ from pdfalyzer.decorators.indeterminate_node import IndeterminateNode
 from pdfalyzer.decorators.pdf_tree_node import PdfTreeNode
 from pdfalyzer.decorators.pdf_tree_verifier import PdfTreeVerifier
 from pdfalyzer.font_info import FontInfo
+from pdfalyzer.helpers.rich_text_helper import print_error
 from pdfalyzer.pdf_object_relationship import PdfObjectRelationship
 from pdfalyzer.util.adobe_strings import *
 from pdfalyzer.util.exceptions import PdfWalkError
@@ -37,6 +38,7 @@ class Pdfalyzer:
     Attributes:
         font_infos (List[FontInfo]): Font summary objects
+        font_info_extraction_error (Optional[Exception]): Error encountered extracting FontInfo (if any)
         max_generation (int): Max revision number ("generation") encounted in this PDF.
         nodes_encountered (Dict[int, PdfTreeNode]): Nodes we've traversed already.
         pdf_basename (str): The base name of the PDF file (with extension).
@@ -70,6 +72,7 @@ class Pdfalyzer:
         # Initialize tracking variables
         self.font_infos: List[FontInfo] = []  # Font summary objects
+        self.font_info_extraction_error: Optional[Exception] = None
         self.max_generation = 0  # PDF revisions are "generations"; this is the max generation encountered
         self.nodes_encountered: Dict[int, PdfTreeNode] = {}  # Nodes we've seen already
         self._indeterminate_ids = set()  # See INDETERMINATE_REF_KEYS comment
@@ -220,14 +223,22 @@ class Pdfalyzer:
     def _extract_font_infos(self) -> None:
         """Extract information about fonts in the tree and place it in `self.font_infos`."""
         for node in self.node_iterator():
-            if isinstance(node.obj, dict) and RESOURCES in node.obj:
-                log.debug(f"Extracting fonts from node with '{RESOURCES}' key: {node}...")
-                known_font_ids = [fi.idnum for fi in self.font_infos]
+            if not (isinstance(node.obj, dict) and RESOURCES in node.obj):
+                continue
+            log.debug(f"Extracting fonts from node with '{RESOURCES}' key: {node}...")
+            known_font_ids = [fi.idnum for fi in self.font_infos]
+            try:
                 self.font_infos += [
                     fi for fi in FontInfo.extract_font_infos(node.obj)
                     if fi.idnum not in known_font_ids
                 ]
+            except Exception as e:
+                self.font_info_extraction_error = e
+                console.line()
+                log.warning(f"Failed to extract font information from node: {node} (error: {e})")
+                console.line()
     def _build_or_find_node(self, relationship: IndirectObject, relationship_key: str) -> PdfTreeNode:
         """If node in self.nodes_encountered already then return it, otherwise build a node and store it."""

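The `_extract_font_infos()` change replaces the nested `if` with a guard clause and wraps each node's extraction in a broad `try`/`except`. The exception is stored in `font_info_extraction_error` so that `PdfalyzerPresenter` can report it later, and the tree walk continues instead of aborting. Below is a minimal stdlib-only sketch of that pattern, not pdfalyzer's implementation: `FontSummary`, `FontCollector` and `fake_extract` are hypothetical stand-ins for `FontInfo`, `Pdfalyzer` and `FontInfo.extract_font_infos()`, plain dicts stand in for the PyPDF objects held by tree nodes, and the value `"/Resources"` for `RESOURCES` is an assumption.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Optional

log = logging.getLogger(__name__)

RESOURCES = "/Resources"  # assumed value of pdfalyzer's RESOURCES constant


@dataclass
class FontSummary:
    """Hypothetical stand-in for FontInfo; only the object ID matters here."""
    idnum: int


@dataclass
class FontCollector:
    """Hypothetical stand-in for the font-related state on Pdfalyzer."""
    extract: Callable[[Dict[str, Any]], Iterable[FontSummary]]
    font_infos: List[FontSummary] = field(default_factory=list)
    font_info_extraction_error: Optional[Exception] = None

    def collect(self, node_objs: Iterable[Any]) -> None:
        for obj in node_objs:
            # Guard clause: skip anything that is not a dict with a /Resources key
            if not (isinstance(obj, dict) and RESOURCES in obj):
                continue
            known_ids = {fi.idnum for fi in self.font_infos}
            try:
                self.font_infos += [fi for fi in self.extract(obj) if fi.idnum not in known_ids]
            except Exception as e:  # broad on purpose: one bad node must not abort the walk
                self.font_info_extraction_error = e  # only the most recent error is kept
                log.warning("Failed to extract font information from node %r (error: %s)", obj, e)


def fake_extract(obj: Dict[str, Any]) -> List[FontSummary]:
    """Hypothetical extractor that fails the way a malformed PDF object might."""
    if obj.get("broken"):
        raise AttributeError("simulated failure")
    return [FontSummary(idnum) for idnum in obj[RESOURCES]]


collector = FontCollector(fake_extract)
collector.collect([
    {RESOURCES: [1, 2]},
    "not a dict",
    {RESOURCES: [], "broken": True},
    {RESOURCES: [2, 3]},
])

# Reporting happens afterwards, as PdfalyzerPresenter does with print_error()
if collector.font_info_extraction_error:
    print(f"ERROR: Failed to extract font information (error: {collector.font_info_extraction_error})")
```

The failing third node is logged and skipped, the fourth node still contributes font `3`, and the duplicate ID `2` is filtered out. As in the diff, a second failure would overwrite the first stored exception.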
{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "pdfalyzer"
-version = "1.17.6"
+version = "1.17.9"
 description = "Analyze PDFs with colors (and YARA). Visualize a PDF's inner tree-like data structure, check it against a library of YARA rules, force decodes of suspicious font binaries, and more."
 authors = ["Michel de Cryptadamus <michel@cryptadamus.com>"]
 license = "GPL-3.0-or-later"
@@ -67,8 +67,8 @@ packages = [
 [tool.poetry.dependencies]
 python = "^3.10"
 anytree = "~=2.13"
-pypdf = "^6.0.0"
-PyMuPDF = {version = "^1.26.4", optional = true}
+pypdf = "^6.1.3"
+PyMuPDF = {version = "^1.26.5", optional = true}
 pytesseract = {version = "^0.3.13", optional = true}
 yaralyzer = "^1.0.9"
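The caret constraints bumped in `pyproject.toml` are what end up as the bounded ranges in `PKG-INFO`: `pypdf = "^6.1.3"` appears as `pypdf (>=6.1.3,<7.0.0)`, `^0.3.13` for `pytesseract` as `(>=0.3.13,<0.4.0)`, and the PEP 440 compatible-release constraint `anytree = "~=2.13"` as `(>=2.13,<3.0)`. The sketch below reproduces that expansion for plain numeric versions only (no pre-release, post-release or local segments). The function names are hypothetical and this is not Poetry's actual implementation.

```python
from typing import Tuple

Version = Tuple[int, ...]


def _parse(version: str) -> Version:
    """'6.1.3' -> (6, 1, 3). Only plain numeric release segments are supported."""
    return tuple(int(part) for part in version.split("."))


def _fmt(version: Version) -> str:
    return ".".join(str(part) for part in version)


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """^X.Y.Z: bump the leftmost non-zero component and zero everything to its right."""
    lower = _parse(spec)
    # If every component is zero (e.g. "0.0"), the last component is the one bumped
    bump = next((i for i, part in enumerate(lower) if part != 0), len(lower) - 1)
    upper = lower[:bump] + (lower[bump] + 1,) + (0,) * (len(lower) - bump - 1)
    return lower, upper


def compatible_bounds(spec: str) -> Tuple[Version, Version]:
    """~=X.Y (PEP 440 compatible release): drop the last component, bump the one before it."""
    lower = _parse(spec)
    if len(lower) < 2:
        raise ValueError("~= needs at least two version components")
    upper = lower[:-2] + (lower[-2] + 1, 0)
    return lower, upper


def bounds(spec: str) -> Tuple[Version, Version]:
    if spec.startswith("^"):
        return caret_bounds(spec[1:])
    if spec.startswith("~="):
        return compatible_bounds(spec[2:])
    raise ValueError(f"unsupported constraint: {spec!r}")


def to_requires_dist(spec: str) -> str:
    """Render a constraint the way it appears in PKG-INFO, e.g. '>=6.1.3,<7.0.0'."""
    lower, upper = bounds(spec)
    return f">={_fmt(lower)},<{_fmt(upper)}"


def satisfies(version: str, spec: str) -> bool:
    lower, upper = bounds(spec)
    v = _parse(version)
    width = max(len(v), len(lower), len(upper))

    def pad(t: Version) -> Version:
        # Pad with zeros so that (7, 0) compares equal to (7, 0, 0)
        return t + (0,) * (width - len(t))

    return pad(lower) <= pad(v) < pad(upper)


for dep, spec in [("pypdf", "^6.1.3"), ("PyMuPDF", "^1.26.5"), ("pytesseract", "^0.3.13"), ("anytree", "~=2.13")]:
    print(f"{dep} ({to_requires_dist(spec)})")
```

Running it prints `pypdf (>=6.1.3,<7.0.0)`, `PyMuPDF (>=1.26.5,<2.0.0)`, `pytesseract (>=0.3.13,<0.4.0)` and `anytree (>=2.13,<3.0)`, matching the `Requires-Dist` lines in the `PKG-INFO` diff. In particular, raising the pypdf floor to `6.1.3` means an environment still on `6.1.2` no longer satisfies the constraint.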

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/.pdfalyzer.example RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/LICENSE RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/__init__.py RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/__main__.py RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/binary/binary_scanner.py RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/config.py RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/decorators/document_model_printer.py RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/decorators/indeterminate_node.py RENAMED

File without changes

{pdfalyzer-1.17.6 → pdfalyzer-1.17.9}/pdfalyzer/decorators/pdf_file.py RENAMED

@@ -173,8 +173,8 @@ class PdfFile:
         except EmptyFileError:
             log.warning("Skipping empty file!")
         except PdfStreamError as e:
-            print_error(f"Error parsing PDF file '{self.file_path}': {e}")
             stderr_console.print_exception()
+            print_error(f"Error parsing PDF file '{self.file_path}': {e}")
         return "\n\n".join(extracted_pages).strip()