PyPI - pdfalyzer - Versions diffs - 1.16.13__tar.gz → 1.16.14__tar.gz - Mend

pdfalyzer 1.16.13.tar.gz → 1.16.14.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of pdfalyzer might be problematic.

Files changed (49)

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/CHANGELOG.md RENAMED

@@ -1,5 +1,9 @@
 # NEXT RELEASE
+### 1.16.14
+* Bump `yaralyzer` to v1.0.9
+* Drop support for python 3.9
 ### 1.16.13
 * Bump `yaralyzer` to v1.0.7 and fix reference to yaralyzer's renamed `prefix_with_style()` method
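Because 1.16.14 drops Python 3.9 (`Requires-Python: >=3.10,<4.0` in `PKG-INFO`), a wrapper script that calls pdfalyzer may want to fail fast on an older interpreter. A minimal sketch, assuming only the major/minor bounds matter; `is_supported_python` is a hypothetical helper, not part of pdfalyzer:

```python
import sys
from typing import Tuple

# Assumption: bounds copied from the Requires-Python metadata of pdfalyzer 1.16.14 (>=3.10,<4.0).
MIN_PYTHON: Tuple[int, int] = (3, 10)
MAX_PYTHON_EXCLUSIVE: Tuple[int, int] = (4, 0)


def is_supported_python(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Hypothetical helper: True if (major, minor) falls inside [3.10, 4.0)."""
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE
```

A caller could then do `if not is_supported_python(): sys.exit("pdfalyzer 1.16.14 requires Python 3.10+")` before importing anything from pdfalyzer.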

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/PKG-INFO RENAMED

@@ -1,13 +1,13 @@
 Metadata-Version: 2.1
 Name: pdfalyzer
-Version: 1.16.13
+Version: 1.16.14
 Summary: PDF analysis tool. Scan a PDF with YARA rules, visualize its inner tree-like data structure in living color (lots of colors), force decodes of suspicious font binaries, and more.
 Home-page: https://github.com/michelcrypt4d4mus/pdfalyzer
 License: GPL-3.0-or-later
 Keywords: ascii art,binary,color,cybersecurity,DFIR,encoding,font,infosec,maldoc,malicious pdf,malware,malware analysis,pdf,pdfs,pdf analysis,pypdf,threat assessment,threat hunting,threat intelligence,threat research,threatintel,visualization,yara
 Author: Michel de Cryptadamus
 Author-email: michel@cryptadamus.com
-Requires-Python: >=3.9.2,<4.0
+Requires-Python: >=3.10,<4.0
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
 Classifier: Intended Audience :: Information Technology
@@ -18,13 +18,12 @@ Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
-Classifier: Programming Language :: Python :: 3.9
 Classifier: Topic :: Artistic Software
 Classifier: Topic :: Scientific/Engineering :: Visualization
 Classifier: Topic :: Security
 Requires-Dist: anytree (>=2.13,<3.0)
 Requires-Dist: pypdf (>=6.0.0,<7.0.0)
-Requires-Dist: yaralyzer (>=1.0.7,<2.0.0)
+Requires-Dist: yaralyzer (>=1.0.9,<2.0.0)
 Project-URL: Changelog, https://github.com/michelcrypt4d4mus/pdfalyzer/blob/master/CHANGELOG.md
 Project-URL: Documentation, https://github.com/michelcrypt4d4mus/pdfalyzer
 Project-URL: Repository, https://github.com/michelcrypt4d4mus/pdfalyzer
@@ -114,7 +113,12 @@ If you provide none of the flags in the `ANALYSIS SELECTION` section of the `--h
 The `--streams` output is the one used to hunt for patterns in the embedded bytes and can be _extremely_ verbose depending on the `--quote-char` options chosen (or not chosen) and contents of the PDF. [The Yaralyzer](https://github.com/michelcrypt4d4mus/yaralyzer) handles this task; if you want to hunt for patterns in the bytes other than bytes surrounded by backticks/frontslashes/brackets/quotes/etc. you may want to use The Yaralyzer directly. As The Yaralyzer is a prerequisite for The Pdfalyzer you may already have the `yaralyze` command installed and available.
 ### Setting Command Line Options Permanently With A `.pdfalyzer` File
-When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfalyzer` first in the current directory and then in the home directory. If it finds a file in either such place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example) which doubles as an example file you can copy into place and edit to your needs. Handy if you find yourself typing the same command line options over and over again.
+When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfalyzer` in these places in this order:
+1. the current directory
+2. the user's home directory
+If it finds a `.pdfalyzer` file in either such place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example) which doubles as an example file you can copy into place and edit to your needs. Handy if you find yourself typing the same command line options over and over again.
 ### Environment Variables
 Even if you don't configure your own `.pdfalyzer` file you may still glean some insight from reading the descriptions of the various variables in [`.pdfalyzer.example`](.pdfalyzer.example); there's a little more exposition there than in the output of `pdfalyze -h`.
@@ -125,10 +129,9 @@ Run `pdfalyzer_show_color_theme` to see the color theme employed.
 ### Guarantees
 Warnings will be printed if any PDF object ID between 1 and the `/Size` reported by the PDF itself could not be successfully placed in the tree. If you do not get any warnings then all[^2] of the inner PDF objects should be seen in the output.
-## Example Usage
+## Example Malicious PDF Investigation
 [BUFFERZONE Team](https://bufferzonesecurity.com) posted [an excellent example](https://bufferzonesecurity.com/the-beginners-guide-to-adobe-pdf-malware-reverse-engineering-part-1/) of how one might use The Pdfalyzer in tandem with [Didier Stevens' PDF tools](#installing-didier-stevenss-pdf-analysis-tools) to investigate a potentially malicious PDF (archived in [the `doc/` dir in this repo](./doc/) if the link rots).
--------------
 ## Use As A Code Library
 For info about setting up a dev environment see [Contributing](#contributing) below.
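The metadata above raises the floor on `yaralyzer` to `>=1.0.9,<2.0.0`. The sketch below shows one way to check an installed version against that range using only the standard library. It assumes plain dotted release numbers (no pre-release or local suffixes); for full PEP 440 handling the `packaging` library is the usual tool. `parse_release`, `in_range` and `installed_version` are hypothetical helpers:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(version_str: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '1.0.9' into (1, 0, 9).

    Assumption: no pre-release/dev/local parts; those would raise ValueError here.
    """
    return tuple(int(part) for part in version_str.split("."))


def in_range(version_str: str, lower: str = "1.0.9", upper: str = "2.0.0") -> bool:
    """True if lower <= version < upper, mirroring 'yaralyzer (>=1.0.9,<2.0.0)'."""
    return parse_release(lower) <= parse_release(version_str) < parse_release(upper)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution's version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Comparing integer tuples (rather than strings) is what makes `1.0.10` sort after `1.0.9`.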

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/README.md RENAMED

@@ -82,7 +82,12 @@ If you provide none of the flags in the `ANALYSIS SELECTION` section of the `--h
 The `--streams` output is the one used to hunt for patterns in the embedded bytes and can be _extremely_ verbose depending on the `--quote-char` options chosen (or not chosen) and contents of the PDF. [The Yaralyzer](https://github.com/michelcrypt4d4mus/yaralyzer) handles this task; if you want to hunt for patterns in the bytes other than bytes surrounded by backticks/frontslashes/brackets/quotes/etc. you may want to use The Yaralyzer directly. As The Yaralyzer is a prerequisite for The Pdfalyzer you may already have the `yaralyze` command installed and available.
 ### Setting Command Line Options Permanently With A `.pdfalyzer` File
-When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfalyzer` first in the current directory and then in the home directory. If it finds a file in either such place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example) which doubles as an example file you can copy into place and edit to your needs. Handy if you find yourself typing the same command line options over and over again.
+When you run `pdfalyze` on some PDF the tool will check for a file called `.pdfalyzer` in these places in this order:
+1. the current directory
+2. the user's home directory
+If it finds a `.pdfalyzer` file in either such place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example) which doubles as an example file you can copy into place and edit to your needs. Handy if you find yourself typing the same command line options over and over again.
 ### Environment Variables
 Even if you don't configure your own `.pdfalyzer` file you may still glean some insight from reading the descriptions of the various variables in [`.pdfalyzer.example`](.pdfalyzer.example); there's a little more exposition there than in the output of `pdfalyze -h`.
@@ -93,10 +98,9 @@ Run `pdfalyzer_show_color_theme` to see the color theme employed.
 ### Guarantees
 Warnings will be printed if any PDF object ID between 1 and the `/Size` reported by the PDF itself could not be successfully placed in the tree. If you do not get any warnings then all[^2] of the inner PDF objects should be seen in the output.
-## Example Usage
+## Example Malicious PDF Investigation
 [BUFFERZONE Team](https://bufferzonesecurity.com) posted [an excellent example](https://bufferzonesecurity.com/the-beginners-guide-to-adobe-pdf-malware-reverse-engineering-part-1/) of how one might use The Pdfalyzer in tandem with [Didier Stevens' PDF tools](#installing-didier-stevenss-pdf-analysis-tools) to investigate a potentially malicious PDF (archived in [the `doc/` dir in this repo](./doc/) if the link rots).
--------------
 ## Use As A Code Library
 For info about setting up a dev environment see [Contributing](#contributing) below.
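The README change spells out the `.pdfalyzer` search order: the current directory first, then the user's home directory. A minimal sketch of that lookup only, assuming a simple "first file found wins" rule; `find_dotfile` is a hypothetical name and the sketch does not parse the file's contents:

```python
from pathlib import Path
from typing import Optional

DOTFILE_NAME = ".pdfalyzer"


def find_dotfile(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first '.pdfalyzer' found in the current directory, then the home directory.

    Hypothetical sketch of the documented search order; returns None if neither exists.
    """
    search_dirs = [cwd or Path.cwd(), home or Path.home()]
    for directory in search_dirs:
        candidate = directory / DOTFILE_NAME
        if candidate.is_file():
            return candidate
    return None
```

Passing `cwd` and `home` explicitly keeps the function easy to test without touching the real home directory.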

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/binary/binary_scanner.py RENAMED

@@ -1,6 +1,5 @@
 """
-Class for handling binary data - scanning through it for various suspicious patterns as well as forcing
-various character encodings upon it to see what comes out.
+`BinaryScanner` class.
 """
 from collections import defaultdict
 from typing import Iterator, Optional, Tuple
@@ -28,8 +27,18 @@ from pdfalyzer.util.adobe_strings import CONTENTS, CURRENTFILE_EEXEC, FONT_FILE_
 class BinaryScanner:
+    """
+    Class for handling binary data - scanning through it for various suspicious patterns as well as forcing
+    various character encodings upon it to see what comes out.
+    """
     def __init__(self, _bytes: bytes, owner: PdfTreeNode, label: Optional[Text] = None):
-        """'owner' arg is an optional link back to the object containing this binary."""
+        """
+        Args:
+            _bytes (bytes): The binary data to be scanned.
+            owner (PdfTreeNode): The `PdfTreeNode` that contains this binary data.
+            label (Optional[Text]): A rich `Text` label for the binary data (e.g. the PDF object's address).
+        """
         self.bytes = _bytes
         self.label = label
         self.owner = owner
@@ -42,7 +51,7 @@ class BinaryScanner:
         self.regex_extraction_stats = defaultdict(lambda: RegexMatchMetrics())
     def check_for_dangerous_instructions(self) -> None:
-        """Scan for all the strings in DANGEROUS_INSTRUCTIONS list and decode bytes around them."""
+        """Scan for all the strings in `DANGEROUS_INSTRUCTIONS` list and decode bytes around them."""
         subheader = "Scanning Binary For Anything That Could Be Described As 'sus'..."
         print_section_sub_subheader(subheader, style=f"bright_red")
@@ -71,8 +80,8 @@ class BinaryScanner:
     def force_decode_quoted_bytes(self) -> None:
         """
-        Find all strings matching QUOTE_PATTERNS (AKA between quote chars) and decode them with various encodings.
-        The --quote-type arg will limit this decode to just one kind of quote.
+        Find all strings matching `QUOTE_PATTERNS` (AKA between quote chars) and decode them with various
+        encodings. The `--quote-type` arg will limit this decode to just one kind of quote.
         """
         quote_selections = PdfalyzerConfig._args.extract_quoteds
@@ -100,11 +109,11 @@ class BinaryScanner:
     # YARA rules are written on the fly and then YARA does the matching.
     # -------------------------------------------------------------------------------
     def extract_guillemet_quoted_bytes(self) -> Iterator[Tuple[BytesMatch, BytesDecoder]]:
-        """Iterate on all strings surrounded by Guillemet quotes, e.g. «string»"""
+        """Iterate on all strings surrounded by Guillemet quotes, e.g. «string»."""
         return self._quote_yaralyzer(QUOTE_PATTERNS[GUILLEMET], GUILLEMET).match_iterator()
     def extract_backtick_quoted_bytes(self) -> Iterator[Tuple[BytesMatch, BytesDecoder]]:
-        """Returns an interator over all strings surrounded by backticks"""
+        """Returns an iterator over all strings surrounded by backticks."""
         return self._quote_yaralyzer(QUOTE_PATTERNS[BACKTICK], BACKTICK).match_iterator()
     def extract_front_slash_quoted_bytes(self) -> Iterator[Tuple[BytesMatch, BytesDecoder]]:
@@ -137,7 +146,14 @@ class BinaryScanner:
         console.line()
     def process_yara_matches(self, yaralyzer: Yaralyzer, pattern: str, force: bool = False) -> None:
-        """Decide whether to attempt to decode the matched bytes, track stats. force param ignores min/max length."""
+        """
+        Decide whether to attempt to decode the matched bytes and track stats.
+        Args:
+            yaralyzer (Yaralyzer): The `Yaralyzer` instance to use for finding matches.
+            pattern (str): The pattern being searched for (used for stats tracking).
+            force (bool): If `True`, decode all matches even if they are very short or very long.
+        """
         for bytes_match, decoder in yaralyzer.match_iterator():
             log.debug(f"Tracking match stats for {pattern}, bytes_match: {bytes_match}, is_decodable: {bytes_match.is_decodable()}")  # noqa: E501
@@ -162,7 +178,7 @@ class BinaryScanner:
         return self.bytes.split(CURRENTFILE_EEXEC)[1] if CURRENTFILE_EEXEC in self.bytes else self.bytes
     def _quote_yaralyzer(self, quote_pattern: str, quote_type: str):
-        """Helper method to build a Yaralyzer for a quote_pattern"""
+        """Helper method to build a Yaralyzer for a `quote_pattern`."""
         label = f"{quote_type}_Quoted"
         if quote_type == GUILLEMET:
@@ -177,7 +193,7 @@ class BinaryScanner:
         rules_label: Optional[str] = None,
         pattern_label: Optional[str] = None
     ) -> Yaralyzer:
-        """Build a yaralyzer to scan self.bytes"""
+        """Build a `yaralyzer` to scan `self.bytes`."""
         return Yaralyzer.for_patterns(
             patterns=[escape_yara_pattern(pattern)],
             patterns_type=pattern_type,
@@ -198,5 +214,5 @@ class BinaryScanner:
         self.suppression_notice_queue = []
     def _eexec_idx(self) -> int:
-        """Returns the location of CURRENTFILES_EEXEC within the binary stream data (or 0 if it's not there)."""
+        """Returns the location of `CURRENTFILE_EEXEC` within the binary stream data (or 0 if it's not there)."""
         return self.bytes.find(CURRENTFILE_EEXEC) if CURRENTFILE_EEXEC in self.bytes else 0
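`BinaryScanner` finds quoted byte runs by writing YARA rules on the fly and then force-decodes each match with several character encodings. The sketch below swaps in Python's `re` module for YARA to show the same two steps on backtick-quoted runs; the regex, the encoding list and the function names are illustrative assumptions, not pdfalyzer's actual patterns or configuration:

```python
import re
from typing import Dict, Iterator

# Assumption: a plain regex stands in for the YARA pattern pdfalyzer generates.
BACKTICK_QUOTED = re.compile(rb"`([^`]*)`")

# Assumption: an illustrative list of encodings to force upon each match.
ENCODINGS = ["ascii", "utf-8", "utf-16-le", "latin-1"]


def backtick_quoted_bytes(data: bytes) -> Iterator[bytes]:
    """Yield the raw bytes found between each pair of backticks."""
    for match in BACKTICK_QUOTED.finditer(data):
        yield match.group(1)


def force_decode(quoted: bytes) -> Dict[str, str]:
    """Try every encoding on the same bytes and record the text or a failure marker."""
    results: Dict[str, str] = {}
    for encoding in ENCODINGS:
        try:
            results[encoding] = quoted.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = "<decode failed>"
    return results
```

Recording failures instead of skipping them mirrors the per-encoding success/failure counts that the decoding stats table summarizes.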

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/config.py RENAMED

@@ -9,9 +9,10 @@ from os import environ, pardir, path
 from yaralyzer.config import YaralyzerConfig, is_env_var_set_and_not_false, is_invoked_by_pytest
 PDFALYZE = 'pdfalyze'
+PDFALYZER = f"{PDFALYZE}r"
 ALL_STREAMS = -1
 PYTEST_FLAG = 'INVOKED_BY_PYTEST'
-PROJECT_ROOT = path.join(str(importlib.resources.files('pdfalyzer')), pardir)
+PROJECT_ROOT = path.join(str(importlib.resources.files(PDFALYZER)), pardir)
 # 3rd party pdf-parser.py
 PDF_PARSER_EXECUTABLE_ENV_VAR = 'PDFALYZER_PDF_PARSER_PY_PATH'
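`PROJECT_ROOT` is the package directory returned by `importlib.resources.files()` with `..` appended. The sketch below reproduces that expression with the standard library's `json` package standing in for `pdfalyzer` (an assumption made only so it runs without pdfalyzer installed). For a normally installed package, the resolved parent is the `site-packages` directory rather than a source checkout.

```python
import importlib.resources
from os import pardir, path

# Assumption: 'json' stands in for 'pdfalyzer' so the sketch is runnable anywhere.
PACKAGE_NAME = "json"

package_dir = str(importlib.resources.files(PACKAGE_NAME))
project_root = path.join(package_dir, pardir)   # '<package dir>/..', as in config.py
resolved_root = path.normpath(project_root)     # collapses the trailing '..'
```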

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/decorators/indeterminate_node.py RENAMED

@@ -1,11 +1,3 @@
-"""
-Some nodes cannot be placed until we have walked the rest of the tree. For instance
-if we encounter a /Page that relationships /Resources we need to know if there's a
-/Pages parent of the /Page before committing to a tree structure.
-This class handles choosing among the candidates for a given PDF object's parent node
-(AKA "figuring out where to place the node in the PDF object tree").
-"""
 from typing import Callable, List, Optional
 from rich.markup import escape
@@ -18,6 +10,14 @@ from pdfalyzer.util.adobe_strings import *
 class IndeterminateNode:
+    """
+    Class to handle choosing among the candidates for a given PDF object's parent node.
+    Some nodes cannot be placed until we have walked the rest of the tree. For instance
+    if we encounter a /Page that references /Resources we need to know if there's a
+    /Pages parent of the /Page before committing to a tree structure.
+    """
     def __init__(self, node: PdfTreeNode) -> None:
         self.node = node
@@ -56,7 +56,7 @@ class IndeterminateNode:
         self.node.set_parent(parent)
-    def find_node_with_most_descendants(self, list_of_nodes: List[PdfTreeNode] = None) -> PdfTreeNode:
+    def find_node_with_most_descendants(self, list_of_nodes: Optional[List[PdfTreeNode]] = None) -> PdfTreeNode:
         """Find node with a reference to this one that has the most descendants"""
         list_of_nodes = list_of_nodes or [r.from_node for r in self.node.non_tree_relationships]
         max_descendants = max([node.descendants_count() for node in list_of_nodes])
@@ -64,7 +64,7 @@ class IndeterminateNode:
     def _has_only_similar_relationships(self) -> bool:
         """
-        Returns True if all the nodes w/references to this one have the same type or if all the
+        Returns `True` if all the nodes w/references to this one have the same type or if all the
         reference_keys that point to this node are the same.
         """
         unique_refferer_labels = self.node.unique_labels_of_referring_nodes()
@@ -125,6 +125,6 @@ class IndeterminateNode:
 def find_node_with_lowest_id(list_of_nodes: List[PdfTreeNode]) -> PdfTreeNode:
-    """Find node in list_of_nodes_with_lowest ID."""
+    """Return node in `list_of_nodes` with lowest ID."""
     lowest_idnum = min([n.idnum for n in list_of_nodes])
     return next(n for n in list_of_nodes if n.idnum == lowest_idnum)
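`IndeterminateNode` picks a parent among candidate nodes by lowest ID or by largest subtree. The sketch below shows both selection rules with a hypothetical `Node` dataclass standing in for `PdfTreeNode`. It assumes the elided rest of `find_node_with_most_descendants` returns the first node with the maximum count, as `find_node_with_lowest_id` does for the minimum; `min()`/`max()` with a `key` return the first extreme element, so they give the same result in one pass.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for PdfTreeNode: just an ID and child nodes."""
    idnum: int
    children: List["Node"] = field(default_factory=list)

    def descendants_count(self) -> int:
        # Count every node below this one, recursively.
        return sum(1 + child.descendants_count() for child in self.children)


def find_node_with_lowest_id(nodes: List[Node]) -> Node:
    """First node with the smallest idnum."""
    return min(nodes, key=lambda n: n.idnum)


def find_node_with_most_descendants(nodes: List[Node]) -> Node:
    """First node with the largest subtree (ties resolved by list order)."""
    return max(nodes, key=lambda n: n.descendants_count())
```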

pdfalyzer-1.16.14/pdfalyzer/decorators/pdf_file.py ADDED

@@ -0,0 +1,50 @@
+from os import path
+from pathlib import Path
+from typing import List, Optional, Union
+class PdfFile:
+    """
+    Wrapper for a PDF file path that provides useful methods and properties.
+    """
+    def __init__(self, file_path: Union[str, Path]) -> None:
+        self.file_path: Path = Path(file_path)
+        if not self.file_path.exists():
+            raise FileNotFoundError(f"File '{file_path}' does not exist.")
+        self.dirname = self.file_path.parent
+        self.basename: str = path.basename(file_path)
+        self.basename_without_ext: str = str(Path(self.basename).with_suffix(''))
+        self.extname: str = self.file_path.suffix
+        self.text_extraction_attempted: bool = False
+    def extract_page_range(
+            self,
+            page_range: PageRange,
+            destination_dir: Optional[Path] = None,
+            extra_file_suffix: Optional[str] = None
+        ) -> Path:
+        """Extract a range of pages to a new PDF file (or 1 page if last_page_number not provided.)"""
+        destination_dir = destination_dir or DEFAULT_PDF_ERRORS_DIR
+        create_dir_if_it_does_not_exist(destination_dir)
+        if extra_file_suffix is None:
+            file_suffix = page_range.file_suffix()
+        else:
+            file_suffix = f"{page_range.file_suffix()}__{extra_file_suffix}"
+        extracted_pages_pdf_basename = insert_suffix_before_extension(self.file_path, file_suffix).name
+        extracted_pages_pdf_path = destination_dir.joinpath(extracted_pages_pdf_basename)
+        stderr_console.print(f"Extracting {page_range.file_suffix()} from '{self.file_path}' to '{extracted_pages_pdf_path}'...")
+        pdf_writer = PdfWriter()
+        with open(self.file_path, 'rb') as source_pdf:
+            pdf_writer.append(fileobj=source_pdf, pages=page_range.to_tuple())
+        if SortableFile.confirm_file_overwrite(extracted_pages_pdf_path):
+            with open(extracted_pages_pdf_path, 'wb') as extracted_pages_pdf:
+                pdf_writer.write(extracted_pages_pdf)
+        stderr_console.print(f"Wrote new PDF '{extracted_pages_pdf_path}'.")
+        return extracted_pages_pdf_path
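In the hunk as shown here, `pdf_file.py` imports only `os.path`, `pathlib` and `typing`, yet uses `PageRange`, `DEFAULT_PDF_ERRORS_DIR`, `create_dir_if_it_does_not_exist`, `insert_suffix_before_extension`, `stderr_console`, `PdfWriter` and `SortableFile`. The sketch below therefore covers only the path handling: how `__init__` derives its name attributes, plus a hypothetical reconstruction of `insert_suffix_before_extension` (its real behavior is not shown in this diff):

```python
from os import path
from pathlib import Path
from typing import Dict, Union


def path_parts(file_path: str) -> Dict[str, Union[str, Path]]:
    """Mirror how PdfFile.__init__ derives its name attributes (existence check omitted)."""
    p = Path(file_path)
    basename = path.basename(file_path)
    return {
        "dirname": p.parent,
        "basename": basename,
        "basename_without_ext": str(Path(basename).with_suffix("")),  # strips only the last suffix
        "extname": p.suffix,
    }


def insert_suffix_before_extension(file_path: Path, suffix: str) -> Path:
    """Hypothetical reconstruction: 'doc.pdf' + '__p1' -> 'doc__p1.pdf'."""
    return file_path.with_name(f"{file_path.stem}{suffix}{file_path.suffix}")
```

Note that `with_suffix("")` removes only the final extension, so a name like `q3.final.pdf` keeps its inner dot.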

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/decorators/pdf_object_properties.py RENAMED

@@ -15,7 +15,7 @@ from pdfalyzer.util.adobe_strings import *
 class PdfObjectProperties:
-    """Simple class to extract critical features of a PdfObject."""
+    """Simple class to extract critical features of a `PdfObject`."""
     def __init__(
         self,
@@ -86,7 +86,7 @@ class PdfObjectProperties:
         obj: PdfObject,
         is_single_row_table: bool = False
     ) -> List[Union[Text, str]]:
-        """PDF object property at reference_key becomes a formatted 3-tuple for use in Rich tables."""
+        """PDF object property at `reference_key` becomes a formatted 3-tuple for use in Rich tables."""
         with_resolved_refs = cls.resolve_references(reference_key, obj)
         return [
@@ -101,7 +101,7 @@ class PdfObjectProperties:
     # TODO: this doesn't recurse...
     @classmethod
     def _obj_to_rich_text(cls, obj: Any) -> Text:
-        """Recurse through obj and build a Text object."""
+        """Recurse through `obj` and build a `Text` object."""
         if isinstance(obj, dict):
             key_value_pairs = [Text(f"{k}: ").append_text(cls._obj_to_rich_text(v)) for k, v in obj.items()]
             return Text('{').append_text(comma_join_txt(key_value_pairs)).append('}')
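`_obj_to_rich_text` renders a dict as `{key: value, ...}`, recursing into each value. The sketch below shows the same recursion with plain strings instead of Rich `Text`; the `", "` separator is an assumption about what `comma_join_txt` produces, and `obj_to_text` is a hypothetical name:

```python
from typing import Any


def obj_to_text(obj: Any) -> str:
    """Plain-string sketch: dicts become '{k: v, ...}' with values rendered recursively."""
    if isinstance(obj, dict):
        # Assumption: comma_join_txt joins with ", ".
        key_value_pairs = [f"{k}: {obj_to_text(v)}" for k, v in obj.items()]
        return "{" + ", ".join(key_value_pairs) + "}"
    return str(obj)
```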

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/decorators/pdf_tree_node.py RENAMED

@@ -1,10 +1,5 @@
 """
-PDF node decorator - wraps actual PDF objects to make them anytree nodes.
-Also adds decorators/generators for Rich text representation.
-Child/parent relationships should be set using the add_child() and set_parent()
-methods and not set directly. (TODO: this could be done better with anytree
-hooks)
+`PdfTreeNode` decorates a `PdfObject` with tree structure information.
 """
 from typing import Callable, List, Optional
@@ -27,11 +22,22 @@ DECODE_FAILURE_LEN = -1
 class PdfTreeNode(NodeMixin, PdfObjectProperties):
+    """
+    PDF node decorator - wraps actual PDF objects to make them `anytree` nodes.
+    Also adds decorators/generators for Rich text representation.
+    Child/parent relationships should be set using the `add_child()` and `set_parent()`
+    methods and not set directly.
+    TODO: this could be done better with anytree hooks.
+    """
     def __init__(self, obj: PdfObject, address: str, idnum: int):
         """
-        obj:     The underlying PDF object
-        address: The first address that points from some node to this one
-        idnum:   ID used in the reference
+        Args:
+            obj (PdfObject): The underlying PDF object
+            address (str): The first address that points from some node to this one
+            idnum (int): ID used in the reference
         """
         PdfObjectProperties.__init__(self, obj, address, idnum)
         self.non_tree_relationships: List[PdfObjectRelationship] = []
@@ -54,7 +60,7 @@ class PdfTreeNode(NodeMixin, PdfObjectProperties):
     @classmethod
     def from_reference(cls, ref: IndirectObject, address: str) -> 'PdfTreeNode':
-        """Builds a PdfTreeDecorator from an IndirectObject."""
+        """Alternate constructor to build a `PdfTreeNode` from an `IndirectObject`."""
         try:
             return cls(ref.get_object(), address, ref.idnum)
         except PdfReadError as e:
@@ -82,7 +88,7 @@ class PdfTreeNode(NodeMixin, PdfObjectProperties):
             child.set_parent(self)
     def add_non_tree_relationship(self, relationship: PdfObjectRelationship) -> None:
-        """Add a relationship that points at this node's PDF object. TODO: doesn't include parent/child"""
+        """Add a relationship that points at this node's PDF object. TODO: doesn't include parent/child."""
         if relationship in self.non_tree_relationships:
             return

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/decorators/pdf_tree_verifier.py RENAMED

@@ -11,6 +11,8 @@ from pdfalyzer.util.adobe_strings import *
 class PdfTreeVerifier:
+    """Class to verify that the PDF tree is complete/contains all the nodes in the PDF file."""
     def __init__(self, pdfalyzer: 'Pdfalyzer') -> None:
         self.pdfalyzer = pdfalyzer
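`PdfTreeVerifier` checks that every object ID up to the trailer's `/Size` was placed in the tree. The core of such a check is a set difference. The sketch below assumes the PDF specification's meaning of `/Size` (one greater than the highest object number, with object 0 heading the free list), so it checks IDs `1` to `/Size - 1`; whether pdfalyzer uses exactly these bounds is not shown in this diff, and `unplaced_object_ids` is a hypothetical name:

```python
from typing import Iterable, List


def unplaced_object_ids(trailer_size: int, placed_ids: Iterable[int]) -> List[int]:
    """Return object IDs in 1..trailer_size-1 that never made it into the tree.

    Assumption: /Size is one greater than the highest object number (per the PDF spec).
    """
    placed = set(placed_ids)
    return [idnum for idnum in range(1, trailer_size) if idnum not in placed]
```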

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/detection/yaralyzer_helper.py RENAMED

@@ -1,14 +1,17 @@
 """
-Class to help with the pre-configured YARA rules in the /yara directory.
+Functions to help with the pre-configured YARA rules in the /yara directory.
 """
 from importlib.resources import as_file, files
 from sys import exit
 from typing import Optional, Union
 from yaralyzer.config import YaralyzerConfig
+from yaralyzer.output.rich_console import print_fatal_error_and_exit
 from yaralyzer.yaralyzer import Yaralyzer
-YARA_RULES_DIR = files('pdfalyzer').joinpath('yara_rules')
+from pdfalyzer.config import PDFALYZER
+YARA_RULES_DIR = files(PDFALYZER).joinpath('yara_rules')
 YARA_RULES_FILES = [
     'didier_stevens.yara',
@@ -20,11 +23,12 @@ YARA_RULES_FILES = [
 def get_file_yaralyzer(file_path_to_scan: str) -> Yaralyzer:
-    """Get a yaralyzer for a file path"""
+    """Get a yaralyzer for a file path."""
     return _build_yaralyzer(file_path_to_scan)
 def get_bytes_yaralyzer(scannable: bytes, label: str) -> Yaralyzer:
+    """Get a yaralyzer for the `scannable` bytes."""
     return _build_yaralyzer(scannable, label)
@@ -44,10 +48,5 @@ def _build_yaralyzer(scannable: Union[bytes, str], label: Optional[str] = None)
                         try:
                             return Yaralyzer.for_rules_files(rules_paths, scannable, label)
-                        except ValueError as e:
-                            # TODO: use YARA_FILE_DOES_NOT_EXIST_ERROR_MSG variable
-                            if "it doesn't exist" in str(e):
-                                print(str(e))
-                                exit(1)
-                            else:
-                                raise e
+                        except FileNotFoundError as e:
+                            print_fatal_error_and_exit(str(e))
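The change above swaps string-matching on a `ValueError` message for catching `FileNotFoundError` directly. The sketch below illustrates that pattern with hypothetical stand-ins: `print_error_and_exit` plays the role of yaralyzer's `print_fatal_error_and_exit()` (assumed here to print the message and exit with status 1), and `load_rules` plays the role of `Yaralyzer.for_rules_files()` raising on a missing rules file:

```python
import sys
from pathlib import Path
from typing import List, NoReturn


def print_error_and_exit(message: str, exit_code: int = 1) -> NoReturn:
    """Hypothetical stand-in for print_fatal_error_and_exit()."""
    print(message, file=sys.stderr)
    sys.exit(exit_code)


def load_rules(rules_paths: List[str]) -> List[Path]:
    """Hypothetical loader: raise FileNotFoundError for the first missing rules file."""
    paths = [Path(p) for p in rules_paths]
    for rules_path in paths:
        if not rules_path.exists():
            raise FileNotFoundError(f"YARA rules file '{rules_path}' doesn't exist")
    return paths


def build_scanner(rules_paths: List[str]) -> List[Path]:
    try:
        return load_rules(rules_paths)
    except FileNotFoundError as e:
        # Matching on the exception type avoids depending on the wording of the message.
        print_error_and_exit(str(e))
```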

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/helpers/rich_text_helper.py RENAMED

@@ -55,5 +55,6 @@ def number_and_pct(_number: int, total: int, digits: int = 1) -> Text:
 def pct_txt(_number: int, total: int, digits: int = 1) -> Text:
+    """Return nicely formatted percentage, e.g. '(80.0%)'."""
     pct = (100 * float(_number) / float(total)).__round__(digits)
     return Text(f"({pct}%)", style='blue')
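`pct_txt()` rounds `100 * number / total` to `digits` places and wraps it in parentheses. Because `round(x, 1)` returns a float, whole percentages keep a trailing `.0`. A plain-string sketch without Rich's `Text` (the name `pct_str` is hypothetical):

```python
def pct_str(number: int, total: int, digits: int = 1) -> str:
    """Plain-string version of pct_txt(): round(100 * number / total, digits) in parentheses."""
    pct = round(100 * float(number) / float(total), digits)  # raises ZeroDivisionError if total == 0
    return f"({pct}%)"
```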

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/output/character_mapping.py RENAMED

@@ -19,7 +19,7 @@ CHARMAP_PADDING = (0, 2, 0, 10)
 def print_character_mapping(font: 'FontInfo') -> None:  # noqa: F821
-    """Prints the character mapping extracted by PyPDF._charmap in tidy columns"""
+    """Prints the character mapping extracted by PyPDF._charmap in tidy columns."""
     if font.character_mapping is None or len(font.character_mapping) == 0:
         log.info(f"No character map found in {font}")
         return

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/output/layout.py RENAMED

@@ -1,6 +1,8 @@
 """
 Methods to help with the formatting of the output tables, headers, panels, etc.
 """
+from typing import List
 from rich import box
 from rich.padding import Padding
 from rich.panel import Panel
@@ -11,7 +13,7 @@ DEFAULT_SUBTABLE_COL_STYLES = ['white', 'bright_white']
 HEADER_PADDING = (1, 1)
-def generate_subtable(cols, header_style='subtable') -> Table:
+def generate_subtable(cols: List[str], header_style: str = 'subtable') -> Table:
     """Suited for placement in larger tables."""
     table = Table(
         box=box.SIMPLE,
@@ -33,10 +35,12 @@ def generate_subtable(cols, header_style='subtable') -> Table:
 def subheading_width() -> int:
+    """Return 75% of the console width."""
     return int(console_width() * 0.75)
 def half_width() -> int:
+    """Return 50% of the console width."""
     return int(console_width() * 0.5)
@@ -46,28 +50,34 @@ def pad_header(header: str) -> Padding:
 def print_section_header(headline: str, style: str = '') -> None:
+    """Prints a full-width section header with padding above and below."""
     console.line(2)
     _print_header_panel(headline, f"{style} reverse", True, console_width(), HEADER_PADDING)
     console.line()
 def print_section_subheader(headline: str, style: str = '') -> None:
+    """Prints a section subheader (75% of the console width) with padding above."""
     console.line()
     _print_header_panel(headline, style, True, subheading_width(), HEADER_PADDING)
 def print_section_sub_subheader(headline: str, style: str = ''):
+    """Prints a half-width section sub-subheader with a blank line above but no inner padding."""
     console.line()
     _print_header_panel(headline, style, True, half_width())
-def print_headline_panel(headline, style: str = ''):
+def print_headline_panel(headline: str, style: str = ''):
+    """Prints a full-width headline panel with no padding above or below."""
     _print_header_panel(headline, style, False, console_width())
-def print_fatal_error_panel(headline):
+def print_fatal_error_panel(headline: str):
+    """Prints a full-width red blinking panel for fatal errors."""
     print_headline_panel(headline, style='red blink')
 def _print_header_panel(headline: str, style: str, expand: bool, width: int, padding: tuple = (0,)) -> None:
+    """Helper to print a rich `Panel` with the given style, width, and padding."""
     console.print(Panel(headline, style=style, expand=expand, width=width or subheading_width(), padding=padding))

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/output/tables/decoding_stats_table.py RENAMED

@@ -19,7 +19,7 @@ DECODES_SUBTABLE_COLS = ['Encoding', '#', 'Decoded', '#', 'Forced', '#', 'Failed
 def build_decoding_stats_table(scanner: BinaryScanner) -> Table:
-    """Diplay aggregate results on the decoding attempts we made on subsets of scanner.bytes"""
+    """Display aggregate results on the decoding attempts we made on subsets of `scanner.bytes`."""
     stats_table = _new_decoding_stats_table(scanner.label.plain if scanner.label else '')
     regexes_not_found_in_stream = []
@@ -58,9 +58,9 @@ def build_decoding_stats_table(scanner: BinaryScanner) -> Table:
     return stats_table
-def _new_decoding_stats_table(title) -> Table:
-    """Build an empty table for displaying decoding stats"""
-    title = prefix_with_style(title, style='blue underline')
+def _new_decoding_stats_table(title_str: str) -> Table:
+    """Build an empty table for displaying decoding stats."""
+    title = prefix_with_style(title_str, style='blue underline')
     title.append(": Decoding Attempts Summary Statistics", style='bright_white bold')
     table = Table(

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/output/tables/font_summary_table.py RENAMED

@@ -15,8 +15,8 @@ ATTRIBUTES_TO_SHOW_IN_SUMMARY_TABLE = [
 ]
-def font_summary_table(font):
-    """Build a Rich Table with important info about the font"""
+def font_summary_table(font: 'FontInfo') -> Table:  # noqa: F821
+    """Build a Rich `Table` with important info about the font"""
     table = Table('', '', show_header=False)
     table.columns[0].style = 'font.property'
     table.columns[0].justify = 'right'

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/pdfalyzer.py RENAMED

@@ -1,10 +1,5 @@
 """
-Walks the PDF objects and builds the PDF logical structure tree by
-wrapping each internal PDF object in a PdfTreeNode. Tree is managed by
-the anytree library. Information about the tree as a whole is stored
-in this class.
-Once the PDF is parsed this class manages access to
-information about or from the underlying PDF tree.
+PDFalyzer: Analyze and explore the structure of PDF files.
 """
 from os.path import basename
 from typing import Dict, Iterator, List, Optional
@@ -31,7 +26,19 @@ TRAILER_FALLBACK_ID = 10000000
 class Pdfalyzer:
+    """
+    Walks a PDF's internals and builds the PDF logical structure tree.
+    Each of the PDF's internal objects is wrapped in a `PdfTreeNode` object. The tree is managed
+    by the `anytree` library. Information about the tree as a whole is stored in this class.
+    Once the PDF is parsed this class provides access to info about or from the underlying PDF tree.
+    """
     def __init__(self, pdf_path: str):
+        """
+        Args:
+            pdf_path: Path to the PDF file to analyze
+        """
         self.pdf_path = pdf_path
         self.pdf_basename = basename(pdf_path)
         self.pdf_bytes = load_binary_data(pdf_path)
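The new class docstring describes walking the PDF's internal objects and wrapping each one in a tree node. Below is a minimal stdlib sketch of that idea under simplifying assumptions: objects are plain integer ids, `references` maps each id to the ids it points at, and an id reached a second time is recorded as a link instead of being placed again. `Node` and `build_tree` are hypothetical. pdfalyzer itself uses `anytree`, `SymlinkNode`, and additional placement rules (e.g. for indeterminate nodes) that this sketch does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for PdfTreeNode: wraps one PDF object id."""
    idnum: int
    children: list["Node"] = field(default_factory=list)
    symlinks: list[int] = field(default_factory=list)  # ids already placed elsewhere in the tree


def build_tree(references: dict[int, list[int]], root_id: int) -> dict[int, Node]:
    """Recursively walk references from root_id, placing each object id in the tree exactly once."""
    nodes: dict[int, Node] = {root_id: Node(root_id)}

    def walk(node: Node) -> None:
        for ref_id in references.get(node.idnum, []):
            if ref_id in nodes:
                # Already placed: record a link rather than a second copy (this also breaks cycles).
                node.symlinks.append(ref_id)
                continue
            child = Node(ref_id)
            nodes[ref_id] = child
            node.children.append(child)
            walk(child)

    walk(nodes[root_id])
    return nodes


# Object 3 refers back to the root, and the root refers to 3 after 2 has already claimed it.
tree = build_tree({1: [2, 3], 2: [3, 4], 3: [1], 4: []}, root_id=1)
```

Recording repeat references as links is what keeps the result a tree: every object has exactly one parent even though the underlying PDF reference graph can contain cycles.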
@@ -72,7 +79,7 @@ class Pdfalyzer:
         log.info(f"Walk complete.")
     def walk_node(self, node: PdfTreeNode) -> None:
-        """Recursively walk the PDF's tree structure starting at a given node"""
+        """Recursively walk the PDF's tree structure starting at a given node."""
         log.info(f'walk_node() called with {node}. Object dump:\n{print_with_header(node.obj, node.label)}')
         nodes_to_walk_next = [self._add_relationship_to_pdf_tree(r) for r in node.references_to_other_nodes()]
         node.all_references_processed = True
@@ -82,7 +89,7 @@ class Pdfalyzer:
                 self.walk_node(next_node)
     def find_node_by_idnum(self, idnum) -> Optional[PdfTreeNode]:
-        """Find node with idnum in the tree. Return None if that node is not reachable from the root."""
+        """Find node with `idnum` in the tree. Return `None` if that node is not reachable from the root."""
         nodes = [
             node for node in findall_by_attr(self.pdf_tree, name='idnum', value=idnum)
             if not isinstance(node, SymlinkNode)
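`find_node_by_idnum()` ignores symlink nodes, returns `None` when no node has the id, and (per the next hunk) raises `PdfWalkError` when more than one does. The sketch below mimics that contract over a flat list. `TreeNode`, its `is_symlink` flag, `WalkError` and `find_by_idnum` are hypothetical stand-ins for anytree's `findall_by_attr()`, `SymlinkNode` and `PdfWalkError`, and the lines elided from the hunk may differ.

```python
from dataclasses import dataclass
from typing import Optional


class WalkError(Exception):
    """Hypothetical stand-in for pdfalyzer's PdfWalkError."""


@dataclass
class TreeNode:
    idnum: int
    is_symlink: bool = False  # hypothetical flag standing in for isinstance(node, SymlinkNode)


def find_by_idnum(all_nodes: list[TreeNode], idnum: int) -> Optional[TreeNode]:
    """Return the single non-symlink node with idnum, None if absent; raise if ambiguous."""
    matches = [n for n in all_nodes if n.idnum == idnum and not n.is_symlink]
    if not matches:
        return None
    if len(matches) > 1:
        raise WalkError(f"Too many nodes had id {idnum}: {matches}")
    return matches[0]
```

Filtering out symlinks first matters: a node that is referenced from several places appears once as a real node and possibly several times as links, and only the real node should count.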
@@ -96,7 +103,7 @@ class Pdfalyzer:
             raise PdfWalkError(f"Too many nodes had id {idnum}: {nodes}")
     def is_in_tree(self, search_for_node: PdfTreeNode) -> bool:
-        """Returns true if search_for_node is in the tree already."""
+        """Returns true if `search_for_node` is in the tree already."""
         return any([node == search_for_node for node in self.node_iterator()])
     def node_iterator(self) -> Iterator[PdfTreeNode]:
@@ -110,7 +117,7 @@ class Pdfalyzer:
     def _add_relationship_to_pdf_tree(self, relationship: PdfObjectRelationship) -> Optional[PdfTreeNode]:
         """
-        Place the relationship 'node' in the tree. Returns an optional node that should be
+        Place the `relationship` node in the tree. Returns an optional node that should be
         placed in the PDF node processing queue.
         """
         log.info(f'Assessing relationship {relationship}...')
@@ -172,7 +179,7 @@ class Pdfalyzer:
         return to_node
     def _resolve_indeterminate_nodes(self) -> None:
-        """Place all indeterminate nodes in the tree."""
+        """Place all indeterminate nodes in the tree. Called after all nodes have been walked."""
         indeterminate_nodes = [self.nodes_encountered[idnum] for idnum in self.indeterminate_ids]
         indeterminate_nodes_string = "\n   ".join([f"{node}" for node in indeterminate_nodes])
         log.info(f"Resolving {len(indeterminate_nodes)} indeterminate nodes: {indeterminate_nodes_string}")
@@ -185,7 +192,7 @@ class Pdfalyzer:
             IndeterminateNode(node).place_node()
     def _extract_font_infos(self) -> None:
-        """Extract information about fonts in the tree and place it in self.font_infos"""
+        """Extract information about fonts in the tree and place it in `self.font_infos`."""
         for node in self.node_iterator():
             if isinstance(node.obj, dict) and RESOURCES in node.obj:
                 log.debug(f"Extracting fonts from node with '{RESOURCES}' key: {node}...")
@@ -207,6 +214,6 @@ class Pdfalyzer:
         return new_node
     def _print_nodes_encountered(self) -> None:
-        """Debug method that displays which nodes have already been walked"""
+        """Debug method that displays which nodes have already been walked."""
         for i in sorted(self.nodes_encountered.keys()):
             console.print(f'{i}: {self.nodes_encountered[i]}')

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pdfalyzer/util/argument_parser.py RENAMED

@@ -15,7 +15,7 @@ from rich.text import Text
 from yaralyzer.util.argument_parser import export, parser, parse_arguments as parse_yaralyzer_args, source
 from yaralyzer.util.logging import log, log_and_print, log_argparse_result, log_current_config, log_invocation
-from pdfalyzer.config import ALL_STREAMS, PdfalyzerConfig
+from pdfalyzer.config import ALL_STREAMS, PDFALYZER, PdfalyzerConfig
 from pdfalyzer.detection.constants.binary_regexes import QUOTE_PATTERNS
 from pdfalyzer.helpers.filesystem_helper import (do_all_files_exist, extract_page_number, file_exists, is_pdf,
      with_pdf_extension)
@@ -124,9 +124,9 @@ parser._action_groups = parser._action_groups[:2] + [parser._action_groups[-1]]
 # Main argument parsing begins #
 ################################
 def parse_arguments():
-    """Parse command line args. Most settings are communicated to the app by setting env vars"""
+    """Parse command line args. Most args can also be communicated to the app by setting env vars."""
     if '--version' in sys.argv:
-        print(f"pdfalyzer {version('pdfalyzer')}")
+        print(f"pdfalyzer {version(PDFALYZER)}")
         sys.exit()
     args = parser.parse_args()
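The `--version` branch now passes the `PDFALYZER` constant to `version()` instead of a string literal. The import of `version` is outside this hunk; it is presumably `importlib.metadata.version`, and `PDFALYZER` presumably holds the distribution name `"pdfalyzer"`. Both are assumptions. The sketch shows that lookup pattern. The `version_string` helper and its fallback for an uninstalled distribution are hypothetical additions, not pdfalyzer behavior.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGE_NAME = "pdfalyzer"  # assumption: mirrors the PDFALYZER constant in pdfalyzer.config


def version_string(package: str = PACKAGE_NAME) -> str:
    """Return '<package> <version>', or a placeholder if the distribution isn't installed."""
    try:
        # Reads the version from the installed distribution's metadata (e.g. PKG-INFO / METADATA).
        return f"{package} {version(package)}"
    except PackageNotFoundError:
        return f"{package} (not installed)"
```

Keeping the distribution name in one constant means the `--version` output and any other metadata lookups cannot drift apart if the name is ever changed.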
@@ -158,10 +158,16 @@ def parse_arguments():
     return args
-def output_sections(args, pdfalyzer) -> List[OutputSection]:
+def output_sections(args: Namespace, pdfalyzer: 'Pdfalyzer') -> List[OutputSection]:  # noqa: F821
     """
     Determine which of the tree visualizations, font scans, etc should be run.
     If nothing is specified output ALL sections other than --streams which is v. slow/verbose.
+    Args:
+        args: parsed command line arguments
+        pdfalyzer: the `pdfalyzer` instance whose methods will be called to produce output
+    Returns:
+        List[OutputSection]: List of `OutputSection` namedtuples with 'argument' and 'method' fields
     """
     # Create a partial for print_font_info() because it's the only one that can take an argument
     # partials have no __name__, so update_wrapper() copies 'print_font_info' onto this partial as its name
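The comment above relies on a `functools` detail: a `partial` object has no `__name__` of its own, and `update_wrapper()` copies the wrapped function's `__name__`, `__doc__` and related attributes onto it. A minimal demonstration follows. `print_font_info` here is a hypothetical stand-in, not pdfalyzer's method.

```python
from functools import partial, update_wrapper


def print_font_info(font_idnum=None):
    """Hypothetical stand-in for Pdfalyzer.print_font_info()."""
    return f"font info for {font_idnum if font_idnum is not None else 'all fonts'}"


bound = partial(print_font_info, font_idnum=5)
print(hasattr(bound, "__name__"))  # False: partial objects carry no __name__
update_wrapper(bound, print_font_info)  # copies __name__, __doc__, etc. and sets __wrapped__
print(bound.__name__, bound())  # print_font_info font info for 5
```

Copying the name matters when later code identifies output sections by their method's `__name__`, as the partial would otherwise have none.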
@@ -196,9 +202,10 @@ def all_sections_chosen(args):
     return len([s for s in ALL_SECTIONS if vars(args)[s]]) == len(ALL_SECTIONS)
-###############################################
-# Separate arg parser for combine_pdfs script #
-###############################################
+#############################################################
+#  Separate arg parsers for combine_pdfs and other scripts  #
+#############################################################
 MAX_QUALITY = 10
 combine_pdfs_parser = ArgumentParser(

{pdfalyzer-1.16.13 → pdfalyzer-1.16.14}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "pdfalyzer"
-version = "1.16.13"
+version = "1.16.14"
 description = "PDF analysis tool. Scan a PDF with YARA rules, visualize its inner tree-like data structure in living color (lots of colors), force decodes of suspicious font binaries, and more."
 authors = ["Michel de Cryptadamus <michel@cryptadamus.com>"]
 license = "GPL-3.0-or-later"
@@ -15,7 +15,6 @@ classifiers = [
     "Intended Audience :: Information Technology",
     "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)",
     "Programming Language :: Python",
-    "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
@@ -66,10 +65,10 @@ packages = [
 #   Dependencies    #
 #####################
 [tool.poetry.dependencies]
-python = "^3.9,>=3.9.2"
+python = "^3.10"
 anytree = "~=2.13"
 pypdf = "^6.0.0"
-yaralyzer = "^1.0.7"
+yaralyzer = "^1.0.9"
 [tool.poetry.group.dev.dependencies]
 flake8 = "^7.3.0"
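The new `python = "^3.10"` constraint is what produces `Requires-Python: >=3.10,<4.0` in PKG-INFO. Under Poetry's caret rule, the upper bound increments the left-most non-zero version component (or the last given component if all are zero). The old `"^3.9,>=3.9.2"` combined that caret range with a tighter floor, which is why 1.16.13 advertised `>=3.9.2,<4.0`. The sketch below expands simple numeric caret specs only. It is not Poetry's implementation and ignores pre-release tags and comma-combined constraints. `caret_bounds` is a hypothetical helper.

```python
def caret_bounds(spec: str) -> tuple[str, str]:
    """Expand a simple Poetry-style caret constraint such as '^3.10' into (lower, upper) bounds."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    lower = ".".join(str(p) for p in parts)
    # Index of the left-most non-zero component; fall back to the last component if all are zero.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    # Increment that component and zero out everything after it, keeping the same length.
    upper_parts = parts[:bump] + [parts[bump] + 1] + [0] * (len(parts) - bump - 1)
    upper = ".".join(str(p) for p in upper_parts)
    return f">={lower}", f"<{upper}"


print(caret_bounds("^3.10"))  # ('>=3.10', '<4.0')
print(caret_bounds("^1.0.9"))  # ('>=1.0.9', '<2.0.0'), the range now allowed for yaralyzer
```

The same rule explains why `pypdf = "^6.0.0"` accepts any 6.x release, while a `0.x` dependency would be pinned to a single minor series.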