pdfalyzer 1.16.10__tar.gz → 1.16.12__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of pdfalyzer might be problematic.
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/CHANGELOG.md +8 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/PKG-INFO +17 -7
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/README.md +9 -2
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/__init__.py +9 -6
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/binary/binary_scanner.py +15 -11
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/decorators/pdf_object_properties.py +13 -12
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/decorators/pdf_tree_node.py +8 -5
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/decorators/pdf_tree_verifier.py +7 -4
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/detection/constants/binary_regexes.py +7 -7
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/detection/yaralyzer_helper.py +1 -3
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/font_info.py +10 -10
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/helpers/filesystem_helper.py +6 -6
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/helpers/pdf_object_helper.py +0 -1
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/helpers/rich_text_helper.py +5 -5
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/character_mapping.py +3 -2
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/pdfalyzer_presenter.py +5 -3
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/tables/decoding_stats_table.py +2 -1
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/pdf_object_relationship.py +12 -12
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/pdfalyzer.py +4 -5
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/util/adobe_strings.py +4 -4
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/util/argument_parser.py +7 -7
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pyproject.toml +49 -29
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/.pdfalyzer.example +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/LICENSE +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/__main__.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/config.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/decorators/document_model_printer.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/decorators/indeterminate_node.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/detection/constants/javascript_reserved_keywords.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/detection/javascript_hunter.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/helpers/dict_helper.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/helpers/number_helper.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/helpers/string_helper.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/layout.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/styles/node_colors.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/styles/rich_theme.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/tables/font_summary_table.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/tables/pdf_node_rich_table.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/output/tables/stream_objects_table.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/util/debugging.py +1 -1
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/util/exceptions.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/util/pdf_parser_manager.py +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/yara_rules/PDF.yara +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/yara_rules/PDF_binary_stream.yara +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/yara_rules/__init.py__ +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/yara_rules/didier_stevens.yara +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/yara_rules/lprat.static_file_analysis.yara +0 -0
- {pdfalyzer-1.16.10 → pdfalyzer-1.16.12}/pdfalyzer/yara_rules/pdf_malware.yara +0 -0
**CHANGELOG.md** (+8 -0)

```diff
@@ -1,5 +1,13 @@
 # NEXT RELEASE
 
+### 1.16.12
+* Bump `PyPDF` to v6.0.0
+
+### 1.16.11
+* Fix typo in `combine_pdfs` help
+* Add some more PyPi classifiers
+* Add a `.flake8` config and fix a bunch of style issues
+
 ### 1.16.10
 * Add `Environment :: Console` and `Programming Language :: Python` to pypi classifiers
 * Add `.pdfalyzer.example` to PyPi package
```
**PKG-INFO** (+17 -7)

````diff
@@ -1,13 +1,13 @@
 Metadata-Version: 2.1
 Name: pdfalyzer
-Version: 1.16.10
-Summary:
+Version: 1.16.12
+Summary: PDF analysis tool. Scan a PDF with YARA rules, visualize its inner tree-like data structure in living color (lots of colors), force decodes of suspicious font binaries, and more.
 Home-page: https://github.com/michelcrypt4d4mus/pdfalyzer
 License: GPL-3.0-or-later
-Keywords: ascii art,binary,color,cybersecurity,DFIR,encoding,font,infosec,maldoc,malicious pdf,malware,malware analysis,pdf,pdfs,pdf analysis,threat assessment,visualization,yara
+Keywords: ascii art,binary,color,cybersecurity,DFIR,encoding,font,infosec,maldoc,malicious pdf,malware,malware analysis,pdf,pdfs,pdf analysis,pypdf,threat assessment,threat hunting,threat intelligence,threat research,threatintel,visualization,yara
 Author: Michel de Cryptadamus
 Author-email: michel@cryptadamus.com
-Requires-Python: >=3.9.2,<4.0
+Requires-Python: >=3.9.2,<4.0
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
 Classifier: Intended Audience :: Information Technology
@@ -16,11 +16,14 @@ Classifier: Programming Language :: Python
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.9
 Classifier: Topic :: Artistic Software
 Classifier: Topic :: Scientific/Engineering :: Visualization
 Classifier: Topic :: Security
 Requires-Dist: anytree (>=2.13,<3.0)
-Requires-Dist: pypdf (>=
+Requires-Dist: pypdf (>=6.0.0,<7.0.0)
 Requires-Dist: yaralyzer (>=1.0.4,<2.0.0)
 Project-URL: Changelog, https://github.com/michelcrypt4d4mus/pdfalyzer/blob/master/CHANGELOG.md
 Project-URL: Documentation, https://github.com/michelcrypt4d4mus/pdfalyzer
@@ -62,10 +65,12 @@ If you're looking for one of these things this may be the tool for you.
 ### What It Don't Do
 This tool is mostly for examining/working with a PDF's data and logical structure. As such it doesn't have much to offer as far as extracting text, rendering[^3], writing, etc. etc.
 
+If you suspect you are dealing with a malcious PDF you can safely run `pdfalyze` on it; embedded javascript etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
+
 -------------
 
 # Installation
-
+#### All Platforms
 Installation with [pipx](https://pypa.github.io/pipx/)[^4] is preferred though `pip3` / `pip` should also work.
 ```sh
 pipx install pdfalyzer
@@ -73,7 +78,12 @@ pipx install pdfalyzer
 
 See [PyPDF installation notes](https://github.com/py-pdf/pypdf#installation) about `PyCryptodome` if you plan to `pdfalyze` any files that use AES encryption.
 
-
+#### macOS Homebrew
+If you are on macOS and use `homebrew` someone out there was kind enough to make [The Pdfalyzer available via homebrew](https://formulae.brew.sh/formula/pdfalyzer) so this should work:
+
+```sh
+brew install pdfalyzer
+```
 
 ### Troubleshooting
 1. If you used `pip3` instead of `pipx` and have an issue you should try to install with `pipx`.
````
**README.md** (+9 -2)

````diff
@@ -33,10 +33,12 @@ If you're looking for one of these things this may be the tool for you.
 ### What It Don't Do
 This tool is mostly for examining/working with a PDF's data and logical structure. As such it doesn't have much to offer as far as extracting text, rendering[^3], writing, etc. etc.
 
+If you suspect you are dealing with a malcious PDF you can safely run `pdfalyze` on it; embedded javascript etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
+
 -------------
 
 # Installation
-
+#### All Platforms
 Installation with [pipx](https://pypa.github.io/pipx/)[^4] is preferred though `pip3` / `pip` should also work.
 ```sh
 pipx install pdfalyzer
@@ -44,7 +46,12 @@ pipx install pdfalyzer
 
 See [PyPDF installation notes](https://github.com/py-pdf/pypdf#installation) about `PyCryptodome` if you plan to `pdfalyze` any files that use AES encryption.
 
-
+#### macOS Homebrew
+If you are on macOS and use `homebrew` someone out there was kind enough to make [The Pdfalyzer available via homebrew](https://formulae.brew.sh/formula/pdfalyzer) so this should work:
+
+```sh
+brew install pdfalyzer
+```
 
 ### Troubleshooting
 1. If you used `pip3` instead of `pipx` and have an issue you should try to install with `pipx`.
````
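Both the PKG-INFO and README hunks above keep the note about `PyCryptodome` and AES-encrypted files. As a rough illustration of why that optional dependency matters (this sketch is not part of the package; the file name and password are placeholders), pypdf only needs a crypto backend once it actually hits an encrypted document:

```python
from pypdf import PdfReader

reader = PdfReader("suspect.pdf")  # hypothetical input file

if reader.is_encrypted:
    # AES-encrypted PDFs need a crypto backend (e.g. PyCryptodome) installed,
    # which is why the README calls it out as an extra install step.
    reader.decrypt("password")  # placeholder password

print(f"Parsed {len(reader.pages)} pages")
```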
**pdfalyzer/__init__.py** (+9 -6)

```diff
@@ -1,7 +1,6 @@
 import code
 import sys
 from os import environ, getcwd, path
-from pathlib import Path
 
 from dotenv import load_dotenv
 from pypdf import PdfWriter
@@ -23,7 +22,7 @@ from rich.text import Text
 from yaralyzer.helpers.rich_text_helper import prefix_with_plain_text_obj
 from yaralyzer.output.file_export import invoke_rich_export
 from yaralyzer.output.rich_console import console
-from yaralyzer.util.logging import
+from yaralyzer.util.logging import log_and_print
 
 from pdfalyzer.helpers.filesystem_helper import file_size_in_mb, set_max_open_files
 from pdfalyzer.helpers.rich_text_helper import print_highlighted
@@ -51,8 +50,8 @@ def pdfalyze():
         log_and_print(f"Binary stream extraction complete, files written to '{args.output_dir}'.\nExiting.\n")
         sys.exit()
 
-    # The method that gets called is related to the argument name. See 'possible_output_sections' list in
-    # Analysis exports wrap themselves around the methods that actually generate the analyses
+    # The method that gets called is related to the argument name. See 'possible_output_sections' list in
+    # argument_parser.py. Analysis exports wrap themselves around the methods that actually generate the analyses.
     for (arg, method) in output_sections(args, pdfalyzer):
         if args.output_dir:
             output_basepath = PdfalyzerConfig.get_output_basepath(method)
@@ -89,7 +88,7 @@ def pdfalyzer_show_color_theme() -> None:
         if name not in ['reset', 'repr_url']
     ]
 
-    console.print(Columns(colors, column_first=True, padding=(0,3)))
+    console.print(Columns(colors, column_first=True, padding=(0, 3)))
 
 
 def combine_pdfs():
@@ -114,7 +113,11 @@ def combine_pdfs():
     for i, page in enumerate(merger.pages):
         if args.image_quality < MAX_QUALITY:
             for j, img in enumerate(page.images):
-                print_highlighted(
+                print_highlighted(
+                    f" -> Reducing image #{j + 1} quality on page {i + 1} to {args.image_quality}...",
+                    style='dim'
+                )
+
                 img.replace(img.image, quality=args.image_quality)
 
         print_highlighted(f" -> Compressing page {i + 1}...", style='dim')
```
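For context on the `combine_pdfs()` hunk above: the reformatted `print_highlighted()` call wraps a pypdf image-recompression loop. A minimal standalone sketch of that pypdf pattern (paths and the quality value are placeholders, not pdfalyzer code) might look like:

```python
from pypdf import PdfWriter

writer = PdfWriter()

for pdf_path in ["part_1.pdf", "part_2.pdf"]:  # hypothetical input files
    writer.append(pdf_path)  # merge each PDF in order

for page in writer.pages:
    for img in page.images:
        # Re-encode each embedded image at a lower quality to shrink the output
        img.replace(img.image, quality=50)

with open("combined.pdf", "wb") as f:
    writer.write(f)
```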
**pdfalyzer/binary/binary_scanner.py** (+15 -11)

```diff
@@ -21,7 +21,7 @@ from yaralyzer.util.logging import log
 from pdfalyzer.config import PdfalyzerConfig
 from pdfalyzer.decorators.pdf_tree_node import PdfTreeNode
 from pdfalyzer.detection.constants.binary_regexes import (BACKTICK, DANGEROUS_PDF_KEYS_TO_HUNT_ONLY_IN_FONTS,
-
+                                                           DANGEROUS_STRINGS, FRONTSLASH, GUILLEMET, QUOTE_PATTERNS)
 from pdfalyzer.helpers.string_helper import generate_hyphen_line
 from pdfalyzer.output.layout import print_headline_panel, print_section_sub_subheader
 from pdfalyzer.util.adobe_strings import CONTENTS, CURRENTFILE_EEXEC, FONT_FILE_KEYS
@@ -36,7 +36,7 @@ class BinaryScanner:
         self.stream_length = len(_bytes)
 
         if label is None and isinstance(owner, PdfTreeNode):
-
+            self.label = owner.__rich__()
 
         self.suppression_notice_queue = []
         self.regex_extraction_stats = defaultdict(lambda: RegexMatchMetrics())
@@ -86,8 +86,12 @@ class BinaryScanner:
                 print_headline_panel(msg, style='dim')
                 continue
 
+            print_section_sub_subheader(
+                f"Forcing Decode of {quote_type.capitalize()} Quoted Strings",
+                style=BYTES_NO_DIM
+            )
+
             quote_pattern = QUOTE_PATTERNS[quote_type]
-            print_section_sub_subheader(f"Forcing Decode of {quote_type.capitalize()} Quoted Strings", style=BYTES_NO_DIM)
             yaralyzer = self._quote_yaralyzer(quote_pattern, quote_type)
             self.process_yara_matches(yaralyzer, f"{quote_type}_quoted")
 
@@ -135,7 +139,7 @@ class BinaryScanner:
     def process_yara_matches(self, yaralyzer: Yaralyzer, pattern: str, force: bool = False) -> None:
         """Decide whether to attempt to decode the matched bytes, track stats. force param ignores min/max length."""
         for bytes_match, decoder in yaralyzer.match_iterator():
-            log.debug(f"Trackings stats for
+            log.debug(f"Trackings match stats for {pattern}, bytes_match: {bytes_match}, is_decodable: {bytes_match.is_decodable()}")  # noqa: E501
 
             # Send suppressed decodes to a queue and track the reason for the suppression in the stats
             if not (bytes_match.is_decodable() or force):
@@ -145,7 +149,7 @@ class BinaryScanner:
             # Print out any queued suppressed notices before printing non suppressed matches
             self._print_suppression_notices()
             console.print(decoder)
-            self.regex_extraction_stats[pattern].tally_match(decoder)
+            self.regex_extraction_stats[pattern].tally_match(decoder)  # TODO: This call must come after print(decoder)
 
         self._print_suppression_notices()
 
@@ -167,12 +171,12 @@ class BinaryScanner:
         return self._pattern_yaralyzer(quote_pattern, REGEX, label, label)
 
     def _pattern_yaralyzer(
-
-
-
-
-
-
+        self,
+        pattern: str,
+        pattern_type: str,
+        rules_label: Optional[str] = None,
+        pattern_label: Optional[str] = None
+    ) -> Yaralyzer:
         """Build a yaralyzer to scan self.bytes"""
         return Yaralyzer.for_patterns(
             patterns=[escape_yara_pattern(pattern)],
```
**pdfalyzer/decorators/pdf_object_properties.py** (+13 -12)

```diff
@@ -16,13 +16,14 @@ from pdfalyzer.util.adobe_strings import *
 
 class PdfObjectProperties:
     """Simple class to extract critical features of a PdfObject."""
+
     def __init__(
-
-
-
-
-
-
+        self,
+        pdf_object: PdfObject,
+        address: str,
+        idnum: int,
+        indirect_object: Optional[IndirectObject] = None
+    ):
         self.idnum = idnum
         self.obj = pdf_object
         self.indirect_object = indirect_object
@@ -57,7 +58,7 @@ class PdfObjectProperties:
         else:
             self.first_address = address
 
-        log.debug(f"Node ID: {self.idnum}, type: {self.type}, subtype: {self.sub_type}, " +
+        log.debug(f"Node ID: {self.idnum}, type: {self.type}, subtype: {self.sub_type}, " +
                   f"label: {self.label}, first_address: {self.first_address}")
 
     @classmethod
@@ -80,11 +81,11 @@ class PdfObjectProperties:
 
     @classmethod
     def to_table_row(
-
-
-
-
-
+        cls,
+        reference_key: str,
+        obj: PdfObject,
+        is_single_row_table: bool = False
+    ) -> List[Union[Text, str]]:
         """PDF object property at reference_key becomes a formatted 3-tuple for use in Rich tables."""
         with_resolved_refs = cls.resolve_references(reference_key, obj)
 
```
**pdfalyzer/decorators/pdf_tree_node.py** (+8 -5)

```diff
@@ -6,7 +6,7 @@ Child/parent relationships should be set using the add_child() and set_parent()
 methods and not set directly. (TODO: this could be done better with anytree
 hooks)
 """
-from typing import Callable, List, Optional
+from typing import Callable, List, Optional
 
 from anytree import NodeMixin, SymlinkNode
 from pypdf.errors import PdfReadError
@@ -163,11 +163,14 @@ class PdfTreeNode(NodeMixin, PdfObjectProperties):
             return None
         else:
             address = refs_to_this_node[0].address
+
         # If other node's label doesn't start with a NON_STANDARD_ADDRESS string
-        #
-        #
-
-
+        # AND any of the relationships pointing at this node use something other than a
+        # NON_STANDARD_ADDRESS_NODES string to refer here,
+        # then print a warning about multiple refs.
+        if not (is_prefixed_by_any(from_node.label, NON_STANDARD_ADDRESS_NODES)
+                or
+                all(ref.address in NON_STANDARD_ADDRESS_NODES for ref in refs_to_this_node)):
             refs_to_this_node_str = "\n ".join([f"{i + 1}. {r}" for i, r in enumerate(refs_to_this_node)])
             msg = f"Multiple refs from {from_node} to {self}:\n {refs_to_this_node_str}"
             log.warning(msg + f"\nCommon address of refs: {address}")
```
**pdfalyzer/decorators/pdf_tree_verifier.py** (+7 -4)

```diff
@@ -37,7 +37,10 @@ class PdfTreeVerifier:
             log.warning(f"Methodd doesn't check revisions but this doc is generation {self.pdfalyzer.max_generation}")
 
         # We expect to see all ordinals up to the number of nodes /Trailer claims exist as obj. IDs.
-        missing_node_ids = [
+        missing_node_ids = [
+            i for i in range(1, self.pdfalyzer.pdf_size)
+            if self.pdfalyzer.find_node_by_idnum(i) is None
+        ]
 
         for idnum in missing_node_ids:
             ref = IndirectObject(idnum, self.pdfalyzer.max_generation, self.pdfalyzer.pdf_reader)
@@ -57,13 +60,13 @@ class PdfTreeVerifier:
                 log.error(f"Cannot find ref {ref} in PDF!")
                 continue
             elif isinstance(obj, (NumberObject, NameObject)):
-                log.info(f"Obj {idnum} is a {type(obj)} w/value {obj}; if relationshipd by /Length etc. this is a nonissue but maybe worth doublechecking")
+                log.info(f"Obj {idnum} is a {type(obj)} w/value {obj}; if relationshipd by /Length etc. this is a nonissue but maybe worth doublechecking")  # noqa: E501
                 continue
             elif not isinstance(obj, dict):
-                log.error(f"Obj {idnum} ({obj}) of type {type(obj)} isn't dict, cannot determine if it should be in tree")
+                log.error(f"Obj {idnum} ({obj}) of type {type(obj)} isn't dict, cannot determine if it should be in tree")  # noqa: E501
                 continue
             elif TYPE not in obj:
-                msg = f"Obj {idnum} has no {TYPE} and is not in tree. Either a loose node w/no data or an error in pdfalyzer."
+                msg = f"Obj {idnum} has no {TYPE} and is not in tree. Either a loose node w/no data or an error in pdfalyzer."  # noqa: E501
                 msg += f"\nHere's the contents for you to assess:\n{obj}"
                 log.warning(msg)
                 continue
```
**pdfalyzer/detection/constants/binary_regexes.py** (+7 -7)

```diff
@@ -36,13 +36,13 @@ PARENTHESES = 'parentheses'
 
 QUOTE_PATTERNS = {
     BACKTICK: '`.+`',
-    BRACKET: '\\[.+\\]',
-    CURLY_BRACKET: '{.+}',
-    DOUBLE_LESS_THAN: '<<.+>>',
+    BRACKET: '\\[.+\\]',          # { 91 [-] 93 }
+    CURLY_BRACKET: '{.+}',        # { 123 [-] 125 }
+    DOUBLE_LESS_THAN: '<<.+>>',   # Hex { 60 60 [-] 62 62 }
     ESCAPED_SINGLE: "\\'.+\\'",
     ESCAPED_DOUBLE: '\\".+\\"',
-    FRONTSLASH: '/.+/',
-    GUILLEMET: 'AB [-] BB',
-    LESS_THAN: '<.+>',
-    PARENTHESES: '\\(.+\\)',
+    FRONTSLASH: '/.+/',           # { 47 [-] 47 }
+    GUILLEMET: 'AB [-] BB',       # Guillemet quotes are not ANSI so require byte pattern
+    LESS_THAN: '<.+>',            # Hex { 60 [-] 62 }
+    PARENTHESES: '\\(.+\\)',      # Hex { 28 [-] 29 }
 }
```
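The new comments document the byte values behind each quoting style. To make the intent of `QUOTE_PATTERNS` concrete, here is an illustrative use of one of those patterns with plain `re` (pdfalyzer itself feeds these patterns into YARA rules via yaralyzer rather than using `re`; the sample bytes below are made up):

```python
import re

# Same idea as PARENTHESES: '\\(.+\\)' above, applied to raw stream bytes.
stream_bytes = b"junk (JS-looking payload here) more junk"  # placeholder data

for match in re.finditer(rb"\(.+\)", stream_bytes):
    print(match.group(0))  # b'(JS-looking payload here)'
```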
**pdfalyzer/detection/yaralyzer_helper.py** (+1 -3)

```diff
@@ -8,8 +8,6 @@ from typing import Optional, Union
 from yaralyzer.config import YaralyzerConfig
 from yaralyzer.yaralyzer import Yaralyzer
 
-from pdfalyzer.config import PdfalyzerConfig
-
 YARA_RULES_DIR = files('pdfalyzer').joinpath('yara_rules')
 
 YARA_RULES_FILES = [
@@ -38,7 +36,7 @@ def _build_yaralyzer(scannable: Union[bytes, str], label: Optional[str] = None)
         with as_file(YARA_RULES_DIR.joinpath(YARA_RULES_FILES[2])) as yara2:
             with as_file(YARA_RULES_DIR.joinpath(YARA_RULES_FILES[3])) as yara3:
                 with as_file(YARA_RULES_DIR.joinpath(YARA_RULES_FILES[4])) as yara4:
-                    # If there is a custom yara_rules
+                    # If there is a custom yara_rules arg, use that instead of the files in the yara_rules/ dir
                     rules_paths = YaralyzerConfig.args.yara_rules_files or []
 
                     if not YaralyzerConfig.args.no_default_yara_rules:
```
**pdfalyzer/font_info.py** (+10 -10)

```diff
@@ -142,19 +142,19 @@ class FontInfo:
         console.line()
 
     # TODO: currently unused
-    def preview_bytes_at_advertised_lengths(self):
-
-
+    # def preview_bytes_at_advertised_lengths(self):
+    #     """Show the bytes at the boundaries provided by /Length1, /Length2, and /Length3, if they exist"""
+    #     lengths = self.lengths or []
 
-
-
+    #     if self.lengths is None or len(lengths) <= 1:
+    #         console.print("No length demarcations to preview.", style='grey.dark')
 
-
-
-
-
+    #     for i, demarcation in enumerate(lengths[1:]):
+    #         console.print(f"{self.font_file} at /Length{i} ({demarcation}):")
+    #         print(f"\n  Stream before: {self.stream_data[demarcation - FONT_SECTION_PREVIEW_LEN:demarcation + 1]}")
+    #         print(f"\n  Stream after: {self.stream_data[demarcation:demarcation + FONT_SECTION_PREVIEW_LEN]}")
 
-
+    #     print(f"\nfinal bytes back from {self.stream_data.lengths[2]} + 10: {self.stream_data[-10 - -f.lengths[2]:]}")
 
     def __str__(self) -> str:
         return self.display_title
```
**pdfalyzer/helpers/filesystem_helper.py** (+6 -6)

```diff
@@ -15,7 +15,7 @@ OPEN_FILES_BUFFER = 30  # we might have some files open already so we need
 PDF_EXT = '.pdf'
 
 # TODO: this kind of type alias is not supported until Python 3.12
-#type StrOrPath = Union[str, Path]
+# type StrOrPath = Union[str, Path]
 
 
 def with_pdf_extension(file_path: Union[str, Path]) -> str:
@@ -92,11 +92,11 @@ def set_max_open_files(num_filehandles: int = DEFAULT_MAX_OPEN_FILES) -> tuple[O
         resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
     except (ValueError, resource.error):
         try:
-
-
-
+            hard = soft
+            print_highlighted(f"Retrying setting max open files (soft, hard)=({soft}, {hard})", style='yellow')
+            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
         except Exception:
-
-
+            print_highlighted('Failed to set max open files / ulimit, giving up!', style='error')
+            soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
 
     return (soft, hard)
```
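The `set_max_open_files()` hunk above is a style cleanup of the retry/fallback logic around `resource.setrlimit()`. The underlying pattern is the standard rlimit dance from the Python standard library; a minimal standalone sketch (the target value is arbitrary, and `resource` is Unix-only):

```python
import resource

wanted = 4096  # hypothetical number of file handles to ask for

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

try:
    # Ask for more open files, but never above the hard limit.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(wanted, hard), hard))
except (ValueError, resource.error):
    pass  # the OS refused; keep whatever limits are currently in place

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open file limit: soft={soft}, hard={hard}")
```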
**pdfalyzer/helpers/rich_text_helper.py** (+5 -5)

```diff
@@ -20,11 +20,11 @@ def print_highlighted(msg: Union[str, Text], **kwargs) -> None:
 
 
 def quoted_text(
-
-
-
-
-
+    _string: str,
+    style: str = '',
+    quote_char_style: str = 'white',
+    quote_char: str = "'"
+) -> Text:
     """Wrap _string in 'quote_char'. Style 'quote_char' with 'quote_char_style'."""
     quote_char_txt = Text(quote_char, style=quote_char_style)
     txt = quote_char_txt + Text(_string, style=style) + quote_char_txt
```
**pdfalyzer/output/character_mapping.py** (+3 -2)

```diff
@@ -8,6 +8,7 @@ from yaralyzer.helpers.bytes_helper import print_bytes
 from yaralyzer.output.rich_console import console
 from yaralyzer.util.logging import log
 
+# from pdfalyzer.font_info import FontInfo  # Causes circular import
 from pdfalyzer.helpers.rich_text_helper import quoted_text
 from pdfalyzer.helpers.string_helper import pp
 from pdfalyzer.output.layout import print_headline_panel, subheading_width
@@ -17,7 +18,7 @@ CHARMAP_TITLE_PADDING = (1, 0, 0, 2)
 CHARMAP_PADDING = (0, 2, 0, 10)
 
 
-def print_character_mapping(font: 'FontInfo') -> None:
+def print_character_mapping(font: 'FontInfo') -> None:  # noqa: F821
     """Prints the character mapping extracted by PyPDF._charmap in tidy columns"""
     if font.character_mapping is None or len(font.character_mapping) == 0:
         log.info(f"No character map found in {font}")
@@ -37,7 +38,7 @@ def print_character_mapping(font: 'FontInfo') -> None:
     console.line()
 
 
-def print_prepared_charmap(font: 'FontInfo'):
+def print_prepared_charmap(font: 'FontInfo'):  # noqa: F821
     """Prints the prepared_charmap returned by PyPDF."""
     if font.prepared_char_map is None:
         log.info(f"No prepared_charmap found in {font}")
```
**pdfalyzer/output/pdfalyzer_presenter.py** (+5 -3)

```diff
@@ -27,7 +27,9 @@ from pdfalyzer.output.tables.decoding_stats_table import build_decoding_stats_ta
 from pdfalyzer.output.tables.pdf_node_rich_table import generate_rich_tree, get_symlink_representation
 from pdfalyzer.output.tables.stream_objects_table import stream_objects_table
 from pdfalyzer.pdfalyzer import Pdfalyzer
-from pdfalyzer.util.adobe_strings import *
+# from pdfalyzer.util.adobe_strings import *
+
+INTERNAL_YARA_ERROR_MSG = "Internal YARA error! YARA's error codes can be checked here: https://github.com/VirusTotal/yara/blob/master/libyara/include/yara/error.h"  # noqa: E501
 
 
 class PdfalyzerPresenter:
@@ -130,9 +132,9 @@ class PdfalyzerPresenter:
 
         try:
             self.yaralyzer.yaralyze()
-        except yara.Error
+        except yara.Error:
             console.print_exception()
-            print_fatal_error_panel(
+            print_fatal_error_panel(INTERNAL_YARA_ERROR_MSG)
             return
 
         YaralyzerConfig.args.standalone_mode = False
```
**pdfalyzer/output/tables/decoding_stats_table.py** (+2 -1)

```diff
@@ -7,6 +7,7 @@ from rich.table import Table
 from rich.text import Text
 from yaralyzer.helpers.rich_text_helper import CENTER, na_txt, prefix_with_plain_text_obj
 
+from pdfalyzer.binary.binary_scanner import BinaryScanner
 from pdfalyzer.helpers.rich_text_helper import pct_txt
 from pdfalyzer.output.layout import generate_subtable, half_width, pad_header
 
@@ -17,7 +18,7 @@ REGEX_SUBTABLE_COLS = ['Metric', 'Value']
 DECODES_SUBTABLE_COLS = ['Encoding', '#', 'Decoded', '#', 'Forced', '#', 'Failed']
 
 
-def build_decoding_stats_table(scanner:
+def build_decoding_stats_table(scanner: BinaryScanner) -> Table:
     """Diplay aggregate results on the decoding attempts we made on subsets of scanner.bytes"""
     stats_table = _new_decoding_stats_table(scanner.label.plain if scanner.label else '')
     regexes_not_found_in_stream = []
```
**pdfalyzer/pdf_object_relationship.py** (+12 -12)

```diff
@@ -14,12 +14,12 @@ INCOMPARABLE_PROPS = ['from_obj', 'to_obj']
 
 class PdfObjectRelationship:
     def __init__(
-
-
-
-
-
-
+        self,
+        from_node: 'PdfTreeNode',
+        to_obj: IndirectObject,
+        reference_key: str,
+        address: str
+    ) -> None:
         """
         In the case of easy key/value pairs the reference_key and the address are the same but
         for more complicated references the address will be the reference_key plus sub references.
@@ -53,12 +53,12 @@ class PdfObjectRelationship:
 
     @classmethod
     def build_node_references(
-
-
-
-
-
-
+        cls,
+        from_node: 'PdfTreeObject',
+        from_obj: Optional[PdfObject] = None,
+        ref_key: Optional[Union[str, int]] = None,
+        address: Optional[str] = None
+    ) -> List['PdfObjectRelationship']:
         """
         Builds list of relationships 'from_node.obj' contains referencing other PDF objects.
         Initially called with single arg from_node. Other args are employed when recursable
```
**pdfalyzer/pdfalyzer.py** (+4 -5)

```diff
@@ -77,7 +77,7 @@ class Pdfalyzer:
         nodes_to_walk_next = [self._add_relationship_to_pdf_tree(r) for r in node.references_to_other_nodes()]
         node.all_references_processed = True
 
-        for next_node in [n for n in nodes_to_walk_next if not (n is None or n.all_references_processed)
+        for next_node in [n for n in nodes_to_walk_next if not (n is None or n.all_references_processed)]:
             if not next_node.all_references_processed:
                 self.walk_node(next_node)
 
@@ -105,7 +105,7 @@ class Pdfalyzer:
 
     def stream_nodes(self) -> List[PdfTreeNode]:
         """List of actual nodes (not SymlinkNodes) containing streams sorted by PDF object ID"""
-        stream_filter = lambda node: node.contains_stream() and not isinstance(node, SymlinkNode)
+        stream_filter = lambda node: node.contains_stream() and not isinstance(node, SymlinkNode)  # noqa: E731
         return sorted(findall(self.pdf_tree, stream_filter), key=lambda r: r.idnum)
 
     def _add_relationship_to_pdf_tree(self, relationship: PdfObjectRelationship) -> Optional[PdfTreeNode]:
@@ -114,7 +114,7 @@ class Pdfalyzer:
         placed in the PDF node processing queue.
         """
         log.info(f'Assessing relationship {relationship}...')
-        was_seen_before = (relationship.to_obj.idnum in self.nodes_encountered)
+        was_seen_before = (relationship.to_obj.idnum in self.nodes_encountered)  # Must come before _build_or_find()
         from_node = relationship.from_node
         to_node = self._build_or_find_node(relationship.to_obj, relationship.address)
         self.max_generation = max([self.max_generation, relationship.to_obj.generation or 0])
@@ -133,7 +133,7 @@ class Pdfalyzer:
             from_node.set_parent(to_node)
         elif to_node.parent is not None:
             # Some StructElem nodes I have seen use /P or /K despire not being the real parent/child
-            if relationship.from_node.type.startswith(STRUCT_ELEM)
+            if relationship.from_node.type.startswith(STRUCT_ELEM):
                 log.info(f"{relationship} fail: {to_node} parent is already {to_node.parent}")
             else:
                 log.warning(f"{relationship} fail: {to_node} parent is already {to_node.parent}")
@@ -173,7 +173,6 @@ class Pdfalyzer:
 
     def _resolve_indeterminate_nodes(self) -> None:
         """Place all indeterminate nodes in the tree."""
-        #set_log_level('INFO')
         indeterminate_nodes = [self.nodes_encountered[idnum] for idnum in self.indeterminate_ids]
         indeterminate_nodes_string = "\n ".join([f"{node}" for node in indeterminate_nodes])
         log.info(f"Resolving {len(indeterminate_nodes)} indeterminate nodes: {indeterminate_nodes_string}")
```
**pdfalyzer/util/adobe_strings.py** (+4 -4)

```diff
@@ -116,20 +116,20 @@ NON_TREE_REFERENCES = [
 
 # Some PdfObjects can't be properly placed in the tree until the entire tree is parsed
 INDETERMINATE_REF_KEYS = [
-    ANNOTS,
+    ANNOTS,       # At least when it appears in a page
     COLOR_SPACE,
     D,
     DEST,
     EXT_G_STATE,
-    FIELDS,
+    FIELDS,       # At least for /AcroForm
     FIRST,
     FONT,
     NAMES,
     OPEN_ACTION,
-    P,
+    P,            # At least for widgets...
     RESOURCES,
     XOBJECT,
-    UNLABELED,
+    UNLABELED,    # TODO: this might be wrong? maybe this is where the /Resources actually live?
 ]
 
 INDETERMINATE_PREFIXES = [p for p in INDETERMINATE_REF_KEYS if len(p) > 2]
```
**pdfalyzer/util/argument_parser.py** (+7 -7)

```diff
@@ -92,9 +92,9 @@ select.add_argument('-c', '--counts', action='store_true',
                     help='show counts of some of the properties of the objects in the PDF')
 
 select.add_argument('-s', '--streams',
-                    help="scan all the PDF's decoded/decrypted streams for sus content as well as any YARA rule matches. " +
-                         "brute force is involved; output is verbose. a single OBJ_ID can be optionally provided to " +
-                         "limit the output to a single internal object. try '-s -- [OTHERARGS]' if you run into an " +
+                    help="scan all the PDF's decoded/decrypted streams for sus content as well as any YARA rule matches. " +
+                         "brute force is involved; output is verbose. a single OBJ_ID can be optionally provided to " +
+                         "limit the output to a single internal object. try '-s -- [OTHERARGS]' if you run into an " +
                          "argument position related piccadilly.",
                     nargs='?',
                     const=ALL_STREAMS,
@@ -102,7 +102,7 @@ select.add_argument('-s', '--streams',
                     type=int)
 
 select.add_argument('--extract-quoted',
-                    help="extract and force decode all bytes found between this kind of quotation marks " +
+                    help="extract and force decode all bytes found between this kind of quotation marks " +
                          "(requires --streams. can be specified more than once)",
                     choices=list(QUOTE_PATTERNS.keys()),
                     dest='extract_quoteds',
@@ -147,7 +147,7 @@ def parse_arguments():
         args.output_dir = args.output_dir or getcwd()
         file_prefix = (args.file_prefix + '__') if args.file_prefix else ''
         args.file_suffix = ('_' + args.file_suffix) if args.file_suffix else ''
-        args.output_basename =
+        args.output_basename = f"{file_prefix}{path.basename(args.file_to_scan_path)}"
     elif args.output_dir:
         log.warning('--output-dir provided but no export option was chosen')
 
@@ -203,8 +203,8 @@ MAX_QUALITY = 10
 
 combine_pdfs_parser = ArgumentParser(
     description="Combine multiple PDFs into one.",
-    epilog="If all PDFs end in a number (e.g. 'xyz_1.pdf', 'xyz_2.pdf', etc. sort the files as if those were"
-           " page
+    epilog="If all PDFs end in a number (e.g. 'xyz_1.pdf', 'xyz_2.pdf', etc. sort the files as if those were" +
+           " page numbers prior to merging.",
     formatter_class=RichHelpFormatterPlus)
 
 combine_pdfs_parser.add_argument('pdfs',
```
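The corrected `combine_pdfs` epilog describes sorting input files by their trailing number before merging. As a rough sketch of that behavior (the `page_number()` helper is hypothetical, not pdfalyzer's actual implementation):

```python
import re

def page_number(filename: str) -> int:
    """Pull the trailing digits out of names like 'xyz_2.pdf'; 0 if there are none."""
    match = re.search(r"(\d+)\.pdf$", filename, re.IGNORECASE)
    return int(match.group(1)) if match else 0

pdfs = ["xyz_10.pdf", "xyz_2.pdf", "xyz_1.pdf"]
print(sorted(pdfs, key=page_number))  # ['xyz_1.pdf', 'xyz_2.pdf', 'xyz_10.pdf']
```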
**pyproject.toml** (+49 -29)

```diff
@@ -1,13 +1,35 @@
 [tool.poetry]
 name = "pdfalyzer"
-version = "1.16.10"
-description = "
+version = "1.16.12"
+description = "PDF analysis tool. Scan a PDF with YARA rules, visualize its inner tree-like data structure in living color (lots of colors), force decodes of suspicious font binaries, and more."
 authors = ["Michel de Cryptadamus <michel@cryptadamus.com>"]
 license = "GPL-3.0-or-later"
 readme = "README.md"
+documentation = "https://github.com/michelcrypt4d4mus/pdfalyzer"
 homepage = "https://github.com/michelcrypt4d4mus/pdfalyzer"
 repository = "https://github.com/michelcrypt4d4mus/pdfalyzer"
-
+
+classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "Environment :: Console",
+    "Intended Audience :: Information Technology",
+    "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)",
+    "Programming Language :: Python",
+    "Programming Language :: Python :: 3.9",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Artistic Software",
+    "Topic :: Security",
+    "Topic :: Scientific/Engineering :: Visualization",
+]
+
+include = [
+    "CHANGELOG.md",
+    "LICENSE",
+    ".pdfalyzer.example"
+]
 
 keywords = [
     "ascii art",
@@ -25,65 +47,63 @@ keywords = [
     "pdf",
     "pdfs",
     "pdf analysis",
+    "pypdf",
     "threat assessment",
+    "threat hunting",
+    "threat intelligence",
+    "threat research",
+    "threatintel",
     "visualization",
     "yara"
 ]
 
-classifiers = [
-    "Development Status :: 5 - Production/Stable",
-    "Environment :: Console",
-    "Intended Audience :: Information Technology",
-    "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)",
-    "Programming Language :: Python",
-    "Topic :: Artistic Software",
-    "Topic :: Security",
-    "Topic :: Scientific/Engineering :: Visualization",
-]
-
-include = [
-    "CHANGELOG.md",
-    "LICENSE",
-    ".pdfalyzer.example"
-]
-
 packages = [
     { include = "pdfalyzer" }
 ]
 
 
-
+#####################
+#   Dependencies    #
+#####################
 [tool.poetry.dependencies]
-python = "^3.9.2"
+python = "^3.9,>=3.9.2"
 anytree = "~=2.13"
-pypdf = "^
+pypdf = "^6.0.0"
 yaralyzer = "^1.0.4"
 
-# Dev dependencies
 [tool.poetry.group.dev.dependencies]
+flake8 = "^7.3.0"
 pytest = "^7.1.2"
 pytest-skip-slow = "^0.0.3"
 
 
-
+#############
+#  Scripts  #
+#############
 [tool.poetry.scripts]
 combine_pdfs = 'pdfalyzer:combine_pdfs'
 pdfalyze = 'pdfalyzer:pdfalyze'
 pdfalyzer_show_color_theme = 'pdfalyzer:pdfalyzer_show_color_theme'
 
 
-
+#####################
+#     PyPi URLs     #
+#####################
 [tool.poetry.urls]
 Changelog = "https://github.com/michelcrypt4d4mus/pdfalyzer/blob/master/CHANGELOG.md"
 
 
-
+###############################
+#     Poetry build system     #
+###############################
 [build-system]
-requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"
+requires = ["poetry-core>=1.0.0"]
 
 
-
+##################
+#     pytest     #
+##################
 [tool.pytest.ini_options]
 addopts = [
     "--import-mode=importlib",
```
The remaining files listed above with `+0 -0` are unchanged between 1.16.10 and 1.16.12.