PyPDFForm 3.1.3__py3-none-any.whl → 3.3.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

PyPDFForm/__init__.py CHANGED
@@ -20,7 +20,7 @@ The library supports various PDF form features, including:
  PyPDFForm aims to simplify PDF form manipulation, making it accessible to developers of all skill levels.
  """

- __version__ = "3.1.3"
+ __version__ = "3.3.0"

  from .middleware.text import Text  # exposing for setting global font attrs
  from .wrapper import PdfWrapper
PyPDFForm/adapter.py CHANGED
@@ -9,6 +9,7 @@ filling operations, where the input PDF template can be provided in different
  forms. The module ensures that the input is properly converted into a byte
  stream before further processing.
  """
+ # TODO: For large PDF files, reading the entire file into memory using `_file.read()` in `fp_or_f_obj_or_stream_to_stream` can be inefficient. Consider streaming or chunking if downstream processing allows.

  from os.path import isfile
  from typing import Any, BinaryIO, Union
@@ -63,6 +64,6 @@ def fp_or_f_obj_or_stream_to_stream(
          if not isfile(fp_or_f_obj_or_stream):
              pass
          else:
-             with open(fp_or_f_obj_or_stream, "rb+") as _file:
+             with open(fp_or_f_obj_or_stream, "rb") as _file:
                  result = _file.read()
      return result
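A note on the `"rb+"` to `"rb"` change in this hunk: `"rb+"` opens a file for both reading and writing, so it typically fails on files the process cannot write to, while `"rb"` needs read access only. The sketch below illustrates the read-only case. The helper name `read_template_bytes`, the temporary file, and its contents are all made up for the demonstration and are not part of PyPDFForm.

```python
import os
import tempfile


def read_template_bytes(path: str) -> bytes:
    """Hypothetical helper: read a whole file in read-only binary mode."""
    with open(path, "rb") as handle:
        return handle.read()


# Demo: a file marked read-only can still be opened with "rb".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "template.pdf")
    with open(path, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    os.chmod(path, 0o444)  # read permission only
    content = read_template_bytes(path)
    os.chmod(path, 0o644)  # restore so directory cleanup works everywhere
```

Nothing in the hunk writes to the opened file, which is consistent with read-only mode being sufficient here.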
PyPDFForm/constants.py CHANGED
@@ -95,6 +95,7 @@ XFA = "/XFA"

  # Field flag bits
  READ_ONLY = 1 << 0
+ REQUIRED = 1 << 1
  MULTILINE = 1 << 12
  COMB = 1 << 24

PyPDFForm/coordinate.py CHANGED
@@ -6,6 +6,8 @@ This module provides functionality to generate coordinate grids on existing PDF
  It allows developers to visualize the coordinate system of each page in a PDF, which can be helpful
  for debugging and precisely positioning elements when filling or drawing on PDF forms.
  """
+ # TODO: The `PdfReader` object is initialized twice (lines 42 and implicitly within `create_watermarks_and_draw` if it re-reads the PDF). Consider initializing it once and passing the object or its relevant parts to avoid redundant parsing, especially for large PDFs.
+ # TODO: Drawing operations for lines and texts are performed and merged separately. It might be more efficient to combine all drawing operations for a page into a single `create_watermarks_and_draw` call or to merge all watermarks in one final step to reduce PDF processing overhead.

  from typing import Tuple

PyPDFForm/filler.py CHANGED
@@ -7,6 +7,11 @@ It includes functions for handling various form field types, such as text fields
  checkboxes, radio buttons, dropdowns, images, and signatures. The module also
  supports flattening the filled form to prevent further modifications.
  """
+ # TODO: In `fill` function, `PdfReader(stream_to_io(template))` and `out.append(pdf)` might involve re-parsing or copying the entire PDF. For very large PDFs, consider if `pypdf` offers more efficient ways to modify in-place or stream processing.
+ # TODO: The `get_widget_key` function is called repeatedly in a loop. If its internal logic is complex, consider caching its results or optimizing its implementation to avoid redundant computations.
+ # TODO: The `signature_image_handler` function involves `get_image_dimensions` and `get_draw_image_resolutions`. If image processing is a bottleneck, consider optimizing these image-related operations, perhaps by using faster image libraries or pre-calculating dimensions if images are reused.
+ # TODO: Similar to `coordinate.py`, `get_drawn_stream` involves multiple `create_watermarks_and_draw` and `merge_watermarks_with_pdf` calls. Combining drawing operations or merging watermarks in a single pass could reduce overhead.
+ # TODO: The `radio_button_tracker` logic involves iterating through all radio buttons. For forms with many radio buttons, consider optimizing the lookup or update mechanism if performance becomes an issue.

  from io import BytesIO
  from typing import Dict, Union, cast
PyPDFForm/font.py CHANGED
@@ -6,6 +6,11 @@ It includes functions for registering fonts with ReportLab and within the PDF's
  allowing these fonts to be used when filling form fields. The module also provides utilities
  for extracting font information from TTF streams and managing font names within a PDF.
  """
+ # TODO: In `get_additional_font_params`, iterating through `reader.pages[0][Resources][Font].values()` can be inefficient for PDFs with many fonts. Consider building a font lookup dictionary once per PDF or caching results if this function is called frequently with the same PDF.
+ # TODO: In `register_font_acroform`, `PdfReader(stream_to_io(pdf))` and `writer.append(reader)` involve re-parsing and appending the PDF. For large PDFs, passing `PdfReader` and `PdfWriter` objects directly could reduce overhead.
+ # TODO: In `register_font_acroform`, `compress(ttf_stream)` can be CPU-intensive. If the same font stream is registered multiple times within a single PDF processing session, consider caching the compressed stream to avoid redundant compression.
+ # TODO: In `get_new_font_name`, while `existing` is a set, if `n` needs to increment many times due to a dense range of existing font names, the `while` loop could be slow. However, this is likely a minor bottleneck in typical scenarios.
+ # TODO: In `get_all_available_fonts`, the `replace("/", "")` operation on `BaseFont` could be avoided if font names are consistently handled with or without the leading slash to prevent string manipulation overhead in a loop.

  from functools import lru_cache
  from io import BytesIO
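One of the TODO comments in this hunk suggests caching the compressed font stream so the same TTF bytes are not compressed twice. A minimal sketch of that idea follows. It assumes `compress` behaves like `zlib.compress` (the diff does not show where `compress` is imported from), and the wrapper name `compress_font_cached` is hypothetical.

```python
import zlib
from functools import lru_cache


@lru_cache(maxsize=8)
def compress_font_cached(ttf_stream: bytes) -> bytes:
    """Hypothetical wrapper: compress a font stream, memoized on its bytes."""
    # bytes are hashable, so the raw stream itself can serve as the cache key.
    return zlib.compress(ttf_stream)
```

The trade-off is that hashing a large `bytes` key is not free and the cache keeps both the input and the output alive, so a small `maxsize` seems a reasonable starting point.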
PyPDFForm/hooks.py CHANGED
@@ -8,6 +8,10 @@ of checkbox and radio button widgets. It also provides functions for flattening
  generic and radio button widgets. These hooks are triggered during the PDF form
  filling process, allowing for customization of the form's appearance and behavior.
  """
+ # TODO: In `trigger_widget_hooks`, the PDF is read and written in each call. If this function is part of a larger workflow, consider passing `PdfReader` and `PdfWriter` objects to avoid redundant parsing and writing, allowing modifications to be accumulated and written once.
+ # TODO: String manipulations (split/join) in `update_text_field_font`, `update_text_field_font_size`, and `update_text_field_font_color` could be optimized for very long `DA` strings, potentially using more efficient string manipulation techniques or regex if the structure is consistent.
+ # TODO: The `get_widget_key` function is called in a loop within `trigger_widget_hooks`. If its internal logic is complex, consider caching its results or optimizing its implementation to avoid redundant computations.
+ # TODO: In `flatten_radio` and `flatten_generic`, `annot.get(NameObject(Ff), 0)` is called twice within the conditional. Store this value in a local variable to avoid redundant dictionary lookups.

  import sys
  from io import BytesIO
@@ -18,7 +22,8 @@ from pypdf.generic import (ArrayObject, DictionaryObject, FloatObject,
                             NameObject, NumberObject, TextStringObject)

  from .constants import (COMB, DA, FONT_COLOR_IDENTIFIER, FONT_SIZE_IDENTIFIER,
-                         MULTILINE, READ_ONLY, Annots, Ff, Opt, Parent, Q, Rect)
+                         MULTILINE, READ_ONLY, REQUIRED, TU, Annots, Ff, Opt,
+                         Parent, Q, Rect)
  from .template import get_widget_key
  from .utils import stream_to_io

@@ -325,3 +330,48 @@ def flatten_generic(annot: DictionaryObject, val: bool) -> None:
                  else int(annot.get(NameObject(Ff), 0)) & ~READ_ONLY
              )
          )
+
+
+ def update_field_tooltip(annot: DictionaryObject, val: str) -> None:
+     """
+     Updates the tooltip (alternate field name) of a form field annotation.
+
+     This function sets the 'TU' entry in the annotation dictionary, which
+     provides a text string that can be used as a tooltip for the field.
+
+     Args:
+         annot (DictionaryObject): The annotation dictionary for the form field.
+         val (str): The new tooltip string for the field.
+     """
+     if val:
+         annot[NameObject(TU)] = TextStringObject(val)
+
+
+ def update_field_required(annot: DictionaryObject, val: bool) -> None:
+     """
+     Updates the 'Required' flag of a form field annotation.
+
+     This function modifies the Ff (flags) entry in the annotation dictionary
+     (or its parent if applicable) to set or unset the 'Required' flag,
+     making the field mandatory or optional.
+
+     Args:
+         annot (DictionaryObject): The annotation dictionary for the form field.
+         val (bool): True to set the field as required, False to make it optional.
+     """
+     if Parent in annot and Ff not in annot:
+         annot[NameObject(Parent)][NameObject(Ff)] = NumberObject(
+             (
+                 int(annot.get(NameObject(Ff), 0)) | REQUIRED
+                 if val
+                 else int(annot.get(NameObject(Ff), 0)) & ~REQUIRED
+             )
+         )
+     else:
+         annot[NameObject(Ff)] = NumberObject(
+             (
+                 int(annot.get(NameObject(Ff), 0)) | REQUIRED
+                 if val
+                 else int(annot.get(NameObject(Ff), 0)) & ~REQUIRED
+             )
+         )
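The flag handling in `update_field_required` comes down to two bit operations: `flags | REQUIRED` sets the bit and `flags & ~REQUIRED` clears it, leaving every other bit alone. The sketch below shows this on plain dictionaries rather than pypdf objects. The function name `set_required` and the string key `"/Ff"` are illustrative assumptions; only the bit values are taken from `constants.py`. Unlike the parent branch in the hunk, which reads the current flags from `annot`, this sketch reads them from the same dictionary it writes to, so other bits already set on a parent are kept. Whether that matches the library's intent is not something the diff settles.

```python
from typing import Optional

READ_ONLY = 1 << 0   # same values as in constants.py
REQUIRED = 1 << 1
MULTILINE = 1 << 12


def set_required(field: dict, parent: Optional[dict], val: bool) -> None:
    """Illustrative only: set or clear REQUIRED on whichever dict holds /Ff."""
    # Fall back to the parent when the field itself carries no flags entry.
    use_parent = parent is not None and "/Ff" not in field
    target = parent if use_parent else field
    flags = int(target.get("/Ff", 0))
    target["/Ff"] = (flags | REQUIRED) if val else (flags & ~REQUIRED)
```

For example, a field with `READ_ONLY` already set ends up with flags `3` after `set_required(field, None, True)` and returns to `1` after `set_required(field, None, False)`.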
PyPDFForm/image.py CHANGED
@@ -6,6 +6,9 @@ It includes functions for rotating images, retrieving image dimensions, and
  calculating the resolutions for drawing an image on a PDF page, taking into
  account whether to preserve the aspect ratio.
  """
+ # TODO: In `rotate_image` and `get_image_dimensions`, `BytesIO` is used to wrap the image stream. While necessary for PIL, consider if the `image_stream` is already a file-like object in some calling contexts, which could avoid redundant copying to `BytesIO`.
+ # TODO: The `rotate_image` function creates a new `BytesIO` object and saves the image to it. For multiple rotations or image manipulations, consider keeping the `PIL.Image.Image` object in memory and performing operations on it directly before a final save to bytes, to avoid repeated I/O operations.
+ # TODO: The `get_image_dimensions` function opens the image to get its size. If image dimensions are frequently needed for the same image, consider caching the dimensions to avoid re-opening and re-parsing the image data.

  from io import BytesIO
  from typing import Tuple, Union
@@ -23,6 +23,8 @@ class Widget:

      SET_ATTR_TRIGGER_HOOK_MAP = {
          "readonly": "flatten_generic",
+         "required": "update_field_required",
+         "tooltip": "update_field_tooltip",
      }

      def __init__(
@@ -41,7 +43,9 @@ class Widget:
          self._name = name
          self._value = value
          self.desc: str = None
+         self.tooltip: str = None  # TODO: sync tooltip and desc
          self.readonly: bool = None
+         self.required: bool = None
          self.hooks_to_trigger: list = []

      def __setattr__(self, name: str, value: Any) -> None:
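The diff shows `SET_ATTR_TRIGGER_HOOK_MAP` and the `hooks_to_trigger` list but not the body of `__setattr__`, so the following is only a guess at how such a map might be consulted when an attribute is assigned. `WidgetSketch` and its behaviour are assumptions for illustration, not the library's implementation.

```python
from typing import Any


class WidgetSketch:
    """Hypothetical stand-in for the middleware Widget class."""

    SET_ATTR_TRIGGER_HOOK_MAP = {
        "readonly": "flatten_generic",
        "required": "update_field_required",
        "tooltip": "update_field_tooltip",
    }

    def __init__(self) -> None:
        # Must exist before any mapped attribute is assigned.
        self.hooks_to_trigger: list = []
        self.tooltip = None
        self.readonly = None
        self.required = None

    def __setattr__(self, name: str, value: Any) -> None:
        # Assumed behaviour: queue (hook name, value) for mapped attributes,
        # skipping the None defaults set in __init__.
        if name in self.SET_ATTR_TRIGGER_HOOK_MAP and value is not None:
            hook = self.SET_ATTR_TRIGGER_HOOK_MAP[name]
            self.hooks_to_trigger.append((hook, value))
        super().__setattr__(name, value)
```

Under this reading, assigning `widget.required = True` would queue `("update_field_required", True)`, to be applied the next time the hooks run.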
@@ -55,6 +55,54 @@ class Dropdown(Widget):
          self.font: str = None
          self.choices: Union[tuple, list] = None

+     @property
+     def value(self) -> int:
+         """
+         Gets the current value of the dropdown.
+
+         Returns:
+             int: The index of the selected choice.
+         """
+         return super().value
+
+     @value.setter
+     def value(self, value: Union[str, int]) -> None:
+         """
+         Sets the value of the dropdown.
+
+         If the value is a string, it attempts to find the corresponding
+         index in the choices list. If not found, the string value is
+         added to the choices, and its new index is used.
+
+         Args:
+             value (Union[str, int]): The value to set. Can be a string
+                 (option text) or an integer (index).
+         """
+         if isinstance(value, str):
+             index = self._get_option_index(value)
+             if index is None:
+                 self.choices = list(self.choices) + [value]
+                 index = len(self.choices) - 1
+             value = index
+
+         self._value = value
+
+     def _get_option_index(self, value: str) -> Union[int, None]:
+         """
+         Gets the index of a given option value in the dropdown's choices.
+
+         Args:
+             value (str): The option value to search for.
+
+         Returns:
+             Union[int, None]: The index of the option if found, otherwise None.
+         """
+         for i, each in enumerate(self.choices):
+             if value == each:
+                 return i
+
+         return None
+
      @property
      def schema_definition(self) -> dict:
          """
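The new `value` setter accepts either an index or the option text, and an unknown string is appended to the choices instead of raising. The standalone sketch below mimics that behaviour without depending on PyPDFForm. `DropdownSketch` is a hypothetical stand-in, and it uses `list.index` where the hunk uses a manual loop.

```python
from typing import List, Optional, Union


class DropdownSketch:
    """Hypothetical stand-in mirroring the setter logic in the hunk."""

    def __init__(self, choices: List[str]) -> None:
        self.choices = list(choices)
        self._value: Optional[int] = None

    @property
    def value(self) -> Optional[int]:
        return self._value

    @value.setter
    def value(self, value: Union[str, int]) -> None:
        if isinstance(value, str):
            try:
                index = self.choices.index(value)
            except ValueError:
                # Unknown option text: append it and select the new entry.
                self.choices.append(value)
                index = len(self.choices) - 1
            value = index
        self._value = value
```

One consequence worth noting: a misspelled option silently becomes a new choice, so callers who want strict validation would need to check membership themselves before assigning.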
@@ -6,6 +6,7 @@ This module defines the Signature class, which is a subclass of the
  Widget class. It represents a signature form field in a PDF document,
  allowing users to add their signature as an image.
  """
+ # TODO: In the `stream` property, `fp_or_f_obj_or_stream_to_stream` is called every time the property is accessed. If the signature image is large or the property is accessed frequently, consider caching the result of `fp_or_f_obj_or_stream_to_stream` to avoid redundant file reads.

  from os.path import expanduser
  from typing import Union
PyPDFForm/patterns.py CHANGED
@@ -7,6 +7,10 @@ checkboxes, radio buttons, dropdowns, images, and signatures) based on their
  properties in the PDF's annotation dictionary. It also provides utility functions
  for updating these widgets.
  """
+ # TODO: The `WIDGET_TYPE_PATTERNS` list is iterated through to determine widget types. For very large numbers of annotations or complex pattern matching, consider optimizing this lookup, perhaps by pre-compiling regexes or using a more efficient data structure if the patterns allow.
+ # TODO: In `update_checkbox_value` and `update_radio_value`, iterating through `annot[AP][N]` to find the correct appearance state might be slow if `N` contains many entries. If possible, a direct lookup or a more optimized search could improve performance.
+ # TODO: In `update_dropdown_value`, the list comprehension for `ArrayObject` can be computationally intensive for dropdowns with many choices, as it creates new `TextStringObject` and `ArrayObject` instances for each choice. Consider optimizing this if dropdowns have a very large number of options.
+ # TODO: The `get_checkbox_value` and `get_radio_value` functions involve dictionary lookups and comparisons. While generally fast, repeated calls in a tight loop for many widgets could accumulate overhead.

  from typing import Union

PyPDFForm/template.py CHANGED
@@ -7,6 +7,11 @@ in PDF form templates. It leverages the pypdf library for PDF manipulation
  and defines specific patterns for identifying and constructing different
  types of widgets.
  """
+ # TODO: In `build_widgets`, the `get_widgets_by_page` function is called, which then iterates through pages and annotations. For very large PDFs, this initial parsing and iteration can be a bottleneck. Consider optimizing the widget extraction process if possible, perhaps by using a more direct method to access annotations if `pypdf` allows.
+ # TODO: The `construct_widget` function iterates through `WIDGET_TYPE_PATTERNS` for each widget. If there are many patterns or many widgets, this repeated iteration could be optimized by pre-compiling patterns or using a more efficient lookup mechanism.
+ # TODO: In `get_widget_key`, the recursive call for `Parent` can lead to deep recursion for deeply nested widgets, potentially impacting performance or hitting recursion limits for extremely complex forms. Consider an iterative approach if deep nesting is common.
+ # TODO: In `update_widget_keys`, the nested loops iterating through `old_keys`, `out.pages`, and `page.get(Annots, [])` can be very inefficient for large numbers of keys, pages, or annotations. Consider creating a lookup structure for annotations by key to avoid repeated linear scans.
+ # TODO: In `update_widget_keys`, `PdfReader(stream_to_io(template))` and `out.append(pdf)` involve re-parsing and appending the PDF. For large PDFs, passing `PdfReader` and `PdfWriter` objects directly could reduce overhead.

  from functools import lru_cache
  from io import BytesIO
PyPDFForm/utils.py CHANGED
@@ -12,6 +12,14 @@ It includes functions for:
  - Generating unique suffixes for internal use.
  - Enabling Adobe-specific settings in the PDF to ensure proper rendering of form fields.
  """
+ # TODO: In `enable_adobe_mode`, `PdfReader(stream_to_io(pdf))` and `writer.append(reader)` involve re-parsing and appending the PDF. For large PDFs, passing `PdfReader` and `PdfWriter` objects directly could reduce overhead.
+ # TODO: In `remove_all_widgets`, `PdfReader(stream_to_io(pdf))` and iterating through pages to add them to a new writer can be inefficient for large PDFs. Consider if `pypdf` offers a more direct way to remove annotations without reconstructing the entire PDF.
+ # TODO: In `get_page_streams`, `PdfReader(stream_to_io(pdf))` and then creating a new `PdfWriter` for each page can be very inefficient. It would be more performant to iterate through the pages of a single `PdfReader` and extract their content streams directly if possible, or to use a single `PdfWriter` to extract multiple pages.
+ # TODO: In `merge_two_pdfs`, the function reads and writes PDFs multiple times (`PdfReader`, `PdfWriter`, `remove_all_widgets`, then another `PdfReader` and `PdfWriter`). This is highly inefficient. The PDF objects should be passed around and modified in-place as much as possible, with a single final write operation.
+ # TODO: The `merge_two_pdfs` function has a `TODO: refactor duplicate logic with copy_watermark_widgets` comment. This indicates a potential for code duplication and inefficiency. Refactoring this to a shared helper function would improve maintainability and potentially performance.
+ # TODO: In `find_pattern_match` and `traverse_pattern`, the recursive nature and repeated dictionary lookups (`widget.items()`, `value.get_object()`) can be slow for deeply nested or complex widget structures. Consider optimizing these traversals, perhaps by pre-flattening the widget dictionary or using a more direct access method if `pypdf` allows.
+ # TODO: In `extract_widget_property`, the loop iterates through `patterns` and calls `traverse_pattern` for each. If `patterns` is long or `traverse_pattern` is expensive, this could be a bottleneck. Consider optimizing the pattern matching or lookup.
+ # TODO: `generate_unique_suffix` uses `choice` in a loop. While generally fast, for extremely high call volumes, pre-generating a pool of characters or using a faster random string generation method might offer minor improvements.

  from collections.abc import Callable
  from functools import lru_cache
PyPDFForm/watermark.py CHANGED
@@ -7,6 +7,13 @@ It supports drawing text, lines, and images as watermarks.
  The module also includes functions to merge these watermarks with the original PDF content
  and to copy specific widgets from the watermarks to the original PDF.
  """
+ # TODO: In `draw_image`, `ImageReader(image_buff)` is created for each image. If the same image is drawn multiple times, consider caching `ImageReader` objects or passing pre-processed image data to avoid redundant processing.
+ # TODO: In `create_watermarks_and_draw`, `PdfReader(stream_to_io(pdf))` is called, which re-parses the PDF. If this function is called repeatedly for the same PDF, consider passing the `PdfReader` object directly to avoid redundant parsing.
+ # TODO: In `create_watermarks_and_draw`, the function returns a list of watermarks where only one element is populated. This can be inefficient for memory if there are many pages but only one watermark is created. Consider returning only the created watermark and its page number, and let the caller handle placement.
+ # TODO: In `merge_watermarks_with_pdf`, `PdfReader(stream_to_io(pdf))` and `PdfReader(stream_to_io(watermarks[i]))` are called in a loop. This leads to repeated parsing of the base PDF and each watermark. It would be more efficient to parse the base PDF once and then merge watermark pages directly into the existing `PdfWriter` object.
+ # TODO: In `copy_watermark_widgets`, the function reads the PDF and watermarks multiple times. Similar to `merge_watermarks_with_pdf`, optimize by parsing the base PDF and watermarks once and then manipulating the `PdfWriter` object.
+ # TODO: The `copy_watermark_widgets` function has a `TODO: refactor duplicate logic with merge_two_pdfs` comment. This indicates a potential for code duplication and inefficiency. Refactoring this to a shared helper function would improve maintainability and potentially performance.
+ # TODO: In `copy_watermark_widgets`, the nested loops iterating through `watermarks`, `watermark_file.pages`, and `page.get(Annots, [])` can be very inefficient for large numbers of watermarks, pages, or annotations. Consider creating a lookup structure for annotations by key to avoid repeated linear scans.

  from io import BytesIO
  from typing import List, Union
PyPDFForm/widgets/base.py CHANGED
@@ -7,6 +7,8 @@ such as text fields, checkboxes, and radio buttons. The Widget class handles
  basic properties like name, page number, and coordinates, and provides methods
  for rendering the widget on a PDF page.
  """
+ # TODO: In `watermarks`, `PdfReader(stream_to_io(stream))` is called, which re-parses the PDF for each widget. If multiple widgets are being processed, consider passing the `PdfReader` object directly to avoid redundant parsing.
+ # TODO: In `watermarks`, the list comprehension `[watermark.read() if i == self.page_number - 1 else b"" for i in range(page_count)]` creates a new `BytesIO` object and reads from it for each widget. If many widgets are created, this could be optimized by creating the `BytesIO` object once and passing it around, or by directly returning the watermark bytes and its page number.

  from io import BytesIO
  from typing import List, Union
@@ -25,6 +25,7 @@ class CheckBoxWidget(Widget):
      """

      USER_PARAMS = [
+         ("tooltip", "tooltip"),
          ("button_style", "buttonStyle"),
          ("tick_color", "textColor"),
          ("bg_color", "fillColor"),
@@ -32,5 +33,5 @@ class CheckBoxWidget(Widget):
          ("border_width", "borderWidth"),
      ]
      COLOR_PARAMS = ["tick_color", "bg_color", "border_color"]
-     ALLOWED_HOOK_PARAMS = ["size"]
+     ALLOWED_HOOK_PARAMS = ["required", "size"]
      ACRO_FORM_FUNC = "checkbox"
@@ -4,6 +4,7 @@ This module defines the RadioWidget class, which is a subclass of the
  CheckBoxWidget class. It represents a radio button form field in a PDF
  document.
  """
+ # TODO: In `canvas_operations`, `self.acro_form_params.copy()` creates a shallow copy of the dictionary in each iteration of the loop. For a large number of radio buttons, this repeated copying can be inefficient. Consider modifying the dictionary in place and then reverting changes if necessary, or restructuring the data to avoid repeated copying.

  from typing import List

@@ -5,6 +5,9 @@ representing signature fields in a PDF form. It handles the creation and
  rendering of signature widgets, as well as the integration of signatures
  into the PDF document.
  """
+ # TODO: In `watermarks`, `PdfReader(stream_to_io(BEDROCK_PDF))` is called every time the method is invoked. If `BEDROCK_PDF` is static, consider parsing it once and caching the `PdfReader` object to avoid redundant I/O and parsing.
+ # TODO: In `watermarks`, the list comprehension `[f.read() if i == self.page_number - 1 else b"" for i in range(page_count)]` reads the entire `BytesIO` object `f` multiple times if `page_count` is large. Read `f` once into a variable and then use that variable in the list comprehension.
+ # TODO: The `input_pdf` is created in `watermarks` but only its page count is used. If the `PdfReader` object is not needed for other operations, consider a lighter way to get the page count or pass the `PdfReader` object from the caller if it's already available.

  from io import BytesIO
  from typing import List
@@ -31,6 +34,8 @@ class SignatureWidget:
      Attributes:
          OPTIONAL_PARAMS (list): A list of tuples, where each tuple contains the
              parameter name and its default value.
+         ALLOWED_HOOK_PARAMS (list): A list of parameter names that can be
+             used as hooks to trigger dynamic modifications.
          BEDROCK_WIDGET_TO_COPY (str): The name of the bedrock widget to copy.
      """

@@ -38,6 +43,7 @@ class SignatureWidget:
          ("width", 160),
          ("height", 90),
      ]
+     ALLOWED_HOOK_PARAMS = ["required", "tooltip"]
      BEDROCK_WIDGET_TO_COPY = "signature"

      def __init__(
@@ -68,6 +74,9 @@ class SignatureWidget:
          self.optional_params = {
              each[0]: kwargs.get(each[0], each[1]) for each in self.OPTIONAL_PARAMS
          }
+         for each in self.ALLOWED_HOOK_PARAMS:
+             if each in kwargs:
+                 self.hook_params.append((each, kwargs.get(each)))

      def watermarks(self, stream: bytes) -> List[bytes]:
          """
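The loop added to `SignatureWidget.__init__` keeps only the keyword arguments named in `ALLOWED_HOOK_PARAMS`, in the order of that list and including falsy values such as `required=False`. A small sketch of the same filtering, written as a free function for illustration; `collect_hook_params` is a hypothetical name and not part of the library:

```python
from typing import Any, List, Tuple

ALLOWED_HOOK_PARAMS = ["required", "tooltip"]  # as in the hunk


def collect_hook_params(**kwargs: Any) -> List[Tuple[str, Any]]:
    """Hypothetical helper: pick out (name, value) pairs for allowed hooks."""
    # Membership test (not truthiness) so that required=False is kept.
    return [(name, kwargs[name]) for name in ALLOWED_HOOK_PARAMS if name in kwargs]
```

Calling `collect_hook_params(width=160, tooltip="Sign here")` would return `[("tooltip", "Sign here")]`, because `width` is an optional parameter rather than a hook parameter.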
PyPDFForm/widgets/text.py CHANGED
@@ -27,6 +27,7 @@ class TextWidget(Widget):
      """

      USER_PARAMS = [
+         ("tooltip", "tooltip"),
          ("width", "width"),
          ("height", "height"),
          ("font_size", "fontSize"),
@@ -37,6 +38,6 @@ class TextWidget(Widget):
          ("max_length", "maxlen"),
      ]
      COLOR_PARAMS = ["font_color", "bg_color", "border_color"]
-     ALLOWED_HOOK_PARAMS = ["alignment", "multiline", "comb", "font"]
+     ALLOWED_HOOK_PARAMS = ["required", "alignment", "multiline", "comb", "font"]
      NONE_DEFAULTS = ["max_length"]
      ACRO_FORM_FUNC = "textfield"
PyPDFForm/wrapper.py CHANGED
@@ -15,6 +15,17 @@ methods for interacting with its form fields and content. It leverages
  lower-level modules within the `PyPDFForm` library to handle the
  underlying PDF manipulation.
  """
+ # TODO: The `__add__` method (merging PDFs) involves multiple `self.read()` and `other.read()` calls, leading to redundant PDF parsing. Consider optimizing by passing `PdfReader` objects directly or by performing a single read and then merging.
+ # TODO: In `_init_helper`, `build_widgets` and `get_all_available_fonts` both call `self.read()`, causing the PDF to be parsed multiple times. Optimize by parsing the PDF once and passing the `PdfReader` object to these functions.
+ # TODO: The `pages` property's implementation involves `get_page_streams(remove_all_widgets(self.read()))` and `copy_watermark_widgets(each, self.read(), None, i)`. This leads to excessive PDF parsing, widget removal, and copying for each page. Refactor to minimize PDF I/O operations, possibly by working with `pypdf` page objects directly.
+ # TODO: The `read` method triggers `trigger_widget_hooks` and `enable_adobe_mode`, both of which can involve PDF parsing and writing. Since `read` is called frequently, this can be a performance bottleneck. Consider a more granular dirty-flag system to only apply changes when necessary, or accumulate changes and apply them in a single PDF write operation.
+ # TODO: The `write` method calls `self.read()`, which in turn triggers all pending operations. This can lead to redundant processing if `read()` has already been called or if multiple `write()` calls are made.
+ # TODO: In `change_version`, replacing a byte string in the entire PDF stream can be inefficient for very large PDFs. Consider if `pypdf` offers a more direct way to update the PDF version without full stream manipulation.
+ # TODO: In `generate_coordinate_grid`, `self.read()` is called multiple times, and then `remove_all_widgets`, `generate_coordinate_grid`, and `copy_watermark_widgets` are called, all of which involve PDF parsing and manipulation. Optimize by minimizing PDF I/O and object re-creation.
+ # TODO: In `fill`, `self.read()` is called, and then `fill` (from `filler.py`), `remove_all_widgets`, and `copy_watermark_widgets` are called. This is a major operation and likely a performance hotspot due to repeated PDF processing. Streamline the PDF modification workflow to reduce redundant parsing and writing.
+ # TODO: In `create_widget`, `obj.watermarks(self.read())` and `copy_watermark_widgets(self.read(), watermarks, [name], None)` involve reading the PDF multiple times. Optimize by passing the PDF stream or `PdfReader` object more efficiently.
+ # TODO: The `commit_widget_key_updates` method calls `update_widget_keys`, which involves re-parsing and writing the PDF. For bulk updates, consider a mechanism to apply all key changes in a single PDF modification operation.
+ # TODO: General: Many methods repeatedly call `self.read()`, which re-parses the PDF. Consider maintaining a persistent `pypdf.PdfReader` and `pypdf.PdfWriter` object internally and only writing to a byte stream when explicitly requested (e.g., by the `read()` or `write()` methods) to avoid redundant I/O and parsing overhead.

  from __future__ import annotations

@@ -218,6 +229,22 @@ class PdfWrapper:
              },
          }

+     @property
+     def data(self) -> dict:
+         """
+         Returns a dictionary of the current data in the PDF form fields.
+
+         The keys of the dictionary are the form field names, and the values are
+         the current values of those fields. This property provides a convenient
+         way to extract all filled data from the PDF.
+
+         Returns:
+             dict: A dictionary where keys are form field names (str) and values are
+                 their corresponding data (Union[str, bool, int, None]).
+         """
+
+         return {key: value.value for key, value in self.widgets.items()}
+
      @property
      def sample_data(self) -> dict:
          """
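The new `data` property is a dictionary comprehension over `self.widgets` that reads each widget's `value`. The sketch below reproduces that comprehension on stand-in objects so it runs without a PDF. The field names and values are invented for the example, and `SimpleNamespace` merely plays the role of a widget with a `value` attribute.

```python
from types import SimpleNamespace

# Stand-ins for middleware widgets; only the `value` attribute matters here.
widgets = {
    "full_name": SimpleNamespace(value="Ada Lovelace"),  # text field -> str
    "agree": SimpleNamespace(value=True),                # checkbox -> bool
    "country": SimpleNamespace(value=2),                 # dropdown/radio -> int index
    "notes": SimpleNamespace(value=None),                # unfilled field -> None
}

# Same shape as the comprehension in the hunk.
data = {key: widget.value for key, widget in widgets.items()}
```

Going by the hunk, the equivalent with the real library would presumably be reading `.data` on a `PdfWrapper` instance, with unfilled fields reported as `None`.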
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: PyPDFForm
- Version: 3.1.3
+ Version: 3.3.0
  Summary: The Python library for PDF forms.
  Author: Jinge Li
  License-Expression: MIT
@@ -70,7 +70,7 @@ pip install PyPDFForm
  ## Quick Example
  ![Check out the GitHub repository for a live demo if you can't see it here.](https://github.com/chinapandaman/PyPDFForm/raw/master/docs/img/demo.gif)

- A sample PDF form can be found [here](https://github.com/chinapandaman/PyPDFForm/raw/master/pdf_samples/sample_template.pdf). Download it and try:
+ A sample PDF form can be found [here](https://chinapandaman.github.io/PyPDFForm/pdfs/sample_template.pdf). Download it and try:

  ```python
  from PyPDFForm import PdfWrapper
@@ -90,7 +90,7 @@ filled.write("output.pdf")
  ```

  After running the above code snippet you can find `output.pdf` at the location you specified,
- and it should look like [this](https://github.com/chinapandaman/PyPDFForm/raw/master/pdf_samples/adobe_mode/sample_filled.pdf).
+ and it should look like [this](https://chinapandaman.github.io/PyPDFForm/pdfs/sample_filled.pdf).

  ## Documentation

@@ -0,0 +1,35 @@
+PyPDFForm/__init__.py,sha256=ToO1r5zbCDJutiMz36Y_5mpA35NPoRtwTAsABtKUGEQ,925
+PyPDFForm/adapter.py,sha256=LBxHth0qJFB6rdByRJbsn4x0dftCOAolKVutZeFZm9E,2634
+PyPDFForm/constants.py,sha256=Y5l1qIZGPsSoMl55bOsXaHf3yAY36_b-8KRxTLxXGmk,2541
+PyPDFForm/coordinate.py,sha256=veYOlRyFKIvzLISYA_f-drNBiKOzFwr0EIFCaUAzGgo,4428
+PyPDFForm/filler.py,sha256=fqGIxT3FR3cWo3SMTDYud6Ocs9SZBmSpFv5yg1v19Wk,8450
+PyPDFForm/font.py,sha256=opZjAacsIRFcERXWegPXkOSpmnRrv4y_50yD0_BjWPM,10273
+PyPDFForm/hooks.py,sha256=3ugnhnrB4nFsGL6fc1TtT5Nf_J2QOtM5ZQsm6WVpErY,14279
+PyPDFForm/image.py,sha256=P1P3Ejm8PVPQwpJFGAesjtwS5hxnVItrj75TE3WnFhM,4607
+PyPDFForm/patterns.py,sha256=HbTqzFllQ1cW3CqyNEfVh0qUMeFerbvOd0-HQnkifQQ,9765
+PyPDFForm/template.py,sha256=Jvx99HjLcEG8fZQeGSPZEFcITa4jauPSvenj3XgAf3c,11046
+PyPDFForm/utils.py,sha256=JavhAO4HmYRdujlsPXcZWGXTf7wDXzj4uU1XGRFsAaA,13257
+PyPDFForm/watermark.py,sha256=BJ8NeZLKf-MuJ2XusHiALaQpoqE8j6hHGbWcNhpjxN0,11299
+PyPDFForm/wrapper.py,sha256=KTFou6cXrHtLHVKwngoIr4Pwu4vOfjXY0cWRNNDlW0U,28866
+PyPDFForm/middleware/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+PyPDFForm/middleware/base.py,sha256=ZmJFh3nSxj6PFjqBqsLih0pXKtcm1o-ctJVWn0v6bbI,3278
+PyPDFForm/middleware/checkbox.py,sha256=OCSZEFD8wQG_Y9qO7Os6VXTaxJCpkRYTxI4wDgG0GZc,1870
+PyPDFForm/middleware/dropdown.py,sha256=pfiMuAOr3ze7eboCB55UKaSR89oLNhvHGvNmDGWHVS0,3855
+PyPDFForm/middleware/image.py,sha256=eKM7anU56jbaECnK6rq0jGsBRY3HW_fM86fgA3hq7xA,585
+PyPDFForm/middleware/radio.py,sha256=PuGDJ8RN1C-MkL9Jf14ABWYV67cN18R66dI4nR-03DU,2211
+PyPDFForm/middleware/signature.py,sha256=P6Mg9AZP5jML7GawsteVZjDaunKb9Yazu5iy0qF60bo,2432
+PyPDFForm/middleware/text.py,sha256=GLKuYvG4BUtNvj-3NkDeIlV1jcouhn7gAqfm9TBWduQ,3936
+PyPDFForm/widgets/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+PyPDFForm/widgets/base.py,sha256=vudfjwlybj82pxQhy6K8Qds9osFqn219Ze3yMs8QQuU,5786
+PyPDFForm/widgets/bedrock.py,sha256=j6beU04kaQzpAIFZHI5VJLaDT5RVAAa6LzkU1luJpN8,137660
+PyPDFForm/widgets/checkbox.py,sha256=5Cg07d3SmRehkSiROtkK2vl0WKITxmv9BAKVnem8keM,1325
+PyPDFForm/widgets/dropdown.py,sha256=6zZwt6eU9Hgwl-57QfyT3G6c37FkQTJ-XSsXGluWevs,1459
+PyPDFForm/widgets/image.py,sha256=aSD-3MEZFIRL7HYVuO6Os8irfSUOLHA_rHGkqcEIPPA,855
+PyPDFForm/widgets/radio.py,sha256=oFw8Um4g414UH93QJv6dZHRxpq0yuYog09B2W3eE8wo,2612
+PyPDFForm/widgets/signature.py,sha256=L4Et6pxtrEh7U-lnnLZrnvb_dKwGNpI6TZ11HCD0lvY,5147
+PyPDFForm/widgets/text.py,sha256=GjPwajoP20dZMlJGhJrQtwOa4VHGInjYkjUYmLwtRWs,1584
+pypdfform-3.3.0.dist-info/licenses/LICENSE,sha256=43awmYkI6opyTpg19me731iO1WfXZwViqb67oWtCsFY,1065
+pypdfform-3.3.0.dist-info/METADATA,sha256=K7q2yHg1rUw5hwO4gnmgNB8fR-UjPWce-z-0YY7gbWU,4538
+pypdfform-3.3.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+pypdfform-3.3.0.dist-info/top_level.txt,sha256=GQQKuWqPUjT9YZqwK95NlAQzxjwoQrsxQ8ureM8lWOY,10
+pypdfform-3.3.0.dist-info/RECORD,,
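Each RECORD row above has the shape `path,sha256=<digest>,size`, where the digest is the SHA-256 of the file encoded as URL-safe base64 with the trailing `=` padding removed (the wheel RECORD convention). A minimal sketch of recomputing such a digest follows; the helper name `record_digest` is an assumption made for illustration and is not taken from the package.

```python
import base64
import hashlib


def record_digest(payload: bytes) -> str:
    """Return a RECORD-style hash string for the given file contents.

    Hypothetical helper: SHA-256, URL-safe base64, '=' padding stripped.
    """
    raw = hashlib.sha256(payload).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


# A zero-byte file, such as an empty __init__.py, hashes to a fixed value.
empty_digest = record_digest(b"")
```

For empty input this yields `sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`, the value listed for the two zero-size `__init__.py` entries. To check any other row, read the installed file in binary mode and compare the result with the listed string.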
@@ -1,35 +0,0 @@
-PyPDFForm/__init__.py,sha256=Si2S1NESLXeTXj-HLtHdFCntkfhCP3WIG8wCgLdWkbk,925
-PyPDFForm/adapter.py,sha256=8E_PZlXU1ALWez_pWF_U52cynzowK_NQFYzMJoH9VUk,2428
-PyPDFForm/constants.py,sha256=GU0LcNbN-ttYQVVoFGQLysKByJYF4lKoMideU65z_wI,2523
-PyPDFForm/coordinate.py,sha256=VMVkqa-VAGJSGVvetZwOxeMzIgQtQdvtn_DI_qSecCE,3876
-PyPDFForm/filler.py,sha256=KwStL6YzrNBcDd919ig83MnAxopi8Vnz3QNJzN_CjNM,7272
-PyPDFForm/font.py,sha256=Nyk1dHgC9NBkXDTYiGz9eyCwHadpU-JjR-xOM604cpA,9053
-PyPDFForm/hooks.py,sha256=A8p67ubWvCvzRt346Q7BjOvbi4_NXBcynXmq6fJTadY,11679
-PyPDFForm/image.py,sha256=CAC69jEfSbWbyNJcjLhjWVSNJuFh7frMI70eaiFayHw,3823
-PyPDFForm/patterns.py,sha256=RiQKqsOMrB9u4KWj5Kv6GUmcuGI77xMvdOcOcHy_9qE,8717
-PyPDFForm/template.py,sha256=lKkja_8Sx6vun1tOklSpdNT1pdelhfVl10kX-G4sLlA,9673
-PyPDFForm/utils.py,sha256=hLSVUG6qnE0iTMB-yPNQQIhmm3R69X7fcnbCTDvSUQs,11001
-PyPDFForm/watermark.py,sha256=9p1tjaIqicXngTNai_iOEkCoXRYnR66azB4s7wNsZUw,9349
-PyPDFForm/wrapper.py,sha256=Ysd1mkvE5OfDkjUI6mNFIt16-JUKwj2ifbp1MioWPUo,25257
-PyPDFForm/middleware/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-PyPDFForm/middleware/base.py,sha256=zBO9YP01dAEfFKoHKDg10XcpXEuYdFd-pb5wSFmJJj0,3091
-PyPDFForm/middleware/checkbox.py,sha256=OCSZEFD8wQG_Y9qO7Os6VXTaxJCpkRYTxI4wDgG0GZc,1870
-PyPDFForm/middleware/dropdown.py,sha256=4HkVNHoYzH0isdBIdjNtViBx263j4KmYtW0SYzER5zQ,2412
-PyPDFForm/middleware/image.py,sha256=eKM7anU56jbaECnK6rq0jGsBRY3HW_fM86fgA3hq7xA,585
-PyPDFForm/middleware/radio.py,sha256=PuGDJ8RN1C-MkL9Jf14ABWYV67cN18R66dI4nR-03DU,2211
-PyPDFForm/middleware/signature.py,sha256=a2IfD36zpEWXWNNWRvtJ6nG6TszkF6Wil82Szsbjfns,2149
-PyPDFForm/middleware/text.py,sha256=GLKuYvG4BUtNvj-3NkDeIlV1jcouhn7gAqfm9TBWduQ,3936
-PyPDFForm/widgets/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-PyPDFForm/widgets/base.py,sha256=kOfxV8HSmwXoy0vFYPgJKKDQ7RUNMACWfX6T4HeJeOU,5177
-PyPDFForm/widgets/bedrock.py,sha256=j6beU04kaQzpAIFZHI5VJLaDT5RVAAa6LzkU1luJpN8,137660
-PyPDFForm/widgets/checkbox.py,sha256=s4a0a1pAemQyrz3SpZHzIPoVLvJZAV72KEfxKp15dyk,1281
-PyPDFForm/widgets/dropdown.py,sha256=6zZwt6eU9Hgwl-57QfyT3G6c37FkQTJ-XSsXGluWevs,1459
-PyPDFForm/widgets/image.py,sha256=aSD-3MEZFIRL7HYVuO6Os8irfSUOLHA_rHGkqcEIPPA,855
-PyPDFForm/widgets/radio.py,sha256=nWSQQp06kRISO7Q7FVFeB3PXYvMOSc0SMhRs1bHTxeQ,2261
-PyPDFForm/widgets/signature.py,sha256=EqIRIuKSQEg8LJZ_Mu859eEvs0dwO-mzkMNuhHG1Vsg,4034
-PyPDFForm/widgets/text.py,sha256=gtheE6_w0vQPRJJ9oj_l9FaMDEGnPtvVR6_axsrmxKI,1540
-pypdfform-3.1.3.dist-info/licenses/LICENSE,sha256=43awmYkI6opyTpg19me731iO1WfXZwViqb67oWtCsFY,1065
-pypdfform-3.1.3.dist-info/METADATA,sha256=2CB-pD0wqfNYa7yfdRcwCsxqzv2nfXE7asKLaSkyT20,4587
-pypdfform-3.1.3.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-pypdfform-3.1.3.dist-info/top_level.txt,sha256=GQQKuWqPUjT9YZqwK95NlAQzxjwoQrsxQ8ureM8lWOY,10
-pypdfform-3.1.3.dist-info/RECORD,,
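Comparing the two RECORD listings row by row shows which files actually changed between the versions, since an unchanged file keeps both its digest and its size. The sketch below parses RECORD text as CSV and reports paths whose entries differ. `parse_record` and `changed_paths` are hypothetical helper names, and the sample rows are copied from the two listings above.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, Optional[int]]  # (hash string, size in bytes or None)


def parse_record(text: str) -> Dict[str, Entry]:
    """Parse RECORD rows of the form `path,hash,size` into a dict keyed by path."""
    entries: Dict[str, Entry] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        path, digest, size = row
        # The RECORD file lists itself with empty hash and size fields.
        entries[path] = (digest, int(size) if size else None)
    return entries


def changed_paths(old: Dict[str, Entry], new: Dict[str, Entry]) -> List[str]:
    """Return paths present in both listings whose hash or size differs."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])


OLD = """\
PyPDFForm/adapter.py,sha256=8E_PZlXU1ALWez_pWF_U52cynzowK_NQFYzMJoH9VUk,2428
PyPDFForm/widgets/bedrock.py,sha256=j6beU04kaQzpAIFZHI5VJLaDT5RVAAa6LzkU1luJpN8,137660
"""
NEW = """\
PyPDFForm/adapter.py,sha256=LBxHth0qJFB6rdByRJbsn4x0dftCOAolKVutZeFZm9E,2634
PyPDFForm/widgets/bedrock.py,sha256=j6beU04kaQzpAIFZHI5VJLaDT5RVAAa6LzkU1luJpN8,137660
"""

changed = changed_paths(parse_record(OLD), parse_record(NEW))
```

With these two sample rows, `changed` contains only `PyPDFForm/adapter.py`, because `bedrock.py` has the same digest and size in both versions. Paths under the versioned `dist-info` directory differ by name between releases, so this sketch would treat them as added or removed rather than changed.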