acroforge 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c97cab951d5efe0d85539bcc55b368277b95bd1d6595141459be383a1a818c2e
4
- data.tar.gz: 7156afa0720c474b3ee0b287cb994ccb3438c1f58cd10ff297673d19d6b34a30
3
+ metadata.gz: 6fa6d2f65a5a566457acc4a635dcc0ade68017f5aac69f0ea6b1e48e092db8af
4
+ data.tar.gz: d38c53fd44947a5b690fe84fa0674b55260ce897ef94f6828abb6e135b50447c
5
5
  SHA512:
6
- metadata.gz: 35a79c84d8dd14affaa2e99a05c1b40fa7df8942e3e217adcaf05457de1e3303b49f7c9029e4768fa547ea83f41f96be7ece12cd889bc3efac37424a2d97ebbb
7
- data.tar.gz: e0c282b00ddc2f38321a1e2979028886c614fbea59651081f2a9f2d581b2c07b5abb41374ba1a078df2fd7f382b81a7ccf774bd4b5e16aff089487410852020b
6
+ metadata.gz: c73b051b74fff85ac980a1295359f118843f26d3708a75943e3dcde874f2306b90921dd1eeeb024b5ab515a103c14409f674d7a09a2cf7c8cb79532798499bb3
7
+ data.tar.gz: 8440de33e8f66f72ba88c98659fe4a3c527d2ab036820a280fbf65f112503a8eb8ff342b652971a4c5366558b873674e2c6679544b3ed58cf522ee3fc3939308
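A `.gem` package is a tar archive that contains `metadata.gz`, `data.tar.gz`, and a gzipped copy of the `checksums.yaml` shown above. The sketch below checks extracted parts against those recorded digests using only Ruby's `digest` stdlib. The helper name `checksum_mismatches` and its input shapes are illustrative assumptions. They are not part of acroforge or of the RubyGems API.

```ruby
require "digest/sha2"

# Hypothetical helper (not a RubyGems API): returns [algorithm, name] pairs
# whose recorded hex digest does not match the given bytes.
# `checksums` mirrors checksums.yaml, e.g. { "SHA256" => { "data.tar.gz" => "..." } };
# `parts` maps archive member names to their raw bytes (e.g. from File.binread).
def checksum_mismatches(checksums, parts)
  digests = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  checksums.flat_map do |algorithm, entries|
    digest = digests.fetch(algorithm)
    entries.filter_map do |name, expected_hex|
      actual_hex = digest.hexdigest(parts.fetch(name))
      [algorithm, name] unless actual_hex == expected_hex
    end
  end
end

parts = { "data.tar.gz" => "example bytes" }
checksums = {
  "SHA256" => { "data.tar.gz" => Digest::SHA256.hexdigest("example bytes") },
  "SHA512" => { "data.tar.gz" => "0" * 128 } # deliberately wrong
}
p checksum_mismatches(checksums, parts) # prints [["SHA512", "data.tar.gz"]]
```

An empty result means every listed digest matched.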
data/CHANGELOG.md CHANGED
@@ -1,5 +1,25 @@
1
1
  # CHANGELOG
2
2
 
3
+ ## [0.3.0] - 2026-06-01
4
+
5
+ ### Added
6
+
7
+ - **`acroforge fields <pdf>` CLI subcommand** — lists every AcroForm field's name, type, and alternate name as an aligned table, or as JSON with `--json`. Decoded options maps render inline in the table and as nested objects in JSON. A PDF with no AcroForm fields prints a notice (or an empty JSON array with `--json`) and still exits `0`.
8
+ - **Configurable normalized output path for `compile!`** — `Engine#compile!` now accepts a `normalized_out:` kwarg that writes the normalized PDF to an exact path; omitting it keeps the existing constructor-derived `<base>_normalized.pdf` location exactly as before. Passing the template's own path as `normalized_out:` overwrites the template safely in place: the PDF is written to a sibling temp file first, then moved over the original. The `acroforge compile` CLI now keeps the normalized template by default (next to the input, as `<base>_normalized.pdf`) instead of discarding it, and gains two mutually exclusive flags: `--out PATH` writes it to an explicit path, and `--overwrite` rewrites the input PDF in place.
9
+
10
+ ### Changed
11
+
12
+ - **`Engine#fields` decodes persisted options maps.** When `/TU` holds the JSON options map that `compile!` writes for button/choice fields, `:alternate_name` is now returned as a symbol-keyed Hash (e.g. `{ male: "0", female: "1" }`) instead of the raw JSON string. Plain-text tooltips and missing `/TU` entries are unaffected (still returned as a String or `nil`).
13
+ - **Image stamping extracted to `AcroForge::ImageStamper`** (internal). `Engine#fill!` behavior, the `ImageTooLargeError` / `UnsupportedImageFormatError` classes, and the validation rules are unchanged. The `MAX_IMAGE_BYTES` / `MAX_IMAGE_DIMENSION` / `TARGET_PPI` constants moved from `Engine` to `ImageStamper`, so code that referenced `AcroForge::Engine::MAX_IMAGE_BYTES` (or its siblings) must switch to `AcroForge::ImageStamper::MAX_IMAGE_BYTES`.
14
+
15
+ ### Fixed
16
+
17
+ - **Choice fields now auto-map.** `compile!` built options maps for combo/list boxes from `/Opt` but never ran the spatial label lookup on them, so a garbage-named choice field always landed in `unmapped` and its options were discarded. Choice fields now get the same nearest-label resolution as text fields, and their options persist to `/TU` and `select_options`.
18
+
19
+ ### Internal
20
+
21
+ - Synthetic fixture PDFs now carry `/AP` appearance streams on radio/checkbox widgets, so `compile!`'s options-map discovery runs against them in CI. A new round-trip spec covers `compile!` → `fields` → `fill!` through the persisted `/TU` map.
22
+
3
23
  ## [0.2.0] - 2026-05-28
4
24
 
5
25
  ### Changed (breaking)
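The `/TU` decoding described under **Changed** can be tried without a PDF. This sketch is a standalone copy of the rule that `Engine#decode_alternate_name` applies in this release. Only a String that starts with `{` and parses as a JSON object becomes a symbol-keyed Hash. Plain tooltips, `nil`, JSON arrays, and malformed JSON pass through untouched. It is an illustrative copy, not a public acroforge API.

```ruby
require "json"

# Standalone copy of the 0.3.0 rule: JSON objects stored in /TU become
# symbol-keyed hashes; tooltips, nil, arrays, and malformed JSON pass through.
def decode_alternate_name(value)
  return value unless value.is_a?(String) && value.lstrip.start_with?("{")
  parsed = JSON.parse(value, symbolize_names: true)
  parsed.is_a?(Hash) ? parsed : value
rescue JSON::ParserError
  value
end

decode_alternate_name('{"male":"0","female":"1"}') # returns { male: "0", female: "1" }
decode_alternate_name("Applicant's full name")     # returns the tooltip unchanged
decode_alternate_name("{not json")                 # returns "{not json" (parse error swallowed)
```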
data/README.md CHANGED
@@ -56,17 +56,19 @@ $ acroforge bootstrap broken_form.pdf
56
56
  ## CLI
57
57
 
58
58
  ```text
59
+ acroforge fields <pdf> [--json]
59
60
  acroforge schema infer <pdf> [--out schema.yml] [--sections a,b,c]
60
61
  acroforge relabel propose <pdf> [--out mapping.yml] [--schema schema.yml] [--merge|--overwrite]
61
62
  acroforge relabel apply <pdf> <mapping.yml>
62
- acroforge compile <pdf> [--schema schema.yml]
63
+ acroforge compile <pdf> [--schema schema.yml] [--out normalized.pdf | --overwrite]
63
64
  acroforge bootstrap <pdf> [--schema-out s.yml] [--mapping-out m.yml]
64
65
  acroforge version
65
66
  acroforge help
66
67
  ```
67
68
 
68
69
  | Subcommand | What it does |
69
- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
70
+ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
71
+ | `fields` | Lists every AcroForm field (name, type, alternate name) as a table, or as JSON with `--json`. Read-only; the quickest way to inspect a PDF's form fields. |
70
72
  | `schema infer` | Runs the heuristic on a PDF and writes a starter schema (canonical key → type + variations). Advisory; you review and edit. |
71
73
  | `relabel propose` | Writes a YAML mapping file proposing a semantic name for every AcroForm field. Sorted by page → top-to-bottom → left-to-right. Default mode `--merge` preserves any `key`/`type` values you've already edited. |
72
74
  | `relabel apply` | Reads a corrected mapping file and rewrites `field[:T]` / `field[:TU]` in the source PDF in place. Auto-disambiguates collisions (`full_name`, `full_name_1`, ...). |
@@ -178,7 +180,7 @@ AcroForge also accepts a legacy "shorthand" form where the value is just an arra
178
180
  _meta:
179
181
  source_pdf: broken_form.pdf
180
182
  generated_at: 2026-05-26T14:32:11Z
181
- acroforge_version: 0.2.0
183
+ acroforge_version: 0.3.0
182
184
  total_fields: 98
183
185
 
184
186
  page0_field6:
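The `compile` usage line above, together with the 0.3.0 changelog, defines how the output path is chosen. `--out` and `--overwrite` are mutually exclusive. `--overwrite` targets the input PDF itself. With neither flag, the normalized file lands next to the input as `<base>_normalized.pdf`. The helper below is a hypothetical sketch of that decision. The real CLI delegates the default location to `Engine`, and this sketch assumes its derivation is "basename without extension, plus `_normalized.pdf`".

```ruby
# Hypothetical sketch of compile's output-path decision (not acroforge's API).
# The default "<base>_normalized.pdf" next to the input is an assumption
# based on the changelog; the real default is derived inside Engine.
def resolve_normalized_path(pdf, out: nil, overwrite: false)
  raise ArgumentError, "--out and --overwrite are mutually exclusive" if out && overwrite
  return pdf if overwrite # mirrors `out ||= pdf if overwrite` in cmd_compile
  return out if out

  base = File.basename(pdf, File.extname(pdf))
  File.join(File.dirname(pdf), "#{base}_normalized.pdf")
end

resolve_normalized_path("forms/intake.pdf")                    # => "forms/intake_normalized.pdf"
resolve_normalized_path("forms/intake.pdf", out: "clean.pdf")  # => "clean.pdf"
resolve_normalized_path("forms/intake.pdf", overwrite: true)   # => "forms/intake.pdf"
```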
data/lib/acroforge/cli.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  require "optparse"
4
4
  require "yaml"
5
+ require "json"
5
6
  require_relative "../acroforge"
6
7
 
7
8
  module AcroForge
@@ -11,7 +12,7 @@ module AcroForge
11
12
  EXIT_VALIDATION_ERROR = 2
12
13
  EXIT_INTERNAL_ERROR = 3
13
14
 
14
- SUBCOMMANDS = %w[schema relabel compile bootstrap annotate prepare version help].freeze
15
+ SUBCOMMANDS = %w[fields schema relabel compile bootstrap annotate prepare version help].freeze
15
16
 
16
17
  module_function
17
18
 
@@ -48,11 +49,12 @@ module AcroForge
48
49
  acroforge: PDF AcroForm engine + relabeler
49
50
 
50
51
  Usage:
52
+ acroforge fields <pdf> [--json]
51
53
  acroforge schema infer <pdf> [--out schema.yml] [--sections a,b,c] [-v]
52
54
  acroforge schema merge <mapping.yml> [--schema schema.yml] [--out schema.yml]
53
55
  acroforge relabel propose <pdf> [--out mapping.yml] [--schema schema.yml] [--merge|--overwrite] [-v]
54
56
  acroforge relabel apply <pdf> <mapping.yml> [--annotate[=PATH]] [-v]
55
- acroforge compile <pdf> [--schema schema.yml]
57
+ acroforge compile <pdf> [--schema schema.yml] [--out normalized.pdf | --overwrite]
56
58
  acroforge bootstrap <pdf> [--schema-out s.yml] [--mapping-out m.yml] [-v]
57
59
  acroforge annotate <pdf> [--mapping mapping.yml] [--out annotated.pdf]
58
60
  acroforge prepare <pdf> [--out prepared.pdf] [--schema schema.yml]
@@ -101,6 +103,42 @@ module AcroForge
101
103
  puts "Applied to #{pdf}: #{parts.join(", ")}."
102
104
  end
103
105
 
106
+ def cmd_fields(argv)
107
+ json = false
108
+ OptionParser.new { |o| o.on("--json") { json = true } }.parse!(argv)
109
+ pdf = argv.shift
110
+ raise ArgumentError, "usage: acroforge fields <pdf> [--json]" unless pdf
111
+ raise Errno::ENOENT, pdf unless File.exist?(pdf)
112
+
113
+ fields = AcroForge::Engine.new(pdf).fields
114
+
115
+ if json
116
+ puts JSON.pretty_generate(fields)
117
+ elsif fields.empty?
118
+ puts "No AcroForm fields found in #{pdf}."
119
+ else
120
+ print_fields_table(fields)
121
+ end
122
+ EXIT_OK
123
+ end
124
+
125
+ def print_fields_table(fields)
126
+ headers = ["NAME", "TYPE", "ALTERNATE NAME"]
127
+ rows = fields.map { |f| [f[:name].to_s, f[:type].to_s, format_alternate_name(f[:alternate_name])] }
128
+ widths = headers.each_with_index.map { |h, i| ([h] + rows.map { |r| r[i] }).map(&:length).max }
129
+ ([headers] + rows).each do |row|
130
+ puts row.each_with_index.map { |cell, i| cell.ljust(widths[i]) }.join(" ").rstrip
131
+ end
132
+ end
133
+
134
+ def format_alternate_name(alt)
135
+ case alt
136
+ when nil then "—"
137
+ when Hash then "{#{alt.map { |k, v| "#{k}: #{v.inspect}" }.join(", ")}}"
138
+ else alt.to_s
139
+ end
140
+ end
141
+
104
142
  def cmd_schema(argv)
105
143
  action = argv.shift
106
144
  case action
@@ -233,20 +271,26 @@ module AcroForge
233
271
 
234
272
  def cmd_compile(argv)
235
273
  schema_path = nil
274
+ out = nil
275
+ overwrite = false
236
276
  OptionParser.new do |opts|
237
277
  opts.on("--schema PATH") { |v| schema_path = v }
278
+ opts.on("--out PATH", "Write the normalized PDF to PATH") { |v| out = v }
279
+ opts.on("--overwrite", "Write the normalized PDF back over the input PDF in place") { overwrite = true }
238
280
  end.parse!(argv)
239
281
  pdf = argv.shift
240
282
  raise ArgumentError, "missing <pdf> argument" if pdf.nil?
283
+ raise ArgumentError, "--out and --overwrite are mutually exclusive" if out && overwrite
241
284
  raise Errno::ENOENT, pdf unless File.exist?(pdf)
242
285
 
243
286
  schema = schema_path ? AcroForge::Schema.load(schema_path) : {}
244
- require "tmpdir"
245
- Dir.mktmpdir do |tmp|
246
- engine = AcroForge::Engine.new(pdf, schema: schema, normalized_dir: tmp)
247
- result = engine.compile!
248
- puts "Mapped: #{result[:mapped].size}, Unmapped: #{result[:unmapped].size}"
249
- end
287
+ out ||= pdf if overwrite
288
+
289
+ engine = AcroForge::Engine.new(pdf, schema: schema, normalized_dir: out ? File.dirname(out) : nil)
290
+ result = engine.compile!(normalized_out: out)
291
+ puts "Mapped: #{result[:mapped].size}, Unmapped: #{result[:unmapped].size}"
292
+ where = overwrite ? "#{engine.normalized_path} (in place)" : engine.normalized_path
293
+ puts "Wrote #{where}: normalized template."
250
294
  EXIT_OK
251
295
  end
252
296
 
@@ -335,7 +379,7 @@ module AcroForge
335
379
  require "tmpdir"
336
380
  Dir.mktmpdir do |tmp|
337
381
  engine = AcroForge::Engine.new(pdf, normalized_dir: tmp)
338
- silenced(verbose: verbose) { engine.compile! }
382
+ silenced(verbose: verbose) { engine.compile!(announce_output: false) }
339
383
 
340
384
  schema = AcroForge::Schema.infer(pdf, engine: engine)
341
385
  AcroForge::Schema.dump(schema, schema_out)
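The output of `print_fields_table` and `format_alternate_name` in `cli.rb` is easier to see with a standalone sketch that returns lines instead of printing them. The logic mirrors the two methods in this diff. The `render_fields_table` name and the sample field hashes are hypothetical.

```ruby
# Standalone copies of the cli.rb table helpers; render_fields_table is a
# hypothetical variant of print_fields_table that returns lines instead of
# printing them.
def format_alternate_name(alt)
  case alt
  when nil then "\u2014" # the dash placeholder cli.rb prints when /TU is missing
  when Hash then "{#{alt.map { |k, v| "#{k}: #{v.inspect}" }.join(", ")}}"
  else alt.to_s
  end
end

def render_fields_table(fields)
  headers = ["NAME", "TYPE", "ALTERNATE NAME"]
  rows = fields.map { |f| [f[:name].to_s, f[:type].to_s, format_alternate_name(f[:alternate_name])] }
  # Each column is as wide as its widest cell, header included.
  widths = headers.each_index.map { |i| ([headers[i]] + rows.map { |r| r[i] }).map(&:length).max }
  ([headers] + rows).map do |row|
    # Pad every cell, join, then drop the trailing padding of the last column.
    row.each_with_index.map { |cell, i| cell.ljust(widths[i]) }.join(" ").rstrip
  end
end

fields = [
  { name: "full_name", type: :text, alternate_name: "Full name" },
  { name: "gender", type: :radio, alternate_name: { male: "0", female: "1" } },
  { name: "page0_field6", type: :text, alternate_name: nil }
]
puts render_fields_table(fields)
```

A decoded options map shows up inline as `{male: "0", female: "1"}`, which is why the changelog describes maps as rendering "inline" in table mode.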
data/lib/acroforge/engine.rb CHANGED
@@ -9,19 +9,10 @@ require_relative "all_text_processor"
9
9
  require_relative "validator"
10
10
  require_relative "constants"
11
11
  require_relative "labels"
12
+ require_relative "image_stamper"
12
13
 
13
14
  module AcroForge
14
- ImageTooLargeError = Class.new(Error)
15
- UnsupportedImageFormatError = Class.new(Error)
16
-
17
15
  class Engine
18
- # Caps a phone-camera passport photo from bloating the output PDF.
19
- MAX_IMAGE_BYTES = 5 * 1024 * 1024
20
- MAX_IMAGE_DIMENSION = 4000
21
- # Auto-downsample images whose pixel resolution far exceeds this PPI
22
- # at the widget's rendered size. Requires ImageMagick on PATH.
23
- TARGET_PPI = 200
24
-
25
16
  attr_reader :template_path, :schema, :overrides, :sections, :normalized_path,
26
17
  :mapped_fields, :unmapped_fields, :filled_fields, :missing_fields,
27
18
  :select_field_options, :new_fields_detected
@@ -68,7 +59,7 @@ module AcroForge
68
59
  elsif field.is_a?(HexaPDF::Type::AcroForm::ChoiceField) then :choice
69
60
  else :other
70
61
  end
71
- extracted << {name: field.full_field_name, type: type, alternate_name: field[:TU]}
62
+ extracted << {name: field.full_field_name, type: type, alternate_name: decode_alternate_name(field[:TU])}
72
63
  end
73
64
  extracted
74
65
  end
@@ -123,7 +114,8 @@ module AcroForge
123
114
  # every field, rename each to its proposed semantic key in place, and
124
115
  # persist the result to @normalized_path. The returned hash is what the
125
116
  # Relabeler and Schema.infer consume.
126
- def compile!
117
+ def compile!(normalized_out: nil, announce_output: true)
118
+ @normalized_path = normalized_out if normalized_out
127
119
  puts ">> Compiling template: #{@template_path}"
128
120
  form = source_doc.acro_form(create: true)
129
121
 
@@ -320,6 +312,9 @@ module AcroForge
320
312
  end
321
313
  end
322
314
  end
315
+
316
+ raw_label = find_nearest_text(page_text_map[page_index], widget[:Rect], mode: :standard)
317
+ raw_label = AcroForge::Labels.humanize(raw_label)
323
318
  else
324
319
  field_rect = widget[:Rect]
325
320
  raw_label = find_nearest_text(page_text_map[page_index], field_rect, mode: :standard)
@@ -464,9 +459,9 @@ module AcroForge
464
459
  end
465
460
  end
466
461
 
467
- source_doc.write(@normalized_path, optimize: true)
462
+ write_normalized!
468
463
  puts ">> Compilation Complete. #{mapped_count} fields mapped."
469
- puts ">> Clean template saved to: #{@normalized_path}\n\n"
464
+ puts ">> Clean template saved to: #{@normalized_path}\n\n" if announce_output
470
465
 
471
466
  {
472
467
  mapped: @mapped_fields,
@@ -506,7 +501,7 @@ module AcroForge
506
501
  if doc_field
507
502
  begin
508
503
  if image_upload?(doc_field) && image_path?(value)
509
- stamp_image_on_widget(normalized_doc, doc_field, value)
504
+ image_stamper.stamp!(normalized_doc, doc_field, value)
510
505
  @filled_fields[key] = value
511
506
  puts " [Stamped] :#{key} <- #{value}"
512
507
  next
@@ -559,7 +554,7 @@ module AcroForge
559
554
  doc_field.field_value = on_state_sym.to_s
560
555
  doc_field.each_widget { |w| w[:AS] = on_state_sym }
561
556
  elsif ["false", "no", "off", "0"].include?(normalized_val)
562
- doc_field.field_value = "Off"
557
+ doc_field.field_value = :Off
563
558
  doc_field.each_widget { |w| w[:AS] = :Off }
564
559
  else
565
560
  doc_field.field_value = value.to_s
@@ -610,6 +605,36 @@ module AcroForge
610
605
 
611
606
  private
612
607
 
608
+ # Writing directly over the template that HexaPDF still holds open can corrupt or
609
+ # truncate it (the writer reads lazily-loaded objects from that same handle).
610
+ # When the target is the source, stage to a sibling temp file and atomically
611
+ # move it into place once the write has fully completed.
612
+ def write_normalized!
613
+ same_file = File.expand_path(@normalized_path) == File.expand_path(@template_path)
614
+ unless same_file
615
+ source_doc.write(@normalized_path, optimize: true)
616
+ return
617
+ end
618
+
619
+ require "tmpdir"
620
+ require "fileutils"
621
+ tmp = File.join(File.dirname(@normalized_path), ".#{File.basename(@normalized_path)}.#{Process.pid}.tmp")
622
+ source_doc.write(tmp, optimize: true)
623
+ FileUtils.mv(tmp, @normalized_path)
624
+ ensure
625
+ FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
626
+ end
627
+
628
+ # compile! repurposes /TU to persist button/choice option maps as JSON
629
+ # (see fill!'s decode). Surface those as hashes; real tooltips stay strings.
630
+ def decode_alternate_name(value)
631
+ return value unless value.is_a?(String) && value.lstrip.start_with?("{")
632
+ parsed = JSON.parse(value, symbolize_names: true)
633
+ parsed.is_a?(Hash) ? parsed : value
634
+ rescue JSON::ParserError
635
+ value
636
+ end
637
+
613
638
  def confidence_for(raw_label, target_key)
614
639
  return :none if raw_label.nil? || raw_label.strip.empty?
615
640
  return :high if target_key && @schema.key?(target_key.to_s.to_sym)
@@ -642,161 +667,8 @@ module AcroForge
642
667
  value.is_a?(String) && File.file?(value)
643
668
  end
644
669
 
645
- def stamp_image_on_widget(doc, field, path)
646
- format, image_width, image_height = validate_image!(path)
647
- widget = field.each_widget.first
648
- return unless widget && widget[:Rect]
649
-
650
- page = doc.pages.find { |candidate_page| candidate_page[:Annots]&.include?(widget) }
651
- return unless page
652
-
653
- # Widget Rect is absolute page coords; canvas API is MediaBox-relative.
654
- media_box_x, media_box_y = page.box.value[0], page.box.value[1]
655
- rect_x_min, rect_y_min, rect_x_max, rect_y_max = widget[:Rect]
656
- slot_width = rect_x_max - rect_x_min
657
- slot_height = rect_y_max - rect_y_min
658
- slot_canvas_x = rect_x_min - media_box_x
659
- slot_canvas_y = rect_y_min - media_box_y
660
-
661
- stamp_path = prepare_image_for_slot(path, format, image_width, image_height,
662
- slot_width, slot_height) || path
663
- if stamp_path != path
664
- _, image_width, image_height = image_dimensions(stamp_path)
665
- end
666
-
667
- draw_width, draw_height = fit_inside(image_width, image_height, slot_width, slot_height)
668
- draw_x = slot_canvas_x + (slot_width - draw_width) / 2.0
669
- draw_y = slot_canvas_y + (slot_height - draw_height) / 2.0
670
-
671
- canvas = page.canvas(type: :overlay)
672
- canvas.fill_color(255, 255, 255)
673
- canvas.rectangle(slot_canvas_x, slot_canvas_y, slot_width, slot_height).fill
674
- canvas.image(stamp_path, at: [draw_x, draw_y], width: draw_width, height: draw_height)
675
-
676
- # Bake into the page so the widget's empty appearance doesn't repaint over the image.
677
- page[:Annots].delete(widget)
678
- end
679
-
680
- def fit_inside(image_width, image_height, slot_width, slot_height)
681
- scale = [slot_width.to_f / image_width, slot_height.to_f / image_height].min
682
- [image_width * scale, image_height * scale]
683
- end
684
-
685
- # Trim removes the transparent border around a signature; downsample
686
- # caps source resolution at TARGET_PPI for the widget's longer side.
687
- def prepare_image_for_slot(path, format, image_width, image_height,
688
- slot_width_pt, slot_height_pt)
689
- return nil unless imagemagick_available?
690
- slot_max_pt = [slot_width_pt, slot_height_pt].max
691
- target_max_px = (slot_max_pt * TARGET_PPI / 72.0).ceil
692
- needs_resize = image_width > target_max_px * 2 || image_height > target_max_px * 2
693
- needs_trim = format == :png && png_with_alpha?(path)
694
- return nil unless needs_resize || needs_trim
695
-
696
- ext = (format == :png) ? ".png" : ".jpg"
697
- require "securerandom"
698
- require "tmpdir"
699
- output_path = File.join(Dir.tmpdir,
700
- "acroforge_stamp_#{Process.pid}_#{SecureRandom.hex(4)}#{ext}")
701
- # `format:path` locks the coder, closing the CVE-2016-3714 (ImageTragick) class of attack.
702
- args = ["convert", "#{format}:#{path}"]
703
- args.push("-trim", "+repage") if needs_trim
704
- args.push("-resize", "#{target_max_px}x#{target_max_px}>") if needs_resize
705
- args.push(output_path)
706
- success = system(*args, out: File::NULL, err: File::NULL)
707
- (success && File.exist?(output_path)) ? output_path : nil
708
- end
709
-
710
- # PNG color type 4 = greyscale+alpha, 6 = RGBA — only these are trim-worthy.
711
- def png_with_alpha?(path)
712
- File.open(path, "rb") do |io|
713
- return false unless io.read(8) == "\x89PNG\r\n\x1A\n".b
714
- return false if io.read(8).nil?
715
- return false if io.read(8).nil?
716
- return false if io.read(1).nil?
717
- color_type_byte = io.read(1)
718
- return false if color_type_byte.nil?
719
- color_type = color_type_byte.unpack1("C")
720
- color_type == 4 || color_type == 6
721
- end
722
- end
723
-
724
- def imagemagick_available?
725
- return @imagemagick_available if defined?(@imagemagick_available)
726
- @imagemagick_available = system("which", "convert", out: File::NULL, err: File::NULL)
727
- end
728
-
729
- # Trust boundary in front of ImageMagick: any malformed-input path
730
- # raises a single error class so worker retry policies can key on it.
731
- def validate_image!(path)
732
- size = File.size(path)
733
- if size > MAX_IMAGE_BYTES
734
- raise ImageTooLargeError, "#{path}: #{size} bytes exceeds #{MAX_IMAGE_BYTES} byte cap"
735
- end
736
- format, width, height = image_dimensions(path)
737
- if width > MAX_IMAGE_DIMENSION || height > MAX_IMAGE_DIMENSION
738
- raise ImageTooLargeError,
739
- "#{path}: #{width}x#{height}px exceeds #{MAX_IMAGE_DIMENSION}px per side"
740
- end
741
- [format, width, height]
742
- end
743
-
744
- def image_dimensions(path)
745
- File.open(path, "rb") do |io|
746
- head = read_exact(io, 8, path)
747
- io.rewind
748
- if head.start_with?("\x89PNG\r\n\x1A\n".b)
749
- width, height = read_png_dimensions(io, path)
750
- [:png, width, height]
751
- elsif head[0, 2] == "\xFF\xD8".b
752
- width, height = read_jpeg_dimensions(io, path)
753
- [:jpg, width, height]
754
- else
755
- raise_unsupported(path)
756
- end
757
- end
758
- end
759
-
760
- def read_png_dimensions(io, path)
761
- read_exact(io, 16, path) # 8-byte signature + 4 length + "IHDR"
762
- width = read_exact(io, 4, path).unpack1("N")
763
- height = read_exact(io, 4, path).unpack1("N")
764
- [width, height]
765
- end
766
-
767
- def read_jpeg_dimensions(io, path)
768
- read_exact(io, 2, path) # SOI
769
- loop do
770
- marker_byte = read_exact(io, 1, path).getbyte(0)
771
- raise_unsupported(path) unless marker_byte == 0xFF
772
- # Runs of 0xFF are valid JPEG fill bytes between markers.
773
- marker_code = read_exact(io, 1, path).getbyte(0)
774
- marker_code = read_exact(io, 1, path).getbyte(0) while marker_code == 0xFF
775
- raise_unsupported(path, "no SOF marker found") if marker_code == 0xD9 || marker_code == 0x00
776
- # 0xD0..0xD7 and 0x01 are standalone markers — no length follows.
777
- next if (0xD0..0xD7).cover?(marker_code) || marker_code == 0x01
778
- segment_length = read_exact(io, 2, path).unpack1("n")
779
- raise_unsupported(path, "negative segment length") if segment_length < 2
780
- is_sof_marker = (0xC0..0xCF).cover?(marker_code) && ![0xC4, 0xC8, 0xCC].include?(marker_code)
781
- if is_sof_marker
782
- read_exact(io, 1, path) # precision
783
- height = read_exact(io, 2, path).unpack1("n")
784
- width = read_exact(io, 2, path).unpack1("n")
785
- return [width, height]
786
- else
787
- read_exact(io, segment_length - 2, path)
788
- end
789
- end
790
- end
791
-
792
- def read_exact(io, byte_count, path)
793
- buf = io.read(byte_count)
794
- raise_unsupported(path, "truncated header") if buf.nil? || buf.bytesize < byte_count
795
- buf
796
- end
797
-
798
- def raise_unsupported(path, reason = "only JPG and PNG are supported")
799
- raise UnsupportedImageFormatError, "#{path}: #{reason}"
670
+ def image_stamper
671
+ @image_stamper ||= ImageStamper.new
800
672
  end
801
673
 
802
674
  # Heuristic key for push-button image fields based on widget aspect ratio.
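`write_normalized!` stages the output in a hidden sibling temp file and then moves it over the target, so a failed write never leaves a half-written template. The sketch below is a minimal standalone version of the same pattern. It assumes the writer is a block that receives a path, and a plain `File.write` stands in for HexaPDF's `Document#write`. The `write_via_sibling_tmp` name is hypothetical. This sketch uses `File.rename` because the temp file sits in the same directory as the target, and that rename is atomic on POSIX filesystems. The released code calls `FileUtils.mv`, which performs a rename when source and target share a filesystem.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper mirroring write_normalized!'s same-file branch:
# write to a hidden sibling temp file, then rename it over the target.
def write_via_sibling_tmp(target)
  tmp = File.join(File.dirname(target), ".#{File.basename(target)}.#{Process.pid}.tmp")
  yield tmp                # the writer must finish completely before the swap
  File.rename(tmp, target) # same directory, so same filesystem: atomic on POSIX
ensure
  # Only reached with a leftover file if the writer or the rename failed.
  FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "form.pdf")
  File.write(target, "original")
  write_via_sibling_tmp(target) { |tmp| File.write(tmp, "normalized") }
  puts File.read(target) # prints "normalized"
end
```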
data/lib/acroforge/image_stamper.rb ADDED
@@ -0,0 +1,175 @@
1
+ # frozen_string_literal: true
2
+
3
+ module AcroForge
4
+ ImageTooLargeError = Class.new(Error)
5
+ UnsupportedImageFormatError = Class.new(Error)
6
+
7
+ # Stamps JPG/PNG files into AcroForm widget rectangles.
8
+ class ImageStamper
9
+ # Keeps a phone-camera passport photo from bloating the output PDF.
10
+ MAX_IMAGE_BYTES = 5 * 1024 * 1024
11
+ MAX_IMAGE_DIMENSION = 4000
12
+ # Auto-downsample images whose pixel resolution far exceeds this PPI
13
+ # at the widget's rendered size. Requires ImageMagick on PATH.
14
+ TARGET_PPI = 200
15
+
16
+ def stamp!(doc, field, path)
17
+ format, image_width, image_height = validate_image!(path)
18
+ widget = field.each_widget.first
19
+ return unless widget && widget[:Rect]
20
+
21
+ page = doc.pages.find { |candidate_page| candidate_page[:Annots]&.include?(widget) }
22
+ return unless page
23
+
24
+ # Widget Rect is absolute page coords; canvas API is MediaBox-relative.
25
+ media_box_x, media_box_y = page.box.value[0], page.box.value[1]
26
+ rect_x_min, rect_y_min, rect_x_max, rect_y_max = widget[:Rect]
27
+ slot_width = rect_x_max - rect_x_min
28
+ slot_height = rect_y_max - rect_y_min
29
+ slot_canvas_x = rect_x_min - media_box_x
30
+ slot_canvas_y = rect_y_min - media_box_y
31
+
32
+ stamp_path = prepare_image_for_slot(path, format, image_width, image_height,
33
+ slot_width, slot_height) || path
34
+ if stamp_path != path
35
+ _, image_width, image_height = image_dimensions(stamp_path)
36
+ end
37
+
38
+ draw_width, draw_height = fit_inside(image_width, image_height, slot_width, slot_height)
39
+ draw_x = slot_canvas_x + (slot_width - draw_width) / 2.0
40
+ draw_y = slot_canvas_y + (slot_height - draw_height) / 2.0
41
+
42
+ canvas = page.canvas(type: :overlay)
43
+ canvas.fill_color(255, 255, 255)
44
+ canvas.rectangle(slot_canvas_x, slot_canvas_y, slot_width, slot_height).fill
45
+ canvas.image(stamp_path, at: [draw_x, draw_y], width: draw_width, height: draw_height)
46
+
47
+ # Bake into the page so the widget's empty appearance doesn't repaint over the image.
48
+ page[:Annots].delete(widget)
49
+ end
50
+
51
+ # Trust boundary in front of ImageMagick: any malformed-input path
52
+ # raises a single error class so worker retry policies can key on it.
53
+ def validate_image!(path)
54
+ size = File.size(path)
55
+ if size > MAX_IMAGE_BYTES
56
+ raise ImageTooLargeError, "#{path}: #{size} bytes exceeds #{MAX_IMAGE_BYTES} byte cap"
57
+ end
58
+ format, width, height = image_dimensions(path)
59
+ if width > MAX_IMAGE_DIMENSION || height > MAX_IMAGE_DIMENSION
60
+ raise ImageTooLargeError,
61
+ "#{path}: #{width}x#{height}px exceeds #{MAX_IMAGE_DIMENSION}px per side"
62
+ end
63
+ [format, width, height]
64
+ end
65
+
66
+ private
67
+
68
+ def fit_inside(image_width, image_height, slot_width, slot_height)
69
+ scale = [slot_width.to_f / image_width, slot_height.to_f / image_height].min
70
+ [image_width * scale, image_height * scale]
71
+ end
72
+
73
+ # Trim removes the transparent border around a signature; downsample
74
+ # caps source resolution at TARGET_PPI for the widget's longer side.
75
+ def prepare_image_for_slot(path, format, image_width, image_height,
76
+ slot_width_pt, slot_height_pt)
77
+ return nil unless imagemagick_available?
78
+ slot_max_pt = [slot_width_pt, slot_height_pt].max
79
+ target_max_px = (slot_max_pt * TARGET_PPI / 72.0).ceil
80
+ needs_resize = image_width > target_max_px * 2 || image_height > target_max_px * 2
81
+ needs_trim = format == :png && png_with_alpha?(path)
82
+ return nil unless needs_resize || needs_trim
83
+
84
+ ext = (format == :png) ? ".png" : ".jpg"
85
+ require "securerandom"
86
+ require "tmpdir"
87
+ output_path = File.join(Dir.tmpdir,
88
+ "acroforge_stamp_#{Process.pid}_#{SecureRandom.hex(4)}#{ext}")
89
+ # `format:path` locks the coder, closing the CVE-2016-3714 (ImageTragick) class of attack.
90
+ args = ["convert", "#{format}:#{path}"]
91
+ args.push("-trim", "+repage") if needs_trim
92
+ args.push("-resize", "#{target_max_px}x#{target_max_px}>") if needs_resize
93
+ args.push(output_path)
94
+ success = system(*args, out: File::NULL, err: File::NULL)
95
+ (success && File.exist?(output_path)) ? output_path : nil
96
+ end
97
+
98
+ # PNG color type 4 = greyscale+alpha, 6 = RGBA — only these are trim-worthy.
99
+ def png_with_alpha?(path)
100
+ File.open(path, "rb") do |io|
101
+ return false unless io.read(8) == "\x89PNG\r\n\x1A\n".b
102
+ return false if io.read(8).nil?
103
+ return false if io.read(8).nil?
104
+ return false if io.read(1).nil?
105
+ color_type_byte = io.read(1)
106
+ return false if color_type_byte.nil?
107
+ color_type = color_type_byte.unpack1("C")
108
+ color_type == 4 || color_type == 6
109
+ end
110
+ end
111
+
112
+ def imagemagick_available?
113
+ return @imagemagick_available if defined?(@imagemagick_available)
114
+ @imagemagick_available = system("which", "convert", out: File::NULL, err: File::NULL)
115
+ end
116
+
117
+ def image_dimensions(path)
118
+ File.open(path, "rb") do |io|
119
+ head = read_exact(io, 8, path)
120
+ io.rewind
121
+ if head.start_with?("\x89PNG\r\n\x1A\n".b)
122
+ width, height = read_png_dimensions(io, path)
123
+ [:png, width, height]
124
+ elsif head[0, 2] == "\xFF\xD8".b
125
+ width, height = read_jpeg_dimensions(io, path)
126
+ [:jpg, width, height]
127
+ else
128
+ raise_unsupported(path)
129
+ end
130
+ end
131
+ end
132
+
133
+ def read_png_dimensions(io, path)
134
+ read_exact(io, 16, path) # 8-byte signature + 4 length + "IHDR"
135
+ width = read_exact(io, 4, path).unpack1("N")
136
+ height = read_exact(io, 4, path).unpack1("N")
137
+ [width, height]
138
+ end
139
+
140
+ def read_jpeg_dimensions(io, path)
141
+ read_exact(io, 2, path) # SOI
142
+ loop do
143
+ marker_byte = read_exact(io, 1, path).getbyte(0)
144
+ raise_unsupported(path) unless marker_byte == 0xFF
145
+ # Runs of 0xFF are valid JPEG fill bytes between markers.
146
+ marker_code = read_exact(io, 1, path).getbyte(0)
147
+ marker_code = read_exact(io, 1, path).getbyte(0) while marker_code == 0xFF
148
+ raise_unsupported(path, "no SOF marker found") if marker_code == 0xD9 || marker_code == 0x00
149
+ # 0xD0..0xD7 and 0x01 are standalone markers — no length follows.
150
+ next if (0xD0..0xD7).cover?(marker_code) || marker_code == 0x01
151
+ segment_length = read_exact(io, 2, path).unpack1("n")
152
+ raise_unsupported(path, "negative segment length") if segment_length < 2
153
+ is_sof_marker = (0xC0..0xCF).cover?(marker_code) && ![0xC4, 0xC8, 0xCC].include?(marker_code)
154
+ if is_sof_marker
155
+ read_exact(io, 1, path) # precision
156
+ height = read_exact(io, 2, path).unpack1("n")
157
+ width = read_exact(io, 2, path).unpack1("n")
158
+ return [width, height]
159
+ else
160
+ read_exact(io, segment_length - 2, path)
161
+ end
162
+ end
163
+ end
164
+
165
+ def read_exact(io, byte_count, path)
166
+ buf = io.read(byte_count)
167
+ raise_unsupported(path, "truncated header") if buf.nil? || buf.bytesize < byte_count
168
+ buf
169
+ end
170
+
171
+ def raise_unsupported(path, reason = "only JPG and PNG are supported")
172
+ raise UnsupportedImageFormatError, "#{path}: #{reason}"
173
+ end
174
+ end
175
+ end
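The sizing arithmetic in `ImageStamper` is easy to check by hand. `prepare_image_for_slot` converts the widget's longer side from points to pixels at `TARGET_PPI`, using 72 pt per inch, and downsamples only when the image exceeds twice that budget. `fit_inside` then scales the image uniformly so it fits the slot, and `stamp!` centers the result. The sketch below reproduces those three steps as plain functions for a hypothetical 144 × 72 pt slot and a 1200 × 900 px photo. It leaves out ImageMagick and HexaPDF entirely.

```ruby
TARGET_PPI = 200 # same value as ImageStamper::TARGET_PPI in this release

# Pixel budget for the slot's longer side (PDF user space is 72 pt per inch).
def target_max_px(slot_width_pt, slot_height_pt)
  ([slot_width_pt, slot_height_pt].max * TARGET_PPI / 72.0).ceil
end

# Uniform scale so the whole image fits inside the slot (aspect ratio kept).
def fit_inside(image_width, image_height, slot_width, slot_height)
  scale = [slot_width.to_f / image_width, slot_height.to_f / image_height].min
  [image_width * scale, image_height * scale]
end

slot_w, slot_h = 144, 72 # a 2 in x 1 in widget
img_w, img_h = 1200, 900 # a 4:3 photo

budget = target_max_px(slot_w, slot_h)                    # => 400
needs_resize = img_w > budget * 2 || img_h > budget * 2   # => true (1200 > 800)
draw_w, draw_h = fit_inside(img_w, img_h, slot_w, slot_h) # => [96.0, 72.0]
offset_x = (slot_w - draw_w) / 2.0 # => 24.0, centered horizontally
offset_y = (slot_h - draw_h) / 2.0 # => 0.0, height is the binding side
```

Because `fit_inside` takes the smaller of the two ratios, the image is never cropped. The white rectangle that `stamp!` paints first covers the unused margin.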
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module AcroForge
4
- VERSION = "0.2.0"
4
+ VERSION = "0.3.0"
5
5
  end
data/lib/acroforge.rb CHANGED
@@ -10,6 +10,7 @@ end
10
10
  require_relative "acroforge/all_text_processor"
11
11
  require_relative "acroforge/labels"
12
12
  require_relative "acroforge/validator"
13
+ require_relative "acroforge/image_stamper"
13
14
  require_relative "acroforge/engine"
14
15
  require_relative "acroforge/schema"
15
16
  require_relative "acroforge/relabeler"
metadata CHANGED
@@ -1,13 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: acroforge
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Maxwell Nana Forson
8
+ autorequire:
8
9
  bindir: exe
9
10
  cert_chain: []
10
- date: 1980-01-02 00:00:00.000000000 Z
11
+ date: 2026-06-13 00:00:00.000000000 Z
11
12
  dependencies:
12
13
  - !ruby/object:Gem::Dependency
13
14
  name: hexapdf
@@ -44,6 +45,7 @@ files:
44
45
  - lib/acroforge/cli.rb
45
46
  - lib/acroforge/constants.rb
46
47
  - lib/acroforge/engine.rb
48
+ - lib/acroforge/image_stamper.rb
47
49
  - lib/acroforge/labels.rb
48
50
  - lib/acroforge/preparer.rb
49
51
  - lib/acroforge/relabeler.rb
@@ -61,6 +63,7 @@ metadata:
61
63
  changelog_uri: https://github.com/Lzcorp-Solutions/acroforge/blob/main/CHANGELOG.md
62
64
  documentation_uri: https://lzcorp-solutions.github.io/acroforge/
63
65
  rubygems_mfa_required: 'true'
66
+ post_install_message:
64
67
  rdoc_options: []
65
68
  require_paths:
66
69
  - lib
@@ -75,7 +78,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
75
78
  - !ruby/object:Gem::Version
76
79
  version: '0'
77
80
  requirements: []
78
- rubygems_version: 4.0.12
81
+ rubygems_version: 3.1.6
82
+ signing_key:
79
83
  specification_version: 4
80
84
  summary: PDF AcroForm engine with heuristic-assisted field relabeling.
81
85
  test_files: []