acroforge 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -0
- data/README.md +5 -3
- data/lib/acroforge/cli.rb +53 -9
- data/lib/acroforge/engine.rb +43 -171
- data/lib/acroforge/image_stamper.rb +175 -0
- data/lib/acroforge/version.rb +1 -1
- data/lib/acroforge.rb +1 -0
- metadata +7 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 6fa6d2f65a5a566457acc4a635dcc0ade68017f5aac69f0ea6b1e48e092db8af
|
|
4
|
+
data.tar.gz: d38c53fd44947a5b690fe84fa0674b55260ce897ef94f6828abb6e135b50447c
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: c73b051b74fff85ac980a1295359f118843f26d3708a75943e3dcde874f2306b90921dd1eeeb024b5ab515a103c14409f674d7a09a2cf7c8cb79532798499bb3
|
|
7
|
+
data.tar.gz: 8440de33e8f66f72ba88c98659fe4a3c527d2ab036820a280fbf65f112503a8eb8ff342b652971a4c5366558b873674e2c6679544b3ed58cf522ee3fc3939308
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,25 @@
|
|
|
1
1
|
# CHANGELOG
|
|
2
2
|
|
|
3
|
+
## [0.3.0] - 2026-06-01
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
|
|
7
|
+
- **`acroforge fields <pdf>` CLI subcommand** — lists every AcroForm field's name, type, and alternate name as an aligned table, or as JSON with `--json`. Decoded options maps render inline (table) or as nested objects (JSON). A PDF without an AcroForm prints a notice and still exits `0`.
|
|
8
|
+
- **Configurable normalized output path for `compile!`** — `Engine#compile!` now accepts a `normalized_out:` kwarg to write the normalized PDF to an exact path; omitting it keeps the existing constructor-derived `<base>_normalized.pdf` location, byte-for-byte. Passing `normalized_out:` equal to the template path overwrites it in place safely (staged through a temp file, then moved). The `acroforge compile` CLI now persists the normalized template by default — next to the input as `<base>_normalized.pdf` — and gains a `--out PATH` flag to send it to an explicit path plus an `--overwrite` flag to rewrite the input PDF in place (`--out` and `--overwrite` are mutually exclusive).
|
|
9
|
+
|
|
10
|
+
### Changed
|
|
11
|
+
|
|
12
|
+
- **`Engine#fields` decodes persisted options maps.** When `/TU` holds the JSON options map that `compile!` writes for button/choice fields, `:alternate_name` is now returned as a symbol-keyed Hash (e.g. `{ male: "0", female: "1" }`) instead of the raw JSON string. Plain-text tooltips and missing `/TU` entries are unaffected (still String / `nil`).
|
|
13
|
+
- **Image stamping extracted to `AcroForge::ImageStamper`** (internal). `Engine#fill!` behavior, the `ImageTooLargeError` / `UnsupportedImageFormatError` classes, and the validation rules are unchanged; the `MAX_IMAGE_BYTES` / `MAX_IMAGE_DIMENSION` / `TARGET_PPI` constants now live on `ImageStamper` instead of `Engine`.
|
|
14
|
+
|
|
15
|
+
### Fixed
|
|
16
|
+
|
|
17
|
+
- **Choice fields now auto-map.** `compile!` built options maps for combo/list boxes from `/Opt` but never ran the spatial label lookup on them, so a garbage-named choice field always landed in `unmapped` and its options were discarded. Choice fields now get the same nearest-label resolution as text fields, and their options persist to `/TU` and `select_options`.
|
|
18
|
+
|
|
19
|
+
### Internal
|
|
20
|
+
|
|
21
|
+
- Synthetic fixture PDFs now carry `/AP` appearance streams on radio/checkbox widgets, so `compile!`'s options-map discovery runs against them in CI. New round-trip spec covers compile! → fields → fill! through the persisted `/TU` map.
|
|
22
|
+
|
|
3
23
|
## [0.2.0] - 2026-05-28
|
|
4
24
|
|
|
5
25
|
### Changed (breaking)
|
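The `Engine#fields` change described in the 0.3.0 notes can be illustrated with a small standalone sketch. It mirrors the `decode_alternate_name` helper that appears in the `engine.rb` hunk of this diff, copied here as a free function so it runs without the gem or HexaPDF. The sample `/TU` strings are made up for illustration.

```ruby
require "json"

# Standalone copy of the decoding rule, for illustration only.
# In the gem this is a private method on AcroForge::Engine.
def decode_alternate_name(value)
  # Only strings that look like a JSON object are candidates for decoding.
  return value unless value.is_a?(String) && value.lstrip.start_with?("{")

  parsed = JSON.parse(value, symbolize_names: true)
  parsed.is_a?(Hash) ? parsed : value
rescue JSON::ParserError
  # A tooltip that merely starts with "{" stays a plain string.
  value
end

# Hypothetical /TU values:
p decode_alternate_name('{"male":"0","female":"1"}') # options map -> Hash
p decode_alternate_name("Date of birth")             # plain tooltip -> String
p decode_alternate_name("{not json")                 # unparseable -> String
p decode_alternate_name(nil)                         # missing /TU -> nil
```

Callers that previously parsed `:alternate_name` as JSON themselves would presumably need a type check (`Hash` vs. `String`) after upgrading.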
data/README.md
CHANGED
|
@@ -56,17 +56,19 @@ $ acroforge bootstrap broken_form.pdf
|
|
|
56
56
|
## CLI
|
|
57
57
|
|
|
58
58
|
```text
|
|
59
|
+
acroforge fields <pdf> [--json]
|
|
59
60
|
acroforge schema infer <pdf> [--out schema.yml] [--sections a,b,c]
|
|
60
61
|
acroforge relabel propose <pdf> [--out mapping.yml] [--schema schema.yml] [--merge|--overwrite]
|
|
61
62
|
acroforge relabel apply <pdf> <mapping.yml>
|
|
62
|
-
acroforge compile <pdf> [--schema schema.yml]
|
|
63
|
+
acroforge compile <pdf> [--schema schema.yml] [--out normalized.pdf | --overwrite]
|
|
63
64
|
acroforge bootstrap <pdf> [--schema-out s.yml] [--mapping-out m.yml]
|
|
64
65
|
acroforge version
|
|
65
66
|
acroforge help
|
|
66
67
|
```
|
|
67
68
|
|
|
68
69
|
| Subcommand | What it does |
|
|
69
|
-
| ----------------- |
|
|
70
|
+
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
71
|
+
| `fields` | Lists every AcroForm field — name, type, alternate name — as a table, or as JSON with `--json`. Read-only; the quickest way to see what's inside a PDF. |
|
|
70
72
|
| `schema infer` | Runs the heuristic on a PDF and writes a starter schema (canonical key → type + variations). Advisory; you review and edit. |
|
|
71
73
|
| `relabel propose` | Writes a YAML mapping file proposing a semantic name for every AcroForm field. Sorted by page → top-to-bottom → left-to-right. Default mode `--merge` preserves any `key`/`type` values you've already edited. |
|
|
72
74
|
| `relabel apply` | Reads a corrected mapping file and rewrites `field[:T]` / `field[:TU]` in the source PDF in place. Auto-disambiguates collisions (`full_name`, `full_name_1`, ...). |
|
|
@@ -178,7 +180,7 @@ AcroForge also accepts a legacy "shorthand" form where the value is just an arra
|
|
|
178
180
|
_meta:
|
|
179
181
|
source_pdf: broken_form.pdf
|
|
180
182
|
generated_at: 2026-05-26T14:32:11Z
|
|
181
|
-
acroforge_version: 0.
|
|
183
|
+
acroforge_version: 0.3.0
|
|
182
184
|
total_fields: 98
|
|
183
185
|
|
|
184
186
|
page0_field6:
|
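The new `compile` flags documented in the README hunk above can be sketched as a small resolver. This is a minimal illustration, not the gem's code: `resolve_normalized_target` is a hypothetical helper, and the default path rule (`<base>_normalized.pdf` next to the input) is taken from the changelog wording rather than from the engine source.

```ruby
require "optparse"

# Hypothetical helper: decides where `acroforge compile` would write
# the normalized PDF, given the raw argument list.
def resolve_normalized_target(argv)
  out = nil
  overwrite = false
  OptionParser.new do |opts|
    opts.on("--out PATH") { |v| out = v }
    opts.on("--overwrite") { overwrite = true }
  end.parse!(argv)

  pdf = argv.shift
  raise ArgumentError, "missing <pdf> argument" if pdf.nil?
  raise ArgumentError, "--out and --overwrite are mutually exclusive" if out && overwrite

  return pdf if overwrite # rewrite the input in place
  return out if out       # explicit destination

  # Assumed default: <base>_normalized.pdf beside the input file.
  File.join(File.dirname(pdf), "#{File.basename(pdf, ".pdf")}_normalized.pdf")
end

puts resolve_normalized_target(["forms/a.pdf"])
puts resolve_normalized_target(["a.pdf", "--out", "b.pdf"])
puts resolve_normalized_target(["a.pdf", "--overwrite"])
```

Rejecting the `--out` plus `--overwrite` combination up front keeps the two intents (new file vs. in-place rewrite) from silently overriding each other.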
data/lib/acroforge/cli.rb
CHANGED
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
require "optparse"
|
|
4
4
|
require "yaml"
|
|
5
|
+
require "json"
|
|
5
6
|
require_relative "../acroforge"
|
|
6
7
|
|
|
7
8
|
module AcroForge
|
|
@@ -11,7 +12,7 @@ module AcroForge
|
|
|
11
12
|
EXIT_VALIDATION_ERROR = 2
|
|
12
13
|
EXIT_INTERNAL_ERROR = 3
|
|
13
14
|
|
|
14
|
-
SUBCOMMANDS = %w[schema relabel compile bootstrap annotate prepare version help].freeze
|
|
15
|
+
SUBCOMMANDS = %w[fields schema relabel compile bootstrap annotate prepare version help].freeze
|
|
15
16
|
|
|
16
17
|
module_function
|
|
17
18
|
|
|
@@ -48,11 +49,12 @@ module AcroForge
|
|
|
48
49
|
acroforge: PDF AcroForm engine + relabeler
|
|
49
50
|
|
|
50
51
|
Usage:
|
|
52
|
+
acroforge fields <pdf> [--json]
|
|
51
53
|
acroforge schema infer <pdf> [--out schema.yml] [--sections a,b,c] [-v]
|
|
52
54
|
acroforge schema merge <mapping.yml> [--schema schema.yml] [--out schema.yml]
|
|
53
55
|
acroforge relabel propose <pdf> [--out mapping.yml] [--schema schema.yml] [--merge|--overwrite] [-v]
|
|
54
56
|
acroforge relabel apply <pdf> <mapping.yml> [--annotate[=PATH]] [-v]
|
|
55
|
-
acroforge compile <pdf> [--schema schema.yml]
|
|
57
|
+
acroforge compile <pdf> [--schema schema.yml] [--out normalized.pdf | --overwrite]
|
|
56
58
|
acroforge bootstrap <pdf> [--schema-out s.yml] [--mapping-out m.yml] [-v]
|
|
57
59
|
acroforge annotate <pdf> [--mapping mapping.yml] [--out annotated.pdf]
|
|
58
60
|
acroforge prepare <pdf> [--out prepared.pdf] [--schema schema.yml]
|
|
@@ -101,6 +103,42 @@ module AcroForge
|
|
|
101
103
|
puts "Applied to #{pdf}: #{parts.join(", ")}."
|
|
102
104
|
end
|
|
103
105
|
|
|
106
|
+
def cmd_fields(argv)
|
|
107
|
+
json = false
|
|
108
|
+
OptionParser.new { |o| o.on("--json") { json = true } }.parse!(argv)
|
|
109
|
+
pdf = argv.shift
|
|
110
|
+
raise ArgumentError, "usage: acroforge fields <pdf> [--json]" unless pdf
|
|
111
|
+
raise Errno::ENOENT, pdf unless File.exist?(pdf)
|
|
112
|
+
|
|
113
|
+
fields = AcroForge::Engine.new(pdf).fields
|
|
114
|
+
|
|
115
|
+
if json
|
|
116
|
+
puts JSON.pretty_generate(fields)
|
|
117
|
+
elsif fields.empty?
|
|
118
|
+
puts "No AcroForm fields found in #{pdf}."
|
|
119
|
+
else
|
|
120
|
+
print_fields_table(fields)
|
|
121
|
+
end
|
|
122
|
+
EXIT_OK
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
def print_fields_table(fields)
|
|
126
|
+
headers = ["NAME", "TYPE", "ALTERNATE NAME"]
|
|
127
|
+
rows = fields.map { |f| [f[:name].to_s, f[:type].to_s, format_alternate_name(f[:alternate_name])] }
|
|
128
|
+
widths = headers.each_with_index.map { |h, i| ([h] + rows.map { |r| r[i] }).map(&:length).max }
|
|
129
|
+
([headers] + rows).each do |row|
|
|
130
|
+
puts row.each_with_index.map { |cell, i| cell.ljust(widths[i]) }.join(" ").rstrip
|
|
131
|
+
end
|
|
132
|
+
end
|
|
133
|
+
|
|
134
|
+
def format_alternate_name(alt)
|
|
135
|
+
case alt
|
|
136
|
+
when nil then "—"
|
|
137
|
+
when Hash then "{#{alt.map { |k, v| "#{k}: #{v.inspect}" }.join(", ")}}"
|
|
138
|
+
else alt.to_s
|
|
139
|
+
end
|
|
140
|
+
end
|
|
141
|
+
|
|
104
142
|
def cmd_schema(argv)
|
|
105
143
|
action = argv.shift
|
|
106
144
|
case action
|
|
@@ -233,20 +271,26 @@ module AcroForge
|
|
|
233
271
|
|
|
234
272
|
def cmd_compile(argv)
|
|
235
273
|
schema_path = nil
|
|
274
|
+
out = nil
|
|
275
|
+
overwrite = false
|
|
236
276
|
OptionParser.new do |opts|
|
|
237
277
|
opts.on("--schema PATH") { |v| schema_path = v }
|
|
278
|
+
opts.on("--out PATH", "Write the normalized PDF to PATH") { |v| out = v }
|
|
279
|
+
opts.on("--overwrite", "Write the normalized PDF back over the input PDF in place") { overwrite = true }
|
|
238
280
|
end.parse!(argv)
|
|
239
281
|
pdf = argv.shift
|
|
240
282
|
raise ArgumentError, "missing <pdf> argument" if pdf.nil?
|
|
283
|
+
raise ArgumentError, "--out and --overwrite are mutually exclusive" if out && overwrite
|
|
241
284
|
raise Errno::ENOENT, pdf unless File.exist?(pdf)
|
|
242
285
|
|
|
243
286
|
schema = schema_path ? AcroForge::Schema.load(schema_path) : {}
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
|
|
249
|
-
|
|
287
|
+
out ||= pdf if overwrite
|
|
288
|
+
|
|
289
|
+
engine = AcroForge::Engine.new(pdf, schema: schema, normalized_dir: out ? File.dirname(out) : nil)
|
|
290
|
+
result = engine.compile!(normalized_out: out)
|
|
291
|
+
puts "Mapped: #{result[:mapped].size}, Unmapped: #{result[:unmapped].size}"
|
|
292
|
+
where = overwrite ? "#{engine.normalized_path} (in place)" : engine.normalized_path
|
|
293
|
+
puts "Wrote #{where}: normalized template."
|
|
250
294
|
EXIT_OK
|
|
251
295
|
end
|
|
252
296
|
|
|
@@ -335,7 +379,7 @@ module AcroForge
|
|
|
335
379
|
require "tmpdir"
|
|
336
380
|
Dir.mktmpdir do |tmp|
|
|
337
381
|
engine = AcroForge::Engine.new(pdf, normalized_dir: tmp)
|
|
338
|
-
silenced(verbose: verbose) { engine.compile! }
|
|
382
|
+
silenced(verbose: verbose) { engine.compile!(announce_output: false) }
|
|
339
383
|
|
|
340
384
|
schema = AcroForge::Schema.infer(pdf, engine: engine)
|
|
341
385
|
AcroForge::Schema.dump(schema, schema_out)
|
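The aligned table printed by `cmd_fields` can be reproduced outside the gem with a short sketch. It follows the `print_fields_table` / `format_alternate_name` logic in the hunk above, with three assumptions: the rows are hypothetical sample data shaped like `Engine#fields` output, the function returns lines instead of printing them, and the column separator (two spaces) and the `-` placeholder for a missing alternate name are choices made here for readability.

```ruby
# Renders a missing /TU as "-", a decoded options map inline, anything else as text.
def format_alternate_name(alt)
  case alt
  when nil then "-"
  when Hash then "{#{alt.map { |k, v| "#{k}: #{v.inspect}" }.join(", ")}}"
  else alt.to_s
  end
end

# Returns the table as an array of lines; each column is padded to its widest cell.
def fields_table_lines(fields)
  headers = ["NAME", "TYPE", "ALTERNATE NAME"]
  rows = fields.map { |f| [f[:name].to_s, f[:type].to_s, format_alternate_name(f[:alternate_name])] }
  widths = headers.each_with_index.map { |h, i| ([h] + rows.map { |r| r[i] }).map(&:length).max }
  ([headers] + rows).map do |row|
    # rstrip drops the padding after the last column.
    row.each_with_index.map { |cell, i| cell.ljust(widths[i]) }.join("  ").rstrip
  end
end

# Hypothetical rows, not taken from a real PDF.
sample = [
  {name: "full_name", type: :text, alternate_name: nil},
  {name: "gender", type: :choice, alternate_name: {male: "0", female: "1"}}
]
puts fields_table_lines(sample)
```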
data/lib/acroforge/engine.rb
CHANGED
|
@@ -9,19 +9,10 @@ require_relative "all_text_processor"
|
|
|
9
9
|
require_relative "validator"
|
|
10
10
|
require_relative "constants"
|
|
11
11
|
require_relative "labels"
|
|
12
|
+
require_relative "image_stamper"
|
|
12
13
|
|
|
13
14
|
module AcroForge
|
|
14
|
-
ImageTooLargeError = Class.new(Error)
|
|
15
|
-
UnsupportedImageFormatError = Class.new(Error)
|
|
16
|
-
|
|
17
15
|
class Engine
|
|
18
|
-
# Caps a phone-camera passport photo from bloating the output PDF.
|
|
19
|
-
MAX_IMAGE_BYTES = 5 * 1024 * 1024
|
|
20
|
-
MAX_IMAGE_DIMENSION = 4000
|
|
21
|
-
# Auto-downsample images whose pixel resolution far exceeds this PPI
|
|
22
|
-
# at the widget's rendered size. Requires ImageMagick on PATH.
|
|
23
|
-
TARGET_PPI = 200
|
|
24
|
-
|
|
25
16
|
attr_reader :template_path, :schema, :overrides, :sections, :normalized_path,
|
|
26
17
|
:mapped_fields, :unmapped_fields, :filled_fields, :missing_fields,
|
|
27
18
|
:select_field_options, :new_fields_detected
|
|
@@ -68,7 +59,7 @@ module AcroForge
|
|
|
68
59
|
elsif field.is_a?(HexaPDF::Type::AcroForm::ChoiceField) then :choice
|
|
69
60
|
else :other
|
|
70
61
|
end
|
|
71
|
-
extracted << {name: field.full_field_name, type: type, alternate_name: field[:TU]}
|
|
62
|
+
extracted << {name: field.full_field_name, type: type, alternate_name: decode_alternate_name(field[:TU])}
|
|
72
63
|
end
|
|
73
64
|
extracted
|
|
74
65
|
end
|
|
@@ -123,7 +114,8 @@ module AcroForge
|
|
|
123
114
|
# every field, rename each to its proposed semantic key in place, and
|
|
124
115
|
# persist the result to @normalized_path. The returned hash is what the
|
|
125
116
|
# Relabeler and Schema.infer consume.
|
|
126
|
-
def compile!
|
|
117
|
+
def compile!(normalized_out: nil, announce_output: true)
|
|
118
|
+
@normalized_path = normalized_out if normalized_out
|
|
127
119
|
puts ">> Compiling template: #{@template_path}"
|
|
128
120
|
form = source_doc.acro_form(create: true)
|
|
129
121
|
|
|
@@ -320,6 +312,9 @@ module AcroForge
|
|
|
320
312
|
end
|
|
321
313
|
end
|
|
322
314
|
end
|
|
315
|
+
|
|
316
|
+
raw_label = find_nearest_text(page_text_map[page_index], widget[:Rect], mode: :standard)
|
|
317
|
+
raw_label = AcroForge::Labels.humanize(raw_label)
|
|
323
318
|
else
|
|
324
319
|
field_rect = widget[:Rect]
|
|
325
320
|
raw_label = find_nearest_text(page_text_map[page_index], field_rect, mode: :standard)
|
|
@@ -464,9 +459,9 @@ module AcroForge
|
|
|
464
459
|
end
|
|
465
460
|
end
|
|
466
461
|
|
|
467
|
-
|
|
462
|
+
write_normalized!
|
|
468
463
|
puts ">> Compilation Complete. #{mapped_count} fields mapped."
|
|
469
|
-
puts ">> Clean template saved to: #{@normalized_path}\n\n"
|
|
464
|
+
puts ">> Clean template saved to: #{@normalized_path}\n\n" if announce_output
|
|
470
465
|
|
|
471
466
|
{
|
|
472
467
|
mapped: @mapped_fields,
|
|
@@ -506,7 +501,7 @@ module AcroForge
|
|
|
506
501
|
if doc_field
|
|
507
502
|
begin
|
|
508
503
|
if image_upload?(doc_field) && image_path?(value)
|
|
509
|
-
|
|
504
|
+
image_stamper.stamp!(normalized_doc, doc_field, value)
|
|
510
505
|
@filled_fields[key] = value
|
|
511
506
|
puts " [Stamped] :#{key} <- #{value}"
|
|
512
507
|
next
|
|
@@ -559,7 +554,7 @@ module AcroForge
|
|
|
559
554
|
doc_field.field_value = on_state_sym.to_s
|
|
560
555
|
doc_field.each_widget { |w| w[:AS] = on_state_sym }
|
|
561
556
|
elsif ["false", "no", "off", "0"].include?(normalized_val)
|
|
562
|
-
doc_field.field_value =
|
|
557
|
+
doc_field.field_value = :Off
|
|
563
558
|
doc_field.each_widget { |w| w[:AS] = :Off }
|
|
564
559
|
else
|
|
565
560
|
doc_field.field_value = value.to_s
|
|
@@ -610,6 +605,36 @@ module AcroForge
|
|
|
610
605
|
|
|
611
606
|
private
|
|
612
607
|
|
|
608
|
+
# Writing back over the template HexaPDF still holds open can corrupt or
|
|
609
|
+
# truncate it (the writer reads lazily-loaded objects from that same handle).
|
|
610
|
+
# When the target is the source, stage to a sibling temp file and atomically
|
|
611
|
+
# move it into place once the write has fully completed.
|
|
612
|
+
def write_normalized!
|
|
613
|
+
same_file = File.expand_path(@normalized_path) == File.expand_path(@template_path)
|
|
614
|
+
unless same_file
|
|
615
|
+
source_doc.write(@normalized_path, optimize: true)
|
|
616
|
+
return
|
|
617
|
+
end
|
|
618
|
+
|
|
619
|
+
require "tmpdir"
|
|
620
|
+
require "fileutils"
|
|
621
|
+
tmp = File.join(File.dirname(@normalized_path), ".#{File.basename(@normalized_path)}.#{Process.pid}.tmp")
|
|
622
|
+
source_doc.write(tmp, optimize: true)
|
|
623
|
+
FileUtils.mv(tmp, @normalized_path)
|
|
624
|
+
ensure
|
|
625
|
+
FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
|
|
626
|
+
end
|
|
627
|
+
|
|
628
|
+
# compile! repurposes /TU to persist button/choice option maps as JSON
|
|
629
|
+
# (see fill!'s decode). Surface those as hashes; real tooltips stay strings.
|
|
630
|
+
def decode_alternate_name(value)
|
|
631
|
+
return value unless value.is_a?(String) && value.lstrip.start_with?("{")
|
|
632
|
+
parsed = JSON.parse(value, symbolize_names: true)
|
|
633
|
+
parsed.is_a?(Hash) ? parsed : value
|
|
634
|
+
rescue JSON::ParserError
|
|
635
|
+
value
|
|
636
|
+
end
|
|
637
|
+
|
|
613
638
|
def confidence_for(raw_label, target_key)
|
|
614
639
|
return :none if raw_label.nil? || raw_label.strip.empty?
|
|
615
640
|
return :high if target_key && @schema.key?(target_key.to_s.to_sym)
|
|
@@ -642,161 +667,8 @@ module AcroForge
|
|
|
642
667
|
value.is_a?(String) && File.file?(value)
|
|
643
668
|
end
|
|
644
669
|
|
|
645
|
-
def
|
|
646
|
-
|
|
647
|
-
widget = field.each_widget.first
|
|
648
|
-
return unless widget && widget[:Rect]
|
|
649
|
-
|
|
650
|
-
page = doc.pages.find { |candidate_page| candidate_page[:Annots]&.include?(widget) }
|
|
651
|
-
return unless page
|
|
652
|
-
|
|
653
|
-
# Widget Rect is absolute page coords; canvas API is MediaBox-relative.
|
|
654
|
-
media_box_x, media_box_y = page.box.value[0], page.box.value[1]
|
|
655
|
-
rect_x_min, rect_y_min, rect_x_max, rect_y_max = widget[:Rect]
|
|
656
|
-
slot_width = rect_x_max - rect_x_min
|
|
657
|
-
slot_height = rect_y_max - rect_y_min
|
|
658
|
-
slot_canvas_x = rect_x_min - media_box_x
|
|
659
|
-
slot_canvas_y = rect_y_min - media_box_y
|
|
660
|
-
|
|
661
|
-
stamp_path = prepare_image_for_slot(path, format, image_width, image_height,
|
|
662
|
-
slot_width, slot_height) || path
|
|
663
|
-
if stamp_path != path
|
|
664
|
-
_, image_width, image_height = image_dimensions(stamp_path)
|
|
665
|
-
end
|
|
666
|
-
|
|
667
|
-
draw_width, draw_height = fit_inside(image_width, image_height, slot_width, slot_height)
|
|
668
|
-
draw_x = slot_canvas_x + (slot_width - draw_width) / 2.0
|
|
669
|
-
draw_y = slot_canvas_y + (slot_height - draw_height) / 2.0
|
|
670
|
-
|
|
671
|
-
canvas = page.canvas(type: :overlay)
|
|
672
|
-
canvas.fill_color(255, 255, 255)
|
|
673
|
-
canvas.rectangle(slot_canvas_x, slot_canvas_y, slot_width, slot_height).fill
|
|
674
|
-
canvas.image(stamp_path, at: [draw_x, draw_y], width: draw_width, height: draw_height)
|
|
675
|
-
|
|
676
|
-
# Bake into the page so the widget's empty appearance doesn't repaint over the image.
|
|
677
|
-
page[:Annots].delete(widget)
|
|
678
|
-
end
|
|
679
|
-
|
|
680
|
-
def fit_inside(image_width, image_height, slot_width, slot_height)
|
|
681
|
-
scale = [slot_width.to_f / image_width, slot_height.to_f / image_height].min
|
|
682
|
-
[image_width * scale, image_height * scale]
|
|
683
|
-
end
|
|
684
|
-
|
|
685
|
-
# Trim removes the transparent border around a signature; downsample
|
|
686
|
-
# caps source resolution at TARGET_PPI for the widget's longer side.
|
|
687
|
-
def prepare_image_for_slot(path, format, image_width, image_height,
|
|
688
|
-
slot_width_pt, slot_height_pt)
|
|
689
|
-
return nil unless imagemagick_available?
|
|
690
|
-
slot_max_pt = [slot_width_pt, slot_height_pt].max
|
|
691
|
-
target_max_px = (slot_max_pt * TARGET_PPI / 72.0).ceil
|
|
692
|
-
needs_resize = image_width > target_max_px * 2 || image_height > target_max_px * 2
|
|
693
|
-
needs_trim = format == :png && png_with_alpha?(path)
|
|
694
|
-
return nil unless needs_resize || needs_trim
|
|
695
|
-
|
|
696
|
-
ext = (format == :png) ? ".png" : ".jpg"
|
|
697
|
-
require "securerandom"
|
|
698
|
-
require "tmpdir"
|
|
699
|
-
output_path = File.join(Dir.tmpdir,
|
|
700
|
-
"acroforge_stamp_#{Process.pid}_#{SecureRandom.hex(4)}#{ext}")
|
|
701
|
-
# `format:path` locks the coder, closing the CVE-2016-3714 (ImageTragick) class of attack.
|
|
702
|
-
args = ["convert", "#{format}:#{path}"]
|
|
703
|
-
args.push("-trim", "+repage") if needs_trim
|
|
704
|
-
args.push("-resize", "#{target_max_px}x#{target_max_px}>") if needs_resize
|
|
705
|
-
args.push(output_path)
|
|
706
|
-
success = system(*args, out: File::NULL, err: File::NULL)
|
|
707
|
-
(success && File.exist?(output_path)) ? output_path : nil
|
|
708
|
-
end
|
|
709
|
-
|
|
710
|
-
# PNG color type 4 = greyscale+alpha, 6 = RGBA — only these are trim-worthy.
|
|
711
|
-
def png_with_alpha?(path)
|
|
712
|
-
File.open(path, "rb") do |io|
|
|
713
|
-
return false unless io.read(8) == "\x89PNG\r\n\x1A\n".b
|
|
714
|
-
return false if io.read(8).nil?
|
|
715
|
-
return false if io.read(8).nil?
|
|
716
|
-
return false if io.read(1).nil?
|
|
717
|
-
color_type_byte = io.read(1)
|
|
718
|
-
return false if color_type_byte.nil?
|
|
719
|
-
color_type = color_type_byte.unpack1("C")
|
|
720
|
-
color_type == 4 || color_type == 6
|
|
721
|
-
end
|
|
722
|
-
end
|
|
723
|
-
|
|
724
|
-
def imagemagick_available?
|
|
725
|
-
return @imagemagick_available if defined?(@imagemagick_available)
|
|
726
|
-
@imagemagick_available = system("which", "convert", out: File::NULL, err: File::NULL)
|
|
727
|
-
end
|
|
728
|
-
|
|
729
|
-
# Trust boundary in front of ImageMagick: any malformed-input path
|
|
730
|
-
# raises a single error class so worker retry policies can key on it.
|
|
731
|
-
def validate_image!(path)
|
|
732
|
-
size = File.size(path)
|
|
733
|
-
if size > MAX_IMAGE_BYTES
|
|
734
|
-
raise ImageTooLargeError, "#{path}: #{size} bytes exceeds #{MAX_IMAGE_BYTES} byte cap"
|
|
735
|
-
end
|
|
736
|
-
format, width, height = image_dimensions(path)
|
|
737
|
-
if width > MAX_IMAGE_DIMENSION || height > MAX_IMAGE_DIMENSION
|
|
738
|
-
raise ImageTooLargeError,
|
|
739
|
-
"#{path}: #{width}x#{height}px exceeds #{MAX_IMAGE_DIMENSION}px per side"
|
|
740
|
-
end
|
|
741
|
-
[format, width, height]
|
|
742
|
-
end
|
|
743
|
-
|
|
744
|
-
def image_dimensions(path)
|
|
745
|
-
File.open(path, "rb") do |io|
|
|
746
|
-
head = read_exact(io, 8, path)
|
|
747
|
-
io.rewind
|
|
748
|
-
if head.start_with?("\x89PNG\r\n\x1A\n".b)
|
|
749
|
-
width, height = read_png_dimensions(io, path)
|
|
750
|
-
[:png, width, height]
|
|
751
|
-
elsif head[0, 2] == "\xFF\xD8".b
|
|
752
|
-
width, height = read_jpeg_dimensions(io, path)
|
|
753
|
-
[:jpg, width, height]
|
|
754
|
-
else
|
|
755
|
-
raise_unsupported(path)
|
|
756
|
-
end
|
|
757
|
-
end
|
|
758
|
-
end
|
|
759
|
-
|
|
760
|
-
def read_png_dimensions(io, path)
|
|
761
|
-
read_exact(io, 16, path) # 8-byte signature + 4 length + "IHDR"
|
|
762
|
-
width = read_exact(io, 4, path).unpack1("N")
|
|
763
|
-
height = read_exact(io, 4, path).unpack1("N")
|
|
764
|
-
[width, height]
|
|
765
|
-
end
|
|
766
|
-
|
|
767
|
-
def read_jpeg_dimensions(io, path)
|
|
768
|
-
read_exact(io, 2, path) # SOI
|
|
769
|
-
loop do
|
|
770
|
-
marker_byte = read_exact(io, 1, path).getbyte(0)
|
|
771
|
-
raise_unsupported(path) unless marker_byte == 0xFF
|
|
772
|
-
# Runs of 0xFF are valid JPEG fill bytes between markers.
|
|
773
|
-
marker_code = read_exact(io, 1, path).getbyte(0)
|
|
774
|
-
marker_code = read_exact(io, 1, path).getbyte(0) while marker_code == 0xFF
|
|
775
|
-
raise_unsupported(path, "no SOF marker found") if marker_code == 0xD9 || marker_code == 0x00
|
|
776
|
-
# 0xD0..0xD7 and 0x01 are standalone markers — no length follows.
|
|
777
|
-
next if (0xD0..0xD7).cover?(marker_code) || marker_code == 0x01
|
|
778
|
-
segment_length = read_exact(io, 2, path).unpack1("n")
|
|
779
|
-
raise_unsupported(path, "negative segment length") if segment_length < 2
|
|
780
|
-
is_sof_marker = (0xC0..0xCF).cover?(marker_code) && ![0xC4, 0xC8, 0xCC].include?(marker_code)
|
|
781
|
-
if is_sof_marker
|
|
782
|
-
read_exact(io, 1, path) # precision
|
|
783
|
-
height = read_exact(io, 2, path).unpack1("n")
|
|
784
|
-
width = read_exact(io, 2, path).unpack1("n")
|
|
785
|
-
return [width, height]
|
|
786
|
-
else
|
|
787
|
-
read_exact(io, segment_length - 2, path)
|
|
788
|
-
end
|
|
789
|
-
end
|
|
790
|
-
end
|
|
791
|
-
|
|
792
|
-
def read_exact(io, byte_count, path)
|
|
793
|
-
buf = io.read(byte_count)
|
|
794
|
-
raise_unsupported(path, "truncated header") if buf.nil? || buf.bytesize < byte_count
|
|
795
|
-
buf
|
|
796
|
-
end
|
|
797
|
-
|
|
798
|
-
def raise_unsupported(path, reason = "only JPG and PNG are supported")
|
|
799
|
-
raise UnsupportedImageFormatError, "#{path}: #{reason}"
|
|
670
|
+
def image_stamper
|
|
671
|
+
@image_stamper ||= ImageStamper.new
|
|
800
672
|
end
|
|
801
673
|
|
|
802
674
|
# Heuristic key for push-button image fields based on widget aspect ratio.
|
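The stage-then-move pattern used by `write_normalized!` in the hunks above can be shown in isolation. The sketch below swaps the HexaPDF write for a plain `File.binwrite`, so it demonstrates only the temp-file handling, not PDF serialization; `write_via_temp` is a hypothetical name.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: replaces `target` with `bytes` by writing a sibling
# temp file first, then moving it over the target.
def write_via_temp(target, bytes)
  tmp = File.join(File.dirname(target), ".#{File.basename(target)}.#{Process.pid}.tmp")
  File.binwrite(tmp, bytes)  # stand-in for source_doc.write(tmp, optimize: true)
  FileUtils.mv(tmp, target)  # same directory, so normally a rename on one filesystem
ensure
  # If the write or move failed, do not leave the staging file behind.
  FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "form.pdf")
  File.binwrite(target, "old contents")
  write_via_temp(target, "new contents")
  puts File.binread(target)
end
```

Keeping the temp file in the target's own directory matters: a move within one filesystem is typically a rename, whereas staging in a system temp directory could turn the final step into a copy.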
data/lib/acroforge/image_stamper.rb
ADDED
|
@@ -0,0 +1,175 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module AcroForge
|
|
4
|
+
ImageTooLargeError = Class.new(Error)
|
|
5
|
+
UnsupportedImageFormatError = Class.new(Error)
|
|
6
|
+
|
|
7
|
+
# Stamps JPG/PNG files into AcroForm widget rectangles.
|
|
8
|
+
class ImageStamper
|
|
9
|
+
# Caps a phone-camera passport photo from bloating the output PDF.
|
|
10
|
+
MAX_IMAGE_BYTES = 5 * 1024 * 1024
|
|
11
|
+
MAX_IMAGE_DIMENSION = 4000
|
|
12
|
+
# Auto-downsample images whose pixel resolution far exceeds this PPI
|
|
13
|
+
# at the widget's rendered size. Requires ImageMagick on PATH.
|
|
14
|
+
TARGET_PPI = 200
|
|
15
|
+
|
|
16
|
+
def stamp!(doc, field, path)
|
|
17
|
+
format, image_width, image_height = validate_image!(path)
|
|
18
|
+
widget = field.each_widget.first
|
|
19
|
+
return unless widget && widget[:Rect]
|
|
20
|
+
|
|
21
|
+
page = doc.pages.find { |candidate_page| candidate_page[:Annots]&.include?(widget) }
|
|
22
|
+
return unless page
|
|
23
|
+
|
|
24
|
+
# Widget Rect is absolute page coords; canvas API is MediaBox-relative.
|
|
25
|
+
media_box_x, media_box_y = page.box.value[0], page.box.value[1]
|
|
26
|
+
rect_x_min, rect_y_min, rect_x_max, rect_y_max = widget[:Rect]
|
|
27
|
+
slot_width = rect_x_max - rect_x_min
|
|
28
|
+
slot_height = rect_y_max - rect_y_min
|
|
29
|
+
slot_canvas_x = rect_x_min - media_box_x
|
|
30
|
+
slot_canvas_y = rect_y_min - media_box_y
|
|
31
|
+
|
|
32
|
+
stamp_path = prepare_image_for_slot(path, format, image_width, image_height,
|
|
33
|
+
slot_width, slot_height) || path
|
|
34
|
+
if stamp_path != path
|
|
35
|
+
_, image_width, image_height = image_dimensions(stamp_path)
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
draw_width, draw_height = fit_inside(image_width, image_height, slot_width, slot_height)
|
|
39
|
+
draw_x = slot_canvas_x + (slot_width - draw_width) / 2.0
|
|
40
|
+
draw_y = slot_canvas_y + (slot_height - draw_height) / 2.0
|
|
41
|
+
|
|
42
|
+
canvas = page.canvas(type: :overlay)
|
|
43
|
+
canvas.fill_color(255, 255, 255)
|
|
44
|
+
canvas.rectangle(slot_canvas_x, slot_canvas_y, slot_width, slot_height).fill
|
|
45
|
+
canvas.image(stamp_path, at: [draw_x, draw_y], width: draw_width, height: draw_height)
|
|
46
|
+
|
|
47
|
+
# Bake into the page so the widget's empty appearance doesn't repaint over the image.
|
|
48
|
+
page[:Annots].delete(widget)
|
|
49
|
+
end
|
|
50
|
+
|
|
51
|
+
# Trust boundary in front of ImageMagick: any malformed-input path
|
|
52
|
+
# raises a single error class so worker retry policies can key on it.
|
|
53
|
+
def validate_image!(path)
|
|
54
|
+
size = File.size(path)
|
|
55
|
+
if size > MAX_IMAGE_BYTES
|
|
56
|
+
raise ImageTooLargeError, "#{path}: #{size} bytes exceeds #{MAX_IMAGE_BYTES} byte cap"
|
|
57
|
+
end
|
|
58
|
+
format, width, height = image_dimensions(path)
|
|
59
|
+
if width > MAX_IMAGE_DIMENSION || height > MAX_IMAGE_DIMENSION
|
|
60
|
+
raise ImageTooLargeError,
|
|
61
|
+
"#{path}: #{width}x#{height}px exceeds #{MAX_IMAGE_DIMENSION}px per side"
|
|
62
|
+
end
|
|
63
|
+
[format, width, height]
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
private
|
|
67
|
+
|
|
68
|
+
def fit_inside(image_width, image_height, slot_width, slot_height)
|
|
69
|
+
scale = [slot_width.to_f / image_width, slot_height.to_f / image_height].min
|
|
70
|
+
[image_width * scale, image_height * scale]
|
|
71
|
+
end
|
|
72
|
+
|
|
73
|
+
# Trim removes the transparent border around a signature; downsample
|
|
74
|
+
# caps source resolution at TARGET_PPI for the widget's longer side.
|
|
75
|
+
def prepare_image_for_slot(path, format, image_width, image_height,
|
|
76
|
+
slot_width_pt, slot_height_pt)
|
|
77
|
+
return nil unless imagemagick_available?
|
|
78
|
+
slot_max_pt = [slot_width_pt, slot_height_pt].max
|
|
79
|
+
target_max_px = (slot_max_pt * TARGET_PPI / 72.0).ceil
|
|
80
|
+
needs_resize = image_width > target_max_px * 2 || image_height > target_max_px * 2
|
|
81
|
+
needs_trim = format == :png && png_with_alpha?(path)
|
|
82
|
+
return nil unless needs_resize || needs_trim
|
|
83
|
+
|
|
84
|
+
ext = (format == :png) ? ".png" : ".jpg"
|
|
85
|
+
require "securerandom"
|
|
86
|
+
require "tmpdir"
|
|
87
|
+
output_path = File.join(Dir.tmpdir,
|
|
88
|
+
"acroforge_stamp_#{Process.pid}_#{SecureRandom.hex(4)}#{ext}")
|
|
89
|
+
# `format:path` locks the coder, closing the CVE-2016-3714 (ImageTragick) class of attack.
|
|
90
|
+
args = ["convert", "#{format}:#{path}"]
|
|
91
|
+
args.push("-trim", "+repage") if needs_trim
|
|
92
|
+
args.push("-resize", "#{target_max_px}x#{target_max_px}>") if needs_resize
|
|
93
|
+
args.push(output_path)
|
|
94
|
+
success = system(*args, out: File::NULL, err: File::NULL)
|
|
95
|
+
(success && File.exist?(output_path)) ? output_path : nil
|
|
96
|
+
end
|
|
97
|
+
|
|
98
|
+
# PNG color type 4 = greyscale+alpha, 6 = RGBA — only these are trim-worthy.
|
|
99
|
+
def png_with_alpha?(path)
|
|
100
|
+
File.open(path, "rb") do |io|
|
|
101
|
+
return false unless io.read(8) == "\x89PNG\r\n\x1A\n".b
|
|
102
|
+
return false if io.read(8).nil?
|
|
103
|
+
return false if io.read(8).nil?
|
|
104
|
+
return false if io.read(1).nil?
|
|
105
|
+
color_type_byte = io.read(1)
|
|
106
|
+
return false if color_type_byte.nil?
|
|
107
|
+
color_type = color_type_byte.unpack1("C")
|
|
108
|
+
color_type == 4 || color_type == 6
|
|
109
|
+
end
|
|
110
|
+
end
|
|
111
|
+
|
|
112
|
+
def imagemagick_available?
|
|
113
|
+
return @imagemagick_available if defined?(@imagemagick_available)
|
|
114
|
+
@imagemagick_available = system("which", "convert", out: File::NULL, err: File::NULL)
|
|
115
|
+
end
|
|
116
|
+
|
|
117
|
+
def image_dimensions(path)
|
|
118
|
+
File.open(path, "rb") do |io|
|
|
119
|
+
head = read_exact(io, 8, path)
|
|
120
|
+
io.rewind
|
|
121
|
+
if head.start_with?("\x89PNG\r\n\x1A\n".b)
|
|
122
|
+
width, height = read_png_dimensions(io, path)
|
|
123
|
+
[:png, width, height]
|
|
124
|
+
elsif head[0, 2] == "\xFF\xD8".b
|
|
125
|
+
width, height = read_jpeg_dimensions(io, path)
|
|
126
|
+
[:jpg, width, height]
|
|
127
|
+
else
|
|
128
|
+
raise_unsupported(path)
|
|
129
|
+
end
|
|
130
|
+
end
|
|
131
|
+
end
|
|
132
|
+
|
|
133
|
+
def read_png_dimensions(io, path)
|
|
134
|
+
read_exact(io, 16, path) # 8-byte signature + 4 length + "IHDR"
|
|
135
|
+
width = read_exact(io, 4, path).unpack1("N")
|
|
136
|
+
height = read_exact(io, 4, path).unpack1("N")
|
|
137
|
+
[width, height]
|
|
138
|
+
end
|
|
139
|
+
|
|
140
|
+
def read_jpeg_dimensions(io, path)
|
|
141
|
+
read_exact(io, 2, path) # SOI
|
|
142
|
+
loop do
|
|
143
|
+
marker_byte = read_exact(io, 1, path).getbyte(0)
|
|
144
|
+
raise_unsupported(path) unless marker_byte == 0xFF
|
|
145
|
+
# Runs of 0xFF are valid JPEG fill bytes between markers.
|
|
146
|
+
marker_code = read_exact(io, 1, path).getbyte(0)
|
|
147
|
+
marker_code = read_exact(io, 1, path).getbyte(0) while marker_code == 0xFF
|
|
148
|
+
raise_unsupported(path, "no SOF marker found") if marker_code == 0xD9 || marker_code == 0x00
|
|
149
|
+
# 0xD0..0xD7 and 0x01 are standalone markers — no length follows.
|
|
150
|
+
next if (0xD0..0xD7).cover?(marker_code) || marker_code == 0x01
|
|
151
|
+
segment_length = read_exact(io, 2, path).unpack1("n")
|
|
152
|
+
raise_unsupported(path, "negative segment length") if segment_length < 2
|
|
153
|
+
is_sof_marker = (0xC0..0xCF).cover?(marker_code) && ![0xC4, 0xC8, 0xCC].include?(marker_code)
|
|
154
|
+
if is_sof_marker
|
|
155
|
+
read_exact(io, 1, path) # precision
|
|
156
|
+
height = read_exact(io, 2, path).unpack1("n")
|
|
157
|
+
width = read_exact(io, 2, path).unpack1("n")
|
|
158
|
+
return [width, height]
|
|
159
|
+
else
|
|
160
|
+
read_exact(io, segment_length - 2, path)
|
|
161
|
+
end
|
|
162
|
+
end
|
|
163
|
+
end
|
|
164
|
+
|
|
165
|
+
def read_exact(io, byte_count, path)
|
|
166
|
+
buf = io.read(byte_count)
|
|
167
|
+
raise_unsupported(path, "truncated header") if buf.nil? || buf.bytesize < byte_count
|
|
168
|
+
buf
|
|
169
|
+
end
|
|
170
|
+
|
|
171
|
+
def raise_unsupported(path, reason = "only JPG and PNG are supported")
|
|
172
|
+
raise UnsupportedImageFormatError, "#{path}: #{reason}"
|
|
173
|
+
end
|
|
174
|
+
end
|
|
175
|
+
end
|
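The PNG header reads in `read_png_dimensions` and `png_with_alpha?` rely on the fixed IHDR layout: an 8-byte signature, a 4-byte chunk length, the 4-byte tag `IHDR`, then width and height as big-endian 32-bit integers, a bit-depth byte, and a color-type byte. The sketch below combines both reads in one hypothetical helper (`png_header_info`) and runs it against a synthetic header built in memory, so no image file or ImageMagick is needed.

```ruby
require "stringio"

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: returns [width, height, has_alpha], or nil when the
# bytes are not a PNG or the header is truncated.
def png_header_info(io)
  return nil unless io.read(8) == PNG_SIGNATURE

  io.read(8)                       # skip chunk length (4 bytes) + "IHDR" (4 bytes)
  dims = io.read(8)
  return nil if dims.nil? || dims.bytesize < 8

  width, height = dims.unpack("NN") # two big-endian 32-bit integers
  io.read(1)                        # bit depth, not needed here
  color_type = io.read(1)&.unpack1("C")
  # Color type 4 = greyscale+alpha, 6 = RGBA.
  [width, height, [4, 6].include?(color_type)]
end

# Synthetic IHDR: 640x480, bit depth 8, color type 6 (RGBA).
header = PNG_SIGNATURE + [13].pack("N") + "IHDR".b +
         [640, 480].pack("NN") + [8, 6, 0, 0, 0].pack("C5")
p png_header_info(StringIO.new(header))
```

Unlike the gem, which raises `UnsupportedImageFormatError` on malformed input, this sketch returns `nil` to stay self-contained.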
data/lib/acroforge/version.rb
CHANGED
data/lib/acroforge.rb
CHANGED
|
@@ -10,6 +10,7 @@ end
|
|
|
10
10
|
require_relative "acroforge/all_text_processor"
|
|
11
11
|
require_relative "acroforge/labels"
|
|
12
12
|
require_relative "acroforge/validator"
|
|
13
|
+
require_relative "acroforge/image_stamper"
|
|
13
14
|
require_relative "acroforge/engine"
|
|
14
15
|
require_relative "acroforge/schema"
|
|
15
16
|
require_relative "acroforge/relabeler"
|
metadata
CHANGED
|
@@ -1,13 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: acroforge
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.3.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Maxwell Nana Forson
|
|
8
|
+
autorequire:
|
|
8
9
|
bindir: exe
|
|
9
10
|
cert_chain: []
|
|
10
|
-
date:
|
|
11
|
+
date: 2026-06-13 00:00:00.000000000 Z
|
|
11
12
|
dependencies:
|
|
12
13
|
- !ruby/object:Gem::Dependency
|
|
13
14
|
name: hexapdf
|
|
@@ -44,6 +45,7 @@ files:
|
|
|
44
45
|
- lib/acroforge/cli.rb
|
|
45
46
|
- lib/acroforge/constants.rb
|
|
46
47
|
- lib/acroforge/engine.rb
|
|
48
|
+
- lib/acroforge/image_stamper.rb
|
|
47
49
|
- lib/acroforge/labels.rb
|
|
48
50
|
- lib/acroforge/preparer.rb
|
|
49
51
|
- lib/acroforge/relabeler.rb
|
|
@@ -61,6 +63,7 @@ metadata:
|
|
|
61
63
|
changelog_uri: https://github.com/Lzcorp-Solutions/acroforge/blob/main/CHANGELOG.md
|
|
62
64
|
documentation_uri: https://lzcorp-solutions.github.io/acroforge/
|
|
63
65
|
rubygems_mfa_required: 'true'
|
|
66
|
+
post_install_message:
|
|
64
67
|
rdoc_options: []
|
|
65
68
|
require_paths:
|
|
66
69
|
- lib
|
|
@@ -75,7 +78,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
75
78
|
- !ruby/object:Gem::Version
|
|
76
79
|
version: '0'
|
|
77
80
|
requirements: []
|
|
78
|
-
rubygems_version:
|
|
81
|
+
rubygems_version: 3.1.6
|
|
82
|
+
signing_key:
|
|
79
83
|
specification_version: 4
|
|
80
84
|
summary: PDF AcroForm engine with heuristic-assisted field relabeling.
|
|
81
85
|
test_files: []
|