safe_image 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e078245fa2e4fb0707b0d477ad0c4b41f31423905ed0dc84f964cd648a5a032f
- data.tar.gz: 9155876c2761ac9c6afb2b4da4a97af8b6b7e035ffee553c72dacda6166c1048
+ metadata.gz: ce9a76b504fa826aef4164c5748c800a95828ef7fda6bda3a00e7e85f0134f38
+ data.tar.gz: 61c5ebac14806e8e3445c3e7aa73bb1ab5191cc414e1af311b752ad0be492bc7
  SHA512:
- metadata.gz: 2f98d64d3b9665a156c27cbcef898cd148e6b559c73c03ba757b1147d2e47f1733d5a6e0ebc7d5c038cb989f874d67747175cd1d094cdf9677f98550cfe635be
- data.tar.gz: ae1008885bf83da844145369cd929edd12c0af238fd0d975e2c078ef95b4d7cc301f1fed2da5cdd815d520591938acc144a2bd79d4bd2573336a2c9eb1078e5b
+ metadata.gz: 5d15f34ca4de05f1ca4618695bc3993a000ce38a7de54afde86ac87497c0288ad0ed37493ca31431669341be10991bbcdb5192796ea6bdb3af016943adfac2bc
+ data.tar.gz: e71a59800452d4d396741e4e28134c47403490100233e803644a90a0ca967abd74fb5572a936a7af247411cb2055378982bccad89085085184aa92c66ca94d0d
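
Reviewer sketch (not part of the package diff): the new digests can be checked against the members of a downloaded `.gem` archive. The helper name `verify_member` is hypothetical, and the two-space indentation of the YAML below is assumed from the usual `checksums.yaml` layout, since the rendered diff does not preserve indentation.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: true when `bytes` hash to the SHA256 recorded for
# the archive member `name` in a parsed checksums.yaml.
def verify_member(checksums, name, bytes)
  expected = checksums.fetch("SHA256").fetch(name)
  Digest::SHA256.hexdigest(bytes) == expected
end

# The 0.3.0 SHA256 values from the hunk above.
checksums = YAML.safe_load(<<~YAML)
  SHA256:
    metadata.gz: ce9a76b504fa826aef4164c5748c800a95828ef7fda6bda3a00e7e85f0134f38
    data.tar.gz: 61c5ebac14806e8e3445c3e7aa73bb1ab5191cc414e1af311b752ad0be492bc7
YAML

# Bytes that are not the real archive must not verify.
puts verify_member(checksums, "data.tar.gz", "not the real archive")
```

In practice `bytes` would be the raw `data.tar.gz` or `metadata.gz` extracted from the `.gem` file.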
data/CHANGELOG.md CHANGED
@@ -5,6 +5,199 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [0.3.0 - 2026-06-12]
+
+ ### Changed (breaking)
+
+ - **`sanitize_svg!` now requires `id_namespace:`.** The argument forces a
+ deliberate choice of where the output may be used, removing the footgun of a
+ silently-wrong default. Pass `:standalone` for output served only as an
+ external `<img>`/CSS-url/file, or a stable per-document String to make it safe
+ to inline (see below). Omitting it (or passing `nil`/`""`) raises
+ `ArgumentError`. Callers must update: `sanitize_svg!(path)` →
+ `sanitize_svg!(path, id_namespace: :standalone)`.
+
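
Reviewer sketch (not part of the package diff): the argument contract described above, written as a standalone check. `check_id_namespace` and `NAMESPACE_IDENT` are hypothetical names, not the gem's API. The changelog only says malformed tokens are "rejected", so raising `ArgumentError` for them is an assumption.

```ruby
# A letter followed by letters, digits, "_" or "-", per the changelog's
# description of a valid namespace ident.
NAMESPACE_IDENT = /\A[A-Za-z][A-Za-z0-9_-]*\z/

# Hypothetical validator: returns the accepted value or raises.
def check_id_namespace(id_namespace)
  return :standalone if id_namespace == :standalone

  unless id_namespace.is_a?(String) && !id_namespace.empty?
    raise ArgumentError, "id_namespace: is required (:standalone or a String)"
  end
  unless NAMESPACE_IDENT.match?(id_namespace)
    # Assumption: the error class for malformed tokens is not stated upstream.
    raise ArgumentError, "id_namespace must be a letter followed by letters, digits, _ or -"
  end

  id_namespace
end

puts check_id_namespace("u3f2a9c")
```

A hex digest can start with a digit, which would fail the ident rule; that is presumably why the README example prefixes the sha with `u`.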
+ ### Added
+
+ - **`sanitize_svg!` can produce output safe to inline into an HTML DOM.** Pass
+ `id_namespace:` a stable per-document value (e.g. the upload sha) and the
+ sanitizer prefixes every `id` and every reference to it (`href`/`xlink:href`
+ fragments, `url(#…)` in attributes and CSS, ARIA IDREF attributes like
+ `aria-labelledby`/`aria-controls`, and every `class` token plus the matching
+ `.class` selectors) with the namespace, and scopes every `<style>` selector
+ under a `<ns>-scope` class added to the root `<svg>`. Namespacing classes stops
+ an inlined SVG from invoking the host page's framework CSS (a bare
+ `class="modal fixed"` overlay vector). `var()`/`env()`/`attr()` in presentation
+ attributes are rejected outright — they resolve against the host page.
+ Inlined into a page, the preserved `<style>` can no longer reach the host
+ cascade (`*{visibility:hidden}`, `#header{display:none}`) and ids cannot
+ clobber host ids — including references written `URL(#x)`, `url('#x')`, or
+ `url("#x")`, which are namespaced like the unquoted form. In this mode the root
+ `<svg>`'s `overflow` is also dropped so it clips to its declared viewport (a
+ tiny viewport with `overflow:visible` and oversized content would otherwise
+ paint a full-page overlay). With `id_namespace: :standalone` the output is the
+ document-safe form (no namespacing). The namespace must be a valid ident (a
+ letter followed by letters/digits/`_`/`-`); malformed tokens are rejected
+ rather than coerced, so two distinct values can never collapse to one. The
+ transform is idempotent per namespace. `style=""` attributes are
+ element-scoped and inline-safe either way.
+ - **`<style>` elements now fail closed on any at-rule.** Previously an at-rule
+ block followed by a valid rule (`@font-face{…}.ok{…}`) could keep the trailing
+ rule; a stylesheet containing `@` anywhere is now rejected whole, matching the
+ documented guarantee.
+ - **The SVG sanitizer keeps a safe CSS subset instead of stripping all CSS.**
+ `style` attributes (as written by Inkscape) and `<style>` elements (as
+ written by Illustrator) now survive sanitisation when they parse against a
+ constructed allowlist grammar: properties mirroring the allowed
+ presentation attributes, type/class/id selectors, numeric/keyword/color
+ values, and `url(#fragment)` references only. Output is reassembled from
+ validated tokens — CSS escapes, quotes/strings, at-rules, comments, and
+ unknown properties, functions, or selectors drop the declaration, rule, or
+ whole stylesheet rather than being interpreted. A single at-rule or nested
+ block fails the whole `<style>` element closed. `!important` and modern
+ `rgb()/hsl()` slash-alpha (`rgb(R G B / A)`) are preserved; both are parsed
+ structurally and re-emitted, and admitting `/` for the alpha keeps CSS
+ comments impossible because `*` remains excluded from the value charset.
+ - **The presentation-attribute allowlist covers common editor output.** Added
+ the safe, widely-emitted SVG presentation properties (and their CSS twins):
+ `stroke-dasharray`/`stroke-dashoffset`, `vector-effect`, `marker`/`marker-*`
+ (with the `<marker>` element and its geometry attributes), `color`,
+ `display`/`visibility`/`overflow`, `paint-order`/`mix-blend-mode`/`isolation`,
+ the `*-rendering` hints, and the longhand text properties (`font-style`,
+ `font-variant`, `font-stretch`, `text-decoration`, `letter-spacing`,
+ `word-spacing`, `dominant-baseline`, `baseline-shift`, `writing-mode`,
+ `direction`). The only additions carrying a URL — `marker*` — are constrained
+ to `url(#fragment)` like the existing paint and clip/mask references. Filters
+ remain out of scope.
+
+ ### Changed
+
+ - **Sandboxed operations are served by a pool of resident zygote workers —
+ warm Landlock overhead drops from ~100ms to ~3–8ms per operation, with
+ near-linear concurrency.** Previously every sandboxed call exec'd a fresh
+ Ruby that re-paid interpreter boot, rubygems, the gem's requires, and
+ libvips init. A zygote boots that once, then forks a child per operation;
+ the child applies rlimits, its per-operation Landlock policy (filesystem
+ allowlist, all TCP denied on ABI ≥ 4, abstract-unix-socket/signal scopes on
+ ABI ≥ 6), and — when the installed `landlock` gem exposes
+ `seccomp_deny_network!` — the helper's deny-all-network seccomp filter
+ (blocking sockets of every family, closing the UDP gap the in-process
+ Landlock policy alone leaves open), *before* touching untrusted input, and
+ exits after the operation. Workers are pooled so N threads run N sandboxed
+ operations at once (the single zygote would serialise them): the pool grows
+ on demand to `SAFE_IMAGE_ZYGOTE_WORKERS` (default 8) and offered concurrency
+ past the cap blocks until a worker frees, bounding concurrent libvips
+ memory. An idle worker exits after `Zygote::IDLE_SECONDS` (300, overridable
+ via `SAFE_IMAGE_ZYGOTE_IDLE_SECONDS`; idling holds ~16MB private memory and
+ no CPU), exits immediately when its parent process does, and `configure!`
+ always retires the pool, so no stale process outlives its parent or a
+ reconfigure. Forking is sound because the zygote never runs operations
+ itself: libvips is initialised but quiescent (zero native threads) at every
+ fork — verified empirically. Outputs are byte-identical to both the exec
+ worker and unsandboxed operation. `SAFE_IMAGE_ZYGOTE=0` falls back to the
+ exec-per-operation worker. Pool correctness under concurrent reconfigure,
+ worker death, and fork is enforced by a per-worker generation token (a
+ worker checked out when `configure!` lands is retired on return, never
+ reused under stale config), channel-health (not process-liveness) reuse
+ decisions (a worker whose pipe broke is discarded, never re-pooled), a
+ `PR_SET_PDEATHSIG` so an operation child dies with its zygote, and
+ parent-side tmp-root cleanup that survives a SIGKILLed worker — all
+ exercised by a multithreaded chaos stress test (reconfigure + worker kills)
+ asserting no wrong output, no process/fd/tmpdir leak, and no deadlock.
+
+ - **External tools run with `MALLOC_ARENA_MAX=2`.** Multithreaded optimizer
+ tools (oxipng's rayon pool, ImageMagick's OpenMP) otherwise have glibc
+ reserve a 64MB malloc arena per thread; combined with the sandbox's
+ `RLIMIT_AS` memory cap, that *address-space* reservation spuriously fails
+ the tool under concurrency even though real memory use is tiny. Bounding the
+ arena count is the standard mitigation and is free for these compute-bound
+ tools. Found by stress-testing concurrent sandboxed `convert`.
+
+ - **rexml loads on first SVG use instead of at `require "safe_image"`.**
+ rexml costs ~27ms to parse and only the SVG paths need it, so every boot of
+ the gem — and in particular every Landlock sandbox worker, which is a fresh
+ Ruby process per operation — was paying it on operations that never touch
+ SVG. Measured per-op sandbox cost drops from ~102ms to ~85ms on the vips
+ backend and ~75ms to ~58ms on the imagemagick backend.
+
+ - **Remote fetches reject bad responses from the headers alone.** The
+ `Content-Type` allowlist and content-type/extension agreement checks now run
+ before any body bytes are read (previously the body was downloaded first and
+ rejected afterwards), and the first bytes of the body must be compatible
+ with the claimed format's magic bytes — an obviously mislabeled body is
+ dropped after the first chunk instead of being downloaded to `max_bytes`.
+ - **Remote metadata helpers download only what the answer needs.**
+ `remote_size`, `remote_type`, `remote_info` and `remote_animated?` now probe
+ the partially-downloaded file at growing thresholds (64KB, 256KB, ...) and
+ abort the transfer once the answer is final, instead of always downloading
+ up to `max_bytes`. Early answers are only trusted when more data cannot
+ change them: "not animated" still requires the complete file (truncated
+ animations undercount frames), SVG metadata still downloads the whole
+ document so the SVG size cap keeps its meaning, and any prefix probe
+ failure falls back to the full download with unchanged validation and
+ error behaviour. `fetch_remote` and `remote_dominant_color` still download
+ the complete body.
+
+ ### Fixed
+
+ - **Lossy PNG optimisation no longer fails when pngquant declines to
+ quantise.** pngquant signals "quantised result not used" through exit
+ status 98 (`--skip-if-larger`) and 99 (`--quality` not met) — for example
+ on low-bit-depth grayscale PNGs its RGBA-palette output cannot beat —
+ and `optimize(mode: :lossy)` raised `CommandError` instead of keeping the
+ original and continuing to oxipng. Found by running 10 random Wikimedia
+ Commons images through the optimizer.
+
+ - **`optimize` no longer ships sideways JPEGs.** With `strip_metadata: true`
+ (the default), stripping deleted the EXIF orientation tag without applying
+ the rotation, so an oriented camera photo came out rendered 90/180° wrong.
+ `optimize` now bakes the rotation into the pixels first via jpegtran's
+ lossless transforms: `-perfect` when the dimensions are MCU-aligned, else
+ `-trim`, which drops the partial edge blocks (under one MCU, at most 15px)
+ instead of hiding a lossy re-encode. The result hash gains `rotated_from:`
+ and `trimmed:` so the trim is reported, never silent — image_optim's jhead
+ worker (Discourse's `FileHelper.optimize_image!`) does the same transform
+ but trims silently. Without jpegtran an oriented JPEG raises in strict mode
+ and is left untouched otherwise; reading the tag goes through the configured
+ backend, so `optimize` now also enforces the pixel cap before touching an
+ oriented JPEG. Internal callers optimising output the gem just encoded skip
+ the check (`assume_upright:`).
+
+ - **JPEGs with an EXIF orientation no longer fail on the libvips path once
+ they outgrow the sequential readahead window (~512px — every real camera
+ photo).** `resize`, `crop`, `convert`/`convert_to_jpeg` and the
+ `fix_orientation` re-encode tier loaded input with `access: sequential` and
+ then autorotated, and the rotation's out-of-order row reads raised
+ `VipsJpeg: out of order read`. Oriented images are now reloaded with random
+ access before autorotation; upright images keep the streaming sequential
+ load, and the pixel cap still runs before any decode.
+
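
Reviewer sketch (not part of the package diff): the pngquant fix above amounts to treating two exit statuses as "keep the original" rather than as failures. `pngquant_outcome` is a hypothetical helper, and it raises a plain `RuntimeError` where the gem is said to raise `CommandError`.

```ruby
# Exit statuses pngquant uses to say "quantised result not used":
# 98 for --skip-if-larger and 99 for --quality not met.
PNGQUANT_DECLINED = [98, 99].freeze

# Hypothetical classifier for a finished pngquant run.
def pngquant_outcome(exit_status)
  case exit_status
  when 0 then :quantised
  when *PNGQUANT_DECLINED then :kept_original
  else raise "pngquant failed with exit status #{exit_status}"
  end
end

puts pngquant_outcome(99)
```

With either outcome the caller would continue to the lossless oxipng pass; only other non-zero statuses are errors.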
+ ### Security
+
+ - **Presentation-attribute `url()` references fail closed unless they are a
+ canonical same-document fragment.** A single validation/rewrite grammar now
+ governs both the keep decision and the namespace rewrite, so external URLs,
+ mismatched quotes, and unterminated forms (`url(#id`, `url(http://evil`) are
+ dropped rather than kept on browser parse-error leniency — and no bare,
+ un-namespaced reference can survive in inline (`id_namespace:` String) output.
+ - **Attribute values containing CSS escapes are rejected outright.**
+ Browsers feed SVG presentation attributes through their CSS value parsers,
+ where an escape can re-form a token after the sanitizer's pattern checks
+ (`ur\6c(...)` is `url(...)`). No allowlisted attribute legitimately
+ contains a backslash, so any attribute value with one is now dropped.
+ - **SVG parsing rejects encodings the byte-level guards cannot see through.**
+ The DOCTYPE/processing-instruction guards are ASCII byte scans; a UTF-16
+ document interleaves NUL bytes between the ASCII characters, so a
+ `<!DOCTYPE` (and an entity payload behind it) could slip past them while
+ REXML still decoded and honoured it. SVG documents must now be UTF-8 (BOM
+ allowed) or declare a single-byte ASCII-transparent charset (US-ASCII,
+ ISO-8859-*, Windows-125x): UTF-16/32 BOMs, embedded NUL bytes, and declared
+ multi-byte or transforming encodings (Shift_JIS, GBK, EUC-*, ISO-2022-*,
+ UTF-7) raise `InvalidImageError`. Declared names that fit the allowed shape
+ but resolve to no real encoding (e.g. `utf8`, `windows-1259`) also fail
+ closed as `InvalidImageError` instead of surfacing REXML's bare
+ `ArgumentError`.
+
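
Reviewer sketch (not part of the package diff): why UTF-16 defeats an ASCII byte scan, and the kind of pre-check the last Security entry describes. `byte_scannable?` is a hypothetical name; it covers only the BOM and NUL-byte parts of the rule, not the declared-encoding checks.

```ruby
# Byte-order marks for UTF-16 and UTF-32, as binary strings.
UTF16_32_BOMS = [
  "\xFF\xFE".b, "\xFE\xFF".b,
  "\xFF\xFE\x00\x00".b, "\x00\x00\xFE\xFF".b
].freeze

# Hypothetical guard: false when an ASCII byte scan could be blinded,
# i.e. the input has a UTF-16/32 BOM or contains any NUL byte.
def byte_scannable?(bytes)
  data = bytes.b
  return false if UTF16_32_BOMS.any? { |bom| data.start_with?(bom) }

  !data.include?("\x00".b)
end

utf16 = "<!DOCTYPE svg>".encode("UTF-16LE")
puts utf16.b.include?("<!DOCTYPE".b) # the ASCII scan misses the DOCTYPE
puts byte_scannable?(utf16)          # so the document is refused up front
```

A UTF-8 BOM (`EF BB BF`) passes this check, matching the "BOM allowed" wording above.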
  ## [0.2.0] - 2026-06-10

  The host's whole image-processing posture is now decided in one place, once,
data/README.md CHANGED
@@ -142,7 +142,7 @@ sudo pacman -S --needed libvips \
  | `jpegoptim` | required for JPEG `optimize` | lossless JPEG optimisation and metadata stripping | JPEG `optimize` raises in strict mode |
  | `oxipng` | required for PNG `optimize` | lossless PNG optimisation | PNG `optimize` raises in strict mode |
  | `pngquant` | optional | lossy PNG quantisation (`optimize_mode: :lossy`, files < 500KB) | lossy mode silently skips the quantisation pass |
- | `jpegtran` (libjpeg-turbo) | optional | lossless tier of `fix_orientation` for MCU-aligned JPEGs | falls back to the libvips re-encode tier |
+ | `jpegtran` (libjpeg-turbo) | optional | lossless tier of `fix_orientation`; uprighting EXIF-oriented JPEGs in `optimize` | `fix_orientation` falls back to the libvips re-encode tier; `optimize` of an oriented JPEG raises in strict mode (left untouched otherwise) |
  | `cjpegli` (libjxl) | optional | higher-quality encoding of generated JPEGs on the `:vips` backend — used automatically when installed | generated JPEGs use the backend's own encoder |
  | `landlock` gem (Linux kernel ≥ 5.13) | required for `landlock: true` | the atomic sandbox around every operation | `configure!(landlock: true)` raises at boot; `sandbox_available?` is false |
  | `rexml` gem | automatic | SVG sanitising and SVG metadata | installed as a gem dependency |
@@ -228,7 +228,9 @@ Optimizer operations return a hash:
  before_bytes: 123_456,
  after_bytes: 120_000,
  saved_bytes: 3_456,
- tools: ["jpegoptim"]
+ tools: ["jpegoptim"],
+ rotated_from: nil, # the EXIF orientation baked into the pixels, when one was set
+ trimmed: false # true when uprighting dropped partial edge blocks (see optimize)
  }
  ```
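
Reviewer sketch (not part of the package diff): a caller might surface the two new keys like this. `describe_optimize_result` is a hypothetical helper, the sample hashes are made up, and treating `rotated_from` as an Integer orientation is an assumption based on the comment in the hunk.

```ruby
# Hypothetical formatter for an optimizer result hash.
def describe_optimize_result(result)
  parts = ["saved #{result[:saved_bytes]} of #{result[:before_bytes]} bytes via #{result[:tools].join(', ')}"]
  parts << "uprighted from EXIF orientation #{result[:rotated_from]}" if result[:rotated_from]
  parts << "partial edge blocks trimmed" if result[:trimmed]
  parts.join("; ")
end

upright = {
  before_bytes: 123_456, after_bytes: 120_000, saved_bytes: 3_456,
  tools: ["jpegoptim"], rotated_from: nil, trimmed: false
}
puts describe_optimize_result(upright)
```

Logging `trimmed:` is the point of the new key: a trimmed image has slightly smaller dimensions than its input.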
 
@@ -367,8 +369,26 @@ SafeImage.dominant_color("upload.png") # => "6F745E"

  These helpers are intended to cover `FastImage.size(url)` / `FastImage.type(url)`
  style use cases without another Ruby dependency. They use only Ruby stdlib
- `Net::HTTP`, download to a tempfile with a byte cap, then run the normal Safe
- Image local metadata path on that tempfile.
+ `Net::HTTP` and stream to a tempfile with a byte cap.
+
+ Like FastImage, the metadata helpers (`remote_size`, `remote_type`,
+ `remote_info`, `remote_animated?`) download as little as possible: the normal
+ Safe Image local metadata path probes the partially-downloaded tempfile as
+ bytes arrive (first at 64KB, then at growing thresholds) and the transfer is
+ aborted as soon as the answer is final — typically after the first 64KB.
+ Early answers are only trusted when more data cannot change them:
+
+ - "not animated" is reported only from the complete file, because a truncated
+ animation undercounts frames; "animated" is final as soon as a second frame
+ is seen
+ - SVG metadata always downloads the whole document, so the SVG parser's total
+ size cap keeps its meaning
+ - a probe failure on a prefix just means the download continues; a file that
+ never yields an early answer is downloaded and validated exactly like a
+ `fetch_remote` download
+
+ `fetch_remote` and `remote_dominant_color` need the complete body and always
+ download it (up to `max_bytes`).
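
Reviewer sketch (not part of the package diff): the probe-at-growing-thresholds loop described above, simplified to an in-memory buffer (the README says the real probes only ever see the on-disk tempfile). `probe_thresholds` and `read_until_final` are hypothetical names, and the growth factor of 4 is inferred from the "64KB, 256KB, ..." sequence in the changelog.

```ruby
require "stringio"

# Hypothetical: probe sizes 64KB, 256KB, ... strictly below max_bytes.
def probe_thresholds(max_bytes, first: 64 * 1024, factor: 4)
  thresholds = []
  size = first
  while size < max_bytes
    thresholds << size
    size *= factor
  end
  thresholds
end

# Reads io in chunks and calls the block with the bytes buffered so far
# each time a threshold is crossed. A non-nil answer ends the read early;
# otherwise the complete body is probed once at EOF.
# Returns [answer, bytes_read].
def read_until_final(io, max_bytes:, chunk_size: 16 * 1024, &probe)
  pending = probe_thresholds(max_bytes)
  buffer = String.new # binary-encoded accumulator
  while (chunk = io.read(chunk_size))
    buffer << chunk
    raise "body exceeds max_bytes" if buffer.bytesize > max_bytes
    next unless pending.any? && buffer.bytesize >= pending.first

    pending.shift while pending.any? && buffer.bytesize >= pending.first
    answer = probe.call(buffer)
    return [answer, buffer.bytesize] unless answer.nil?
  end
  [probe.call(buffer), buffer.bytesize]
end

body = StringIO.new("x" * 300_000)
p read_until_final(body, max_bytes: 1_000_000) { |bytes| bytes.bytesize >= 65_536 ? :png : nil }
```

Returning `nil` from the probe models "more data could still change the answer", which is how the "not animated" and SVG cases above would force the full download.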

  Remote fetching is deliberately conservative:

@@ -392,9 +412,14 @@ Remote fetching is deliberately conservative:
  - private, loopback, link-local, multicast, documentation, benchmarking,
  carrier-grade NAT, IPv4-mapped IPv6, NAT64, 6to4/Teredo, and other
  special-use resolved addresses are rejected by default
- - no image decoding happens directly from the socket
+ - no image decoding happens directly from the socket; probes only ever see the
+ on-disk tempfile
  - the final response `Content-Type` must be an allowed image type and must agree
- with an image-looking URL extension when one is present
+ with an image-looking URL extension when one is present — both are enforced
+ from the response headers, before any body bytes are downloaded
+ - the first bytes of the body must be compatible with the claimed format's
+ magic bytes (SVG, which has no signature, is exempt); an obviously mislabeled
+ body is dropped after the first chunk instead of being downloaded to the cap
  - downloaded content is probed before `fetch_remote` yields the tempfile, so the
  raw downloader cannot be used as a blind extension-based file saver
  - SVG remote metadata uses the same bounded SVG metadata parser after download;
@@ -668,6 +693,14 @@ JPEG path:
  - uses `jpegoptim`
  - `quality:` maps to `jpegoptim --max`
  - metadata is stripped unless `strip_metadata: false`
+ - an EXIF-oriented JPEG is uprighted first with jpegtran's lossless
+ transforms, because stripping would otherwise delete the orientation tag
+ without applying the rotation and ship the image sideways. MCU-aligned
+ images rotate exactly (`-perfect`); others drop the partial edge blocks
+ (`-trim`, under one MCU — at most 15px), reported as `trimmed: true` in
+ the result rather than re-encoding behind a method named `optimize`.
+ Without `jpegtran`, an oriented JPEG raises in strict mode and is left
+ untouched otherwise — never stripped sideways.
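
Reviewer sketch (not part of the package diff): how the `-perfect` / `-trim` choice could be made from the image dimensions. The orientation table is the `JPEGTRAN_OPERATIONS` constant shown elsewhere in this diff and the argument order follows the `fix_orientation` call there; `jpegtran_upright_args` is a hypothetical name. The 16px MCU default is an assumption (the real MCU size depends on chroma subsampling), and requiring both dimensions to be aligned is a conservative simplification.

```ruby
# EXIF orientation values mapped onto jpegtran's lossless transforms
# (copied from the constant that this diff moves into Optimizer).
JPEGTRAN_OPERATIONS = {
  2 => ["-flip", "horizontal"],
  3 => ["-rotate", "180"],
  4 => ["-flip", "vertical"],
  5 => ["-transpose"],
  6 => ["-rotate", "90"],
  7 => ["-transverse"],
  8 => ["-rotate", "270"]
}.freeze

# Hypothetical: build the jpegtran arguments for uprighting, or nil when
# the orientation needs no transform (1 or absent).
def jpegtran_upright_args(orientation, width, height, mcu: 16)
  transform = JPEGTRAN_OPERATIONS[orientation]
  return nil unless transform

  aligned = (width % mcu).zero? && (height % mcu).zero?
  ["jpegtran", "-copy", "none", aligned ? "-perfect" : "-trim", *transform]
end

p jpegtran_upright_args(6, 4000, 3000) # 3000 is not a multiple of 16, so -trim
```

A caller would append `-outfile`, the output path, and the input path, and set `trimmed: true` in its result whenever `-trim` was chosen.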

  PNG path:

@@ -680,19 +713,56 @@ optimizer tools are tolerated.

  ### SVG sanitising

- #### `SafeImage.sanitize_svg!(path)`
+ #### `SafeImage.sanitize_svg!(path, id_namespace:)`

- Sanitises an SVG in place using a small REXML allowlist.
+ Sanitises an SVG in place using a small REXML allowlist. `id_namespace:` is
+ **required** — it forces a deliberate choice of where the output may be used,
+ so there is no silently-wrong default (see "Inlining" below):

  ```ruby
- result = SafeImage.sanitize_svg!("icon.svg")
+ # served as an <img src>/CSS-url/file and never spliced into a page's DOM:
+ result = SafeImage.sanitize_svg!("icon.svg", id_namespace: :standalone)
+
+ # spliced inline into an HTML DOM (pass a stable, per-document token):
+ result = SafeImage.sanitize_svg!("icon.svg", id_namespace: "u#{upload.sha1}")
+
  puts result[:sanitized]
  ```

+ Omitting `id_namespace:` (or passing `nil`/`""`) raises `ArgumentError`.
+
  The sanitizer removes unsafe elements/attributes such as scripts and event
  handlers. It is intentionally conservative rather than a full browser-grade SVG
  implementation.

+ CSS is reduced to a constructed allowlist subset rather than stripped: `style`
+ attributes (as written by Inkscape) and `<style>` elements (as written by
+ Illustrator) survive when they parse against a small grammar — allowlisted
+ properties, type/class/id selectors, numeric/keyword/color values, and
+ `url(#fragment)` references only. The output is reassembled from validated
+ tokens, never echoed from the input; escapes, quotes, at-rules (`@import`,
+ `@font-face`, `@media`), comments, strings, and unknown
+ properties/functions/selectors drop the declaration, rule, or whole stylesheet
+ rather than being interpreted.
+
+ Two behaviours are worth knowing before relying on this:
+
+ - **The CSS property allowlist mirrors the presentation attributes that have
+ CSS-property twins** — `SvgCss::ALLOWED_PROPERTIES` is a subset of
+ `SvgSanitizer::ALLOWED_ATTRIBUTES` (asserted by a test), so a `fill:`
+ declaration and a `fill=""` attribute are treated identically and a property
+ the sanitizer would strip as an attribute is also dropped in CSS. (The
+ reverse does not hold: geometry/XML attributes like `width`, `href`, and
+ `xmlns` are not CSS properties.) The set covers the common paint, stroke
+ (including `stroke-dasharray` and `vector-effect`), marker, text, and
+ visibility properties that Inkscape and Illustrator emit; it is deliberately
+ narrower than a browser. Filters (`filter`, `fe*`) are not yet included.
+ - **A `<style>` element fails closed as a whole** on anything outside a flat
+ list of `selector { declarations }` rules. Any at-rule (e.g. one stray
+ `@import`), a nested block, or an unbalanced brace discards every rule in that
+ element, not just the offending one. Within a well-formed stylesheet,
+ individual selectors and declarations still drop independently.
+
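
Reviewer sketch (not part of the package diff): the whole-element fail-closed rule for `<style>` can be pictured as a single structural pass. `flat_rule_list?` is a hypothetical name and checks only the structural conditions listed above (no `@`, no nesting, balanced braces) plus the backslash rule from the changelog; the gem's grammar also validates selectors, properties, and values.

```ruby
# Hypothetical check: true only for a flat list of `selector { ... }` rules.
def flat_rule_list?(css)
  # Any at-rule or CSS escape rejects the stylesheet as a whole.
  return false if css.include?("@") || css.include?("\\")

  depth = 0
  css.each_char do |char|
    case char
    when "{"
      depth += 1
      return false if depth > 1 # nested block
    when "}"
      depth -= 1
      return false if depth.negative? # stray closing brace
    end
  end
  depth.zero? # an unclosed block also fails
end

puts flat_rule_list?(".a{fill:red} .b{stroke:blue}")
puts flat_rule_list?("@font-face{x:y}.ok{fill:red}")
```

The second call shows the case named in the changelog: the trailing `.ok` rule is not salvaged once an at-rule appears anywhere.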
  SVG sanitising is defense-in-depth for stored bytes. Applications that serve
  user-supplied SVGs directly should still use response-level controls such as a
  restrictive `Content-Security-Policy`, `X-Content-Type-Options: nosniff`, and/or
@@ -700,6 +770,63 @@ attachment/sandbox handling for direct-open routes. Browsers restrict script
  execution when an SVG is embedded as `<img>`, but a top-level SVG document is a
  different sink.

+ #### Inlining sanitized SVG into an HTML page
+
+ The `id_namespace:` argument forces this decision at every call site — there is
+ no default to get wrong.
+
+ Pass `:standalone` when the output is only ever served as an external resource —
+ `<img src>`, `background-image`, an `<object>`/`<iframe>`, or its own file. This
+ is the document-safe form. It is **not** safe to splice directly into an HTML
+ DOM: a preserved `<style>` rule like `*{visibility:hidden}` or
+ `#header{display:none}` would join the host document's cascade, and the SVG's
+ `id`s could clobber host ids — both CSS-injection / UI-redress vectors.
+
+ Pass a **stable, per-document** String (e.g. the upload's sha) to make the output
+ safe to inline:
+
+ ```ruby
+ SafeImage.sanitize_svg!("icon.svg", id_namespace: "u#{upload.sha1}")
+ ```
+
+ With a namespace, the sanitizer:
+
+ - prefixes every `id` and every reference to it — `href`/`xlink:href` fragments,
+ `url(#…)` in attributes and CSS, and ARIA IDREF attributes (`aria-labelledby`,
+ `aria-describedby`, `aria-controls`, …) — so internal references stay intact
+ but cannot collide with host ids; and
+ - prefixes every `class` token (and the `.class` selectors that match them), so
+ an attacker can't invoke the host page's framework CSS — a bare
+ `class="modal fixed"` would otherwise pick up Bootstrap/Tailwind/app styles and
+ become an overlay. Internal class styling still matches because attribute and
+ selector are prefixed together; and
+ - scopes every `<style>` selector under a `<ns>-scope` class it adds to the root
+ `<svg>`, so `*` and type selectors only match that document's own content and
+ can never reach the host page; and
+ - rejects `var()`, `env()`, and `attr()` in presentation attributes — they
+ resolve against the host page (custom properties, environment) and could pull
+ in values, including a `url()`, the sanitizer never saw; and
+ - drops `overflow` from the root `<svg>` so it clips to its declared viewport — a
+ tiny `width`/`height` with `overflow:visible` and oversized content would
+ otherwise paint a full-page overlay. Inner elements keep `overflow` (markers
+ need it); the root clip bounds them.
+
+ Because every `<style>` selector is anchored *under* the scope class, a rule
+ targeting the root itself — `svg { … }`, `* { … }` intended to include the root,
+ or a class on the root such as `.icon { … }` for `<svg class="icon">` — matches
+ the root's descendants but not the root element. Root-level styling from a
+ `<style>` block therefore does not survive; style the root via attributes if you
+ need it. (This is rare in editor exports, which style the root with attributes
+ and inner elements with classes.)
+
+ `style=""` attributes never need selector scoping — a declaration list only
+ styles its own element — so they are not a cascade risk in either mode. They can
+ still carry `url(#…)` references, though, which are only namespaced when you pass
+ a String; so `:standalone` output (bare ids and references) is still not for
+ inline use. The transform is idempotent for a given namespace, so re-sanitising
+ is a no-op. Use a per-document value so two inlined SVGs on one page don't share
+ a namespace.
+
+
703
830
  ### Compatibility aliases
704
831
 
705
832
  Two thin wrappers kept for callers migrating from existing upload pipelines:
@@ -834,11 +961,39 @@ a sandboxed command fails, the operation fails.

  The sandbox grants read/write access only to the paths inferred from the
  operation arguments, plus runtime/library paths and temporary directories needed
- by Ruby, libvips, ImageMagick, and optimizer tools. Network syscalls are denied
- through the Landlock helper's seccomp layer. Worker processes inherit the
+ by Ruby, libvips, ImageMagick, and optimizer tools. Worker processes inherit the
  parent's backend and pixel-ceiling configuration; landlock is forced off
  inside the worker so sandboxed operations never nest.

+ Operations are served by a pool of resident **zygote** workers: each is a
+ fresh Ruby process that boots the gem once and then forks a child per
+ operation, so the ~85ms boot cost (Ruby + requires + libvips init) is paid
+ once per burst instead of per call — a warm sandboxed operation costs ~3–8ms
+ over the unsandboxed one. The pool grows on demand to
+ `SAFE_IMAGE_ZYGOTE_WORKERS` (default 8), so N threads run N sandboxed
+ operations concurrently (throughput scales near-linearly with cores until the
+ work itself saturates the CPU); offered concurrency past the cap blocks until
+ a worker frees, which also bounds how many libvips decodes run at once.
+ Idling is cheap (~16MB private memory per worker, zero CPU), so a worker
+ lingers for `Zygote::IDLE_SECONDS` (300) without work before exiting on its
+ own; the next operation boots a new one. Workers also exit immediately when
+ their parent process does, and `configure!` always retires the pool.
+
+ A zygote itself never touches untrusted bytes: each forked child first
+ applies rlimits, its per-operation Landlock policy (filesystem allowlist, all
+ TCP denied on Landlock ABI ≥ 4, abstract-unix-socket/signal scopes on
+ ABI ≥ 6), and — when the installed `landlock` gem exposes
+ `seccomp_deny_network!` — the helper's deny-all-network seccomp filter (which
+ blocks sockets of every family, closing the non-TCP/UDP gap the in-process
+ Landlock policy alone leaves open), and only then runs the operation. Forking
+ is sound because the zygote never runs operations itself — libvips is
+ initialised but quiescent (no native threads) at every fork.
+
+ `SAFE_IMAGE_ZYGOTE=0` falls back to the exec-per-operation worker (a fresh
+ sandboxed Ruby per call through the Landlock helper binary, whose seccomp
+ filter denies sockets of every family, no pool); `SAFE_IMAGE_ZYGOTE_WORKERS`
+ and `SAFE_IMAGE_ZYGOTE_IDLE_SECONDS` tune the pool cap and idle window.
+
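
Reviewer sketch (not part of the package diff): the pool behaviour described above (grow on demand up to a cap, block past the cap until a worker frees) in thread-only form. `BoundedPool` is a hypothetical class whose "workers" are plain strings; the real pool manages forked zygote processes, idle timeouts, and generation tokens, none of which are modelled here.

```ruby
# Hypothetical pool: creates workers lazily up to `cap`, then makes
# callers wait until a checked-out worker is returned.
class BoundedPool
  def initialize(cap = Integer(ENV.fetch("SAFE_IMAGE_ZYGOTE_WORKERS", "8")))
    @cap = cap
    @created = 0
    @idle = []
    @mutex = Mutex.new
    @freed = ConditionVariable.new
  end

  # Checks a worker out for the duration of the block.
  def with_worker
    worker = checkout
    begin
      yield worker
    ensure
      checkin(worker)
    end
  end

  private

  def checkout
    @mutex.synchronize do
      loop do
        return @idle.pop unless @idle.empty?

        if @created < @cap
          @created += 1
          return "worker-#{@created}" # stand-in for booting a zygote
        end
        @freed.wait(@mutex) # past the cap: block until a worker frees
      end
    end
  end

  def checkin(worker)
    @mutex.synchronize do
      @idle.push(worker)
      @freed.signal
    end
  end
end

pool = BoundedPool.new(2)
threads = 4.times.map { Thread.new { pool.with_worker { |worker| sleep 0.01; worker } } }
p threads.map(&:value).uniq.sort
```

Blocking at the cap is what bounds concurrent libvips memory: at most `cap` operations can be decoding at any moment, however many threads call in.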
  ## Development

  ```bash
@@ -182,7 +182,7 @@ module SafeImage
  format = File.extname(output).delete_prefix(".").downcase
  format = "jpg" if format == "jpeg"
  return unless Processor::OPTIMIZABLE_OUTPUTS.include?(format)
- Optimizer.optimize(output, mode: :lossless, strip_metadata: true, quality: quality)
+ Optimizer.optimize(output, mode: :lossless, strip_metadata: true, quality: quality, assume_upright: true)
  end

  # JPEG default when the caller passes no quality: matches what ImageMagick
@@ -264,17 +264,6 @@ module SafeImage
  convert(from, to, format: "jpg", quality: quality, optimize: optimize, max_pixels: max_pixels, chroma_subsampling: chroma_subsampling)
  end

- # EXIF orientation values mapped onto jpegtran's lossless transforms.
- JPEGTRAN_OPERATIONS = {
- 2 => ["-flip", "horizontal"],
- 3 => ["-rotate", "180"],
- 4 => ["-flip", "vertical"],
- 5 => ["-transpose"],
- 6 => ["-rotate", "90"],
- 7 => ["-transverse"],
- 8 => ["-rotate", "270"]
- }.freeze
-
  def fix_orientation(from, to = from, max_pixels: nil, quality: nil)
  max_pixels = SafeImage.resolved_max_pixels(max_pixels)
  output = PathSafety.ensure_safe_output_path!(to).to_s
@@ -323,7 +312,7 @@ module SafeImage
  def jpegtran_fix_orientation(input, output, orient)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  info = write_through_tempfile(output) do |tmp_path|
- Runner.run!(["jpegtran", "-copy", "none", "-perfect", *JPEGTRAN_OPERATIONS.fetch(orient), "-outfile", tmp_path, input])
+ Runner.run!(["jpegtran", "-copy", "none", "-perfect", *Optimizer::JPEGTRAN_OPERATIONS.fetch(orient), "-outfile", tmp_path, input])
  Native.probe(tmp_path)
  end
  result_from_info(
@@ -3,7 +3,7 @@
3
3
  require "tempfile"
4
4
 
5
5
  module SafeImage
6
- # Pure-Ruby ICO container support, in the spirit of the REXML SVG path:
6
+ # Pure-Ruby ICO container support, in the spirit of the SVG metadata path:
7
7
  # the directory and legacy DIB payloads are parsed in memory-safe Ruby with
8
8
  # explicit bounds checks, and pixel encoding is delegated to the hardened
9
9
  # native libvips helpers. ImageMagick is never involved.
@@ -45,17 +45,18 @@ module SafeImage
45
45
  out_format = output_format!(format)
46
46
 
47
47
  VipsGlue.with_images do |track|
48
- # Header read through the explicit loader: validates the bytes and
49
- # enforces the pixel cap before any full decode.
50
- header, input_format = load_image(track, input)
51
- check_pixels!(header, max_pixels)
48
+ # Route through the explicit loader for the path's extension and keep
49
+ # the resulting image object for the resize. Do not call the generic
50
+ # filename-based `thumbnail` operation here: it re-sniffs the path and
51
+ # can observe different bytes if a hostile local path is replaced
52
+ # between the header check and the decode.
53
+ image, input_format = load_image(track, input, autorotate: true)
54
+ check_pixels!(image, max_pixels)
52
55
 
53
- # Thumbnail from the file so libvips can shrink on load (e.g.
54
- # libjpeg DCT downscaling); auto-rotates by default.
55
56
  thumb = track.call(
56
57
  VipsGlue.operation(
57
- "thumbnail",
58
- { filename: input, width: width, height: height,
58
+ "thumbnail_image",
59
+ { in: image, width: width, height: height,
59
60
  size: "both", crop: "centre", fail_on: "error" }
60
61
  )
61
62
  )
@@ -75,7 +76,7 @@ module SafeImage
75
76
  out_format = output_format!(format)
76
77
 
77
78
  VipsGlue.with_images do |track|
78
- image, input_format = load_image(track, String(input))
79
+ image, input_format = load_image(track, String(input), autorotate: true)
79
80
  check_pixels!(image, max_pixels)
80
81
  rotated = track.call(VipsGlue.operation("autorot", { in: image }))
81
82
  resized = track.call(VipsGlue.operation("resize", { in: rotated, scale: scale }))
@@ -94,7 +95,7 @@ module SafeImage
94
95
  out_format = output_format!(format)
95
96
 
96
97
  VipsGlue.with_images do |track|
97
- image, input_format = load_image(track, String(input))
98
+ image, input_format = load_image(track, String(input), autorotate: true)
98
99
  check_pixels!(image, max_pixels)
99
100
  rotated = track.call(VipsGlue.operation("autorot", { in: image }))
100
101
 
@@ -116,7 +117,7 @@ module SafeImage
116
117
  out_format = output_format!(format)
117
118
 
118
119
  VipsGlue.with_images do |track|
119
- image, input_format = load_image(track, String(input))
120
+ image, input_format = load_image(track, String(input), autorotate: true)
120
121
  check_pixels!(image, max_pixels)
121
122
  rotated = track.call(VipsGlue.operation("autorot", { in: image }))
122
123
 
@@ -306,7 +307,7 @@ module SafeImage
306
307
  raise ArgumentError, "quality must be 1..100" unless (1..100).cover?(quality)
307
308
  end
308
309
 
309
- def load_image(track, path)
310
+ def load_image(track, path, autorotate: false)
310
311
  format = normalized_format(File.extname(path).delete_prefix("."))
311
312
  raise UnsupportedFormatError, "unsupported input format" unless format
312
313
 
@@ -319,9 +320,17 @@ module SafeImage
319
320
  raise UnsupportedFormatError, "this libvips build has no JPEG XL loader" unless VipsGlue.type_find?("jxlload")
320
321
  end
321
322
 
322
- image = track.call(
323
- VipsGlue.operation(LOADERS.fetch(format), { filename: path, access: "sequential", fail_on: "error" })
324
- )
323
+ options = { filename: path, access: "sequential", fail_on: "error" }
324
+ image = track.call(VipsGlue.operation(LOADERS.fetch(format), options))
325
+
326
+ # autorot flips/rotates pull rows out of input order, which a
327
+ # sequential source can only serve while the image fits its readahead
328
+ # window (~512px); larger oriented images fail with "out of order
329
+ # read". Reload those with random access — the open itself is
330
+ # header-only, so the caller's pixel cap still runs before any decode.
331
+ if autorotate && VipsGlue.orientation(image) > 1
332
+ image = track.call(VipsGlue.operation(LOADERS.fetch(format), options.merge(access: "random")))
333
+ end
325
334
  [image, format]
326
335
  end
327
336