safe_image 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +193 -0
- data/README.md +166 -11
- data/lib/safe_image/discourse_compat.rb +2 -13
- data/lib/safe_image/ico.rb +1 -1
- data/lib/safe_image/native.rb +24 -15
- data/lib/safe_image/optimizer.rb +79 -4
- data/lib/safe_image/processor.rb +1 -1
- data/lib/safe_image/remote.rb +174 -8
- data/lib/safe_image/runner.rb +9 -1
- data/lib/safe_image/sandbox.rb +41 -14
- data/lib/safe_image/svg_css.rb +314 -0
- data/lib/safe_image/svg_metadata.rb +179 -53
- data/lib/safe_image/svg_sanitizer.rb +524 -43
- data/lib/safe_image/version.rb +1 -1
- data/lib/safe_image/zygote.rb +619 -0
- data/lib/safe_image.rb +12 -0
- metadata +18 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: ce9a76b504fa826aef4164c5748c800a95828ef7fda6bda3a00e7e85f0134f38
|
|
4
|
+
data.tar.gz: 61c5ebac14806e8e3445c3e7aa73bb1ab5191cc414e1af311b752ad0be492bc7
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 5d15f34ca4de05f1ca4618695bc3993a000ce38a7de54afde86ac87497c0288ad0ed37493ca31431669341be10991bbcdb5192796ea6bdb3af016943adfac2bc
|
|
7
|
+
data.tar.gz: e71a59800452d4d396741e4e28134c47403490100233e803644a90a0ca967abd74fb5572a936a7af247411cb2055378982bccad89085085184aa92c66ca94d0d
|
data/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,199 @@ All notable changes to this project will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [0.3.0] - 2026-06-12
|
|
9
|
+
|
|
10
|
+
### Changed (breaking)
|
|
11
|
+
|
|
12
|
+
- **`sanitize_svg!` now requires `id_namespace:`.** The argument forces a
|
|
13
|
+
deliberate choice of where the output may be used, removing the footgun of a
|
|
14
|
+
silently-wrong default. Pass `:standalone` for output served only as an
|
|
15
|
+
external `<img>`/CSS-url/file, or a stable per-document String to make it safe
|
|
16
|
+
to inline (see below). Omitting it (or passing `nil`/`""`) raises
|
|
17
|
+
`ArgumentError`. Callers must update: `sanitize_svg!(path)` →
|
|
18
|
+
`sanitize_svg!(path, id_namespace: :standalone)`.
|
|
19
|
+
|
|
20
|
+
### Added
|
|
21
|
+
|
|
22
|
+
- **`sanitize_svg!` can produce output safe to inline into an HTML DOM.** Pass
|
|
23
|
+
`id_namespace:` a stable per-document value (e.g. the upload sha) and the
|
|
24
|
+
sanitizer prefixes every `id` and every reference to it (`href`/`xlink:href`
|
|
25
|
+
fragments, `url(#…)` in attributes and CSS, ARIA IDREF attributes like
|
|
26
|
+
`aria-labelledby`/`aria-controls`, and every `class` token plus the matching
|
|
27
|
+
`.class` selectors) with the namespace, and scopes every `<style>` selector
|
|
28
|
+
under a `<ns>-scope` class added to the root `<svg>`. Namespacing classes stops
|
|
29
|
+
an inlined SVG from invoking the host page's framework CSS (a bare
|
|
30
|
+
`class="modal fixed"` overlay vector). `var()`/`env()`/`attr()` in presentation
|
|
31
|
+
attributes are rejected outright — they resolve against the host page.
|
|
32
|
+
Inlined into a page, the preserved `<style>` can no longer reach the host
|
|
33
|
+
cascade (`*{visibility:hidden}`, `#header{display:none}`) and ids cannot
|
|
34
|
+
clobber host ids — including references written `URL(#x)`, `url('#x')`, or
|
|
35
|
+
`url("#x")`, which are namespaced like the unquoted form. In this mode the root
|
|
36
|
+
`<svg>`'s `overflow` is also dropped so it clips to its declared viewport (a
|
|
37
|
+
tiny viewport with `overflow:visible` and oversized content would otherwise
|
|
38
|
+
paint a full-page overlay). With `id_namespace: :standalone` the output is the
|
|
39
|
+
document-safe form (no namespacing). The namespace must be a valid ident (a
|
|
40
|
+
letter followed by letters/digits/`_`/`-`); malformed tokens are rejected
|
|
41
|
+
rather than coerced, so two distinct values can never collapse to one. The
|
|
42
|
+
transform is idempotent per namespace. `style=""` attributes are
|
|
43
|
+
element-scoped and inline-safe either way.
|
|
44
|
+
- **`<style>` elements now fail closed on any at-rule.** Previously an at-rule
|
|
45
|
+
block followed by a valid rule (`@font-face{…}.ok{…}`) could keep the trailing
|
|
46
|
+
rule; a stylesheet containing `@` anywhere is now rejected whole, matching the
|
|
47
|
+
documented guarantee.
|
|
48
|
+
- **The SVG sanitizer keeps a safe CSS subset instead of stripping all CSS.**
|
|
49
|
+
`style` attributes (as written by Inkscape) and `<style>` elements (as
|
|
50
|
+
written by Illustrator) now survive sanitisation when they parse against a
|
|
51
|
+
constructed allowlist grammar: properties mirroring the allowed
|
|
52
|
+
presentation attributes, type/class/id selectors, numeric/keyword/color
|
|
53
|
+
values, and `url(#fragment)` references only. Output is reassembled from
|
|
54
|
+
validated tokens — CSS escapes, quotes/strings, at-rules, comments, and
|
|
55
|
+
unknown properties, functions, or selectors drop the declaration, rule, or
|
|
56
|
+
whole stylesheet rather than being interpreted. A single at-rule or nested
|
|
57
|
+
block fails the whole `<style>` element closed. `!important` and modern
|
|
58
|
+
`rgb()/hsl()` slash-alpha (`rgb(R G B / A)`) are preserved; both are parsed
|
|
59
|
+
structurally and re-emitted, and admitting `/` for the alpha keeps CSS
|
|
60
|
+
comments impossible because `*` remains excluded from the value charset.
|
|
61
|
+
- **The presentation-attribute allowlist covers common editor output.** Added
|
|
62
|
+
the safe, widely-emitted SVG presentation properties (and their CSS twins):
|
|
63
|
+
`stroke-dasharray`/`stroke-dashoffset`, `vector-effect`, `marker`/`marker-*`
|
|
64
|
+
(with the `<marker>` element and its geometry attributes), `color`,
|
|
65
|
+
`display`/`visibility`/`overflow`, `paint-order`/`mix-blend-mode`/`isolation`,
|
|
66
|
+
the `*-rendering` hints, and the longhand text properties (`font-style`,
|
|
67
|
+
`font-variant`, `font-stretch`, `text-decoration`, `letter-spacing`,
|
|
68
|
+
`word-spacing`, `dominant-baseline`, `baseline-shift`, `writing-mode`,
|
|
69
|
+
`direction`). The only additions carrying a URL — `marker*` — are constrained
|
|
70
|
+
to `url(#fragment)` like the existing paint and clip/mask references. Filters
|
|
71
|
+
remain out of scope.
|
|
72
|
+
|
|
73
|
+
### Changed
|
|
74
|
+
|
|
75
|
+
- **Sandboxed operations are served by a pool of resident zygote workers —
|
|
76
|
+
warm Landlock overhead drops from ~100ms to ~3–8ms per operation, with
|
|
77
|
+
near-linear concurrency.** Previously every sandboxed call exec'd a fresh
|
|
78
|
+
Ruby that re-paid interpreter boot, rubygems, the gem's requires, and
|
|
79
|
+
libvips init. A zygote boots that once, then forks a child per operation;
|
|
80
|
+
the child applies rlimits, its per-operation Landlock policy (filesystem
|
|
81
|
+
allowlist, all TCP denied on ABI ≥ 4, abstract-unix-socket/signal scopes on
|
|
82
|
+
ABI ≥ 6), and — when the installed `landlock` gem exposes
|
|
83
|
+
`seccomp_deny_network!` — the helper's deny-all-network seccomp filter
|
|
84
|
+
(blocking sockets of every family, closing the UDP gap the in-process
|
|
85
|
+
Landlock policy alone leaves open), *before* touching untrusted input, and
|
|
86
|
+
exits after the operation. Workers are pooled so N threads run N sandboxed
|
|
87
|
+
operations at once (the single zygote would serialise them): the pool grows
|
|
88
|
+
on demand to `SAFE_IMAGE_ZYGOTE_WORKERS` (default 8) and offered concurrency
|
|
89
|
+
past the cap blocks until a worker frees, bounding concurrent libvips
|
|
90
|
+
memory. An idle worker exits after `Zygote::IDLE_SECONDS` (300, overridable
|
|
91
|
+
via `SAFE_IMAGE_ZYGOTE_IDLE_SECONDS`; idling holds ~16MB private memory and
|
|
92
|
+
no CPU), exits immediately when its parent process does, and `configure!`
|
|
93
|
+
always retires the pool, so no stale process outlives its parent or a
|
|
94
|
+
reconfigure. Forking is sound because the zygote never runs operations
|
|
95
|
+
itself: libvips is initialised but quiescent (zero native threads) at every
|
|
96
|
+
fork — verified empirically. Outputs are byte-identical to both the exec
|
|
97
|
+
worker and unsandboxed operation. `SAFE_IMAGE_ZYGOTE=0` falls back to the
|
|
98
|
+
exec-per-operation worker. Pool correctness under concurrent reconfigure,
|
|
99
|
+
worker death, and fork is enforced by a per-worker generation token (a
|
|
100
|
+
worker checked out when `configure!` lands is retired on return, never
|
|
101
|
+
reused under stale config), channel-health (not process-liveness) reuse
|
|
102
|
+
decisions (a worker whose pipe broke is discarded, never re-pooled), a
|
|
103
|
+
`PR_SET_PDEATHSIG` so an operation child dies with its zygote, and
|
|
104
|
+
parent-side tmp-root cleanup that survives a SIGKILLed worker — all
|
|
105
|
+
exercised by a multithreaded chaos stress test (reconfigure + worker kills)
|
|
106
|
+
asserting no wrong output, no process/fd/tmpdir leak, and no deadlock.
|
|
107
|
+
|
|
108
|
+
- **External tools run with `MALLOC_ARENA_MAX=2`.** Multithreaded optimizer
|
|
109
|
+
tools (oxipng's rayon pool, ImageMagick's OpenMP) otherwise have glibc
|
|
110
|
+
reserve a 64MB malloc arena per thread; combined with the sandbox's
|
|
111
|
+
`RLIMIT_AS` memory cap, that *address-space* reservation spuriously fails
|
|
112
|
+
the tool under concurrency even though real memory use is tiny. Bounding the
|
|
113
|
+
arena count is the standard mitigation and is free for these compute-bound
|
|
114
|
+
tools. Found by stress-testing concurrent sandboxed `convert`.
|
|
115
|
+
|
|
116
|
+
- **rexml loads on first SVG use instead of at `require "safe_image"`.**
|
|
117
|
+
rexml costs ~27ms to parse and only the SVG paths need it, so every boot of
|
|
118
|
+
the gem — and in particular every Landlock sandbox worker, which is a fresh
|
|
119
|
+
Ruby process per operation — was paying it on operations that never touch
|
|
120
|
+
SVG. Measured per-op sandbox cost drops from ~102ms to ~85ms on the vips
|
|
121
|
+
backend and ~75ms to ~58ms on the imagemagick backend.
|
|
122
|
+
|
|
123
|
+
- **Remote fetches reject bad responses from the headers alone.** The
|
|
124
|
+
`Content-Type` allowlist and content-type/extension agreement checks now run
|
|
125
|
+
before any body bytes are read (previously the body was downloaded first and
|
|
126
|
+
rejected afterwards), and the first bytes of the body must be compatible
|
|
127
|
+
with the claimed format's magic bytes — an obviously mislabeled body is
|
|
128
|
+
dropped after the first chunk instead of being downloaded to `max_bytes`.
|
|
129
|
+
- **Remote metadata helpers download only what the answer needs.**
|
|
130
|
+
`remote_size`, `remote_type`, `remote_info` and `remote_animated?` now probe
|
|
131
|
+
the partially-downloaded file at growing thresholds (64KB, 256KB, ...) and
|
|
132
|
+
abort the transfer once the answer is final, instead of always downloading
|
|
133
|
+
up to `max_bytes`. Early answers are only trusted when more data cannot
|
|
134
|
+
change them: "not animated" still requires the complete file (truncated
|
|
135
|
+
animations undercount frames), SVG metadata still downloads the whole
|
|
136
|
+
document so the SVG size cap keeps its meaning, and any prefix probe
|
|
137
|
+
failure falls back to the full download with unchanged validation and
|
|
138
|
+
error behaviour. `fetch_remote` and `remote_dominant_color` still download
|
|
139
|
+
the complete body.
|
|
140
|
+
|
|
141
|
+
### Fixed
|
|
142
|
+
|
|
143
|
+
- **Lossy PNG optimisation no longer fails when pngquant declines to
|
|
144
|
+
quantise.** pngquant signals "quantised result not used" through exit
|
|
145
|
+
status 98 (`--skip-if-larger`) and 99 (`--quality` not met) — for example
|
|
146
|
+
on low-bit-depth grayscale PNGs its RGBA-palette output cannot beat —
|
|
147
|
+
and `optimize(mode: :lossy)` raised `CommandError` instead of keeping the
|
|
148
|
+
original and continuing to oxipng. Found by running 10 random Wikimedia
|
|
149
|
+
Commons images through the optimizer.
|
|
150
|
+
|
|
151
|
+
- **`optimize` no longer ships sideways JPEGs.** With `strip_metadata: true`
|
|
152
|
+
(the default), stripping deleted the EXIF orientation tag without applying
|
|
153
|
+
the rotation, so an oriented camera photo came out rendered 90/180° wrong.
|
|
154
|
+
`optimize` now bakes the rotation into the pixels first via jpegtran's
|
|
155
|
+
lossless transforms: `-perfect` when the dimensions are MCU-aligned, else
|
|
156
|
+
`-trim`, which drops the partial edge blocks (under one MCU, at most 15px)
|
|
157
|
+
instead of hiding a lossy re-encode. The result hash gains `rotated_from:`
|
|
158
|
+
and `trimmed:` so the trim is reported, never silent — image_optim's jhead
|
|
159
|
+
worker (Discourse's `FileHelper.optimize_image!`) does the same transform
|
|
160
|
+
but trims silently. Without jpegtran an oriented JPEG raises in strict mode
|
|
161
|
+
and is left untouched otherwise; reading the tag goes through the configured
|
|
162
|
+
backend, so `optimize` now also enforces the pixel cap before touching an
|
|
163
|
+
oriented JPEG. Internal callers optimising output the gem just encoded skip
|
|
164
|
+
the check (`assume_upright:`).
|
|
165
|
+
|
|
166
|
+
- **JPEGs with an EXIF orientation no longer fail on the libvips path once
|
|
167
|
+
they outgrow the sequential readahead window (~512px — every real camera
|
|
168
|
+
photo).** `resize`, `crop`, `convert`/`convert_to_jpeg` and the
|
|
169
|
+
`fix_orientation` re-encode tier loaded input with `access: sequential` and
|
|
170
|
+
then autorotated, and the rotation's out-of-order row reads raised
|
|
171
|
+
`VipsJpeg: out of order read`. Oriented images are now reloaded with random
|
|
172
|
+
access before autorotation; upright images keep the streaming sequential
|
|
173
|
+
load, and the pixel cap still runs before any decode.
|
|
174
|
+
|
|
175
|
+
### Security
|
|
176
|
+
|
|
177
|
+
- **Presentation-attribute `url()` references fail closed unless they are a
|
|
178
|
+
canonical same-document fragment.** A single validation/rewrite grammar now
|
|
179
|
+
governs both the keep decision and the namespace rewrite, so external URLs,
|
|
180
|
+
mismatched quotes, and unterminated forms (`url(#id`, `url(http://evil`) are
|
|
181
|
+
dropped rather than kept on browser parse-error leniency — and no bare,
|
|
182
|
+
un-namespaced reference can survive in inline (`id_namespace:` String) output.
|
|
183
|
+
- **Attribute values containing CSS escapes are rejected outright.**
|
|
184
|
+
Browsers feed SVG presentation attributes through their CSS value parsers,
|
|
185
|
+
where an escape can re-form a token after the sanitizer's pattern checks
|
|
186
|
+
(`ur\6c(...)` is `url(...)`). No allowlisted attribute legitimately
|
|
187
|
+
contains a backslash, so any attribute value with one is now dropped.
|
|
188
|
+
- **SVG parsing rejects encodings the byte-level guards cannot see through.**
|
|
189
|
+
The DOCTYPE/processing-instruction guards are ASCII byte scans; a UTF-16
|
|
190
|
+
document interleaves NUL bytes between the ASCII characters, so a
|
|
191
|
+
`<!DOCTYPE` (and an entity payload behind it) could slip past them while
|
|
192
|
+
REXML still decoded and honoured it. SVG documents must now be UTF-8 (BOM
|
|
193
|
+
allowed) or declare a single-byte ASCII-transparent charset (US-ASCII,
|
|
194
|
+
ISO-8859-*, Windows-125x): UTF-16/32 BOMs, embedded NUL bytes, and declared
|
|
195
|
+
multi-byte or transforming encodings (Shift_JIS, GBK, EUC-*, ISO-2022-*,
|
|
196
|
+
UTF-7) raise `InvalidImageError`. Declared names that fit the allowed shape
|
|
197
|
+
but resolve to no real encoding (e.g. `utf8`, `windows-1259`) also fail
|
|
198
|
+
closed as `InvalidImageError` instead of surfacing REXML's bare
|
|
199
|
+
`ArgumentError`.
|
|
200
|
+
|
|
8
201
|
## [0.2.0] - 2026-06-10
|
|
9
202
|
|
|
10
203
|
The host's whole image-processing posture is now decided in one place, once,
|
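Review note on the 0.3.0 Security entry about encodings: the rule it describes (reject UTF-16/32 BOMs and NUL bytes, accept UTF-8 or a declared single-byte ASCII-transparent charset, fail closed on names that resolve to no real encoding) can be sketched roughly as below. This is a minimal illustration, not the gem's code: `ascii_transparent_svg?`, the constants, and the exact allowlist regex are all hypothetical.

```ruby
# Hypothetical pre-parse guard; names and the exact allowlist are
# assumptions for illustration, not the gem's implementation.
WIDE_BOMS = [
  "\x00\x00\xFE\xFF".b, "\xFF\xFE\x00\x00".b, # UTF-32 BE / LE
  "\xFE\xFF".b, "\xFF\xFE".b                  # UTF-16 BE / LE
].freeze
UTF8_BOM = "\xEF\xBB\xBF".b
DECLARED_ENCODING = /\A<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']/
ALLOWED_SHAPE = /\A(?:utf-8|us-ascii|iso-8859-\d{1,2}|windows-125\d)\z/i

def ascii_transparent_svg?(bytes)
  data = bytes.b
  return false if WIDE_BOMS.any? { |bom| data.start_with?(bom) }
  return false if data.include?("\x00".b)

  declared = data.delete_prefix(UTF8_BOM)[DECLARED_ENCODING, 1]
  return true if declared.nil? # no declaration: XML defaults to UTF-8
  return false unless ALLOWED_SHAPE.match?(declared)

  Encoding.find(declared) # raises ArgumentError for names like windows-1259
  true
rescue ArgumentError
  false
end
```

The point of the sketch is ordering: the byte-level checks run first, so an ASCII scan for `<!DOCTYPE` that follows them is looking at bytes it can actually read.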
data/README.md
CHANGED
|
@@ -142,7 +142,7 @@ sudo pacman -S --needed libvips \
|
|
|
142
142
|
| `jpegoptim` | required for JPEG `optimize` | lossless JPEG optimisation and metadata stripping | JPEG `optimize` raises in strict mode |
|
|
143
143
|
| `oxipng` | required for PNG `optimize` | lossless PNG optimisation | PNG `optimize` raises in strict mode |
|
|
144
144
|
| `pngquant` | optional | lossy PNG quantisation (`optimize_mode: :lossy`, files < 500KB) | lossy mode silently skips the quantisation pass |
|
|
145
|
-
| `jpegtran` (libjpeg-turbo) | optional | lossless tier of `fix_orientation
|
|
145
|
+
| `jpegtran` (libjpeg-turbo) | optional | lossless tier of `fix_orientation`; uprighting EXIF-oriented JPEGs in `optimize` | `fix_orientation` falls back to the libvips re-encode tier; `optimize` of an oriented JPEG raises in strict mode (left untouched otherwise) |
|
|
146
146
|
| `cjpegli` (libjxl) | optional | higher-quality encoding of generated JPEGs on the `:vips` backend — used automatically when installed | generated JPEGs use the backend's own encoder |
|
|
147
147
|
| `landlock` gem (Linux kernel ≥ 5.13) | required for `landlock: true` | the atomic sandbox around every operation | `configure!(landlock: true)` raises at boot; `sandbox_available?` is false |
|
|
148
148
|
| `rexml` gem | automatic | SVG sanitising and SVG metadata | installed as a gem dependency |
|
|
@@ -228,7 +228,9 @@ Optimizer operations return a hash:
|
|
|
228
228
|
before_bytes: 123_456,
|
|
229
229
|
after_bytes: 120_000,
|
|
230
230
|
saved_bytes: 3_456,
|
|
231
|
-
tools: ["jpegoptim"]
|
|
231
|
+
tools: ["jpegoptim"],
|
|
232
|
+
rotated_from: nil, # the EXIF orientation baked into the pixels, when one was set
|
|
233
|
+
trimmed: false # true when uprighting dropped partial edge blocks (see optimize)
|
|
232
234
|
}
|
|
233
235
|
```
|
|
234
236
|
|
|
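Review note on the result hash above: a caller that wants the trim to be visible in its own logs could read the two new keys along these lines. The helper name `summarize_optimize` is hypothetical, and the sample values (including an Integer `rotated_from` of 6) are assumptions; only the key names come from the README.

```ruby
# Hypothetical helper; only the hash keys come from the README hunk above.
def summarize_optimize(result)
  parts = ["saved #{result[:saved_bytes]} of #{result[:before_bytes]} bytes"]
  parts << "tools: #{result[:tools].join(', ')}" unless result[:tools].empty?
  parts << "uprighted from EXIF orientation #{result[:rotated_from]}" if result[:rotated_from]
  parts << "edge blocks trimmed" if result[:trimmed]
  parts.join("; ")
end

# Sample hash shaped like the documented result (values are made up).
sample = {
  before_bytes: 123_456, after_bytes: 120_000, saved_bytes: 3_456,
  tools: ["jpegoptim"], rotated_from: 6, trimmed: true
}
puts summarize_optimize(sample)
```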
@@ -367,8 +369,26 @@ SafeImage.dominant_color("upload.png") # => "6F745E"
|
|
|
367
369
|
|
|
368
370
|
These helpers are intended to cover `FastImage.size(url)` / `FastImage.type(url)`
|
|
369
371
|
style use cases without another Ruby dependency. They use only Ruby stdlib
|
|
370
|
-
`Net::HTTP
|
|
371
|
-
|
|
372
|
+
`Net::HTTP` and stream to a tempfile with a byte cap.
|
|
373
|
+
|
|
374
|
+
Like FastImage, the metadata helpers (`remote_size`, `remote_type`,
|
|
375
|
+
`remote_info`, `remote_animated?`) download as little as possible: the normal
|
|
376
|
+
Safe Image local metadata path probes the partially-downloaded tempfile as
|
|
377
|
+
bytes arrive (first at 64KB, then at growing thresholds) and the transfer is
|
|
378
|
+
aborted as soon as the answer is final — typically after the first 64KB.
|
|
379
|
+
Early answers are only trusted when more data cannot change them:
|
|
380
|
+
|
|
381
|
+
- "not animated" is reported only from the complete file, because a truncated
|
|
382
|
+
animation undercounts frames; "animated" is final as soon as a second frame
|
|
383
|
+
is seen
|
|
384
|
+
- SVG metadata always downloads the whole document, so the SVG parser's total
|
|
385
|
+
size cap keeps its meaning
|
|
386
|
+
- a probe failure on a prefix just means the download continues; a file that
|
|
387
|
+
never yields an early answer is downloaded and validated exactly like a
|
|
388
|
+
`fetch_remote` download
|
|
389
|
+
|
|
390
|
+
`fetch_remote` and `remote_dominant_color` need the complete body and always
|
|
391
|
+
download it (up to `max_bytes`).
|
|
372
392
|
|
|
373
393
|
Remote fetching is deliberately conservative:
|
|
374
394
|
|
|
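Review note on the probing behaviour described above: the "probe at growing thresholds, stop once the answer is final" loop can be sketched as follows. This is an illustration under stated assumptions, not the gem's downloader: `probe_while_downloading` is a hypothetical name, the body is modelled as an in-memory buffer rather than a tempfile, and the growth factor of 4 is inferred from the 64KB and 256KB figures only.

```ruby
# Hypothetical probe loop; thresholds past 64KB/256KB are assumed to keep
# growing by the same factor, which the README does not specify.
def probe_while_downloading(chunks, max_bytes:, first_probe: 64 * 1024, growth: 4)
  buffer = String.new(encoding: Encoding::BINARY)
  next_probe = first_probe

  chunks.each do |chunk|
    buffer << chunk
    raise ArgumentError, "body exceeds max_bytes" if buffer.bytesize > max_bytes
    next if buffer.bytesize < next_probe

    answer = yield(buffer, false) # prefix probe: nil means "not final yet"
    return [answer, buffer.bytesize] unless answer.nil?

    next_probe *= growth while next_probe <= buffer.bytesize
  end

  [yield(buffer, true), buffer.bytesize] # complete body: the answer is final
end
```

The block receives the bytes so far and a flag saying whether the body is complete, so a "not animated" style answer can be withheld (return `nil`) until the flag is true.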
@@ -392,9 +412,14 @@ Remote fetching is deliberately conservative:
|
|
|
392
412
|
- private, loopback, link-local, multicast, documentation, benchmarking,
|
|
393
413
|
carrier-grade NAT, IPv4-mapped IPv6, NAT64, 6to4/Teredo, and other
|
|
394
414
|
special-use resolved addresses are rejected by default
|
|
395
|
-
- no image decoding happens directly from the socket
|
|
415
|
+
- no image decoding happens directly from the socket; probes only ever see the
|
|
416
|
+
on-disk tempfile
|
|
396
417
|
- the final response `Content-Type` must be an allowed image type and must agree
|
|
397
|
-
with an image-looking URL extension when one is present
|
|
418
|
+
with an image-looking URL extension when one is present — both are enforced
|
|
419
|
+
from the response headers, before any body bytes are downloaded
|
|
420
|
+
- the first bytes of the body must be compatible with the claimed format's
|
|
421
|
+
magic bytes (SVG, which has no signature, is exempt); an obviously mislabeled
|
|
422
|
+
body is dropped after the first chunk instead of being downloaded to the cap
|
|
398
423
|
- downloaded content is probed before `fetch_remote` yields the tempfile, so the
|
|
399
424
|
raw downloader cannot be used as a blind extension-based file saver
|
|
400
425
|
- SVG remote metadata uses the same bounded SVG metadata parser after download;
|
|
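Review note on the magic-byte rule above: a first-chunk compatibility check might look like the sketch below. The signature table covers only three well-known formats for illustration; which content types the gem actually allows, and the name `magic_compatible?`, are assumptions.

```ruby
# Hypothetical first-chunk check; the table is illustrative, not the gem's.
MAGIC_SIGNATURES = {
  "image/png"  => ["\x89PNG\r\n\x1A\n".b],
  "image/jpeg" => ["\xFF\xD8\xFF".b],
  "image/gif"  => ["GIF87a".b, "GIF89a".b]
}.freeze

def magic_compatible?(content_type, first_chunk)
  return true if content_type == "image/svg+xml" # SVG has no signature
  signatures = MAGIC_SIGNATURES[content_type]
  return false if signatures.nil?

  head = first_chunk.b
  signatures.any? do |sig|
    # Compare over the common length so a very short first chunk that is
    # still a prefix of the signature counts as compatible.
    n = [head.bytesize, sig.bytesize].min
    head.byteslice(0, n) == sig.byteslice(0, n)
  end
end
```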
@@ -668,6 +693,14 @@ JPEG path:
|
|
|
668
693
|
- uses `jpegoptim`
|
|
669
694
|
- `quality:` maps to `jpegoptim --max`
|
|
670
695
|
- metadata is stripped unless `strip_metadata: false`
|
|
696
|
+
- an EXIF-oriented JPEG is uprighted first with jpegtran's lossless
|
|
697
|
+
transforms, because stripping would otherwise delete the orientation tag
|
|
698
|
+
without applying the rotation and ship the image sideways. MCU-aligned
|
|
699
|
+
images rotate exactly (`-perfect`); others drop the partial edge blocks
|
|
700
|
+
(`-trim`, under one MCU — at most 15px), reported as `trimmed: true` in
|
|
701
|
+
the result rather than re-encoding behind a method named `optimize`.
|
|
702
|
+
Without `jpegtran`, an oriented JPEG raises in strict mode and is left
|
|
703
|
+
untouched otherwise — never stripped sideways.
|
|
671
704
|
|
|
672
705
|
PNG path:
|
|
673
706
|
|
|
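Review note on the uprighting step above: the choice between `-perfect` and `-trim` can be sketched as building a jpegtran argument list. The orientation table is the conventional EXIF-to-jpegtran mapping; how the gem actually invokes jpegtran is not shown in this diff, so `upright_argv` and its `mcu_aligned:` parameter are hypothetical, and the alignment test itself is left to the caller.

```ruby
# EXIF orientation => jpegtran transform that uprights it (conventional
# mapping). The gem's real invocation is not shown in this diff.
UPRIGHT_TRANSFORMS = {
  2 => %w[-flip horizontal],
  3 => %w[-rotate 180],
  4 => %w[-flip vertical],
  5 => %w[-transpose],
  6 => %w[-rotate 90],
  7 => %w[-transverse],
  8 => %w[-rotate 270]
}.freeze

def upright_argv(orientation, input:, output:, mcu_aligned:)
  transform = UPRIGHT_TRANSFORMS.fetch(orientation) # KeyError for 1 or junk
  edge_flag = mcu_aligned ? "-perfect" : "-trim"    # exact, or drop partial edge blocks
  ["jpegtran", edge_flag, *transform, "-outfile", output, input]
end
```

Returning an argv array instead of a shell string keeps file names out of shell interpretation, which matters when the input path derives from an upload.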
@@ -680,19 +713,56 @@ optimizer tools are tolerated.
|
|
|
680
713
|
|
|
681
714
|
### SVG sanitising
|
|
682
715
|
|
|
683
|
-
#### `SafeImage.sanitize_svg!(path)`
|
|
716
|
+
#### `SafeImage.sanitize_svg!(path, id_namespace:)`
|
|
684
717
|
|
|
685
|
-
Sanitises an SVG in place using a small REXML allowlist.
|
|
718
|
+
Sanitises an SVG in place using a small REXML allowlist. `id_namespace:` is
|
|
719
|
+
**required** — it forces a deliberate choice of where the output may be used,
|
|
720
|
+
so there is no silently-wrong default (see "Inlining" below):
|
|
686
721
|
|
|
687
722
|
```ruby
|
|
688
|
-
|
|
723
|
+
# served as an <img src>/CSS-url/file and never spliced into a page's DOM:
|
|
724
|
+
result = SafeImage.sanitize_svg!("icon.svg", id_namespace: :standalone)
|
|
725
|
+
|
|
726
|
+
# spliced inline into an HTML DOM (pass a stable, per-document token):
|
|
727
|
+
result = SafeImage.sanitize_svg!("icon.svg", id_namespace: "u#{upload.sha1}")
|
|
728
|
+
|
|
689
729
|
puts result[:sanitized]
|
|
690
730
|
```
|
|
691
731
|
|
|
732
|
+
Omitting `id_namespace:` (or passing `nil`/`""`) raises `ArgumentError`.
|
|
733
|
+
|
|
692
734
|
The sanitizer removes unsafe elements/attributes such as scripts and event
|
|
693
735
|
handlers. It is intentionally conservative rather than a full browser-grade SVG
|
|
694
736
|
implementation.
|
|
695
737
|
|
|
738
|
+
CSS is reduced to a constructed allowlist subset rather than stripped: `style`
|
|
739
|
+
attributes (as written by Inkscape) and `<style>` elements (as written by
|
|
740
|
+
Illustrator) survive when they parse against a small grammar — allowlisted
|
|
741
|
+
properties, type/class/id selectors, numeric/keyword/color values, and
|
|
742
|
+
`url(#fragment)` references only. The output is reassembled from validated
|
|
743
|
+
tokens, never echoed from the input; escapes, quotes, at-rules (`@import`,
|
|
744
|
+
`@font-face`, `@media`), comments, strings, and unknown
|
|
745
|
+
properties/functions/selectors drop the declaration, rule, or whole stylesheet
|
|
746
|
+
rather than being interpreted.
|
|
747
|
+
|
|
748
|
+
Two behaviours are worth knowing before relying on this:
|
|
749
|
+
|
|
750
|
+
- **The CSS property allowlist mirrors the presentation attributes that have
|
|
751
|
+
CSS-property twins** — `SvgCss::ALLOWED_PROPERTIES` is a subset of
|
|
752
|
+
`SvgSanitizer::ALLOWED_ATTRIBUTES` (asserted by a test), so a `fill:`
|
|
753
|
+
declaration and a `fill=""` attribute are treated identically and a property
|
|
754
|
+
the sanitizer would strip as an attribute is also dropped in CSS. (The
|
|
755
|
+
reverse does not hold: geometry/XML attributes like `width`, `href`, and
|
|
756
|
+
`xmlns` are not CSS properties.) The set covers the common paint, stroke
|
|
757
|
+
(including `stroke-dasharray` and `vector-effect`), marker, text, and
|
|
758
|
+
visibility properties that Inkscape and Illustrator emit; it is deliberately
|
|
759
|
+
narrower than a browser. Filters (`filter`, `fe*`) are not yet included.
|
|
760
|
+
- **A `<style>` element fails closed as a whole** on anything outside a flat
|
|
761
|
+
list of `selector { declarations }` rules. Any at-rule (e.g. one stray
|
|
762
|
+
`@import`), a nested block, or an unbalanced brace discards every rule in that
|
|
763
|
+
element, not just the offending one. Within a well-formed stylesheet,
|
|
764
|
+
individual selectors and declarations still drop independently.
|
|
765
|
+
|
|
696
766
|
SVG sanitising is defense-in-depth for stored bytes. Applications that serve
|
|
697
767
|
user-supplied SVGs directly should still use response-level controls such as a
|
|
698
768
|
restrictive `Content-Security-Policy`, `X-Content-Type-Options: nosniff`, and/or
|
|
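Review note on the required `id_namespace:` argument: the validation described in this hunk and in the changelog (`:standalone`, or a String that is a letter followed by letters, digits, `_`, or `-`; anything else rejected rather than coerced) can be sketched as below. `normalize_id_namespace` is a hypothetical name, and raising `ArgumentError` for malformed Strings is an assumption; the README only states that error for an omitted, `nil`, or empty value.

```ruby
# Hypothetical validator; the error class for malformed tokens is assumed.
NAMESPACE_IDENT = /\A[A-Za-z][A-Za-z0-9_-]*\z/

def normalize_id_namespace(value)
  return nil if value == :standalone # document-safe form, no namespacing

  unless value.is_a?(String) && NAMESPACE_IDENT.match?(value)
    raise ArgumentError,
          "id_namespace must be :standalone or an ident-shaped String, got #{value.inspect}"
  end
  value # returned untouched: rejecting instead of coercing keeps distinct inputs distinct
end
```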
@@ -700,6 +770,63 @@ attachment/sandbox handling for direct-open routes. Browsers restrict script
|
|
|
700
770
|
execution when an SVG is embedded as `<img>`, but a top-level SVG document is a
|
|
701
771
|
different sink.
|
|
702
772
|
|
|
773
|
+
#### Inlining sanitized SVG into an HTML page
|
|
774
|
+
|
|
775
|
+
The `id_namespace:` argument forces this decision at every call site — there is
|
|
776
|
+
no default to get wrong.
|
|
777
|
+
|
|
778
|
+
Pass `:standalone` when the output is only ever served as an external resource —
|
|
779
|
+
`<img src>`, `background-image`, an `<object>`/`<iframe>`, or its own file. This
|
|
780
|
+
is the document-safe form. It is **not** safe to splice directly into an HTML
|
|
781
|
+
DOM: a preserved `<style>` rule like `*{visibility:hidden}` or
|
|
782
|
+
`#header{display:none}` would join the host document's cascade, and the SVG's
|
|
783
|
+
`id`s could clobber host ids — both CSS-injection / UI-redress vectors.
|
|
784
|
+
|
|
785
|
+
Pass a **stable, per-document** String (e.g. the upload's sha) to make the output
|
|
786
|
+
safe to inline:
|
|
787
|
+
|
|
788
|
+
```ruby
|
|
789
|
+
SafeImage.sanitize_svg!("icon.svg", id_namespace: "u#{upload.sha1}")
|
|
790
|
+
```
|
|
791
|
+
|
|
792
|
+
With a namespace, the sanitizer:
|
|
793
|
+
|
|
794
|
+
- prefixes every `id` and every reference to it — `href`/`xlink:href` fragments,
|
|
795
|
+
`url(#…)` in attributes and CSS, and ARIA IDREF attributes (`aria-labelledby`,
|
|
796
|
+
`aria-describedby`, `aria-controls`, …) — so internal references stay intact
|
|
797
|
+
but cannot collide with host ids; and
|
|
798
|
+
- prefixes every `class` token (and the `.class` selectors that match them), so
|
|
799
|
+
an attacker can't invoke the host page's framework CSS — a bare
|
|
800
|
+
`class="modal fixed"` would otherwise pick up Bootstrap/Tailwind/app styles and
|
|
801
|
+
become an overlay. Internal class styling still matches because attribute and
|
|
802
|
+
selector are prefixed together; and
|
|
803
|
+
- scopes every `<style>` selector under a `<ns>-scope` class it adds to the root
|
|
804
|
+
`<svg>`, so `*` and type selectors only match that document's own content and
|
|
805
|
+
can never reach the host page; and
|
|
806
|
+
- rejects `var()`, `env()`, and `attr()` in presentation attributes — they
|
|
807
|
+
resolve against the host page (custom properties, environment) and could pull
|
|
808
|
+
in values, including a `url()`, the sanitizer never saw; and
|
|
809
|
+
- drops `overflow` from the root `<svg>` so it clips to its declared viewport — a
|
|
810
|
+
tiny `width`/`height` with `overflow:visible` and oversized content would
|
|
811
|
+
otherwise paint a full-page overlay. Inner elements keep `overflow` (markers
|
|
812
|
+
need it); the root clip bounds them.
|
|
813
|
+
|
|
814
|
+
Because every `<style>` selector is anchored *under* the scope class, a rule
|
|
815
|
+
targeting the root itself — `svg { … }`, `* { … }` intended to include the root,
|
|
816
|
+
or a class on the root such as `.icon { … }` for `<svg class="icon">` — matches
|
|
817
|
+
the root's descendants but not the root element. Root-level styling from a
|
|
818
|
+
`<style>` block therefore does not survive; style the root via attributes if you
|
|
819
|
+
need it. (This is rare in editor exports, which style the root with attributes
|
|
820
|
+
and inner elements with classes.)
|
|
821
|
+
|
|
822
|
+
`style=""` attributes never need selector scoping — a declaration list only
|
|
823
|
+
styles its own element — so they are not a cascade risk in either mode. They can
|
|
824
|
+
still carry `url(#…)` references, though, which are only namespaced when you pass
|
|
825
|
+
a String; so `:standalone` output (bare ids and references) is still not for
|
|
826
|
+
inline use. The transform is idempotent for a given namespace, so re-sanitising
|
|
827
|
+
is a no-op. Use a per-document value so two inlined SVGs on one page don't share
|
|
828
|
+
a namespace.
|
|
829
|
+
|
|
703
830
|
### Compatibility aliases
|
|
704
831
|
|
|
705
832
|
Two thin wrappers kept for callers migrating from existing upload pipelines:
|
|
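Review note on reference namespacing: the changelog says one grammar decides both whether a `url()` value is kept and how it is rewritten. A minimal sketch of that idea follows. The `ns-id` prefix format, the name `namespace_url`, and the prefix check used to get idempotence are all assumptions, since the diff shown here does not include the sanitizer source.

```ruby
# Hypothetical keep-and-rewrite rule for presentation-attribute url() values.
# One regex both validates and extracts, so nothing unvalidated is echoed.
FRAGMENT_URL = /\Aurl\((["']?)#([A-Za-z][A-Za-z0-9_-]*)\1\)\z/i

def namespace_url(value, namespace)
  match = FRAGMENT_URL.match(value)
  return nil unless match # external, mismatched quotes, unterminated: drop

  id = match[2]
  # Simplification: treat an id that already carries the prefix as namespaced,
  # which makes a second pass a no-op.
  id = "#{namespace}-#{id}" unless id.start_with?("#{namespace}-")
  "url(##{id})"
end
```

Quoted and upper-case forms come out in one canonical unquoted shape, so downstream checks only ever see `url(#…)`.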
@@ -834,11 +961,39 @@ a sandboxed command fails, the operation fails.
|
|
|
834
961
|
|
|
835
962
|
The sandbox grants read/write access only to the paths inferred from the
|
|
836
963
|
operation arguments, plus runtime/library paths and temporary directories needed
|
|
837
|
-
by Ruby, libvips, ImageMagick, and optimizer tools.
|
|
838
|
-
through the Landlock helper's seccomp layer. Worker processes inherit the
|
|
964
|
+
by Ruby, libvips, ImageMagick, and optimizer tools. Worker processes inherit the
|
|
839
965
|
parent's backend and pixel-ceiling configuration; landlock is forced off
|
|
840
966
|
inside the worker so sandboxed operations never nest.
|
|
841
967
|
|
|
968
|
+
Operations are served by a pool of resident **zygote** workers: each is a
|
|
969
|
+
fresh Ruby process that boots the gem once and then forks a child per
|
|
970
|
+
operation, so the ~85ms boot cost (Ruby + requires + libvips init) is paid
|
|
971
|
+
once per burst instead of per call — a warm sandboxed operation costs ~3–8ms
|
|
972
|
+
over the unsandboxed one. The pool grows on demand to
|
|
973
|
+
`SAFE_IMAGE_ZYGOTE_WORKERS` (default 8), so N threads run N sandboxed
|
|
974
|
+
operations concurrently (throughput scales near-linearly with cores until the
|
|
975
|
+
work itself saturates the CPU); offered concurrency past the cap blocks until
|
|
976
|
+
a worker frees, which also bounds how many libvips decodes run at once.
|
|
977
|
+
Idling is cheap (~16MB private memory per worker, zero CPU), so a worker
|
|
978
|
+
lingers for `Zygote::IDLE_SECONDS` (300) without work before exiting on its
|
|
979
|
+
own; the next operation boots a new one. Workers also exit immediately when
|
|
980
|
+
their parent process does, and `configure!` always retires the pool.
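
The capped-pool behaviour described above can be sketched in a few lines. This is a reader's illustration, not the gem's implementation (that lives in `lib/safe_image/zygote.rb`): `BoundedPool`, `with_worker` and `peak_concurrency` are hypothetical names, and the slots are filled up front instead of growing on demand. Each slot is a token in a thread-safe queue, so a caller beyond the cap blocks until another caller returns its slot.

```ruby
# Hypothetical illustration only; not SafeImage's zygote pool.
class BoundedPool
  def initialize(cap)
    @slots = Queue.new
    cap.times { |i| @slots << i } # one token per allowed worker
  end

  # Blocks while every slot is checked out, then yields a slot id.
  def with_worker
    slot = @slots.pop
    yield slot
  ensure
    @slots << slot if slot
  end
end

# Drive the pool from more threads than slots and record peak concurrency.
def peak_concurrency(cap:, callers:)
  pool = BoundedPool.new(cap)
  lock = Mutex.new
  active = 0
  peak = 0
  threads = Array.new(callers) do
    Thread.new do
      pool.with_worker do
        lock.synchronize do
          active += 1
          peak = [peak, active].max
        end
        sleep 0.01 # stand-in for a sandboxed operation
        lock.synchronize { active -= 1 }
      end
    end
  end
  threads.each(&:join)
  peak
end
```

With a cap of 2 and six callers, `peak_concurrency(cap: 2, callers: 6)` never reports more than two operations in flight, which is the same property that bounds simultaneous libvips decodes.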
|
|
981
|
+
|
|
982
|
+
A zygote itself never touches untrusted bytes: each forked child first
|
|
983
|
+
applies rlimits, its per-operation Landlock policy (filesystem allowlist, all
|
|
984
|
+
TCP denied on Landlock ABI ≥ 4, abstract-unix-socket/signal scopes on
|
|
985
|
+
ABI ≥ 6), and — when the installed `landlock` gem exposes
|
|
986
|
+
`seccomp_deny_network!` — the helper's deny-all-network seccomp filter (which
|
|
987
|
+
blocks sockets of every family, closing the non-TCP/UDP gap the in-process
|
|
988
|
+
Landlock policy alone leaves open), and only then runs the operation. Forking
|
|
989
|
+
is sound because the zygote never runs operations itself — libvips is
|
|
990
|
+
initialised but quiescent (no native threads) at every fork.
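
The "restrict first, run second" ordering in a forked child can be illustrated with Ruby's standard library alone. This is a minimal sketch under stated assumptions: `run_restricted` is a hypothetical helper, only a CPU rlimit is applied, and the Landlock and seccomp steps are represented by a comment because the `landlock` gem's API is not shown in this diff.

```ruby
# Hypothetical illustration only; not SafeImage's worker code.
def run_restricted(cpu_seconds: 5)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # Restrictions are applied before any caller-supplied work runs.
    _soft, hard = Process.getrlimit(Process::RLIMIT_CPU)
    limit = [cpu_seconds, hard].min
    Process.setrlimit(Process::RLIMIT_CPU, limit, limit)
    # A real worker would apply its Landlock and seccomp policy here.
    writer.write(yield.to_s) # only now does the operation run
    writer.close
    exit!(0)
  end
  writer.close
  output = reader.read
  reader.close
  Process.wait(pid)
  raise "child failed" unless $?.success?
  output
end
```

The parent only ever reads a plain string from the pipe, so nothing the child produces is deserialised as Ruby objects. The parent's own limits are untouched because `setrlimit` runs after the fork.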
|
|
991
|
+
|
|
992
|
+
`SAFE_IMAGE_ZYGOTE=0` falls back to the exec-per-operation worker (a fresh
|
|
993
|
+
sandboxed Ruby per call through the Landlock helper binary, whose seccomp
|
|
994
|
+
filter denies sockets of every family, no pool); `SAFE_IMAGE_ZYGOTE_WORKERS`
|
|
995
|
+
and `SAFE_IMAGE_ZYGOTE_IDLE_SECONDS` tune the pool cap and idle window.
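
As a rough illustration of how these three variables might be read, the sketch below parses them from an environment-like hash. `ZygoteSettings` and `zygote_settings` are hypothetical names, not part of the gem. The defaults (8 workers, 300 seconds) come from the README text above, while integer parsing and treating every value other than `0` as "enabled" are assumptions.

```ruby
# Hypothetical helper; not SafeImage's configuration code.
ZygoteSettings = Struct.new(:enabled, :workers, :idle_seconds)

def zygote_settings(env = ENV)
  ZygoteSettings.new(
    env.fetch("SAFE_IMAGE_ZYGOTE", "1") != "0",                 # "0" disables the pool
    Integer(env.fetch("SAFE_IMAGE_ZYGOTE_WORKERS", "8")),       # pool cap
    Integer(env.fetch("SAFE_IMAGE_ZYGOTE_IDLE_SECONDS", "300")) # idle window
  )
end
```

Passing a plain hash, as in `zygote_settings("SAFE_IMAGE_ZYGOTE" => "0")`, makes the parsing easy to exercise without touching the process environment.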
|
|
996
|
+
|
|
842
997
|
## Development
|
|
843
998
|
|
|
844
999
|
```bash
data/lib/safe_image/discourse_compat.rb
CHANGED
|
@@ -182,7 +182,7 @@ module SafeImage
|
|
|
182
182
|
format = File.extname(output).delete_prefix(".").downcase
|
|
183
183
|
format = "jpg" if format == "jpeg"
|
|
184
184
|
return unless Processor::OPTIMIZABLE_OUTPUTS.include?(format)
|
|
185
|
-
Optimizer.optimize(output, mode: :lossless, strip_metadata: true, quality: quality)
|
|
185
|
+
Optimizer.optimize(output, mode: :lossless, strip_metadata: true, quality: quality, assume_upright: true)
|
|
186
186
|
end
|
|
187
187
|
|
|
188
188
|
# JPEG default when the caller passes no quality: matches what ImageMagick
|
|
@@ -264,17 +264,6 @@ module SafeImage
|
|
|
264
264
|
convert(from, to, format: "jpg", quality: quality, optimize: optimize, max_pixels: max_pixels, chroma_subsampling: chroma_subsampling)
|
|
265
265
|
end
|
|
266
266
|
|
|
267
|
-
# EXIF orientation values mapped onto jpegtran's lossless transforms.
|
|
268
|
-
JPEGTRAN_OPERATIONS = {
|
|
269
|
-
2 => ["-flip", "horizontal"],
|
|
270
|
-
3 => ["-rotate", "180"],
|
|
271
|
-
4 => ["-flip", "vertical"],
|
|
272
|
-
5 => ["-transpose"],
|
|
273
|
-
6 => ["-rotate", "90"],
|
|
274
|
-
7 => ["-transverse"],
|
|
275
|
-
8 => ["-rotate", "270"]
|
|
276
|
-
}.freeze
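
For readers following the move: the table removed here is referenced later in this diff as `Optimizer::JPEGTRAN_OPERATIONS`. The sketch below copies the table from the removed lines and shows how an EXIF orientation value expands into the `jpegtran` argument list used by `jpegtran_fix_orientation`. The helper name `jpegtran_argv` is hypothetical and exists only for illustration.

```ruby
# Table copied from the removed lines above; the helper is illustrative only.
JPEGTRAN_OPERATIONS = {
  2 => ["-flip", "horizontal"],
  3 => ["-rotate", "180"],
  4 => ["-flip", "vertical"],
  5 => ["-transpose"],
  6 => ["-rotate", "90"],
  7 => ["-transverse"],
  8 => ["-rotate", "270"]
}.freeze

# Build the argument list for a lossless orientation fix.
# Orientation 1 (already upright) has no entry, so `fetch` raises KeyError.
def jpegtran_argv(orient, input, tmp_path)
  ["jpegtran", "-copy", "none", "-perfect",
   *JPEGTRAN_OPERATIONS.fetch(orient),
   "-outfile", tmp_path, input]
end
```

Using `fetch` instead of `[]` means an unexpected orientation value fails loudly instead of silently running `jpegtran` with no transform.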
|
|
277
|
-
|
|
278
267
|
def fix_orientation(from, to = from, max_pixels: nil, quality: nil)
|
|
279
268
|
max_pixels = SafeImage.resolved_max_pixels(max_pixels)
|
|
280
269
|
output = PathSafety.ensure_safe_output_path!(to).to_s
|
|
@@ -323,7 +312,7 @@ module SafeImage
|
|
|
323
312
|
def jpegtran_fix_orientation(input, output, orient)
|
|
324
313
|
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
|
|
325
314
|
info = write_through_tempfile(output) do |tmp_path|
|
|
326
|
-
Runner.run!(["jpegtran", "-copy", "none", "-perfect", *JPEGTRAN_OPERATIONS.fetch(orient), "-outfile", tmp_path, input])
|
|
315
|
+
Runner.run!(["jpegtran", "-copy", "none", "-perfect", *Optimizer::JPEGTRAN_OPERATIONS.fetch(orient), "-outfile", tmp_path, input])
|
|
327
316
|
Native.probe(tmp_path)
|
|
328
317
|
end
|
|
329
318
|
result_from_info(
|
data/lib/safe_image/ico.rb
CHANGED
|
@@ -3,7 +3,7 @@
|
|
|
3
3
|
require "tempfile"
|
|
4
4
|
|
|
5
5
|
module SafeImage
|
|
6
|
-
# Pure-Ruby ICO container support, in the spirit of the
|
|
6
|
+
# Pure-Ruby ICO container support, in the spirit of the SVG metadata path:
|
|
7
7
|
# the directory and legacy DIB payloads are parsed in memory-safe Ruby with
|
|
8
8
|
# explicit bounds checks, and pixel encoding is delegated to the hardened
|
|
9
9
|
# native libvips helpers. ImageMagick is never involved.
|
data/lib/safe_image/native.rb
CHANGED
|
@@ -45,17 +45,18 @@ module SafeImage
|
|
|
45
45
|
out_format = output_format!(format)
|
|
46
46
|
|
|
47
47
|
VipsGlue.with_images do |track|
|
|
48
|
-
#
|
|
49
|
-
#
|
|
50
|
-
|
|
51
|
-
|
|
48
|
+
# Route through the explicit loader for the path's extension and keep
|
|
49
|
+
# the resulting image object for the resize. Do not call the generic
|
|
50
|
+
# filename-based `thumbnail` operation here: it re-sniffs the path and
|
|
51
|
+
# can observe different bytes if a hostile local path is replaced
|
|
52
|
+
# between the header check and the decode.
|
|
53
|
+
image, input_format = load_image(track, input, autorotate: true)
|
|
54
|
+
check_pixels!(image, max_pixels)
|
|
52
55
|
|
|
53
|
-
# Thumbnail from the file so libvips can shrink on load (e.g.
|
|
54
|
-
# libjpeg DCT downscaling); auto-rotates by default.
|
|
55
56
|
thumb = track.call(
|
|
56
57
|
VipsGlue.operation(
|
|
57
|
-
"
|
|
58
|
-
{
|
|
58
|
+
"thumbnail_image",
|
|
59
|
+
{ in: image, width: width, height: height,
|
|
59
60
|
size: "both", crop: "centre", fail_on: "error" }
|
|
60
61
|
)
|
|
61
62
|
)
|
|
@@ -75,7 +76,7 @@ module SafeImage
|
|
|
75
76
|
out_format = output_format!(format)
|
|
76
77
|
|
|
77
78
|
VipsGlue.with_images do |track|
|
|
78
|
-
image, input_format = load_image(track, String(input))
|
|
79
|
+
image, input_format = load_image(track, String(input), autorotate: true)
|
|
79
80
|
check_pixels!(image, max_pixels)
|
|
80
81
|
rotated = track.call(VipsGlue.operation("autorot", { in: image }))
|
|
81
82
|
resized = track.call(VipsGlue.operation("resize", { in: rotated, scale: scale }))
|
|
@@ -94,7 +95,7 @@ module SafeImage
|
|
|
94
95
|
out_format = output_format!(format)
|
|
95
96
|
|
|
96
97
|
VipsGlue.with_images do |track|
|
|
97
|
-
image, input_format = load_image(track, String(input))
|
|
98
|
+
image, input_format = load_image(track, String(input), autorotate: true)
|
|
98
99
|
check_pixels!(image, max_pixels)
|
|
99
100
|
rotated = track.call(VipsGlue.operation("autorot", { in: image }))
|
|
100
101
|
|
|
@@ -116,7 +117,7 @@ module SafeImage
|
|
|
116
117
|
out_format = output_format!(format)
|
|
117
118
|
|
|
118
119
|
VipsGlue.with_images do |track|
|
|
119
|
-
image, input_format = load_image(track, String(input))
|
|
120
|
+
image, input_format = load_image(track, String(input), autorotate: true)
|
|
120
121
|
check_pixels!(image, max_pixels)
|
|
121
122
|
rotated = track.call(VipsGlue.operation("autorot", { in: image }))
|
|
122
123
|
|
|
@@ -306,7 +307,7 @@ module SafeImage
|
|
|
306
307
|
raise ArgumentError, "quality must be 1..100" unless (1..100).cover?(quality)
|
|
307
308
|
end
|
|
308
309
|
|
|
309
|
-
def load_image(track, path)
|
|
310
|
+
def load_image(track, path, autorotate: false)
|
|
310
311
|
format = normalized_format(File.extname(path).delete_prefix("."))
|
|
311
312
|
raise UnsupportedFormatError, "unsupported input format" unless format
|
|
312
313
|
|
|
@@ -319,9 +320,17 @@ module SafeImage
|
|
|
319
320
|
raise UnsupportedFormatError, "this libvips build has no JPEG XL loader" unless VipsGlue.type_find?("jxlload")
|
|
320
321
|
end
|
|
321
322
|
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
323
|
+
options = { filename: path, access: "sequential", fail_on: "error" }
|
|
324
|
+
image = track.call(VipsGlue.operation(LOADERS.fetch(format), options))
|
|
325
|
+
|
|
326
|
+
# autorot flips/rotates pull rows out of input order, which a
|
|
327
|
+
# sequential source can only serve while the image fits its readahead
|
|
328
|
+
# window (~512px); larger oriented images fail with "out of order
|
|
329
|
+
# read". Reload those with random access — the open itself is
|
|
330
|
+
# header-only, so the caller's pixel cap still runs before any decode.
|
|
331
|
+
if autorotate && VipsGlue.orientation(image) > 1
|
|
332
|
+
image = track.call(VipsGlue.operation(LOADERS.fetch(format), options.merge(access: "random")))
|
|
333
|
+
end
|
|
325
334
|
[image, format]
|
|
326
335
|
end
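
The sequential-then-random reload in `load_image` can be exercised without libvips by substituting a stub loader. In this sketch `load_with_fallback` and `recording_loader` are hypothetical stand-ins for the `VipsGlue` calls, and an image is modelled as a plain hash carrying its orientation. It illustrates only the control flow, not the gem's API.

```ruby
# Hypothetical illustration only; the loader is a stub, not libvips.
def load_with_fallback(loader, path, autorotate: false)
  options = { filename: path, access: "sequential", fail_on: "error" }
  image = loader.call(options) # header-only open in the real code
  if autorotate && image.fetch(:orientation) > 1
    # Oriented images are reopened with random access so a later
    # flip or rotate can read rows out of input order.
    image = loader.call(options.merge(access: "random"))
  end
  image
end

# Stub loader factory: records every access mode it is asked for.
def recording_loader(orientation, log)
  lambda do |options|
    log << options.fetch(:access)
    { orientation: orientation, access: options.fetch(:access) }
  end
end
```

An upright image (orientation 1) is opened once in sequential mode. An oriented one is opened twice, and the image that is kept is the random-access one.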
|
|
327
336
|
|