pure_jpeg 0.3.1 → 0.3.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 7a0015f811a2250264bfa73727aa37af15fee1af10e4897243045f2f4f54ae07
4
- data.tar.gz: 5085ab8c4bd1d9941c94e116b39ea7ca38f658fdf0e48523cae65fc913f765d0
3
+ metadata.gz: 7c252d678f3b8c916a2430b802569dc5099a19636359e9864637ed9fe01c90a3
4
+ data.tar.gz: 8feec524d8c41ca74be1e9bc47cf41b762cb14dc3500dc03c9240680d5877933
5
5
  SHA512:
6
- metadata.gz: b507962b2ec9650e743b365b8ace5ddb2a9c1d04de126206ac6062c9da46001dc3e8a1f04df356187b3a2458f3383625d864fe12ee0b3c5e3df589755aa540aa
7
- data.tar.gz: 7bf019ea4702bbd7379ad3a1d295acaadd47b185815461b810c08c33299248235b326e9d0cdcfbebe23646f32014487c55212517a9e8a246d4fd5d40891eb62f
6
+ metadata.gz: c8d26b4d4bed6da87f048036b06bc80eddd6c7ff8318032fb9cfccc15f5aea961e62fefe9965dde33c3d33cea5b611b4c2f6cb2397efa2f43effd8eb456fcf93
7
+ data.tar.gz: 63a9fe6bdf1136c8ea6c8e38a6b6641e6be2975fef4fea9394f0b48de22b01862996a87557b5c0bf7d7915020630f08c9285ebf8b8e8482f38f228c31b2d08e7
data/CHANGELOG.md CHANGED
@@ -1,5 +1,34 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.3.3
4
+
5
+ New features:
6
+
7
+ - `PureJPEG.from_chunky_png` accepts `background: [r, g, b]` to composite transparent PNG pixels before JPEG encoding
8
+
9
+ Fixes:
10
+
11
+ - Invalid `quality` and `chroma_quality` values now raise clear `ArgumentError`s
12
+ - `Image#[]` and `Image#[]=` now raise `IndexError` for out-of-bounds coordinates
13
+ - Hardened JFIF segment length validation for malformed JPEG input
14
+
15
+ ## 0.3.2
16
+
17
+ Performance:
18
+
19
+ - Replaced matrix-multiply float DCT with integer-scaled AAN (Arai-Agui-Nakajima) DCT from the IJG reference implementation -- all-integer, no Float allocations
20
+ - Fixed-point integer arithmetic for RGB/YCbCr color space conversion in both encoder and decoder
21
+ - Eliminated short-lived Array allocations in Huffman encoder (`category_and_bits` split into separate methods)
22
+ - `String#<<` with Integer instead of `byte.chr` to avoid String allocations in bit writer
23
+ - DCT inner loop unrolling to eliminate nested block invocations
24
+ - Unrolled `write_block` and `extract_block_into` inner loops
25
+ - Integer rounding division in quantization (no more Float division + round)
26
+ - Hoisted hash lookups and method calls out of per-pixel loops in decoder
27
+
28
+ Result: ~2.9x faster encode, ~4.6x faster decode on Ruby 4.0.2 with YJIT.
29
+
30
+ Credits: [Ufuk Kayserilioglu](https://github.com/paracycle)
31
+
3
32
  ## 0.3.1
4
33
 
5
34
  Fixes:
data/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
 
7
7
  Convert PNG or other pixel data to JPEG. Or the other way! Implements baseline JPEG encoding (DCT, Huffman, 4:2:0 chroma subsampling) and decodes both baseline and progressive JPEGs. Exposes a variety of encoding options to adjust parts of the JPEG pipeline not normally available (I needed this to recreate the JPEG compression styles of older digital cameras - don't ask..)
8
8
 
9
- It works on CRuby 3.0+, TruffleRuby 33.0, and JRuby 10.0.
9
+ It works on CRuby 3.0+, TruffleRuby 33.0, and JRuby 10.0. There's *almost* 100% test coverage - I need to find some "broken" JPEGs to do the rest (hit me up if you have any sources..)
10
10
 
11
11
  > [!NOTE]
12
12
  > Rubyists might find the [AI Disclosure](#ai-disclosure) section below of interest.
@@ -23,7 +23,7 @@ gem "pure_jpeg"
23
23
  gem install pure_jpeg
24
24
  ```
25
25
 
26
- There are no runtime dependencies. [ChunkyPNG](https://github.com/wvanbergen/chunky_png) is optional (though quite useful) if you want to use `from_chunky_png`. I have a pure PNG encoder/decoder not far behind this that will ultimately plug in nicely too to get 100% pure Ruby graphical bliss ;-)
26
+ There are no runtime dependencies. [ChunkyPNG](https://github.com/wvanbergen/chunky_png) is optional (though quite useful) if you want to use `from_chunky_png`.
27
27
 
28
28
  `examples/` contains some useful example scripts for basic JPEG to PNG and PNG to JPEG conversion if you want to do some quick tests without writing code.
29
29
 
@@ -39,6 +39,13 @@ image = ChunkyPNG::Image.from_file("photo.png")
39
39
  PureJPEG.from_chunky_png(image, quality: 80).write("photo.jpg")
40
40
  ```
41
41
 
42
+ If you want transparent pixels composited against a solid color instead of
43
+ using the PNG's hidden RGB values, pass an RGB background:
44
+
45
+ ```ruby
46
+ PureJPEG.from_chunky_png(image, background: [255, 255, 255], quality: 80).write("photo.jpg")
47
+ ```
48
+
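The README text above describes compositing transparent pixels against a solid color, but the code that does it is not part of this diff. As a rough illustration only, the usual "source over" blend for one pixel could look like the sketch below. `composite_over` is a hypothetical helper, not a PureJPEG method, and the rounding choice is an assumption.

```ruby
# Hypothetical helper, not part of PureJPEG: blends one RGBA pixel over an
# opaque RGB background using integer "source over" compositing.
def composite_over(rgba, background)
  r, g, b, a = rgba
  inv = 255 - a
  [r, g, b].zip(background).map do |src, bg|
    # Adding 127 before dividing by 255 rounds to the nearest integer.
    (src * a + bg * inv + 127) / 255
  end
end

white = [255, 255, 255]
p composite_over([10, 20, 30, 0], white)   # fully transparent: background wins
p composite_over([10, 20, 30, 255], white) # fully opaque: source RGB is kept
p composite_over([0, 0, 0, 128], white)    # roughly half-transparent black
```

With a fully transparent pixel the hidden RGB values no longer reach the encoder, which is the behavior the `background:` option is documented to provide.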
42
49
  ### From any pixel source
43
50
 
44
51
  PureJPEG accepts any object that responds to `width`, `height`, and `[x, y]` (returning an object with `.r`, `.g`, `.b` in 0-255):
@@ -99,7 +106,7 @@ Here's a quick example of sort of the "old digital camera" effect I was looking
99
106
  </tr>
100
107
  </table>
101
108
 
102
- And here's what happens when you convert a PNG with transparency — JPEG doesn't support alpha, so the hidden RGB data behind transparent pixels bleeds through:
109
+ And here's what happens when you convert a PNG with transparency without a `background:` — JPEG doesn't support alpha, so the hidden RGB data behind transparent pixels bleeds through:
103
110
 
104
111
  <table>
105
112
  <tr>
@@ -112,7 +119,7 @@ And here's what happens when you convert a PNG with transparency — JPEG doesn'
112
119
  </tr>
113
120
  </table>
114
121
 
115
- I consider this a feature but you may consider it a deficiency and that a default background of white should be applied. This may be something I'll add if anyone wants it!
122
+ Pass `background: [255, 255, 255]` to composite transparent pixels over white, or any other `[r, g, b]` color you prefer.
116
123
 
117
124
  Note that each stage of the JPEG pipeline is a separate module, so individual components (DCT, quantization, Huffman coding) can be replaced or extended independently which is kinda my plan here as I made this to play around with effects.
118
125
 
@@ -194,29 +201,81 @@ Decoding:
194
201
 
195
202
  Not supported: arithmetic coding, 12-bit precision, EXIF/ICC profile preservation, adding a default background for transparent sources (see what happens above!). Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
196
203
 
197
- Possible future improvements: AAN/fixed-point DCT (but it's a LOT of work), ICC profile rendering/conversion.
204
+ Possible future improvements: ICC profile rendering/conversion.
198
205
 
199
206
  ## Performance
200
207
 
201
- On a 1024x1024 image (Ruby 4.0.1 on my M5):
208
+ On a 1024x1024 image (Apple M5, 5 runs after warmup):
209
+
210
+ | Operation | CRuby 4.0.2 (YJIT) | TruffleRuby 33.0.1 |
211
+ |-----------|---------------------|---------------------|
212
+ | Encode (color, q85) | ~0.16s | ~0.08s |
213
+ | Decode (baseline) | ~0.14s | ~0.05s |
214
+ | Decode (progressive) | ~0.18s | ~0.09s |
202
215
 
203
- | Operation | Time |
204
- |-----------|------|
205
- | Encode (color, q85) | ~1.2s |
206
- | Decode (baseline) | ~1.2s |
207
- | Decode (progressive) | ~1.3s |
216
+ The encoder and decoder use an integer-scaled AAN (Arai-Agui-Nakajima) DCT with fixed-point arithmetic throughout — no Float operations in the hot path. Color space conversion uses fixed-point integer math, and pixel data is stored as packed integers to avoid per-pixel object allocation. TruffleRuby's Graal JIT compiler can optimize these tight integer loops particularly well, resulting in 2-3x faster performance once warmed up.
217
+
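The table above reports times taken over "5 runs after warmup". A minimal sketch of that kind of measurement, assuming a best-of-N policy and a monotonic clock, is shown here. `best_of` is a hypothetical helper and the gem's own benchmark tasks may measure differently.

```ruby
# Hypothetical timing helper; the gem's own benchmark code may differ.
def best_of(runs: 5, warmup: 2)
  warmup.times { yield } # give the JIT a chance to compile the hot paths
  Array.new(runs) {
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  }.min
end

# Stand-in workload: an integer multiply-and-shift loop.
seconds = best_of { (1..200_000).sum { |i| (i * 19_595) >> 16 } }
puts format("best of 5: %.4fs", seconds)
```

Discarding the warmup runs matters most on JIT-based runtimes such as YJIT and TruffleRuby, where the first iterations include compilation time.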
218
+ ## Example scripts
219
+
220
+ The `examples/` directory contains ready-to-run scripts. All accept JPEG or PNG input (PNG requires the `chunky_png` gem).
221
+
222
+ **`png_to_jpeg.rb`** -- Convert a PNG to JPEG.
223
+
224
+ ```
225
+ ruby examples/png_to_jpeg.rb [--grayscale] INPUT.png OUTPUT.jpg [quality]
226
+ ```
227
+
228
+ **`jpeg_to_png.rb`** -- Convert a JPEG to PNG.
229
+
230
+ ```
231
+ ruby examples/jpeg_to_png.rb INPUT.jpg OUTPUT.png
232
+ ```
233
+
234
+ **`kodak.rb`** -- Apply the "scrambled quantization" effect that recreates the gritty look of early digicams like the Casio QV-10. See `CREATIVE.md` for more on this.
235
+
236
+ ```
237
+ ruby examples/kodak.rb INPUT.(jpg|png) [OUTPUT.jpg] [quality]
238
+ ```
208
239
 
209
- Both the encoder and decoder use a separable DCT with a precomputed cosine matrix and reuse all per-block buffers to minimize GC pressure. Pixel data is stored as packed integers internally to avoid per-pixel object allocation.
240
+ **`lumacrush.rb`** -- Crush luminance while preserving chrominance. Produces a soft, oil-painting quality where detail is blocky but colors remain accurate.
241
+
242
+ ```
243
+ ruby examples/lumacrush.rb INPUT.(jpg|png) [OUTPUT.jpg] [luma_quality] [chroma_quality]
244
+ ```
245
+
246
+ **`chromacrush.rb`** -- The opposite: sharp detail with collapsed, blocky color patches.
247
+
248
+ ```
249
+ ruby examples/chromacrush.rb INPUT.(jpg|png) [OUTPUT.jpg] [luma_quality] [chroma_quality]
250
+ ```
251
+
252
+ **`loopy.rb`** -- Re-encode a JPEG through multiple passes to see how artifacts accumulate.
253
+
254
+ ```
255
+ ruby examples/loopy.rb INPUT.jpg [quality] [iterations]
256
+ ```
210
257
 
211
258
  ## Some useful `rake` tasks
212
259
 
213
260
  ```
214
261
  bundle install
215
262
  rake test # run the test suite
216
- rake benchmark # benchmark encoding and decoding (3 runs each)
263
+ rake benchmark # basic benchmark of encoding and decoding (5 runs after warmup)
217
264
  rake profile # CPU profile with StackProf (requires the stackprof gem)
218
265
  ```
219
266
 
267
+ ## Full benchmark script
268
+
269
+ `benchmark/run.rb` is a more thorough benchmark that exercises the encode and decode paths in several ways. It auto-enables YJIT, warms up before measuring, and reports object allocations, throughput (iterations/second via `benchmark-ips`), best-of-N wall-clock times, and a sustained mixed workload across encode (q85, q95 optimized, grayscale) and decode (baseline and progressive).
270
+
271
+ ```
272
+ ruby benchmark/run.rb # standard run
273
+ ruby benchmark/run.rb --quick # single-shot wall-clock + allocations
274
+ ruby benchmark/run.rb --full # longer run, includes YJIT runtime stats
275
+ ruby benchmark/run.rb --profile # CPU profile with Vernier (writes JSON to /tmp)
276
+ ruby benchmark/run.rb --profile-alloc # retained-object profile with Vernier
277
+ ```
278
+
220
279
  ## AI Disclosure
221
280
 
222
281
  **Claude Code did the majority of the work.** The math of JPEG encoding/decoding is beyond me, except 'getting it' at a high level. I understand it like I understand the engine in my car :-) *Later update: OpenAI Codex is also reviewing and adding features now. It feels stronger in many areas.*
@@ -233,6 +292,11 @@ rake profile # CPU profile with StackProf (requires the stackprof gem)
233
292
 
234
293
  **The final 10% still takes 90% of the time.** As mentioned above, the first run was quick, but getting things right has taken much longer. v0.1->0.2 has taken longer than 0.1 did! But we now have progressive JPEG support, even more optimizations, better tests, etc. etc.
235
294
 
295
+ ## Credits
296
+
297
+ - [Ufuk Kayserilioglu](https://github.com/paracycle) - Major performance optimizations including integer-scaled AAN DCT, fixed-point color space conversion, and YJIT-targeted improvements.
298
+ - [Keith R. Bennett](https://github.com/keithrbennett) - Coverage testing and adding SimpleCov to the project.
299
+
236
300
  ## License
237
301
 
238
302
  MIT
@@ -17,8 +17,8 @@ module PureJPEG
17
17
  while @bits_in_buffer >= 8
18
18
  @bits_in_buffer -= 8
19
19
  byte = (@buffer >> @bits_in_buffer) & 0xFF
20
- @data << byte.chr
21
- @data << "\x00".b if byte == 0xFF # byte stuffing
20
+ @data << byte
21
+ @data << 0x00 if byte == 0xFF # byte stuffing
22
22
  end
23
23
 
24
24
  @buffer &= (1 << @bits_in_buffer) - 1
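The change above relies on `String#<<` accepting an Integer and appending it as a single character, which avoids the temporary one-byte String that `byte.chr` creates. The sketch below assumes the output buffer is a binary (ASCII-8BIT) string; how `@data` is actually created is not shown in this hunk, and `append_stuffed` is a hypothetical name.

```ruby
# Appends one byte to a binary buffer, adding the JPEG stuffing byte after 0xFF.
def append_stuffed(buffer, byte)
  buffer << byte                 # Integer argument: appended as one byte
  buffer << 0x00 if byte == 0xFF # byte stuffing, as in the hunk above
  buffer
end

buf = String.new(encoding: Encoding::BINARY)
[0x12, 0xFF, 0x34].each { |b| append_stuffed(buf, b) }
p buf.bytes
```

The binary encoding is what makes this safe: on a UTF-8 string, `<< 0xFF` would append the two-byte character "ÿ" rather than a single 0xFF byte.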
data/lib/pure_jpeg/dct.rb CHANGED
@@ -1,10 +1,15 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PureJPEG
4
+ # Integer-scaled DCT based on the IJG (Independent JPEG Group) reference
5
+ # implementation (jfdctint.c / jidctint.c). Uses the Arai-Agui-Nakajima
6
+ # factorization with 13-bit fixed-point constants.
7
+ #
8
+ # All arithmetic is pure Integer (additions, shifts, multiplies) — no Float
9
+ # operations. This is ~3x faster than the matrix-multiply float DCT under
10
+ # YJIT and eliminates millions of Float object allocations during decode.
4
11
  module DCT
5
- # Precomputed 8x8 DCT matrix: A[k][n] = (C(k)/2) * cos((2n+1)*k*pi/16)
6
- # where C(0) = 1/sqrt(2), C(k) = 1 for k > 0.
7
- # This lets us do the 2D DCT as two 1D matrix-vector multiplies (separable).
12
+ # Keep the float matrix available for reference / testing
8
13
  MATRIX = Array.new(8) { |k|
9
14
  ck = k == 0 ? 0.5 / Math.sqrt(2.0) : 0.5
10
15
  Array.new(8) { |n|
@@ -12,72 +17,215 @@ module PureJPEG
12
17
  }
13
18
  }.freeze
14
19
 
15
- # Flatten for faster indexed access
16
20
  MATRIX_FLAT = MATRIX.flatten.freeze
17
-
18
- # Transposed matrix for inverse DCT: A^T[n][k] = A[k][n]
19
21
  MATRIX_T_FLAT = Array.new(64) { |i| MATRIX_FLAT[(i % 8) * 8 + i / 8] }.freeze
20
22
 
21
- # Separable forward 2D DCT: row pass then column pass.
22
- # Writes result into `out`. Uses `temp` as scratch space.
23
- # All three arrays must be pre-allocated with 64 elements.
24
- def self.forward!(block, temp, out)
25
- # Row pass: temp[y*8+u] = sum_x A[u][x] * block[y*8+x]
26
- m = MATRIX_FLAT
27
- 8.times do |y|
28
- y8 = y << 3
29
- b0 = block[y8]; b1 = block[y8|1]; b2 = block[y8|2]; b3 = block[y8|3]
30
- b4 = block[y8|4]; b5 = block[y8|5]; b6 = block[y8|6]; b7 = block[y8|7]
31
- 8.times do |u|
32
- u8 = u << 3
33
- temp[y8|u] = m[u8]*b0 + m[u8|1]*b1 + m[u8|2]*b2 + m[u8|3]*b3 +
34
- m[u8|4]*b4 + m[u8|5]*b5 + m[u8|6]*b6 + m[u8|7]*b7
35
- end
23
+ # Fixed-point constants (13-bit precision) from IJG reference.
24
+ CONST_BITS = 13
25
+ PASS1_BITS = 2
26
+
27
+ FIX_0_298631336 = 2446
28
+ FIX_0_390180644 = 3196
29
+ FIX_0_541196100 = 4433
30
+ FIX_0_765366865 = 6270
31
+ FIX_0_899976223 = 7373
32
+ FIX_1_175875602 = 9633
33
+ FIX_1_501321110 = 12299
34
+ FIX_1_847759065 = 15137
35
+ FIX_1_961570560 = 16069
36
+ FIX_2_053119869 = 16819
37
+ FIX_2_562915447 = 20995
38
+ FIX_3_072711026 = 25172
39
+
40
+ CB = CONST_BITS
41
+ P1 = PASS1_BITS
42
+ CB_M_P1 = CB - P1 # 11
43
+ CB_P_P1_P3 = CB + P1 + 3 # 18
44
+ P1_P3 = P1 + 3 # 5
45
+ CB2_P_P1 = CB * 2 + P1 # 28 (unused, was for column even-multiplied path)
46
+
47
+ # Forward 2D DCT (in-place). Input: 64-element array of level-shifted
48
+ # integers (-128..127). Output: DCT coefficients (integers).
49
+ # The `_temp` and `_out` parameters are accepted for API compatibility
50
+ # but ignored; computation is done in-place on `data`.
51
+ def self.forward!(data, _temp = nil, _out = nil)
52
+ # Pass 1: process rows
53
+ 8.times do |row|
54
+ i = row << 3
55
+ d0 = data[i]; d1 = data[i+1]; d2 = data[i+2]; d3 = data[i+3]
56
+ d4 = data[i+4]; d5 = data[i+5]; d6 = data[i+6]; d7 = data[i+7]
57
+
58
+ tmp0 = d0 + d7; tmp7 = d0 - d7
59
+ tmp1 = d1 + d6; tmp6 = d1 - d6
60
+ tmp2 = d2 + d5; tmp5 = d2 - d5
61
+ tmp3 = d3 + d4; tmp4 = d3 - d4
62
+
63
+ # Even part
64
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
65
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
66
+
67
+ data[i] = (tmp10 + tmp11) << P1
68
+ data[i+4] = (tmp10 - tmp11) << P1
69
+
70
+ z1 = (tmp12 + tmp13) * FIX_0_541196100
71
+ data[i+2] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
72
+ data[i+6] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
73
+
74
+ # Odd part
75
+ z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
76
+ z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
77
+ z5 = (z3 + z4) * FIX_1_175875602
78
+
79
+ tmp4 = tmp4 * FIX_0_298631336
80
+ tmp5 = tmp5 * FIX_2_053119869
81
+ tmp6 = tmp6 * FIX_3_072711026
82
+ tmp7 = tmp7 * FIX_1_501321110
83
+ z1 = z1 * -FIX_0_899976223
84
+ z2 = z2 * -FIX_2_562915447
85
+ z3 = z3 * -FIX_1_961570560 + z5
86
+ z4 = z4 * -FIX_0_390180644 + z5
87
+
88
+ data[i+7] = (tmp4 + z1 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
89
+ data[i+5] = (tmp5 + z2 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
90
+ data[i+3] = (tmp6 + z2 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
91
+ data[i+1] = (tmp7 + z1 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
36
92
  end
37
93
 
38
- # Column pass: out[v*8+u] = sum_y A[v][y] * temp[y*8+u]
39
- 8.times do |u|
40
- t0 = temp[u]; t1 = temp[8|u]; t2 = temp[16|u]; t3 = temp[24|u]
41
- t4 = temp[32|u]; t5 = temp[40|u]; t6 = temp[48|u]; t7 = temp[56|u]
42
- 8.times do |v|
43
- v8 = v << 3
44
- out[v8|u] = m[v8]*t0 + m[v8|1]*t1 + m[v8|2]*t2 + m[v8|3]*t3 +
45
- m[v8|4]*t4 + m[v8|5]*t5 + m[v8|6]*t6 + m[v8|7]*t7
46
- end
94
+ # Pass 2: process columns
95
+ 8.times do |col|
96
+ d0 = data[col]; d1 = data[col+8]; d2 = data[col+16]; d3 = data[col+24]
97
+ d4 = data[col+32]; d5 = data[col+40]; d6 = data[col+48]; d7 = data[col+56]
98
+
99
+ tmp0 = d0 + d7; tmp7 = d0 - d7
100
+ tmp1 = d1 + d6; tmp6 = d1 - d6
101
+ tmp2 = d2 + d5; tmp5 = d2 - d5
102
+ tmp3 = d3 + d4; tmp4 = d3 - d4
103
+
104
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
105
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
106
+
107
+ data[col] = (tmp10 + tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
108
+ data[col+32] = (tmp10 - tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
109
+
110
+ z1 = (tmp12 + tmp13) * FIX_0_541196100
111
+ data[col+16] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
112
+ data[col+48] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
113
+
114
+ z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
115
+ z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
116
+ z5 = (z3 + z4) * FIX_1_175875602
117
+
118
+ tmp4 = tmp4 * FIX_0_298631336
119
+ tmp5 = tmp5 * FIX_2_053119869
120
+ tmp6 = tmp6 * FIX_3_072711026
121
+ tmp7 = tmp7 * FIX_1_501321110
122
+ z1 = z1 * -FIX_0_899976223
123
+ z2 = z2 * -FIX_2_562915447
124
+ z3 = z3 * -FIX_1_961570560 + z5
125
+ z4 = z4 * -FIX_0_390180644 + z5
126
+
127
+ data[col+56] = (tmp4 + z1 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
128
+ data[col+40] = (tmp5 + z2 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
129
+ data[col+24] = (tmp6 + z2 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
130
+ data[col+8] = (tmp7 + z1 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
47
131
  end
48
132
 
49
- out
133
+ data
50
134
  end
51
135
 
52
- # Separable inverse 2D DCT: same structure as forward but using A^T.
53
- # f = A^T * F * A
54
- def self.inverse!(block, temp, out)
55
- mt = MATRIX_T_FLAT
56
-
57
- # Row pass: temp[v*8+x] = sum_u A^T[x][u] * block[v*8+u]
58
- 8.times do |v|
59
- v8 = v << 3
60
- b0 = block[v8]; b1 = block[v8|1]; b2 = block[v8|2]; b3 = block[v8|3]
61
- b4 = block[v8|4]; b5 = block[v8|5]; b6 = block[v8|6]; b7 = block[v8|7]
62
- 8.times do |x|
63
- x8 = x << 3
64
- temp[v8|x] = mt[x8]*b0 + mt[x8|1]*b1 + mt[x8|2]*b2 + mt[x8|3]*b3 +
65
- mt[x8|4]*b4 + mt[x8|5]*b5 + mt[x8|6]*b6 + mt[x8|7]*b7
66
- end
136
+ # Inverse 2D DCT (in-place). Input: dequantized DCT coefficients (integers).
137
+ # Output: spatial-domain values (integers) that still need +128 level shift.
138
+ def self.inverse!(data, _temp = nil, _out = nil)
139
+ # Pass 1: process columns
140
+ 8.times do |col|
141
+ d0 = data[col]; d2 = data[col+16]; d4 = data[col+32]; d6 = data[col+48]
142
+ d1 = data[col+8]; d3 = data[col+24]; d5 = data[col+40]; d7 = data[col+56]
143
+
144
+ # Even part
145
+ z1 = (d2 + d6) * FIX_0_541196100
146
+ tmp2 = z1 - d6 * FIX_1_847759065
147
+ tmp3 = z1 + d2 * FIX_0_765366865
148
+
149
+ tmp0 = (d0 + d4) << CB
150
+ tmp1 = (d0 - d4) << CB
151
+
152
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
153
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
154
+
155
+ # Odd part
156
+ tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
157
+ z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
158
+ z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
159
+ z5 = (z3 + z4) * FIX_1_175875602
160
+
161
+ tmp0 = tmp0 * FIX_0_298631336
162
+ tmp1 = tmp1 * FIX_2_053119869
163
+ tmp2 = tmp2 * FIX_3_072711026
164
+ tmp3 = tmp3 * FIX_1_501321110
165
+ z1 = z1 * -FIX_0_899976223
166
+ z2 = z2 * -FIX_2_562915447
167
+ z3 = z3 * -FIX_1_961570560 + z5
168
+ z4 = z4 * -FIX_0_390180644 + z5
169
+
170
+ tmp0 += z1 + z3; tmp1 += z2 + z4
171
+ tmp2 += z2 + z3; tmp3 += z1 + z4
172
+
173
+ data[col] = (tmp10 + tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
174
+ data[col+56] = (tmp10 - tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
175
+ data[col+8] = (tmp11 + tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
176
+ data[col+48] = (tmp11 - tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
177
+ data[col+16] = (tmp12 + tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
178
+ data[col+40] = (tmp12 - tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
179
+ data[col+24] = (tmp13 + tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
180
+ data[col+32] = (tmp13 - tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
67
181
  end
68
182
 
69
- # Column pass: out[y*8+x] = sum_v A^T[y][v] * temp[v*8+x]
70
- 8.times do |x|
71
- t0 = temp[x]; t1 = temp[8|x]; t2 = temp[16|x]; t3 = temp[24|x]
72
- t4 = temp[32|x]; t5 = temp[40|x]; t6 = temp[48|x]; t7 = temp[56|x]
73
- 8.times do |y|
74
- y8 = y << 3
75
- out[y8|x] = mt[y8]*t0 + mt[y8|1]*t1 + mt[y8|2]*t2 + mt[y8|3]*t3 +
76
- mt[y8|4]*t4 + mt[y8|5]*t5 + mt[y8|6]*t6 + mt[y8|7]*t7
77
- end
183
+ # Pass 2: process rows
184
+ 8.times do |row|
185
+ i = row << 3
186
+ d0 = data[i]; d2 = data[i+2]; d4 = data[i+4]; d6 = data[i+6]
187
+ d1 = data[i+1]; d3 = data[i+3]; d5 = data[i+5]; d7 = data[i+7]
188
+
189
+ # Even part
190
+ z1 = (d2 + d6) * FIX_0_541196100
191
+ tmp2 = z1 - d6 * FIX_1_847759065
192
+ tmp3 = z1 + d2 * FIX_0_765366865
193
+
194
+ tmp0 = (d0 + d4) << CB
195
+ tmp1 = (d0 - d4) << CB
196
+
197
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
198
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
199
+
200
+ # Odd part
201
+ tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
202
+ z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
203
+ z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
204
+ z5 = (z3 + z4) * FIX_1_175875602
205
+
206
+ tmp0 = tmp0 * FIX_0_298631336
207
+ tmp1 = tmp1 * FIX_2_053119869
208
+ tmp2 = tmp2 * FIX_3_072711026
209
+ tmp3 = tmp3 * FIX_1_501321110
210
+ z1 = z1 * -FIX_0_899976223
211
+ z2 = z2 * -FIX_2_562915447
212
+ z3 = z3 * -FIX_1_961570560 + z5
213
+ z4 = z4 * -FIX_0_390180644 + z5
214
+
215
+ tmp0 += z1 + z3; tmp1 += z2 + z4
216
+ tmp2 += z2 + z3; tmp3 += z1 + z4
217
+
218
+ data[i] = (tmp10 + tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
219
+ data[i+7] = (tmp10 - tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
220
+ data[i+1] = (tmp11 + tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
221
+ data[i+6] = (tmp11 - tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
222
+ data[i+2] = (tmp12 + tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
223
+ data[i+5] = (tmp12 - tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
224
+ data[i+3] = (tmp13 + tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
225
+ data[i+4] = (tmp13 - tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
78
226
  end
79
227
 
80
- out
228
+ data
81
229
  end
82
230
  end
83
231
  end
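The `FIX_*` constants in the file above are real coefficients scaled by 2^13, and the repeated `(x + (1 << (n - 1))) >> n` pattern is a right shift that rounds instead of truncating. A small sketch of both ideas follows; `fix` and `descale` are illustrative names that do not appear in the diff.

```ruby
# Scales a real coefficient to fixed point: round(x * 2^bits).
def fix(x, bits = 13)
  (x * (1 << bits)).round
end

# Right shift with rounding: add half of 2^bits before shifting.
def descale(value, bits)
  (value + (1 << (bits - 1))) >> bits
end

p fix(0.541196100) # compare with FIX_0_541196100
p fix(1.175875602) # compare with FIX_1_175875602

# Multiply 100 by ~0.5412 in fixed point, then drop the 13 scale bits.
p descale(100 * fix(0.541196100), 13)
```

Ruby's `>>` on negative Integers is an arithmetic shift, so the same `descale` expression works for negative intermediate values too.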
@@ -78,10 +78,8 @@ module PureJPEG
78
78
 
79
79
  # Reusable buffers
80
80
  zigzag = Array.new(64, 0)
81
- raster = Array.new(64, 0.0)
82
- dequant = Array.new(64, 0.0)
83
- temp = Array.new(64, 0.0)
84
- spatial = Array.new(64, 0.0)
81
+ raster = Array.new(64, 0)
82
+ dequant = Array.new(64, 0)
85
83
 
86
84
  mcus_y.times do |mcu_row|
87
85
  mcus_x.times do |mcu_col|
@@ -104,12 +102,12 @@ module PureJPEG
104
102
  # Inverse pipeline: unzigzag -> dequantize -> IDCT -> level shift
105
103
  Zigzag.unreorder!(zigzag, raster)
106
104
  Quantization.dequantize!(raster, qt, dequant)
107
- DCT.inverse!(dequant, temp, spatial)
105
+ DCT.inverse!(dequant)
108
106
 
109
107
  # Write block into channel buffer
110
108
  bx = (mcu_col * comp.h_sampling + bh) * 8
111
109
  by = (mcu_row * comp.v_sampling + bv) * 8
112
- write_block(spatial, ch[:data], ch[:width], bx, by)
110
+ write_block(dequant, ch[:data], ch[:width], bx, by)
113
111
  end
114
112
  end
115
113
  end
@@ -204,10 +202,8 @@ module PureJPEG
204
202
  end
205
203
 
206
204
  zigzag = Array.new(64, 0)
207
- raster = Array.new(64, 0.0)
208
- dequant = Array.new(64, 0.0)
209
- temp = Array.new(64, 0.0)
210
- spatial = Array.new(64, 0.0)
205
+ raster = Array.new(64, 0)
206
+ dequant = Array.new(64, 0)
211
207
 
212
208
  jfif.components.each do |c|
213
209
  qt = fetch_quant_table!(jfif, c)
@@ -222,8 +218,8 @@ module PureJPEG
222
218
 
223
219
  Zigzag.unreorder!(zigzag, raster)
224
220
  Quantization.dequantize!(raster, qt, dequant)
225
- DCT.inverse!(dequant, temp, spatial)
226
- write_block(spatial, ch[:data], ch[:width], block_x * 8, block_y * 8)
221
+ DCT.inverse!(dequant)
222
+ write_block(dequant, ch[:data], ch[:width], block_x * 8, block_y * 8)
227
223
  end
228
224
  end
229
225
  end
@@ -460,12 +456,16 @@ module PureJPEG
460
456
  # Write an 8x8 spatial block (level-shifted by +128) into a channel buffer.
461
457
  def write_block(spatial, channel, ch_width, bx, by)
462
458
  8.times do |row|
463
- dst_row = (by + row) * ch_width + bx
464
- row8 = row << 3
465
- 8.times do |col|
466
- val = (spatial[row8 | col] + 128.0).round
467
- channel[dst_row + col] = val < 0 ? 0 : (val > 255 ? 255 : val)
468
- end
459
+ dst = (by + row) * ch_width + bx
460
+ r8 = row << 3
461
+ v = spatial[r8] + 128; channel[dst] = v < 0 ? 0 : (v > 255 ? 255 : v)
462
+ v = spatial[r8 | 1] + 128; channel[dst + 1] = v < 0 ? 0 : (v > 255 ? 255 : v)
463
+ v = spatial[r8 | 2] + 128; channel[dst + 2] = v < 0 ? 0 : (v > 255 ? 255 : v)
464
+ v = spatial[r8 | 3] + 128; channel[dst + 3] = v < 0 ? 0 : (v > 255 ? 255 : v)
465
+ v = spatial[r8 | 4] + 128; channel[dst + 4] = v < 0 ? 0 : (v > 255 ? 255 : v)
466
+ v = spatial[r8 | 5] + 128; channel[dst + 5] = v < 0 ? 0 : (v > 255 ? 255 : v)
467
+ v = spatial[r8 | 6] + 128; channel[dst + 6] = v < 0 ? 0 : (v > 255 ? 255 : v)
468
+ v = spatial[r8 | 7] + 128; channel[dst + 7] = v < 0 ? 0 : (v > 255 ? 255 : v)
469
469
  end
470
470
  end
471
471
 
@@ -493,18 +493,27 @@ module PureJPEG
493
493
 
494
494
  def assemble_grayscale(width, height, channels, comp)
495
495
  ch = channels[comp.id]
496
+ ch_data = ch[:data]
497
+ ch_width = ch[:width]
496
498
  pixels = Array.new(width * height)
497
499
  height.times do |y|
498
- src_row = y * ch[:width]
500
+ src_row = y * ch_width
499
501
  dst_row = y * width
500
502
  width.times do |x|
501
- v = ch[:data][src_row + x]
503
+ v = ch_data[src_row + x]
502
504
  pixels[dst_row + x] = (v << 16) | (v << 8) | v
503
505
  end
504
506
  end
505
507
  Image.new(width, height, pixels, icc_profile: @icc_profile)
506
508
  end
507
509
 
510
+ # Fixed-point coefficients (scaled by 2^16) for YCbCr→RGB.
511
+ FP_R_CR = 91881 # 1.402 * 65536
512
+ FP_G_CB = -22554 # -0.344136 * 65536
513
+ FP_G_CR = -46802 # -0.714136 * 65536
514
+ FP_B_CB = 116130 # 1.772 * 65536
515
+ FP_HALF = 32768 # rounding bias
516
+
508
517
  def assemble_color(width, height, channels, components, max_h, max_v)
509
518
  # Upsample chroma channels if needed and convert YCbCr to RGB
510
519
  y_comp, cb_comp, cr_comp = resolve_color_components(components)
@@ -513,29 +522,39 @@ module PureJPEG
513
522
  cb_ch = channels[cb_comp.id]
514
523
  cr_ch = channels[cr_comp.id]
515
524
 
525
+ y_data = y_ch[:data]
526
+ cb_data = cb_ch[:data]
527
+ cr_data = cr_ch[:data]
528
+ y_stride = y_ch[:width]
529
+ cb_stride = cb_ch[:width]
530
+ cr_stride = cr_ch[:width]
531
+ cb_h = cb_comp.h_sampling
532
+ cb_v = cb_comp.v_sampling
533
+ cr_h = cr_comp.h_sampling
534
+ cr_v = cr_comp.v_sampling
535
+
516
536
  pixels = Array.new(width * height)
517
537
 
518
538
  height.times do |py|
519
539
  dst_row = py * width
520
- y_row = py * y_ch[:width]
540
+ y_row = py * y_stride
521
541
 
522
542
  # Chroma coordinates (nearest-neighbor upsampling)
523
- cb_y = (py * cb_comp.v_sampling) / max_v
524
- cr_y = (py * cr_comp.v_sampling) / max_v
525
- cb_row = cb_y * cb_ch[:width]
526
- cr_row = cr_y * cr_ch[:width]
543
+ cb_row = ((py * cb_v) / max_v) * cb_stride
544
+ cr_row = ((py * cr_v) / max_v) * cr_stride
527
545
 
528
546
  width.times do |px|
529
- lum = y_ch[:data][y_row + px]
547
+ lum = y_data[y_row + px]
530
548
 
531
- cb_x = (px * cb_comp.h_sampling) / max_h
532
- cr_x = (px * cr_comp.h_sampling) / max_h
533
- cb = cb_ch[:data][cb_row + cb_x] - 128.0
534
- cr = cr_ch[:data][cr_row + cr_x] - 128.0
549
+ cb_x = (px * cb_h) / max_h
550
+ cr_x = (px * cr_h) / max_h
551
+ cb_val = cb_data[cb_row + cb_x] - 128
552
+ cr_val = cr_data[cr_row + cr_x] - 128
535
553
 
536
- r = (lum + 1.402 * cr).round
537
- g = (lum - 0.344136 * cb - 0.714136 * cr).round
538
- b = (lum + 1.772 * cb).round
554
+ # Fixed-point YCbCr→RGB (all integer arithmetic)
555
+ r = lum + ((FP_R_CR * cr_val + FP_HALF) >> 16)
556
+ g = lum + ((FP_G_CB * cb_val + FP_G_CR * cr_val + FP_HALF) >> 16)
557
+ b = lum + ((FP_B_CB * cb_val + FP_HALF) >> 16)
539
558
 
540
559
  r = r < 0 ? 0 : (r > 255 ? 255 : r)
541
560
  g = g < 0 ? 0 : (g > 255 ? 255 : g)
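To see the fixed-point conversion from this hunk in isolation, the sketch below wraps the same arithmetic in a standalone method. The coefficients are the `FP_*` values from the diff written as literals; `ycbcr_to_rgb` and the top-level `clamp255` are illustrative wrappers, not methods of the decoder.

```ruby
def clamp255(v)
  v < 0 ? 0 : (v > 255 ? 255 : v)
end

# Converts one YCbCr sample (each 0..255) to RGB using 16-bit fixed point.
def ycbcr_to_rgb(y, cb, cr)
  cb -= 128
  cr -= 128
  r = y + ((91_881 * cr + 32_768) >> 16)                # 1.402 * cr
  g = y + ((-22_554 * cb - 46_802 * cr + 32_768) >> 16) # -0.344 * cb - 0.714 * cr
  b = y + ((116_130 * cb + 32_768) >> 16)               # 1.772 * cb
  [clamp255(r), clamp255(g), clamp255(b)]
end

p ycbcr_to_rgb(128, 128, 128) # neutral chroma leaves luma unchanged
p ycbcr_to_rgb(76, 85, 255)   # a strongly red sample
```

Adding 32768 (half of 2^16) before the shift rounds the product, so results should stay within one level of the earlier Float-based `round` version.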
@@ -42,6 +42,9 @@ module PureJPEG
42
42
  luminance_table: nil, chrominance_table: nil,
43
43
  quantization_modifier: nil, scramble_quantization: false,
44
44
  optimize_huffman: false)
45
+ validate_quality!(quality, "quality")
46
+ validate_quality!(chroma_quality, "chroma_quality") if chroma_quality
47
+
45
48
  @source = source
46
49
  @quality = quality
47
50
  @grayscale = grayscale
@@ -92,6 +95,12 @@ module PureJPEG
92
95
  modified
93
96
  end
94
97
 
98
+ def validate_quality!(value, name)
99
+ unless value.is_a?(Integer) && value.between?(1, 100)
100
+ raise ArgumentError, "#{name} must be an integer between 1 and 100"
101
+ end
102
+ end
103
+
95
104
  def validate_qtable!(table, name)
96
105
  unless table.respond_to?(:length) && table.respond_to?(:all?)
97
106
  raise ArgumentError, "#{name} must be a 64-element array of integers between 1 and 255"
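The `validate_quality!` check added in this hunk can be exercised on its own. The method body below is copied from the diff and defined at top level only so the snippet runs standalone; the list of candidate values is made up for illustration.

```ruby
def validate_quality!(value, name)
  unless value.is_a?(Integer) && value.between?(1, 100)
    raise ArgumentError, "#{name} must be an integer between 1 and 100"
  end
end

[85, 0, 101, 85.5, "85", nil].each do |candidate|
  validate_quality!(candidate, "quality")
  puts "#{candidate.inspect}: accepted"
rescue ArgumentError => e
  puts "#{candidate.inspect}: #{e.message}"
end
```

Because the type check runs first, Floats such as `85.5` and numeric Strings are rejected rather than coerced, and `nil` never reaches `between?`.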
@@ -205,17 +214,14 @@ module PureJPEG
205
214
  padded_w = (width + 7) & ~7
206
215
  padded_h = (height + 7) & ~7
207
216
 
208
- block = Array.new(64, 0.0)
209
- temp = Array.new(64, 0.0)
210
- dct = Array.new(64, 0.0)
217
+ block = Array.new(64, 0)
211
218
  qbuf = Array.new(64, 0)
212
219
  zbuf = Array.new(64, 0)
213
220
 
214
221
  (0...padded_h).step(8) do |by|
215
222
  (0...padded_w).step(8) do |bx|
216
223
  extract_block_into(y_data, width, height, bx, by, block)
217
- transform_block(block, temp, dct, qbuf, zbuf, qtable)
218
- yield zbuf
224
+ yield transform_block(block, qbuf, zbuf, qtable)
219
225
  end
220
226
  end
221
227
  end
@@ -278,37 +284,29 @@ module PureJPEG
278
284
  mcu_w = (width + 15) & ~15
279
285
  mcu_h = (height + 15) & ~15
280
286
 
281
- block = Array.new(64, 0.0)
282
- temp = Array.new(64, 0.0)
283
- dct = Array.new(64, 0.0)
287
+ block = Array.new(64, 0)
284
288
  qbuf = Array.new(64, 0)
285
289
  zbuf = Array.new(64, 0)
286
290
 
287
291
  (0...mcu_h).step(16) do |my|
288
292
  (0...mcu_w).step(16) do |mx|
289
293
  extract_block_into(y_data, width, height, mx, my, block)
290
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
291
- yield :y, zbuf
294
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
292
295
 
293
296
  extract_block_into(y_data, width, height, mx + 8, my, block)
294
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
295
- yield :y, zbuf
297
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
296
298
 
297
299
  extract_block_into(y_data, width, height, mx, my + 8, block)
298
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
299
- yield :y, zbuf
300
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
300
301
 
301
302
  extract_block_into(y_data, width, height, mx + 8, my + 8, block)
302
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
303
- yield :y, zbuf
303
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
304
304
 
305
305
  extract_block_into(cb_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
306
- transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
307
- yield :cb, zbuf
306
+ yield :cb, transform_block(block, qbuf, zbuf, chr_qt)
308
307
 
309
308
  extract_block_into(cr_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
310
- transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
311
- yield :cr, zbuf
309
+ yield :cr, transform_block(block, qbuf, zbuf, chr_qt)
312
310
  end
313
311
  end
314
312
  end
@@ -333,9 +331,9 @@ module PureJPEG
333
331
 
334
332
  # --- Shared block pipeline (all buffers pre-allocated) ---
335
333
 
336
- def transform_block(block, temp, dct, qbuf, zbuf, qtable)
337
- DCT.forward!(block, temp, dct)
338
- Quantization.quantize!(dct, qtable, qbuf)
334
+ def transform_block(block, qbuf, zbuf, qtable)
335
+ DCT.forward!(block)
336
+ Quantization.quantize!(block, qtable, qbuf)
339
337
  Zigzag.reorder!(qbuf, zbuf)
340
338
  zbuf
341
339
  end
@@ -352,26 +350,42 @@ module PureJPEG
352
350
  end
353
351
  end
354
352
 
353
+ # Fixed-point coefficients (scaled by 2^16 = 65536) for RGB→YCbCr.
354
+ # Y = 0.299*R + 0.587*G + 0.114*B
355
+ # Cb = -0.168736*R - 0.331264*G + 0.5*B + 128
356
+ # Cr = 0.5*R - 0.418688*G - 0.081312*B + 128
357
+ FP_Y_R = 19595; FP_Y_G = 38470; FP_Y_B = 7471
358
+ FP_CB_R = -11058; FP_CB_G = -21710; FP_CB_B = 32768
359
+ FP_CR_R = 32768; FP_CR_G = -27440; FP_CR_B = -5328
360
+ FP_HALF = 32768 # rounding bias
361
+ FP_128 = 8388608 # 128 << 16
362
+
363
+ def clamp255(v)
364
+ v < 0 ? 0 : (v > 255 ? 255 : v)
365
+ end
366
+
355
367
  def extract_luminance(width, height)
356
368
  luminance = Array.new(width * height)
357
369
  if source.respond_to?(:packed_pixels)
358
370
  packed = source.packed_pixels
359
371
  r_shift, g_shift, b_shift = packed_shifts
372
+ n = width * height
360
373
  i = 0
361
- (width * height).times do
374
+ n.times do
362
375
  color = packed[i]
363
376
  r = (color >> r_shift) & 0xFF
364
377
  g = (color >> g_shift) & 0xFF
365
378
  b = (color >> b_shift) & 0xFF
366
- luminance[i] = (0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
379
+ luminance[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
367
380
  i += 1
368
381
  end
369
382
  else
370
- height.times do |y|
371
- row = y * width
372
- width.times do |x|
373
- pixel = source[x, y]
374
- luminance[row + x] = (0.299 * pixel.r + 0.587 * pixel.g + 0.114 * pixel.b).round.clamp(0, 255)
383
+ height.times do |py|
384
+ row = py * width
385
+ width.times do |px|
386
+ pixel = source[px, py]
387
+ r = pixel.r; g = pixel.g; b = pixel.b
388
+ luminance[row + px] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
375
389
  end
376
390
  end
377
391
  end
@@ -393,9 +407,9 @@ module PureJPEG
393
407
  r = (color >> r_shift) & 0xFF
394
408
  g = (color >> g_shift) & 0xFF
395
409
  b = (color >> b_shift) & 0xFF
396
- y_data[i] = ( 0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
397
- cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255)
398
- cr_data[i] = ( 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
410
+ y_data[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
411
+ cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
412
+ cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
399
413
  i += 1
400
414
  end
401
415
  else
@@ -405,9 +419,9 @@ module PureJPEG
405
419
  pixel = source[px, py]
406
420
  r = pixel.r; g = pixel.g; b = pixel.b
407
421
  i = row + px
408
- y_data[i] = ( 0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
409
- cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255)
410
- cr_data[i] = ( 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
422
+ y_data[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
423
+ cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
424
+ cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
411
425
  end
412
426
  end
413
427
  end
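Note on the fixed-point conversion in the hunks above: the constants are the RGB→YCbCr coefficients multiplied by 2^16, and `>> 16` after adding `FP_HALF` rounds to the nearest integer. The sketch below is not part of the gem. It copies the constants from the diff into a hypothetical `FixedPointDemo` module (the module and helper names are assumptions for illustration) so the integer path can be compared with the float formulas it replaces.

```ruby
# Illustrative sketch only; FixedPointDemo and its helpers are hypothetical names.
module FixedPointDemo
  # Coefficients scaled by 2**16, copied from the diff.
  FP_Y_R  = 19595;  FP_Y_G  = 38470;  FP_Y_B  = 7471
  FP_CB_R = -11058; FP_CB_G = -21710; FP_CB_B = 32768
  FP_CR_R = 32768;  FP_CR_G = -27440; FP_CR_B = -5328
  FP_HALF = 32768      # 0.5 in 16.16 fixed point (rounding bias)
  FP_128  = 128 << 16  # the +128 chroma offset in 16.16 fixed point

  module_function

  def clamp255(v)
    v < 0 ? 0 : (v > 255 ? 255 : v)
  end

  # Integer-only conversion, mirroring the added lines.
  def ycbcr_fixed(r, g, b)
    [
      clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16),
      clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16),
      clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
    ]
  end

  # Float conversion, mirroring the removed lines.
  def ycbcr_float(r, g, b)
    [
      (0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255),
      (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255),
      (0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
    ]
  end
end

p FixedPointDemo.ycbcr_fixed(255, 255, 255)
p FixedPointDemo.ycbcr_fixed(255, 0, 0)
```

Because each scaled coefficient is rounded to an integer, the two paths can disagree by one level on some inputs, so byte-identical output to 0.3.1 should not be assumed.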
@@ -442,13 +456,16 @@ module PureJPEG
442
456
  8.times do |row|
443
457
  sy = by + row
444
458
  sy = max_y if sy > max_y
445
- src_row = sy * width
446
- row8 = row << 3
447
- 8.times do |col|
448
- sx = bx + col
449
- sx = max_x if sx > max_x
450
- block[row8 | col] = channel[src_row + sx] - 128.0
451
- end
459
+ src = sy * width
460
+ r8 = row << 3
461
+ x = bx; block[r8] = channel[src + (x > max_x ? max_x : x)] - 128
462
+ x = bx + 1; block[r8 | 1] = channel[src + (x > max_x ? max_x : x)] - 128
463
+ x = bx + 2; block[r8 | 2] = channel[src + (x > max_x ? max_x : x)] - 128
464
+ x = bx + 3; block[r8 | 3] = channel[src + (x > max_x ? max_x : x)] - 128
465
+ x = bx + 4; block[r8 | 4] = channel[src + (x > max_x ? max_x : x)] - 128
466
+ x = bx + 5; block[r8 | 5] = channel[src + (x > max_x ? max_x : x)] - 128
467
+ x = bx + 6; block[r8 | 6] = channel[src + (x > max_x ? max_x : x)] - 128
468
+ x = bx + 7; block[r8 | 7] = channel[src + (x > max_x ? max_x : x)] - 128
452
469
  end
453
470
  block
454
471
  end
@@ -3,17 +3,22 @@
3
3
  module PureJPEG
4
4
  module Huffman
5
5
  class Encoder
6
- def self.category_and_bits(value)
7
- return [0, 0] if value == 0
8
- abs_val = value.abs
6
+ # Return the Huffman category (bit length) for a value.
7
+ # Avoids Array allocation compared to the combined category_and_bits.
8
+ def self.category(value)
9
+ return 0 if value == 0
10
+ v = value.abs
9
11
  cat = 0
10
- v = abs_val
11
12
  while v > 0
12
13
  cat += 1
13
14
  v >>= 1
14
15
  end
15
- bits = value > 0 ? value : value + (1 << cat) - 1
16
- [cat, bits]
16
+ cat
17
+ end
18
+
19
+ # Return the extra bits to encode for a value with the given category.
20
+ def self.value_bits(value, cat)
21
+ value > 0 ? value : value + (1 << cat) - 1
17
22
  end
18
23
 
19
24
  def self.each_ac_item(zigzag)
@@ -39,7 +44,7 @@ module PureJPEG
39
44
  end
40
45
 
41
46
  value = zigzag[i]
42
- cat, = category_and_bits(value)
47
+ cat = category(value)
43
48
  yield (run << 4) | cat, value
44
49
  i += 1
45
50
  end
@@ -73,10 +78,10 @@ module PureJPEG
73
78
  private
74
79
 
75
80
  def encode_dc(diff, writer)
76
- cat, bits = self.class.category_and_bits(diff)
81
+ cat = self.class.category(diff)
77
82
  code, length = @dc_table[cat]
78
83
  writer.write_bits(code, length)
79
- writer.write_bits(bits, cat) if cat > 0
84
+ writer.write_bits(self.class.value_bits(diff, cat), cat) if cat > 0
80
85
  end
81
86
 
82
87
  def encode_ac(zigzag, writer)
@@ -85,8 +90,8 @@ module PureJPEG
85
90
  writer.write_bits(code, length)
86
91
  next if symbol == 0x00 || symbol == 0xF0
87
92
 
88
- cat, bits = self.class.category_and_bits(value)
89
- writer.write_bits(bits, cat)
93
+ cat = self.class.category(value)
94
+ writer.write_bits(self.class.value_bits(value, cat), cat)
90
95
  end
91
96
  end
92
97
  end
@@ -104,7 +109,7 @@ module PureJPEG
104
109
  diff = zigzag[0] - @prev_dc[state_key]
105
110
  @prev_dc[state_key] = zigzag[0]
106
111
 
107
- cat, = Encoder.category_and_bits(diff)
112
+ cat = Encoder.category(diff)
108
113
  @dc_frequencies[cat] += 1
109
114
 
110
115
  Encoder.each_ac_symbol(zigzag) do |symbol|
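Note on the `category` / `value_bits` split above: the category is the number of bits needed for the magnitude, and negative values are stored as `value + (1 << cat) - 1`, so their extra bits always start with a 0 bit. The sketch below restates both helpers in a hypothetical `HuffmanBitsDemo` module and adds an inverse (`decode_bits`, an assumption for illustration, not a method shown in this diff) to check the round trip.

```ruby
# Illustrative sketch only; HuffmanBitsDemo and decode_bits are hypothetical.
module HuffmanBitsDemo
  module_function

  # Bit length of |value|; 0 for value == 0 (same loop as the diff).
  def category(value)
    return 0 if value == 0
    v = value.abs
    cat = 0
    while v > 0
      cat += 1
      v >>= 1
    end
    cat
  end

  # Extra bits to emit after the Huffman code for the category.
  def value_bits(value, cat)
    value > 0 ? value : value + (1 << cat) - 1
  end

  # Inverse mapping: a leading 1 bit means positive, a leading 0 bit means negative.
  def decode_bits(bits, cat)
    return 0 if cat == 0
    bits >= (1 << (cat - 1)) ? bits : bits - (1 << cat) + 1
  end
end

[5, -5, 1, -1, 0].each do |v|
  cat = HuffmanBitsDemo.category(v)
  bits = HuffmanBitsDemo.value_bits(v, cat)
  puts "value=#{v} cat=#{cat} bits=#{bits.to_s(2).rjust(cat, '0')}"
end
```

Returning the category alone lets call sites that only need it (the frequency counting in the optimizer hunk, for example) skip building a two-element array per coefficient.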
@@ -38,6 +38,7 @@ module PureJPEG
38
38
  # @param y [Integer] row (0-based)
39
39
  # @return [Source::Pixel] pixel with +.r+, +.g+, +.b+ in 0-255
40
40
  def [](x, y)
41
+ validate_coordinates!(x, y)
41
42
  color = @packed_pixels[y * @width + x]
42
43
  Source::Pixel.new((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF)
43
44
  end
@@ -49,6 +50,7 @@ module PureJPEG
49
50
  # @param pixel [Source::Pixel] replacement pixel
50
51
  # @return [Source::Pixel]
51
52
  def []=(x, y, pixel)
53
+ validate_coordinates!(x, y)
52
54
  @packed_pixels[y * @width + x] = (pixel.r << 16) | (pixel.g << 8) | pixel.b
53
55
  pixel
54
56
  end
@@ -85,5 +87,13 @@ module PureJPEG
85
87
  end
86
88
  end
87
89
  end
90
+
91
+ private
92
+
93
+ def validate_coordinates!(x, y)
94
+ unless x.is_a?(Integer) && y.is_a?(Integer) && x >= 0 && y >= 0 && x < @width && y < @height
95
+ raise IndexError, "Pixel coordinate out of bounds: #{x.inspect}, #{y.inspect}"
96
+ end
97
+ end
88
98
  end
89
99
  end
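Note on `validate_coordinates!` above: with a flat row-major array, an unchecked `y * @width + x` does not fail on bad input. A too-large `x` silently lands in the next row, and a negative index wraps around to the end of the Ruby array. The sketch below uses a hypothetical `TinyImage` class (not the gem's `Image`) whose cells hold their own index, to show what the check guards against.

```ruby
# Illustrative sketch only; TinyImage is a hypothetical stand-in for the gem's Image.
class TinyImage
  def initialize(width, height)
    @width = width
    @height = height
    # Each cell stores its own flat index so aliasing is easy to see.
    @packed_pixels = Array.new(width * height) { |i| i }
  end

  # Pre-0.3.3 style lookup: no bounds check.
  def unchecked(x, y)
    @packed_pixels[y * @width + x]
  end

  # Lookup guarded with the same predicate as the diff.
  def [](x, y)
    validate_coordinates!(x, y)
    @packed_pixels[y * @width + x]
  end

  private

  def validate_coordinates!(x, y)
    unless x.is_a?(Integer) && y.is_a?(Integer) && x >= 0 && y >= 0 && x < @width && y < @height
      raise IndexError, "Pixel coordinate out of bounds: #{x.inspect}, #{y.inspect}"
    end
  end
end

img = TinyImage.new(2, 2)
p img.unchecked(2, 0)   # x == width aliases to cell (0, 1)
p img.unchecked(-1, 0)  # negative index wraps to the last cell
begin
  img[2, 0]
rescue IndexError => e
  puts e.message
end
```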
@@ -101,10 +101,9 @@ module PureJPEG
101
101
  ICC_PROFILE_SIG = "ICC_PROFILE\0".b
102
102
 
103
103
  def parse_app2
104
- length = read_u16
105
- end_pos = @pos + length - 2
104
+ end_pos = read_segment_end
106
105
 
107
- if length >= 16 && @data[@pos, 12] == ICC_PROFILE_SIG
106
+ if (end_pos - @pos) >= 14 && @data[@pos, 12] == ICC_PROFILE_SIG
108
107
  @pos += 12
109
108
  seq_no = read_byte
110
109
  _total = read_byte
@@ -121,18 +120,18 @@ module PureJPEG
121
120
  end
122
121
 
123
122
  def skip_segment
124
- length = read_u16
125
- @pos += length - 2
123
+ @pos = read_segment_end
126
124
  end
127
125
 
128
126
  def parse_dqt
129
- length = read_u16
130
- end_pos = @pos + length - 2
127
+ end_pos = read_segment_end
131
128
 
132
129
  while @pos < end_pos
133
130
  info = read_byte
134
131
  precision = (info >> 4) & 0x0F # 0 = 8-bit, 1 = 16-bit
135
132
  table_id = info & 0x0F
133
+ bytes_per_value = precision == 0 ? 1 : 2
134
+ ensure_remaining_in_segment!(end_pos, 64 * bytes_per_value, "DQT")
136
135
 
137
136
  zigzag_table = Array.new(64)
138
137
  64.times do |i|
@@ -146,16 +145,17 @@ module PureJPEG
146
145
  end
147
146
 
148
147
  def parse_dht
149
- length = read_u16
150
- end_pos = @pos + length - 2
148
+ end_pos = read_segment_end
151
149
 
152
150
  while @pos < end_pos
151
+ ensure_remaining_in_segment!(end_pos, 17, "DHT")
153
152
  info = read_byte
154
153
  table_class = (info >> 4) & 0x0F # 0 = DC, 1 = AC
155
154
  table_id = info & 0x0F
156
155
 
157
156
  bits = Array.new(16) { read_byte }
158
157
  total = bits.sum
158
+ ensure_remaining_in_segment!(end_pos, total, "DHT")
159
159
  values = Array.new(total) { read_byte }
160
160
 
161
161
  @huffman_tables[[table_class, table_id]] = { bits: bits, values: values }
@@ -163,11 +163,13 @@ module PureJPEG
163
163
  end
164
164
 
165
165
  def parse_sof0
166
- read_u16 # length
166
+ end_pos = read_segment_end
167
+ ensure_remaining_in_segment!(end_pos, 6, "SOF")
167
168
  read_byte # precision (always 8 for baseline)
168
169
  @height = read_u16
169
170
  @width = read_u16
170
171
  num_components = read_byte
172
+ ensure_remaining_in_segment!(end_pos, num_components * 3, "SOF")
171
173
 
172
174
  @components = Array.new(num_components) do
173
175
  id = read_byte
@@ -177,11 +179,14 @@ module PureJPEG
177
179
  qt_id = read_byte
178
180
  Component.new(id, h, v, qt_id)
179
181
  end
182
+ @pos = end_pos
180
183
  end
181
184
 
182
185
  def parse_sos
183
- read_u16 # length
186
+ end_pos = read_segment_end
187
+ ensure_remaining_in_segment!(end_pos, 1, "SOS")
184
188
  num_components = read_byte
189
+ ensure_remaining_in_segment!(end_pos, num_components * 2 + 3, "SOS")
185
190
 
186
191
  components = Array.new(num_components) do
187
192
  id = read_byte
@@ -196,12 +201,33 @@ module PureJPEG
196
201
  ahl = read_byte # successive approximation
197
202
  ah = (ahl >> 4) & 0x0F
198
203
  al = ahl & 0x0F
204
+ @pos = end_pos
199
205
  Scan.new(components, ss, se, ah, al, nil)
200
206
  end
201
207
 
202
208
  def parse_dri
203
- read_u16 # length
209
+ end_pos = read_segment_end
210
+ ensure_remaining_in_segment!(end_pos, 2, "DRI")
204
211
  @restart_interval = read_u16
212
+ @pos = end_pos
213
+ end
214
+
215
+ def read_segment_end
216
+ length = read_u16
217
+ raise PureJPEG::DecodeError, "Invalid JPEG segment length: #{length}" if length < 2
218
+
219
+ end_pos = @pos + length - 2
220
+ if end_pos > @data.bytesize
221
+ raise PureJPEG::DecodeError, "JPEG segment length exceeds available data"
222
+ end
223
+
224
+ end_pos
225
+ end
226
+
227
+ def ensure_remaining_in_segment!(end_pos, byte_count, segment_name)
228
+ return if @pos + byte_count <= end_pos
229
+
230
+ raise PureJPEG::DecodeError, "Truncated #{segment_name} segment"
205
231
  end
206
232
 
207
233
  # Extract entropy-coded scan data (everything from current position to EOI marker).
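Note on `read_segment_end` and `ensure_remaining_in_segment!` above: a JPEG segment length counts its own two bytes, so any value below 2 is invalid, and with the old `@pos += length - 2` a length of 0 would move the cursor backwards. The sketch below reproduces the two checks in a hypothetical `SegmentCursor` class. `DecodeError` here is a local stand-in for `PureJPEG::DecodeError`, and the short-read check in `read_u16` is an assumption, since the gem's `read_u16` is not shown in this diff.

```ruby
# Illustrative sketch only; SegmentCursor and this DecodeError are hypothetical stand-ins.
class DecodeError < StandardError; end

class SegmentCursor
  attr_reader :pos

  def initialize(data)
    @data = data.b
    @pos = 0
  end

  # Big-endian 16-bit read (the short-read guard is an assumption).
  def read_u16
    bytes = @data.byteslice(@pos, 2)
    raise DecodeError, "Unexpected end of data" if bytes.nil? || bytes.bytesize < 2
    @pos += 2
    bytes.unpack1("n")
  end

  # Same logic as the diff: validate the length, return the segment's end offset.
  def read_segment_end
    length = read_u16
    raise DecodeError, "Invalid JPEG segment length: #{length}" if length < 2

    end_pos = @pos + length - 2
    raise DecodeError, "JPEG segment length exceeds available data" if end_pos > @data.bytesize

    end_pos
  end

  def ensure_remaining_in_segment!(end_pos, byte_count, segment_name)
    return if @pos + byte_count <= end_pos

    raise DecodeError, "Truncated #{segment_name} segment"
  end
end

cursor = SegmentCursor.new([0x00, 0x04, 0xAA, 0xBB].pack("C*"))
end_pos = cursor.read_segment_end
cursor.ensure_remaining_in_segment!(end_pos, 2, "DRI")  # two payload bytes: passes
begin
  SegmentCursor.new([0x00, 0x10, 0xAA].pack("C*")).read_segment_end
rescue DecodeError => e
  puts e.message
end
```

Setting `@pos = end_pos` at the end of each parser, as the diff does for SOF, SOS and DRI, also keeps the cursor aligned when a segment carries more bytes than the parser consumed.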
@@ -36,9 +36,20 @@ module PureJPEG
36
36
  }
37
37
  end
38
38
 
39
- # Quantize a 64-element DCT block in place into `out`.
39
+ # Quantize a 64-element DCT block into `out`.
40
+ # Uses integer rounding division (round-to-nearest) to match the
41
+ # behavior of Float division + round from the previous float DCT.
40
42
  def self.quantize!(block, table, out)
41
- 64.times { |i| out[i] = (block[i] / table[i]).round }
43
+ i = 0
44
+ while i < 64
45
+ v = block[i]; t = table[i]
46
+ out[i] = if v >= 0
47
+ (v + (t >> 1)) / t
48
+ else
49
+ -((-v + (t >> 1)) / t)
50
+ end
51
+ i += 1
52
+ end
42
53
  out
43
54
  end
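Note on the integer rounding in `quantize!` above: Ruby's integer `/` floors toward negative infinity, so applying `(v + (t >> 1)) / t` to a negative `v` would round exact halves toward zero. Negating, dividing, and negating back rounds halves away from zero, which is what `Float#round` does by default. The sketch below isolates that step in a hypothetical `QuantizeDemo.divide_rounded` helper (the name is an assumption for illustration).

```ruby
# Illustrative sketch only; QuantizeDemo.divide_rounded is a hypothetical helper.
module QuantizeDemo
  module_function

  # Integer division of v by t (t > 0), rounding halves away from zero.
  def divide_rounded(v, t)
    if v >= 0
      (v + (t >> 1)) / t
    else
      -((-v + (t >> 1)) / t)
    end
  end
end

p QuantizeDemo.divide_rounded(6, 4)    # 1.5 rounds up
p QuantizeDemo.divide_rounded(-6, 4)   # -1.5 rounds away from zero
p (-6 + (4 >> 1)) / 4                  # naive form floors instead
p (-6 / 4.0).round                     # float reference
```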
44
55
 
@@ -18,10 +18,16 @@ module PureJPEG
18
18
  attr_reader :height
19
19
 
20
20
  # @param image [ChunkyPNG::Image] the source PNG image
21
- def initialize(image)
21
+ # @param background [Array<Integer>, nil] optional [r, g, b] background
22
+ # color to composite transparent pixels against before encoding
23
+ def initialize(image, background: nil)
22
24
  @width = image.width
23
25
  @height = image.height
24
- @packed_pixels = image.pixels
26
+ @packed_pixels = if background.nil?
27
+ image.pixels
28
+ else
29
+ composite_pixels(image.pixels, background)
30
+ end
25
31
  end
26
32
 
27
33
  # @return [Array<Integer>] flat row-major array of packed RGBA integers
@@ -40,6 +46,37 @@ module PureJPEG
40
46
  (color >> 8) & 0xFF
41
47
  )
42
48
  end
49
+
50
+ private
51
+
52
+ def composite_pixels(pixels, background)
53
+ bg_r, bg_g, bg_b = validate_background!(background)
54
+
55
+ pixels.map do |color|
56
+ alpha = color & 0xFF
57
+ next color if alpha == 255
58
+
59
+ src_r = (color >> 24) & 0xFF
60
+ src_g = (color >> 16) & 0xFF
61
+ src_b = (color >> 8) & 0xFF
62
+ inv_alpha = 255 - alpha
63
+
64
+ r = ((src_r * alpha) + (bg_r * inv_alpha) + 127) / 255
65
+ g = ((src_g * alpha) + (bg_g * inv_alpha) + 127) / 255
66
+ b = ((src_b * alpha) + (bg_b * inv_alpha) + 127) / 255
67
+
68
+ (r << 24) | (g << 16) | (b << 8) | 255
69
+ end
70
+ end
71
+
72
+ def validate_background!(background)
73
+ unless background.respond_to?(:length) && background.length == 3 &&
74
+ background.all? { |v| v.is_a?(Integer) && v.between?(0, 255) }
75
+ raise ArgumentError, "background must be an [r, g, b] array of integers between 0 and 255"
76
+ end
77
+
78
+ background
79
+ end
43
80
  end
44
81
  end
45
82
  end
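Note on `composite_pixels` above: ChunkyPNG packs each pixel as a 32-bit `0xRRGGBBAA` integer, and the new code blends every non-opaque pixel against the background with `(src * alpha + bg * (255 - alpha) + 127) / 255` per channel, then forces alpha to 255. The sketch below applies the same arithmetic to a single pixel in a hypothetical `CompositeDemo` module (the names are assumptions, and background validation is omitted).

```ruby
# Illustrative sketch only; CompositeDemo is a hypothetical module.
module CompositeDemo
  module_function

  # Blend one packed 0xRRGGBBAA pixel over an opaque [r, g, b] background.
  def composite(color, background)
    bg_r, bg_g, bg_b = background
    alpha = color & 0xFF
    return color if alpha == 255  # opaque pixels pass through untouched

    src_r = (color >> 24) & 0xFF
    src_g = (color >> 16) & 0xFF
    src_b = (color >> 8) & 0xFF
    inv_alpha = 255 - alpha

    # +127 biases the integer division by 255 toward the nearest value.
    r = ((src_r * alpha) + (bg_r * inv_alpha) + 127) / 255
    g = ((src_g * alpha) + (bg_g * inv_alpha) + 127) / 255
    b = ((src_b * alpha) + (bg_b * inv_alpha) + 127) / 255

    (r << 24) | (g << 16) | (b << 8) | 255
  end
end

white = [255, 255, 255]
puts format("%08X", CompositeDemo.composite(0x00000000, white))  # fully transparent black
puts format("%08X", CompositeDemo.composite(0xFF000080, white))  # roughly half-transparent red
```

Going by the changelog entry, a caller would pass the color as `PureJPEG.from_chunky_png(image, background: [255, 255, 255])`. Omitting `background:` keeps the earlier behavior of using the stored pixel values as they are.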
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PureJPEG
4
- VERSION = "0.3.1"
4
+ VERSION = "0.3.3"
5
5
  end
data/lib/pure_jpeg.rb CHANGED
@@ -50,10 +50,12 @@ module PureJPEG
50
50
  # and passes it to {.encode}.
51
51
  #
52
52
  # @param image [ChunkyPNG::Image] the source image
53
+ # @param background [Array<Integer>, nil] optional [r, g, b] background
54
+ # color to composite transparent pixels against before encoding
53
55
  # @param opts [Hash] encoding options passed to {Encoder#initialize}
54
56
  # @return [Encoder]
55
- def self.from_chunky_png(image, **opts)
56
- source = Source::ChunkyPNGSource.new(image)
57
+ def self.from_chunky_png(image, background: nil, **opts)
58
+ source = Source::ChunkyPNGSource.new(image, background: background)
57
59
  Encoder.new(source, **opts)
58
60
  end
59
61
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pure_jpeg
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.3.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Peter Cooper
@@ -86,7 +86,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
86
86
  - !ruby/object:Gem::Version
87
87
  version: '0'
88
88
  requirements: []
89
- rubygems_version: 4.0.3
89
+ rubygems_version: 4.0.6
90
90
  specification_version: 4
91
91
  summary: Pure Ruby JPEG encoder and decoder
92
92
  test_files: []