pure_jpeg 0.3.1 → 0.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 7a0015f811a2250264bfa73727aa37af15fee1af10e4897243045f2f4f54ae07
4
- data.tar.gz: 5085ab8c4bd1d9941c94e116b39ea7ca38f658fdf0e48523cae65fc913f765d0
3
+ metadata.gz: 780f932b176fecdb5daab2546909fe2610325e54f7364cf482e3e2652ab614a5
4
+ data.tar.gz: 1fbc350f25d09b989ee6262e077bb36847c6637db325fb03c29f2083bb0ec973
5
5
  SHA512:
6
- metadata.gz: b507962b2ec9650e743b365b8ace5ddb2a9c1d04de126206ac6062c9da46001dc3e8a1f04df356187b3a2458f3383625d864fe12ee0b3c5e3df589755aa540aa
7
- data.tar.gz: 7bf019ea4702bbd7379ad3a1d295acaadd47b185815461b810c08c33299248235b326e9d0cdcfbebe23646f32014487c55212517a9e8a246d4fd5d40891eb62f
6
+ metadata.gz: a385db40804bf4a992d78253ba619e90b147aa5be7808b341398918f5f36c3593fbd1a979aaad9729ad874f2427fe15d31fc9a0413e5f7ea98ee44e43e125f2d
7
+ data.tar.gz: b9f5581f4c4f27f42460961b3b4231fd94b45be130fbdce74a28bc4888f2a2f3de08fd43b9e3efc782ccb6b4f260aa8484ff7794e7bde1362018dc3deb282b26
data/CHANGELOG.md CHANGED
@@ -1,5 +1,22 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.3.2
4
+
5
+ Performance:
6
+
7
+ - Replaced matrix-multiply float DCT with integer-scaled AAN (Arai-Agui-Nakajima) DCT from the IJG reference implementation -- all-integer, no Float allocations
8
+ - Fixed-point integer arithmetic for RGB/YCbCr color space conversion in both encoder and decoder
9
+ - Eliminated short-lived Array allocations in Huffman encoder (`category_and_bits` split into separate methods)
10
+ - `String#<<` with Integer instead of `byte.chr` to avoid String allocations in bit writer
11
+ - DCT inner loop unrolling to eliminate nested block invocations
12
+ - Unrolled `write_block` and `extract_block_into` inner loops
13
+ - Integer rounding division in quantization (no more Float division + round)
14
+ - Hoisted hash lookups and method calls out of per-pixel loops in decoder
15
+
16
+ Result: ~2.9x faster encode, ~4.6x faster decode on Ruby 4.0.2 with YJIT.
17
+
18
+ Credits: [Ufuk Kayserilioglu](https://github.com/paracycle)
19
+
3
20
  ## 0.3.1
4
21
 
5
22
  Fixes:
data/README.md CHANGED
@@ -194,19 +194,19 @@ Decoding:
194
194
 
195
195
  Not supported: arithmetic coding, 12-bit precision, EXIF/ICC profile preservation, adding a default background for transparent sources (see what happens above!). Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
196
196
 
197
- Possible future improvements: AAN/fixed-point DCT (but it's a LOT of work), ICC profile rendering/conversion.
197
+ Possible future improvements: ICC profile rendering/conversion.
198
198
 
199
199
  ## Performance
200
200
 
201
- On a 1024x1024 image (Ruby 4.0.1 on my M5):
201
+ On a 1024x1024 image (Ruby 4.0.2 with YJIT on an M5):
202
202
 
203
203
  | Operation | Time |
204
204
  |-----------|------|
205
- | Encode (color, q85) | ~1.2s |
206
- | Decode (baseline) | ~1.2s |
207
- | Decode (progressive) | ~1.3s |
205
+ | Encode (color, q85) | ~0.16s |
206
+ | Decode (baseline) | ~0.14s |
207
+ | Decode (progressive) | ~0.18s |
208
208
 
209
- Both the encoder and decoder use a separable DCT with a precomputed cosine matrix and reuse all per-block buffers to minimize GC pressure. Pixel data is stored as packed integers internally to avoid per-pixel object allocation.
209
+ The encoder and decoder use an integer-scaled AAN (Arai-Agui-Nakajima) DCT with fixed-point arithmetic throughout, with no Float operations in the hot path. Color space conversion uses fixed-point integer math, and pixel data is stored as packed integers to avoid per-pixel object allocation.
210
210
 
211
211
  ## Some useful `rake` tasks
212
212
 
@@ -233,6 +233,10 @@ rake profile # CPU profile with StackProf (requires the stackprof gem)
233
233
 
234
234
  **The final 10% still takes 90% of the time.** As mentioned above, the first run was quick, but getting things right has taken much longer. v0.1->0.2 has taken longer than 0.1 did! But we now have progressive JPEG support, even more optimizations, better tests, etc. etc.
235
235
 
236
+ ## Credits
237
+
238
+ - [Ufuk Kayserilioglu](https://github.com/paracycle) - Major performance optimizations including integer-scaled AAN DCT, fixed-point color space conversion, and YJIT-targeted improvements.
239
+
236
240
  ## License
237
241
 
238
242
  MIT
@@ -17,8 +17,8 @@ module PureJPEG
17
17
  while @bits_in_buffer >= 8
18
18
  @bits_in_buffer -= 8
19
19
  byte = (@buffer >> @bits_in_buffer) & 0xFF
20
- @data << byte.chr
21
- @data << "\x00".b if byte == 0xFF # byte stuffing
20
+ @data << byte
21
+ @data << 0x00 if byte == 0xFF # byte stuffing
22
22
  end
23
23
 
24
24
  @buffer &= (1 << @bits_in_buffer) - 1
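The bit-writer change above relies on `String#<<` accepting an Integer. A minimal sketch of that behavior, assuming the output buffer is a binary (ASCII-8BIT) string; the `stuff_bytes` helper is hypothetical and not part of the gem:

```ruby
# Hypothetical helper: append bytes to a binary buffer, inserting 0x00 after
# every 0xFF so entropy-coded data cannot be mistaken for a JPEG marker.
def stuff_bytes(bytes)
  data = String.new(encoding: Encoding::BINARY)
  bytes.each do |byte|
    data << byte                 # an Integer is appended as one byte here
    data << 0x00 if byte == 0xFF # byte stuffing
  end
  data
end

p stuff_bytes([0x12, 0xFF, 0x34]).bytes # => [18, 255, 0, 52]
```

The binary encoding matters: on a UTF-8 string, `<< 0xFF` would append the two-byte encoding of U+00FF rather than a single byte.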
data/lib/pure_jpeg/dct.rb CHANGED
@@ -1,10 +1,15 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PureJPEG
4
+ # Integer-scaled DCT based on the IJG (Independent JPEG Group) reference
5
+ # implementation (jfdctint.c / jidctint.c). Uses the Arai-Agui-Nakajima
6
+ # factorization with 13-bit fixed-point constants.
7
+ #
8
+ # All arithmetic is pure Integer (additions, shifts, multiplies) — no Float
9
+ # operations. This is ~3x faster than the matrix-multiply float DCT under
10
+ # YJIT and eliminates millions of Float object allocations during decode.
4
11
  module DCT
5
- # Precomputed 8x8 DCT matrix: A[k][n] = (C(k)/2) * cos((2n+1)*k*pi/16)
6
- # where C(0) = 1/sqrt(2), C(k) = 1 for k > 0.
7
- # This lets us do the 2D DCT as two 1D matrix-vector multiplies (separable).
12
+ # Keep the float matrix available for reference / testing
8
13
  MATRIX = Array.new(8) { |k|
9
14
  ck = k == 0 ? 0.5 / Math.sqrt(2.0) : 0.5
10
15
  Array.new(8) { |n|
@@ -12,72 +17,215 @@ module PureJPEG
12
17
  }
13
18
  }.freeze
14
19
 
15
- # Flatten for faster indexed access
16
20
  MATRIX_FLAT = MATRIX.flatten.freeze
17
-
18
- # Transposed matrix for inverse DCT: A^T[n][k] = A[k][n]
19
21
  MATRIX_T_FLAT = Array.new(64) { |i| MATRIX_FLAT[(i % 8) * 8 + i / 8] }.freeze
20
22
 
21
- # Separable forward 2D DCT: row pass then column pass.
22
- # Writes result into `out`. Uses `temp` as scratch space.
23
- # All three arrays must be pre-allocated with 64 elements.
24
- def self.forward!(block, temp, out)
25
- # Row pass: temp[y*8+u] = sum_x A[u][x] * block[y*8+x]
26
- m = MATRIX_FLAT
27
- 8.times do |y|
28
- y8 = y << 3
29
- b0 = block[y8]; b1 = block[y8|1]; b2 = block[y8|2]; b3 = block[y8|3]
30
- b4 = block[y8|4]; b5 = block[y8|5]; b6 = block[y8|6]; b7 = block[y8|7]
31
- 8.times do |u|
32
- u8 = u << 3
33
- temp[y8|u] = m[u8]*b0 + m[u8|1]*b1 + m[u8|2]*b2 + m[u8|3]*b3 +
34
- m[u8|4]*b4 + m[u8|5]*b5 + m[u8|6]*b6 + m[u8|7]*b7
35
- end
23
+ # Fixed-point constants (13-bit precision) from IJG reference.
24
+ CONST_BITS = 13
25
+ PASS1_BITS = 2
26
+
27
+ FIX_0_298631336 = 2446
28
+ FIX_0_390180644 = 3196
29
+ FIX_0_541196100 = 4433
30
+ FIX_0_765366865 = 6270
31
+ FIX_0_899976223 = 7373
32
+ FIX_1_175875602 = 9633
33
+ FIX_1_501321110 = 12299
34
+ FIX_1_847759065 = 15137
35
+ FIX_1_961570560 = 16069
36
+ FIX_2_053119869 = 16819
37
+ FIX_2_562915447 = 20995
38
+ FIX_3_072711026 = 25172
39
+
40
+ CB = CONST_BITS
41
+ P1 = PASS1_BITS
42
+ CB_M_P1 = CB - P1 # 11
43
+ CB_P_P1_P3 = CB + P1 + 3 # 18
44
+ P1_P3 = P1 + 3 # 5
45
+ CB2_P_P1 = CB * 2 + P1 # 28 (unused, was for column even-multiplied path)
46
+
47
+ # Forward 2D DCT (in-place). Input: 64-element array of level-shifted
48
+ # integers (-128..127). Output: DCT coefficients (integers).
49
+ # The `_temp` and `_out` parameters are accepted for API compatibility
50
+ # but ignored; computation is done in-place on `data`.
51
+ def self.forward!(data, _temp = nil, _out = nil)
52
+ # Pass 1: process rows
53
+ 8.times do |row|
54
+ i = row << 3
55
+ d0 = data[i]; d1 = data[i+1]; d2 = data[i+2]; d3 = data[i+3]
56
+ d4 = data[i+4]; d5 = data[i+5]; d6 = data[i+6]; d7 = data[i+7]
57
+
58
+ tmp0 = d0 + d7; tmp7 = d0 - d7
59
+ tmp1 = d1 + d6; tmp6 = d1 - d6
60
+ tmp2 = d2 + d5; tmp5 = d2 - d5
61
+ tmp3 = d3 + d4; tmp4 = d3 - d4
62
+
63
+ # Even part
64
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
65
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
66
+
67
+ data[i] = (tmp10 + tmp11) << P1
68
+ data[i+4] = (tmp10 - tmp11) << P1
69
+
70
+ z1 = (tmp12 + tmp13) * FIX_0_541196100
71
+ data[i+2] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
72
+ data[i+6] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
73
+
74
+ # Odd part
75
+ z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
76
+ z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
77
+ z5 = (z3 + z4) * FIX_1_175875602
78
+
79
+ tmp4 = tmp4 * FIX_0_298631336
80
+ tmp5 = tmp5 * FIX_2_053119869
81
+ tmp6 = tmp6 * FIX_3_072711026
82
+ tmp7 = tmp7 * FIX_1_501321110
83
+ z1 = z1 * -FIX_0_899976223
84
+ z2 = z2 * -FIX_2_562915447
85
+ z3 = z3 * -FIX_1_961570560 + z5
86
+ z4 = z4 * -FIX_0_390180644 + z5
87
+
88
+ data[i+7] = (tmp4 + z1 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
89
+ data[i+5] = (tmp5 + z2 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
90
+ data[i+3] = (tmp6 + z2 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
91
+ data[i+1] = (tmp7 + z1 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
36
92
  end
37
93
 
38
- # Column pass: out[v*8+u] = sum_y A[v][y] * temp[y*8+u]
39
- 8.times do |u|
40
- t0 = temp[u]; t1 = temp[8|u]; t2 = temp[16|u]; t3 = temp[24|u]
41
- t4 = temp[32|u]; t5 = temp[40|u]; t6 = temp[48|u]; t7 = temp[56|u]
42
- 8.times do |v|
43
- v8 = v << 3
44
- out[v8|u] = m[v8]*t0 + m[v8|1]*t1 + m[v8|2]*t2 + m[v8|3]*t3 +
45
- m[v8|4]*t4 + m[v8|5]*t5 + m[v8|6]*t6 + m[v8|7]*t7
46
- end
94
+ # Pass 2: process columns
95
+ 8.times do |col|
96
+ d0 = data[col]; d1 = data[col+8]; d2 = data[col+16]; d3 = data[col+24]
97
+ d4 = data[col+32]; d5 = data[col+40]; d6 = data[col+48]; d7 = data[col+56]
98
+
99
+ tmp0 = d0 + d7; tmp7 = d0 - d7
100
+ tmp1 = d1 + d6; tmp6 = d1 - d6
101
+ tmp2 = d2 + d5; tmp5 = d2 - d5
102
+ tmp3 = d3 + d4; tmp4 = d3 - d4
103
+
104
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
105
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
106
+
107
+ data[col] = (tmp10 + tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
108
+ data[col+32] = (tmp10 - tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
109
+
110
+ z1 = (tmp12 + tmp13) * FIX_0_541196100
111
+ data[col+16] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
112
+ data[col+48] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
113
+
114
+ z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
115
+ z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
116
+ z5 = (z3 + z4) * FIX_1_175875602
117
+
118
+ tmp4 = tmp4 * FIX_0_298631336
119
+ tmp5 = tmp5 * FIX_2_053119869
120
+ tmp6 = tmp6 * FIX_3_072711026
121
+ tmp7 = tmp7 * FIX_1_501321110
122
+ z1 = z1 * -FIX_0_899976223
123
+ z2 = z2 * -FIX_2_562915447
124
+ z3 = z3 * -FIX_1_961570560 + z5
125
+ z4 = z4 * -FIX_0_390180644 + z5
126
+
127
+ data[col+56] = (tmp4 + z1 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
128
+ data[col+40] = (tmp5 + z2 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
129
+ data[col+24] = (tmp6 + z2 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
130
+ data[col+8] = (tmp7 + z1 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
47
131
  end
48
132
 
49
- out
133
+ data
50
134
  end
51
135
 
52
- # Separable inverse 2D DCT: same structure as forward but using A^T.
53
- # f = A^T * F * A
54
- def self.inverse!(block, temp, out)
55
- mt = MATRIX_T_FLAT
56
-
57
- # Row pass: temp[v*8+x] = sum_u A^T[x][u] * block[v*8+u]
58
- 8.times do |v|
59
- v8 = v << 3
60
- b0 = block[v8]; b1 = block[v8|1]; b2 = block[v8|2]; b3 = block[v8|3]
61
- b4 = block[v8|4]; b5 = block[v8|5]; b6 = block[v8|6]; b7 = block[v8|7]
62
- 8.times do |x|
63
- x8 = x << 3
64
- temp[v8|x] = mt[x8]*b0 + mt[x8|1]*b1 + mt[x8|2]*b2 + mt[x8|3]*b3 +
65
- mt[x8|4]*b4 + mt[x8|5]*b5 + mt[x8|6]*b6 + mt[x8|7]*b7
66
- end
136
+ # Inverse 2D DCT (in-place). Input: dequantized DCT coefficients (integers).
137
+ # Output: spatial-domain values (integers) that still need +128 level shift.
138
+ def self.inverse!(data, _temp = nil, _out = nil)
139
+ # Pass 1: process columns
140
+ 8.times do |col|
141
+ d0 = data[col]; d2 = data[col+16]; d4 = data[col+32]; d6 = data[col+48]
142
+ d1 = data[col+8]; d3 = data[col+24]; d5 = data[col+40]; d7 = data[col+56]
143
+
144
+ # Even part
145
+ z1 = (d2 + d6) * FIX_0_541196100
146
+ tmp2 = z1 - d6 * FIX_1_847759065
147
+ tmp3 = z1 + d2 * FIX_0_765366865
148
+
149
+ tmp0 = (d0 + d4) << CB
150
+ tmp1 = (d0 - d4) << CB
151
+
152
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
153
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
154
+
155
+ # Odd part
156
+ tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
157
+ z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
158
+ z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
159
+ z5 = (z3 + z4) * FIX_1_175875602
160
+
161
+ tmp0 = tmp0 * FIX_0_298631336
162
+ tmp1 = tmp1 * FIX_2_053119869
163
+ tmp2 = tmp2 * FIX_3_072711026
164
+ tmp3 = tmp3 * FIX_1_501321110
165
+ z1 = z1 * -FIX_0_899976223
166
+ z2 = z2 * -FIX_2_562915447
167
+ z3 = z3 * -FIX_1_961570560 + z5
168
+ z4 = z4 * -FIX_0_390180644 + z5
169
+
170
+ tmp0 += z1 + z3; tmp1 += z2 + z4
171
+ tmp2 += z2 + z3; tmp3 += z1 + z4
172
+
173
+ data[col] = (tmp10 + tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
174
+ data[col+56] = (tmp10 - tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
175
+ data[col+8] = (tmp11 + tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
176
+ data[col+48] = (tmp11 - tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
177
+ data[col+16] = (tmp12 + tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
178
+ data[col+40] = (tmp12 - tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
179
+ data[col+24] = (tmp13 + tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
180
+ data[col+32] = (tmp13 - tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
67
181
  end
68
182
 
69
- # Column pass: out[y*8+x] = sum_v A^T[y][v] * temp[v*8+x]
70
- 8.times do |x|
71
- t0 = temp[x]; t1 = temp[8|x]; t2 = temp[16|x]; t3 = temp[24|x]
72
- t4 = temp[32|x]; t5 = temp[40|x]; t6 = temp[48|x]; t7 = temp[56|x]
73
- 8.times do |y|
74
- y8 = y << 3
75
- out[y8|x] = mt[y8]*t0 + mt[y8|1]*t1 + mt[y8|2]*t2 + mt[y8|3]*t3 +
76
- mt[y8|4]*t4 + mt[y8|5]*t5 + mt[y8|6]*t6 + mt[y8|7]*t7
77
- end
183
+ # Pass 2: process rows
184
+ 8.times do |row|
185
+ i = row << 3
186
+ d0 = data[i]; d2 = data[i+2]; d4 = data[i+4]; d6 = data[i+6]
187
+ d1 = data[i+1]; d3 = data[i+3]; d5 = data[i+5]; d7 = data[i+7]
188
+
189
+ # Even part
190
+ z1 = (d2 + d6) * FIX_0_541196100
191
+ tmp2 = z1 - d6 * FIX_1_847759065
192
+ tmp3 = z1 + d2 * FIX_0_765366865
193
+
194
+ tmp0 = (d0 + d4) << CB
195
+ tmp1 = (d0 - d4) << CB
196
+
197
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
198
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
199
+
200
+ # Odd part
201
+ tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
202
+ z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
203
+ z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
204
+ z5 = (z3 + z4) * FIX_1_175875602
205
+
206
+ tmp0 = tmp0 * FIX_0_298631336
207
+ tmp1 = tmp1 * FIX_2_053119869
208
+ tmp2 = tmp2 * FIX_3_072711026
209
+ tmp3 = tmp3 * FIX_1_501321110
210
+ z1 = z1 * -FIX_0_899976223
211
+ z2 = z2 * -FIX_2_562915447
212
+ z3 = z3 * -FIX_1_961570560 + z5
213
+ z4 = z4 * -FIX_0_390180644 + z5
214
+
215
+ tmp0 += z1 + z3; tmp1 += z2 + z4
216
+ tmp2 += z2 + z3; tmp3 += z1 + z4
217
+
218
+ data[i] = (tmp10 + tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
219
+ data[i+7] = (tmp10 - tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
220
+ data[i+1] = (tmp11 + tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
221
+ data[i+6] = (tmp11 - tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
222
+ data[i+2] = (tmp12 + tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
223
+ data[i+5] = (tmp12 - tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
224
+ data[i+3] = (tmp13 + tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
225
+ data[i+4] = (tmp13 - tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
78
226
  end
79
227
 
80
- out
228
+ data
81
229
  end
82
230
  end
83
231
  end
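The `FIX_*` constants and the repeated `(x + (1 << (n - 1))) >> n` pattern in the DCT diff above can be reproduced with a few lines of Ruby. This is only a sketch of the fixed-point idea; the `fix` and `descale` helper names are assumptions and do not appear in the gem:

```ruby
# Scale a real-valued constant to a fixed-point Integer with `bits`
# fractional bits (13 matches CONST_BITS in the diff).
def fix(x, bits = 13)
  (x * (1 << bits)).round
end

# Remove `bits` fractional bits with round-to-nearest: add half of the
# divisor, then shift right. Ruby's >> floors for negative Integers too.
def descale(value, bits)
  (value + (1 << (bits - 1))) >> bits
end

p fix(0.541196100) # => 4433
p fix(1.175875602) # => 9633

# Multiply 100 by roughly 0.5412 using only Integer operations.
p descale(100 * fix(0.541196100), 13) # => 54
```

Because every intermediate value stays an Integer, no Float objects are created in the loop, which is the allocation saving the changelog describes.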
@@ -78,10 +78,8 @@ module PureJPEG
78
78
 
79
79
  # Reusable buffers
80
80
  zigzag = Array.new(64, 0)
81
- raster = Array.new(64, 0.0)
82
- dequant = Array.new(64, 0.0)
83
- temp = Array.new(64, 0.0)
84
- spatial = Array.new(64, 0.0)
81
+ raster = Array.new(64, 0)
82
+ dequant = Array.new(64, 0)
85
83
 
86
84
  mcus_y.times do |mcu_row|
87
85
  mcus_x.times do |mcu_col|
@@ -104,12 +102,12 @@ module PureJPEG
104
102
  # Inverse pipeline: unzigzag -> dequantize -> IDCT -> level shift
105
103
  Zigzag.unreorder!(zigzag, raster)
106
104
  Quantization.dequantize!(raster, qt, dequant)
107
- DCT.inverse!(dequant, temp, spatial)
105
+ DCT.inverse!(dequant)
108
106
 
109
107
  # Write block into channel buffer
110
108
  bx = (mcu_col * comp.h_sampling + bh) * 8
111
109
  by = (mcu_row * comp.v_sampling + bv) * 8
112
- write_block(spatial, ch[:data], ch[:width], bx, by)
110
+ write_block(dequant, ch[:data], ch[:width], bx, by)
113
111
  end
114
112
  end
115
113
  end
@@ -204,10 +202,8 @@ module PureJPEG
204
202
  end
205
203
 
206
204
  zigzag = Array.new(64, 0)
207
- raster = Array.new(64, 0.0)
208
- dequant = Array.new(64, 0.0)
209
- temp = Array.new(64, 0.0)
210
- spatial = Array.new(64, 0.0)
205
+ raster = Array.new(64, 0)
206
+ dequant = Array.new(64, 0)
211
207
 
212
208
  jfif.components.each do |c|
213
209
  qt = fetch_quant_table!(jfif, c)
@@ -222,8 +218,8 @@ module PureJPEG
222
218
 
223
219
  Zigzag.unreorder!(zigzag, raster)
224
220
  Quantization.dequantize!(raster, qt, dequant)
225
- DCT.inverse!(dequant, temp, spatial)
226
- write_block(spatial, ch[:data], ch[:width], block_x * 8, block_y * 8)
221
+ DCT.inverse!(dequant)
222
+ write_block(dequant, ch[:data], ch[:width], block_x * 8, block_y * 8)
227
223
  end
228
224
  end
229
225
  end
@@ -460,12 +456,16 @@ module PureJPEG
460
456
  # Write an 8x8 spatial block (level-shifted by +128) into a channel buffer.
461
457
  def write_block(spatial, channel, ch_width, bx, by)
462
458
  8.times do |row|
463
- dst_row = (by + row) * ch_width + bx
464
- row8 = row << 3
465
- 8.times do |col|
466
- val = (spatial[row8 | col] + 128.0).round
467
- channel[dst_row + col] = val < 0 ? 0 : (val > 255 ? 255 : val)
468
- end
459
+ dst = (by + row) * ch_width + bx
460
+ r8 = row << 3
461
+ v = spatial[r8] + 128; channel[dst] = v < 0 ? 0 : (v > 255 ? 255 : v)
462
+ v = spatial[r8 | 1] + 128; channel[dst + 1] = v < 0 ? 0 : (v > 255 ? 255 : v)
463
+ v = spatial[r8 | 2] + 128; channel[dst + 2] = v < 0 ? 0 : (v > 255 ? 255 : v)
464
+ v = spatial[r8 | 3] + 128; channel[dst + 3] = v < 0 ? 0 : (v > 255 ? 255 : v)
465
+ v = spatial[r8 | 4] + 128; channel[dst + 4] = v < 0 ? 0 : (v > 255 ? 255 : v)
466
+ v = spatial[r8 | 5] + 128; channel[dst + 5] = v < 0 ? 0 : (v > 255 ? 255 : v)
467
+ v = spatial[r8 | 6] + 128; channel[dst + 6] = v < 0 ? 0 : (v > 255 ? 255 : v)
468
+ v = spatial[r8 | 7] + 128; channel[dst + 7] = v < 0 ? 0 : (v > 255 ? 255 : v)
469
469
  end
470
470
  end
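For comparison with the unrolled `write_block` above, here is a looped version that should produce the same result. It is a sketch for reference or testing, not the gem's code; the name `write_block_looped` is hypothetical:

```ruby
# Looped reference: undo the encoder's -128 level shift and clamp each
# IDCT output to 0..255 while copying an 8x8 block into the channel buffer.
def write_block_looped(spatial, channel, ch_width, bx, by)
  8.times do |row|
    dst = (by + row) * ch_width + bx
    8.times do |col|
      v = spatial[(row << 3) | col] + 128
      channel[dst + col] = v.clamp(0, 255)
    end
  end
  channel
end

channel = Array.new(16 * 8, -1)                # 16-wide buffer, -1 marks "untouched"
spatial = Array.new(64) { |i| i * 10 - 300 }   # values from -300 to 330
write_block_looped(spatial, channel, 16, 8, 0)
p channel[8], channel[62], channel[127] # => 0, 128, 255 (printed on separate lines)
```

The unrolled form in the diff trades this compactness for fewer block invocations per 8x8 block.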
471
471
 
@@ -493,18 +493,27 @@ module PureJPEG
493
493
 
494
494
  def assemble_grayscale(width, height, channels, comp)
495
495
  ch = channels[comp.id]
496
+ ch_data = ch[:data]
497
+ ch_width = ch[:width]
496
498
  pixels = Array.new(width * height)
497
499
  height.times do |y|
498
- src_row = y * ch[:width]
500
+ src_row = y * ch_width
499
501
  dst_row = y * width
500
502
  width.times do |x|
501
- v = ch[:data][src_row + x]
503
+ v = ch_data[src_row + x]
502
504
  pixels[dst_row + x] = (v << 16) | (v << 8) | v
503
505
  end
504
506
  end
505
507
  Image.new(width, height, pixels, icc_profile: @icc_profile)
506
508
  end
507
509
 
510
+ # Fixed-point coefficients (scaled by 2^16) for YCbCr→RGB.
511
+ FP_R_CR = 91881 # 1.402 * 65536
512
+ FP_G_CB = -22554 # -0.344136 * 65536
513
+ FP_G_CR = -46802 # -0.714136 * 65536
514
+ FP_B_CB = 116130 # 1.772 * 65536
515
+ FP_HALF = 32768 # rounding bias
516
+
508
517
  def assemble_color(width, height, channels, components, max_h, max_v)
509
518
  # Upsample chroma channels if needed and convert YCbCr to RGB
510
519
  y_comp, cb_comp, cr_comp = resolve_color_components(components)
@@ -513,29 +522,39 @@ module PureJPEG
513
522
  cb_ch = channels[cb_comp.id]
514
523
  cr_ch = channels[cr_comp.id]
515
524
 
525
+ y_data = y_ch[:data]
526
+ cb_data = cb_ch[:data]
527
+ cr_data = cr_ch[:data]
528
+ y_stride = y_ch[:width]
529
+ cb_stride = cb_ch[:width]
530
+ cr_stride = cr_ch[:width]
531
+ cb_h = cb_comp.h_sampling
532
+ cb_v = cb_comp.v_sampling
533
+ cr_h = cr_comp.h_sampling
534
+ cr_v = cr_comp.v_sampling
535
+
516
536
  pixels = Array.new(width * height)
517
537
 
518
538
  height.times do |py|
519
539
  dst_row = py * width
520
- y_row = py * y_ch[:width]
540
+ y_row = py * y_stride
521
541
 
522
542
  # Chroma coordinates (nearest-neighbor upsampling)
523
- cb_y = (py * cb_comp.v_sampling) / max_v
524
- cr_y = (py * cr_comp.v_sampling) / max_v
525
- cb_row = cb_y * cb_ch[:width]
526
- cr_row = cr_y * cr_ch[:width]
543
+ cb_row = ((py * cb_v) / max_v) * cb_stride
544
+ cr_row = ((py * cr_v) / max_v) * cr_stride
527
545
 
528
546
  width.times do |px|
529
- lum = y_ch[:data][y_row + px]
547
+ lum = y_data[y_row + px]
530
548
 
531
- cb_x = (px * cb_comp.h_sampling) / max_h
532
- cr_x = (px * cr_comp.h_sampling) / max_h
533
- cb = cb_ch[:data][cb_row + cb_x] - 128.0
534
- cr = cr_ch[:data][cr_row + cr_x] - 128.0
549
+ cb_x = (px * cb_h) / max_h
550
+ cr_x = (px * cr_h) / max_h
551
+ cb_val = cb_data[cb_row + cb_x] - 128
552
+ cr_val = cr_data[cr_row + cr_x] - 128
535
553
 
536
- r = (lum + 1.402 * cr).round
537
- g = (lum - 0.344136 * cb - 0.714136 * cr).round
538
- b = (lum + 1.772 * cb).round
554
+ # Fixed-point YCbCr→RGB (all integer arithmetic)
555
+ r = lum + ((FP_R_CR * cr_val + FP_HALF) >> 16)
556
+ g = lum + ((FP_G_CB * cb_val + FP_G_CR * cr_val + FP_HALF) >> 16)
557
+ b = lum + ((FP_B_CB * cb_val + FP_HALF) >> 16)
539
558
 
540
559
  r = r < 0 ? 0 : (r > 255 ? 255 : r)
541
560
  g = g < 0 ? 0 : (g > 255 ? 255 : g)
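The fixed-point YCbCr to RGB step above can be tried in isolation. The sketch below copies the `FP_*` constants from the diff; the `FixedPointYCbCr` module and its `to_rgb` method are hypothetical wrappers added for illustration:

```ruby
module FixedPointYCbCr
  # Coefficients scaled by 2^16, copied from the diff above.
  FP_R_CR = 91_881
  FP_G_CB = -22_554
  FP_G_CR = -46_802
  FP_B_CB = 116_130
  FP_HALF = 32_768 # rounding bias (0.5 in 16.16 fixed point)

  module_function

  def clamp255(v)
    v < 0 ? 0 : (v > 255 ? 255 : v)
  end

  # Convert one YCbCr sample triple (each 0..255) to [r, g, b].
  def to_rgb(lum, cb, cr)
    cb_val = cb - 128
    cr_val = cr - 128
    r = lum + ((FP_R_CR * cr_val + FP_HALF) >> 16)
    g = lum + ((FP_G_CB * cb_val + FP_G_CR * cr_val + FP_HALF) >> 16)
    b = lum + ((FP_B_CB * cb_val + FP_HALF) >> 16)
    [clamp255(r), clamp255(g), clamp255(b)]
  end
end

p FixedPointYCbCr.to_rgb(128, 128, 128) # => [128, 128, 128]
```

With neutral chroma (Cb = Cr = 128) every correction term is zero, so the output equals the luminance. For other inputs the result can differ from the old Float path by at most one level because of rounding.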
@@ -205,17 +205,14 @@ module PureJPEG
205
205
  padded_w = (width + 7) & ~7
206
206
  padded_h = (height + 7) & ~7
207
207
 
208
- block = Array.new(64, 0.0)
209
- temp = Array.new(64, 0.0)
210
- dct = Array.new(64, 0.0)
208
+ block = Array.new(64, 0)
211
209
  qbuf = Array.new(64, 0)
212
210
  zbuf = Array.new(64, 0)
213
211
 
214
212
  (0...padded_h).step(8) do |by|
215
213
  (0...padded_w).step(8) do |bx|
216
214
  extract_block_into(y_data, width, height, bx, by, block)
217
- transform_block(block, temp, dct, qbuf, zbuf, qtable)
218
- yield zbuf
215
+ yield transform_block(block, qbuf, zbuf, qtable)
219
216
  end
220
217
  end
221
218
  end
@@ -278,37 +275,29 @@ module PureJPEG
278
275
  mcu_w = (width + 15) & ~15
279
276
  mcu_h = (height + 15) & ~15
280
277
 
281
- block = Array.new(64, 0.0)
282
- temp = Array.new(64, 0.0)
283
- dct = Array.new(64, 0.0)
278
+ block = Array.new(64, 0)
284
279
  qbuf = Array.new(64, 0)
285
280
  zbuf = Array.new(64, 0)
286
281
 
287
282
  (0...mcu_h).step(16) do |my|
288
283
  (0...mcu_w).step(16) do |mx|
289
284
  extract_block_into(y_data, width, height, mx, my, block)
290
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
291
- yield :y, zbuf
285
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
292
286
 
293
287
  extract_block_into(y_data, width, height, mx + 8, my, block)
294
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
295
- yield :y, zbuf
288
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
296
289
 
297
290
  extract_block_into(y_data, width, height, mx, my + 8, block)
298
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
299
- yield :y, zbuf
291
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
300
292
 
301
293
  extract_block_into(y_data, width, height, mx + 8, my + 8, block)
302
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
303
- yield :y, zbuf
294
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
304
295
 
305
296
  extract_block_into(cb_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
306
- transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
307
- yield :cb, zbuf
297
+ yield :cb, transform_block(block, qbuf, zbuf, chr_qt)
308
298
 
309
299
  extract_block_into(cr_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
310
- transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
311
- yield :cr, zbuf
300
+ yield :cr, transform_block(block, qbuf, zbuf, chr_qt)
312
301
  end
313
302
  end
314
303
  end
@@ -333,9 +322,9 @@ module PureJPEG
333
322
 
334
323
  # --- Shared block pipeline (all buffers pre-allocated) ---
335
324
 
336
- def transform_block(block, temp, dct, qbuf, zbuf, qtable)
337
- DCT.forward!(block, temp, dct)
338
- Quantization.quantize!(dct, qtable, qbuf)
325
+ def transform_block(block, qbuf, zbuf, qtable)
326
+ DCT.forward!(block)
327
+ Quantization.quantize!(block, qtable, qbuf)
339
328
  Zigzag.reorder!(qbuf, zbuf)
340
329
  zbuf
341
330
  end
@@ -352,26 +341,42 @@ module PureJPEG
352
341
  end
353
342
  end
354
343
 
344
+ # Fixed-point coefficients (scaled by 2^16 = 65536) for RGB→YCbCr.
345
+ # Y = 0.299*R + 0.587*G + 0.114*B
346
+ # Cb = -0.168736*R - 0.331264*G + 0.5*B + 128
347
+ # Cr = 0.5*R - 0.418688*G - 0.081312*B + 128
348
+ FP_Y_R = 19595; FP_Y_G = 38470; FP_Y_B = 7471
349
+ FP_CB_R = -11058; FP_CB_G = -21710; FP_CB_B = 32768
350
+ FP_CR_R = 32768; FP_CR_G = -27440; FP_CR_B = -5328
351
+ FP_HALF = 32768 # rounding bias
352
+ FP_128 = 8388608 # 128 << 16
353
+
354
+ def clamp255(v)
355
+ v < 0 ? 0 : (v > 255 ? 255 : v)
356
+ end
357
+
355
358
  def extract_luminance(width, height)
356
359
  luminance = Array.new(width * height)
357
360
  if source.respond_to?(:packed_pixels)
358
361
  packed = source.packed_pixels
359
362
  r_shift, g_shift, b_shift = packed_shifts
363
+ n = width * height
360
364
  i = 0
361
- (width * height).times do
365
+ n.times do
362
366
  color = packed[i]
363
367
  r = (color >> r_shift) & 0xFF
364
368
  g = (color >> g_shift) & 0xFF
365
369
  b = (color >> b_shift) & 0xFF
366
- luminance[i] = (0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
370
+ luminance[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
367
371
  i += 1
368
372
  end
369
373
  else
370
- height.times do |y|
371
- row = y * width
372
- width.times do |x|
373
- pixel = source[x, y]
374
- luminance[row + x] = (0.299 * pixel.r + 0.587 * pixel.g + 0.114 * pixel.b).round.clamp(0, 255)
374
+ height.times do |py|
375
+ row = py * width
376
+ width.times do |px|
377
+ pixel = source[px, py]
378
+ r = pixel.r; g = pixel.g; b = pixel.b
379
+ luminance[row + px] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
375
380
  end
376
381
  end
377
382
  end
@@ -393,9 +398,9 @@ module PureJPEG
393
398
  r = (color >> r_shift) & 0xFF
394
399
  g = (color >> g_shift) & 0xFF
395
400
  b = (color >> b_shift) & 0xFF
396
- y_data[i] = ( 0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
397
- cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255)
398
- cr_data[i] = ( 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
401
+ y_data[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
402
+ cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
403
+ cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
399
404
  i += 1
400
405
  end
401
406
  else
@@ -405,9 +410,9 @@ module PureJPEG
405
410
  pixel = source[px, py]
406
411
  r = pixel.r; g = pixel.g; b = pixel.b
407
412
  i = row + px
408
- y_data[i] = ( 0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
409
- cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255)
410
- cr_data[i] = ( 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
413
+ y_data[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
414
+ cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
415
+ cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
411
416
  end
412
417
  end
413
418
  end
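The encoder-side RGB to YCbCr conversion above can be checked the same way. This sketch reuses the constants from the diff; the `FixedPointRGB` module and `to_ycbcr` method are hypothetical names introduced for illustration:

```ruby
module FixedPointRGB
  # Coefficients scaled by 2^16, copied from the diff above.
  FP_Y_R  = 19_595;  FP_Y_G  = 38_470;  FP_Y_B  = 7_471
  FP_CB_R = -11_058; FP_CB_G = -21_710; FP_CB_B = 32_768
  FP_CR_R = 32_768;  FP_CR_G = -27_440; FP_CR_B = -5_328
  FP_HALF = 32_768     # rounding bias
  FP_128  = 128 << 16  # chroma offset in 16.16 fixed point

  module_function

  def clamp255(v)
    v < 0 ? 0 : (v > 255 ? 255 : v)
  end

  # Convert one RGB triple (each 0..255) to [y, cb, cr].
  def to_ycbcr(r, g, b)
    y  = (FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16
    cb = (FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16
    cr = (FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16
    [clamp255(y), clamp255(cb), clamp255(cr)]
  end
end

p FixedPointRGB.to_ycbcr(255, 255, 255) # => [255, 128, 128]
p FixedPointRGB.to_ycbcr(255, 0, 0)     # => [76, 85, 255]
```

The three luminance coefficients sum to exactly 65536 and each chroma row sums to zero, so white and black map to exact values with neutral chroma.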
@@ -442,13 +447,16 @@ module PureJPEG
442
447
  8.times do |row|
443
448
  sy = by + row
444
449
  sy = max_y if sy > max_y
445
- src_row = sy * width
446
- row8 = row << 3
447
- 8.times do |col|
448
- sx = bx + col
449
- sx = max_x if sx > max_x
450
- block[row8 | col] = channel[src_row + sx] - 128.0
451
- end
450
+ src = sy * width
451
+ r8 = row << 3
452
+ x = bx; block[r8] = channel[src + (x > max_x ? max_x : x)] - 128
453
+ x = bx + 1; block[r8 | 1] = channel[src + (x > max_x ? max_x : x)] - 128
454
+ x = bx + 2; block[r8 | 2] = channel[src + (x > max_x ? max_x : x)] - 128
455
+ x = bx + 3; block[r8 | 3] = channel[src + (x > max_x ? max_x : x)] - 128
456
+ x = bx + 4; block[r8 | 4] = channel[src + (x > max_x ? max_x : x)] - 128
457
+ x = bx + 5; block[r8 | 5] = channel[src + (x > max_x ? max_x : x)] - 128
458
+ x = bx + 6; block[r8 | 6] = channel[src + (x > max_x ? max_x : x)] - 128
459
+ x = bx + 7; block[r8 | 7] = channel[src + (x > max_x ? max_x : x)] - 128
452
460
  end
453
461
  block
454
462
  end
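The unrolled `extract_block_into` above can be cross-checked against a looped version. This sketch assumes `max_x` and `max_y` are `width - 1` and `height - 1`, since their definitions fall outside the hunk; `extract_block_looped` is a hypothetical name:

```ruby
# Looped reference: copy an 8x8 block starting at (bx, by), replicating the
# last row/column past the image edge, and level-shift samples by -128.
def extract_block_looped(channel, width, height, bx, by, block)
  max_x = width - 1  # assumption: not shown in the diff hunk
  max_y = height - 1 # assumption: not shown in the diff hunk
  8.times do |row|
    sy = [by + row, max_y].min
    src = sy * width
    8.times do |col|
      sx = [bx + col, max_x].min
      block[(row << 3) | col] = channel[src + sx] - 128
    end
  end
  block
end

channel = Array.new(100) { |i| i } # 10x10 image, sample value == index
block = extract_block_looped(channel, 10, 10, 8, 8, Array.new(64, 0))
p block[0], block[2], block[63] # => -40, -39, -29 (printed on separate lines)
```

Starting at (8, 8) in a 10x10 image, only a 2x2 corner holds real samples; every other position repeats the nearest edge sample.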
@@ -3,17 +3,22 @@
3
3
  module PureJPEG
4
4
  module Huffman
5
5
  class Encoder
6
- def self.category_and_bits(value)
7
- return [0, 0] if value == 0
8
- abs_val = value.abs
6
+ # Return the Huffman category (bit length) for a value.
7
+ # Avoids Array allocation compared to the combined category_and_bits.
8
+ def self.category(value)
9
+ return 0 if value == 0
10
+ v = value.abs
9
11
  cat = 0
10
- v = abs_val
11
12
  while v > 0
12
13
  cat += 1
13
14
  v >>= 1
14
15
  end
15
- bits = value > 0 ? value : value + (1 << cat) - 1
16
- [cat, bits]
16
+ cat
17
+ end
18
+
19
+ # Return the extra bits to encode for a value with the given category.
20
+ def self.value_bits(value, cat)
21
+ value > 0 ? value : value + (1 << cat) - 1
17
22
  end
18
23
 
19
24
  def self.each_ac_item(zigzag)
@@ -39,7 +44,7 @@ module PureJPEG
39
44
  end
40
45
 
41
46
  value = zigzag[i]
42
- cat, = category_and_bits(value)
47
+ cat = category(value)
43
48
  yield (run << 4) | cat, value
44
49
  i += 1
45
50
  end
@@ -73,10 +78,10 @@ module PureJPEG
73
78
  private
74
79
 
75
80
  def encode_dc(diff, writer)
76
- cat, bits = self.class.category_and_bits(diff)
81
+ cat = self.class.category(diff)
77
82
  code, length = @dc_table[cat]
78
83
  writer.write_bits(code, length)
79
- writer.write_bits(bits, cat) if cat > 0
84
+ writer.write_bits(self.class.value_bits(diff, cat), cat) if cat > 0
80
85
  end
81
86
 
82
87
  def encode_ac(zigzag, writer)
@@ -85,8 +90,8 @@ module PureJPEG
85
90
  writer.write_bits(code, length)
86
91
  next if symbol == 0x00 || symbol == 0xF0
87
92
 
88
- cat, bits = self.class.category_and_bits(value)
89
- writer.write_bits(bits, cat)
93
+ cat = self.class.category(value)
94
+ writer.write_bits(self.class.value_bits(value, cat), cat)
90
95
  end
91
96
  end
92
97
  end
@@ -104,7 +109,7 @@ module PureJPEG
104
109
  diff = zigzag[0] - @prev_dc[state_key]
105
110
  @prev_dc[state_key] = zigzag[0]
106
111
 
107
- cat, = Encoder.category_and_bits(diff)
112
+ cat = Encoder.category(diff)
108
113
  @dc_frequencies[cat] += 1
109
114
 
110
115
  Encoder.each_ac_symbol(zigzag) do |symbol|
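The split into `category` and `value_bits` above can be exercised standalone. The two methods below follow the diff, written as top-level methods purely for illustration:

```ruby
# Number of bits needed for the magnitude of `value` (0 for zero).
def category(value)
  return 0 if value == 0
  v = value.abs
  cat = 0
  while v > 0
    cat += 1
    v >>= 1
  end
  cat
end

# Extra bits appended after the Huffman code: positive values are written
# as-is, negative values as value + 2^cat - 1 (their ones' complement form).
def value_bits(value, cat)
  value > 0 ? value : value + (1 << cat) - 1
end

p [category(5), value_bits(5, 3)]   # => [3, 5]
p [category(-5), value_bits(-5, 3)] # => [3, 2]
```

For non-zero values the category equals `value.abs.bit_length`. Returning two plain Integers from two calls avoids the two-element Array that the old `category_and_bits` allocated on every coefficient.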
@@ -36,9 +36,20 @@ module PureJPEG
36
36
  }
37
37
  end
38
38
 
39
- # Quantize a 64-element DCT block in place into `out`.
39
+ # Quantize a 64-element DCT block into `out`.
40
+ # Uses integer rounding division (round-to-nearest) to match the
41
+ # behavior of Float division + round from the previous float DCT.
40
42
  def self.quantize!(block, table, out)
41
- 64.times { |i| out[i] = (block[i] / table[i]).round }
43
+ i = 0
44
+ while i < 64
45
+ v = block[i]; t = table[i]
46
+ out[i] = if v >= 0
47
+ (v + (t >> 1)) / t
48
+ else
49
+ -((-v + (t >> 1)) / t)
50
+ end
51
+ i += 1
52
+ end
42
53
  out
43
54
  end
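The sign branch in `quantize!` above exists because Ruby's Integer division floors toward negative infinity. A small sketch, with `round_div` as a hypothetical helper name, shows the difference:

```ruby
# Round-to-nearest Integer division (ties away from zero), for t > 0.
def round_div(v, t)
  if v >= 0
    (v + (t >> 1)) / t
  else
    -((-v + (t >> 1)) / t)
  end
end

p round_div(7, 2)  # => 4
p round_div(-7, 2) # => -4

# Without the sign branch, flooring division skews negative values:
naive = (-7 + (2 >> 1)) / 2
p naive # => -3
```

Handling the magnitude and sign separately keeps the result symmetric around zero, matching what `Float#round` produced in the previous implementation.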
44
55
 
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PureJPEG
4
- VERSION = "0.3.1"
4
+ VERSION = "0.3.2"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pure_jpeg
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.3.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Peter Cooper
@@ -86,7 +86,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
86
86
  - !ruby/object:Gem::Version
87
87
  version: '0'
88
88
  requirements: []
89
- rubygems_version: 4.0.3
89
+ rubygems_version: 3.6.9
90
90
  specification_version: 4
91
91
  summary: Pure Ruby JPEG encoder and decoder
92
92
  test_files: []