pure_jpeg 0.3.0 → 0.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 66b5d6fe1b663128f62aae8111b55a9b2ddbe9739d501fcf1146c459286433da
4
- data.tar.gz: 02ae6cdc25f520221fee4adfe9d6070c4a975602e5de2b34a0b9ab8b4e005829
3
+ metadata.gz: 780f932b176fecdb5daab2546909fe2610325e54f7364cf482e3e2652ab614a5
4
+ data.tar.gz: 1fbc350f25d09b989ee6262e077bb36847c6637db325fb03c29f2083bb0ec973
5
5
  SHA512:
6
- metadata.gz: f476b8fec25f1f0402f297d534f52887aed90778ddf1217668a0889755856436dae8a0d8e4e9d648bc2a33879165b32fa0560e5b506549467c21a648fd6ecf29
7
- data.tar.gz: 5438c161519149458fad8cd28dea1f9f073a2646c5f91f619a357d375b77456ed5f366dd3ed78e75f5c0d2cbe4156f4ec8a5a6dbaa9e19d86198dcd8a2fcacfc
6
+ metadata.gz: a385db40804bf4a992d78253ba619e90b147aa5be7808b341398918f5f36c3593fbd1a979aaad9729ad874f2427fe15d31fc9a0413e5f7ea98ee44e43e125f2d
7
+ data.tar.gz: b9f5581f4c4f27f42460961b3b4231fd94b45be130fbdce74a28bc4888f2a2f3de08fd43b9e3efc782ccb6b4f260aa8484ff7794e7bde1362018dc3deb282b26
data/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.3.2
4
+
5
+ Performance:
6
+
7
+ - Replaced matrix-multiply float DCT with integer-scaled AAN (Arai-Agui-Nakajima) DCT from the IJG reference implementation -- all-integer, no Float allocations
8
+ - Fixed-point integer arithmetic for RGB/YCbCr color space conversion in both encoder and decoder
9
+ - Eliminated short-lived Array allocations in Huffman encoder (`category_and_bits` split into separate methods)
10
+ - `String#<<` with Integer instead of `byte.chr` to avoid String allocations in bit writer
11
+ - DCT inner loop unrolling to eliminate nested block invocations
12
+ - Unrolled `write_block` and `extract_block_into` inner loops
13
+ - Integer rounding division in quantization (no more Float division + round)
14
+ - Hoisted hash lookups and method calls out of per-pixel loops in decoder
15
+
16
+ Result: ~2.9x faster encode, ~4.6x faster decode on Ruby 4.0.2 with YJIT.
17
+
18
+ Credits: [Ufuk Kayserilioglu](https://github.com/paracycle)
19
+
20
+ ## 0.3.1
21
+
22
+ Fixes:
23
+
24
+ - Fixed shared `Pixel` instance bug in decoder that could corrupt pixel data
25
+ - Encoder validates return values from `quantization_modifier` blocks
26
+
3
27
  ## 0.3.0
4
28
 
5
29
  New features:
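The 0.3.2 changelog entry above mentions integer rounding division in quantization, replacing Float division followed by `round`. The quantizer itself does not appear in this diff, so the following is only a sketch of one way such a division can be written. `round_div` is a hypothetical helper name, not part of the gem, and the tie-breaking rule (halves away from zero, matching `Float#round`) is an assumption.

```ruby
# Hypothetical helper (not from the gem): divide n by a positive divisor d
# and round to the nearest Integer, with ties going away from zero.
def round_div(n, d)
  half = d >> 1
  n >= 0 ? (n + half) / d : -((-n + half) / d)
end

puts round_div(37, 16)   # 2.3125 rounds to 2
puts round_div(-40, 16)  # -2.5 rounds to -3
```

Handling the sign explicitly matters because Ruby's Integer division floors toward negative infinity, so adding `half` alone would round negative ties the other way.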
data/README.md CHANGED
@@ -194,25 +194,26 @@ Decoding:
194
194
 
195
195
  Not supported: arithmetic coding, 12-bit precision, EXIF/ICC profile preservation, adding a default background for transparent sources (see what happens above!). Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
196
196
 
197
- Possible future improvements: AAN/fixed-point DCT (but it's a LOT of work), ICC profile rendering/conversion.
197
+ Possible future improvements: ICC profile rendering/conversion.
198
198
 
199
199
  ## Performance
200
200
 
201
- On a 1024x1024 image (Ruby 4.0.1 on my M1 Max):
201
+ On a 1024x1024 image (Ruby 4.0.2 with YJIT on an M5):
202
202
 
203
203
  | Operation | Time |
204
204
  |-----------|------|
205
- | Encode (color, q85) | ~1.7s |
206
- | Decode (color) | ~1.8s |
205
+ | Encode (color, q85) | ~0.16s |
206
+ | Decode (baseline) | ~0.14s |
207
+ | Decode (progressive) | ~0.18s |
207
208
 
208
- Both the encoder and decoder use a separable DCT with a precomputed cosine matrix and reuse all per-block buffers to minimize GC pressure. Pixel data is stored as packed integers internally to avoid per-pixel object allocation.
209
+ The encoder and decoder use an integer-scaled AAN (Arai-Agui-Nakajima) DCT with fixed-point arithmetic throughout, with no Float operations in the hot path. Color space conversion uses fixed-point integer math, and pixel data is stored as packed integers to avoid per-pixel object allocation.
209
210
 
210
211
  ## Some useful `rake` tasks
211
212
 
212
213
  ```
213
214
  bundle install
214
215
  rake test # run the test suite
215
- rake benchmark # benchmark encoding (3 runs against examples/a.png)
216
+ rake benchmark # benchmark encoding and decoding (3 runs each)
216
217
  rake profile # CPU profile with StackProf (requires the stackprof gem)
217
218
  ```
218
219
 
@@ -222,7 +223,7 @@ rake profile # CPU profile with StackProf (requires the stackprof gem)
222
223
 
223
224
  **I have read all of the code produced up to v0.2.0.** The algorithms are above my paygrade, but I'm OK with what has been produced, and I manually fixed a variety of stylistic things along the way. For example, CC seems to like wrapping entire functions in `if` statements rather than bailing on the opposite condition. *Later update: I have not read the ICC and optimized Huffman code yet, but it is heavily tested.*
224
225
 
225
- **CC needed a lot of guidance.** Its initial JPEG algorithm was somewhat naive and output odd-looking JPEGs akin to those of my Kodak digital camera from 2001. After some back and forth and image comparisons, we figured out it was doing the quantization entirely wrong (specifically not using the zigzag approach during quantization but just going in raster order). I *like* this aesthetic, but fixed it up so that it works as a generally usable JPEG library, while adding ways to customize things so you can recreate the effect, if preferred (see `CREATIVE.md` for more on that).
226
+ **CC needed a lot of guidance.** Its initial JPEG algorithm was somewhat naive and output odd-looking JPEGs akin to those of my [Casio QV-10 digital camera](https://medium.com/people-gadgets/the-gadget-we-miss-the-casio-qv-10-digital-camera-c25ab786ce49) from the late 1990s. After some back and forth and image comparisons, we figured out it was doing the quantization entirely wrong (specifically not using the zigzag approach during quantization but just going in raster order). I *like* this aesthetic, but fixed it up so that it works as a generally usable JPEG library, while adding ways to customize things so you can recreate the effect, if preferred (see `CREATIVE.md` for more on that).
226
227
 
227
228
  **CC is lazy.** The initial implementation was VERY SLOW. It took 15 seconds to turn a 1024x1024 PNG into a JPEG, so we went down the profiling rabbit hole and found many optimizations to make it ~6x faster. CC is poor at considering the role of Ruby's GC when implementing low level algorithms and needs some prodding to make the correct optimizations. CC is also lazy to the point of recommending that you just use another language (e.g. Go or Rust) rather than do a pure Ruby version of something - despite it being possible with some extra work.
228
229
 
@@ -232,6 +233,10 @@ rake profile # CPU profile with StackProf (requires the stackprof gem)
232
233
 
233
234
  **The final 10% still takes 90% of the time.** As mentioned above, the first run was quick, but getting things right has taken much longer. v0.1->0.2 has taken longer than 0.1 did! But we now have progressive JPEG support, even more optimizations, better tests, etc. etc.
234
235
 
236
+ ## Credits
237
+
238
+ - [Ufuk Kayserilioglu](https://github.com/paracycle) - Major performance optimizations including integer-scaled AAN DCT, fixed-point color space conversion, and YJIT-targeted improvements.
239
+
235
240
  ## License
236
241
 
237
242
  MIT
@@ -17,8 +17,8 @@ module PureJPEG
17
17
  while @bits_in_buffer >= 8
18
18
  @bits_in_buffer -= 8
19
19
  byte = (@buffer >> @bits_in_buffer) & 0xFF
20
- @data << byte.chr
21
- @data << "\x00".b if byte == 0xFF # byte stuffing
20
+ @data << byte
21
+ @data << 0x00 if byte == 0xFF # byte stuffing
22
22
  end
23
23
 
24
24
  @buffer &= (1 << @bits_in_buffer) - 1
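The hunk above swaps `byte.chr` for appending the Integer directly. A minimal sketch of that behavior follows, assuming the writer's buffer is a binary (ASCII-8BIT) String; the variable names here are illustrative and not taken from the gem.

```ruby
# Sketch of JPEG byte stuffing into a binary String.
data = String.new(encoding: Encoding::BINARY)

[0x12, 0xFF, 0x34].each do |byte|
  data << byte                  # Integer append, no temporary String from byte.chr
  data << 0x00 if byte == 0xFF  # a 0x00 follows every 0xFF in entropy-coded data
end

p data.bytes

# The same call on a UTF-8 String appends the encoded code point instead,
# so the buffer's encoding matters.
utf8 = String.new(encoding: Encoding::UTF_8)
utf8 << 0xFF
p utf8.bytesize
```

With a binary buffer, `String#<<` treats an Integer in 0..255 as a single raw byte, which is what the bit writer needs.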
data/lib/pure_jpeg/dct.rb CHANGED
@@ -1,10 +1,15 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PureJPEG
4
+ # Integer-scaled DCT based on the IJG (Independent JPEG Group) reference
5
+ # implementation (jfdctint.c / jidctint.c). Uses the Arai-Agui-Nakajima
6
+ # factorization with 13-bit fixed-point constants.
7
+ #
8
+ # All arithmetic is pure Integer (additions, shifts, multiplies) — no Float
9
+ # operations. This is ~3x faster than the matrix-multiply float DCT under
10
+ # YJIT and eliminates millions of Float object allocations during decode.
4
11
  module DCT
5
- # Precomputed 8x8 DCT matrix: A[k][n] = (C(k)/2) * cos((2n+1)*k*pi/16)
6
- # where C(0) = 1/sqrt(2), C(k) = 1 for k > 0.
7
- # This lets us do the 2D DCT as two 1D matrix-vector multiplies (separable).
12
+ # Keep the float matrix available for reference / testing
8
13
  MATRIX = Array.new(8) { |k|
9
14
  ck = k == 0 ? 0.5 / Math.sqrt(2.0) : 0.5
10
15
  Array.new(8) { |n|
@@ -12,72 +17,215 @@ module PureJPEG
12
17
  }
13
18
  }.freeze
14
19
 
15
- # Flatten for faster indexed access
16
20
  MATRIX_FLAT = MATRIX.flatten.freeze
17
-
18
- # Transposed matrix for inverse DCT: A^T[n][k] = A[k][n]
19
21
  MATRIX_T_FLAT = Array.new(64) { |i| MATRIX_FLAT[(i % 8) * 8 + i / 8] }.freeze
20
22
 
21
- # Separable forward 2D DCT: row pass then column pass.
22
- # Writes result into `out`. Uses `temp` as scratch space.
23
- # All three arrays must be pre-allocated with 64 elements.
24
- def self.forward!(block, temp, out)
25
- # Row pass: temp[y*8+u] = sum_x A[u][x] * block[y*8+x]
26
- m = MATRIX_FLAT
27
- 8.times do |y|
28
- y8 = y << 3
29
- b0 = block[y8]; b1 = block[y8|1]; b2 = block[y8|2]; b3 = block[y8|3]
30
- b4 = block[y8|4]; b5 = block[y8|5]; b6 = block[y8|6]; b7 = block[y8|7]
31
- 8.times do |u|
32
- u8 = u << 3
33
- temp[y8|u] = m[u8]*b0 + m[u8|1]*b1 + m[u8|2]*b2 + m[u8|3]*b3 +
34
- m[u8|4]*b4 + m[u8|5]*b5 + m[u8|6]*b6 + m[u8|7]*b7
35
- end
23
+ # Fixed-point constants (13-bit precision) from IJG reference.
24
+ CONST_BITS = 13
25
+ PASS1_BITS = 2
26
+
27
+ FIX_0_298631336 = 2446
28
+ FIX_0_390180644 = 3196
29
+ FIX_0_541196100 = 4433
30
+ FIX_0_765366865 = 6270
31
+ FIX_0_899976223 = 7373
32
+ FIX_1_175875602 = 9633
33
+ FIX_1_501321110 = 12299
34
+ FIX_1_847759065 = 15137
35
+ FIX_1_961570560 = 16069
36
+ FIX_2_053119869 = 16819
37
+ FIX_2_562915447 = 20995
38
+ FIX_3_072711026 = 25172
39
+
40
+ CB = CONST_BITS
41
+ P1 = PASS1_BITS
42
+ CB_M_P1 = CB - P1 # 11
43
+ CB_P_P1_P3 = CB + P1 + 3 # 18
44
+ P1_P3 = P1 + 3 # 5
45
+ CB2_P_P1 = CB * 2 + P1 # 28 (unused, was for column even-multiplied path)
46
+
47
+ # Forward 2D DCT (in-place). Input: 64-element array of level-shifted
48
+ # integers (-128..127). Output: DCT coefficients (integers).
49
+ # The `_temp` and `_out` parameters are accepted for API compatibility
50
+ # but ignored; computation is done in-place on `data`.
51
+ def self.forward!(data, _temp = nil, _out = nil)
52
+ # Pass 1: process rows
53
+ 8.times do |row|
54
+ i = row << 3
55
+ d0 = data[i]; d1 = data[i+1]; d2 = data[i+2]; d3 = data[i+3]
56
+ d4 = data[i+4]; d5 = data[i+5]; d6 = data[i+6]; d7 = data[i+7]
57
+
58
+ tmp0 = d0 + d7; tmp7 = d0 - d7
59
+ tmp1 = d1 + d6; tmp6 = d1 - d6
60
+ tmp2 = d2 + d5; tmp5 = d2 - d5
61
+ tmp3 = d3 + d4; tmp4 = d3 - d4
62
+
63
+ # Even part
64
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
65
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
66
+
67
+ data[i] = (tmp10 + tmp11) << P1
68
+ data[i+4] = (tmp10 - tmp11) << P1
69
+
70
+ z1 = (tmp12 + tmp13) * FIX_0_541196100
71
+ data[i+2] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
72
+ data[i+6] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
73
+
74
+ # Odd part
75
+ z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
76
+ z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
77
+ z5 = (z3 + z4) * FIX_1_175875602
78
+
79
+ tmp4 = tmp4 * FIX_0_298631336
80
+ tmp5 = tmp5 * FIX_2_053119869
81
+ tmp6 = tmp6 * FIX_3_072711026
82
+ tmp7 = tmp7 * FIX_1_501321110
83
+ z1 = z1 * -FIX_0_899976223
84
+ z2 = z2 * -FIX_2_562915447
85
+ z3 = z3 * -FIX_1_961570560 + z5
86
+ z4 = z4 * -FIX_0_390180644 + z5
87
+
88
+ data[i+7] = (tmp4 + z1 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
89
+ data[i+5] = (tmp5 + z2 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
90
+ data[i+3] = (tmp6 + z2 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
91
+ data[i+1] = (tmp7 + z1 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
36
92
  end
37
93
 
38
- # Column pass: out[v*8+u] = sum_y A[v][y] * temp[y*8+u]
39
- 8.times do |u|
40
- t0 = temp[u]; t1 = temp[8|u]; t2 = temp[16|u]; t3 = temp[24|u]
41
- t4 = temp[32|u]; t5 = temp[40|u]; t6 = temp[48|u]; t7 = temp[56|u]
42
- 8.times do |v|
43
- v8 = v << 3
44
- out[v8|u] = m[v8]*t0 + m[v8|1]*t1 + m[v8|2]*t2 + m[v8|3]*t3 +
45
- m[v8|4]*t4 + m[v8|5]*t5 + m[v8|6]*t6 + m[v8|7]*t7
46
- end
94
+ # Pass 2: process columns
95
+ 8.times do |col|
96
+ d0 = data[col]; d1 = data[col+8]; d2 = data[col+16]; d3 = data[col+24]
97
+ d4 = data[col+32]; d5 = data[col+40]; d6 = data[col+48]; d7 = data[col+56]
98
+
99
+ tmp0 = d0 + d7; tmp7 = d0 - d7
100
+ tmp1 = d1 + d6; tmp6 = d1 - d6
101
+ tmp2 = d2 + d5; tmp5 = d2 - d5
102
+ tmp3 = d3 + d4; tmp4 = d3 - d4
103
+
104
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
105
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
106
+
107
+ data[col] = (tmp10 + tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
108
+ data[col+32] = (tmp10 - tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
109
+
110
+ z1 = (tmp12 + tmp13) * FIX_0_541196100
111
+ data[col+16] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
112
+ data[col+48] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
113
+
114
+ z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
115
+ z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
116
+ z5 = (z3 + z4) * FIX_1_175875602
117
+
118
+ tmp4 = tmp4 * FIX_0_298631336
119
+ tmp5 = tmp5 * FIX_2_053119869
120
+ tmp6 = tmp6 * FIX_3_072711026
121
+ tmp7 = tmp7 * FIX_1_501321110
122
+ z1 = z1 * -FIX_0_899976223
123
+ z2 = z2 * -FIX_2_562915447
124
+ z3 = z3 * -FIX_1_961570560 + z5
125
+ z4 = z4 * -FIX_0_390180644 + z5
126
+
127
+ data[col+56] = (tmp4 + z1 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
128
+ data[col+40] = (tmp5 + z2 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
129
+ data[col+24] = (tmp6 + z2 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
130
+ data[col+8] = (tmp7 + z1 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
47
131
  end
48
132
 
49
- out
133
+ data
50
134
  end
51
135
 
52
- # Separable inverse 2D DCT: same structure as forward but using A^T.
53
- # f = A^T * F * A
54
- def self.inverse!(block, temp, out)
55
- mt = MATRIX_T_FLAT
56
-
57
- # Row pass: temp[v*8+x] = sum_u A^T[x][u] * block[v*8+u]
58
- 8.times do |v|
59
- v8 = v << 3
60
- b0 = block[v8]; b1 = block[v8|1]; b2 = block[v8|2]; b3 = block[v8|3]
61
- b4 = block[v8|4]; b5 = block[v8|5]; b6 = block[v8|6]; b7 = block[v8|7]
62
- 8.times do |x|
63
- x8 = x << 3
64
- temp[v8|x] = mt[x8]*b0 + mt[x8|1]*b1 + mt[x8|2]*b2 + mt[x8|3]*b3 +
65
- mt[x8|4]*b4 + mt[x8|5]*b5 + mt[x8|6]*b6 + mt[x8|7]*b7
66
- end
136
+ # Inverse 2D DCT (in-place). Input: dequantized DCT coefficients (integers).
137
+ # Output: spatial-domain values (integers) that still need +128 level shift.
138
+ def self.inverse!(data, _temp = nil, _out = nil)
139
+ # Pass 1: process columns
140
+ 8.times do |col|
141
+ d0 = data[col]; d2 = data[col+16]; d4 = data[col+32]; d6 = data[col+48]
142
+ d1 = data[col+8]; d3 = data[col+24]; d5 = data[col+40]; d7 = data[col+56]
143
+
144
+ # Even part
145
+ z1 = (d2 + d6) * FIX_0_541196100
146
+ tmp2 = z1 - d6 * FIX_1_847759065
147
+ tmp3 = z1 + d2 * FIX_0_765366865
148
+
149
+ tmp0 = (d0 + d4) << CB
150
+ tmp1 = (d0 - d4) << CB
151
+
152
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
153
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
154
+
155
+ # Odd part
156
+ tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
157
+ z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
158
+ z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
159
+ z5 = (z3 + z4) * FIX_1_175875602
160
+
161
+ tmp0 = tmp0 * FIX_0_298631336
162
+ tmp1 = tmp1 * FIX_2_053119869
163
+ tmp2 = tmp2 * FIX_3_072711026
164
+ tmp3 = tmp3 * FIX_1_501321110
165
+ z1 = z1 * -FIX_0_899976223
166
+ z2 = z2 * -FIX_2_562915447
167
+ z3 = z3 * -FIX_1_961570560 + z5
168
+ z4 = z4 * -FIX_0_390180644 + z5
169
+
170
+ tmp0 += z1 + z3; tmp1 += z2 + z4
171
+ tmp2 += z2 + z3; tmp3 += z1 + z4
172
+
173
+ data[col] = (tmp10 + tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
174
+ data[col+56] = (tmp10 - tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
175
+ data[col+8] = (tmp11 + tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
176
+ data[col+48] = (tmp11 - tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
177
+ data[col+16] = (tmp12 + tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
178
+ data[col+40] = (tmp12 - tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
179
+ data[col+24] = (tmp13 + tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
180
+ data[col+32] = (tmp13 - tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
67
181
  end
68
182
 
69
- # Column pass: out[y*8+x] = sum_v A^T[y][v] * temp[v*8+x]
70
- 8.times do |x|
71
- t0 = temp[x]; t1 = temp[8|x]; t2 = temp[16|x]; t3 = temp[24|x]
72
- t4 = temp[32|x]; t5 = temp[40|x]; t6 = temp[48|x]; t7 = temp[56|x]
73
- 8.times do |y|
74
- y8 = y << 3
75
- out[y8|x] = mt[y8]*t0 + mt[y8|1]*t1 + mt[y8|2]*t2 + mt[y8|3]*t3 +
76
- mt[y8|4]*t4 + mt[y8|5]*t5 + mt[y8|6]*t6 + mt[y8|7]*t7
77
- end
183
+ # Pass 2: process rows
184
+ 8.times do |row|
185
+ i = row << 3
186
+ d0 = data[i]; d2 = data[i+2]; d4 = data[i+4]; d6 = data[i+6]
187
+ d1 = data[i+1]; d3 = data[i+3]; d5 = data[i+5]; d7 = data[i+7]
188
+
189
+ # Even part
190
+ z1 = (d2 + d6) * FIX_0_541196100
191
+ tmp2 = z1 - d6 * FIX_1_847759065
192
+ tmp3 = z1 + d2 * FIX_0_765366865
193
+
194
+ tmp0 = (d0 + d4) << CB
195
+ tmp1 = (d0 - d4) << CB
196
+
197
+ tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
198
+ tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
199
+
200
+ # Odd part
201
+ tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
202
+ z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
203
+ z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
204
+ z5 = (z3 + z4) * FIX_1_175875602
205
+
206
+ tmp0 = tmp0 * FIX_0_298631336
207
+ tmp1 = tmp1 * FIX_2_053119869
208
+ tmp2 = tmp2 * FIX_3_072711026
209
+ tmp3 = tmp3 * FIX_1_501321110
210
+ z1 = z1 * -FIX_0_899976223
211
+ z2 = z2 * -FIX_2_562915447
212
+ z3 = z3 * -FIX_1_961570560 + z5
213
+ z4 = z4 * -FIX_0_390180644 + z5
214
+
215
+ tmp0 += z1 + z3; tmp1 += z2 + z4
216
+ tmp2 += z2 + z3; tmp3 += z1 + z4
217
+
218
+ data[i] = (tmp10 + tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
219
+ data[i+7] = (tmp10 - tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
220
+ data[i+1] = (tmp11 + tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
221
+ data[i+6] = (tmp11 - tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
222
+ data[i+2] = (tmp12 + tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
223
+ data[i+5] = (tmp12 - tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
224
+ data[i+3] = (tmp13 + tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
225
+ data[i+4] = (tmp13 - tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
78
226
  end
79
227
 
80
- out
228
+ data
81
229
  end
82
230
  end
83
231
  end
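The fixed-point pieces of the new DCT can be checked in isolation. The sketch below shows how the `FIX_*` values follow from 13-bit scaling, how the repeated `(x + (1 << (n - 1))) >> n` pattern rounds, and what the DC path produces for a flat block. The helper names `fix`, `descale` and `flat_block_dc` are hypothetical and do not exist in the gem. As a side note, the constants match those in IJG's `jfdctint.c`, which, as far as I recall, describes its algorithm as Loeffler-Ligtenberg-Moschytz, with the AAN variant living in `jfdctfst.c`.

```ruby
CONST_BITS = 13
PASS1_BITS = 2

# Scale a real constant to CONST_BITS of fixed-point precision.
def fix(x)
  (x * (1 << CONST_BITS)).round
end

# Rounding right shift: add half of 2**n, then shift (ties round up).
def descale(x, n)
  (x + (1 << (n - 1))) >> n
end

# DC path only, for a block whose 64 samples all equal v.
def flat_block_dc(v)
  row_dc = (8 * v) << PASS1_BITS        # pass 1 keeps PASS1_BITS extra bits
  descale(8 * row_dc, PASS1_BITS + 3)   # pass 2 drops PASS1_BITS + 3 bits
end

puts fix(0.541196100)    # matches FIX_0_541196100
puts descale(-7, 2)      # -1.75 rounds to -2
puts flat_block_dc(100)  # a flat block of 100 gives a DC term of 800
```

The DC term of 8 times the sample value agrees with the float `MATRIX` kept for reference, where each pass contributes a factor of 8 / (2 * sqrt(2)).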
@@ -78,10 +78,8 @@ module PureJPEG
78
78
 
79
79
  # Reusable buffers
80
80
  zigzag = Array.new(64, 0)
81
- raster = Array.new(64, 0.0)
82
- dequant = Array.new(64, 0.0)
83
- temp = Array.new(64, 0.0)
84
- spatial = Array.new(64, 0.0)
81
+ raster = Array.new(64, 0)
82
+ dequant = Array.new(64, 0)
85
83
 
86
84
  mcus_y.times do |mcu_row|
87
85
  mcus_x.times do |mcu_col|
@@ -104,12 +102,12 @@ module PureJPEG
104
102
  # Inverse pipeline: unzigzag -> dequantize -> IDCT -> level shift
105
103
  Zigzag.unreorder!(zigzag, raster)
106
104
  Quantization.dequantize!(raster, qt, dequant)
107
- DCT.inverse!(dequant, temp, spatial)
105
+ DCT.inverse!(dequant)
108
106
 
109
107
  # Write block into channel buffer
110
108
  bx = (mcu_col * comp.h_sampling + bh) * 8
111
109
  by = (mcu_row * comp.v_sampling + bv) * 8
112
- write_block(spatial, ch[:data], ch[:width], bx, by)
110
+ write_block(dequant, ch[:data], ch[:width], bx, by)
113
111
  end
114
112
  end
115
113
  end
@@ -204,10 +202,8 @@ module PureJPEG
204
202
  end
205
203
 
206
204
  zigzag = Array.new(64, 0)
207
- raster = Array.new(64, 0.0)
208
- dequant = Array.new(64, 0.0)
209
- temp = Array.new(64, 0.0)
210
- spatial = Array.new(64, 0.0)
205
+ raster = Array.new(64, 0)
206
+ dequant = Array.new(64, 0)
211
207
 
212
208
  jfif.components.each do |c|
213
209
  qt = fetch_quant_table!(jfif, c)
@@ -222,8 +218,8 @@ module PureJPEG
222
218
 
223
219
  Zigzag.unreorder!(zigzag, raster)
224
220
  Quantization.dequantize!(raster, qt, dequant)
225
- DCT.inverse!(dequant, temp, spatial)
226
- write_block(spatial, ch[:data], ch[:width], block_x * 8, block_y * 8)
221
+ DCT.inverse!(dequant)
222
+ write_block(dequant, ch[:data], ch[:width], block_x * 8, block_y * 8)
227
223
  end
228
224
  end
229
225
  end
@@ -460,12 +456,16 @@ module PureJPEG
460
456
  # Write an 8x8 spatial block (level-shifted by +128) into a channel buffer.
461
457
  def write_block(spatial, channel, ch_width, bx, by)
462
458
  8.times do |row|
463
- dst_row = (by + row) * ch_width + bx
464
- row8 = row << 3
465
- 8.times do |col|
466
- val = (spatial[row8 | col] + 128.0).round
467
- channel[dst_row + col] = val < 0 ? 0 : (val > 255 ? 255 : val)
468
- end
459
+ dst = (by + row) * ch_width + bx
460
+ r8 = row << 3
461
+ v = spatial[r8] + 128; channel[dst] = v < 0 ? 0 : (v > 255 ? 255 : v)
462
+ v = spatial[r8 | 1] + 128; channel[dst + 1] = v < 0 ? 0 : (v > 255 ? 255 : v)
463
+ v = spatial[r8 | 2] + 128; channel[dst + 2] = v < 0 ? 0 : (v > 255 ? 255 : v)
464
+ v = spatial[r8 | 3] + 128; channel[dst + 3] = v < 0 ? 0 : (v > 255 ? 255 : v)
465
+ v = spatial[r8 | 4] + 128; channel[dst + 4] = v < 0 ? 0 : (v > 255 ? 255 : v)
466
+ v = spatial[r8 | 5] + 128; channel[dst + 5] = v < 0 ? 0 : (v > 255 ? 255 : v)
467
+ v = spatial[r8 | 6] + 128; channel[dst + 6] = v < 0 ? 0 : (v > 255 ? 255 : v)
468
+ v = spatial[r8 | 7] + 128; channel[dst + 7] = v < 0 ? 0 : (v > 255 ? 255 : v)
469
469
  end
470
470
  end
471
471
 
@@ -493,18 +493,27 @@ module PureJPEG
493
493
 
494
494
  def assemble_grayscale(width, height, channels, comp)
495
495
  ch = channels[comp.id]
496
+ ch_data = ch[:data]
497
+ ch_width = ch[:width]
496
498
  pixels = Array.new(width * height)
497
499
  height.times do |y|
498
- src_row = y * ch[:width]
500
+ src_row = y * ch_width
499
501
  dst_row = y * width
500
502
  width.times do |x|
501
- v = ch[:data][src_row + x]
503
+ v = ch_data[src_row + x]
502
504
  pixels[dst_row + x] = (v << 16) | (v << 8) | v
503
505
  end
504
506
  end
505
507
  Image.new(width, height, pixels, icc_profile: @icc_profile)
506
508
  end
507
509
 
510
+ # Fixed-point coefficients (scaled by 2^16) for YCbCr→RGB.
511
+ FP_R_CR = 91881 # 1.402 * 65536
512
+ FP_G_CB = -22554 # -0.344136 * 65536
513
+ FP_G_CR = -46802 # -0.714136 * 65536
514
+ FP_B_CB = 116130 # 1.772 * 65536
515
+ FP_HALF = 32768 # rounding bias
516
+
508
517
  def assemble_color(width, height, channels, components, max_h, max_v)
509
518
  # Upsample chroma channels if needed and convert YCbCr to RGB
510
519
  y_comp, cb_comp, cr_comp = resolve_color_components(components)
@@ -513,29 +522,39 @@ module PureJPEG
513
522
  cb_ch = channels[cb_comp.id]
514
523
  cr_ch = channels[cr_comp.id]
515
524
 
525
+ y_data = y_ch[:data]
526
+ cb_data = cb_ch[:data]
527
+ cr_data = cr_ch[:data]
528
+ y_stride = y_ch[:width]
529
+ cb_stride = cb_ch[:width]
530
+ cr_stride = cr_ch[:width]
531
+ cb_h = cb_comp.h_sampling
532
+ cb_v = cb_comp.v_sampling
533
+ cr_h = cr_comp.h_sampling
534
+ cr_v = cr_comp.v_sampling
535
+
516
536
  pixels = Array.new(width * height)
517
537
 
518
538
  height.times do |py|
519
539
  dst_row = py * width
520
- y_row = py * y_ch[:width]
540
+ y_row = py * y_stride
521
541
 
522
542
  # Chroma coordinates (nearest-neighbor upsampling)
523
- cb_y = (py * cb_comp.v_sampling) / max_v
524
- cr_y = (py * cr_comp.v_sampling) / max_v
525
- cb_row = cb_y * cb_ch[:width]
526
- cr_row = cr_y * cr_ch[:width]
543
+ cb_row = ((py * cb_v) / max_v) * cb_stride
544
+ cr_row = ((py * cr_v) / max_v) * cr_stride
527
545
 
528
546
  width.times do |px|
529
- lum = y_ch[:data][y_row + px]
547
+ lum = y_data[y_row + px]
530
548
 
531
- cb_x = (px * cb_comp.h_sampling) / max_h
532
- cr_x = (px * cr_comp.h_sampling) / max_h
533
- cb = cb_ch[:data][cb_row + cb_x] - 128.0
534
- cr = cr_ch[:data][cr_row + cr_x] - 128.0
549
+ cb_x = (px * cb_h) / max_h
550
+ cr_x = (px * cr_h) / max_h
551
+ cb_val = cb_data[cb_row + cb_x] - 128
552
+ cr_val = cr_data[cr_row + cr_x] - 128
535
553
 
536
- r = (lum + 1.402 * cr).round
537
- g = (lum - 0.344136 * cb - 0.714136 * cr).round
538
- b = (lum + 1.772 * cb).round
554
+ # Fixed-point YCbCr→RGB (all integer arithmetic)
555
+ r = lum + ((FP_R_CR * cr_val + FP_HALF) >> 16)
556
+ g = lum + ((FP_G_CB * cb_val + FP_G_CR * cr_val + FP_HALF) >> 16)
557
+ b = lum + ((FP_B_CB * cb_val + FP_HALF) >> 16)
539
558
 
540
559
  r = r < 0 ? 0 : (r > 255 ? 255 : r)
541
560
  g = g < 0 ? 0 : (g > 255 ? 255 : g)
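To see how closely the 16-bit fixed-point conversion above tracks the old Float formulas, here is a self-contained sketch that runs both over a grid of inputs. The constants are copied from the hunk; the method names are illustrative only. Note that `-22554` corresponds to rounding 0.34414 × 65536 rather than the 0.344136 shown in the comment, a difference far too small to matter at 8-bit precision.

```ruby
FP_R_CR = 91881
FP_G_CB = -22554
FP_G_CR = -46802
FP_B_CB = 116130
FP_HALF = 32768

def clamp255(v)
  v < 0 ? 0 : (v > 255 ? 255 : v)
end

# Integer-only conversion, mirroring the arithmetic in the diff.
def ycbcr_to_rgb_fixed(y, cb, cr)
  cb -= 128
  cr -= 128
  r = y + ((FP_R_CR * cr + FP_HALF) >> 16)
  g = y + ((FP_G_CB * cb + FP_G_CR * cr + FP_HALF) >> 16)
  b = y + ((FP_B_CB * cb + FP_HALF) >> 16)
  [clamp255(r), clamp255(g), clamp255(b)]
end

# Float reference, mirroring the removed lines.
def ycbcr_to_rgb_float(y, cb, cr)
  cb -= 128.0
  cr -= 128.0
  [(y + 1.402 * cr).round,
   (y - 0.344136 * cb - 0.714136 * cr).round,
   (y + 1.772 * cb).round].map { |v| clamp255(v) }
end

worst = 0
(0..255).step(5) do |y|
  (0..255).step(5) do |cb|
    (0..255).step(5) do |cr|
      fixed = ycbcr_to_rgb_fixed(y, cb, cr)
      float = ycbcr_to_rgb_float(y, cb, cr)
      diff = fixed.zip(float).map { |a, b| (a - b).abs }.max
      worst = diff if diff > worst
    end
  end
end
puts "largest per-channel difference: #{worst}"
```

Any remaining difference comes from tie-breaking and coefficient rounding, so it stays within one level per channel.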
@@ -76,17 +76,27 @@ module PureJPEG
76
76
 
77
77
  def build_lum_qtable
78
78
  table = @luminance_table || Quantization.scale_table(Quantization::LUMINANCE_BASE, quality)
79
- table = @quantization_modifier.call(table, :luminance) if @quantization_modifier
79
+ table = apply_quantization_modifier(table, :luminance) if @quantization_modifier
80
80
  table
81
81
  end
82
82
 
83
83
  def build_chr_qtable
84
84
  table = @chrominance_table || Quantization.scale_table(Quantization::CHROMINANCE_BASE, @chroma_quality)
85
- table = @quantization_modifier.call(table, :chrominance) if @quantization_modifier
85
+ table = apply_quantization_modifier(table, :chrominance) if @quantization_modifier
86
86
  table
87
87
  end
88
88
 
89
+ def apply_quantization_modifier(table, channel)
90
+ modified = @quantization_modifier.call(table, channel)
91
+ validate_qtable!(modified, "quantization_modifier result for #{channel}")
92
+ modified
93
+ end
94
+
89
95
  def validate_qtable!(table, name)
96
+ unless table.respond_to?(:length) && table.respond_to?(:all?)
97
+ raise ArgumentError, "#{name} must be a 64-element array of integers between 1 and 255"
98
+ end
99
+
90
100
  raise ArgumentError, "#{name} must have exactly 64 elements (got #{table.length})" unless table.length == 64
91
101
  unless table.all? { |v| v.is_a?(Integer) && v >= 1 && v <= 255 }
92
102
  raise ArgumentError, "#{name} elements must be integers between 1 and 255"
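The effect of validating a `quantization_modifier` result can be sketched outside the encoder. The checks below are copied from the hunk; returning the table from `validate_qtable!` and the `apply_modifier` wrapper are assumptions made for this example, not the gem's actual signatures.

```ruby
# Standalone sketch of the validation shown in the diff.
def validate_qtable!(table, name)
  unless table.respond_to?(:length) && table.respond_to?(:all?)
    raise ArgumentError, "#{name} must be a 64-element array of integers between 1 and 255"
  end

  raise ArgumentError, "#{name} must have exactly 64 elements (got #{table.length})" unless table.length == 64
  unless table.all? { |v| v.is_a?(Integer) && v >= 1 && v <= 255 }
    raise ArgumentError, "#{name} elements must be integers between 1 and 255"
  end
  table # assumption: returned here for convenience
end

# Hypothetical wrapper standing in for the encoder's private method.
def apply_modifier(modifier, table, channel)
  modified = modifier.call(table, channel)
  validate_qtable!(modified, "quantization_modifier result for #{channel}")
end

base = Array.new(64, 16)
doubled = apply_modifier(->(t, _ch) { t.map { |v| [v * 2, 255].min } }, base, :luminance)
p doubled.first

begin
  # A block that forgets to return the table now fails loudly.
  apply_modifier(->(_t, _ch) { nil }, base, :luminance)
rescue ArgumentError => e
  puts e.message
end
```

The `respond_to?` guard is what turns a `nil` return into an `ArgumentError` instead of a `NoMethodError` on `length`.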
@@ -195,17 +205,14 @@ module PureJPEG
195
205
  padded_w = (width + 7) & ~7
196
206
  padded_h = (height + 7) & ~7
197
207
 
198
- block = Array.new(64, 0.0)
199
- temp = Array.new(64, 0.0)
200
- dct = Array.new(64, 0.0)
208
+ block = Array.new(64, 0)
201
209
  qbuf = Array.new(64, 0)
202
210
  zbuf = Array.new(64, 0)
203
211
 
204
212
  (0...padded_h).step(8) do |by|
205
213
  (0...padded_w).step(8) do |bx|
206
214
  extract_block_into(y_data, width, height, bx, by, block)
207
- transform_block(block, temp, dct, qbuf, zbuf, qtable)
208
- yield zbuf
215
+ yield transform_block(block, qbuf, zbuf, qtable)
209
216
  end
210
217
  end
211
218
  end
@@ -268,37 +275,29 @@ module PureJPEG
268
275
  mcu_w = (width + 15) & ~15
269
276
  mcu_h = (height + 15) & ~15
270
277
 
271
- block = Array.new(64, 0.0)
272
- temp = Array.new(64, 0.0)
273
- dct = Array.new(64, 0.0)
278
+ block = Array.new(64, 0)
274
279
  qbuf = Array.new(64, 0)
275
280
  zbuf = Array.new(64, 0)
276
281
 
277
282
  (0...mcu_h).step(16) do |my|
278
283
  (0...mcu_w).step(16) do |mx|
279
284
  extract_block_into(y_data, width, height, mx, my, block)
280
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
281
- yield :y, zbuf
285
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
282
286
 
283
287
  extract_block_into(y_data, width, height, mx + 8, my, block)
284
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
285
- yield :y, zbuf
288
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
286
289
 
287
290
  extract_block_into(y_data, width, height, mx, my + 8, block)
288
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
289
- yield :y, zbuf
291
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
290
292
 
291
293
  extract_block_into(y_data, width, height, mx + 8, my + 8, block)
292
- transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
293
- yield :y, zbuf
294
+ yield :y, transform_block(block, qbuf, zbuf, lum_qt)
294
295
 
295
296
  extract_block_into(cb_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
296
- transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
297
- yield :cb, zbuf
297
+ yield :cb, transform_block(block, qbuf, zbuf, chr_qt)
298
298
 
299
299
  extract_block_into(cr_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
300
- transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
301
- yield :cr, zbuf
300
+ yield :cr, transform_block(block, qbuf, zbuf, chr_qt)
302
301
  end
303
302
  end
304
303
  end
@@ -323,9 +322,9 @@ module PureJPEG
323
322
 
324
323
  # --- Shared block pipeline (all buffers pre-allocated) ---
325
324
 
326
- def transform_block(block, temp, dct, qbuf, zbuf, qtable)
327
- DCT.forward!(block, temp, dct)
328
- Quantization.quantize!(dct, qtable, qbuf)
325
+ def transform_block(block, qbuf, zbuf, qtable)
326
+ DCT.forward!(block)
327
+ Quantization.quantize!(block, qtable, qbuf)
329
328
  Zigzag.reorder!(qbuf, zbuf)
330
329
  zbuf
331
330
  end
@@ -342,26 +341,42 @@ module PureJPEG
342
341
  end
343
342
  end
344
343
 
344
+ # Fixed-point coefficients (scaled by 2^16 = 65536) for RGB→YCbCr.
345
+ # Y = 0.299*R + 0.587*G + 0.114*B
346
+ # Cb = -0.168736*R - 0.331264*G + 0.5*B + 128
347
+ # Cr = 0.5*R - 0.418688*G - 0.081312*B + 128
348
+ FP_Y_R = 19595; FP_Y_G = 38470; FP_Y_B = 7471
349
+ FP_CB_R = -11058; FP_CB_G = -21710; FP_CB_B = 32768
350
+ FP_CR_R = 32768; FP_CR_G = -27440; FP_CR_B = -5328
351
+ FP_HALF = 32768 # rounding bias
352
+ FP_128 = 8388608 # 128 << 16
353
+
354
+ def clamp255(v)
355
+ v < 0 ? 0 : (v > 255 ? 255 : v)
356
+ end
357
+
345
358
  def extract_luminance(width, height)
346
359
  luminance = Array.new(width * height)
347
360
  if source.respond_to?(:packed_pixels)
348
361
  packed = source.packed_pixels
349
362
  r_shift, g_shift, b_shift = packed_shifts
363
+ n = width * height
350
364
  i = 0
351
- (width * height).times do
365
+ n.times do
352
366
  color = packed[i]
353
367
  r = (color >> r_shift) & 0xFF
354
368
  g = (color >> g_shift) & 0xFF
355
369
  b = (color >> b_shift) & 0xFF
356
- luminance[i] = (0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
370
+ luminance[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
357
371
  i += 1
358
372
  end
359
373
  else
360
- height.times do |y|
361
- row = y * width
362
- width.times do |x|
363
- pixel = source[x, y]
364
- luminance[row + x] = (0.299 * pixel.r + 0.587 * pixel.g + 0.114 * pixel.b).round.clamp(0, 255)
374
+ height.times do |py|
375
+ row = py * width
376
+ width.times do |px|
377
+ pixel = source[px, py]
378
+ r = pixel.r; g = pixel.g; b = pixel.b
379
+ luminance[row + px] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
365
380
  end
366
381
  end
367
382
  end
@@ -383,9 +398,9 @@ module PureJPEG
383
398
  r = (color >> r_shift) & 0xFF
384
399
  g = (color >> g_shift) & 0xFF
385
400
  b = (color >> b_shift) & 0xFF
386
- y_data[i] = ( 0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
387
- cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255)
388
- cr_data[i] = ( 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
401
+ y_data[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
402
+ cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
403
+ cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
389
404
  i += 1
390
405
  end
391
406
  else
@@ -395,9 +410,9 @@ module PureJPEG
395
410
  pixel = source[px, py]
396
411
  r = pixel.r; g = pixel.g; b = pixel.b
397
412
  i = row + px
398
- y_data[i] = ( 0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
399
- cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0).round.clamp(0, 255)
400
- cr_data[i] = ( 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
413
+ y_data[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
414
+ cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
415
+ cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
401
416
  end
402
417
  end
403
418
  end
@@ -432,13 +447,16 @@ module PureJPEG
432
447
  8.times do |row|
433
448
  sy = by + row
434
449
  sy = max_y if sy > max_y
435
- src_row = sy * width
436
- row8 = row << 3
437
- 8.times do |col|
438
- sx = bx + col
439
- sx = max_x if sx > max_x
440
- block[row8 | col] = channel[src_row + sx] - 128.0
441
- end
450
+ src = sy * width
451
+ r8 = row << 3
452
+ x = bx; block[r8] = channel[src + (x > max_x ? max_x : x)] - 128
453
+ x = bx + 1; block[r8 | 1] = channel[src + (x > max_x ? max_x : x)] - 128
454
+ x = bx + 2; block[r8 | 2] = channel[src + (x > max_x ? max_x : x)] - 128
455
+ x = bx + 3; block[r8 | 3] = channel[src + (x > max_x ? max_x : x)] - 128
456
+ x = bx + 4; block[r8 | 4] = channel[src + (x > max_x ? max_x : x)] - 128
457
+ x = bx + 5; block[r8 | 5] = channel[src + (x > max_x ? max_x : x)] - 128
458
+ x = bx + 6; block[r8 | 6] = channel[src + (x > max_x ? max_x : x)] - 128
459
+ x = bx + 7; block[r8 | 7] = channel[src + (x > max_x ? max_x : x)] - 128
442
460
  end
443
461
  block
444
462
  end
@@ -3,17 +3,22 @@
3
3
  module PureJPEG
4
4
  module Huffman
5
5
  class Encoder
6
- def self.category_and_bits(value)
7
- return [0, 0] if value == 0
8
- abs_val = value.abs
6
+ # Return the Huffman category (bit length) for a value.
7
+ # Avoids Array allocation compared to the combined category_and_bits.
8
+ def self.category(value)
9
+ return 0 if value == 0
10
+ v = value.abs
9
11
  cat = 0
10
- v = abs_val
11
12
  while v > 0
12
13
  cat += 1
13
14
  v >>= 1
14
15
  end
15
- bits = value > 0 ? value : value + (1 << cat) - 1
16
- [cat, bits]
16
+ cat
17
+ end
18
+
19
+ # Return the extra bits to encode for a value with the given category.
20
+ def self.value_bits(value, cat)
21
+ value > 0 ? value : value + (1 << cat) - 1
17
22
  end
18
23
 
19
24
  def self.each_ac_item(zigzag)
@@ -39,7 +44,7 @@ module PureJPEG
39
44
  end
40
45
 
41
46
  value = zigzag[i]
42
- cat, = category_and_bits(value)
47
+ cat = category(value)
43
48
  yield (run << 4) | cat, value
44
49
  i += 1
45
50
  end
@@ -73,10 +78,10 @@ module PureJPEG
73
78
  private
74
79
 
75
80
  def encode_dc(diff, writer)
76
- cat, bits = self.class.category_and_bits(diff)
81
+ cat = self.class.category(diff)
77
82
  code, length = @dc_table[cat]
78
83
  writer.write_bits(code, length)
79
- writer.write_bits(bits, cat) if cat > 0
84
+ writer.write_bits(self.class.value_bits(diff, cat), cat) if cat > 0
80
85
  end
81
86
 
82
87
  def encode_ac(zigzag, writer)
@@ -85,8 +90,8 @@ module PureJPEG
85
90
  writer.write_bits(code, length)
86
91
  next if symbol == 0x00 || symbol == 0xF0
87
92
 
88
- cat, bits = self.class.category_and_bits(value)
89
- writer.write_bits(bits, cat)
93
+ cat = self.class.category(value)
94
+ writer.write_bits(self.class.value_bits(value, cat), cat)
90
95
  end
91
96
  end
92
97
  end
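Reviewer note on the Huffman hunks above: `category_and_bits` returned a two-element Array on every call, and the split into `category` and `value_bits` removes that allocation. The sketch below copies the two method bodies from the diff into a hypothetical `HuffmanSketch` module to show what they return for a few coefficients.

```ruby
# Hypothetical container; in the gem these are class methods on Huffman::Encoder.
module HuffmanSketch
  # Number of bits needed to represent |value| (the JPEG magnitude category).
  def self.category(value)
    return 0 if value == 0
    v = value.abs
    cat = 0
    while v > 0
      cat += 1
      v >>= 1
    end
    cat
  end

  # Extra bits written after the Huffman code. Negative values are stored
  # as value + (2**cat - 1), so their top bit is always 0.
  def self.value_bits(value, cat)
    value > 0 ? value : value + (1 << cat) - 1
  end
end

[5, -5, 1, -1, 255].each do |coef|
  cat = HuffmanSketch.category(coef)
  puts "#{coef}: cat=#{cat} bits=#{HuffmanSketch.value_bits(coef, cat)}"
end
```

For these inputs the category matches `value.abs.bit_length` from Ruby core. The diff keeps the explicit loop, so that equivalence is an observation rather than something the gem relies on.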
@@ -104,7 +109,7 @@ module PureJPEG
104
109
  diff = zigzag[0] - @prev_dc[state_key]
105
110
  @prev_dc[state_key] = zigzag[0]
106
111
 
107
- cat, = Encoder.category_and_bits(diff)
112
+ cat = Encoder.category(diff)
108
113
  @dc_frequencies[cat] += 1
109
114
 
110
115
  Encoder.each_ac_symbol(zigzag) do |symbol|
@@ -36,9 +36,20 @@ module PureJPEG
36
36
  }
37
37
  end
38
38
 
39
- # Quantize a 64-element DCT block in place into `out`.
39
+ # Quantize a 64-element DCT block into `out`.
40
+ # Uses integer rounding division (round-to-nearest) to match the
41
+ # behavior of Float division + round from the previous float DCT.
40
42
  def self.quantize!(block, table, out)
41
- 64.times { |i| out[i] = (block[i] / table[i]).round }
43
+ i = 0
44
+ while i < 64
45
+ v = block[i]; t = table[i]
46
+ out[i] = if v >= 0
47
+ (v + (t >> 1)) / t
48
+ else
49
+ -((-v + (t >> 1)) / t)
50
+ end
51
+ i += 1
52
+ end
42
53
  out
43
54
  end
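Reviewer note on `quantize!` above: the sign branch matters because Ruby's `Integer#/` floors toward negative infinity. The sketch below isolates the per-element rule in a hypothetical `round_div` helper and compares it with the removed `(v / t).round` behavior, assuming float inputs and positive table entries.

```ruby
# Hypothetical helper: the per-element rounding rule from the new quantize!.
def round_div(v, t)
  if v >= 0
    (v + (t >> 1)) / t
  else
    -((-v + (t >> 1)) / t)
  end
end

# Unbranched variant, shown only to illustrate why the branch is needed.
def naive_round_div(v, t)
  (v + (t >> 1)) / t
end

puts round_div(15, 10)         # 2
puts round_div(-15, 10)        # -2, same as (-1.5).round
puts naive_round_div(-15, 10)  # -1, floor division rounds the half upward
```

Rounding the magnitude and then restoring the sign gives round-half-away-from-zero, which is what `Float#round` does by default.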
44
55
 
@@ -27,8 +27,7 @@ module PureJPEG
27
27
  def initialize(width, height, &block)
28
28
  @width = width
29
29
  @height = height
30
- black = Pixel.new(0, 0, 0)
31
- @pixels = Array.new(width * height, black)
30
+ @pixels = Array.new(width * height) { Pixel.new(0, 0, 0) }
32
31
 
33
32
  if block
34
33
  height.times do |y|
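Reviewer note on the `initialize` hunk above: `Array.new(n, obj)` stores the same object in every slot, whereas the block form builds a fresh object per slot. The gem's `Pixel` class is not shown in this diff, so the sketch below uses a hypothetical mutable `DemoPixel` struct. If the real `Pixel` is immutable, the sharing would be harmless in practice.

```ruby
# Hypothetical stand-in for the gem's Pixel class (assumed mutable here).
DemoPixel = Struct.new(:r, :g, :b)

# Old form: every element is the same object.
shared = Array.new(3, DemoPixel.new(0, 0, 0))
shared[0].r = 255
puts shared[1].r                  # 255, the change shows up in every slot
puts shared[0].equal?(shared[2])  # true

# New form: the block runs once per index, so each element is distinct.
distinct = Array.new(3) { DemoPixel.new(0, 0, 0) }
distinct[0].r = 255
puts distinct[1].r                    # 0
puts distinct[0].equal?(distinct[2])  # false
```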
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PureJPEG
4
- VERSION = "0.3.0"
4
+ VERSION = "0.3.2"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pure_jpeg
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Peter Cooper