RubyGems - pure_jpeg - Versions diffs - 0.3.0 → 0.3.2 - Mend

pure_jpeg 0.3.0 → 0.3.2


Files changed (12)

checksums.yaml +4 -4
data/CHANGELOG.md +24 -0
data/README.md +12 -7
data/lib/pure_jpeg/bit_writer.rb +2 -2
data/lib/pure_jpeg/dct.rb +204 -56
data/lib/pure_jpeg/decoder.rb +52 -33
data/lib/pure_jpeg/encoder.rb +63 -45
data/lib/pure_jpeg/huffman/encoder.rb +17 -12
data/lib/pure_jpeg/quantization.rb +13 -2
data/lib/pure_jpeg/source/raw_source.rb +1 -2
data/lib/pure_jpeg/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 66b5d6fe1b663128f62aae8111b55a9b2ddbe9739d501fcf1146c459286433da
-  data.tar.gz: 02ae6cdc25f520221fee4adfe9d6070c4a975602e5de2b34a0b9ab8b4e005829
+  metadata.gz: 780f932b176fecdb5daab2546909fe2610325e54f7364cf482e3e2652ab614a5
+  data.tar.gz: 1fbc350f25d09b989ee6262e077bb36847c6637db325fb03c29f2083bb0ec973
 SHA512:
-  metadata.gz: f476b8fec25f1f0402f297d534f52887aed90778ddf1217668a0889755856436dae8a0d8e4e9d648bc2a33879165b32fa0560e5b506549467c21a648fd6ecf29
-  data.tar.gz: 5438c161519149458fad8cd28dea1f9f073a2646c5f91f619a357d375b77456ed5f366dd3ed78e75f5c0d2cbe4156f4ec8a5a6dbaa9e19d86198dcd8a2fcacfc
+  metadata.gz: a385db40804bf4a992d78253ba619e90b147aa5be7808b341398918f5f36c3593fbd1a979aaad9729ad874f2427fe15d31fc9a0413e5f7ea98ee44e43e125f2d
+  data.tar.gz: b9f5581f4c4f27f42460961b3b4231fd94b45be130fbdce74a28bc4888f2a2f3de08fd43b9e3efc782ccb6b4f260aa8484ff7794e7bde1362018dc3deb282b26

data/CHANGELOG.md CHANGED

@@ -1,5 +1,29 @@
 # Changelog
+## 0.3.2
+Performance:
+- Replaced matrix-multiply float DCT with integer-scaled AAN (Arai-Agui-Nakajima) DCT from the IJG reference implementation -- all-integer, no Float allocations
+- Fixed-point integer arithmetic for RGB/YCbCr color space conversion in both encoder and decoder
+- Eliminated short-lived Array allocations in Huffman encoder (`category_and_bits` split into separate methods)
+- `String#<<` with Integer instead of `byte.chr` to avoid String allocations in bit writer
+- DCT inner loop unrolling to eliminate nested block invocations
+- Unrolled `write_block` and `extract_block_into` inner loops
+- Integer rounding division in quantization (no more Float division + round)
+- Hoisted hash lookups and method calls out of per-pixel loops in decoder
+Result: ~2.9x faster encode, ~4.6x faster decode on Ruby 4.0.2 with YJIT.
+Credits: [Ufuk Kayserilioglu](https://github.com/paracycle)
+## 0.3.1
+Fixes:
+- Fixed shared `Pixel` instance bug in decoder that could corrupt pixel data
+- Encoder validates return values from `quantization_modifier` blocks
 ## 0.3.0
 New features:

data/README.md CHANGED

@@ -194,25 +194,26 @@ Decoding:
 Not supported: arithmetic coding, 12-bit precision, EXIF/ICC profile preservation, adding a default background for transparent sources (see what happens above!). Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
-Possible future improvements: AAN/fixed-point DCT (but it's a LOT of work), ICC profile rendering/conversion.
+Possible future improvements: ICC profile rendering/conversion.
 ## Performance
-On a 1024x1024 image (Ruby 4.0.1 on my M1 Max):
+On a 1024x1024 image (Ruby 4.0.2 with YJIT on an M5):
 | Operation | Time |
 |-----------|------|
-| Encode (color, q85) | ~1.7s |
-| Decode (color) | ~1.8s |
+| Encode (color, q85) | ~0.16s |
+| Decode (baseline) | ~0.14s |
+| Decode (progressive) | ~0.18s |
-Both the encoder and decoder use a separable DCT with a precomputed cosine matrix and reuse all per-block buffers to minimize GC pressure. Pixel data is stored as packed integers internally to avoid per-pixel object allocation.
+The encoder and decoder use an integer-scaled AAN (Arai-Agui-Nakajima) DCT with fixed-point arithmetic throughout — no Float operations in the hot path. Color space conversion uses fixed-point integer math, and pixel data is stored as packed integers to avoid per-pixel object allocation.
 ## Some useful `rake` tasks
 ```
 bundle install
 rake test        # run the test suite
-rake benchmark   # benchmark encoding (3 runs against examples/a.png)
+rake benchmark   # benchmark encoding and decoding (3 runs each)
 rake profile     # CPU profile with StackProf (requires the stackprof gem)
 ```
@@ -222,7 +223,7 @@ rake profile     # CPU profile with StackProf (requires the stackprof gem)
 **I have read all of the code produced up to v0.2.0.** The algorithms are above my paygrade, but I'm OK with what has been produced, and I manually fixed a variety of stylistic things along the way. For example, CC seems to like wrapping entire functions in `if` statements rather than bailing on the opposite condition. *Later update: I have not read the ICC and optimized Huffman code yet, but it is heavily tested.*
-**CC needed a lot of guidance.** Its initial JPEG algorithm was somewhat naive and output odd looking JPEGs akin to those of my Kodak digital camera from 2001. After some back and forth and image comparisons, we figured out it was doing the quantization entirely wrong (specifically not using the zigzag approach during quanitization but just going in raster order). I *like* this aesthetic, but fixed it up so that it works as a generally usable JPEG library, while adding ways to customize things so you can recreate the effect, if preferred (see `CREATIVE.md` for more on that).
+**CC needed a lot of guidance.** Its initial JPEG algorithm was somewhat naive and output odd looking JPEGs akin to those of my [Casio QV-10 digital camera](https://medium.com/people-gadgets/the-gadget-we-miss-the-casio-qv-10-digital-camera-c25ab786ce49) from the late 1990s. After some back and forth and image comparisons, we figured out it was doing the quantization entirely wrong (specifically not using the zigzag approach during quanitization but just going in raster order). I *like* this aesthetic, but fixed it up so that it works as a generally usable JPEG library, while adding ways to customize things so you can recreate the effect, if preferred (see `CREATIVE.md` for more on that).
 **CC is lazy.** The initial implementation was VERY SLOW. It took 15 seconds to turn a 1024x1024 PNG into a JPEG, so we went down the profiling rabbit hole and found many optimizations to make it ~6x faster. CC is poor at considering the role of Ruby's GC when implementing low level algorithms and needs some prodding to make the correct optimizations. CC is also lazy to the point of recommending that you just use another language (e.g. Go or Rust) rather than do a pure Ruby version of something - despite it being possible with some extra work.
@@ -232,6 +233,10 @@ rake profile     # CPU profile with StackProf (requires the stackprof gem)
 **The final 10% still takes 90% of the time.** As mentioned above, the first run was quick, but getting things right has taken much longer. v0.1->0.2 has taken longer than 0.1 did! But we now have progressive JPEG support, even more optimizations, better tests, etc. etc.
+## Credits
+- [Ufuk Kayserilioglu](https://github.com/paracycle) - Major performance optimizations including integer-scaled AAN DCT, fixed-point color space conversion, and YJIT-targeted improvements.
 ## License
 MIT

data/lib/pure_jpeg/bit_writer.rb CHANGED

@@ -17,8 +17,8 @@ module PureJPEG
       while @bits_in_buffer >= 8
         @bits_in_buffer -= 8
         byte = (@buffer >> @bits_in_buffer) & 0xFF
-        @data << byte.chr
-        @data << "\x00".b if byte == 0xFF  # byte stuffing
+        @data << byte
+        @data << 0x00 if byte == 0xFF  # byte stuffing
       end
       @buffer &= (1 << @bits_in_buffer) - 1
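
The change above relies on `String#<<` accepting an Integer, which appends that value as a codepoint; on a binary (ASCII-8BIT) string that is exactly one byte, with no intermediate one-character String. A minimal sketch of the same byte-stuffing idea follows. `write_bytes` is a hypothetical helper for illustration only and is not part of pure_jpeg.

```ruby
# Hypothetical helper (not part of pure_jpeg): append bytes to a binary
# string, inserting 0x00 after every 0xFF (JPEG byte stuffing).
def write_bytes(data, bytes)
  bytes.each do |byte|
    data << byte                  # Integer argument, so no temporary String
    data << 0x00 if byte == 0xFF  # byte stuffing
  end
  data
end

out = write_bytes(String.new(encoding: Encoding::BINARY), [0x12, 0xFF, 0x34])
p out.bytes
```

The buffer needs to be binary-encoded for this to work as intended: on a UTF-8 string, `<< 0xFF` would append the two-byte encoding of U+00FF rather than a single byte.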

data/lib/pure_jpeg/dct.rb CHANGED

@@ -1,10 +1,15 @@
 # frozen_string_literal: true
 module PureJPEG
+  # Integer-scaled DCT based on the IJG (Independent JPEG Group) reference
+  # implementation (jfdctint.c / jidctint.c). Uses the Arai-Agui-Nakajima
+  # factorization with 13-bit fixed-point constants.
+  #
+  # All arithmetic is pure Integer (additions, shifts, multiplies) — no Float
+  # operations. This is ~3x faster than the matrix-multiply float DCT under
+  # YJIT and eliminates millions of Float object allocations during decode.
   module DCT
-    # Precomputed 8x8 DCT matrix: A[k][n] = (C(k)/2) * cos((2n+1)*k*pi/16)
-    # where C(0) = 1/sqrt(2), C(k) = 1 for k > 0.
-    # This lets us do the 2D DCT as two 1D matrix-vector multiplies (separable).
+    # Keep the float matrix available for reference / testing
     MATRIX = Array.new(8) { |k|
       ck = k == 0 ? 0.5 / Math.sqrt(2.0) : 0.5
       Array.new(8) { |n|
@@ -12,72 +17,215 @@ module PureJPEG
       }
     }.freeze
-    # Flatten for faster indexed access
     MATRIX_FLAT = MATRIX.flatten.freeze
-    # Transposed matrix for inverse DCT: A^T[n][k] = A[k][n]
     MATRIX_T_FLAT = Array.new(64) { |i| MATRIX_FLAT[(i % 8) * 8 + i / 8] }.freeze
-    # Separable forward 2D DCT: row pass then column pass.
-    # Writes result into `out`. Uses `temp` as scratch space.
-    # All three arrays must be pre-allocated with 64 elements.
-    def self.forward!(block, temp, out)
-      # Row pass: temp[y*8+u] = sum_x A[u][x] * block[y*8+x]
-      m = MATRIX_FLAT
-      8.times do |y|
-        y8 = y << 3
-        b0 = block[y8]; b1 = block[y8|1]; b2 = block[y8|2]; b3 = block[y8|3]
-        b4 = block[y8|4]; b5 = block[y8|5]; b6 = block[y8|6]; b7 = block[y8|7]
-        8.times do |u|
-          u8 = u << 3
-          temp[y8|u] = m[u8]*b0 + m[u8|1]*b1 + m[u8|2]*b2 + m[u8|3]*b3 +
-                       m[u8|4]*b4 + m[u8|5]*b5 + m[u8|6]*b6 + m[u8|7]*b7
-        end
+    # Fixed-point constants (13-bit precision) from IJG reference.
+    CONST_BITS = 13
+    PASS1_BITS = 2
+    FIX_0_298631336 = 2446
+    FIX_0_390180644 = 3196
+    FIX_0_541196100 = 4433
+    FIX_0_765366865 = 6270
+    FIX_0_899976223 = 7373
+    FIX_1_175875602 = 9633
+    FIX_1_501321110 = 12299
+    FIX_1_847759065 = 15137
+    FIX_1_961570560 = 16069
+    FIX_2_053119869 = 16819
+    FIX_2_562915447 = 20995
+    FIX_3_072711026 = 25172
+    CB = CONST_BITS
+    P1 = PASS1_BITS
+    CB_M_P1 = CB - P1        # 11
+    CB_P_P1_P3 = CB + P1 + 3 # 18
+    P1_P3 = P1 + 3           # 5
+    CB2_P_P1 = CB * 2 + P1   # 28  (unused, was for column even-multiplied path)
+    # Forward 2D DCT (in-place). Input: 64-element array of level-shifted
+    # integers (-128..127). Output: DCT coefficients (integers).
+    # The `_temp` and `_out` parameters are accepted for API compatibility
+    # but ignored; computation is done in-place on `data`.
+    def self.forward!(data, _temp = nil, _out = nil)
+      # Pass 1: process rows
+      8.times do |row|
+        i = row << 3
+        d0 = data[i]; d1 = data[i+1]; d2 = data[i+2]; d3 = data[i+3]
+        d4 = data[i+4]; d5 = data[i+5]; d6 = data[i+6]; d7 = data[i+7]
+        tmp0 = d0 + d7; tmp7 = d0 - d7
+        tmp1 = d1 + d6; tmp6 = d1 - d6
+        tmp2 = d2 + d5; tmp5 = d2 - d5
+        tmp3 = d3 + d4; tmp4 = d3 - d4
+        # Even part
+        tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
+        tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
+        data[i]   = (tmp10 + tmp11) << P1
+        data[i+4] = (tmp10 - tmp11) << P1
+        z1 = (tmp12 + tmp13) * FIX_0_541196100
+        data[i+2] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[i+6] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        # Odd part
+        z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
+        z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
+        z5 = (z3 + z4) * FIX_1_175875602
+        tmp4 = tmp4 * FIX_0_298631336
+        tmp5 = tmp5 * FIX_2_053119869
+        tmp6 = tmp6 * FIX_3_072711026
+        tmp7 = tmp7 * FIX_1_501321110
+        z1 = z1 * -FIX_0_899976223
+        z2 = z2 * -FIX_2_562915447
+        z3 = z3 * -FIX_1_961570560 + z5
+        z4 = z4 * -FIX_0_390180644 + z5
+        data[i+7] = (tmp4 + z1 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[i+5] = (tmp5 + z2 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[i+3] = (tmp6 + z2 + z3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[i+1] = (tmp7 + z1 + z4 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
       end
-      # Column pass: out[v*8+u] = sum_y A[v][y] * temp[y*8+u]
-      8.times do |u|
-        t0 = temp[u]; t1 = temp[8|u]; t2 = temp[16|u]; t3 = temp[24|u]
-        t4 = temp[32|u]; t5 = temp[40|u]; t6 = temp[48|u]; t7 = temp[56|u]
-        8.times do |v|
-          v8 = v << 3
-          out[v8|u] = m[v8]*t0 + m[v8|1]*t1 + m[v8|2]*t2 + m[v8|3]*t3 +
-                      m[v8|4]*t4 + m[v8|5]*t5 + m[v8|6]*t6 + m[v8|7]*t7
-        end
+      # Pass 2: process columns
+      8.times do |col|
+        d0 = data[col]; d1 = data[col+8]; d2 = data[col+16]; d3 = data[col+24]
+        d4 = data[col+32]; d5 = data[col+40]; d6 = data[col+48]; d7 = data[col+56]
+        tmp0 = d0 + d7; tmp7 = d0 - d7
+        tmp1 = d1 + d6; tmp6 = d1 - d6
+        tmp2 = d2 + d5; tmp5 = d2 - d5
+        tmp3 = d3 + d4; tmp4 = d3 - d4
+        tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
+        tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
+        data[col]    = (tmp10 + tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
+        data[col+32] = (tmp10 - tmp11 + (1 << (P1_P3 - 1))) >> P1_P3
+        z1 = (tmp12 + tmp13) * FIX_0_541196100
+        data[col+16] = (z1 + tmp13 * FIX_0_765366865 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[col+48] = (z1 - tmp12 * FIX_1_847759065 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        z1 = tmp4 + tmp7; z2 = tmp5 + tmp6
+        z3 = tmp4 + tmp6; z4 = tmp5 + tmp7
+        z5 = (z3 + z4) * FIX_1_175875602
+        tmp4 = tmp4 * FIX_0_298631336
+        tmp5 = tmp5 * FIX_2_053119869
+        tmp6 = tmp6 * FIX_3_072711026
+        tmp7 = tmp7 * FIX_1_501321110
+        z1 = z1 * -FIX_0_899976223
+        z2 = z2 * -FIX_2_562915447
+        z3 = z3 * -FIX_1_961570560 + z5
+        z4 = z4 * -FIX_0_390180644 + z5
+        data[col+56] = (tmp4 + z1 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[col+40] = (tmp5 + z2 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[col+24] = (tmp6 + z2 + z3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[col+8]  = (tmp7 + z1 + z4 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
       end
-      out
+      data
     end
-    # Separable inverse 2D DCT: same structure as forward but using A^T.
-    # f = A^T * F * A
-    def self.inverse!(block, temp, out)
-      mt = MATRIX_T_FLAT
-      # Row pass: temp[v*8+x] = sum_u A^T[x][u] * block[v*8+u]
-      8.times do |v|
-        v8 = v << 3
-        b0 = block[v8]; b1 = block[v8|1]; b2 = block[v8|2]; b3 = block[v8|3]
-        b4 = block[v8|4]; b5 = block[v8|5]; b6 = block[v8|6]; b7 = block[v8|7]
-        8.times do |x|
-          x8 = x << 3
-          temp[v8|x] = mt[x8]*b0 + mt[x8|1]*b1 + mt[x8|2]*b2 + mt[x8|3]*b3 +
-                        mt[x8|4]*b4 + mt[x8|5]*b5 + mt[x8|6]*b6 + mt[x8|7]*b7
-        end
+    # Inverse 2D DCT (in-place). Input: dequantized DCT coefficients (integers).
+    # Output: spatial-domain values (integers) that still need +128 level shift.
+    def self.inverse!(data, _temp = nil, _out = nil)
+      # Pass 1: process columns
+      8.times do |col|
+        d0 = data[col]; d2 = data[col+16]; d4 = data[col+32]; d6 = data[col+48]
+        d1 = data[col+8]; d3 = data[col+24]; d5 = data[col+40]; d7 = data[col+56]
+        # Even part
+        z1 = (d2 + d6) * FIX_0_541196100
+        tmp2 = z1 - d6 * FIX_1_847759065
+        tmp3 = z1 + d2 * FIX_0_765366865
+        tmp0 = (d0 + d4) << CB
+        tmp1 = (d0 - d4) << CB
+        tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
+        tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
+        # Odd part
+        tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
+        z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
+        z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
+        z5 = (z3 + z4) * FIX_1_175875602
+        tmp0 = tmp0 * FIX_0_298631336
+        tmp1 = tmp1 * FIX_2_053119869
+        tmp2 = tmp2 * FIX_3_072711026
+        tmp3 = tmp3 * FIX_1_501321110
+        z1 = z1 * -FIX_0_899976223
+        z2 = z2 * -FIX_2_562915447
+        z3 = z3 * -FIX_1_961570560 + z5
+        z4 = z4 * -FIX_0_390180644 + z5
+        tmp0 += z1 + z3; tmp1 += z2 + z4
+        tmp2 += z2 + z3; tmp3 += z1 + z4
+        data[col]    = (tmp10 + tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+56] = (tmp10 - tmp3 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+8]  = (tmp11 + tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+48] = (tmp11 - tmp2 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+16] = (tmp12 + tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+40] = (tmp12 - tmp1 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+24] = (tmp13 + tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
+        data[col+32] = (tmp13 - tmp0 + (1 << (CB_M_P1 - 1))) >> CB_M_P1
       end
-      # Column pass: out[y*8+x] = sum_v A^T[y][v] * temp[v*8+x]
-      8.times do |x|
-        t0 = temp[x]; t1 = temp[8|x]; t2 = temp[16|x]; t3 = temp[24|x]
-        t4 = temp[32|x]; t5 = temp[40|x]; t6 = temp[48|x]; t7 = temp[56|x]
-        8.times do |y|
-          y8 = y << 3
-          out[y8|x] = mt[y8]*t0 + mt[y8|1]*t1 + mt[y8|2]*t2 + mt[y8|3]*t3 +
-                       mt[y8|4]*t4 + mt[y8|5]*t5 + mt[y8|6]*t6 + mt[y8|7]*t7
-        end
+      # Pass 2: process rows
+      8.times do |row|
+        i = row << 3
+        d0 = data[i]; d2 = data[i+2]; d4 = data[i+4]; d6 = data[i+6]
+        d1 = data[i+1]; d3 = data[i+3]; d5 = data[i+5]; d7 = data[i+7]
+        # Even part
+        z1 = (d2 + d6) * FIX_0_541196100
+        tmp2 = z1 - d6 * FIX_1_847759065
+        tmp3 = z1 + d2 * FIX_0_765366865
+        tmp0 = (d0 + d4) << CB
+        tmp1 = (d0 - d4) << CB
+        tmp10 = tmp0 + tmp3; tmp13 = tmp0 - tmp3
+        tmp11 = tmp1 + tmp2; tmp12 = tmp1 - tmp2
+        # Odd part
+        tmp0 = d7; tmp1 = d5; tmp2 = d3; tmp3 = d1
+        z1 = tmp0 + tmp3; z2 = tmp1 + tmp2
+        z3 = tmp0 + tmp2; z4 = tmp1 + tmp3
+        z5 = (z3 + z4) * FIX_1_175875602
+        tmp0 = tmp0 * FIX_0_298631336
+        tmp1 = tmp1 * FIX_2_053119869
+        tmp2 = tmp2 * FIX_3_072711026
+        tmp3 = tmp3 * FIX_1_501321110
+        z1 = z1 * -FIX_0_899976223
+        z2 = z2 * -FIX_2_562915447
+        z3 = z3 * -FIX_1_961570560 + z5
+        z4 = z4 * -FIX_0_390180644 + z5
+        tmp0 += z1 + z3; tmp1 += z2 + z4
+        tmp2 += z2 + z3; tmp3 += z1 + z4
+        data[i]   = (tmp10 + tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+7] = (tmp10 - tmp3 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+1] = (tmp11 + tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+6] = (tmp11 - tmp2 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+2] = (tmp12 + tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+5] = (tmp12 - tmp1 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+3] = (tmp13 + tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
+        data[i+4] = (tmp13 - tmp0 + (1 << (CB_P_P1_P3 - 1))) >> CB_P_P1_P3
       end
-      out
+      data
     end
   end
 end
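
The `FIX_*` constants in this file appear to be the named real values scaled by 2^13 and rounded, and the recurring `(x + (1 << (n - 1))) >> n` pattern is a right shift with a rounding bias. The sketch below reproduces both, assuming that reading. `fix` and `descale` are illustrative helpers and do not exist in the gem.

```ruby
CONST_BITS = 13

# Scale a real-valued constant to a 13-bit fixed-point integer.
def fix(x)
  (x * (1 << CONST_BITS)).round
end

# Shift right by n bits, adding half of the divisor first so the result is
# rounded to nearest instead of always rounded down. Ruby's >> on a negative
# Integer is an arithmetic shift, so exact halves round toward +infinity.
def descale(x, n)
  (x + (1 << (n - 1))) >> n
end

p fix(0.541196100)                             # matches FIX_0_541196100
p descale(fix(0.541196100) * 100, CONST_BITS)  # roughly 0.5412 * 100
```

Keeping products at the larger scale and descaling once per output is what lets each pass stay in Integer arithmetic without losing much precision.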

data/lib/pure_jpeg/decoder.rb CHANGED

@@ -78,10 +78,8 @@ module PureJPEG
       # Reusable buffers
       zigzag = Array.new(64, 0)
-      raster = Array.new(64, 0.0)
-      dequant = Array.new(64, 0.0)
-      temp = Array.new(64, 0.0)
-      spatial = Array.new(64, 0.0)
+      raster = Array.new(64, 0)
+      dequant = Array.new(64, 0)
       mcus_y.times do |mcu_row|
         mcus_x.times do |mcu_col|
@@ -104,12 +102,12 @@ module PureJPEG
                 # Inverse pipeline: unzigzag -> dequantize -> IDCT -> level shift
                 Zigzag.unreorder!(zigzag, raster)
                 Quantization.dequantize!(raster, qt, dequant)
-                DCT.inverse!(dequant, temp, spatial)
+                DCT.inverse!(dequant)
                 # Write block into channel buffer
                 bx = (mcu_col * comp.h_sampling + bh) * 8
                 by = (mcu_row * comp.v_sampling + bv) * 8
-                write_block(spatial, ch[:data], ch[:width], bx, by)
+                write_block(dequant, ch[:data], ch[:width], bx, by)
               end
             end
           end
@@ -204,10 +202,8 @@ module PureJPEG
       end
       zigzag = Array.new(64, 0)
-      raster = Array.new(64, 0.0)
-      dequant = Array.new(64, 0.0)
-      temp = Array.new(64, 0.0)
-      spatial = Array.new(64, 0.0)
+      raster = Array.new(64, 0)
+      dequant = Array.new(64, 0)
       jfif.components.each do |c|
         qt = fetch_quant_table!(jfif, c)
@@ -222,8 +218,8 @@ module PureJPEG
             Zigzag.unreorder!(zigzag, raster)
             Quantization.dequantize!(raster, qt, dequant)
-            DCT.inverse!(dequant, temp, spatial)
-            write_block(spatial, ch[:data], ch[:width], block_x * 8, block_y * 8)
+            DCT.inverse!(dequant)
+            write_block(dequant, ch[:data], ch[:width], block_x * 8, block_y * 8)
           end
         end
       end
@@ -460,12 +456,16 @@ module PureJPEG
     # Write an 8x8 spatial block (level-shifted by +128) into a channel buffer.
     def write_block(spatial, channel, ch_width, bx, by)
       8.times do |row|
-        dst_row = (by + row) * ch_width + bx
-        row8 = row << 3
-        8.times do |col|
-          val = (spatial[row8 | col] + 128.0).round
-          channel[dst_row + col] = val < 0 ? 0 : (val > 255 ? 255 : val)
-        end
+        dst = (by + row) * ch_width + bx
+        r8 = row << 3
+        v = spatial[r8]     + 128; channel[dst]     = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 1] + 128; channel[dst + 1] = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 2] + 128; channel[dst + 2] = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 3] + 128; channel[dst + 3] = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 4] + 128; channel[dst + 4] = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 5] + 128; channel[dst + 5] = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 6] + 128; channel[dst + 6] = v < 0 ? 0 : (v > 255 ? 255 : v)
+        v = spatial[r8 | 7] + 128; channel[dst + 7] = v < 0 ? 0 : (v > 255 ? 255 : v)
       end
     end
@@ -493,18 +493,27 @@ module PureJPEG
     def assemble_grayscale(width, height, channels, comp)
       ch = channels[comp.id]
+      ch_data = ch[:data]
+      ch_width = ch[:width]
       pixels = Array.new(width * height)
       height.times do |y|
-        src_row = y * ch[:width]
+        src_row = y * ch_width
         dst_row = y * width
         width.times do |x|
-          v = ch[:data][src_row + x]
+          v = ch_data[src_row + x]
           pixels[dst_row + x] = (v << 16) | (v << 8) | v
         end
       end
       Image.new(width, height, pixels, icc_profile: @icc_profile)
     end
+    # Fixed-point coefficients (scaled by 2^16) for YCbCr→RGB.
+    FP_R_CR =  91881  # 1.402    * 65536
+    FP_G_CB = -22554  # -0.344136 * 65536
+    FP_G_CR = -46802  # -0.714136 * 65536
+    FP_B_CB = 116130  # 1.772    * 65536
+    FP_HALF =  32768  # rounding bias
     def assemble_color(width, height, channels, components, max_h, max_v)
       # Upsample chroma channels if needed and convert YCbCr to RGB
       y_comp, cb_comp, cr_comp = resolve_color_components(components)
@@ -513,29 +522,39 @@ module PureJPEG
       cb_ch = channels[cb_comp.id]
       cr_ch = channels[cr_comp.id]
+      y_data = y_ch[:data]
+      cb_data = cb_ch[:data]
+      cr_data = cr_ch[:data]
+      y_stride = y_ch[:width]
+      cb_stride = cb_ch[:width]
+      cr_stride = cr_ch[:width]
+      cb_h = cb_comp.h_sampling
+      cb_v = cb_comp.v_sampling
+      cr_h = cr_comp.h_sampling
+      cr_v = cr_comp.v_sampling
       pixels = Array.new(width * height)
       height.times do |py|
         dst_row = py * width
-        y_row = py * y_ch[:width]
+        y_row = py * y_stride
         # Chroma coordinates (nearest-neighbor upsampling)
-        cb_y = (py * cb_comp.v_sampling) / max_v
-        cr_y = (py * cr_comp.v_sampling) / max_v
-        cb_row = cb_y * cb_ch[:width]
-        cr_row = cr_y * cr_ch[:width]
+        cb_row = ((py * cb_v) / max_v) * cb_stride
+        cr_row = ((py * cr_v) / max_v) * cr_stride
         width.times do |px|
-          lum = y_ch[:data][y_row + px]
+          lum = y_data[y_row + px]
-          cb_x = (px * cb_comp.h_sampling) / max_h
-          cr_x = (px * cr_comp.h_sampling) / max_h
-          cb = cb_ch[:data][cb_row + cb_x] - 128.0
-          cr = cr_ch[:data][cr_row + cr_x] - 128.0
+          cb_x = (px * cb_h) / max_h
+          cr_x = (px * cr_h) / max_h
+          cb_val = cb_data[cb_row + cb_x] - 128
+          cr_val = cr_data[cr_row + cr_x] - 128
-          r = (lum + 1.402 * cr).round
-          g = (lum - 0.344136 * cb - 0.714136 * cr).round
-          b = (lum + 1.772 * cb).round
+          # Fixed-point YCbCr→RGB (all integer arithmetic)
+          r = lum + ((FP_R_CR * cr_val + FP_HALF) >> 16)
+          g = lum + ((FP_G_CB * cb_val + FP_G_CR * cr_val + FP_HALF) >> 16)
+          b = lum + ((FP_B_CB * cb_val + FP_HALF) >> 16)
           r = r < 0 ? 0 : (r > 255 ? 255 : r)
           g = g < 0 ? 0 : (g > 255 ? 255 : g)
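
To see the fixed-point YCbCr→RGB arithmetic in isolation, here is a small standalone sketch that reuses the constants from the hunk above. The `FixedPointYCbCr` module and its `to_rgb` method are hypothetical names for illustration; the decoder itself inlines this math in `assemble_color`.

```ruby
# Standalone sketch; module and method names are assumptions, the
# constants are copied from the decoder diff above (scaled by 2^16).
module FixedPointYCbCr
  FP_R_CR =  91881
  FP_G_CB = -22554
  FP_G_CR = -46802
  FP_B_CB = 116130
  FP_HALF =  32768  # rounding bias, i.e. 0.5 at 16-bit scale

  module_function

  def clamp255(v)
    v < 0 ? 0 : (v > 255 ? 255 : v)
  end

  # lum, cb, cr are 0..255 samples; returns [r, g, b] clamped to 0..255.
  def to_rgb(lum, cb, cr)
    cb_val = cb - 128
    cr_val = cr - 128
    r = lum + ((FP_R_CR * cr_val + FP_HALF) >> 16)
    g = lum + ((FP_G_CB * cb_val + FP_G_CR * cr_val + FP_HALF) >> 16)
    b = lum + ((FP_B_CB * cb_val + FP_HALF) >> 16)
    [clamp255(r), clamp255(g), clamp255(b)]
  end
end

p FixedPointYCbCr.to_rgb(128, 128, 128)  # neutral gray
p FixedPointYCbCr.to_rgb(76, 85, 255)    # close to pure red
```

Adding `FP_HALF` before the shift makes the result round to nearest, which stands in for the `.round` call of the earlier Float version.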

data/lib/pure_jpeg/encoder.rb CHANGED

@@ -76,17 +76,27 @@ module PureJPEG
     def build_lum_qtable
       table = @luminance_table || Quantization.scale_table(Quantization::LUMINANCE_BASE, quality)
-      table = @quantization_modifier.call(table, :luminance) if @quantization_modifier
+      table = apply_quantization_modifier(table, :luminance) if @quantization_modifier
       table
     end
     def build_chr_qtable
       table = @chrominance_table || Quantization.scale_table(Quantization::CHROMINANCE_BASE, @chroma_quality)
-      table = @quantization_modifier.call(table, :chrominance) if @quantization_modifier
+      table = apply_quantization_modifier(table, :chrominance) if @quantization_modifier
       table
     end
+    def apply_quantization_modifier(table, channel)
+      modified = @quantization_modifier.call(table, channel)
+      validate_qtable!(modified, "quantization_modifier result for #{channel}")
+      modified
+    end
     def validate_qtable!(table, name)
+      unless table.respond_to?(:length) && table.respond_to?(:all?)
+        raise ArgumentError, "#{name} must be a 64-element array of integers between 1 and 255"
+      end
       raise ArgumentError, "#{name} must have exactly 64 elements (got #{table.length})" unless table.length == 64
       unless table.all? { |v| v.is_a?(Integer) && v >= 1 && v <= 255 }
         raise ArgumentError, "#{name} elements must be integers between 1 and 255"
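
The validation added in this hunk guards against a `quantization_modifier` block that returns something other than a usable table. The following is a rough standalone sketch of that check, not the gem's code: `checked_qtable` is a hypothetical function, and the error message is made up for the example.

```ruby
# Hypothetical standalone version of the check: call the modifier block and
# reject any result that is not a 64-element array of Integers in 1..255.
def checked_qtable(table, channel, &modifier)
  modified = modifier.call(table, channel)
  ok = modified.respond_to?(:length) && modified.respond_to?(:all?) &&
       modified.length == 64 &&
       modified.all? { |v| v.is_a?(Integer) && v >= 1 && v <= 255 }
  raise ArgumentError, "modifier result for #{channel} is not a valid table" unless ok
  modified
end

base = Array.new(64, 16)
halved = checked_qtable(base, :luminance) { |t, _ch| t.map { |v| [v / 2, 1].max } }
p halved.first
```

A block that forgets to return the table (for example, one ending in a `puts`) would return `nil`, which fails the `respond_to?` checks and raises instead of surfacing later as a `NoMethodError`.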
@@ -195,17 +205,14 @@ module PureJPEG
       padded_w = (width + 7) & ~7
       padded_h = (height + 7) & ~7
-      block = Array.new(64, 0.0)
-      temp  = Array.new(64, 0.0)
-      dct   = Array.new(64, 0.0)
+      block = Array.new(64, 0)
       qbuf  = Array.new(64, 0)
       zbuf  = Array.new(64, 0)
       (0...padded_h).step(8) do |by|
         (0...padded_w).step(8) do |bx|
           extract_block_into(y_data, width, height, bx, by, block)
-          transform_block(block, temp, dct, qbuf, zbuf, qtable)
-          yield zbuf
+          yield transform_block(block, qbuf, zbuf, qtable)
         end
       end
     end
@@ -268,37 +275,29 @@ module PureJPEG
       mcu_w = (width + 15) & ~15
       mcu_h = (height + 15) & ~15
-      block = Array.new(64, 0.0)
-      temp  = Array.new(64, 0.0)
-      dct   = Array.new(64, 0.0)
+      block = Array.new(64, 0)
       qbuf  = Array.new(64, 0)
       zbuf  = Array.new(64, 0)
       (0...mcu_h).step(16) do |my|
         (0...mcu_w).step(16) do |mx|
           extract_block_into(y_data, width, height, mx, my, block)
-          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
-          yield :y, zbuf
+          yield :y, transform_block(block, qbuf, zbuf, lum_qt)
           extract_block_into(y_data, width, height, mx + 8, my, block)
-          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
-          yield :y, zbuf
+          yield :y, transform_block(block, qbuf, zbuf, lum_qt)
           extract_block_into(y_data, width, height, mx, my + 8, block)
-          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
-          yield :y, zbuf
+          yield :y, transform_block(block, qbuf, zbuf, lum_qt)
           extract_block_into(y_data, width, height, mx + 8, my + 8, block)
-          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
-          yield :y, zbuf
+          yield :y, transform_block(block, qbuf, zbuf, lum_qt)
           extract_block_into(cb_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
-          transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
-          yield :cb, zbuf
+          yield :cb, transform_block(block, qbuf, zbuf, chr_qt)
           extract_block_into(cr_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
-          transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
-          yield :cr, zbuf
+          yield :cr, transform_block(block, qbuf, zbuf, chr_qt)
         end
       end
     end
@@ -323,9 +322,9 @@ module PureJPEG
     # --- Shared block pipeline (all buffers pre-allocated) ---
-    def transform_block(block, temp, dct, qbuf, zbuf, qtable)
-      DCT.forward!(block, temp, dct)
-      Quantization.quantize!(dct, qtable, qbuf)
+    def transform_block(block, qbuf, zbuf, qtable)
+      DCT.forward!(block)
+      Quantization.quantize!(block, qtable, qbuf)
       Zigzag.reorder!(qbuf, zbuf)
       zbuf
     end
@@ -342,26 +341,42 @@ module PureJPEG
       end
     end
+    # Fixed-point coefficients (scaled by 2^16 = 65536) for RGB→YCbCr.
+    # Y  =  0.299*R + 0.587*G + 0.114*B
+    # Cb = -0.168736*R - 0.331264*G + 0.5*B + 128
+    # Cr =  0.5*R - 0.418688*G - 0.081312*B + 128
+    FP_Y_R  =  19595; FP_Y_G  =  38470; FP_Y_B  =   7471
+    FP_CB_R = -11058; FP_CB_G = -21710; FP_CB_B =  32768
+    FP_CR_R =  32768; FP_CR_G = -27440; FP_CR_B =  -5328
+    FP_HALF =  32768  # rounding bias
+    FP_128  = 8388608 # 128 << 16
+    def clamp255(v)
+      v < 0 ? 0 : (v > 255 ? 255 : v)
+    end
     def extract_luminance(width, height)
       luminance = Array.new(width * height)
       if source.respond_to?(:packed_pixels)
         packed = source.packed_pixels
         r_shift, g_shift, b_shift = packed_shifts
+        n = width * height
         i = 0
-        (width * height).times do
+        n.times do
           color = packed[i]
           r = (color >> r_shift) & 0xFF
           g = (color >> g_shift) & 0xFF
           b = (color >> b_shift) & 0xFF
-          luminance[i] = (0.299 * r + 0.587 * g + 0.114 * b).round.clamp(0, 255)
+          luminance[i] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
           i += 1
         end
       else
-        height.times do |y|
-          row = y * width
-          width.times do |x|
-            pixel = source[x, y]
-            luminance[row + x] = (0.299 * pixel.r + 0.587 * pixel.g + 0.114 * pixel.b).round.clamp(0, 255)
+        height.times do |py|
+          row = py * width
+          width.times do |px|
+            pixel = source[px, py]
+            r = pixel.r; g = pixel.g; b = pixel.b
+            luminance[row + px] = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
           end
         end
       end
@@ -383,9 +398,9 @@ module PureJPEG
           r = (color >> r_shift) & 0xFF
           g = (color >> g_shift) & 0xFF
           b = (color >> b_shift) & 0xFF
-          y_data[i]  = ( 0.299    * r + 0.587    * g + 0.114    * b).round.clamp(0, 255)
-          cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5      * b + 128.0).round.clamp(0, 255)
-          cr_data[i] = ( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
+          y_data[i]  = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
+          cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
+          cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
           i += 1
         end
       else
@@ -395,9 +410,9 @@ module PureJPEG
             pixel = source[px, py]
             r = pixel.r; g = pixel.g; b = pixel.b
             i = row + px
-            y_data[i]  = ( 0.299    * r + 0.587    * g + 0.114    * b).round.clamp(0, 255)
-            cb_data[i] = (-0.168736 * r - 0.331264 * g + 0.5      * b + 128.0).round.clamp(0, 255)
-            cr_data[i] = ( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.0).round.clamp(0, 255)
+            y_data[i]  = clamp255((FP_Y_R * r + FP_Y_G * g + FP_Y_B * b + FP_HALF) >> 16)
+            cb_data[i] = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
+            cr_data[i] = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
           end
         end
       end
@@ -432,13 +447,16 @@ module PureJPEG
       8.times do |row|
         sy = by + row
         sy = max_y if sy > max_y
-        src_row = sy * width
-        row8 = row << 3
-        8.times do |col|
-          sx = bx + col
-          sx = max_x if sx > max_x
-          block[row8 | col] = channel[src_row + sx] - 128.0
-        end
+        src = sy * width
+        r8 = row << 3
+        x = bx;     block[r8]     = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 1; block[r8 | 1] = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 2; block[r8 | 2] = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 3; block[r8 | 3] = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 4; block[r8 | 4] = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 5; block[r8 | 5] = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 6; block[r8 | 6] = channel[src + (x > max_x ? max_x : x)] - 128
+        x = bx + 7; block[r8 | 7] = channel[src + (x > max_x ? max_x : x)] - 128
       end
       block
     end
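
The fixed-point RGB→YCbCr conversion introduced in this hunk can be tried in isolation. The following is a minimal sketch, assuming the constants exactly as they appear in the diff above; the method name `rgb_to_ycbcr_fixed` is hypothetical and is not part of pure_jpeg.

```ruby
# Constants copied from the diff above (coefficients scaled by 2**16).
FP_Y_R  =  19595; FP_Y_G  =  38470; FP_Y_B  =   7471
FP_CB_R = -11058; FP_CB_G = -21710; FP_CB_B =  32768
FP_CR_R =  32768; FP_CR_G = -27440; FP_CR_B =  -5328
FP_HALF = 32768      # 0.5 in 16.16 fixed point, added before the shift to round
FP_128  = 128 << 16  # chroma offset of 128 in 16.16 fixed point

def clamp255(v)
  v < 0 ? 0 : (v > 255 ? 255 : v)
end

# Hypothetical helper (not in pure_jpeg): convert one RGB triple to [Y, Cb, Cr]
# using only Integer arithmetic, as the new encoder code does per pixel.
def rgb_to_ycbcr_fixed(r, g, b)
  y  = clamp255((FP_Y_R  * r + FP_Y_G  * g + FP_Y_B  * b + FP_HALF) >> 16)
  cb = clamp255((FP_CB_R * r + FP_CB_G * g + FP_CB_B * b + FP_128 + FP_HALF) >> 16)
  cr = clamp255((FP_CR_R * r + FP_CR_G * g + FP_CR_B * b + FP_128 + FP_HALF) >> 16)
  [y, cb, cr]
end

p rgb_to_ycbcr_fixed(255, 255, 255)  # white
p rgb_to_ycbcr_fixed(0, 0, 0)        # black
p rgb_to_ycbcr_fixed(255, 0, 0)      # pure red
```

For pure red the Cr sum reaches 256 before clamping, so `clamp255` is still required even though every input is in 0..255.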

data/lib/pure_jpeg/huffman/encoder.rb CHANGED

@@ -3,17 +3,22 @@
 module PureJPEG
   module Huffman
     class Encoder
-      def self.category_and_bits(value)
-        return [0, 0] if value == 0
-        abs_val = value.abs
+      # Return the Huffman category (bit length) for a value.
+      # Avoids Array allocation compared to the combined category_and_bits.
+      def self.category(value)
+        return 0 if value == 0
+        v = value.abs
         cat = 0
-        v = abs_val
         while v > 0
           cat += 1
           v >>= 1
         end
-        bits = value > 0 ? value : value + (1 << cat) - 1
-        [cat, bits]
+        cat
+      end
+      # Return the extra bits to encode for a value with the given category.
+      def self.value_bits(value, cat)
+        value > 0 ? value : value + (1 << cat) - 1
       end
       def self.each_ac_item(zigzag)
@@ -39,7 +44,7 @@ module PureJPEG
           end
           value = zigzag[i]
-          cat, = category_and_bits(value)
+          cat = category(value)
           yield (run << 4) | cat, value
           i += 1
         end
@@ -73,10 +78,10 @@ module PureJPEG
       private
       def encode_dc(diff, writer)
-        cat, bits = self.class.category_and_bits(diff)
+        cat = self.class.category(diff)
         code, length = @dc_table[cat]
         writer.write_bits(code, length)
-        writer.write_bits(bits, cat) if cat > 0
+        writer.write_bits(self.class.value_bits(diff, cat), cat) if cat > 0
       end
       def encode_ac(zigzag, writer)
@@ -85,8 +90,8 @@ module PureJPEG
           writer.write_bits(code, length)
           next if symbol == 0x00 || symbol == 0xF0
-          cat, bits = self.class.category_and_bits(value)
-          writer.write_bits(bits, cat)
+          cat = self.class.category(value)
+          writer.write_bits(self.class.value_bits(value, cat), cat)
         end
       end
     end
@@ -104,7 +109,7 @@ module PureJPEG
         diff = zigzag[0] - @prev_dc[state_key]
         @prev_dc[state_key] = zigzag[0]
-        cat, = Encoder.category_and_bits(diff)
+        cat = Encoder.category(diff)
         @dc_frequencies[cat] += 1
         Encoder.each_ac_symbol(zigzag) do |symbol|
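
The split of `category_and_bits` into `category` and `value_bits` can be illustrated outside the class. This is a sketch that copies the two method bodies from the diff as top-level methods; the printing loop is illustrative only.

```ruby
# Stand-alone copies of the two helpers from the diff above.
def category(value)
  return 0 if value == 0
  v = value.abs
  cat = 0
  while v > 0
    cat += 1
    v >>= 1
  end
  cat
end

# Extra bits written after the Huffman code: positive values as-is,
# negative values offset by (2**cat - 1).
def value_bits(value, cat)
  value > 0 ? value : value + (1 << cat) - 1
end

[5, -5, 1, -1, 0].each do |v|
  cat = category(v)
  bits = cat.zero? ? "(none)" : value_bits(v, cat).to_s(2).rjust(cat, "0")
  puts format("%3d -> category %d, bits %s", v, cat, bits)
end
```

Here 5 yields category 3 with bits `101`, and -5 yields category 3 with bits `010`. The loop in `category` computes the same number as Ruby's built-in `value.abs.bit_length`.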

data/lib/pure_jpeg/quantization.rb CHANGED

@@ -36,9 +36,20 @@ module PureJPEG
       }
     end
-    # Quantize a 64-element DCT block in place into `out`.
+    # Quantize a 64-element DCT block into `out`.
+    # Uses integer rounding division (round-to-nearest) to match the
+    # behavior of Float division + round from the previous float DCT.
     def self.quantize!(block, table, out)
-      64.times { |i| out[i] = (block[i] / table[i]).round }
+      i = 0
+      while i < 64
+        v = block[i]; t = table[i]
+        out[i] = if v >= 0
+                   (v + (t >> 1)) / t
+                 else
+                   -((-v + (t >> 1)) / t)
+                 end
+        i += 1
+      end
       out
     end
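
The sign branch in the new `quantize!` matters because Ruby's Integer division floors toward negative infinity. A minimal sketch of the per-element logic follows; the helper names `round_div` and `naive_round_div` are hypothetical and do not appear in pure_jpeg.

```ruby
# Hypothetical helper mirroring the per-element logic of quantize! in the diff:
# round-to-nearest integer division, with ties rounded away from zero.
def round_div(v, t)
  if v >= 0
    (v + (t >> 1)) / t
  else
    -((-v + (t >> 1)) / t)
  end
end

# Naive variant without the sign branch, shown only for comparison.
def naive_round_div(v, t)
  (v + (t >> 1)) / t
end

p round_div(7, 2)         # 4
p round_div(-7, 2)        # -4
p naive_round_div(-7, 2)  # -3, because -6 / 2 floors instead of rounding away from zero
p((-7 / 2.0).round)       # -4, the Float behavior the new code is meant to match
```

Negating before and after the division makes negative coefficients round symmetrically with positive ones, which is what `Float#round` did in the previous version.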

data/lib/pure_jpeg/source/raw_source.rb CHANGED

@@ -27,8 +27,7 @@ module PureJPEG
       def initialize(width, height, &block)
         @width = width
         @height = height
-        black = Pixel.new(0, 0, 0)
-        @pixels = Array.new(width * height, black)
+        @pixels = Array.new(width * height) { Pixel.new(0, 0, 0) }
         if block
           height.times do |y|
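
The change from `Array.new(n, obj)` to the block form alters object identity: the first form stores one object `n` times, the second creates a separate object per slot. The sketch below uses a `Struct` as a hypothetical stand-in for the gem's `Pixel` class, whose definition (and mutability) is not shown in this diff.

```ruby
# Hypothetical stand-in for the gem's Pixel class; the real one may differ.
Pixel = Struct.new(:r, :g, :b)

shared   = Array.new(3, Pixel.new(0, 0, 0))     # one object, referenced three times
distinct = Array.new(3) { Pixel.new(0, 0, 0) }  # three separate objects

shared[0].r = 255
distinct[0].r = 255

p shared.map(&:r)    # [255, 255, 255]
p distinct.map(&:r)  # [255, 0, 0]
```

With the shared form, mutating any element in place is visible through every index; the block form avoids that at the cost of one allocation per pixel.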

data/lib/pure_jpeg/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module PureJPEG
-  VERSION = "0.3.0"
+  VERSION = "0.3.2"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pure_jpeg
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.3.2
 platform: ruby
 authors:
 - Peter Cooper