RubyGems - pure_jpeg - Versions diffs - 0.2.0 → 0.3.0 - Mend

pure_jpeg 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml +4 -4
data/CHANGELOG.md +16 -0
data/README.md +18 -4
data/lib/pure_jpeg/decoder.rb +55 -19
data/lib/pure_jpeg/encoder.rb +156 -56
data/lib/pure_jpeg/huffman/encoder.rb +73 -45
data/lib/pure_jpeg/huffman/tables.rb +91 -0
data/lib/pure_jpeg/image.rb +6 -1
data/lib/pure_jpeg/info.rb +6 -0
data/lib/pure_jpeg/jfif_reader.rb +32 -3
data/lib/pure_jpeg/version.rb +1 -1
data/lib/pure_jpeg.rb +27 -0
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6eacb8a616f95a52625f6f5acb3c8c137306c5dcf3636e93e2e715287b429655
-  data.tar.gz: dade5c2d3b9603bb7089635977ea2e38827da1a3912c820152367f3ca643c5f9
+  metadata.gz: 66b5d6fe1b663128f62aae8111b55a9b2ddbe9739d501fcf1146c459286433da
+  data.tar.gz: 02ae6cdc25f520221fee4adfe9d6070c4a975602e5de2b34a0b9ab8b4e005829
 SHA512:
-  metadata.gz: ccf7a06b88c08f14ca70d944ddcc753795217e3bfbca00874484ee6f3c2a360d8cede768c4469f5b92d2789bf7bcc71d61b22f67d76fb98774af28f88132c248
-  data.tar.gz: d868a5d4f7db3b20a504bc09f9908169fd35c269b511e619e083957fd6b4bf90a3cf22985a5e6d81289a02306ba75677b94b8ff9f6d741acc7b2047979d0f2c6
+  metadata.gz: f476b8fec25f1f0402f297d534f52887aed90778ddf1217668a0889755856436dae8a0d8e4e9d648bc2a33879165b32fa0560e5b506549467c21a648fd6ecf29
+  data.tar.gz: 5438c161519149458fad8cd28dea1f9f073a2646c5f91f619a357d375b77456ed5f366dd3ed78e75f5c0d2cbe4156f4ec8a5a6dbaa9e19d86198dcd8a2fcacfc

data/CHANGELOG.md CHANGED

@@ -1,5 +1,21 @@
 # Changelog
+## 0.3.0
+New features:
+- `PureJPEG.info` for reading dimensions and metadata without full decode
+- ICC color profile extraction (available on `Info` and `Image`)
+- Optional image-specific optimized Huffman tables (`optimize_huffman: true`)
+Fixes:
+- Decoder validates Huffman table, quantization table, and component references with clear error messages
+- Color decoding looks up Y/Cb/Cr components by ID instead of assuming SOF array order
+- Support for non-standard component IDs (e.g. 0, 1, 2 as used by some Adobe tools)
+- Explicit error for unsupported component counts (e.g. CMYK)
+- Encoder no longer holds file handle open during encoding
 ## 0.2.0
 New features:

data/README.md CHANGED

@@ -79,7 +79,8 @@ PureJPEG.encode(source,
   luminance_table: nil,           # custom 64-element quantization table for Y
   chrominance_table: nil,         # custom 64-element quantization table for Cb/Cr
   quantization_modifier: nil,     # proc(table, :luminance/:chrominance) -> modified table
-  scramble_quantization: false    # intentionally misordered quant tables (creative effect)
+  scramble_quantization: false,   # intentionally misordered quant tables (creative effect)
+  optimize_huffman: false         # slower 2-pass encode, usually smaller files
 )
 ```
@@ -135,6 +136,16 @@ pixel.b  # => 97
 image = PureJPEG.read(jpeg_bytes)
 ```
+### Read dimensions and metadata only
+```ruby
+info = PureJPEG.info("photo.jpg")
+info.width           # => 1024
+info.height          # => 768
+info.component_count # => 3
+info.progressive     # => false
+```
 ### Iterating pixels
 ```ruby
@@ -171,7 +182,8 @@ Encoding:
 - 8-bit precision
 - Grayscale (1 component) and YCbCr color (3 components)
 - 4:2:0 chroma subsampling (color) or no subsampling (grayscale)
-- Standard Huffman tables (Annex K)
+- Standard Huffman tables (Annex K) by default
+- Optional image-specific optimized Huffman tables
 Decoding:
 - Baseline DCT (SOF0) and Progressive DCT (SOF2)
@@ -182,6 +194,8 @@ Decoding:
 Not supported: arithmetic coding, 12-bit precision, EXIF/ICC profile preservation, adding a default background for transparent sources (see what happens above!). Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
+Possible future improvements: AAN/fixed-point DCT (but it's a LOT of work), ICC profile rendering/conversion.
 ## Performance
 On a 1024x1024 image (Ruby 4.0.1 on my M1 Max):
@@ -204,9 +218,9 @@ rake profile     # CPU profile with StackProf (requires the stackprof gem)
 ## AI Disclosure
-**Claude Code did the majority of the work.** The math of JPEG encoding/decoding is beyond me, except 'getting it' at a high level. I understand it like I understand the engine in my car :-)
+**Claude Code did the majority of the work.** The math of JPEG encoding/decoding is beyond me, except 'getting it' at a high level. I understand it like I understand the engine in my car :-) *Later update: OpenAI Codex is also reviewing and adding features now. It feels stronger in many areas.*
-**I have read all of the code produced.** The algorithms are above my paygrade, but I'm OK with what has been produced, and I manually fixed a variety of stylistic things along the way. For example, CC seems to like wrapping entire functions in `if` statements rather than bailing on the opposite condition.
+**I have read all of the code produced up to v0.2.0.** The algorithms are above my paygrade, but I'm OK with what has been produced, and I manually fixed a variety of stylistic things along the way. For example, CC seems to like wrapping entire functions in `if` statements rather than bailing on the opposite condition. *Later update: I have not read the ICC and optimized Huffman code yet, but it is heavily tested.*
 **CC needed a lot of guidance.** Its initial JPEG algorithm was somewhat naive and output odd looking JPEGs akin to those of my Kodak digital camera from 2001. After some back and forth and image comparisons, we figured out it was doing the quantization entirely wrong (specifically not using the zigzag approach during quanitization but just going in raster order). I *like* this aesthetic, but fixed it up so that it works as a generally usable JPEG library, while adding ways to customize things so you can recreate the effect, if preferred (see `CREATIVE.md` for more on that).

data/lib/pure_jpeg/decoder.rb CHANGED

@@ -27,6 +27,8 @@ module PureJPEG
     def decode
       jfif = JFIFReader.new(@data)
+      @icc_profile = jfif.icc_profile
+      validate_dimensions!(jfif.width, jfif.height)
       return decode_progressive(jfif) if jfif.progressive
       width = jfif.width
@@ -90,10 +92,8 @@ module PureJPEG
           end
           jfif.scan_components.each do |sc|
-            comp = comp_info[sc.id]
-            dc_tab = dc_tables[sc.dc_table_id]
-            ac_tab = ac_tables[sc.ac_table_id]
-            qt = jfif.quant_tables[comp.qt_id]
+            comp, dc_tab, ac_tab = resolve_scan_references!(sc, comp_info, dc_tables, ac_tables)
+            qt = fetch_quant_table!(jfif, comp)
             ch = channels[comp.id]
             comp.v_sampling.times do |bv|
@@ -122,13 +122,20 @@ module PureJPEG
       num_components = jfif.components.length
       if num_components == 1
         assemble_grayscale(width, height, channels, jfif.components[0])
-      else
+      elsif num_components == 3
         assemble_color(width, height, channels, jfif.components, max_h, max_v)
+      else
+        raise DecodeError, "Unsupported number of components: #{num_components}"
       end
     end
     private
+    def validate_dimensions!(width, height)
+      raise DecodeError, "Invalid image dimensions: #{width}x#{height}" if width <= 0 || height <= 0
+      raise DecodeError, "Image too large: #{width}x#{height} (max #{MAX_DIMENSION}x#{MAX_DIMENSION})" if width > MAX_DIMENSION || height > MAX_DIMENSION
+    end
     # --- Progressive JPEG decoding ---
     def decode_progressive(jfif)
@@ -203,7 +210,7 @@ module PureJPEG
       spatial = Array.new(64, 0.0)
       jfif.components.each do |c|
-        qt = jfif.quant_tables[c.qt_id]
+        qt = fetch_quant_table!(jfif, c)
         ch = channels[c.id]
         coeff_buf = coeffs[c.id]
         bx_count, by_count = comp_blocks[c.id]
@@ -224,17 +231,17 @@ module PureJPEG
       num_components = jfif.components.length
       if num_components == 1
         assemble_grayscale(width, height, channels, jfif.components[0])
-      else
+      elsif num_components == 3
         assemble_color(width, height, channels, jfif.components, max_h, max_v)
+      else
+        raise DecodeError, "Unsupported number of components: #{num_components}"
       end
     end
     def prog_scan_non_interleaved(reader, scan, comp_info, dc_tables, ac_tables,
                                   coeffs, comp_blocks, restart_interval, ss, se, ah, al)
       sc = scan.components[0]
-      comp = comp_info[sc.id]
-      dc_tab = dc_tables[sc.dc_table_id]
-      ac_tab = ac_tables[sc.ac_table_id]
+      comp, dc_tab, ac_tab = resolve_scan_references!(sc, comp_info, dc_tables, ac_tables, require_ac: ss > 0)
       coeff_buf = coeffs[comp.id]
       bx_count, by_count = comp_blocks[comp.id]
@@ -284,8 +291,7 @@ module PureJPEG
           end
           scan.components.each do |sc|
-            comp = comp_info[sc.id]
-            dc_tab = dc_tables[sc.dc_table_id]
+            comp, dc_tab = resolve_scan_references!(sc, comp_info, dc_tables, ac_tables, require_ac: false)
             coeff_buf = coeffs[comp.id]
             bx_count = comp_blocks[comp.id][0]
@@ -463,6 +469,28 @@ module PureJPEG
       end
     end
+    def resolve_scan_references!(scan_component, comp_info, dc_tables, ac_tables, require_ac: true)
+      comp = comp_info[scan_component.id]
+      raise DecodeError, "Scan references unknown component id #{scan_component.id}" unless comp
+      dc_tab = dc_tables[scan_component.dc_table_id]
+      raise DecodeError, "Component #{scan_component.id} references missing DC Huffman table #{scan_component.dc_table_id}" unless dc_tab
+      if require_ac
+        ac_tab = ac_tables[scan_component.ac_table_id]
+        raise DecodeError, "Component #{scan_component.id} references missing AC Huffman table #{scan_component.ac_table_id}" unless ac_tab
+      end
+      [comp, dc_tab, ac_tab]
+    end
+    def fetch_quant_table!(jfif, comp)
+      qt = jfif.quant_tables[comp.qt_id]
+      raise DecodeError, "Component #{comp.id} references missing quantization table #{comp.qt_id}" unless qt
+      qt
+    end
     def assemble_grayscale(width, height, channels, comp)
       ch = channels[comp.id]
       pixels = Array.new(width * height)
@@ -474,17 +502,16 @@ module PureJPEG
           pixels[dst_row + x] = (v << 16) | (v << 8) | v
         end
       end
-      Image.new(width, height, pixels)
+      Image.new(width, height, pixels, icc_profile: @icc_profile)
     end
     def assemble_color(width, height, channels, components, max_h, max_v)
       # Upsample chroma channels if needed and convert YCbCr to RGB
-      y_ch  = channels[components[0].id]
-      cb_ch = channels[components[1].id]
-      cr_ch = channels[components[2].id]
+      y_comp, cb_comp, cr_comp = resolve_color_components(components)
-      cb_comp = components[1]
-      cr_comp = components[2]
+      y_ch = channels[y_comp.id]
+      cb_ch = channels[cb_comp.id]
+      cr_ch = channels[cr_comp.id]
       pixels = Array.new(width * height)
@@ -518,7 +545,16 @@ module PureJPEG
         end
       end
-      Image.new(width, height, pixels)
+      Image.new(width, height, pixels, icc_profile: @icc_profile)
+    end
+    def resolve_color_components(components)
+      by_id = components.each_with_object({}) { |comp, memo| memo[comp.id] = comp }
+      if by_id[1] && by_id[2] && by_id[3]
+        [by_id[1], by_id[2], by_id[3]]
+      else
+        components
+      end
     end
   end
 end
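
The component lookup added at the end of this file can be tried in isolation. The following is a minimal sketch, not the gem's code path: `Component` is a hypothetical stand-in struct for the records the decoder receives from `JFIFReader`, and only its `id` field matters here.

```ruby
# Hypothetical stand-in for the decoder's component records.
Component = Struct.new(:id, :qt_id)

# Mirrors resolve_color_components from the diff: prefer IDs 1/2/3,
# otherwise fall back to the order in which the components were declared.
def resolve_color_components(components)
  by_id = components.each_with_object({}) { |comp, memo| memo[comp.id] = comp }
  if by_id[1] && by_id[2] && by_id[3]
    [by_id[1], by_id[2], by_id[3]]
  else
    components
  end
end

# Components declared out of order, but with the conventional IDs.
shuffled = [Component.new(3, 1), Component.new(1, 0), Component.new(2, 1)]
p resolve_color_components(shuffled).map(&:id)  # => [1, 2, 3]

# Non-standard IDs (0, 1, 2): declaration order is used as-is.
zero_based = [Component.new(0, 0), Component.new(1, 1), Component.new(2, 1)]
p resolve_color_components(zero_based).map(&:id)  # => [0, 1, 2]
```

In the second case the ID lookup fails because there is no component 3, so the function assumes the declaration order is Y, Cb, Cr.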

data/lib/pure_jpeg/encoder.rb CHANGED

@@ -15,6 +15,8 @@ module PureJPEG
     attr_reader :quality
     # @return [Boolean] whether grayscale mode is enabled
     attr_reader :grayscale
+    # @return [Boolean] whether image-specific Huffman tables are generated
+    attr_reader :optimize_huffman
     # Create a new encoder for the given pixel source.
     #
@@ -34,12 +36,16 @@ module PureJPEG
     # @param scramble_quantization [Boolean] write quantization tables in raster
     #   order instead of zigzag (non-spec-compliant; recreates the "early digicam"
     #   artifact look when decoded by standard viewers)
+    # @param optimize_huffman [Boolean] build image-specific Huffman tables with
+    #   an additional analysis pass (default false)
     def initialize(source, quality: 85, grayscale: false, chroma_quality: nil,
                    luminance_table: nil, chrominance_table: nil,
-                   quantization_modifier: nil, scramble_quantization: false)
+                   quantization_modifier: nil, scramble_quantization: false,
+                   optimize_huffman: false)
       @source = source
       @quality = quality
       @grayscale = grayscale
+      @optimize_huffman = optimize_huffman
       @chroma_quality = chroma_quality || quality
       validate_qtable!(luminance_table, "luminance_table") if luminance_table
       validate_qtable!(chrominance_table, "chrominance_table") if chrominance_table
@@ -54,7 +60,7 @@ module PureJPEG
     # @param path [String] output file path
     # @return [void]
     def write(path)
-      File.open(path, "wb") { |f| encode(f) }
+      File.binwrite(path, to_bytes)
     end
     # Return the encoded JPEG as a binary string.
@@ -91,61 +97,127 @@ module PureJPEG
       width = source.width
       height = source.height
+      raise ArgumentError, "Width must be a positive integer (got #{width.inspect})" unless width.is_a?(Integer) && width > 0
+      raise ArgumentError, "Height must be a positive integer (got #{height.inspect})" unless height.is_a?(Integer) && height > 0
+      raise ArgumentError, "Width #{width} exceeds maximum of #{MAX_DIMENSION}" if width > MAX_DIMENSION
+      raise ArgumentError, "Height #{height} exceeds maximum of #{MAX_DIMENSION}" if height > MAX_DIMENSION
       lum_qtable = build_lum_qtable
-      lum_dc = Huffman.build_table(Huffman::DC_LUMINANCE_BITS, Huffman::DC_LUMINANCE_VALUES)
-      lum_ac = Huffman.build_table(Huffman::AC_LUMINANCE_BITS, Huffman::AC_LUMINANCE_VALUES)
-      lum_huff = Huffman::Encoder.new(lum_dc, lum_ac)
       if grayscale
-        scan_data = encode_grayscale(width, height, lum_qtable, lum_huff)
-        write_grayscale_jfif(io, width, height, lum_qtable, scan_data)
+        y_data = extract_luminance(width, height)
+        lum_dc_bits, lum_dc_values, lum_ac_bits, lum_ac_values =
+          if optimize_huffman
+            counter = collect_grayscale_frequencies(y_data, width, height, lum_qtable)
+            dc_bits, dc_values = Huffman.optimize_table(counter.dc_frequencies)
+            ac_bits, ac_values = Huffman.optimize_table(counter.ac_frequencies)
+            [dc_bits, dc_values, ac_bits, ac_values]
+          else
+            [Huffman::DC_LUMINANCE_BITS, Huffman::DC_LUMINANCE_VALUES,
+             Huffman::AC_LUMINANCE_BITS, Huffman::AC_LUMINANCE_VALUES]
+          end
+        lum_huff = Huffman::Encoder.new(
+          Huffman.build_table(lum_dc_bits, lum_dc_values),
+          Huffman.build_table(lum_ac_bits, lum_ac_values)
+        )
+        scan_data = encode_grayscale_data(y_data, width, height, lum_qtable, lum_huff)
+        write_grayscale_jfif(io, width, height, lum_qtable, scan_data,
+                             lum_dc_bits, lum_dc_values, lum_ac_bits, lum_ac_values)
       else
         chr_qtable = build_chr_qtable
-        chr_dc = Huffman.build_table(Huffman::DC_CHROMINANCE_BITS, Huffman::DC_CHROMINANCE_VALUES)
-        chr_ac = Huffman.build_table(Huffman::AC_CHROMINANCE_BITS, Huffman::AC_CHROMINANCE_VALUES)
-        chr_huff = Huffman::Encoder.new(chr_dc, chr_ac)
+        y_data, cb_data, cr_data = extract_ycbcr(width, height)
+        sub_w = (width + 1) / 2
+        sub_h = (height + 1) / 2
+        cb_sub = downsample(cb_data, width, height, sub_w, sub_h)
+        cr_sub = downsample(cr_data, width, height, sub_w, sub_h)
+        lum_dc_bits, lum_dc_values, lum_ac_bits, lum_ac_values,
+          chr_dc_bits, chr_dc_values, chr_ac_bits, chr_ac_values =
+          if optimize_huffman
+            lum_counter, chr_counter = collect_color_frequencies(
+              y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qtable, chr_qtable
+            )
+            dc_bits, dc_values = Huffman.optimize_table(lum_counter.dc_frequencies)
+            ac_bits, ac_values = Huffman.optimize_table(lum_counter.ac_frequencies)
+            chr_dc_bits, chr_dc_values = Huffman.optimize_table(chr_counter.dc_frequencies)
+            chr_ac_bits, chr_ac_values = Huffman.optimize_table(chr_counter.ac_frequencies)
+            [dc_bits, dc_values, ac_bits, ac_values, chr_dc_bits, chr_dc_values, chr_ac_bits, chr_ac_values]
+          else
+            [Huffman::DC_LUMINANCE_BITS, Huffman::DC_LUMINANCE_VALUES,
+             Huffman::AC_LUMINANCE_BITS, Huffman::AC_LUMINANCE_VALUES,
+             Huffman::DC_CHROMINANCE_BITS, Huffman::DC_CHROMINANCE_VALUES,
+             Huffman::AC_CHROMINANCE_BITS, Huffman::AC_CHROMINANCE_VALUES]
+          end
-        scan_data = encode_color(width, height, lum_qtable, chr_qtable, lum_huff, chr_huff)
-        write_color_jfif(io, width, height, lum_qtable, chr_qtable, scan_data)
+        lum_huff = Huffman::Encoder.new(
+          Huffman.build_table(lum_dc_bits, lum_dc_values),
+          Huffman.build_table(lum_ac_bits, lum_ac_values)
+        )
+        chr_huff = Huffman::Encoder.new(
+          Huffman.build_table(chr_dc_bits, chr_dc_values),
+          Huffman.build_table(chr_ac_bits, chr_ac_values)
+        )
+        scan_data = encode_color_data(
+          y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qtable, chr_qtable, lum_huff, chr_huff
+        )
+        write_color_jfif(io, width, height, lum_qtable, chr_qtable, scan_data,
+                         lum_dc_bits, lum_dc_values, lum_ac_bits, lum_ac_values,
+                         chr_dc_bits, chr_dc_values, chr_ac_bits, chr_ac_values)
       end
     end
     # --- Grayscale encoding ---
-    def encode_grayscale(width, height, qtable, huff)
-      y_data = extract_luminance(width, height)
+    def collect_grayscale_frequencies(y_data, width, height, qtable)
+      counter = Huffman::FrequencyCounter.new
+      each_grayscale_block(y_data, width, height, qtable) do |zbuf|
+        counter.observe_block(zbuf, :y)
+      end
+      counter
+    end
+    def encode_grayscale_data(y_data, width, height, qtable, huff)
+      bit_writer = BitWriter.new
+      prev_dc = 0
+      each_grayscale_block(y_data, width, height, qtable) do |zbuf|
+        prev_dc = huff.encode_block(zbuf, prev_dc, bit_writer)
+      end
+      bit_writer.flush
+      bit_writer.bytes
+    end
+    def each_grayscale_block(y_data, width, height, qtable)
       padded_w = (width + 7) & ~7
       padded_h = (height + 7) & ~7
-      # Reusable buffers
       block = Array.new(64, 0.0)
       temp  = Array.new(64, 0.0)
       dct   = Array.new(64, 0.0)
       qbuf  = Array.new(64, 0)
       zbuf  = Array.new(64, 0)
-      bit_writer = BitWriter.new
-      prev_dc = 0
       (0...padded_h).step(8) do |by|
         (0...padded_w).step(8) do |bx|
           extract_block_into(y_data, width, height, bx, by, block)
-          prev_dc = encode_block(block, temp, dct, qbuf, zbuf, qtable, huff, prev_dc, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, qtable)
+          yield zbuf
         end
       end
-      bit_writer.flush
-      bit_writer.bytes
     end
-    def write_grayscale_jfif(io, width, height, qtable, scan_data)
+    def write_grayscale_jfif(io, width, height, qtable, scan_data, dc_bits, dc_values, ac_bits, ac_values)
       jfif = JFIFWriter.new(io, scramble_quantization: @scramble_quantization)
       jfif.write_soi
       jfif.write_app0
       jfif.write_dqt(qtable, 0)
       jfif.write_sof0(width, height, [[1, 1, 1, 0]])
-      jfif.write_dht(0, 0, Huffman::DC_LUMINANCE_BITS, Huffman::DC_LUMINANCE_VALUES)
-      jfif.write_dht(1, 0, Huffman::AC_LUMINANCE_BITS, Huffman::AC_LUMINANCE_VALUES)
+      jfif.write_dht(0, 0, dc_bits, dc_values)
+      jfif.write_dht(1, 0, ac_bits, ac_values)
       jfif.write_sos([[1, 0, 0]])
       jfif.write_scan_data(scan_data)
       jfif.write_eoi
@@ -153,69 +225,97 @@ module PureJPEG
     # --- Color encoding (YCbCr 4:2:0) ---
-    def encode_color(width, height, lum_qt, chr_qt, lum_huff, chr_huff)
-      y_data, cb_data, cr_data = extract_ycbcr(width, height)
+    def collect_color_frequencies(y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qt, chr_qt)
+      lum_counter = Huffman::FrequencyCounter.new
+      chr_counter = Huffman::FrequencyCounter.new
+      each_color_block(y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qt, chr_qt) do |component, zbuf|
+        case component
+        when :y
+          lum_counter.observe_block(zbuf, :y)
+        when :cb
+          chr_counter.observe_block(zbuf, :cb)
+        when :cr
+          chr_counter.observe_block(zbuf, :cr)
+        end
+      end
+      [lum_counter, chr_counter]
+    end
+    def encode_color_data(y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qt, chr_qt, lum_huff, chr_huff)
+      bit_writer = BitWriter.new
+      prev_dc_y = 0
+      prev_dc_cb = 0
+      prev_dc_cr = 0
-      sub_w = (width + 1) / 2
-      sub_h = (height + 1) / 2
-      cb_sub = downsample(cb_data, width, height, sub_w, sub_h)
-      cr_sub = downsample(cr_data, width, height, sub_w, sub_h)
+      each_color_block(y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qt, chr_qt) do |component, zbuf|
+        case component
+        when :y
+          prev_dc_y = lum_huff.encode_block(zbuf, prev_dc_y, bit_writer)
+        when :cb
+          prev_dc_cb = chr_huff.encode_block(zbuf, prev_dc_cb, bit_writer)
+        when :cr
+          prev_dc_cr = chr_huff.encode_block(zbuf, prev_dc_cr, bit_writer)
+        end
+      end
+      bit_writer.flush
+      bit_writer.bytes
+    end
+    def each_color_block(y_data, cb_sub, cr_sub, width, height, sub_w, sub_h, lum_qt, chr_qt)
       mcu_w = (width + 15) & ~15
       mcu_h = (height + 15) & ~15
-      # Reusable buffers
       block = Array.new(64, 0.0)
       temp  = Array.new(64, 0.0)
       dct   = Array.new(64, 0.0)
       qbuf  = Array.new(64, 0)
       zbuf  = Array.new(64, 0)
-      bit_writer = BitWriter.new
-      prev_dc_y = 0
-      prev_dc_cb = 0
-      prev_dc_cr = 0
       (0...mcu_h).step(16) do |my|
         (0...mcu_w).step(16) do |mx|
-          # 4 luminance blocks
           extract_block_into(y_data, width, height, mx, my, block)
-          prev_dc_y = encode_block(block, temp, dct, qbuf, zbuf, lum_qt, lum_huff, prev_dc_y, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
+          yield :y, zbuf
           extract_block_into(y_data, width, height, mx + 8, my, block)
-          prev_dc_y = encode_block(block, temp, dct, qbuf, zbuf, lum_qt, lum_huff, prev_dc_y, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
+          yield :y, zbuf
           extract_block_into(y_data, width, height, mx, my + 8, block)
-          prev_dc_y = encode_block(block, temp, dct, qbuf, zbuf, lum_qt, lum_huff, prev_dc_y, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
+          yield :y, zbuf
           extract_block_into(y_data, width, height, mx + 8, my + 8, block)
-          prev_dc_y = encode_block(block, temp, dct, qbuf, zbuf, lum_qt, lum_huff, prev_dc_y, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, lum_qt)
+          yield :y, zbuf
-          # 1 Cb block
           extract_block_into(cb_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
-          prev_dc_cb = encode_block(block, temp, dct, qbuf, zbuf, chr_qt, chr_huff, prev_dc_cb, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
+          yield :cb, zbuf
-          # 1 Cr block
           extract_block_into(cr_sub, sub_w, sub_h, mx >> 1, my >> 1, block)
-          prev_dc_cr = encode_block(block, temp, dct, qbuf, zbuf, chr_qt, chr_huff, prev_dc_cr, bit_writer)
+          transform_block(block, temp, dct, qbuf, zbuf, chr_qt)
+          yield :cr, zbuf
         end
       end
-      bit_writer.flush
-      bit_writer.bytes
     end
-    def write_color_jfif(io, width, height, lum_qt, chr_qt, scan_data)
+    def write_color_jfif(io, width, height, lum_qt, chr_qt, scan_data,
+                         lum_dc_bits, lum_dc_values, lum_ac_bits, lum_ac_values,
+                         chr_dc_bits, chr_dc_values, chr_ac_bits, chr_ac_values)
       jfif = JFIFWriter.new(io, scramble_quantization: @scramble_quantization)
       jfif.write_soi
       jfif.write_app0
       jfif.write_dqt(lum_qt, 0)
       jfif.write_dqt(chr_qt, 1)
       jfif.write_sof0(width, height, [[1, 2, 2, 0], [2, 1, 1, 1], [3, 1, 1, 1]])
-      jfif.write_dht(0, 0, Huffman::DC_LUMINANCE_BITS, Huffman::DC_LUMINANCE_VALUES)
-      jfif.write_dht(1, 0, Huffman::AC_LUMINANCE_BITS, Huffman::AC_LUMINANCE_VALUES)
-      jfif.write_dht(0, 1, Huffman::DC_CHROMINANCE_BITS, Huffman::DC_CHROMINANCE_VALUES)
-      jfif.write_dht(1, 1, Huffman::AC_CHROMINANCE_BITS, Huffman::AC_CHROMINANCE_VALUES)
+      jfif.write_dht(0, 0, lum_dc_bits, lum_dc_values)
+      jfif.write_dht(1, 0, lum_ac_bits, lum_ac_values)
+      jfif.write_dht(0, 1, chr_dc_bits, chr_dc_values)
+      jfif.write_dht(1, 1, chr_ac_bits, chr_ac_values)
       jfif.write_sos([[1, 0, 0], [2, 1, 1], [3, 1, 1]])
       jfif.write_scan_data(scan_data)
       jfif.write_eoi
@@ -223,11 +323,11 @@ module PureJPEG
     # --- Shared block pipeline (all buffers pre-allocated) ---
-    def encode_block(block, temp, dct, qbuf, zbuf, qtable, huff, prev_dc, bit_writer)
+    def transform_block(block, temp, dct, qbuf, zbuf, qtable)
       DCT.forward!(block, temp, dct)
       Quantization.quantize!(dct, qtable, qbuf)
       Zigzag.reorder!(qbuf, zbuf)
-      huff.encode_block(zbuf, prev_dc, bit_writer)
+      zbuf
     end
     # --- Pixel extraction ---
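
The block iterators in this file round image dimensions up to whole 8x8 blocks (grayscale) or 16x16 MCUs (4:2:0 color) with a bit mask, and size the subsampled chroma planes with `(n + 1) / 2`. A small worked sketch of that arithmetic follows; `pad_to` and `chroma_size` are hypothetical helper names that do not exist in the gem.

```ruby
# Round n up to the next multiple of `multiple`, which must be a power of two.
# Equivalent to the diff's (width + 7) & ~7 and (width + 15) & ~15.
def pad_to(n, multiple)
  (n + multiple - 1) & ~(multiple - 1)
end

# Size of a 2x-subsampled chroma plane; odd dimensions round up.
def chroma_size(n)
  (n + 1) / 2
end

p pad_to(17, 8)      # => 24
p pad_to(17, 16)     # => 32
p pad_to(1024, 16)   # => 1024
p chroma_size(1001)  # => 501
```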

data/lib/pure_jpeg/huffman/encoder.rb CHANGED

@@ -3,6 +3,56 @@
 module PureJPEG
   module Huffman
     class Encoder
+      def self.category_and_bits(value)
+        return [0, 0] if value == 0
+        abs_val = value.abs
+        cat = 0
+        v = abs_val
+        while v > 0
+          cat += 1
+          v >>= 1
+        end
+        bits = value > 0 ? value : value + (1 << cat) - 1
+        [cat, bits]
+      end
+      def self.each_ac_item(zigzag)
+        last_nonzero = 63
+        last_nonzero -= 1 while last_nonzero > 0 && zigzag[last_nonzero] == 0
+        if last_nonzero == 0
+          yield 0x00, 0
+          return
+        end
+        i = 1
+        while i <= last_nonzero
+          run = 0
+          while i <= last_nonzero && zigzag[i] == 0
+            run += 1
+            i += 1
+          end
+          while run >= 16
+            yield 0xF0, 0
+            run -= 16
+          end
+          value = zigzag[i]
+          cat, = category_and_bits(value)
+          yield (run << 4) | cat, value
+          i += 1
+        end
+        yield 0x00, 0 if last_nonzero < 63
+      end
+      def self.each_ac_symbol(zigzag)
+        each_ac_item(zigzag) do |symbol, _value|
+          yield symbol
+        end
+      end
       def initialize(dc_table, ac_table)
         @dc_table = dc_table
         @ac_table = ac_table
@@ -23,65 +73,43 @@ module PureJPEG
       private
       def encode_dc(diff, writer)
-        cat, bits = category_and_bits(diff)
+        cat, bits = self.class.category_and_bits(diff)
         code, length = @dc_table[cat]
         writer.write_bits(code, length)
         writer.write_bits(bits, cat) if cat > 0
       end
       def encode_ac(zigzag, writer)
-        last_nonzero = 63
-        last_nonzero -= 1 while last_nonzero > 0 && zigzag[last_nonzero] == 0
+        self.class.each_ac_item(zigzag) do |symbol, value|
+          code, length = @ac_table[symbol]
+          writer.write_bits(code, length)
+          next if symbol == 0x00 || symbol == 0xF0
-        if last_nonzero == 0
-          # All AC coefficients are zero (AC starts at index 1)
-          eob = @ac_table[0x00]
-          writer.write_bits(eob[0], eob[1])
-          return
+          cat, bits = self.class.category_and_bits(value)
+          writer.write_bits(bits, cat)
         end
+      end
+    end
-        i = 1
-        while i <= last_nonzero
-          run = 0
-          while i <= last_nonzero && zigzag[i] == 0
-            run += 1
-            i += 1
-          end
+    class FrequencyCounter
+      attr_reader :dc_frequencies, :ac_frequencies
-          # Emit ZRL (16 zeros) symbols as needed
-          while run >= 16
-            zrl = @ac_table[0xF0]
-            writer.write_bits(zrl[0], zrl[1])
-            run -= 16
-          end
+      def initialize
+        @dc_frequencies = Array.new(256, 0)
+        @ac_frequencies = Array.new(256, 0)
+        @prev_dc = Hash.new(0)
+      end
-          cat, bits = category_and_bits(zigzag[i])
-          symbol = (run << 4) | cat
-          code, length = @ac_table[symbol]
-          writer.write_bits(code, length)
-          writer.write_bits(bits, cat) if cat > 0
-          i += 1
-        end
+      def observe_block(zigzag, state_key)
+        diff = zigzag[0] - @prev_dc[state_key]
+        @prev_dc[state_key] = zigzag[0]
-        # EOB if we didn't reach position 63
-        if last_nonzero < 63
-          eob = @ac_table[0x00]
-          writer.write_bits(eob[0], eob[1])
-        end
-      end
+        cat, = Encoder.category_and_bits(diff)
+        @dc_frequencies[cat] += 1
-      # Returns [category, encoded_bits] for a coefficient value.
-      def category_and_bits(value)
-        return [0, 0] if value == 0
-        abs_val = value.abs
-        cat = 0
-        v = abs_val
-        while v > 0
-          cat += 1
-          v >>= 1
+        Encoder.each_ac_symbol(zigzag) do |symbol|
+          @ac_frequencies[symbol] += 1
         end
-        bits = value > 0 ? value : value + (1 << cat) - 1
-        [cat, bits]
       end
     end
   end
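
To make the symbol stream concrete, here is a standalone version of the two class methods from this diff, run on a small block. `Sketch` is a hypothetical module name used only so the snippet runs without the gem, and `Integer#bit_length` is swapped in for the diff's shift loop (it gives the same category for non-zero values).

```ruby
module Sketch
  # Returns [category, extra_bits] for a coefficient or DC difference.
  def self.category_and_bits(value)
    return [0, 0] if value == 0

    cat = value.abs.bit_length
    # Negative values are stored as value + 2^cat - 1.
    bits = value > 0 ? value : value + (1 << cat) - 1
    [cat, bits]
  end

  # Yields (run/size symbol, value) for the AC part of a zigzag-ordered block.
  def self.each_ac_item(zigzag)
    last_nonzero = 63
    last_nonzero -= 1 while last_nonzero > 0 && zigzag[last_nonzero] == 0
    if last_nonzero == 0
      yield 0x00, 0 # all AC coefficients are zero: end-of-block only
      return
    end

    i = 1
    while i <= last_nonzero
      run = 0
      while i <= last_nonzero && zigzag[i] == 0
        run += 1
        i += 1
      end
      while run >= 16
        yield 0xF0, 0 # ZRL: sixteen zeros
        run -= 16
      end
      value = zigzag[i]
      cat, = category_and_bits(value)
      yield((run << 4) | cat, value)
      i += 1
    end
    yield 0x00, 0 if last_nonzero < 63
  end
end

p Sketch.category_and_bits(5)   # => [3, 5]
p Sketch.category_and_bits(-5)  # => [3, 2]

# DC = 10, then AC coefficients 3, 0, 0, -2 followed by zeros.
block = [10, 3, 0, 0, -2] + Array.new(59, 0)
items = []
Sketch.each_ac_item(block) { |symbol, value| items << [symbol, value] }
p items  # => [[2, 3], [34, -2], [0, 0]]
```

Symbol 34 is `0x22`: a run of two zeros followed by a category-2 value. The final `[0, 0]` is the end-of-block marker, which is what `FrequencyCounter` counts once per block that does not reach position 63.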

data/lib/pure_jpeg/huffman/tables.rb CHANGED

@@ -81,5 +81,96 @@ module PureJPEG
       table
     end
+    # Build a JPEG canonical Huffman table definition from symbol frequencies.
+    # Returns [bits, values], where bits has 16 entries for code lengths 1..16.
+    def self.optimize_table(frequencies)
+      lengths = build_code_lengths(frequencies)
+      counts = length_counts(lengths)
+      trim_counts_to_jpeg_limit!(counts)
+      symbols = (0...256).select { |symbol| frequencies[symbol].positive? }
+      symbols.sort_by! { |symbol| [-frequencies[symbol], symbol] }
+      bits = Array.new(16, 0)
+      values = []
+      index = 0
+      1.upto(16) do |length|
+        count = counts[length]
+        bits[length - 1] = count
+        count.times do
+          values << symbols[index]
+          index += 1
+        end
+      end
+      [bits.freeze, values.freeze]
+    end
+    def self.build_code_lengths(frequencies)
+      nodes = []
+      256.times do |symbol|
+        freq = frequencies[symbol]
+        nodes << { freq: freq, symbol: symbol } if freq.positive?
+      end
+      nodes << { freq: 1, symbol: 256 }
+      while nodes.length > 1
+        nodes.sort_by! do |node|
+          [node[:freq], node[:symbol] || 257]
+        end
+        left = nodes.shift
+        right = nodes.shift
+        nodes << { freq: left[:freq] + right[:freq], left: left, right: right }
+      end
+      lengths = Array.new(257, 0)
+      assign_code_lengths(nodes.first, 0, lengths)
+      lengths
+    end
+    private_class_method :build_code_lengths
+    def self.assign_code_lengths(node, depth, lengths)
+      if node[:symbol]
+        lengths[node[:symbol]] = depth.zero? ? 1 : depth
+        return
+      end
+      assign_code_lengths(node[:left], depth + 1, lengths)
+      assign_code_lengths(node[:right], depth + 1, lengths)
+    end
+    private_class_method :assign_code_lengths
+    def self.length_counts(lengths)
+      counts = Array.new([lengths.max + 1, 33].max, 0)
+      lengths.each do |length|
+        counts[length] += 1 if length.positive?
+      end
+      counts
+    end
+    private_class_method :length_counts
+    def self.trim_counts_to_jpeg_limit!(counts)
+      max_length = counts.length - 1
+      while max_length > 16
+        while counts[max_length].positive?
+          j = max_length - 2
+          j -= 1 while j.positive? && counts[j].zero?
+          raise ArgumentError, "Unable to limit Huffman code lengths" unless j.positive?
+          counts[max_length] -= 2
+          counts[max_length - 1] += 1
+          counts[j + 1] += 2
+          counts[j] -= 1
+        end
+        max_length -= 1
+      end
+      max_length = 16
+      max_length -= 1 while max_length.positive? && counts[max_length].zero?
+      counts[max_length] -= 1
+    end
+    private_class_method :trim_counts_to_jpeg_limit!
   end
 end
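
Note on `optimize_table`: it returns `[bits, values]`, the same shape a DHT segment carries (16 per-length counts, then the symbols in code order). As a rough illustration of how such a pair expands into actual codes, here is a minimal sketch of the standard JPEG canonical code assignment. `canonical_codes` is a hypothetical helper written for this note and is not part of pure_jpeg.

```ruby
# Hypothetical helper: expand a [bits, values] pair into canonical codes.
# bits[i] is the number of codes of length i + 1; values lists the symbols
# in order of increasing code length.
def canonical_codes(bits, values)
  codes = {}
  code = 0
  index = 0
  bits.each_with_index do |count, i|
    length = i + 1
    count.times do
      codes[values[index]] = [code, length]
      code += 1
      index += 1
    end
    # Moving to the next length appends a zero bit to the running code.
    code <<= 1
  end
  codes
end

bits = [0, 1, 2] + [0] * 13 # one 2-bit code, two 3-bit codes
values = [5, 0, 3]
canonical_codes(bits, values).each do |symbol, (code, length)|
  puts format("symbol %d -> %0*b", symbol, length, code)
end
```

In the diff above, `build_code_lengths` adds a placeholder symbol 256 with frequency 1, and `trim_counts_to_jpeg_limit!` ends by removing one code from the longest remaining length. Under the assignment shown here, that would leave the all-ones code of the longest length unassigned.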

data/lib/pure_jpeg/image.rb CHANGED

@@ -17,14 +17,19 @@ module PureJPEG
     #   Format: +(r << 16) | (g << 8) | b+.
     attr_reader :packed_pixels
+    # @return [String, nil] raw ICC color profile data, if present in the source JPEG
+    attr_reader :icc_profile
     # @param width [Integer]
     # @param height [Integer]
     # @param packed_pixels [Array<Integer>] flat row-major array of packed RGB
     #   integers in the format +(r << 16) | (g << 8) | b+
-    def initialize(width, height, packed_pixels)
+    # @param icc_profile [String, nil] raw ICC profile bytes
+    def initialize(width, height, packed_pixels, icc_profile: nil)
       @width = width
       @height = height
       @packed_pixels = packed_pixels
+      @icc_profile = icc_profile
     end
     # Retrieve a pixel by coordinate.

data/lib/pure_jpeg/info.rb ADDED

@@ -0,0 +1,6 @@
+# frozen_string_literal: true
+module PureJPEG
+  # Lightweight metadata returned by {.info}.
+  Info = Struct.new(:width, :height, :component_count, :progressive, :icc_profile, keyword_init: true)
+end

data/lib/pure_jpeg/jfif_reader.rb CHANGED

@@ -3,14 +3,15 @@
 module PureJPEG
   class JFIFReader
     attr_reader :width, :height, :components, :quant_tables, :huffman_tables,
-                :restart_interval, :progressive, :scans
+                :restart_interval, :progressive, :scans, :icc_profile
     Component = Struct.new(:id, :h_sampling, :v_sampling, :qt_id)
     ScanComponent = Struct.new(:id, :dc_table_id, :ac_table_id)
     Scan = Struct.new(:components, :spectral_start, :spectral_end, :successive_high, :successive_low, :data, :huffman_tables)
-    def initialize(data)
+    def initialize(data, stop_after_frame: false)
       @data = data.b
+      @stop_after_frame = stop_after_frame
       @pos = 0
       @quant_tables = {}
       @huffman_tables = {}
@@ -18,7 +19,9 @@ module PureJPEG
       @restart_interval = 0
       @progressive = false
       @scans = []
+      @icc_chunks = {}
       parse
+      assemble_icc_profile
     end
     def scan_components
@@ -37,7 +40,9 @@ module PureJPEG
       loop do
         marker = read_marker
         case marker
-        when 0xE0..0xEF # APP0-APP15
+        when 0xE2 # APP2 (may contain ICC profile)
+          parse_app2
+        when 0xE0, 0xE1, 0xE3..0xEF # APP0, APP1, APP3-APP15
           skip_segment
         when 0xDB # DQT
           parse_dqt
@@ -45,9 +50,11 @@ module PureJPEG
           parse_dht
         when 0xC0 # SOF0 (baseline)
           parse_sof0
+          return if @stop_after_frame
         when 0xC2 # SOF2 (progressive)
           parse_sof0
           @progressive = true
+          return if @stop_after_frame
         when 0xDA # SOS
           scan = parse_sos
           scan.data = extract_scan_data
@@ -91,6 +98,28 @@ module PureJPEG
       raise PureJPEG::DecodeError, "Expected marker 0x#{expected.to_s(16)}, got 0x#{marker.to_s(16)}" unless marker == expected
     end
+    ICC_PROFILE_SIG = "ICC_PROFILE\0".b
+    def parse_app2
+      length = read_u16
+      end_pos = @pos + length - 2
+      if length >= 16 && @data[@pos, 12] == ICC_PROFILE_SIG
+        @pos += 12
+        seq_no = read_byte
+        _total = read_byte
+        @icc_chunks[seq_no] = @data[@pos, end_pos - @pos]
+      end
+      @pos = end_pos
+    end
+    def assemble_icc_profile
+      return if @icc_chunks.empty?
+      @icc_profile = @icc_chunks.sort_by(&:first).map(&:last).join.b
+    end
     def skip_segment
       length = read_u16
       @pos += length - 2

data/lib/pure_jpeg/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module PureJPEG
-  VERSION = "0.2.0"
+  VERSION = "0.3.0"
 end

data/lib/pure_jpeg.rb CHANGED

@@ -16,6 +16,7 @@ require_relative "pure_jpeg/huffman/encoder"
 require_relative "pure_jpeg/huffman/decoder"
 require_relative "pure_jpeg/jfif_writer"
 require_relative "pure_jpeg/jfif_reader"
+require_relative "pure_jpeg/info"
 require_relative "pure_jpeg/image"
 require_relative "pure_jpeg/encoder"
 require_relative "pure_jpeg/decoder"
@@ -28,6 +29,9 @@ module PureJPEG
   # Raised when decoding invalid or unsupported JPEG data.
   class DecodeError < StandardError; end
+  # Maximum image dimension (width or height) allowed for encoding and decoding.
+  MAX_DIMENSION = 8192
   # Encode a pixel source as a JPEG.
   #
   # @param source [#width, #height, #[]] any object responding to +width+,
@@ -60,4 +64,27 @@ module PureJPEG
   def self.read(path_or_data)
     Decoder.decode(path_or_data)
   end
+  # Read JPEG dimensions and basic frame metadata without decoding scan data.
+  #
+  # @param path_or_data [String] a file path or raw JPEG bytes
+  # @return [Info] image metadata parsed from the frame header
+  def self.info(path_or_data)
+    data = if path_or_data.is_a?(String) && !path_or_data.start_with?("\xFF\xD8".b) && File.exist?(path_or_data)
+             File.binread(path_or_data)
+           else
+             path_or_data.b
+           end
+    jfif = JFIFReader.new(data, stop_after_frame: true)
+    raise DecodeError, "JPEG frame header not found" unless jfif.width && jfif.height
+    Info.new(
+      width: jfif.width,
+      height: jfif.height,
+      component_count: jfif.components.length,
+      progressive: jfif.progressive,
+      icc_profile: jfif.icc_profile
+    )
+  end
 end
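
Note on `PureJPEG.info`: it builds a `JFIFReader` with `stop_after_frame: true`, so parsing returns once the SOF0 or SOF2 header has been read and no scan data is touched. The standalone sketch below illustrates that early-exit idea with a hypothetical `frame_dimensions` function. It is not the gem's reader: it assumes every segment before the frame header has a length field, and it does not handle padding bytes or other SOF variants.

```ruby
# Hypothetical sketch: walk JPEG segments and stop at the first SOF0/SOF2.
def frame_dimensions(data)
  bytes = data.b
  raise ArgumentError, "missing SOI marker" unless bytes.byteslice(0, 2) == "\xFF\xD8".b

  pos = 2
  while pos + 4 <= bytes.bytesize
    raise ArgumentError, "expected marker at offset #{pos}" unless bytes.getbyte(pos) == 0xFF

    marker = bytes.getbyte(pos + 1)
    length = bytes.byteslice(pos + 2, 2).unpack1("n") # big-endian, includes itself
    if [0xC0, 0xC2].include?(marker)
      # Frame header: precision (1 byte), height (2), width (2), components (1).
      _precision, height, width, count = bytes.byteslice(pos + 4, 6).unpack("CnnC")
      return { width: width, height: height, component_count: count,
               progressive: marker == 0xC2 }
    end
    pos += 2 + length # skip the marker plus the whole segment
  end
  nil
end

# SOI, an empty-ish APP0 segment, then a SOF0 header for a 32x16 image.
sample = [
  0xFF, 0xD8,
  0xFF, 0xE0, 0x00, 0x04, 0x00, 0x00,
  0xFF, 0xC0, 0x00, 0x0B, 0x08, 0x00, 0x10, 0x00, 0x20, 0x01, 0x01, 0x11, 0x00
].pack("C*")
p frame_dimensions(sample)
```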

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pure_jpeg
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.3.0
 platform: ruby
 authors:
 - Peter Cooper
@@ -57,6 +57,7 @@ files:
 - lib/pure_jpeg/huffman/encoder.rb
 - lib/pure_jpeg/huffman/tables.rb
 - lib/pure_jpeg/image.rb
+- lib/pure_jpeg/info.rb
 - lib/pure_jpeg/jfif_reader.rb
 - lib/pure_jpeg/jfif_writer.rb
 - lib/pure_jpeg/quantization.rb