pure_jpeg 0.1.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e09848a734582d7635ff6a8c3d85b166da4f2c5b496d5c4f4e30105af4834e33
- data.tar.gz: 052b8e0a21d58eb9aa9e169e06b7cfbde44f3610775da6fed86e05875dceaea3
+ metadata.gz: 66b5d6fe1b663128f62aae8111b55a9b2ddbe9739d501fcf1146c459286433da
+ data.tar.gz: 02ae6cdc25f520221fee4adfe9d6070c4a975602e5de2b34a0b9ab8b4e005829
  SHA512:
- metadata.gz: 750ea3d65bf2ae6c272998b2ef7b814954eb1b576f6df83036d931543f30b99481281e8a12ef63fc388fdc60db0c54815fc8219b66782415844b3ed8721a5975
- data.tar.gz: 16475c5174b009a7a45eec5ee288799ea7965e4e10ed02ab20d55b15de5ebbdf92e9c0196d58053b6539dfea4601d7e22016c636ee471d2e73300feb86d97f07
+ metadata.gz: f476b8fec25f1f0402f297d534f52887aed90778ddf1217668a0889755856436dae8a0d8e4e9d648bc2a33879165b32fa0560e5b506549467c21a648fd6ecf29
+ data.tar.gz: 5438c161519149458fad8cd28dea1f9f073a2646c5f91f619a357d375b77456ed5f366dd3ed78e75f5c0d2cbe4156f4ec8a5a6dbaa9e19d86198dcd8a2fcacfc
data/CHANGELOG.md CHANGED
@@ -1,5 +1,47 @@
  # Changelog

+ ## 0.3.0
+
+ New features:
+
+ - `PureJPEG.info` for reading dimensions and metadata without full decode
+ - ICC color profile extraction (available on `Info` and `Image`)
+ - Optional image-specific optimized Huffman tables (`optimize_huffman: true`)
+
+ Fixes:
+
+ - Decoder validates Huffman table, quantization table, and component references with clear error messages
+ - Color decoding looks up Y/Cb/Cr components by ID instead of assuming SOF array order
+ - Support for non-standard component IDs (e.g. 0, 1, 2 as used by some Adobe tools)
+ - Explicit error for unsupported component counts (e.g. CMYK)
+ - Encoder no longer holds file handle open during encoding
+
+ ## 0.2.0
+
+ New features:
+
+ - Progressive JPEG decoding (SOF2) with spectral selection and successive approximation
+ - `Image#each_rgb` for iterating pixels without per-pixel struct allocation
+ - `PureJPEG::DecodeError` exception class for all decoding errors
+ - Validation of custom quantization tables (length and value range)
+
+ Performance:
+
+ - Packed integer pixel storage in `Image` eliminates per-pixel object allocation on decode (~6x faster decode)
+ - Fast path for encoder pixel extraction from packed sources (`ChunkyPNGSource`, `Image`)
+ - `BitReader#read_bits` fast path when buffer already has enough bits
+ - `BitWriter` builds a `String` directly instead of `Array` + `pack`
+ - `Huffman.build_table` returns an `Array` for O(1) lookup instead of `Hash`
+ - Faster scan data extraction using `String#index`
+
+ Fixes:
+
+ - JPEG data detection uses SOI marker check instead of null-byte heuristic
+ - `RawSource` pixels default to black instead of `nil`
+ - `BitReader` bounds check for truncated 0xFF sequences
+ - `JFIFReader` bounds check when reading past end of data
+ - Fixed dead tautological check in AC encoding EOB logic
+
  ## 0.1.0

  Initial release.
data/LICENSE CHANGED
@@ -1,6 +1,6 @@
  MIT License

- Copyright (c) 2025 Peter Cooper
+ Copyright (c) 2026 Peter Cooper

  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,6 +1,15 @@
- # PureJPEG
+ <p align="center">
+ <img src="purejpeg.jpg" width="480" alt="PureJPEG">
+ </p>

- Pure Ruby JPEG encoder and decoder. Implements baseline JPEG (DCT, Huffman, 4:2:0 chroma subsampling) and exposes a variety of encoding options to adjust parts of the JPEG pipeline not normally available (I needed this to recreate the JPEG compression styles of older digital cameras - don't ask..)
+ # PureJPEG - Pure Ruby JPEG encoder and decoder library
+
+ Convert PNG or other pixel data to JPEG. Or the other way! Implements baseline JPEG encoding (DCT, Huffman, 4:2:0 chroma subsampling) and decodes both baseline and progressive JPEGs. Exposes a variety of encoding options to adjust parts of the JPEG pipeline not normally available (I needed this to recreate the JPEG compression styles of older digital cameras - don't ask..)
+
+ It works on CRuby 3.0+, TruffleRuby 33.0, and JRuby 10.0.
+
+ > [!NOTE]
+ > Rubyists might find the [AI Disclosure](#ai-disclosure) section below of interest.

  ## Installation

@@ -14,7 +23,7 @@ gem "pure_jpeg"
  gem install pure_jpeg
  ```

- There are no runtime dependencies. [ChunkyPNG](https://github.com/wvanbergen/chunky_png) is optional and for if you want to use `from_chunky_png`. I have a pure PNG encoder/decoder not far behind this that will ultimately plug in nicely too to get pure Ruby graphical bliss ;-)
+ There are no runtime dependencies. [ChunkyPNG](https://github.com/wvanbergen/chunky_png) is optional (though quite useful) if you want to use `from_chunky_png`. I have a pure PNG encoder/decoder not far behind this that will ultimately plug in nicely too to get 100% pure Ruby graphical bliss ;-)

  `examples/` contains some useful example scripts for basic JPEG to PNG and PNG to JPEG conversion if you want to do some quick tests without writing code.

@@ -70,13 +79,42 @@ PureJPEG.encode(source,
    luminance_table: nil, # custom 64-element quantization table for Y
    chrominance_table: nil, # custom 64-element quantization table for Cb/Cr
    quantization_modifier: nil, # proc(table, :luminance/:chrominance) -> modified table
-   scramble_quantization: false # intentionally misordered quant tables (creative effect)
+   scramble_quantization: false, # intentionally misordered quant tables (creative effect)
+   optimize_huffman: false # slower 2-pass encode, usually smaller files
  )
  ```

  See [CREATIVE.md](CREATIVE.md) for detailed examples of the creative encoding options.

- Each stage of the JPEG pipeline is a separate module, so individual components (DCT, quantization, Huffman coding) can be replaced or extended independently which is kinda my plan here as I made this to play around with effects.
+ Here's a quick example of sort of the "old digital camera" effect I was looking for though:
+
+ <table>
+ <tr>
+ <td align="center"><strong>Normal</strong></td>
+ <td align="center"><strong>Scrambled quantization</strong></td>
+ </tr>
+ <tr>
+ <td><img src="examples/peppers.jpg" width="360"></td>
+ <td><img src="examples/peppers-funky.jpg" width="360"></td>
+ </tr>
+ </table>
+
+ And here's what happens when you convert a PNG with transparency — JPEG doesn't support alpha, so the hidden RGB data behind transparent pixels bleeds through:
+
+ <table>
+ <tr>
+ <td align="center"><strong>PNG with transparency</strong></td>
+ <td align="center"><strong>Converted to JPEG</strong></td>
+ </tr>
+ <tr>
+ <td><img src="examples/dice.png" width="360"></td>
+ <td><img src="examples/dice.jpg" width="360"></td>
+ </tr>
+ </table>
+
+ I consider this a feature but you may consider it a deficiency and that a default background of white should be applied. This may be something I'll add if anyone wants it!
+
+ Note that each stage of the JPEG pipeline is a separate module, so individual components (DCT, quantization, Huffman coding) can be replaced or extended independently which is kinda my plan here as I made this to play around with effects.

  ## Decoding (reading JPEGs!)

@@ -98,6 +136,16 @@ pixel.b # => 97
  image = PureJPEG.read(jpeg_bytes)
  ```

+ ### Read dimensions and metadata only
+
+ ```ruby
+ info = PureJPEG.info("photo.jpg")
+ info.width # => 1024
+ info.height # => 768
+ info.component_count # => 3
+ info.progressive # => false
+ ```
+
  ### Iterating pixels

  ```ruby
@@ -134,27 +182,30 @@ Encoding:
  - 8-bit precision
  - Grayscale (1 component) and YCbCr color (3 components)
  - 4:2:0 chroma subsampling (color) or no subsampling (grayscale)
- - Standard Huffman tables (Annex K)
+ - Standard Huffman tables (Annex K) by default
+ - Optional image-specific optimized Huffman tables

  Decoding:
- - Baseline DCT (SOF0)
+ - Baseline DCT (SOF0) and Progressive DCT (SOF2)
  - 8-bit precision
  - 1-component (grayscale) and 3-component (YCbCr) images
  - Any chroma subsampling factor (4:4:4, 4:2:2, 4:2:0, etc.)
  - Restart markers (DRI/RST)

- Not supported: progressive JPEG (SOF2), arithmetic coding, 12-bit precision, multi-scan, EXIF/ICC profile preservation. Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
+ Not supported: arithmetic coding, 12-bit precision, EXIF/ICC profile preservation, adding a default background for transparent sources (see what happens above!). Largely because I don't need these, but they are all do-able, especially with how loosely coupled this library is internally. Raise an issue if you really care about them!
+
+ Possible future improvements: AAN/fixed-point DCT (but it's a LOT of work), ICC profile rendering/conversion.

  ## Performance

- On a 1024x1024 image (Ruby 3.4 on my M1 Max):
+ On a 1024x1024 image (Ruby 4.0.1 on my M1 Max):

  | Operation | Time |
  |-----------|------|
- | Encode (color, q85) | ~2.8s |
- | Decode (color) | ~12s |
+ | Encode (color, q85) | ~1.7s |
+ | Decode (color) | ~1.8s |

- The encoder uses a separable DCT with a precomputed cosine matrix and reuses all per-block buffers to minimize GC pressure (more on the optimizations below).
+ Both the encoder and decoder use a separable DCT with a precomputed cosine matrix and reuse all per-block buffers to minimize GC pressure. Pixel data is stored as packed integers internally to avoid per-pixel object allocation.

  ## Some useful `rake` tasks

@@ -167,13 +218,19 @@ rake profile # CPU profile with StackProf (requires the stackprof gem)

  ## AI Disclosure

- Claude Code did the majority of the work. However, it did require a lot of guidance as it was quite naive in its approach at first with its JPEG outputs looking very akin to those of my Kodak digital camera from 2001! It turns out it got something wrong which, amusingly, it seems devices of that era also got wrong (specifically not using the zigzag approach during quantization).
+ **Claude Code did the majority of the work.** The math of JPEG encoding/decoding is beyond me, except 'getting it' at a high level. I understand it like I understand the engine in my car :-) *Later update: OpenAI Codex is also reviewing and adding features now. It feels stronger in many areas.*
+
+ **I have read all of the code produced up to v0.2.0.** The algorithms are above my paygrade, but I'm OK with what has been produced, and I manually fixed a variety of stylistic things along the way. For example, CC seems to like wrapping entire functions in `if` statements rather than bailing on the opposite condition. *Later update: I have not read the ICC and optimized Huffman code yet, but it is heavily tested.*
+
+ **CC needed a lot of guidance.** Its initial JPEG algorithm was somewhat naive and output odd looking JPEGs akin to those of my Kodak digital camera from 2001. After some back and forth and image comparisons, we figured out it was doing the quantization entirely wrong (specifically not using the zigzag approach during quantization but just going in raster order). I *like* this aesthetic, but fixed it up so that it works as a generally usable JPEG library, while adding ways to customize things so you can recreate the effect, if preferred (see `CREATIVE.md` for more on that).
+
+ **CC is lazy.** The initial implementation was VERY SLOW. It took 15 seconds to turn a 1024x1024 PNG into a JPEG, so we went down the profiling rabbit hole and found many optimizations to make it ~6x faster. CC is poor at considering the role of Ruby's GC when implementing low level algorithms and needs some prodding to make the correct optimizations. CC is also lazy to the point of recommending that you just use another language (e.g. Go or Rust) rather than do a pure Ruby version of something - despite it being possible with some extra work.

- The initial implementation was also VERY SLOW. It took about 15 seconds just to turn a 1024x1024 PNG into a JPEG, so some profiling was necessary which ended up finding a lot of possible optimizations to make it about 6x faster.
+ **CC's testing and cleanliness leaves a bit to be desired.** The CC-created tests were superficial, so I worked on getting them beefed up to tackle a variety of edge cases. They could still get better. It also didn't do RDoc comments, use Minitest, and a variety of other things I coerced it into working on. A good `CLAUDE.md` file could probably avoid many of these problems. I worked without one.

- The tests were also a bit superficial, so I worked on getting them beefed up to tackle a variety of edge cases, although they could still be better. It also didn't do RDoc comments, use Minitest, and a variety of other things I had to coerce it into finishing.
+ **The overall experience was good.** I enjoyed this project, but CC clearly requires an experienced developer to keep it on the rails and to not end up with a bunch of buggy half-working crap. Getting to the basic 'turn a PNG into a JPEG' took only twenty minutes, but the rest of making it actually widely useful took several hours more.

- I have read all of the code produced. A lot of the internals are above my paygrade but I'm generally OK with what has been produced and fixed a variety of stylistic things along the way.
+ **The final 10% still takes 90% of the time.** As mentioned above, the first run was quick, but getting things right has taken much longer. v0.1->0.2 has taken longer than 0.1 did! But we now have progressive JPEG support, even more optimizations, better tests, etc. etc.

  ## License

@@ -18,6 +18,12 @@ module PureJPEG

      def read_bits(n)
        return 0 if n == 0
+       # Fast path: enough bits already in the buffer
+       if @bits_in_buffer >= n
+         @bits_in_buffer -= n
+         return (@buffer >> @bits_in_buffer) & ((1 << n) - 1)
+       end
+       # Slow path: need to refill
        value = 0
        n.times { value = (value << 1) | read_bit }
        value
@@ -43,10 +49,11 @@ module PureJPEG
      private

      def fill_buffer
-       raise "Unexpected end of scan data" if @pos >= @length
+       raise PureJPEG::DecodeError, "Unexpected end of scan data" if @pos >= @length
        byte = @data.getbyte(@pos)
        @pos += 1
        if byte == 0xFF
+         raise PureJPEG::DecodeError, "Unexpected end of scan data" if @pos >= @length
          next_byte = @data.getbyte(@pos)
          @pos += 1
          # 0xFF 0x00 is a stuffed 0xFF byte
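
The two `BitReader` hunks above add a fast path to `read_bits` and a second bounds check after a `0xFF` byte. As a rough illustration only, here is a minimal self-contained sketch of how those two pieces could fit together. `SketchBitReader` and `TruncatedData` are hypothetical names, not the gem's classes, and the sketch assumes a one-byte buffer and treats any marker inside scan data as an error, which the real reader may well handle differently.

```ruby
# Hypothetical, simplified bit reader (NOT the gem's BitReader).
# Assumption: the buffer holds exactly one byte at a time.
class SketchBitReader
  class TruncatedData < StandardError; end

  def initialize(data)
    @data = data.b          # work on a binary copy of the input
    @pos = 0
    @buffer = 0             # most recently loaded byte
    @bits_in_buffer = 0     # number of low bits of @buffer still unread
  end

  def read_bit
    fill_buffer if @bits_in_buffer == 0
    @bits_in_buffer -= 1
    (@buffer >> @bits_in_buffer) & 1
  end

  def read_bits(n)
    return 0 if n == 0
    # Fast path: the buffer already holds at least n unread bits,
    # so take them with one shift and mask.
    if @bits_in_buffer >= n
      @bits_in_buffer -= n
      return (@buffer >> @bits_in_buffer) & ((1 << n) - 1)
    end
    # Slow path: read bit by bit, refilling the buffer as needed.
    value = 0
    n.times { value = (value << 1) | read_bit }
    value
  end

  private

  def fill_buffer
    raise TruncatedData, "Unexpected end of scan data" if @pos >= @data.bytesize
    byte = @data.getbyte(@pos)
    @pos += 1
    if byte == 0xFF
      # Second bounds check: a trailing 0xFF with nothing after it is truncated input.
      raise TruncatedData, "Unexpected end of scan data" if @pos >= @data.bytesize
      next_byte = @data.getbyte(@pos)
      @pos += 1
      # 0xFF 0x00 is a stuffed 0xFF byte; this sketch rejects anything else.
      raise TruncatedData, "Marker in scan data (not handled in this sketch)" unless next_byte == 0x00
    end
    @buffer = byte
    @bits_in_buffer = 8
  end
end
```

With input bytes `0xA5 0xFF 0x00`, the first `read_bits(3)` goes through the slow path because the buffer starts empty, and the following `read_bits(5)` is served entirely by the fast path.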
@@ -3,7 +3,7 @@
  module PureJPEG
    class BitWriter
      def initialize
-       @data = []
+       @data = String.new(capacity: 4096, encoding: Encoding::BINARY)
        @buffer = 0
        @bits_in_buffer = 0
      end
@@ -17,8 +17,8 @@ module PureJPEG
        while @bits_in_buffer >= 8
          @bits_in_buffer -= 8
          byte = (@buffer >> @bits_in_buffer) & 0xFF
-         @data << byte
-         @data << 0x00 if byte == 0xFF # byte stuffing
+         @data << byte.chr
+         @data << "\x00".b if byte == 0xFF # byte stuffing
        end

        @buffer &= (1 << @bits_in_buffer) - 1
@@ -32,7 +32,7 @@ module PureJPEG
      end

      def bytes
-       @data.pack("C*")
+       @data
      end
    end
  end
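
The `BitWriter` hunks above replace an `Array` of byte values plus a final `pack("C*")` with a binary `String` that is appended to directly. The following is a minimal sketch of that approach, not the gem's code: `SketchBitWriter`, and the signatures of `write_bits` and `flush`, are assumptions made for illustration, since the diff only shows fragments of the real class. Padding the final partial byte with 1 bits is the usual JPEG convention and is likewise assumed here.

```ruby
# Hypothetical, simplified bit writer (NOT the gem's BitWriter).
class SketchBitWriter
  def initialize
    # Preallocated binary string; bytes are appended directly, no Array + pack.
    @data = String.new(capacity: 4096, encoding: Encoding::BINARY)
    @buffer = 0
    @bits_in_buffer = 0
  end

  # Append the low `length` bits of `value`, most significant bit first.
  def write_bits(value, length)
    @buffer = (@buffer << length) | (value & ((1 << length) - 1))
    @bits_in_buffer += length

    while @bits_in_buffer >= 8
      @bits_in_buffer -= 8
      byte = (@buffer >> @bits_in_buffer) & 0xFF
      @data << byte.chr
      @data << "\x00".b if byte == 0xFF # byte stuffing
    end

    # Keep only the bits that have not been emitted yet.
    @buffer &= (1 << @bits_in_buffer) - 1
  end

  # Pad the final partial byte with 1 bits (assumed convention for this sketch).
  def flush
    return if @bits_in_buffer == 0
    pad = 8 - @bits_in_buffer
    write_bits((1 << pad) - 1, pad)
  end

  def bytes
    @data
  end
end

writer = SketchBitWriter.new
writer.write_bits(0xFF, 8)   # emits 0xFF followed by a stuffed 0x00
writer.write_bits(0b101, 3)  # stays in the buffer until flushed
writer.flush                 # pads to 0b10111111
```

One design point worth noting: because `bytes` now returns the internal string itself rather than a freshly packed copy, a caller that mutates the result would also mutate the writer's state.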