omq-lz4 0.3.0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4217a939f757a07b0e3e2ad1a905d11ea3cd017e54baa2f059a75f304386bdab
4
- data.tar.gz: 1d0f4e4076e0cf02291b3f714fe20c721b8e05e0d4d17af4fbc8662e83897319
3
+ metadata.gz: b14469227b64cb647b42dbb8495754752ea81654449eab28cae78a0778aa36dd
4
+ data.tar.gz: 344189a2d0c29ec2d46b310f5607d133bdba87a69fb974be6bb582101f76aacf
5
5
  SHA512:
6
- metadata.gz: a963f09065ef7a019a8b766a00093e5ef84ba9489fed66d3524da5e8a0d8d956d27cd01534926ac1cb3f5c5088439da9e0db5c5f71c0b7b58b7b90db8ccfc8bf
7
- data.tar.gz: 86e7b5fe26c080a13a6b5c5c44b0e0a250bc45d650b2a9f76f4a80330209d6e24374b8a6412dd28118cb971619f56d3bede7a5ed937c91c5a80fd520ab17b5ac
6
+ metadata.gz: c3ef67d35613434faf125317a61dded584c8d268032eb01830e22a809415ab20b2cf198daa8ee7803d04e88b97d4b39bb6042071a7b18bbf8cff03cb6db06cee
7
+ data.tar.gz: 49ed574d2b93d2e039160514602cf4f90e125edd9c2b3e05033fbbd5ee1dc1eb098373acaed27c54553e9e6977d6488d3f751b85544ad87a6f0498e5e061cad5
data/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
1
1
  # Changelog
2
2
 
3
+ ## [Unreleased]
4
+
5
+ ## 0.3.1 (2026-05-28)
6
+
7
+ ### Changed
8
+
9
+ - `MIN_COMPRESS_WITH_DICT`: raised from 32 to 128. The previous value
10
+ was too aggressive; 128 leaves a safer margin above the measured
11
+ crossover.
12
+
3
13
  ## 0.3.0 (2026-05-11)
4
14
 
5
15
  ### Added
data/README.md CHANGED
@@ -4,19 +4,19 @@
4
4
  [![License: ISC](https://img.shields.io/badge/License-ISC-blue.svg)](LICENSE)
5
5
  [![Ruby](https://img.shields.io/badge/Ruby-%3E%3D%203.3-CC342D?logo=ruby&logoColor=white)](https://www.ruby-lang.org)
6
6
 
7
- > [RFC.md](RFC.md) for the wire-format spec and
8
- > [CHANGELOG.md](CHANGELOG.md) for what's in.
9
-
10
7
  LZ4-compressed TCP transport for [OMQ](https://github.com/paddor/omq),
11
8
  complementary to [`omq-zstd`](https://github.com/paddor/omq-zstd).
12
9
  Pick `lz4+tcp://` instead of `tcp://` or `zstd+tcp://` when you want
13
10
  cheap per-message compression with a small per-connection footprint.
14
11
 
12
+ See [RFC.md](RFC.md) for the wire-format specification and
13
+ [CHANGELOG.md](CHANGELOG.md) for release history.
14
+
15
15
  ## When to pick `lz4+tcp://` over `zstd+tcp://`
16
16
 
17
17
  LZ4 has no entropy stage (no Huffman, no FSE), ~16 KiB of encoder state
18
18
  per connection, and trades a **worse compression ratio** for
19
- **~4–8× faster encode** and **~ less memory per connection**.
19
+ **~4-8x faster encode** and **~3x less memory per connection**.
20
20
 
21
21
  | | `zstd+tcp://` | `lz4+tcp://` |
22
22
  |---|---|---|
@@ -25,7 +25,7 @@ per connection, and trades a **worse compression ratio** for
25
25
  | Memory per connection | ~256 KiB | ~16 KiB + dict |
26
26
  | Ratio, 1 KiB JSON no dict | ~45% | ~65% |
27
27
  | Ratio, 1 KiB JSON with dict | ~20% | ~35% |
28
- | Auto-trained dictionaries | | (user-supplied only) |
28
+ | Auto-trained dictionaries | yes | no (user-supplied only) |
29
29
 
30
30
  Pick `omq-lz4` for CPU- or memory-scarce deployments (edge gateways,
31
31
  IoT concentrators, high-fanout scenarios where per-connection state
@@ -60,13 +60,13 @@ pull.receive # => ["hello, compressed world"]
60
60
  ```
61
61
 
62
62
  Both peers must use `lz4+tcp://`. A `tcp://` peer cannot talk to an
63
- `lz4+tcp://` peer they speak different transports.
63
+ `lz4+tcp://` peer. They speak different transports.
64
64
 
65
65
  ### Dictionary compression
66
66
 
67
67
  Small messages don't compress well on their own. A shared dictionary
68
- gives 2–5× better ratios on payloads with a common prefix. Supply a
69
- user-trained dictionary (LZ4 has no auto-training use `omq-zstd`
68
+ gives 2-5x better ratios on payloads with a common prefix. Supply a
69
+ user-trained dictionary (LZ4 has no auto-training; use `omq-zstd`
70
70
  for that):
71
71
 
72
72
  ```ruby
@@ -78,9 +78,7 @@ The sender ships the dictionary to the receiver in-band, prefixed
78
78
  with the dictionary sentinel (`4C 5A 34 44`, "LZ4D" in ASCII), on
79
79
  the first outgoing message. The receiver installs the dictionary
80
80
  and decompresses subsequent messages against it. Dictionary size
81
- is capped at **8 KiB** tighter than `omq-zstd`'s 64 KiB cap, to
82
- let constrained peers accept shipments without allocating tens of
83
- KB of scratch.
81
+ is capped at **8 KiB** (same cap as `omq-zstd`).
84
82
 
85
83
  ### Compression thresholds
86
84
 
@@ -89,7 +87,7 @@ To avoid pessimizing tiny frames, the sender skips compression below:
89
87
  | Mode | Threshold |
90
88
  |-----------------|-----------|
91
89
  | No dictionary | 512 B |
92
- | With dictionary | 32 B |
90
+ | With dictionary | 128 B |
93
91
 
94
92
  Below the threshold the part is sent uncompressed (4-byte zero
95
93
  sentinel + plaintext).
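
The threshold rule in this hunk can be sketched roughly as follows. This is a minimal illustration, not the gem's code: the two threshold values and the 4-byte zero sentinel come from the README and `codec.rb` in this diff, while the method names `compress?` and `uncompressed_envelope` are hypothetical, and treating a part of exactly the threshold size as compressible is an assumption read from the word "below".

```ruby
# Threshold values as stated in the README table and codec.rb (0.3.1).
MIN_COMPRESS_NO_DICT   = 512
MIN_COMPRESS_WITH_DICT = 128
UNCOMPRESSED_SENTINEL  = "\x00\x00\x00\x00".b

# Hypothetical helper: true when a part of this size should be compressed.
# Assumption: a part exactly at the threshold is compressed.
def compress?(bytesize, dict: false)
  bytesize >= (dict ? MIN_COMPRESS_WITH_DICT : MIN_COMPRESS_NO_DICT)
end

# Hypothetical helper: passthrough framing for parts below the threshold
# (4-byte zero sentinel followed by the plaintext bytes).
def uncompressed_envelope(part)
  UNCOMPRESSED_SENTINEL + part.b
end
```

With these values, a 100-byte part stays uncompressed even when a dictionary is installed, where 0.3.0 (threshold 32) would have tried to compress it.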
@@ -99,14 +97,54 @@ sentinel + plaintext).
99
97
  The receiver bounds decompression by the socket's `max_message_size`
100
98
  (the same knob you'd use on a plain `tcp://` socket). It caps the
101
99
  **total decompressed size of all parts in a single message**. A peer
102
- attempting to send an over-budget message drops the connection
100
+ attempting to send an over-budget message drops the connection.
103
101
  `OMQ::SocketDeadError` surfaces on the next `receive`.
104
102
 
105
103
  Independent of that, the dictionary itself is capped at 8 KiB; a
106
104
  larger shipment drops the connection.
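
As a rough illustration of the budget rule described above (the limit applies to the sum over all parts of one message, not to each part), a receiver-side check might look like the sketch below. The gem's internals are not part of this diff, so `BudgetExceeded` and `check_message_budget!` are hypothetical names; only the rule itself comes from the README.

```ruby
# Hypothetical error type; the README only says the connection is dropped.
class BudgetExceeded < StandardError; end

# declared_sizes: decompressed size in bytes of each part of one message.
# Raises as soon as the running total exceeds max_message_size, so the
# caller can drop the connection before decompressing any further part.
def check_message_budget!(declared_sizes, max_message_size)
  total = 0
  declared_sizes.each do |size|
    total += size
    if total > max_message_size
      raise BudgetExceeded, "message needs #{total} B, limit is #{max_message_size} B"
    end
  end
  total
end
```

Checking the running total rather than each part in isolation is what keeps a many-part message from slipping past the limit.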
107
105
 
108
- See the plan roadmap ([../OMQ-LZ4.plan](../OMQ-LZ4.plan)) for
109
- history and open questions.
106
+ ## Wire format
107
+
108
+ Every post-handshake ZMTP message part starts with a 4-byte sentinel:
109
+
110
+ | Sentinel (hex) | ASCII | Meaning |
111
+ |---|---|---|
112
+ | `00 00 00 00` | (none) | Uncompressed plaintext |
113
+ | `4C 5A 34 42` | `LZ4B` | LZ4-compressed single block |
114
+ | `4C 5A 34 4D` | `LZ4M` | LZ4-compressed multi-block |
115
+ | `4C 5A 34 44` | `LZ4D` | Dictionary shipment |
116
+
117
+ **Single-block** (`LZ4B`): `sentinel (4) || decompressed_size u64 LE (8) || LZ4 block bytes`.
118
+ 12-byte envelope. Raw LZ4 block format (no magic, no descriptor, no
119
+ checksum). `decompressed_size` is required because LZ4 block format
120
+ carries no length prefix; the receiver pre-sizes its output buffer.
121
+
122
+ **Multi-block** (`LZ4M`): same header, followed by a sequence of
123
+ `u32 LE compressed_block_len || LZ4 block bytes` pairs. Each block
124
+ decompresses independently at up to 1 GiB. Used for parts exceeding
125
+ the single-block size cap.
126
+
127
+ **Dictionary shipment** (`LZ4D`): `sentinel (4) || dict bytes (1..8192)`.
128
+ Single-part ZMTP message consumed by the transport, not delivered
129
+ to the application. At most one per direction per connection.
130
+
131
+ Any other leading 4 bytes close the connection.
132
+
133
+ ## Constants
134
+
135
+ | Constant | Value |
136
+ |---|---|
137
+ | Scheme | `lz4+tcp` |
138
+ | Uncompressed sentinel | `00 00 00 00` |
139
+ | Single-block sentinel | `4C 5A 34 42` (`LZ4B`) |
140
+ | Multi-block sentinel | `4C 5A 34 4D` (`LZ4M`) |
141
+ | Dictionary sentinel | `4C 5A 34 44` (`LZ4D`) |
142
+ | LZ4M block size | 1 GiB (`0x40000000`) |
143
+ | Max dictionary size | 8 KiB |
144
+ | Min compress, no dict | 512 B |
145
+ | Min compress, with dict | 128 B |
146
+ | LZ4B envelope | 12 B (4 sentinel + 8 size) |
147
+ | Uncompressed envelope | 4 B (sentinel only) |
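
The sentinel table and envelope layouts added in this hunk can be exercised with a small parser sketch. The sentinel bytes, the 12-byte `LZ4B`/`LZ4M` header, and the 1..8192 dictionary range are taken from the README text; `parse_part`, `ProtocolError`, and the return shape are hypothetical, and the sketch stops at splitting the header from the body (it neither decompresses nor walks `LZ4M` blocks).

```ruby
# Sentinels copied from the Constants table in the README.
SENTINELS = {
  "\x00\x00\x00\x00".b => :uncompressed,
  "LZ4B".b             => :single_block,
  "LZ4M".b             => :multi_block,
  "LZ4D".b             => :dictionary,
}.freeze

MAX_DICT_SIZE = 8 * 1024

# Hypothetical error type standing in for "close the connection".
class ProtocolError < StandardError; end

# Classifies one wire part and splits off its header fields.
# Returns [kind, decompressed_size_or_nil, body_bytes].
def parse_part(wire)
  wire = wire.b
  kind = SENTINELS[wire.byteslice(0, 4)]
  raise ProtocolError, "unknown sentinel" unless kind

  case kind
  when :uncompressed
    [kind, nil, wire.byteslice(4..)]
  when :dictionary
    dict = wire.byteslice(4..)
    unless (1..MAX_DICT_SIZE).cover?(dict.bytesize)
      raise ProtocolError, "dictionary size out of range"
    end
    [kind, nil, dict]
  else
    # LZ4B and LZ4M share the header: sentinel (4) + u64 LE size (8).
    raise ProtocolError, "truncated header" if wire.bytesize < 12
    size = wire.byteslice(4, 8).unpack1("Q<")
    [kind, size, wire.byteslice(12..)]
  end
end
```

For example, `parse_part("LZ4B".b + [5].pack("Q<") + "abc".b)` yields the kind `:single_block`, a declared size of 5, and the three remaining bytes as the block body.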
110
148
 
111
149
  ## Performance
112
150
 
@@ -123,7 +161,7 @@ Lorem ipsum prefix) input.
123
161
  | 16 KiB | ~3.2 µs | ~2.4 µs | ~3.9 µs | ~3.0 µs |
124
162
  | 1 MiB | ~89 µs | ~87 µs | ~173 µs | ~303 µs |
125
163
 
126
- **End-to-end PUSH PULL over `lz4+tcp://` (loopback):**
164
+ **End-to-end PUSH -> PULL over `lz4+tcp://` (loopback):**
127
165
 
128
166
  | Message size | Throughput |
129
167
  |--------------|-----------:|
@@ -141,8 +179,8 @@ OMQ_DEV=1 bundle exec ruby --yjit bench/head_to_head.rb # lz4 vs zstd
141
179
 
142
180
  ### Head-to-head vs `omq-zstd` and plain `tcp`
143
181
 
144
- End-to-end PUSH PULL throughput, Ruby 4.0 + YJIT. Input:
145
- UUID-sprinkled Lorem ipsum a fresh UUID between each Lorem
182
+ End-to-end PUSH -> PULL throughput, Ruby 4.0 + YJIT. Input:
183
+ UUID-sprinkled Lorem ipsum, a fresh UUID between each Lorem
146
184
  paragraph. Approximates realistic workloads where a schema
147
185
  repeats but values vary (event logs, protobuf records, JSON
148
186
  events), so a fraction of every message is mandatorily
@@ -160,27 +198,27 @@ across three bandwidth regimes.
160
198
  | Link | Metric | tcp | lz4+tcp | zstd -3 | zstd 3 |
161
199
  |---------------------|----------|------:|--------:|--------:|-------:|
162
200
  | **100 Mbit** | plain | 11.8 | 105 | 114 | **197**|
163
- | (cap 12 MiB/s) | wire | 11.8 | 12 | 12 | 12 |
164
- | | speedup | 1.00× | 8.89× | 9.70× |**16.74×**|
201
+ | (cap ~12 MiB/s) | wire | 11.8 | 12 | 12 | 12 |
202
+ | | speedup | 1.00x | 8.89x | 9.70x |**16.74x**|
165
203
  | **1 Gbit** | plain | 117 | 794 | **900** | 603 |
166
- | (cap 125 MiB/s) | wire | 117 | 93 | 94 | 36 |
167
- | | speedup | 1.00× | 6.81× |**7.73×**| 5.17× |
204
+ | (cap ~125 MiB/s) | wire | 117 | 93 | 94 | 36 |
205
+ | | speedup | 1.00x | 6.81x |**7.73x**| 5.17x |
168
206
  | **Unlimited loopback** | plain | **1 064** | 869 | 972 | 626 |
169
207
  | (kernel-copy-bound) | wire | 1 064 | 99 | 101 | 37 |
170
- | | speedup | 1.00× | 0.82× | 0.91× | 0.59× |
208
+ | | speedup | 1.00x | 0.82x | 0.91x | 0.59x |
171
209
 
172
210
  Three regimes visible:
173
211
 
174
- - **100 Mbit** all compressed transports saturate wire at
175
- ~12 MiB/s. Plaintext = wire-cap × (1 / compression-ratio). The
212
+ - **100 Mbit**: all compressed transports saturate wire at
213
+ ~12 MiB/s. Plaintext = wire-cap x (1 / compression-ratio). The
176
214
  tighter the ratio, the bigger the win: `zstd 3`'s 3% wire ratio
177
- translates to a **~17× throughput multiplier** over plain tcp.
178
- - **1 Gbit** compressed transports shift from wire-saturated to
215
+ translates to a **~17x throughput multiplier** over plain tcp.
216
+ - **1 Gbit**: compressed transports shift from wire-saturated to
179
217
  CPU-limited. `zstd -3` reaches ~75% of wire cap; `zstd 3` only
180
218
  29% (deep CPU-bound). Both beat plain tcp (which is pinned at
181
- the wire cap) by **6–8×**. `zstd 3`'s tighter wire no longer
182
- helps there's no wire saturation to trade CPU for.
183
- - **Unlimited loopback** no wire cap. All three are
219
+ the wire cap) by **6-8x**. `zstd 3`'s tighter wire no longer
220
+ helps; there's no wire saturation to trade CPU for.
221
+ - **Unlimited loopback**: no wire cap. All three are
184
222
  CPU-limited. Plain tcp doesn't pay compression CPU, so **skip
185
223
  compression on loopback**.
186
224
 
@@ -196,25 +234,25 @@ Or use a `veth` pair in a network namespace so shaping doesn't
196
234
  touch your host's real loopback (see `tc-netem(8)`, `ip-netns(8)`).
197
235
 
198
236
  Full sweeps (8 sizes from 256 B to 512 KiB) for each regime live
199
- in `bench/head_to_head.rb` output run it yourself; the
237
+ in `bench/head_to_head.rb` output. Run it yourself; the
200
238
  headline numbers above are stable across repeats but small sizes
201
239
  and very large sizes vary a bit run-to-run.
202
240
 
203
241
  **Takeaway:**
204
242
 
205
243
  - Pick **`lz4+tcp://`** for bandwidth-limited links (any real
206
- network even 1 Gbit LAN). 6–9× throughput multiplier over
244
+ network, even 1 Gbit LAN). 6-9x throughput multiplier over
207
245
  plain `tcp`, minimal memory (~16 KiB/connection), modest CPU.
208
246
  Ties or beats `zstd -3` at 1 Gbit; loses the ratio race to
209
247
  `zstd 3` at 100 Mbit and below.
210
- - Pick **`zstd+tcp://` (level 3)** when the wire is the
211
- precious resource (100 Mbit links, WAN, or you're paying for
212
- egress). **~17× throughput multiplier at 100 Mbit** for 128 KiB
213
- messages is hard to argue with.
248
+ - Pick **`zstd+tcp://` (level >= 3)** when the wire is the
249
+ precious resource (100 Mbit links or slower, WAN, or you're
250
+ paying for egress). **~17x throughput multiplier at 100 Mbit**
251
+ for 128 KiB messages is hard to argue with.
214
252
  - Pick **plain `tcp://`** when the link is *not* the bottleneck
215
253
  (localhost IPC, loopback, datacenter-fast inter-host
216
254
  connections where the bandwidth ceiling is above the CPU's
217
- compress/decompress speed typically 10+ Gbit), or when the
255
+ compress/decompress speed, typically 10+ Gbit), or when the
218
256
  payload is already high-entropy (encrypted, already compressed,
219
257
  random binary) and compression only adds overhead.
220
258
 
data/lib/omq/lz4/codec.rb CHANGED
@@ -41,7 +41,7 @@ module OMQ
41
41
  # passthrough anyway. Below the threshold, `encode_part` emits
42
42
  # UNCOMPRESSED directly without touching the compressor.
43
43
  MIN_COMPRESS_NO_DICT = 512
44
- MIN_COMPRESS_WITH_DICT = 32
44
+ MIN_COMPRESS_WITH_DICT = 128
45
45
 
46
46
  # Maximum dictionary size on the wire. A policy choice, not a
47
47
  # protocol limit; tight enough that constrained peers can accept
data/lib/omq/lz4/version.rb CHANGED
@@ -2,6 +2,6 @@
2
2
 
3
3
  module OMQ
4
4
  module LZ4
5
- VERSION = "0.3.0"
5
+ VERSION = "0.3.1"
6
6
  end
7
7
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: omq-lz4
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Patrik Wenger
@@ -74,7 +74,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
74
74
  - !ruby/object:Gem::Version
75
75
  version: '0'
76
76
  requirements: []
77
- rubygems_version: 4.0.6
77
+ rubygems_version: 4.0.10
78
78
  specification_version: 4
79
79
  summary: LZ4+TCP transport for OMQ
80
80
  test_files: []