omq-lz4 0.3.0 → 0.3.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +10 -0
- data/README.md +75 -37
- data/lib/omq/lz4/codec.rb +1 -1
- data/lib/omq/lz4/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b14469227b64cb647b42dbb8495754752ea81654449eab28cae78a0778aa36dd
+data.tar.gz: 344189a2d0c29ec2d46b310f5607d133bdba87a69fb974be6bb582101f76aacf
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: c3ef67d35613434faf125317a61dded584c8d268032eb01830e22a809415ab20b2cf198daa8ee7803d04e88b97d4b39bb6042071a7b18bbf8cff03cb6db06cee
+data.tar.gz: 49ed574d2b93d2e039160514602cf4f90e125edd9c2b3e05033fbbd5ee1dc1eb098373acaed27c54553e9e6977d6488d3f751b85544ad87a6f0498e5e061cad5
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,15 @@
 # Changelog
 
+## [Unreleased]
+
+## 0.3.1 (2026-05-28)
+
+### Changed
+
+- `MIN_COMPRESS_WITH_DICT`: raised from 32 to 128. The previous value
+was too aggressive; 128 leaves a safer margin above the measured
+crossover.
+
 ## 0.3.0 (2026-05-11)
 
 ### Added
data/README.md
CHANGED
@@ -4,19 +4,19 @@
 [](LICENSE)
 [](https://www.ruby-lang.org)
 
-> [RFC.md](RFC.md) for the wire-format spec and
-> [CHANGELOG.md](CHANGELOG.md) for what's in.
-
 LZ4-compressed TCP transport for [OMQ](https://github.com/paddor/omq),
 complementary to [`omq-zstd`](https://github.com/paddor/omq-zstd).
 Pick `lz4+tcp://` instead of `tcp://` or `zstd+tcp://` when you want
 cheap per-message compression with a small per-connection footprint.
 
+See [RFC.md](RFC.md) for the wire-format specification and
+[CHANGELOG.md](CHANGELOG.md) for release history.
+
 ## When to pick `lz4+tcp://` over `zstd+tcp://`
 
 LZ4 has no entropy stage (no Huffman, no FSE), ~16 KiB of encoder state
 per connection, and trades a **worse compression ratio** for
-**~4
+**~4-8x faster encode** and **~3x less memory per connection**.
 
 | | `zstd+tcp://` | `lz4+tcp://` |
 |---|---|---|
@@ -25,7 +25,7 @@ per connection, and trades a **worse compression ratio** for
 | Memory per connection | ~256 KiB | ~16 KiB + dict |
 | Ratio, 1 KiB JSON no dict | ~45% | ~65% |
 | Ratio, 1 KiB JSON with dict | ~20% | ~35% |
-| Auto-trained dictionaries |
+| Auto-trained dictionaries | yes | no (user-supplied only) |
 
 Pick `omq-lz4` for CPU- or memory-scarce deployments (edge gateways,
 IoT concentrators, high-fanout scenarios where per-connection state
@@ -60,13 +60,13 @@ pull.receive # => ["hello, compressed world"]
 ```
 
 Both peers must use `lz4+tcp://`. A `tcp://` peer cannot talk to an
-`lz4+tcp://` peer
+`lz4+tcp://` peer. They speak different transports.
 
 ### Dictionary compression
 
 Small messages don't compress well on their own. A shared dictionary
-gives 2
-user-trained dictionary (LZ4 has no auto-training
+gives 2-5x better ratios on payloads with a common prefix. Supply a
+user-trained dictionary (LZ4 has no auto-training; use `omq-zstd`
 for that):
 
 ```ruby
@@ -78,9 +78,7 @@ The sender ships the dictionary to the receiver in-band, prefixed
 with the dictionary sentinel (`4C 5A 34 44`, "LZ4D" in ASCII), on
 the first outgoing message. The receiver installs the dictionary
 and decompresses subsequent messages against it. Dictionary size
-is capped at **8 KiB**
-let constrained peers accept shipments without allocating tens of
-KB of scratch.
+is capped at **8 KiB** (same cap as `omq-zstd`).
 
 ### Compression thresholds
 
@@ -89,7 +87,7 @@ To avoid pessimizing tiny frames, the sender skips compression below:
 | Mode | Threshold |
 |-----------------|-----------|
 | No dictionary | 512 B |
-| With dictionary |
+| With dictionary | 128 B |
 
 Below the threshold the part is sent uncompressed (4-byte zero
 sentinel + plaintext).
@@ -99,14 +97,54 @@ sentinel + plaintext).
 The receiver bounds decompression by the socket's `max_message_size`
 (the same knob you'd use on a plain `tcp://` socket). It caps the
 **total decompressed size of all parts in a single message**. A peer
-attempting to send an over-budget message drops the connection
+attempting to send an over-budget message drops the connection.
 `OMQ::SocketDeadError` surfaces on the next `receive`.
 
 Independent of that, the dictionary itself is capped at 8 KiB; a
 larger shipment drops the connection.
 
-
-
+## Wire format
+
+Every post-handshake ZMTP message part starts with a 4-byte sentinel:
+
+| Sentinel (hex) | ASCII | Meaning |
+|---|---|---|
+| `00 00 00 00` | (none) | Uncompressed plaintext |
+| `4C 5A 34 42` | `LZ4B` | LZ4-compressed single block |
+| `4C 5A 34 4D` | `LZ4M` | LZ4-compressed multi-block |
+| `4C 5A 34 44` | `LZ4D` | Dictionary shipment |
+
+**Single-block** (`LZ4B`): `sentinel (4) || decompressed_size u64 LE (8) || LZ4 block bytes`.
+12-byte envelope. Raw LZ4 block format (no magic, no descriptor, no
+checksum). `decompressed_size` is required because LZ4 block format
+carries no length prefix; the receiver pre-sizes its output buffer.
+
+**Multi-block** (`LZ4M`): same header, followed by a sequence of
+`u32 LE compressed_block_len || LZ4 block bytes` pairs. Each block
+decompresses independently at up to 1 GiB. Used for parts exceeding
+the single-block size cap.
+
+**Dictionary shipment** (`LZ4D`): `sentinel (4) || dict bytes (1..8192)`.
+Single-part ZMTP message consumed by the transport, not delivered
+to the application. At most one per direction per connection.
+
+Any other leading 4 bytes close the connection.
+
+## Constants
+
+| Constant | Value |
+|---|---|
+| Scheme | `lz4+tcp` |
+| Uncompressed sentinel | `00 00 00 00` |
+| Single-block sentinel | `4C 5A 34 42` (`LZ4B`) |
+| Multi-block sentinel | `4C 5A 34 4D` (`LZ4M`) |
+| Dictionary sentinel | `4C 5A 34 44` (`LZ4D`) |
+| LZ4M block size | 1 GiB (`0x40000000`) |
+| Max dictionary size | 8 KiB |
+| Min compress, no dict | 512 B |
+| Min compress, with dict | 128 B |
+| LZ4B envelope | 12 B (4 sentinel + 8 size) |
+| Uncompressed envelope | 4 B (sentinel only) |
 
 ## Performance
 
@@ -123,7 +161,7 @@ Lorem ipsum prefix) input.
 | 16 KiB | ~3.2 µs | ~2.4 µs | ~3.9 µs | ~3.0 µs |
 | 1 MiB | ~89 µs | ~87 µs | ~173 µs | ~303 µs |
 
-**End-to-end PUSH
+**End-to-end PUSH -> PULL over `lz4+tcp://` (loopback):**
 
 | Message size | Throughput |
 |--------------|-----------:|
@@ -141,8 +179,8 @@ OMQ_DEV=1 bundle exec ruby --yjit bench/head_to_head.rb # lz4 vs zstd
 
 ### Head-to-head vs `omq-zstd` and plain `tcp`
 
-End-to-end PUSH
-UUID-sprinkled Lorem ipsum
+End-to-end PUSH -> PULL throughput, Ruby 4.0 + YJIT. Input:
+UUID-sprinkled Lorem ipsum, a fresh UUID between each Lorem
 paragraph. Approximates realistic workloads where a schema
 repeats but values vary (event logs, protobuf records, JSON
 events), so a fraction of every message is mandatorily
@@ -160,27 +198,27 @@ across three bandwidth regimes.
 | Link | Metric | tcp | lz4+tcp | zstd -3 | zstd 3 |
 |---------------------|----------|------:|--------:|--------:|-------:|
 | **100 Mbit** | plain | 11.8 | 105 | 114 | **197**|
-| (cap
-| | speedup | 1.
+| (cap ~12 MiB/s) | wire | 11.8 | 12 | 12 | 12 |
+| | speedup | 1.00x | 8.89x | 9.70x |**16.74x**|
 | **1 Gbit** | plain | 117 | 794 | **900** | 603 |
-| (cap
-| | speedup | 1.
+| (cap ~125 MiB/s) | wire | 117 | 93 | 94 | 36 |
+| | speedup | 1.00x | 6.81x |**7.73x**| 5.17x |
 | **Unlimited loopback** | plain | **1 064** | 869 | 972 | 626 |
 | (kernel-copy-bound) | wire | 1 064 | 99 | 101 | 37 |
-| | speedup | 1.
+| | speedup | 1.00x | 0.82x | 0.91x | 0.59x |
 
 Three regimes visible:
 
-- **100 Mbit
-~12 MiB/s. Plaintext = wire-cap
+- **100 Mbit**: all compressed transports saturate wire at
+~12 MiB/s. Plaintext = wire-cap x (1 / compression-ratio). The
 tighter the ratio, the bigger the win: `zstd 3`'s 3% wire ratio
-translates to a **~
-- **1 Gbit
+translates to a **~17x throughput multiplier** over plain tcp.
+- **1 Gbit**: compressed transports shift from wire-saturated to
 CPU-limited. `zstd -3` reaches ~75% of wire cap; `zstd 3` only
 29% (deep CPU-bound). Both beat plain tcp (which is pinned at
-the wire cap) by **6
-helps
-- **Unlimited loopback
+the wire cap) by **6-8x**. `zstd 3`'s tighter wire no longer
+helps; there's no wire saturation to trade CPU for.
+- **Unlimited loopback**: no wire cap. All three are
 CPU-limited. Plain tcp doesn't pay compression CPU, so **skip
 compression on loopback**.
 
@@ -196,25 +234,25 @@ Or use a `veth` pair in a network namespace so shaping doesn't
 touch your host's real loopback (see `tc-netem(8)`, `ip-netns(8)`).
 
 Full sweeps (8 sizes from 256 B to 512 KiB) for each regime live
-in `bench/head_to_head.rb` output
+in `bench/head_to_head.rb` output. Run it yourself; the
 headline numbers above are stable across repeats but small sizes
 and very large sizes vary a bit run-to-run.
 
 **Takeaway:**
 
 - Pick **`lz4+tcp://`** for bandwidth-limited links (any real
-network
+network, even 1 Gbit LAN). 6-9x throughput multiplier over
 plain `tcp`, minimal memory (~16 KiB/connection), modest CPU.
 Ties or beats `zstd -3` at 1 Gbit; loses the ratio race to
 `zstd 3` at 100 Mbit and below.
-- Pick **`zstd+tcp://` (level
-precious resource (
-egress). **~
-messages is hard to argue with.
+- Pick **`zstd+tcp://` (level >= 3)** when the wire is the
+precious resource (100 Mbit links or slower, WAN, or you're
+paying for egress). **~17x throughput multiplier at 100 Mbit**
+for 128 KiB messages is hard to argue with.
 - Pick **plain `tcp://`** when the link is *not* the bottleneck
 (localhost IPC, loopback, datacenter-fast inter-host
 connections where the bandwidth ceiling is above the CPU's
-compress/decompress speed
+compress/decompress speed, typically 10+ Gbit), or when the
 payload is already high-entropy (encrypted, already compressed,
 random binary) and compression only adds overhead.
 
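Reviewer note on the README's new `## Wire format` section: the framing it describes can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not the gem's codec. `SENTINELS`, `MAX_DICT_SIZE`, `split_blocks`, and `classify_part` are hypothetical names introduced here, and nothing is decompressed; the code only splits a part into the fields the README lists.

```ruby
# Hypothetical helpers for illustration; not part of omq-lz4's API.
# They only frame bytes as the README describes; no LZ4 calls are made.

SENTINELS = {
  "\x00\x00\x00\x00".b => :uncompressed,
  "LZ4B".b             => :single_block,
  "LZ4M".b             => :multi_block,
  "LZ4D".b             => :dictionary,
}.freeze

MAX_DICT_SIZE = 8192 # 8 KiB cap stated in the README

# Splits an LZ4M payload into its `u32 LE length || block` pairs.
def split_blocks(payload)
  blocks = []
  offset = 0
  while offset < payload.bytesize
    len = payload.byteslice(offset, 4).unpack1("V") # nil if < 4 bytes remain
    if len.nil? || offset + 4 + len > payload.bytesize
      raise ArgumentError, "truncated block"
    end
    blocks << payload.byteslice(offset + 4, len)
    offset += 4 + len
  end
  blocks
end

# Classifies one message part by its leading 4-byte sentinel.
def classify_part(part)
  bytes = part.b
  kind  = SENTINELS[bytes.byteslice(0, 4)]
  raise ArgumentError, "unknown sentinel" if kind.nil?
  body = bytes.byteslice(4..)

  case kind
  when :uncompressed
    { kind: kind, plaintext: body }
  when :dictionary
    unless (1..MAX_DICT_SIZE).cover?(body.bytesize)
      raise ArgumentError, "dictionary must be 1..#{MAX_DICT_SIZE} bytes"
    end
    { kind: kind, dict: body }
  else
    size = body.unpack1("Q<") # u64 LE decompressed_size
    raise ArgumentError, "truncated header" if size.nil?
    rest   = body.byteslice(8..)
    blocks = kind == :single_block ? [rest] : split_blocks(rest)
    { kind: kind, decompressed_size: size, blocks: blocks }
  end
end
```

Raising `ArgumentError` stands in for "close the connection"; the README only states that an unknown sentinel or an oversized dictionary drops the connection, not how the gem signals it internally.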
data/lib/omq/lz4/codec.rb
CHANGED
@@ -41,7 +41,7 @@ module OMQ
 # passthrough anyway. Below the threshold, `encode_part` emits
 # UNCOMPRESSED directly without touching the compressor.
 MIN_COMPRESS_NO_DICT = 512
-MIN_COMPRESS_WITH_DICT =
+MIN_COMPRESS_WITH_DICT = 128
 
 # Maximum dictionary size on the wire. A policy choice, not a
 # protocol limit; tight enough that constrained peers can accept
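Reviewer note on the threshold change: the effect of raising `MIN_COMPRESS_WITH_DICT` can be sketched as a size check in front of the compressor. The two constant values are taken from the hunk above; `compress_part?`, `passthrough`, and `UNCOMPRESSED_SENTINEL` are hypothetical names, and treating a part exactly at the threshold as compressible is an assumption (the README only says compression is skipped "below" the threshold).

```ruby
# Hypothetical sketch of the threshold rule; not the gem's `encode_part`.
MIN_COMPRESS_NO_DICT   = 512
MIN_COMPRESS_WITH_DICT = 128
UNCOMPRESSED_SENTINEL  = "\x00\x00\x00\x00".b

# True when a part is large enough to be handed to the compressor.
# Assumption: a part exactly at the threshold is compressed.
def compress_part?(part, dict: nil)
  threshold = dict ? MIN_COMPRESS_WITH_DICT : MIN_COMPRESS_NO_DICT
  part.bytesize >= threshold
end

# Passthrough framing for parts below the threshold:
# 4-byte zero sentinel followed by the plaintext.
def passthrough(part)
  UNCOMPRESSED_SENTINEL + part.b
end
```

Under this sketch, a 100-byte part with a dictionary is now sent as passthrough, whereas the previous value of 32 (per the changelog) would have routed it through the compressor.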
data/lib/omq/lz4/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: omq-lz4
 version: !ruby/object:Gem::Version
-version: 0.3.
+version: 0.3.1
 platform: ruby
 authors:
 - Patrik Wenger
@@ -74,7 +74,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-rubygems_version: 4.0.
+rubygems_version: 4.0.10
 specification_version: 4
 summary: LZ4+TCP transport for OMQ
 test_files: []