omq-zstd 0.4.1 → 0.4.2
- checksums.yaml +4 -4
- data/README.md +52 -22
- data/RFC.md +120 -103
- data/lib/omq/transport/zstd_tcp/codec.rb +3 -3
- data/lib/omq/zstd/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 1fcea64306ec7603681d7b3350e7cdf2d22d43487be062e79cc91f51f64aacfb
|
|
4
|
+
data.tar.gz: ac5d8d88deb3f90ccf97be757d384612274f4d742b042b79bc949a47ea7d0d3c
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 8e5a04f9ffdb46d130af145fa7025c544cc1709e7711cd92180b0408923667e61d70098efc4679c7b0d2617a3acde40b91c9ecaa1b5abb9e56cadc4a34cd8294
|
|
7
|
+
data.tar.gz: 25ab87f733a74257a9b2d772d96a76be134b60804ce2796fa0a95710334eb0552f973fab4a7b04442f6d9cce8091da008c5ed17e6868e925b942fdc4dfefd0f6
|
data/README.md
CHANGED
|
@@ -4,13 +4,11 @@
|
|
|
4
4
|
[](LICENSE)
|
|
5
5
|
[](https://www.ruby-lang.org)
|
|
6
6
|
|
|
7
|
-
> **Status:** Draft. Wire format may change before the first tagged release.
|
|
8
|
-
|
|
9
7
|
Zstandard-compressed TCP transport for [OMQ](https://github.com/paddor/omq).
|
|
10
8
|
Pick `zstd+tcp://` instead of `tcp://` and every message part on the wire is
|
|
11
9
|
compressed per-part with [Zstandard](https://github.com/facebook/zstd).
|
|
12
|
-
Compression is intrinsic to the transport
|
|
13
|
-
no payload changes. The ZMTP handshake
|
|
10
|
+
Compression is intrinsic to the transport: no negotiation, no socket option,
|
|
11
|
+
no payload changes. The ZMTP handshake runs over plain TCP; only
|
|
14
12
|
post-handshake message parts are compressed.
|
|
15
13
|
|
|
16
14
|
See [RFC.md](RFC.md) for the wire-format specification and
|
|
@@ -44,7 +42,7 @@ pull.receive # => ["hello, compressed world"]
|
|
|
44
42
|
```
|
|
45
43
|
|
|
46
44
|
Both peers must use the `zstd+tcp://` scheme. A `tcp://` peer cannot talk to
|
|
47
|
-
a `zstd+tcp://` peer
|
|
45
|
+
a `zstd+tcp://` peer. They speak different transports.
|
|
48
46
|
|
|
49
47
|
### Compression level
|
|
50
48
|
|
|
@@ -61,7 +59,7 @@ at any level the peer chose.
|
|
|
61
59
|
### Dictionaries
|
|
62
60
|
|
|
63
61
|
Small messages don't compress well on their own. A shared Zstd dictionary
|
|
64
|
-
trained on representative payloads gives 2
|
|
62
|
+
trained on representative payloads gives 2-10x ratios on payloads in the
|
|
65
63
|
dozens-to-hundreds-of-bytes range.
|
|
66
64
|
|
|
67
65
|
**User-supplied dictionary** (out-of-band agreement):
|
|
@@ -75,11 +73,11 @@ The sender ships the dictionary to the receiver in-band as a one-shot
|
|
|
75
73
|
single-part message prefixed with the dictionary sentinel
|
|
76
74
|
(`37 A4 30 EC`), so the receiver does not need a copy on disk.
|
|
77
75
|
|
|
78
|
-
**Auto-trained dictionary** (zero config
|
|
76
|
+
**Auto-trained dictionary** (zero config, the default when no `dict:` is
|
|
79
77
|
passed): the sender collects up to 1000 samples or 100 KiB (whichever hits
|
|
80
|
-
first),
|
|
81
|
-
mode. Until then, payloads are
|
|
82
|
-
plaintext when below the threshold.
|
|
78
|
+
first), skipping samples larger than 2048 bytes. It trains a 2 KiB dictionary,
|
|
79
|
+
ships it inline, and switches to dictionary mode. Until then, payloads are
|
|
80
|
+
compressed without a dictionary or sent plaintext when below the threshold.
|
|
83
81
|
|
|
84
82
|
### Compression thresholds
|
|
85
83
|
|
|
@@ -95,8 +93,8 @@ plaintext bytes).
|
|
|
95
93
|
|
|
96
94
|
### Security limits
|
|
97
95
|
|
|
98
|
-
The receiver bounds decompression by the socket's own `max_message_size
|
|
99
|
-
|
|
96
|
+
The receiver bounds decompression by the socket's own `max_message_size`,
|
|
97
|
+
the same knob you'd use on a plain `tcp://` socket. It caps the
|
|
100
98
|
**total decompressed size of all parts in a single message**, not each
|
|
101
99
|
part individually: the budget starts at `max_message_size` and shrinks
|
|
102
100
|
as each part is decoded, so a message whose parts sum to more than the
|
|
@@ -111,10 +109,42 @@ ceiling on decompressed message size. Set a value that matches what
|
|
|
111
109
|
your application would tolerate over plain `tcp://`.
|
|
112
110
|
|
|
113
111
|
Independent of the message-size knob, the dictionary itself is capped at
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
112
|
+
8 KiB. A peer attempting to ship a larger dictionary, or send a message
|
|
113
|
+
whose decompressed parts exceed `max_message_size`, drops the connection.
|
|
114
|
+
`OMQ::SocketDeadError` surfaces on the next `receive`.
|
|
115
|
+
|
|
116
|
+
## Wire format
|
|
117
|
+
|
|
118
|
+
Every post-handshake ZMTP message part starts with a 4-byte sentinel:
|
|
119
|
+
|
|
120
|
+
| Sentinel (hex) | Meaning |
|
|
121
|
+
|---|---|
|
|
122
|
+
| `00 00 00 00` | Uncompressed plaintext |
|
|
123
|
+
| `28 B5 2F FD` | Zstandard-compressed frame |
|
|
124
|
+
| `37 A4 30 EC` | Dictionary shipment |
|
|
125
|
+
|
|
126
|
+
Compressed parts are standard Zstandard frames with `Frame_Content_Size`
|
|
127
|
+
set in the header. The receiver uses FCS for budget enforcement before
|
|
128
|
+
invoking the decoder. Any other leading 4 bytes close the connection.
|
|
129
|
+
|
|
130
|
+
Dictionary shipments are single-part ZMTP messages consumed by the
|
|
131
|
+
transport layer. They are not delivered to the application.
|
|
132
|
+
|
|
133
|
+
## Constants
|
|
134
|
+
|
|
135
|
+
| Constant | Value |
|
|
136
|
+
|---|---|
|
|
137
|
+
| Uncompressed sentinel | `00 00 00 00` |
|
|
138
|
+
| Zstd frame sentinel | `28 B5 2F FD` (Zstandard frame magic) |
|
|
139
|
+
| Dictionary sentinel | `37 A4 30 EC` |
|
|
140
|
+
| Default level | -3 |
|
|
141
|
+
| Min compress, no dict | 512 B |
|
|
142
|
+
| Min compress, with dict | 64 B |
|
|
143
|
+
| Max dictionary size | 8 KiB |
|
|
144
|
+
| Train max samples | 1000 |
|
|
145
|
+
| Train max bytes | 100 KiB |
|
|
146
|
+
| Train max sample length | 2048 B |
|
|
147
|
+
| Dictionary capacity | 2 KiB |
|
|
118
148
|
|
|
119
149
|
## When to use it
|
|
120
150
|
|
|
@@ -127,12 +157,12 @@ surfaces on the next `receive`.
|
|
|
127
157
|
|
|
128
158
|
It is **not** worth it for:
|
|
129
159
|
|
|
130
|
-
- `inproc://` or `ipc
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
- Already-compressed payloads (gzip, video, encrypted blobs)
|
|
160
|
+
- `inproc://` or `ipc://`. No wire to shrink. Use `zstd+tcp://` only on
|
|
161
|
+
the connections that actually need it. Other transports on the same
|
|
162
|
+
socket are unaffected.
|
|
163
|
+
- Already-compressed payloads (gzip, video, encrypted blobs). The Zstd
|
|
134
164
|
pass adds CPU for no gain.
|
|
135
|
-
- Latency-critical sub-microsecond paths
|
|
165
|
+
- Latency-critical sub-microsecond paths. Compression adds single-digit
|
|
136
166
|
microseconds per kilobyte at low levels, but it is not free.
|
|
137
167
|
|
|
138
168
|
## How it works (in one paragraph)
|
|
@@ -140,7 +170,7 @@ It is **not** worth it for:
|
|
|
140
170
|
`require "omq/zstd"` registers the `zstd+tcp` scheme on
|
|
141
171
|
`OMQ::Engine.transports`. A `zstd+tcp` socket builds a per-engine
|
|
142
172
|
`Codec` (one Zstd dictionary instance shared across all the socket's
|
|
143
|
-
connections
|
|
173
|
+
connections; fan-out compresses each part exactly once). Each accepted
|
|
144
174
|
or dialed TCP connection is wrapped in `ZstdConnection`, a
|
|
145
175
|
`SimpleDelegator` over the underlying ZMTP connection that intercepts
|
|
146
176
|
`#send_message` / `#write_message` / `#receive_message`. Message parts
|
data/RFC.md
CHANGED
|
@@ -1,11 +1,13 @@
|
|
|
1
|
-
# ZMTP
|
|
1
|
+
# ZMTP-Zstd: Zstandard-Compressed TCP Transport for ZMTP
|
|
2
2
|
|
|
3
|
-
| Field
|
|
3
|
+
| Field | Value |
|
|
4
4
|
|----------|----------------------------------------------------|
|
|
5
|
-
| Status
|
|
6
|
-
| Editor
|
|
5
|
+
| Status | Draft |
|
|
6
|
+
| Editor | Patrik Wenger |
|
|
7
|
+
| Scheme | `zstd+tcp://` |
|
|
7
8
|
| Requires | [RFC 37/ZMTP 3.1](https://rfc.zeromq.org/spec/37/) |
|
|
8
9
|
|
|
10
|
+
|
|
9
11
|
## 1. Abstract
|
|
10
12
|
|
|
11
13
|
This specification defines `zstd+tcp://`, a TCP transport for ZMTP 3.1
|
|
@@ -16,10 +18,19 @@ greeting and handshake proceed over raw TCP exactly as they would over
|
|
|
16
18
|
is individually encoded with a 4-byte sentinel dispatch that
|
|
17
19
|
distinguishes uncompressed plaintext, Zstandard-compressed frames, and
|
|
18
20
|
dictionary shipments. No ZMTP properties, command frames, or
|
|
19
|
-
negotiation are involved
|
|
21
|
+
negotiation are involved. Compression is an intrinsic property of the
|
|
20
22
|
transport, like encryption is an intrinsic property of TLS.
|
|
21
23
|
|
|
22
|
-
|
|
24
|
+
|
|
25
|
+
## 2. Language
|
|
26
|
+
|
|
27
|
+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
|
28
|
+
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
|
29
|
+
document are to be interpreted as described in
|
|
30
|
+
[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
|
|
31
|
+
|
|
32
|
+
|
|
33
|
+
## 3. Motivation
|
|
23
34
|
|
|
24
35
|
Zstandard at low compression levels encodes in single-digit microseconds
|
|
25
36
|
per kilobyte, decompresses faster still, and on dictionary-trained
|
|
@@ -39,14 +50,14 @@ the transport. `zstd+tcp://` replaces it with a transport-level
|
|
|
39
50
|
mechanism that any ZMTP application benefits from without changes to the
|
|
40
51
|
payload.
|
|
41
52
|
|
|
42
|
-
###
|
|
53
|
+
### 3.1 Why a transport scheme
|
|
43
54
|
|
|
44
55
|
Compression could live at three layers. Each has a fatal flaw except the
|
|
45
56
|
transport layer.
|
|
46
57
|
|
|
47
58
|
**Socket-level wrapper** (too high). A wrapper above routing knows
|
|
48
59
|
nothing about transports. It compresses local connections (pure
|
|
49
|
-
overhead) and cannot act on new connections naturally
|
|
60
|
+
overhead) and cannot act on new connections naturally. Dictionary
|
|
50
61
|
shipping requires per-connection state, but a wrapper only sees messages
|
|
51
62
|
after routing has dispatched them. Reconnect handling requires hooking
|
|
52
63
|
into connection lifecycle events that are awkward from outside.
|
|
@@ -61,13 +72,13 @@ connections.
|
|
|
61
72
|
explicit in the endpoint URI. Only TCP connections get compressed. Local
|
|
62
73
|
transports are unaffected even on the same socket. Dictionary lifetime
|
|
63
74
|
matches connection lifetime naturally (new connection = new wrapper =
|
|
64
|
-
re-ship dictionary). No negotiation is needed
|
|
75
|
+
re-ship dictionary). No negotiation is needed; both peers use
|
|
65
76
|
`zstd+tcp://`. The codec is socket-wide (shared across connections), so
|
|
66
77
|
fan-out patterns compress once and reuse the result.
|
|
67
78
|
|
|
68
|
-
###
|
|
79
|
+
### 3.2 Why not negotiate
|
|
69
80
|
|
|
70
|
-
ZMTP 3.1 already supports unknown READY properties
|
|
81
|
+
ZMTP 3.1 already supports unknown READY properties. An unaware peer
|
|
71
82
|
silently ignores them. A negotiation-based design could fall back to
|
|
72
83
|
plaintext when the peer does not understand compression. But this
|
|
73
84
|
introduces complexity (profile matching, asymmetric per-direction state,
|
|
@@ -76,7 +87,7 @@ deployment decision, not a runtime discovery. Both peers are configured
|
|
|
76
87
|
to use `zstd+tcp://` or they are not. The transport scheme approach
|
|
77
88
|
eliminates the entire negotiation surface and its edge cases.
|
|
78
89
|
|
|
79
|
-
###
|
|
90
|
+
### 3.3 Why Zstandard
|
|
80
91
|
|
|
81
92
|
Zstandard at low levels matches LZ4 on encode latency, beats it on
|
|
82
93
|
decompression speed and ratio at every realistic ZMQ payload size, and
|
|
@@ -85,9 +96,10 @@ particularly important for fan-out patterns (PUB/SUB, RADIO/DISH): the
|
|
|
85
96
|
publisher pays one compress, every subscriber pays decompress, so
|
|
86
97
|
per-subscriber CPU dominates the total budget.
|
|
87
98
|
|
|
88
|
-
## 3. Goals and Non-goals
|
|
89
99
|
|
|
90
|
-
|
|
100
|
+
## 4. Goals and Non-goals
|
|
101
|
+
|
|
102
|
+
### 4.1 Goals
|
|
91
103
|
|
|
92
104
|
- Transparent to application code: send/receive operations see plaintext.
|
|
93
105
|
- Per-part sender decision: opt out for short or incompressible parts.
|
|
@@ -98,50 +110,52 @@ per-subscriber CPU dominates the total budget.
|
|
|
98
110
|
- No ZMTP-level negotiation, no new READY properties, no new command
|
|
99
111
|
frames.
|
|
100
112
|
|
|
101
|
-
###
|
|
113
|
+
### 4.2 Non-goals
|
|
102
114
|
|
|
103
115
|
- New ZMTP mechanism, new socket type, new greeting, new frame flag bit.
|
|
104
116
|
- Compression of the ZMTP greeting or command frames (READY, SUBSCRIBE,
|
|
105
117
|
PING, PONG, ...).
|
|
106
|
-
- Application to non-TCP transports (`inproc://` is zero-copy
|
|
118
|
+
- Application to non-TCP transports (`inproc://` is zero-copy;
|
|
107
119
|
compression is pure overhead; `ipc://` rarely benefits).
|
|
108
120
|
- Replacing or weakening CurveZMQ or any other security mechanism.
|
|
109
|
-
See Sec.
|
|
121
|
+
See Sec. 9.
|
|
110
122
|
- Streaming / context-takeover compression. Each part is decodable in
|
|
111
123
|
isolation with no dependency on a previous part's LZ77 history.
|
|
112
124
|
|
|
113
|
-
## 4. Terminology
|
|
114
125
|
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
|
118
|
-
|
|
119
|
-
|
|
|
120
|
-
|
|
|
121
|
-
|
|
|
122
|
-
|
|
|
126
|
+
## 5. Terminology
|
|
127
|
+
|
|
128
|
+
| Term | Meaning |
|
|
129
|
+
|---------------------|----------------------------------------------------------------------|
|
|
130
|
+
| Part | One ZMTP message frame body. A multipart message has multiple parts. |
|
|
131
|
+
| Sentinel | The first 4 bytes of a post-handshake part on the wire (Sec. 6.1). |
|
|
132
|
+
| Uncompressed part | A wire part whose sentinel is `00 00 00 00`. |
|
|
133
|
+
| Compressed part | A wire part whose first 4 bytes are the Zstandard magic `28 B5 2F FD`. |
|
|
134
|
+
| Dictionary part | A wire part whose first 4 bytes are `37 A4 30 EC` (Sec. 7). |
|
|
135
|
+
| Dictionary message | A single-part ZMTP message consisting of exactly one dictionary part. |
|
|
136
|
+
|
|
123
137
|
|
|
124
|
-
##
|
|
138
|
+
## 6. Part Encoding
|
|
125
139
|
|
|
126
140
|
After the ZMTP handshake completes, every message part on the wire is
|
|
127
141
|
individually encoded. The ZMTP MORE flag is carried on the wire frame
|
|
128
|
-
header as normal. Multipart messages are encoded part by part
|
|
142
|
+
header as normal. Multipart messages are encoded part by part; each
|
|
129
143
|
part is independent.
|
|
130
144
|
|
|
131
|
-
###
|
|
145
|
+
### 6.1 Sentinel dispatch
|
|
132
146
|
|
|
133
147
|
The first 4 bytes of each wire part determine how it is decoded.
|
|
134
148
|
|
|
135
|
-
| Sentinel (hex) | Meaning
|
|
136
|
-
|
|
137
|
-
| `00 00 00 00` | Uncompressed plaintext (Sec.
|
|
138
|
-
| `28 B5 2F FD` | Zstandard compressed frame (Sec.
|
|
139
|
-
| `37 A4 30 EC` | Dictionary shipment (Sec.
|
|
149
|
+
| Sentinel (hex) | Meaning |
|
|
150
|
+
|------------------|--------------------------------------|
|
|
151
|
+
| `00 00 00 00` | Uncompressed plaintext (Sec. 6.3) |
|
|
152
|
+
| `28 B5 2F FD` | Zstandard compressed frame (Sec. 6.4)|
|
|
153
|
+
| `37 A4 30 EC` | Dictionary shipment (Sec. 7) |
|
|
140
154
|
|
|
141
155
|
All other 4-byte values are reserved. A receiver that encounters an
|
|
142
|
-
unknown sentinel MUST
|
|
156
|
+
unknown sentinel MUST close the connection.
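The dispatch table above can be sketched in Ruby as follows. This is a minimal illustration, not the gem's code: `classify_part` and the three constant names are hypothetical, and raising `ArgumentError` stands in for "close the connection".

```ruby
# Sentinel values from the dispatch table, as binary strings.
SENTINEL_UNCOMPRESSED = "\x00\x00\x00\x00".b.freeze
SENTINEL_ZSTD_FRAME   = "\x28\xB5\x2F\xFD".b.freeze
SENTINEL_DICTIONARY   = "\x37\xA4\x30\xEC".b.freeze

# Hypothetical helper: classify a wire part by its first 4 bytes.
# Raises ArgumentError for short parts and reserved sentinels; a real
# transport would translate that into closing the connection.
def classify_part(wire_part)
  raise ArgumentError, "part shorter than 4 bytes" if wire_part.bytesize < 4

  case wire_part.byteslice(0, 4).b
  when SENTINEL_UNCOMPRESSED then :plaintext
  when SENTINEL_ZSTD_FRAME   then :zstd_frame
  when SENTINEL_DICTIONARY   then :dictionary
  else raise ArgumentError, "unknown sentinel"
  end
end
```

Comparing the leading bytes as binary strings avoids any dependence on the encoding of the incoming part.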
|
|
143
157
|
|
|
144
|
-
###
|
|
158
|
+
### 6.2 Compression level
|
|
145
159
|
|
|
146
160
|
The default compression level is **-3** (Zstandard fast strategy). At
|
|
147
161
|
this level the encoder cost is in the low single-digit microseconds per
|
|
@@ -149,11 +163,11 @@ kilobyte, and the achieved ratio is within a few percent of level 3 once
|
|
|
149
163
|
a dictionary is in play.
|
|
150
164
|
|
|
151
165
|
The compression level is a sender choice and is not communicated on the
|
|
152
|
-
wire
|
|
166
|
+
wire. The receiver decodes any valid Zstandard frame regardless of the
|
|
153
167
|
level used to encode it. Implementations SHOULD expose the level as a
|
|
154
168
|
configurable parameter.
|
|
155
169
|
|
|
156
|
-
###
|
|
170
|
+
### 6.3 Uncompressed sentinel `00 00 00 00`
|
|
157
171
|
|
|
158
172
|
```
|
|
159
173
|
+------------------+-------------------+
|
|
@@ -169,7 +183,7 @@ without an extra flag bit in the ZMTP frame header.
|
|
|
169
183
|
Four zero bytes cannot collide with a valid Zstandard frame magic or the
|
|
170
184
|
dictionary sentinel, so no ambiguity arises.
|
|
171
185
|
|
|
172
|
-
###
|
|
186
|
+
### 6.4 Compressed Zstandard frame
|
|
173
187
|
|
|
174
188
|
```
|
|
175
189
|
+------------------+
|
|
@@ -178,15 +192,15 @@ dictionary sentinel, so no ambiguity arises.
|
|
|
178
192
|
+------------------+
|
|
179
193
|
```
|
|
180
194
|
|
|
181
|
-
The wire part IS the Zstandard frame
|
|
195
|
+
The wire part IS the Zstandard frame. Its first 4 bytes are the
|
|
182
196
|
standard Zstandard frame magic `28 B5 2F FD`. No additional framing is
|
|
183
197
|
added.
|
|
184
198
|
|
|
185
199
|
The sender MUST configure the encoder to write the `Frame_Content_Size`
|
|
186
|
-
field in the Zstandard frame header (RFC 8878
|
|
187
|
-
is required for the receiver's budget enforcement (Sec.
|
|
200
|
+
field in the Zstandard frame header (RFC 8878 Sec. 3.1.1.1.2). This
|
|
201
|
+
field is required for the receiver's budget enforcement (Sec. 6.6).
|
|
188
202
|
|
|
189
|
-
###
|
|
203
|
+
### 6.5 Sender rules
|
|
190
204
|
|
|
191
205
|
For each outgoing message part, the sender proceeds as follows:
|
|
192
206
|
|
|
@@ -203,7 +217,7 @@ For each outgoing message part, the sender proceeds as follows:
|
|
|
203
217
|
|
|
204
218
|
3. Otherwise, run the Zstandard encoder. The encoder MUST write the
|
|
205
219
|
`Frame_Content_Size` field. If the compressed output's size is
|
|
206
|
-
|
|
220
|
+
>= `plaintext_size - 4` (net saving <= 0 after accounting for the
|
|
207
221
|
4-byte sentinel of the uncompressed alternative), prepend
|
|
208
222
|
`00 00 00 00` and emit the plaintext instead. Otherwise emit the
|
|
209
223
|
Zstandard frame as-is.
|
|
@@ -213,41 +227,42 @@ For each outgoing message part, the sender proceeds as follows:
|
|
|
213
227
|
MUST still prepend `00 00 00 00` to avoid sentinel ambiguity.
|
|
214
228
|
Step 2 and step 3's fallback path already guarantee this.
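A rough Ruby sketch of the sender decision follows. Steps 1 and 2 are not visible in this hunk, so the threshold check here is an assumption based on the "Min compress" constants (512 bytes without a dictionary, 64 bytes with one). `encode_part` is a hypothetical name, and the compressor is passed in as any callable so that no particular Zstandard binding is assumed.

```ruby
SENTINEL_UNCOMPRESSED  = "\x00\x00\x00\x00".b.freeze
MIN_COMPRESS_NO_DICT   = 512
MIN_COMPRESS_WITH_DICT = 64

# Hypothetical helper. `compressor` is any callable that returns a
# Zstandard frame (with Frame_Content_Size set) for the given bytes.
def encode_part(plaintext, compressor:, dict_loaded: false)
  plaintext = plaintext.b
  threshold = dict_loaded ? MIN_COMPRESS_WITH_DICT : MIN_COMPRESS_NO_DICT

  # Assumed step 2: parts below the threshold are sent as plaintext.
  return SENTINEL_UNCOMPRESSED + plaintext if plaintext.bytesize < threshold

  # Step 3: compress, then fall back when the net saving is <= 0 after
  # accounting for the 4-byte sentinel of the uncompressed alternative.
  frame = compressor.call(plaintext)
  if frame.bytesize >= plaintext.bytesize - 4
    SENTINEL_UNCOMPRESSED + plaintext
  else
    frame
  end
end
```

Either way the emitted part begins with a known sentinel, which is what keeps the receiver's dispatch unambiguous.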
|
|
215
229
|
|
|
216
|
-
###
|
|
230
|
+
### 6.6 Receiver rules
|
|
217
231
|
|
|
218
232
|
For each incoming wire part, the receiver proceeds as follows:
|
|
219
233
|
|
|
220
234
|
1. Read the first 4 bytes as the sentinel. If the part is shorter than
|
|
221
|
-
4 bytes,
|
|
235
|
+
4 bytes, close the connection.
|
|
222
236
|
|
|
223
237
|
2. Sentinel `00 00 00 00`: the remaining `N - 4` bytes are plaintext.
|
|
224
238
|
Return them.
|
|
225
239
|
|
|
226
240
|
3. Sentinel `28 B5 2F FD`: the entire wire part is a Zstandard frame.
|
|
227
241
|
- Read the `Frame_Content_Size` field from the Zstandard header. If
|
|
228
|
-
the field is absent,
|
|
242
|
+
the field is absent, close the connection.
|
|
229
243
|
- If the connection enforces a maximum message size, add this part's
|
|
230
244
|
declared content size to the running decompressed total for the
|
|
231
245
|
current multipart message (parts chained by the ZMTP MORE flag).
|
|
232
|
-
If the running total would exceed the maximum,
|
|
233
|
-
|
|
246
|
+
If the running total would exceed the maximum, close the connection
|
|
247
|
+
without invoking the decoder.
|
|
234
248
|
- Invoke the decoder in a bounded mode that aborts if it would write
|
|
235
249
|
more bytes than `Frame_Content_Size` declared. On such an abort,
|
|
236
|
-
|
|
250
|
+
close the connection.
|
|
237
251
|
- Return the decompressed plaintext.
|
|
238
252
|
|
|
239
|
-
4. Sentinel `37 A4 30 EC`: dictionary shipment. See Sec.
|
|
253
|
+
4. Sentinel `37 A4 30 EC`: dictionary shipment. See Sec. 7.
|
|
240
254
|
|
|
241
|
-
5. Any other sentinel:
|
|
255
|
+
5. Any other sentinel: close the connection.
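Step 3 relies on reading `Frame_Content_Size` before the decoder runs. A minimal sketch of that header parse is shown below, following the frame header layout in RFC 8878 (descriptor byte, optional window descriptor, optional dictionary ID, then the content size field). `frame_content_size` is a hypothetical helper; a real implementation would more likely ask its Zstandard binding for this value.

```ruby
ZSTD_MAGIC = "\x28\xB5\x2F\xFD".b.freeze

# Hypothetical helper: return the declared content size of a Zstandard
# frame, or nil when the header carries no Frame_Content_Size field.
def frame_content_size(frame)
  frame = frame.b
  raise ArgumentError, "not a Zstandard frame" unless frame.byteslice(0, 4) == ZSTD_MAGIC

  descriptor = frame.getbyte(4)
  raise ArgumentError, "truncated frame header" if descriptor.nil?

  fcs_flag       = descriptor >> 6        # bits 7-6
  single_segment = descriptor[5] == 1     # bit 5
  dict_id_flag   = descriptor & 0x03      # bits 1-0

  fcs_size = [single_segment ? 1 : 0, 2, 4, 8][fcs_flag]
  return nil if fcs_size.zero?

  offset = 5
  offset += 1 unless single_segment       # Window_Descriptor is present
  offset += [0, 1, 2, 4][dict_id_flag]    # Dictionary_ID field width

  field = frame.byteslice(offset, fcs_size)
  raise ArgumentError, "truncated frame header" if field.nil? || field.bytesize < fcs_size

  case fcs_size
  when 1 then field.unpack1("C")
  when 2 then field.unpack1("v") + 256    # 2-byte form stores value - 256
  when 4 then field.unpack1("V")
  when 8 then field.unpack1("Q<")
  end
end
```

A `nil` result corresponds to "the field is absent", which the rules above treat as a reason to close the connection.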
|
|
242
256
|
|
|
243
257
|
The maximum message size always refers to the **decompressed** plaintext
|
|
244
258
|
summed across all parts of a multipart message. A multipart message
|
|
245
259
|
whose total wire length is small but whose total decompressed size
|
|
246
260
|
exceeds the limit MUST be rejected before decoder invocation.
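The running budget can be illustrated with a small Ruby sketch. The class and method names are hypothetical, and the sketch only charges declared sizes of compressed parts as described in step 3; how plaintext parts are counted is left to the surrounding ZMTP size checks.

```ruby
# Hypothetical per-connection budget tracker for one multipart message.
class DecompressionBudget
  class Exceeded < StandardError; end

  # max_message_size may be nil, meaning no limit is enforced.
  def initialize(max_message_size)
    @max       = max_message_size
    @remaining = max_message_size
  end

  # Call with a part's declared Frame_Content_Size BEFORE decoding it.
  # Raises Exceeded when the running total would pass the maximum; the
  # caller would then close the connection without invoking the decoder.
  def charge(declared_size)
    return if @max.nil?
    raise Exceeded, "decompressed message too large" if declared_size > @remaining

    @remaining -= declared_size
  end

  # Call after the final part of a message (MORE flag not set).
  def reset
    @remaining = @max
  end
end
```

Because the check uses only the declared size, a message is rejected on its headers alone and no decompression work is spent on it.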
|
|
247
261
|
|
|
248
|
-
## 6. Dictionary Shipment
|
|
249
262
|
|
|
250
|
-
|
|
263
|
+
## 7. Dictionary Shipment
|
|
264
|
+
|
|
265
|
+
### 7.1 Dictionary message format
|
|
251
266
|
|
|
252
267
|
A dictionary is shipped as a **single-part ZMTP message** (no MORE flag)
|
|
253
268
|
whose body begins with the dictionary sentinel:
|
|
@@ -266,40 +281,40 @@ with the Zstandard frame magic and the uncompressed sentinel.
|
|
|
266
281
|
The remaining `D` bytes are the raw dictionary as it should be passed
|
|
267
282
|
to the Zstandard decoder's dictionary-load operation.
|
|
268
283
|
|
|
269
|
-
###
|
|
284
|
+
### 7.2 Constraints
|
|
270
285
|
|
|
271
286
|
- A dictionary message MUST be a single-part ZMTP message (MORE flag
|
|
272
287
|
not set on the frame header). A dictionary sentinel in a multipart
|
|
273
288
|
message's non-final or non-only part is a protocol error.
|
|
274
289
|
|
|
275
|
-
- A dictionary message MUST NOT exceed **
|
|
290
|
+
- A dictionary message MUST NOT exceed **8 KiB** total (sentinel +
|
|
276
291
|
dictionary bytes). A receiver that receives a dictionary message
|
|
277
|
-
larger than
|
|
292
|
+
larger than 8 KiB MUST close the connection.
|
|
278
293
|
|
|
279
294
|
- A sender MUST send at most **one** dictionary message per direction
|
|
280
295
|
per connection. A receiver that receives a second dictionary message
|
|
281
|
-
on the same connection MUST
|
|
296
|
+
on the same connection MUST close the connection.
|
|
282
297
|
|
|
283
298
|
- A dictionary message MUST be sent BEFORE any compressed part that
|
|
284
299
|
references the dictionary. In practice this means the sender ships
|
|
285
300
|
the dictionary before (or immediately after training triggers during)
|
|
286
301
|
the first compressed write that would benefit from it.
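The receiver-side constraints can be checked with a few lines of Ruby. This is a sketch under assumptions: `DictionaryGate` and its arguments are hypothetical, one instance is assumed per connection and direction, and `ProtocolError` stands in for closing the connection.

```ruby
MAX_DICT_SIZE = 8 * 1024

# Hypothetical per-connection guard for incoming dictionary messages.
class DictionaryGate
  class ProtocolError < StandardError; end

  def initialize
    @received = false
  end

  # wire_part:  full part body, including the 4-byte dictionary sentinel
  # more:       ZMTP MORE flag of the frame carrying the part
  # first_part: whether this part is the first part of its message
  # Returns the dictionary bytes with the sentinel stripped.
  def accept(wire_part, more:, first_part: true)
    raise ProtocolError, "dictionary part in a multipart message" if more || !first_part
    raise ProtocolError, "dictionary message exceeds 8 KiB" if wire_part.bytesize > MAX_DICT_SIZE
    raise ProtocolError, "second dictionary on this connection" if @received

    @received = true
    wire_part.byteslice(4, wire_part.bytesize - 4)
  end
end
```

The size check covers the sentinel plus the dictionary bytes, matching the "total" wording of the 8 KiB limit.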
|
|
287
302
|
|
|
288
|
-
###
|
|
303
|
+
### 7.3 Receiver handling
|
|
289
304
|
|
|
290
305
|
When the receiver encounters a dictionary part:
|
|
291
306
|
|
|
292
|
-
1. Validate the constraints in Sec.
|
|
307
|
+
1. Validate the constraints in Sec. 7.2.
|
|
293
308
|
2. Strip the 4-byte sentinel.
|
|
294
309
|
3. Install the remaining bytes as the decompression dictionary for this
|
|
295
310
|
connection.
|
|
296
|
-
4. Discard the message
|
|
311
|
+
4. Discard the message. It is not delivered to the application.
|
|
297
312
|
|
|
298
313
|
If all parts of a ZMTP message are dictionary parts (which is always
|
|
299
314
|
the case, since dictionary messages are single-part), the receiver
|
|
300
315
|
loops to receive the next message.
|
|
301
316
|
|
|
302
|
-
###
|
|
317
|
+
### 7.4 Dictionary scope
|
|
303
318
|
|
|
304
319
|
The dictionary a sender ships applies to a single direction of a single
|
|
305
320
|
connection. Each peer may independently ship its own dictionary for its
|
|
@@ -309,7 +324,7 @@ nothing (or uncompressed traffic) back.
|
|
|
309
324
|
|
|
310
325
|
The sender's dictionary is typically socket-wide: trained once from
|
|
311
326
|
early traffic across all connections and reused. But this is an
|
|
312
|
-
implementation choice
|
|
327
|
+
implementation choice. The wire protocol carries no dictionary identity
|
|
313
328
|
or scope metadata.
|
|
314
329
|
|
|
315
330
|
An implementation MAY pool training samples and share the resulting
|
|
@@ -319,21 +334,21 @@ multiple `zstd+tcp://` endpoints: samples from one endpoint accelerate
|
|
|
319
334
|
training for all of them, and newly opened connections benefit from a
|
|
320
335
|
dictionary trained by their predecessors. Connections that were
|
|
321
336
|
configured with an explicit out-of-band dictionary MUST NOT participate
|
|
322
|
-
in shared training
|
|
337
|
+
in shared training; they use their own dictionary independently.
|
|
323
338
|
|
|
324
|
-
###
|
|
339
|
+
### 7.5 Automatic dictionary training
|
|
325
340
|
|
|
326
341
|
A sender MAY train a dictionary automatically from early traffic:
|
|
327
342
|
|
|
328
343
|
1. Buffer plaintext samples from the first messages. Samples larger
|
|
329
|
-
than **
|
|
344
|
+
than **2048 bytes** SHOULD be skipped; dictionaries primarily
|
|
330
345
|
benefit small frames.
|
|
331
346
|
2. When the buffer reaches **1000 samples** OR **100 KiB** of
|
|
332
347
|
plaintext (whichever comes first), train a Zstandard dictionary from
|
|
333
348
|
the buffered samples and discard the buffer.
|
|
334
349
|
3. The recommended dictionary capacity (training target size) is
|
|
335
|
-
**
|
|
336
|
-
4. Ship the trained dictionary via a dictionary message (Sec.
|
|
350
|
+
**2 KiB**.
|
|
351
|
+
4. Ship the trained dictionary via a dictionary message (Sec. 7.1) on
|
|
337
352
|
every connection, before any compressed part that uses it.
|
|
338
353
|
5. Switch to dictionary-bound compression for all subsequent parts.
|
|
339
354
|
|
|
@@ -341,16 +356,17 @@ If training fails (the sample set was too small or too uniform), the
|
|
|
341
356
|
sender MUST stay in no-dictionary mode for the rest of the socket's
|
|
342
357
|
lifetime. It MUST NOT retry training.
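The sample-collection thresholds from steps 1 and 2 can be sketched as a small buffer class. `SampleBuffer` is a hypothetical name; the actual training call and the "never retry after failure" rule are left out because they depend on the Zstandard binding in use.

```ruby
# Hypothetical training-sample buffer using the 0.4.2 limits.
class SampleBuffer
  TRAIN_MAX_SAMPLES    = 1000
  TRAIN_MAX_BYTES      = 100 * 1024
  TRAIN_MAX_SAMPLE_LEN = 2048

  attr_reader :samples

  def initialize
    @samples = []
    @bytes   = 0
  end

  # Buffer one plaintext part. Oversized samples are skipped.
  # Returns true once enough data has been collected to train.
  def add(plaintext)
    return ready? if plaintext.bytesize > TRAIN_MAX_SAMPLE_LEN

    @samples << plaintext.b
    @bytes += plaintext.bytesize
    ready?
  end

  # True when either limit is reached, whichever comes first.
  def ready?
    @samples.size >= TRAIN_MAX_SAMPLES || @bytes >= TRAIN_MAX_BYTES
  end
end
```

Once `add` returns true, the sender would hand `samples` to the dictionary trainer and discard the buffer.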
|
|
343
358
|
|
|
344
|
-
###
|
|
359
|
+
### 7.6 Dictionary ID
|
|
345
360
|
|
|
346
361
|
Auto-trained dictionaries SHOULD be patched with a random dictionary ID
|
|
347
362
|
in the Zstandard user range (32768 to 2^31 - 1) to avoid collisions
|
|
348
363
|
with Zstandard's built-in dictionary IDs. Out-of-band dictionaries
|
|
349
364
|
retain whatever dictionary ID they were created with.
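As an illustration of the ID patching, the sketch below overwrites the `Dictionary_ID` field of a trained dictionary. It assumes the dictionary is in the standard Zstandard dictionary format, where a 4-byte magic is followed by a 4-byte little-endian ID (RFC 8878). `patch_dictionary_id` and `DICT_ID_RANGE` are hypothetical names, not part of the gem.

```ruby
require "securerandom"

# Zstandard user range for dictionary IDs, as stated above.
DICT_ID_RANGE = (32_768..(2**31 - 1)).freeze

# Hypothetical helper: return a copy of dict_bytes whose Dictionary_ID
# (bytes 4..7, little-endian) is replaced by `id`.
def patch_dictionary_id(dict_bytes, id = SecureRandom.random_number(DICT_ID_RANGE))
  raise ArgumentError, "id outside the user range" unless DICT_ID_RANGE.cover?(id)

  patched = dict_bytes.b # String#b returns a binary copy
  raise ArgumentError, "dictionary too short" if patched.bytesize < 8

  patched[4, 4] = [id].pack("V")
  patched
end
```

Working on a copy leaves the trainer's output untouched, which may be convenient if the same bytes are logged or hashed elsewhere.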
|
|
350
365
|
|
|
351
|
-
## 7. ZMTP Interaction
|
|
352
366
|
|
|
353
|
-
|
|
367
|
+
## 8. ZMTP Interaction
|
|
368
|
+
|
|
369
|
+
### 8.1 Greeting and handshake
|
|
354
370
|
|
|
355
371
|
The ZMTP greeting and security mechanism handshake proceed over raw TCP
|
|
356
372
|
exactly as specified by RFC 37. `zstd+tcp://` does not modify the
|
|
@@ -358,28 +374,29 @@ greeting, mechanism, READY properties, or any command frames. The
|
|
|
358
374
|
compression layer activates only after the handshake is complete and the
|
|
359
375
|
connection is ready for message traffic.
|
|
360
376
|
|
|
361
|
-
###
|
|
377
|
+
### 8.2 Command frames
|
|
362
378
|
|
|
363
379
|
ZMTP command frames (READY, SUBSCRIBE, CANCEL, JOIN, LEAVE, PING,
|
|
364
380
|
PONG) are never compressed. They are sent and received as standard ZMTP
|
|
365
381
|
command frames. Only message frames (the COMMAND bit not set in the
|
|
366
382
|
frame header) are subject to sentinel-dispatched encoding.
|
|
367
383
|
|
|
368
|
-
###
|
|
384
|
+
### 8.3 Socket type compatibility
|
|
369
385
|
|
|
370
386
|
`zstd+tcp://` is compatible with all ZMTP socket types. The socket type
|
|
371
387
|
negotiation in the READY handshake is unaffected.
|
|
372
388
|
|
|
373
|
-
###
|
|
389
|
+
### 8.4 Peer requirement
|
|
374
390
|
|
|
375
391
|
Both peers of a connection MUST use `zstd+tcp://`. There is no
|
|
376
392
|
fallback to plaintext TCP and no negotiation. A `zstd+tcp://` peer
|
|
377
393
|
connecting to a plain `tcp://` peer (or vice versa) will see garbled
|
|
378
394
|
data or sentinel errors and the connection will fail.
|
|
379
395
|
|
|
380
|
-
## 8. Security Considerations
|
|
381
396
|
|
|
382
|
-
|
|
397
|
+
## 9. Security Considerations
|
|
398
|
+
|
|
399
|
+
### 9.1 Compression combined with encryption (CRIME / BREACH)
|
|
383
400
|
|
|
384
401
|
Combining length-revealing compression with a secure channel that
|
|
385
402
|
carries attacker-influenced plaintext enables CRIME- and BREACH-style
|
|
@@ -392,30 +409,30 @@ encrypted tunnel when the plaintext contains attacker-controlled
|
|
|
392
409
|
content. Deployments that accept this risk MUST do so with explicit
|
|
393
410
|
opt-in.
|
|
394
411
|
|
|
395
|
-
###
|
|
412
|
+
### 9.2 Length side-channel
|
|
396
413
|
|
|
397
414
|
Compression makes the wire length of a part depend on its content. An
|
|
398
415
|
on-path observer can learn something about the plaintext from the
|
|
399
416
|
compressed length alone. Deployments that care about traffic analysis
|
|
400
417
|
MUST NOT rely on `zstd+tcp://` to hide payload shape.
|
|
401
418
|
|
|
402
|
-
###
|
|
419
|
+
### 9.3 Dictionary contents
|
|
403
420
|
|
|
404
421
|
When auto-training is enabled, the receiver loads dictionary bytes
|
|
405
422
|
chosen by the peer. The Zstandard reference dictionary loader is
|
|
406
423
|
hardened against malformed inputs, but implementations MUST enforce the
|
|
407
|
-
|
|
424
|
+
8 KiB cap on dictionary messages (Sec. 7.2) and SHOULD NOT cache
|
|
408
425
|
received dictionaries across connections.
|
|
409
426
|
|
|
410
|
-
###
|
|
427
|
+
### 9.4 Decompression bombs
|
|
411
428
|
|
|
412
429
|
A small compressed frame can decompress to many megabytes of plaintext.
|
|
413
|
-
The receiver rules in Sec.
|
|
430
|
+
The receiver rules in Sec. 6.6 mitigate this:
|
|
414
431
|
|
|
415
432
|
1. Every compressed part MUST carry `Frame_Content_Size`. The receiver
|
|
416
433
|
checks the declared total against the maximum message size before
|
|
417
434
|
invoking the decoder, so a bomb is rejected on its header alone.
|
|
418
|
-
2. The decoder is invoked in bounded mode
|
|
435
|
+
2. The decoder is invoked in bounded mode. It aborts if it would write
|
|
419
436
|
more bytes than declared. A peer that lies in the header cannot
|
|
420
437
|
expand a part past its declared size.
|
|
421
438
|
|
|
@@ -423,30 +440,30 @@ Implementations SHOULD set a conservative maximum message size on
|
|
|
423
440
|
`zstd+tcp://` connections even if they would otherwise leave it
|
|
424
441
|
unbounded.
|
|
425
442
|
|
|
426
|
-
## 9. Constants
|
|
427
443
|
|
|
428
|
-
|
|
429
|
-
SENTINEL_UNCOMPRESSED = 00 00 00 00 (4 bytes)
|
|
430
|
-
SENTINEL_ZSTD_FRAME = 28 B5 2F FD (4 bytes, Zstandard frame magic)
|
|
431
|
-
SENTINEL_ZSTD_DICT = 37 A4 30 EC (4 bytes)
|
|
444
|
+
## 10. Constants
|
|
432
445
|
|
|
433
|
-
|
|
446
|
+
| Constant | Value |
|
|
447
|
+
|-------------------------|-----------------------------------------------|
|
|
448
|
+
| Uncompressed sentinel | `00 00 00 00` |
|
|
449
|
+
| Zstd frame sentinel | `28 B5 2F FD` (Zstandard frame magic) |
|
|
450
|
+
| Dictionary sentinel | `37 A4 30 EC` |
|
|
451
|
+
| Default level | -3 |
|
|
452
|
+
| Min compress, no dict | 512 bytes |
|
|
453
|
+
| Min compress, with dict | 64 bytes |
|
|
454
|
+
| Max dictionary size | 8 KiB |
|
|
455
|
+
| Train max samples | 1000 |
|
|
456
|
+
| Train max bytes | 100 KiB |
|
|
457
|
+
| Train max sample length | 2048 bytes |
|
|
458
|
+
| Dictionary capacity | 2 KiB |
|
|
434
459
|
|
|
435
|
-
MIN_COMPRESS_NO_DICT = 512 bytes
|
|
436
|
-
MIN_COMPRESS_WITH_DICT = 64 bytes
|
|
437
|
-
|
|
438
|
-
MAX_DICT_SIZE = 64 KiB
|
|
439
|
-
|
|
440
|
-
TRAIN_MAX_SAMPLES = 1000
|
|
441
|
-
TRAIN_MAX_BYTES = 100 KiB
|
|
442
|
-
TRAIN_MAX_SAMPLE_LEN = 1024 bytes
|
|
443
|
-
DICT_CAPACITY = 8 KiB
|
|
444
|
-
```
|
|
445
460
|
|
|
446
|
-
##
|
|
461
|
+
## 11. References
|
|
447
462
|
|
|
448
|
-
- [RFC 37
|
|
449
|
-
- [RFC
|
|
463
|
+
- [RFC 37/ZMTP 3.1](https://rfc.zeromq.org/spec/37/)
|
|
464
|
+
- [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119)
|
|
465
|
+
- [RFC 8878: Zstandard Compression Data Format](https://datatracker.ietf.org/doc/html/rfc8878)
|
|
450
466
|
- [Zstandard dictionary builder](https://github.com/facebook/zstd/blob/dev/lib/dictBuilder/zdict.h)
|
|
451
|
-
- [CRIME attack](https://en.wikipedia.org/wiki/CRIME)
|
|
452
|
-
- [BREACH attack](https://en.wikipedia.org/wiki/BREACH)
|
|
467
|
+
- [CRIME attack](https://en.wikipedia.org/wiki/CRIME)
|
|
468
|
+
- [BREACH attack](https://en.wikipedia.org/wiki/BREACH)
|
|
469
|
+
- [`lz4+tcp://` RFC](../omq-lz4/RFC.md)
|
data/lib/omq/transport/zstd_tcp/codec.rb
CHANGED
|
@@ -4,11 +4,11 @@ module OMQ
|
|
|
4
4
|
module Transport
|
|
5
5
|
module ZstdTcp
|
|
6
6
|
class Codec
|
|
7
|
-
MAX_DICT_SIZE =
|
|
8
|
-
DICT_CAPACITY =
|
|
7
|
+
MAX_DICT_SIZE = 8 * 1024
|
|
8
|
+
DICT_CAPACITY = 2 * 1024
|
|
9
9
|
TRAIN_MAX_SAMPLES = 1000
|
|
10
10
|
TRAIN_MAX_BYTES = 100 * 1024
|
|
11
|
-
TRAIN_MAX_SAMPLE_LEN =
|
|
11
|
+
TRAIN_MAX_SAMPLE_LEN = 2048
|
|
12
12
|
MIN_COMPRESS_NO_DICT = 512
|
|
13
13
|
MIN_COMPRESS_WITH_DICT = 64
|
|
14
14
|
|
data/lib/omq/zstd/version.rb
CHANGED
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: omq-zstd
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.4.
|
|
4
|
+
version: 0.4.2
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Patrik Wenger
|
|
@@ -73,7 +73,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
73
73
|
- !ruby/object:Gem::Version
|
|
74
74
|
version: '0'
|
|
75
75
|
requirements: []
|
|
76
|
-
rubygems_version: 4.0.
|
|
76
|
+
rubygems_version: 4.0.10
|
|
77
77
|
specification_version: 4
|
|
78
78
|
summary: Zstd+TCP transport for OMQ
|
|
79
79
|
test_files: []
|