omq-zstd 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 0460be7b61085b54ba2e2de90033ed296524f08bfd56db2385ba3c69284713f1
4
- data.tar.gz: 0061e9188f071929d68a19415fcd3207fae5f929cdddf80bd022e127007302e3
3
+ metadata.gz: 1fcea64306ec7603681d7b3350e7cdf2d22d43487be062e79cc91f51f64aacfb
4
+ data.tar.gz: ac5d8d88deb3f90ccf97be757d384612274f4d742b042b79bc949a47ea7d0d3c
5
5
  SHA512:
6
- metadata.gz: 6730b00146a3ce45053466f14496b792b6cf619557555dafaed72debcd667c3dedf6a06200036114afb7adda81183c276d6212f42652a03b40ac62f13d2fa947
7
- data.tar.gz: 46519bf5bf3e3f3f944404aa174591a49faa240946acce52f743cf0fcb5263a7a3a609d26667eb6bd2f1f38f406be22779c26ffad81e5f9c6f122f3379474536
6
+ metadata.gz: 8e5a04f9ffdb46d130af145fa7025c544cc1709e7711cd92180b0408923667e61d70098efc4679c7b0d2617a3acde40b91c9ecaa1b5abb9e56cadc4a34cd8294
7
+ data.tar.gz: 25ab87f733a74257a9b2d772d96a76be134b60804ce2796fa0a95710334eb0552f973fab4a7b04442f6d9cce8091da008c5ed17e6868e925b942fdc4dfefd0f6
data/README.md CHANGED
@@ -4,13 +4,11 @@
4
4
  [![License: ISC](https://img.shields.io/badge/License-ISC-blue.svg)](LICENSE)
5
5
  [![Ruby](https://img.shields.io/badge/Ruby-%3E%3D%203.3-CC342D?logo=ruby&logoColor=white)](https://www.ruby-lang.org)
6
6
 
7
- > **Status:** Draft. Wire format may change before the first tagged release.
8
-
9
7
  Zstandard-compressed TCP transport for [OMQ](https://github.com/paddor/omq).
10
8
  Pick `zstd+tcp://` instead of `tcp://` and every message part on the wire is
11
9
  compressed per-part with [Zstandard](https://github.com/facebook/zstd).
12
- Compression is intrinsic to the transport no negotiation, no socket option,
13
- no payload changes. The ZMTP handshake itself runs over plain TCP; only
10
+ Compression is intrinsic to the transport: no negotiation, no socket option,
11
+ no payload changes. The ZMTP handshake runs over plain TCP; only
14
12
  post-handshake message parts are compressed.
15
13
 
16
14
  See [RFC.md](RFC.md) for the wire-format specification and
@@ -44,7 +42,7 @@ pull.receive # => ["hello, compressed world"]
44
42
  ```
45
43
 
46
44
  Both peers must use the `zstd+tcp://` scheme. A `tcp://` peer cannot talk to
47
- a `zstd+tcp://` peer they speak different transports.
45
+ a `zstd+tcp://` peer. They speak different transports.
48
46
 
49
47
  ### Compression level
50
48
 
@@ -61,7 +59,7 @@ at any level the peer chose.
61
59
  ### Dictionaries
62
60
 
63
61
  Small messages don't compress well on their own. A shared Zstd dictionary
64
- trained on representative payloads gives 2–10× ratios on payloads in the
62
+ trained on representative payloads gives 2-10x ratios on payloads in the
65
63
  dozens-to-hundreds-of-bytes range.
66
64
 
67
65
  **User-supplied dictionary** (out-of-band agreement):
@@ -75,11 +73,11 @@ The sender ships the dictionary to the receiver in-band as a one-shot
75
73
  single-part message prefixed with the dictionary sentinel
76
74
  (`37 A4 30 EC`), so the receiver does not need a copy on disk.
77
75
 
78
- **Auto-trained dictionary** (zero config the default when no `dict:` is
76
+ **Auto-trained dictionary** (zero config, the default when no `dict:` is
79
77
  passed): the sender collects up to 1000 samples or 100 KiB (whichever hits
80
- first), trains a dictionary, ships it inline, and switches to dictionary
81
- mode. Until then, payloads are compressed without a dictionary or sent
82
- plaintext when below the threshold.
78
+ first), skipping samples larger than 2048 bytes. It trains a 2 KiB dictionary,
79
+ ships it inline, and switches to dictionary mode. Until then, payloads are
80
+ compressed without a dictionary or sent plaintext when below the threshold.
83
81
 
84
82
  ### Compression thresholds
85
83
 
@@ -95,8 +93,8 @@ plaintext bytes).
95
93
 
96
94
  ### Security limits
97
95
 
98
- The receiver bounds decompression by the socket's own `max_message_size`
99
- the same knob you'd use on a plain `tcp://` socket. It caps the
96
+ The receiver bounds decompression by the socket's own `max_message_size`,
97
+ the same knob you'd use on a plain `tcp://` socket. It caps the
100
98
  **total decompressed size of all parts in a single message**, not each
101
99
  part individually: the budget starts at `max_message_size` and shrinks
102
100
  as each part is decoded, so a message whose parts sum to more than the
@@ -111,10 +109,42 @@ ceiling on decompressed message size. Set a value that matches what
111
109
  your application would tolerate over plain `tcp://`.
112
110
 
113
111
  Independent of the message-size knob, the dictionary itself is capped at
114
- 64 KiB (Zstd's recommended dictionary size range). A peer attempting to
115
- ship a larger dictionary, or send a message whose decompressed parts
116
- exceed `max_message_size`, drops the connection `OMQ::SocketDeadError`
117
- surfaces on the next `receive`.
112
+ 8 KiB. A peer attempting to ship a larger dictionary, or send a message
113
+ whose decompressed parts exceed `max_message_size`, drops the connection.
114
+ `OMQ::SocketDeadError` surfaces on the next `receive`.
115
+
116
+ ## Wire format
117
+
118
+ Every post-handshake ZMTP message part starts with a 4-byte sentinel:
119
+
120
+ | Sentinel (hex) | Meaning |
121
+ |---|---|
122
+ | `00 00 00 00` | Uncompressed plaintext |
123
+ | `28 B5 2F FD` | Zstandard-compressed frame |
124
+ | `37 A4 30 EC` | Dictionary shipment |
125
+
126
+ Compressed parts are standard Zstandard frames with `Frame_Content_Size`
127
+ set in the header. The receiver uses FCS for budget enforcement before
128
+ invoking the decoder. Any other leading 4 bytes close the connection.
129
+
130
+ Dictionary shipments are single-part ZMTP messages consumed by the
131
+ transport layer. They are not delivered to the application.
132
+
133
+ ## Constants
134
+
135
+ | Constant | Value |
136
+ |---|---|
137
+ | Uncompressed sentinel | `00 00 00 00` |
138
+ | Zstd frame sentinel | `28 B5 2F FD` (Zstandard frame magic) |
139
+ | Dictionary sentinel | `37 A4 30 EC` |
140
+ | Default level | -3 |
141
+ | Min compress, no dict | 512 B |
142
+ | Min compress, with dict | 64 B |
143
+ | Max dictionary size | 8 KiB |
144
+ | Train max samples | 1000 |
145
+ | Train max bytes | 100 KiB |
146
+ | Train max sample length | 2048 B |
147
+ | Dictionary capacity | 2 KiB |
118
148
 
119
149
  ## When to use it
120
150
 
@@ -127,12 +157,12 @@ surfaces on the next `receive`.
127
157
 
128
158
  It is **not** worth it for:
129
159
 
130
- - `inproc://` or `ipc://` irrelevant; there is no wire to shrink. Use
131
- `zstd+tcp://` only on the connections that actually need it. Other
132
- transports on the same socket are unaffected.
133
- - Already-compressed payloads (gzip, video, encrypted blobs) the Zstd
160
+ - `inproc://` or `ipc://`. No wire to shrink. Use `zstd+tcp://` only on
161
+ the connections that actually need it. Other transports on the same
162
+ socket are unaffected.
163
+ - Already-compressed payloads (gzip, video, encrypted blobs). The Zstd
134
164
  pass adds CPU for no gain.
135
- - Latency-critical sub-microsecond paths compression adds single-digit
165
+ - Latency-critical sub-microsecond paths. Compression adds single-digit
136
166
  microseconds per kilobyte at low levels, but it is not free.
137
167
 
138
168
  ## How it works (in one paragraph)
@@ -140,7 +170,7 @@ It is **not** worth it for:
140
170
  `require "omq/zstd"` registers the `zstd+tcp` scheme on
141
171
  `OMQ::Engine.transports`. A `zstd+tcp` socket builds a per-engine
142
172
  `Codec` (one Zstd dictionary instance shared across all the socket's
143
- connections fan-out compresses each part exactly once). Each accepted
173
+ connections; fan-out compresses each part exactly once). Each accepted
144
174
  or dialed TCP connection is wrapped in `ZstdConnection`, a
145
175
  `SimpleDelegator` over the underlying ZMTP connection that intercepts
146
176
  `#send_message` / `#write_message` / `#receive_message`. Message parts
data/RFC.md CHANGED
@@ -1,11 +1,13 @@
1
- # ZMTP over Zstd+TCP: Zstandard-Compressed TCP Transport for ZMTP
1
+ # ZMTP-Zstd: Zstandard-Compressed TCP Transport for ZMTP
2
2
 
3
- | Field | Value |
3
+ | Field | Value |
4
4
  |----------|----------------------------------------------------|
5
- | Status | Draft |
6
- | Editor | Patrik Wenger |
5
+ | Status | Draft |
6
+ | Editor | Patrik Wenger |
7
+ | Scheme | `zstd+tcp://` |
7
8
  | Requires | [RFC 37/ZMTP 3.1](https://rfc.zeromq.org/spec/37/) |
8
9
 
10
+
9
11
  ## 1. Abstract
10
12
 
11
13
  This specification defines `zstd+tcp://`, a TCP transport for ZMTP 3.1
@@ -16,10 +18,19 @@ greeting and handshake proceed over raw TCP exactly as they would over
16
18
  is individually encoded with a 4-byte sentinel dispatch that
17
19
  distinguishes uncompressed plaintext, Zstandard-compressed frames, and
18
20
  dictionary shipments. No ZMTP properties, command frames, or
19
- negotiation are involved compression is an intrinsic property of the
21
+ negotiation are involved. Compression is an intrinsic property of the
20
22
  transport, like encryption is an intrinsic property of TLS.
21
23
 
22
- ## 2. Motivation
24
+
25
+ ## 2. Language
26
+
27
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
28
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
29
+ document are to be interpreted as described in
30
+ [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
31
+
32
+
33
+ ## 3. Motivation
23
34
 
24
35
  Zstandard at low compression levels encodes in single-digit microseconds
25
36
  per kilobyte, decompresses faster still, and on dictionary-trained
@@ -39,14 +50,14 @@ the transport. `zstd+tcp://` replaces it with a transport-level
39
50
  mechanism that any ZMTP application benefits from without changes to the
40
51
  payload.
41
52
 
42
- ### 2.1 Why a transport scheme
53
+ ### 3.1 Why a transport scheme
43
54
 
44
55
  Compression could live at three layers. Each has a fatal flaw except the
45
56
  transport layer.
46
57
 
47
58
  **Socket-level wrapper** (too high). A wrapper above routing knows
48
59
  nothing about transports. It compresses local connections (pure
49
- overhead) and cannot act on new connections naturally — dictionary
60
+ overhead) and cannot act on new connections naturally. Dictionary
50
61
  shipping requires per-connection state, but a wrapper only sees messages
51
62
  after routing has dispatched them. Reconnect handling requires hooking
52
63
  into connection lifecycle events that are awkward from outside.
@@ -61,13 +72,13 @@ connections.
61
72
  explicit in the endpoint URI. Only TCP connections get compressed. Local
62
73
  transports are unaffected even on the same socket. Dictionary lifetime
63
74
  matches connection lifetime naturally (new connection = new wrapper =
64
- re-ship dictionary). No negotiation is needed both peers use
75
+ re-ship dictionary). No negotiation is needed; both peers use
65
76
  `zstd+tcp://`. The codec is socket-wide (shared across connections), so
66
77
  fan-out patterns compress once and reuse the result.
67
78
 
68
- ### 2.2 Why not negotiate
79
+ ### 3.2 Why not negotiate
69
80
 
70
- ZMTP 3.1 already supports unknown READY properties an unaware peer
81
+ ZMTP 3.1 already supports unknown READY properties. An unaware peer
71
82
  silently ignores them. A negotiation-based design could fall back to
72
83
  plaintext when the peer does not understand compression. But this
73
84
  introduces complexity (profile matching, asymmetric per-direction state,
@@ -76,7 +87,7 @@ deployment decision, not a runtime discovery. Both peers are configured
76
87
  to use `zstd+tcp://` or they are not. The transport scheme approach
77
88
  eliminates the entire negotiation surface and its edge cases.
78
89
 
79
- ### 2.3 Why Zstandard
90
+ ### 3.3 Why Zstandard
80
91
 
81
92
  Zstandard at low levels matches LZ4 on encode latency, beats it on
82
93
  decompression speed and ratio at every realistic ZMQ payload size, and
@@ -85,9 +96,10 @@ particularly important for fan-out patterns (PUB/SUB, RADIO/DISH): the
85
96
  publisher pays one compress, every subscriber pays decompress, so
86
97
  per-subscriber CPU dominates the total budget.
87
98
 
88
- ## 3. Goals and Non-goals
89
99
 
90
- ### 3.1 Goals
100
+ ## 4. Goals and Non-goals
101
+
102
+ ### 4.1 Goals
91
103
 
92
104
  - Transparent to application code: send/receive operations see plaintext.
93
105
  - Per-part sender decision: opt out for short or incompressible parts.
@@ -98,50 +110,52 @@ per-subscriber CPU dominates the total budget.
98
110
  - No ZMTP-level negotiation, no new READY properties, no new command
99
111
  frames.
100
112
 
101
- ### 3.2 Non-goals
113
+ ### 4.2 Non-goals
102
114
 
103
115
  - New ZMTP mechanism, new socket type, new greeting, new frame flag bit.
104
116
  - Compression of the ZMTP greeting or command frames (READY, SUBSCRIBE,
105
117
  PING, PONG, ...).
106
- - Application to non-TCP transports (`inproc://` is zero-copy
118
+ - Application to non-TCP transports (`inproc://` is zero-copy;
107
119
  compression is pure overhead; `ipc://` rarely benefits).
108
120
  - Replacing or weakening CurveZMQ or any other security mechanism.
109
- See Sec. 8.
121
+ See Sec. 9.
110
122
  - Streaming / context-takeover compression. Each part is decodable in
111
123
  isolation with no dependency on a previous part's LZ77 history.
112
124
 
113
- ## 4. Terminology
114
125
 
115
- | Term | Meaning |
116
- |---------------------|---------------------------------------------------------------------------|
117
- | Part | One ZMTP message frame body. A multipart message has multiple parts. |
118
- | Sentinel | The first 4 bytes of a post-handshake part on the wire (Sec. 5.1). |
119
- | Uncompressed part | A wire part whose sentinel is `00 00 00 00`. |
120
- | Compressed part | A wire part whose first 4 bytes are the Zstandard magic `28 B5 2F FD`. |
121
- | Dictionary part | A wire part whose first 4 bytes are `37 A4 30 EC` (Sec. 6). |
122
- | Dictionary message | A single-part ZMTP message consisting of exactly one dictionary part. |
126
+ ## 5. Terminology
127
+
128
+ | Term | Meaning |
129
+ |---------------------|----------------------------------------------------------------------|
130
+ | Part | One ZMTP message frame body. A multipart message has multiple parts. |
131
+ | Sentinel | The first 4 bytes of a post-handshake part on the wire (Sec. 6.1). |
132
+ | Uncompressed part | A wire part whose sentinel is `00 00 00 00`. |
133
+ | Compressed part | A wire part whose first 4 bytes are the Zstandard magic `28 B5 2F FD`. |
134
+ | Dictionary part | A wire part whose first 4 bytes are `37 A4 30 EC` (Sec. 7). |
135
+ | Dictionary message | A single-part ZMTP message consisting of exactly one dictionary part. |
136
+
123
137
 
124
- ## 5. Part Encoding
138
+ ## 6. Part Encoding
125
139
 
126
140
  After the ZMTP handshake completes, every message part on the wire is
127
141
  individually encoded. The ZMTP MORE flag is carried on the wire frame
128
- header as normal. Multipart messages are encoded part by part each
142
+ header as normal. Multipart messages are encoded part by part; each
129
143
  part is independent.
130
144
 
131
- ### 5.1 Sentinel dispatch
145
+ ### 6.1 Sentinel dispatch
132
146
 
133
147
  The first 4 bytes of each wire part determine how it is decoded.
134
148
 
135
- | Sentinel (hex) | Meaning |
136
- |------------------|-----------------------------------------------------------------|
137
- | `00 00 00 00` | Uncompressed plaintext (Sec. 5.3) |
138
- | `28 B5 2F FD` | Zstandard compressed frame (Sec. 5.4) |
139
- | `37 A4 30 EC` | Dictionary shipment (Sec. 6) |
149
+ | Sentinel (hex) | Meaning |
150
+ |------------------|--------------------------------------|
151
+ | `00 00 00 00` | Uncompressed plaintext (Sec. 6.3) |
152
+ | `28 B5 2F FD` | Zstandard compressed frame (Sec. 6.4)|
153
+ | `37 A4 30 EC` | Dictionary shipment (Sec. 7) |
140
154
 
141
155
  All other 4-byte values are reserved. A receiver that encounters an
142
- unknown sentinel MUST drop the connection with an error.
156
+ unknown sentinel MUST close the connection.
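
The dispatch table above can be sketched in a few lines of Ruby. This is an illustrative sketch, not the gem's code: `classify_part` and `ProtocolError` are hypothetical names, and the caller is assumed to close the connection when the error is raised.

```ruby
# Sentinel values from the dispatch table, as binary strings.
SENTINEL_UNCOMPRESSED = "\x00\x00\x00\x00".b.freeze
SENTINEL_ZSTD_FRAME   = "\x28\xB5\x2F\xFD".b.freeze
SENTINEL_ZSTD_DICT    = "\x37\xA4\x30\xEC".b.freeze

# Hypothetical error type; the caller is expected to close the connection.
class ProtocolError < StandardError; end

# Classifies one post-handshake wire part by its first 4 bytes.
def classify_part(wire_part)
  bytes = wire_part.b
  raise ProtocolError, "part shorter than 4 bytes" if bytes.bytesize < 4

  case bytes.byteslice(0, 4)
  when SENTINEL_UNCOMPRESSED then :plaintext
  when SENTINEL_ZSTD_FRAME   then :zstd_frame
  when SENTINEL_ZSTD_DICT    then :dictionary
  else raise ProtocolError, "unknown sentinel"
  end
end
```

Working on binary copies (`String#b`) keeps the comparison byte-exact regardless of the encoding the caller's string carries.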
143
157
 
144
- ### 5.2 Compression level
158
+ ### 6.2 Compression level
145
159
 
146
160
  The default compression level is **-3** (Zstandard fast strategy). At
147
161
  this level the encoder cost is in the low single-digit microseconds per
@@ -149,11 +163,11 @@ kilobyte, and the achieved ratio is within a few percent of level 3 once
149
163
  a dictionary is in play.
150
164
 
151
165
  The compression level is a sender choice and is not communicated on the
152
- wire the receiver decodes any valid Zstandard frame regardless of the
166
+ wire. The receiver decodes any valid Zstandard frame regardless of the
153
167
  level used to encode it. Implementations SHOULD expose the level as a
154
168
  configurable parameter.
155
169
 
156
- ### 5.3 Uncompressed sentinel `00 00 00 00`
170
+ ### 6.3 Uncompressed sentinel `00 00 00 00`
157
171
 
158
172
  ```
159
173
  +------------------+-------------------+
@@ -169,7 +183,7 @@ without an extra flag bit in the ZMTP frame header.
169
183
  Four zero bytes cannot collide with a valid Zstandard frame magic or the
170
184
  dictionary sentinel, so no ambiguity arises.
171
185
 
172
- ### 5.4 Compressed Zstandard frame
186
+ ### 6.4 Compressed Zstandard frame
173
187
 
174
188
  ```
175
189
  +------------------+
@@ -178,15 +192,15 @@ dictionary sentinel, so no ambiguity arises.
178
192
  +------------------+
179
193
  ```
180
194
 
181
- The wire part IS the Zstandard frame its first 4 bytes are the
195
+ The wire part IS the Zstandard frame. Its first 4 bytes are the
182
196
  standard Zstandard frame magic `28 B5 2F FD`. No additional framing is
183
197
  added.
184
198
 
185
199
  The sender MUST configure the encoder to write the `Frame_Content_Size`
186
- field in the Zstandard frame header (RFC 8878 §3.1.1.1.2). This field
187
- is required for the receiver's budget enforcement (Sec. 5.6).
200
+ field in the Zstandard frame header (RFC 8878 Sec. 3.1.1.1.2). This
201
+ field is required for the receiver's budget enforcement (Sec. 6.6).
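
To make the role of `Frame_Content_Size` concrete, here is a minimal sketch of how a receiver could read the field straight from the frame header before any decoder runs. The header layout follows RFC 8878 Sec. 3.1.1.1; the function name `frame_content_size` is hypothetical, and a real implementation would more likely ask its Zstandard binding for this value.

```ruby
ZSTD_MAGIC = "\x28\xB5\x2F\xFD".b.freeze

# Returns the declared Frame_Content_Size of a Zstandard frame, or nil when
# the header does not carry the field (or the input is truncated).
def frame_content_size(frame)
  bytes = frame.b
  return nil unless bytes.byteslice(0, 4) == ZSTD_MAGIC

  descriptor = bytes.getbyte(4)              # Frame_Header_Descriptor
  return nil if descriptor.nil?

  fcs_flag       = descriptor >> 6           # bits 7-6
  single_segment = descriptor[5] == 1        # bit 5
  dict_id_flag   = descriptor & 0x03         # bits 1-0

  fcs_size = [single_segment ? 1 : 0, 2, 4, 8][fcs_flag]
  return nil if fcs_size.zero?

  offset  = 5
  offset += 1 unless single_segment          # skip Window_Descriptor
  offset += [0, 1, 2, 4][dict_id_flag]       # skip Dictionary_ID
  field = bytes.byteslice(offset, fcs_size)
  return nil if field.nil? || field.bytesize < fcs_size

  # The field is little-endian; the 2-byte form stores (size - 256).
  value = field.bytes.each_with_index.sum { |byte, i| byte << (8 * i) }
  fcs_size == 2 ? value + 256 : value
end
```

A `nil` result corresponds to the "field is absent" case, which the receiver rules treat as a reason to close the connection.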
188
202
 
189
- ### 5.5 Sender rules
203
+ ### 6.5 Sender rules
190
204
 
191
205
  For each outgoing message part, the sender proceeds as follows:
192
206
 
@@ -203,7 +217,7 @@ For each outgoing message part, the sender proceeds as follows:
203
217
 
204
218
  3. Otherwise, run the Zstandard encoder. The encoder MUST write the
205
219
  `Frame_Content_Size` field. If the compressed output's size is
206
- ≥ `plaintext_size - 4` (net saving ≤ 0 after accounting for the
220
+ >= `plaintext_size - 4` (net saving <= 0 after accounting for the
207
221
  4-byte sentinel of the uncompressed alternative), prepend
208
222
  `00 00 00 00` and emit the plaintext instead. Otherwise emit the
209
223
  Zstandard frame as-is.
@@ -213,41 +227,42 @@ For each outgoing message part, the sender proceeds as follows:
213
227
  MUST still prepend `00 00 00 00` to avoid sentinel ambiguity.
214
228
  Step 2 and step 3's fallback path already guarantee this.
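
The sender decision can be sketched as follows. This is a hedged illustration: `encode_part` is a hypothetical name, the `compressor` argument is any callable standing in for a real Zstandard binding (it must return a complete frame with `Frame_Content_Size` set), and the threshold check is inferred from the constants table rather than from the steps elided in this hunk, so whether the comparison is strict is an assumption.

```ruby
UNCOMPRESSED_SENTINEL  = "\x00\x00\x00\x00".b.freeze
MIN_COMPRESS_NO_DICT   = 512
MIN_COMPRESS_WITH_DICT = 64

# Encodes one outgoing message part into its wire form.
def encode_part(plaintext, compressor:, dictionary_loaded: false)
  plain = plaintext.b
  threshold = dictionary_loaded ? MIN_COMPRESS_WITH_DICT : MIN_COMPRESS_NO_DICT

  # Short parts skip the encoder and go out as sentinel + plaintext.
  return UNCOMPRESSED_SENTINEL + plain if plain.bytesize < threshold

  frame = compressor.call(plain).b

  # No net saving once the 4-byte sentinel of the plaintext form is counted:
  # fall back to sentinel + plaintext.
  return UNCOMPRESSED_SENTINEL + plain if frame.bytesize >= plain.bytesize - 4

  frame # the wire part is the Zstandard frame itself
end
```

Every return path emits either a Zstandard frame or a part that starts with `00 00 00 00`, so plaintext can never be mistaken for another sentinel.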
215
229
 
216
- ### 5.6 Receiver rules
230
+ ### 6.6 Receiver rules
217
231
 
218
232
  For each incoming wire part, the receiver proceeds as follows:
219
233
 
220
234
  1. Read the first 4 bytes as the sentinel. If the part is shorter than
221
- 4 bytes, drop the connection with an error.
235
+ 4 bytes, close the connection.
222
236
 
223
237
  2. Sentinel `00 00 00 00`: the remaining `N - 4` bytes are plaintext.
224
238
  Return them.
225
239
 
226
240
  3. Sentinel `28 B5 2F FD`: the entire wire part is a Zstandard frame.
227
241
  - Read the `Frame_Content_Size` field from the Zstandard header. If
228
- the field is absent, drop the connection with an error.
242
+ the field is absent, close the connection.
229
243
  - If the connection enforces a maximum message size, add this part's
230
244
  declared content size to the running decompressed total for the
231
245
  current multipart message (parts chained by the ZMTP MORE flag).
232
- If the running total would exceed the maximum, drop the connection
233
- with an error without invoking the decoder.
246
+ If the running total would exceed the maximum, close the connection
247
+ without invoking the decoder.
234
248
  - Invoke the decoder in a bounded mode that aborts if it would write
235
249
  more bytes than `Frame_Content_Size` declared. On such an abort,
236
- drop the connection with an error.
250
+ close the connection.
237
251
  - Return the decompressed plaintext.
238
252
 
239
- 4. Sentinel `37 A4 30 EC`: dictionary shipment. See Sec. 6.
253
+ 4. Sentinel `37 A4 30 EC`: dictionary shipment. See Sec. 7.
240
254
 
241
- 5. Any other sentinel: drop the connection with an error.
255
+ 5. Any other sentinel: close the connection.
242
256
 
243
257
  The maximum message size always refers to the **decompressed** plaintext
244
258
  summed across all parts of a multipart message. A multipart message
245
259
  whose total wire length is small but whose total decompressed size
246
260
  exceeds the limit MUST be rejected before decoder invocation.
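
The running-total check can be illustrated with a small budget object. This is a sketch under assumptions: `MessageBudget` and `MessageTooLarge` are hypothetical names, a `nil` maximum is taken to mean "unbounded", and the caller is assumed to pass the declared `Frame_Content_Size` of each part before decoding it.

```ruby
# Hypothetical error type; the caller is expected to close the connection.
class MessageTooLarge < StandardError; end

# Tracks the decompressed-size budget for one multipart message.
class MessageBudget
  def initialize(max_message_size)
    @max = max_message_size # nil means no limit is enforced
    @total = 0
  end

  # Call with a part's declared size BEFORE invoking the decoder.
  def charge!(declared_size)
    @total += declared_size
    if @max && @total > @max
      raise MessageTooLarge, "declared total #{@total} exceeds #{@max}"
    end
    @total
  end

  # Call after the last part of a message (MORE flag clear).
  def reset!
    @total = 0
  end
end
```

Because the check uses only header-declared sizes, an oversized message is rejected without spending any decoder time on it.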
247
261
 
248
- ## 6. Dictionary Shipment
249
262
 
250
- ### 6.1 Dictionary message format
263
+ ## 7. Dictionary Shipment
264
+
265
+ ### 7.1 Dictionary message format
251
266
 
252
267
  A dictionary is shipped as a **single-part ZMTP message** (no MORE flag)
253
268
  whose body begins with the dictionary sentinel:
@@ -266,40 +281,40 @@ with the Zstandard frame magic and the uncompressed sentinel.
266
281
  The remaining `D` bytes are the raw dictionary as it should be passed
267
282
  to the Zstandard decoder's dictionary-load operation.
268
283
 
269
- ### 6.2 Constraints
284
+ ### 7.2 Constraints
270
285
 
271
286
  - A dictionary message MUST be a single-part ZMTP message (MORE flag
272
287
  not set on the frame header). A dictionary sentinel in a multipart
273
288
  message's non-final or non-only part is a protocol error.
274
289
 
275
- - A dictionary message MUST NOT exceed **64 KiB** total (sentinel +
290
+ - A dictionary message MUST NOT exceed **8 KiB** total (sentinel +
276
291
  dictionary bytes). A receiver that receives a dictionary message
277
- larger than 64 KiB MUST drop the connection with an error.
292
+ larger than 8 KiB MUST close the connection.
278
293
 
279
294
  - A sender MUST send at most **one** dictionary message per direction
280
295
  per connection. A receiver that receives a second dictionary message
281
- on the same connection MUST drop the connection with an error.
296
+ on the same connection MUST close the connection.
282
297
 
283
298
  - A dictionary message MUST be sent BEFORE any compressed part that
284
299
  references the dictionary. In practice this means the sender ships
285
300
  the dictionary before (or immediately after training triggers during)
286
301
  the first compressed write that would benefit from it.
287
302
 
288
- ### 6.3 Receiver handling
303
+ ### 7.3 Receiver handling
289
304
 
290
305
  When the receiver encounters a dictionary part:
291
306
 
292
- 1. Validate the constraints in Sec. 6.2.
307
+ 1. Validate the constraints in Sec. 7.2.
293
308
  2. Strip the 4-byte sentinel.
294
309
  3. Install the remaining bytes as the decompression dictionary for this
295
310
  connection.
296
- 4. Discard the message it is not delivered to the application.
311
+ 4. Discard the message. It is not delivered to the application.
297
312
 
298
313
  If all parts of a ZMTP message are dictionary parts (which is always
299
314
  the case, since dictionary messages are single-part), the receiver
300
315
  loops to receive the next message.
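
The receiver-side handling can be sketched like this. It is an illustration only: `DictionaryReceiver` and `DictionaryError` are hypothetical names, the sketch treats a dictionary part anywhere in a multipart message as a protocol error, and "installing" the dictionary is reduced to storing the bytes, since the actual decoder call depends on the Zstandard binding in use.

```ruby
DICT_SENTINEL         = "\x37\xA4\x30\xEC".b.freeze
MAX_DICT_MESSAGE_SIZE = 8 * 1024 # sentinel + dictionary bytes

# Hypothetical error type; the caller is expected to close the connection.
class DictionaryError < StandardError; end

# Per-connection, per-direction dictionary state.
class DictionaryReceiver
  attr_reader :dictionary

  def initialize
    @dictionary = nil
  end

  # parts: the wire parts of one ZMTP message.
  # Returns true when the message was a dictionary shipment and was consumed,
  # false when it should be decoded and delivered as usual.
  def consume(parts)
    return false unless parts.any? { |part| part.b.start_with?(DICT_SENTINEL) }
    raise DictionaryError, "dictionary part in a multipart message" unless parts.size == 1

    body = parts.first.b
    raise DictionaryError, "dictionary message exceeds 8 KiB" if body.bytesize > MAX_DICT_MESSAGE_SIZE
    raise DictionaryError, "second dictionary on this connection" if @dictionary

    @dictionary = body.byteslice(4, body.bytesize - 4) # strip the sentinel
    true
  end
end
```

A caller would loop on `consume`: when it returns `true`, the message is dropped and the next one is read, so the application never sees the shipment.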
301
316
 
302
- ### 6.4 Dictionary scope
317
+ ### 7.4 Dictionary scope
303
318
 
304
319
  The dictionary a sender ships applies to a single direction of a single
305
320
  connection. Each peer may independently ship its own dictionary for its
@@ -309,7 +324,7 @@ nothing (or uncompressed traffic) back.
309
324
 
310
325
  The sender's dictionary is typically socket-wide: trained once from
311
326
  early traffic across all connections and reused. But this is an
312
- implementation choice the wire protocol carries no dictionary identity
327
+ implementation choice. The wire protocol carries no dictionary identity
313
328
  or scope metadata.
314
329
 
315
330
  An implementation MAY pool training samples and share the resulting
@@ -319,21 +334,21 @@ multiple `zstd+tcp://` endpoints: samples from one endpoint accelerate
319
334
  training for all of them, and newly opened connections benefit from a
320
335
  dictionary trained by their predecessors. Connections that were
321
336
  configured with an explicit out-of-band dictionary MUST NOT participate
322
- in shared training they use their own dictionary independently.
337
+ in shared training; they use their own dictionary independently.
323
338
 
324
- ### 6.5 Automatic dictionary training
339
+ ### 7.5 Automatic dictionary training
325
340
 
326
341
  A sender MAY train a dictionary automatically from early traffic:
327
342
 
328
343
  1. Buffer plaintext samples from the first messages. Samples larger
329
- than **1024 bytes** SHOULD be skipped dictionaries primarily
344
+ than **2048 bytes** SHOULD be skipped; dictionaries primarily
330
345
  benefit small frames.
331
346
  2. When the buffer reaches **1000 samples** OR **100 KiB** of
332
347
  plaintext (whichever comes first), train a Zstandard dictionary from
333
348
  the buffered samples and discard the buffer.
334
349
  3. The recommended dictionary capacity (training target size) is
335
- **8 KiB**.
336
- 4. Ship the trained dictionary via a dictionary message (Sec. 6.1) on
350
+ **2 KiB**.
351
+ 4. Ship the trained dictionary via a dictionary message (Sec. 7.1) on
337
352
  every connection, before any compressed part that uses it.
338
353
  5. Switch to dictionary-bound compression for all subsequent parts.
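
The sample-collection steps above can be sketched as a small buffer. This is a hedged illustration: `TrainingBuffer` is a hypothetical class, and the `trainer` callable stands in for whatever dictionary builder the implementation uses (for example a binding to Zstandard's dictionary builder); shipping the result and switching modes are left to the caller.

```ruby
# Collects plaintext samples until the training trigger is reached.
class TrainingBuffer
  MAX_SAMPLES    = 1000
  MAX_BYTES      = 100 * 1024
  MAX_SAMPLE_LEN = 2048

  attr_reader :samples, :bytes

  def initialize
    @samples = []
    @bytes = 0
  end

  # Records one sample; returns true once training should run.
  def add(plaintext)
    sample = plaintext.b
    return ready? if sample.bytesize > MAX_SAMPLE_LEN # skip oversized samples

    @samples << sample
    @bytes += sample.bytesize
    ready?
  end

  # Whichever limit is hit first triggers training.
  def ready?
    @samples.size >= MAX_SAMPLES || @bytes >= MAX_BYTES
  end

  # Hands the samples to the trainer and discards the buffer.
  def train(trainer)
    dictionary = trainer.call(@samples)
    @samples = []
    @bytes = 0
    dictionary
  end
end
```

Skipped samples still go out on the wire as normal parts; they are only excluded from the training set.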
339
354
 
@@ -341,16 +356,17 @@ If training fails (the sample set was too small or too uniform), the
341
356
  sender MUST stay in no-dictionary mode for the rest of the socket's
342
357
  lifetime. It MUST NOT retry training.
343
358
 
344
- ### 6.6 Dictionary ID
359
+ ### 7.6 Dictionary ID
345
360
 
346
361
  Auto-trained dictionaries SHOULD be patched with a random dictionary ID
347
362
  in the Zstandard user range (32768 to 2^31 - 1) to avoid collisions
348
363
  with Zstandard's built-in dictionary IDs. Out-of-band dictionaries
349
364
  retain whatever dictionary ID they were created with.
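
Patching the ID can be sketched as below. The sketch assumes the trained dictionary is in the standard Zstandard dictionary format of RFC 8878 (4-byte magic followed by a 4-byte little-endian `Dictionary_ID`); `patch_dictionary_id` is a hypothetical helper, not a function of the gem or of any Zstandard binding.

```ruby
ZSTD_DICT_MAGIC    = "\x37\xA4\x30\xEC".b.freeze
USER_DICT_ID_RANGE = (32_768..(2**31 - 1)).freeze

# Returns a copy of the dictionary with its Dictionary_ID replaced.
def patch_dictionary_id(dictionary, id = rand(USER_DICT_ID_RANGE))
  bytes = dictionary.b # String#b returns a copy; the input stays untouched
  unless bytes.bytesize >= 8 && bytes.byteslice(0, 4) == ZSTD_DICT_MAGIC
    raise ArgumentError, "not a Zstandard-format dictionary"
  end
  raise ArgumentError, "id outside the user range" unless USER_DICT_ID_RANGE.cover?(id)

  bytes[4, 4] = [id].pack("V") # 32-bit little-endian
  bytes
end
```

Only bytes 4 to 7 change, so the entropy tables and content that follow the header are carried over unmodified.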
350
365
 
351
- ## 7. ZMTP Interaction
352
366
 
353
- ### 7.1 Greeting and handshake
367
+ ## 8. ZMTP Interaction
368
+
369
+ ### 8.1 Greeting and handshake
354
370
 
355
371
  The ZMTP greeting and security mechanism handshake proceed over raw TCP
356
372
  exactly as specified by RFC 37. `zstd+tcp://` does not modify the
@@ -358,28 +374,29 @@ greeting, mechanism, READY properties, or any command frames. The
358
374
  compression layer activates only after the handshake is complete and the
359
375
  connection is ready for message traffic.
360
376
 
361
- ### 7.2 Command frames
377
+ ### 8.2 Command frames
362
378
 
363
379
  ZMTP command frames (READY, SUBSCRIBE, CANCEL, JOIN, LEAVE, PING,
364
380
  PONG) are never compressed. They are sent and received as standard ZMTP
365
381
  command frames. Only message frames (the COMMAND bit not set in the
366
382
  frame header) are subject to sentinel-dispatched encoding.
367
383
 
368
- ### 7.3 Socket type compatibility
384
+ ### 8.3 Socket type compatibility
369
385
 
370
386
  `zstd+tcp://` is compatible with all ZMTP socket types. The socket type
371
387
  negotiation in the READY handshake is unaffected.
372
388
 
373
- ### 7.4 Peer requirement
389
+ ### 8.4 Peer requirement
374
390
 
375
391
  Both peers of a connection MUST use `zstd+tcp://`. There is no
376
392
  fallback to plaintext TCP and no negotiation. A `zstd+tcp://` peer
377
393
  connecting to a plain `tcp://` peer (or vice versa) will see garbled
378
394
  data or sentinel errors and the connection will fail.
379
395
 
380
- ## 8. Security Considerations
381
396
 
382
- ### 8.1 Compression combined with encryption (CRIME / BREACH)
397
+ ## 9. Security Considerations
398
+
399
+ ### 9.1 Compression combined with encryption (CRIME / BREACH)
383
400
 
384
401
  Combining length-revealing compression with a secure channel that
385
402
  carries attacker-influenced plaintext enables CRIME- and BREACH-style
@@ -392,30 +409,30 @@ encrypted tunnel when the plaintext contains attacker-controlled
392
409
  content. Deployments that accept this risk MUST do so with explicit
393
410
  opt-in.
394
411
 
395
- ### 8.2 Length side-channel
412
+ ### 9.2 Length side-channel
396
413
 
397
414
  Compression makes the wire length of a part depend on its content. An
398
415
  on-path observer can learn something about the plaintext from the
399
416
  compressed length alone. Deployments that care about traffic analysis
400
417
  MUST NOT rely on `zstd+tcp://` to hide payload shape.
401
418
 
402
- ### 8.3 Dictionary contents
419
+ ### 9.3 Dictionary contents
403
420
 
404
421
  When auto-training is enabled, the receiver loads dictionary bytes
405
422
  chosen by the peer. The Zstandard reference dictionary loader is
406
423
  hardened against malformed inputs, but implementations MUST enforce the
407
- 64 KiB cap on dictionary messages (Sec. 6.2) and SHOULD NOT cache
424
+ 8 KiB cap on dictionary messages (Sec. 7.2) and SHOULD NOT cache
408
425
  received dictionaries across connections.
409
426
 
410
- ### 8.4 Decompression bombs
427
+ ### 9.4 Decompression bombs
411
428
 
412
429
  A small compressed frame can decompress to many megabytes of plaintext.
413
- The receiver rules in Sec. 5.6 mitigate this:
430
+ The receiver rules in Sec. 6.6 mitigate this:
414
431
 
415
432
  1. Every compressed part MUST carry `Frame_Content_Size`. The receiver
416
433
  checks the declared total against the maximum message size before
417
434
  invoking the decoder, so a bomb is rejected on its header alone.
418
- 2. The decoder is invoked in bounded mode it aborts if it would write
435
+ 2. The decoder is invoked in bounded mode. It aborts if it would write
419
436
  more bytes than declared. A peer that lies in the header cannot
420
437
  expand a part past its declared size.
421
438
 
@@ -423,30 +440,30 @@ Implementations SHOULD set a conservative maximum message size on
423
440
  `zstd+tcp://` connections even if they would otherwise leave it
424
441
  unbounded.
425
442
 
426
- ## 9. Constants
427
443
 
428
- ```
429
- SENTINEL_UNCOMPRESSED = 00 00 00 00 (4 bytes)
430
- SENTINEL_ZSTD_FRAME = 28 B5 2F FD (4 bytes, Zstandard frame magic)
431
- SENTINEL_ZSTD_DICT = 37 A4 30 EC (4 bytes)
444
+ ## 10. Constants
432
445
 
433
- DEFAULT_LEVEL = -3
446
+ | Constant | Value |
447
+ |-------------------------|-----------------------------------------------|
448
+ | Uncompressed sentinel | `00 00 00 00` |
449
+ | Zstd frame sentinel | `28 B5 2F FD` (Zstandard frame magic) |
450
+ | Dictionary sentinel | `37 A4 30 EC` |
451
+ | Default level | -3 |
452
+ | Min compress, no dict | 512 bytes |
453
+ | Min compress, with dict | 64 bytes |
454
+ | Max dictionary size | 8 KiB |
455
+ | Train max samples | 1000 |
456
+ | Train max bytes | 100 KiB |
457
+ | Train max sample length | 2048 bytes |
458
+ | Dictionary capacity | 2 KiB |
434
459
 
435
- MIN_COMPRESS_NO_DICT = 512 bytes
436
- MIN_COMPRESS_WITH_DICT = 64 bytes
437
-
438
- MAX_DICT_SIZE = 64 KiB
439
-
440
- TRAIN_MAX_SAMPLES = 1000
441
- TRAIN_MAX_BYTES = 100 KiB
442
- TRAIN_MAX_SAMPLE_LEN = 1024 bytes
443
- DICT_CAPACITY = 8 KiB
444
- ```
445
460
 
446
- ## 10. References
461
+ ## 11. References
447
462
 
448
- - [RFC 37 / ZMTP 3.1](https://rfc.zeromq.org/spec/37/) — underlying wire protocol
449
- - [RFC 8878 — Zstandard Compression Data Format](https://datatracker.ietf.org/doc/html/rfc8878)
463
+ - [RFC 37/ZMTP 3.1](https://rfc.zeromq.org/spec/37/)
464
+ - [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119)
465
+ - [RFC 8878: Zstandard Compression Data Format](https://datatracker.ietf.org/doc/html/rfc8878)
450
466
  - [Zstandard dictionary builder](https://github.com/facebook/zstd/blob/dev/lib/dictBuilder/zdict.h)
451
- - [CRIME attack](https://en.wikipedia.org/wiki/CRIME) — compression side-channel on TLS
452
- - [BREACH attack](https://en.wikipedia.org/wiki/BREACH) — HTTP-layer variant
467
+ - [CRIME attack](https://en.wikipedia.org/wiki/CRIME)
468
+ - [BREACH attack](https://en.wikipedia.org/wiki/BREACH)
469
+ - [`lz4+tcp://` RFC](../omq-lz4/RFC.md)
@@ -4,11 +4,11 @@ module OMQ
4
4
  module Transport
5
5
  module ZstdTcp
6
6
  class Codec
7
- MAX_DICT_SIZE = 64 * 1024
8
- DICT_CAPACITY = 8 * 1024
7
+ MAX_DICT_SIZE = 8 * 1024
8
+ DICT_CAPACITY = 2 * 1024
9
9
  TRAIN_MAX_SAMPLES = 1000
10
10
  TRAIN_MAX_BYTES = 100 * 1024
11
- TRAIN_MAX_SAMPLE_LEN = 1024
11
+ TRAIN_MAX_SAMPLE_LEN = 2048
12
12
  MIN_COMPRESS_NO_DICT = 512
13
13
  MIN_COMPRESS_WITH_DICT = 64
14
14
 
@@ -2,6 +2,6 @@
2
2
 
3
3
  module OMQ
4
4
  module Zstd
5
- VERSION = "0.4.1"
5
+ VERSION = "0.4.2"
6
6
  end
7
7
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: omq-zstd
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.1
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Patrik Wenger
@@ -73,7 +73,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
73
73
  - !ruby/object:Gem::Version
74
74
  version: '0'
75
75
  requirements: []
76
- rubygems_version: 4.0.6
76
+ rubygems_version: 4.0.10
77
77
  specification_version: 4
78
78
  summary: Zstd+TCP transport for OMQ
79
79
  test_files: []