nnq-zstd 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: c2f3392c672cfc2022eee048702a0211999e58fd0e723bc50f0ae1c789a114d9
4
+ data.tar.gz: de1740ec0a048c8bbb099ccfdefba9cae7103c022ea06c9d335121aa25c63ece
5
+ SHA512:
6
+ metadata.gz: bc58535bbe51850ca68b3f7fc24f7bcaf7a154ebb0b90730ad4968e8ad845123c84b6b62c6776e6e286d4fe866c55db45c969b1d33c747af5ff0daa5ab76fb8b
7
+ data.tar.gz: 1f380726e3ed288d9b6502b748259cd4f8d1afcbc31910931410eafbb5e778e74c614caea631e04f1a2ea34ebdb8b34bbdeecf45849a221989cd0e50bc573f7b
data/CHANGELOG.md ADDED
@@ -0,0 +1,20 @@
1
+ # Changelog
2
+
3
+ ## 0.1.0 — unreleased
4
+
5
+ Initial release.
6
+
7
+ - `NNQ::Zstd.wrap(socket, level: -3, dict: nil)` — transparent Zstd
8
+ compression decorator around an `NNQ::Socket`.
9
+ - Sender-side dictionary training: first ~1000 messages < 1 KiB each,
10
+ or up to 100 KiB cumulative sample bytes. Training failure disables
11
+ training for the session.
12
+ - In-band dict shipping via a 4-byte preamble discriminator
13
+ (`00 00 00 00` = plaintext, Zstd frame magic = compressed, Zstd
14
+ dict magic = dictionary).
15
+ - Bounded decompression: `Frame_Content_Size` required; effective
16
+ cap is `min(16 MiB, socket.recv_maxsz)`.
17
+ - Per-wrapper caps: 32 dicts, 128 KiB cumulative. Violations raise
18
+ `NNQ::Zstd::ProtocolError`.
19
+ - Auto-generated dict IDs restricted to the user range
20
+ `32_768..(2**31 - 1)`.
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2026, Patrik Wenger
2
+
3
+ Permission to use, copy, modify, and/or distribute this software for any
4
+ purpose with or without fee is hereby granted, provided that the above
5
+ copyright notice and this permission notice appear in all copies.
6
+
7
+ THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
8
+ WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
9
+ MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
10
+ ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
11
+ WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
12
+ ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
13
+ OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,63 @@
1
+ # NNQ::Zstd — transparent Zstd compression for NNQ sockets
2
+
3
+ [![CI](https://github.com/paddor/nnq-zstd/actions/workflows/ci.yml/badge.svg)](https://github.com/paddor/nnq-zstd/actions/workflows/ci.yml)
4
+ [![Gem Version](https://img.shields.io/gem/v/nnq-zstd?color=e9573f)](https://rubygems.org/gems/nnq-zstd)
5
+ [![License: ISC](https://img.shields.io/badge/License-ISC-blue.svg)](LICENSE)
6
+ [![Ruby](https://img.shields.io/badge/Ruby-%3E%3D%204.0-CC342D?logo=ruby&logoColor=white)](https://www.ruby-lang.org)
7
+
8
+ Wraps any `NNQ::Socket` with transparent per-message Zstd compression,
9
+ sender-side dictionary training, and in-band dictionary shipping. No
10
+ handshake, no negotiation — both peers must wrap for it to work.
11
+
12
+ See [RFC.md](RFC.md) for the normative wire protocol.
13
+
14
+ ## Quick start
15
+
16
+ ```ruby
17
+ require "nnq"
18
+ require "nnq/zstd"
19
+
20
+ push = NNQ::PUSH0.new
21
+ push.connect("tcp://127.0.0.1:5555")
22
+ push = NNQ::Zstd.wrap(push, level: -3) # fast (-3) or balanced (3)
23
+
24
+ push.send("payload") # compressed on the wire
25
+ ```
26
+
27
+ ```ruby
28
+ pull = NNQ::PULL0.new
29
+ pull.bind("tcp://*:5555")
30
+ pull = NNQ::Zstd.wrap(pull) # receiver-only: no dict config
31
+
32
+ pull.receive # => "payload"
33
+ ```
34
+
35
+ ## How it works
36
+
37
+ - Every message carries a 4-byte preamble:
38
+ - `00 00 00 00` — uncompressed plaintext (preamble stripped on recv).
39
+ - Zstd frame magic — a full Zstd frame (compressed).
40
+ - Zstd dict magic — a Zstd dictionary to install (not surfaced to the app).
41
+ - The sender trains a dict from its first ~1000 small messages (or 100 KiB
42
+ cumulative sample bytes), then compresses subsequent small messages
43
+ with it. The dict is shipped in-band before the next real payload.
44
+ - User-supplied dictionaries may be passed via `dict:`:
45
+ ```ruby
46
+ NNQ::Zstd.wrap(socket, level: -3, dict: [dict_bytes_1, dict_bytes_2])
47
+ ```
48
+ All supplied dicts are shipped to peers on the wire. Training is
49
+ skipped when `dict:` is given.
50
+ - Decompression is bounded by `min(16 MiB, recv_maxsz)`. Frames whose
51
+ header omits `Frame_Content_Size` are rejected.
52
+ - Up to **32 dicts** and **128 KiB cumulative** per wrapper. Any
53
+ protocol violation raises `NNQ::Zstd::ProtocolError`.
54
+
55
+ ## Out of scope
56
+
57
+ - Negotiation / auto-detection. Both peers must wrap.
58
+ - Dict persistence across process restarts. Training is per-session.
59
+ - Receiver-side training — receivers only install shipped dicts.
60
+
61
+ ## License
62
+
63
+ ISC. See [LICENSE](LICENSE).
data/RFC.md ADDED
@@ -0,0 +1,283 @@
1
+ # NNQ-Zstd wire protocol (RFC-style)
2
+
3
+ **Status:** experimental. Version: 0.1.0.
4
+
5
+ ## 1. Scope and non-goals
6
+
7
+ This document specifies the wire format and behavior of a transparent
8
+ Zstandard compression layer for messages carried over NNG/SP sockets
9
+ (`inproc`, `ipc`, `tcp`). It is not a replacement for SP and does not
10
+ define a connection-level handshake.
11
+
12
+ Non-goals:
13
+
14
+ - No capability negotiation. Both peers MUST wrap their sockets; an
15
+ unwrapped peer will see garbled bytes and SHOULD disconnect.
16
+ - No persistence of dictionaries across process restarts.
17
+ - No receiver-side training. Receivers only install dictionaries
18
+ shipped by senders.
19
+
20
+ The key words **MUST**, **MUST NOT**, **SHOULD**, **MAY**, **SHALL**,
21
+ and **RECOMMENDED** are to be interpreted as described in RFC 2119.
22
+
23
+ ## 2. Terminology
24
+
25
+ - **Sender** — a wrapper instance that sends messages.
26
+ - **Receiver** — a wrapper instance that receives messages.
27
+ - **Wrapper** — an `NNQ::Zstd::Wrapper` (or equivalent in another
28
+ language) decorating an `NNQ::Socket`.
29
+ - **Codec** — the pure state machine inside the wrapper that
30
+ encodes/decodes individual messages.
31
+ - **Dictionary** (**dict**) — a Zstandard dictionary, as produced by
32
+ `ZDICT_trainFromBuffer` or handed in by the user.
33
+ - **Dict ID** — the 32-bit `Dictionary_ID` field carried in the
34
+ dict's header and in each frame header that references it.
35
+ - **Preamble** — the first four bytes of every wire message; see §3.
36
+ - **Wire message** — one SP message as handed to/from the underlying
37
+ socket.
38
+
39
+ ## 3. Wire format
40
+
41
+ Every wire message begins with a 4-byte preamble. The first four
42
+ bytes discriminate three mutually exclusive cases:
43
+
44
+ | First 4 bytes (hex, wire order) | LE u32 | Meaning |
45
+ |---|---|---|
46
+ | `00 00 00 00` | `0x00000000` | uncompressed |
47
+ | `28 B5 2F FD` | `0xFD2FB528` | Zstd frame |
48
+ | `37 A4 30 EC` | `0xEC30A437` | Zstd dict |
49
+
50
+ The Zstandard frame magic and Zstandard dictionary magic are fixed
51
+ by the Zstandard format specification. The NUL preamble is specific
52
+ to this protocol.
53
+
54
+ ### 3.1 Uncompressed
55
+
56
+ ```
57
+ +----+----+----+----+=========================+
58
+ | 00 | 00 | 00 | 00 | plaintext body |
59
+ +----+----+----+----+=========================+
60
+ ```
61
+
62
+ The receiver MUST strip the 4-byte NUL preamble and deliver the
63
+ remainder to the application.
64
+
65
+ ### 3.2 Zstd frame
66
+
67
+ ```
68
+ +----+----+----+----+=========================+
69
+ | 28 | B5 | 2F | FD | rest of Zstd frame |
70
+ +----+----+----+----+=========================+
71
+ ```
72
+
73
+ The entire wire message is a valid Zstandard frame; the preamble
74
+ is the frame's magic number. The receiver MUST NOT strip the
75
+ preamble before decompression.
76
+
77
+ ### 3.3 Zstd dictionary
78
+
79
+ ```
80
+ +----+----+----+----+=========================+
81
+ | 37 | A4 | 30 | EC | rest of Zstd dict |
82
+ +----+----+----+----+=========================+
83
+ ```
84
+
85
+ The entire wire message is a valid Zstandard dictionary. The
86
+ receiver MUST install the dictionary in its local store (§5.3)
87
+ and MUST NOT deliver this message to the application.
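The three preamble cases in §3 can be told apart with a plain four-byte comparison. The sketch below is illustrative only: `classify_preamble` is a hypothetical helper, not part of the gem's API, and it reuses the magic values from the table above.

```ruby
# Preamble values from the §3 table, as binary strings.
NUL_PREAMBLE = ("\x00" * 4).b.freeze
ZSTD_MAGIC   = "\x28\xB5\x2F\xFD".b.freeze
ZDICT_MAGIC  = "\x37\xA4\x30\xEC".b.freeze

# Hypothetical helper: names the kind of wire message by its first 4 bytes.
def classify_preamble(wire)
  wire = wire.b
  raise ArgumentError, "wire message too short" if wire.bytesize < 4

  case wire.byteslice(0, 4)
  when NUL_PREAMBLE then :plaintext
  when ZSTD_MAGIC   then :zstd_frame
  when ZDICT_MAGIC  then :zstd_dict
  else :unknown
  end
end

p classify_preamble("\x00\x00\x00\x00hi")        # :plaintext
p format("0x%08X", ZSTD_MAGIC.unpack1("V"))     # "0xFD2FB528"
p format("0x%08X", ZDICT_MAGIC.unpack1("V"))    # "0xEC30A437"
```

Unpacking the magics as little-endian u32 reproduces the `LE u32` column of the table.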
88
+
89
+ ## 4. Sender behavior
90
+
91
+ ### 4.1 Levels
92
+
93
+ The sender's compression level is set at wrapper construction. Two
94
+ levels are RECOMMENDED:
95
+
96
+ - `-3` — fast strategy, low CPU, moderate ratio.
97
+ - `3` — balanced Zstd default.
98
+
99
+ Implementations MAY support arbitrary Zstd levels.
100
+
101
+ ### 4.2 Training
102
+
103
+ Unless the user supplied dictionaries at construction, the sender
104
+ SHALL collect training samples from outbound messages until either
105
+ condition is met:
106
+
107
+ - **1000** samples collected, OR
108
+ - **100 KiB** cumulative sample bytes collected.
109
+
110
+ Only messages with body size **< 1024 bytes** are eligible as
111
+ samples. The sender SHALL train via ZDICT with a dictionary
112
+ capacity of **8 KiB** (the training cap; the resulting dict may be
113
+ smaller).
114
+
115
+ On training failure (insufficient or pathological samples, ZDICT
116
+ internal error), the sender SHALL permanently disable training for
117
+ the session and continue in no-dict mode.
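The sample-collection rule in §4.2 can be sketched as a small accumulator. `SampleCollector` is a hypothetical name used only for illustration; the actual ZDICT training call is left out so the sketch runs with the standard library alone.

```ruby
# Hypothetical accumulator for §4.2: gathers eligible samples and reports
# when either trigger (sample count or cumulative bytes) has been reached.
class SampleCollector
  MAX_SAMPLES    = 1000
  MAX_BYTES      = 100 * 1024
  MAX_SAMPLE_LEN = 1024

  attr_reader :samples, :bytes

  def initialize
    @samples = []
    @bytes = 0
  end

  # Records `body` if it is small enough to be a sample.
  # Returns true once training should be attempted.
  def offer(body)
    if body.bytesize < MAX_SAMPLE_LEN
      @samples << body
      @bytes += body.bytesize
    end
    ready?
  end

  def ready?
    @samples.size >= MAX_SAMPLES || @bytes >= MAX_BYTES
  end
end

collector = SampleCollector.new
collector.offer("x" * 2000)   # too large: ignored as a sample
p collector.samples.size      # 0
```

With samples of 1023 bytes each, the byte trigger fires on the 101st sample, long before the count trigger.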
118
+
119
+ ### 4.3 Auto-generated dictionary IDs
120
+
121
+ Auto-trained dictionary IDs MUST fall in the user range:
122
+
123
+ ```
124
+ USER_DICT_ID_RANGE = 32_768 .. (2**31 - 1)
125
+ ```
126
+
127
+ Values `0..32767` and values `>= 2**31` are both reserved by the
128
+ Zstandard format for a future dictionary registrar. Senders MUST NOT
129
+ assign auto-trained dicts to either reserved range. This is
130
+ enforced by post-patching bytes `[4..7]` of the trained dict buffer
131
+ with a randomly-chosen u32 LE value in the user range.
132
+
133
+ User-supplied dictionaries passed via `dict:` are honored as-is.
134
+ The user is responsible for choosing a non-reserved ID, or for
135
+ accepting the collision risk of using a reserved one.
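The ID patch described in §4.3 amounts to overwriting four bytes of the dictionary buffer. A minimal sketch follows; `patch_dict_id` and `dict_id_of` are hypothetical helper names, and the buffer in the usage lines is a placeholder with the right magic, not a real trained dictionary.

```ruby
USER_DICT_ID_RANGE = (32_768..(2**31 - 1)).freeze

# Returns a copy of `dict_bytes` whose Dictionary_ID (bytes 4..7, u32 LE)
# is replaced by `id`. A random user-range id is chosen by default.
def patch_dict_id(dict_bytes, id = rand(USER_DICT_ID_RANGE))
  raise ArgumentError, "dictionary too short" if dict_bytes.bytesize < 8
  raise ArgumentError, "id outside user range" unless USER_DICT_ID_RANGE.cover?(id)

  out = dict_bytes.b            # String#b returns a binary copy
  out[4, 4] = [id].pack("V")    # "V" = 32-bit unsigned, little-endian
  out
end

# Reads the Dictionary_ID back out of a dictionary buffer.
def dict_id_of(dict_bytes)
  dict_bytes.byteslice(4, 4).unpack1("V")
end

# Placeholder buffer: dict magic followed by zero bytes (illustration only).
fake = "\x37\xA4\x30\xEC".b + ("\x00" * 12)
patched = patch_dict_id(fake, 40_000)
p dict_id_of(patched)   # 40000
```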
136
+
137
+ ### 4.4 Compression thresholds
138
+
139
+ - **No dict loaded**: the sender MUST emit the message uncompressed
140
+ (§3.1) if `body.bytesize < 512`.
141
+ - **With a loaded dict**: the sender MUST emit the message
142
+ uncompressed if `body.bytesize < 64`.
143
+
144
+ In addition, the sender MUST emit the message uncompressed if the
145
+ compressed result would not save more than four bytes (i.e.
146
+ `compressed.bytesize >= body.bytesize - 4`); this avoids paying a
147
+ preamble's worth of overhead for negative wins.
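The threshold rules of §4.4 reduce to two size checks around the compression call. In the sketch below the compressor is passed in as a lambda so the example runs without a Zstd binding; `compress_or_plain` as a free function and the `compressor:` parameter are assumptions made for illustration.

```ruby
MIN_COMPRESS_NO_DICT   = 512
MIN_COMPRESS_WITH_DICT = 64
NUL_PREAMBLE = ("\x00" * 4).b.freeze

# Returns the wire message for `body`: either NUL preamble + body, or the
# compressor's output when it is worth sending.
def compress_or_plain(body, dict_loaded:, compressor:)
  body = body.b
  threshold = dict_loaded ? MIN_COMPRESS_WITH_DICT : MIN_COMPRESS_NO_DICT
  return NUL_PREAMBLE + body if body.bytesize < threshold

  compressed = compressor.call(body)
  # Fall back to plaintext unless more than four bytes are saved.
  return NUL_PREAMBLE + body if compressed.bytesize >= body.bytesize - 4

  compressed
end

# Stand-in compressor that halves the input (not real compression).
halve = ->(bytes) { "Z" * (bytes.bytesize / 2) }

p compress_or_plain("a" * 100, dict_loaded: false, compressor: halve).bytesize # 104
p compress_or_plain("a" * 100, dict_loaded: true,  compressor: halve).bytesize # 50
```

The same 100-byte body is sent as plaintext without a dictionary (below the 512-byte threshold) and compressed once a dictionary is loaded (above the 64-byte threshold).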
148
+
149
+ ### 4.5 Dictionary shipping
150
+
151
+ **Every dictionary the sender knows** (user-supplied or
152
+ auto-trained) MUST be delivered to the peer in-band, as a dict
153
+ frame (§3.3), before any payload that requires it.
154
+
155
+ Shipping policy: a sender SHALL ship all known dicts eagerly. In
156
+ practice this means:
157
+
158
+ - On the first call to `encode` after construction (or after
159
+ training completes), return a dict frame for each dict not yet
160
+ shipped, followed by the real payload.
161
+ - On each newly-connected peer (observed via the underlying
162
+ socket's monitor stream), re-ship the full known-dict set.
163
+ - **Ordering**: any dict needed to decompress a given payload MUST
164
+ appear strictly before that payload on the wire.
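The eager shipping policy above can be modelled as a pending queue that is drained in front of the next payload and refilled when a peer connects. `DictShipper` and its method names are hypothetical; dictionaries are represented by short placeholder strings.

```ruby
# Hypothetical model of §4.5: tracks which known dicts still have to be
# shipped ahead of the next payload.
class DictShipper
  def initialize
    @dicts = {}     # id => dict bytes, in insertion order
    @pending = []   # ids not yet shipped
  end

  # Registers a new dict; duplicates are ignored.
  def add(id, bytes)
    return if @dicts.key?(id)

    @dicts[id] = bytes
    @pending << id
  end

  # A new peer has seen none of the dicts, so queue all of them again.
  def peer_connected!
    @pending = @dicts.keys
  end

  # Returns the wire messages to send, in order: pending dicts, then payload.
  def frames_for(payload)
    frames = @pending.map { |id| @dicts.fetch(id) }
    @pending = []
    frames << payload
  end
end

shipper = DictShipper.new
shipper.add(40_000, "DICT-A")
p shipper.frames_for("first")    # ["DICT-A", "first"]
p shipper.frames_for("second")   # ["second"]
```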
165
+
166
+ ### 4.6 Per-wrapper caps (sender)
167
+
168
+ - At most **32 distinct dictionaries**.
169
+ - At most **128 KiB** cumulative dictionary bytes (sum of
170
+ `dict.bytesize` across the store).
171
+ - There is no per-dictionary size cap beyond the total-bytes
172
+ budget; a single 128 KiB dict is valid.
173
+
174
+ If training or user input would exceed either cap, the sender MUST
175
+ refuse to install the offending dict.
176
+
177
+ ### 4.7 Receive-only wrappers
178
+
179
+ A wrapper that is never asked to `send` naturally skips training
180
+ and dict shipping, since both are driven by outbound traffic. No
181
+ special mode is required: to obtain a decode-only decorator,
182
+ simply `wrap` the socket and only call `receive`.
183
+
184
+ ## 5. Receiver behavior
185
+
186
+ ### 5.1 Preamble dispatch
187
+
188
+ For each wire message received:
189
+
190
+ - If it begins with `00 00 00 00`, strip the preamble and deliver
191
+ the remainder to the application.
192
+ - If it begins with the Zstd frame magic, decompress it per §5.2
193
+ and deliver the plaintext.
194
+ - If it begins with the Zstd dict magic, install it per §5.3 and
195
+ do not deliver anything to the application; continue with the
196
+ next wire message.
197
+ - Otherwise, the preamble is unrecognized. The receiver MUST raise
198
+ a protocol error and SHOULD treat the wrapped socket as failed.
199
+
200
+ ### 5.2 Bounded decompression
201
+
202
+ For each Zstd frame:
203
+
204
+ 1. Inspect the frame header for `Frame_Content_Size`. If absent,
205
+ the receiver MUST raise a protocol error. This is the
206
+ anti-zip-bomb guarantee.
207
+ 2. Read the frame header's `Dictionary_ID` field. If non-zero, the
208
+ receiver MUST look up a dictionary with that ID in its local
209
+ store; if absent, raise a protocol error. If zero, decompress
210
+ without a dictionary.
211
+ 3. Call Zstandard decompression with a `max_output_size` equal to
212
+ `min(16 MiB, recv_maxsz)` where `recv_maxsz` is the wrapped
213
+ socket's own configured maximum inbound message size (an
214
+ implementation-defined default applies if the socket has no
215
+ such cap). Exceeding this size is a protocol error.
216
+ 4. Deliver the decompressed plaintext to the application.
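Steps 1 and 2 read two fields from the Zstandard frame header. The sketch below parses both from hand-built headers; `frame_header_info` is a hypothetical helper, and the field layout (descriptor byte, optional window byte, `Dictionary_ID`, `Frame_Content_Size`) follows the Zstandard frame format as understood here.

```ruby
ZSTD_MAGIC = "\x28\xB5\x2F\xFD".b.freeze

# Returns { dict_id:, fcs: } for a Zstd frame, or nil if the header is
# truncated. `fcs` is nil when the frame omits Frame_Content_Size.
def frame_header_info(wire)
  wire = wire.b
  return nil if wire.bytesize < 5

  fhd        = wire.getbyte(4)            # Frame_Header_Descriptor
  did_len    = [0, 1, 2, 4][fhd & 0x03]   # Dictionary_ID field size
  single_seg = (fhd >> 5) & 0x01
  fcs_flag   = (fhd >> 6) & 0x03
  # With FCS_flag 0 the field is 1 byte only if Single_Segment is set.
  fcs_len = fcs_flag.zero? ? single_seg : [0, 2, 4, 8][fcs_flag]

  off = 5 + (single_seg.zero? ? 1 : 0)    # skip Window_Descriptor if present
  return nil if wire.bytesize < off + did_len + fcs_len

  # Zero-pad the little-endian Dictionary_ID to 4 bytes before unpacking.
  dict_id = (wire.byteslice(off, did_len) + ("\x00" * 4).b).unpack1("V")
  raw = wire.byteslice(off + did_len, fcs_len)
  fcs = case fcs_len
        when 1 then raw.getbyte(0)
        when 2 then raw.unpack1("v") + 256   # 2-byte form is offset by 256
        when 4 then raw.unpack1("V")
        when 8 then raw.unpack1("Q<")
        end
  { dict_id: dict_id, fcs: fcs }
end

# Single-segment frame, no dict, 1-byte content size of 5.
p frame_header_info(ZSTD_MAGIC + [0x20, 0x05].pack("C*"))
# Window byte, 1-byte dict id 42, 2-byte content size field 0x0010.
p frame_header_info(ZSTD_MAGIC + [0x41, 0x58, 0x2A, 0x10, 0x00].pack("C*"))
```

A receiver following §5.2 would reject the frame when `fcs` is `nil` or above the effective cap, before calling the decompressor.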
217
+
218
+ ### 5.3 Dictionary installation
219
+
220
+ For each dict frame:
221
+
222
+ 1. Parse the dictionary (the Zstd dict header carries the dict_id
223
+ at bytes `[4..7]`).
224
+ 2. Check the per-wrapper caps (§5.4). A violation is a protocol
225
+ error.
226
+ 3. Insert `id => dictionary` into the local store. If an entry
227
+ with the same id already exists, overwrite it (idempotent);
228
+ adjust the cumulative-bytes accounting accordingly.
229
+ 4. Do not deliver the dict frame to the application.
230
+
231
+ Note: dict IDs in the reserved ranges are accepted if a peer
232
+ chooses to ship them (the reserved-range rule applies only to
233
+ auto-generated IDs at the sender; a receiver does not second-guess
234
+ the peer's choices).
235
+
236
+ ### 5.4 Per-wrapper caps (receiver)
237
+
238
+ - At most **32 distinct dictionaries**.
239
+ - At most **128 KiB** cumulative dictionary bytes.
240
+
241
+ These match the sender caps and protect against a malicious or
242
+ buggy peer shipping unbounded dictionary state.
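The installation rules of §5.3 together with the caps of §5.4 can be sketched as a bounded store. `BoundedDictStore` and `CapExceeded` are hypothetical names; the sketch stores raw bytes and leaves out dictionary parsing. Re-installing an existing id replaces the entry and its byte count, so a re-shipped dictionary does not count twice.

```ruby
# Hypothetical receive-side store enforcing the §5.4 caps.
class BoundedDictStore
  MAX_DICTS       = 32
  MAX_TOTAL_BYTES = 128 * 1024

  class CapExceeded < StandardError; end

  attr_reader :total_bytes

  def initialize
    @dicts = {}
    @total_bytes = 0
  end

  def size
    @dicts.size
  end

  # Installs or overwrites the dict for `id`, enforcing both caps.
  def install(id, bytes)
    replaced = @dicts[id]&.bytesize || 0
    if !@dicts.key?(id) && @dicts.size >= MAX_DICTS
      raise CapExceeded, "dict count would exceed #{MAX_DICTS}"
    end

    total = @total_bytes - replaced + bytes.bytesize
    raise CapExceeded, "dict bytes would exceed #{MAX_TOTAL_BYTES}" if total > MAX_TOTAL_BYTES

    @dicts[id] = bytes
    @total_bytes = total
  end
end

store = BoundedDictStore.new
store.install(40_000, "a" * 100)
store.install(40_000, "b" * 60)   # overwrite: accounting is adjusted
p [store.size, store.total_bytes] # [1, 60]
```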
243
+
244
+ ### 5.5 `#receive` contract
245
+
246
+ The wrapper's `#receive` operation MUST NOT return a dict frame
247
+ to the application. It MUST loop internally over the underlying
248
+ socket's `receive`, silently installing any dict frames it
249
+ encounters, until it produces a real payload (plaintext or
250
+ successfully-decompressed frame) or the underlying socket closes.
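The `#receive` contract is a loop that swallows dict frames. The sketch below uses stand-ins so it runs on its own: `QueueSocket` is a hypothetical in-memory socket, and the decoder lambda treats strings starting with `D:` as dict frames and returns `nil` for them, mirroring the codec's `decode` convention.

```ruby
# Hypothetical in-memory socket: hands out queued messages, then nil.
class QueueSocket
  def initialize(messages)
    @messages = messages.dup
  end

  def receive
    @messages.shift
  end
end

# Loops until the decoder yields a payload or the socket returns nil.
def receive_payload(socket, decoder)
  loop do
    raw = socket.receive
    return nil if raw.nil?            # underlying socket closed

    decoded = decoder.call(raw)
    return decoded unless decoded.nil? # nil means "dict frame, keep going"
  end
end

decoder = ->(raw) { raw.start_with?("D:") ? nil : raw.delete_prefix("P:") }
socket  = QueueSocket.new(["D:dict-1", "D:dict-2", "P:hello"])

p receive_payload(socket, decoder)   # "hello"
p receive_payload(socket, decoder)   # nil
```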
251
+
252
+ ## 6. Interoperability
253
+
254
+ Both peers of a connection MUST wrap their sockets with a
255
+ compatible implementation of this protocol. An unwrapped peer will
256
+ see byte sequences it cannot parse (an SP message starting with a
257
+ NUL preamble, or a valid Zstd frame) and is expected to close the
258
+ connection.
259
+
260
+ ## 7. Security considerations
261
+
262
+ - **Zip bombs**. §5.2's mandatory `Frame_Content_Size` check plus
263
+ bounded `max_output_size` prevent an attacker from forcing
264
+ unbounded memory allocation during decompression.
265
+ - **Dictionary DoS**. §5.4's count and cumulative-bytes caps
266
+ prevent a malicious peer from exhausting memory by shipping
267
+ unbounded dictionaries.
268
+ - **Reserved-range squatting**. Auto-trained dicts MUST avoid
269
+ reserved ID ranges (§4.3) so that future registry-allocated
270
+ dicts can coexist with in-the-wild private dicts without
271
+ collision.
272
+ - **No confidentiality or integrity**. This protocol provides
273
+ neither. Wrap the underlying transport with TLS or a similar
274
+ mechanism for either property.
275
+
276
+ ## 8. Appendix: test vectors
277
+
278
+ *(Placeholder — to be filled with concrete hex dumps once the
279
+ reference implementation produces them.)*
280
+
281
+ 1. **NUL-preamble `"hi"`** — wire bytes: `00 00 00 00 68 69`.
282
+ 2. **Empty compressed frame** — TBD.
283
+ 3. **Minimal trained dictionary** — TBD.
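The first vector can be reproduced by hand with the standard library. `hex_dump` is a hypothetical helper introduced only for this illustration.

```ruby
NUL_PREAMBLE = ("\x00" * 4).b.freeze

# Formats a binary string as space-separated lowercase hex bytes.
def hex_dump(bytes)
  bytes.unpack("C*").map { |byte| format("%02x", byte) }.join(" ")
end

wire = NUL_PREAMBLE + "hi".b
puts hex_dump(wire)   # prints "00 00 00 00 68 69"
```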
data/lib/nnq/zstd/codec.rb ADDED
@@ -0,0 +1,315 @@
1
+ # frozen_string_literal: true
2
+
3
+ module NNQ
4
+ module Zstd
5
+ # Pure state machine: no socket references. `encode(body)` returns
6
+ # `[wire, dict_frames]` where `dict_frames` are any dict frames
7
+ # that MUST precede the wire payload on the wire. `decode(wire)`
8
+ # returns a plaintext String, or `nil` if the wire message was a
9
+ # dict frame that has been silently installed into the receive-
10
+ # side dictionary store.
11
+ class Codec
12
+ MAX_DECOMPRESSED_SIZE = 16 * 1024 * 1024
13
+ MAX_DICTS = 32
14
+ MAX_DICTS_TOTAL_BYTES = 128 * 1024
15
+ DICT_CAPACITY = 8 * 1024
16
+ TRAIN_MAX_SAMPLES = 1000
17
+ TRAIN_MAX_BYTES = 100 * 1024
18
+ TRAIN_MAX_SAMPLE_LEN = 1024
19
+ MIN_COMPRESS_NO_DICT = 512
20
+ MIN_COMPRESS_WITH_DICT = 64
21
+
22
+ NUL_PREAMBLE = ("\x00" * 4).b.freeze
23
+ ZSTD_MAGIC = "\x28\xB5\x2F\xFD".b.freeze
24
+ ZDICT_MAGIC = "\x37\xA4\x30\xEC".b.freeze
25
+
26
+
27
+ def initialize(level:, dicts: [], recv_max_size: nil)
28
+ @level = level
29
+ @recv_max_size = [recv_max_size || MAX_DECOMPRESSED_SIZE, MAX_DECOMPRESSED_SIZE].min
30
+
31
+ @send_dicts = {}
32
+ @send_dict_bytes = {}
33
+ @send_dict_order = []
34
+ @pending_ship = []
35
+ @shipped_peers = Set.new
36
+ @active_send_id = nil
37
+
38
+ @recv_dicts = {}
39
+ @recv_total_bytes = 0
40
+
41
+ @training = dicts.empty?
42
+ @train_samples = []
43
+ @train_bytes = 0
44
+
45
+ dicts.each { |db| install_send_dict(db.b) }
46
+ end
47
+
48
+
49
+ # @return [Integer, nil] id of the dict currently used for compression
50
+ def active_send_dict_id
51
+ @active_send_id
52
+ end
53
+
54
+
55
+ # @return [Array<Integer>] ids of dicts in the send-side store
56
+ def send_dict_ids
57
+ @send_dict_order.dup
58
+ end
59
+
60
+
61
+ # @return [Array<Integer>] ids of dicts in the recv-side store
62
+ def recv_dict_ids
63
+ @recv_dicts.keys
64
+ end
65
+
66
+
67
+ # Resets the shipped-tracker so the next encode call re-emits
68
+ # every known dict before the next real payload. Called by the
69
+ # wrapper when a new peer connects.
70
+ def requeue_all_dicts_for_shipping!
71
+ @pending_ship = @send_dict_order.dup
72
+ end
73
+
74
+
75
+ # Encodes `body` into a wire message. Returns `[wire, dict_frames]`
76
+ # where `dict_frames` is an array of wire messages that MUST be
77
+ # sent strictly before `wire`.
78
+ def encode(body)
79
+ body = body.b
80
+ maybe_train!(body)
81
+
82
+ dict_frames = drain_pending_dict_frames
83
+ wire = compress_or_plain(body)
84
+ [wire, dict_frames]
85
+ end
86
+
87
+
88
+ # Decodes a wire message. Returns the plaintext String, or `nil`
89
+ # if this message was a dict frame (installed into the receive
90
+ # store, not surfaced to the caller).
91
+ def decode(wire)
92
+ raise ProtocolError, "wire message too short" if wire.bytesize < 4
93
+ head = wire.byteslice(0, 4)
94
+ case head
95
+ when NUL_PREAMBLE
96
+ wire.byteslice(4, wire.bytesize - 4) || "".b
97
+ when ZSTD_MAGIC
98
+ decode_zstd_frame(wire)
99
+ when ZDICT_MAGIC
100
+ install_recv_dict(wire)
101
+ nil
102
+ else
103
+ raise ProtocolError, "unrecognized preamble: #{head.unpack1('H*')}"
104
+ end
105
+ end
106
+
107
+
108
+ private
109
+
110
+
111
+ def maybe_train!(body)
112
+ return unless @training
113
+ return if body.bytesize >= TRAIN_MAX_SAMPLE_LEN
114
+
115
+ @train_samples << body
116
+ @train_bytes += body.bytesize
117
+
118
+ return unless @train_samples.size >= TRAIN_MAX_SAMPLES ||
119
+ @train_bytes >= TRAIN_MAX_BYTES
120
+
121
+ begin
122
+ bytes = RZstd::Dictionary.train(@train_samples, capacity: DICT_CAPACITY)
123
+ rescue RuntimeError
124
+ @training = false
125
+ @train_samples = nil
126
+ return
127
+ end
128
+
129
+ @training = false
130
+ @train_samples = nil
131
+
132
+ patched = patch_auto_dict_id(bytes)
133
+ install_send_dict(patched)
134
+ end
135
+
136
+
137
+ # Rewrite bytes [4..7] of a freshly trained dict with a random id
138
+ # in USER_DICT_ID_RANGE, to avoid reserved ranges per §4.3.
139
+ def patch_auto_dict_id(bytes)
140
+ out = bytes.dup.b
141
+ id = rand(USER_DICT_ID_RANGE)
142
+ out[4, 4] = [id].pack("V")
143
+ out
144
+ end
145
+
146
+
147
+ def install_send_dict(bytes)
148
+ if @send_dict_order.size >= MAX_DICTS
149
+ raise ProtocolError, "send-side dict count would exceed #{MAX_DICTS}"
150
+ end
151
+ total = @send_dict_bytes.each_value.sum(&:bytesize) + bytes.bytesize
152
+ if total > MAX_DICTS_TOTAL_BYTES
153
+ raise ProtocolError, "send-side dict bytes would exceed #{MAX_DICTS_TOTAL_BYTES}"
154
+ end
155
+ unless bytes.byteslice(0, 4) == ZDICT_MAGIC
156
+ raise ProtocolError, "supplied dict is not ZDICT-format"
157
+ end
158
+
159
+ dict = RZstd::Dictionary.new(bytes, level: @level)
160
+ id = dict.id
161
+ return if @send_dicts.key?(id)
162
+
163
+ @send_dicts[id] = dict
164
+ @send_dict_bytes[id] = bytes
165
+ @send_dict_order << id
166
+ @pending_ship << id
167
+ @active_send_id ||= id
168
+ end
169
+
170
+
171
+ def drain_pending_dict_frames
172
+ return [] if @pending_ship.empty?
173
+
174
+ frames = @pending_ship.map { |id| @send_dict_bytes.fetch(id) }
175
+ @pending_ship = []
176
+ frames
177
+ end
178
+
179
+
180
+ def compress_or_plain(body)
181
+ threshold = @active_send_id ? MIN_COMPRESS_WITH_DICT : MIN_COMPRESS_NO_DICT
182
+ return plain(body) if body.bytesize < threshold
183
+
184
+ compressed =
185
+ if @active_send_id
186
+ @send_dicts.fetch(@active_send_id).compress(body)
187
+ else
188
+ RZstd.compress(body, level: @level)
189
+ end
190
+
191
+ # Sanity bailout: a compressed result that doesn't save more
192
+ # than four bytes gets emitted as plaintext instead. Avoids
193
+ # paying a preamble's worth of overhead for negative wins.
194
+ return plain(body) if compressed.bytesize >= body.bytesize - 4
195
+
196
+ compressed
197
+ end
198
+
199
+
200
+ def plain(body)
201
+ NUL_PREAMBLE + body
202
+ end
203
+
204
+
205
+ def decode_zstd_frame(wire)
206
+ fcs = parse_frame_content_size(wire)
207
+ raise ProtocolError, "Zstd frame missing Frame_Content_Size" if fcs.nil?
208
+ if fcs > @recv_max_size
209
+ raise ProtocolError, "declared FCS #{fcs} exceeds limit #{@recv_max_size}"
210
+ end
211
+
212
+ dict_id = parse_frame_dict_id(wire)
213
+ if dict_id && dict_id != 0
214
+ dict = @recv_dicts[dict_id]
215
+ raise ProtocolError, "unknown dict_id #{dict_id}" if dict.nil?
216
+ dict.decompress(wire, max_output_size: @recv_max_size)
217
+ else
218
+ RZstd.decompress(wire, max_output_size: @recv_max_size)
219
+ end
220
+ rescue RZstd::MissingContentSizeError => e
221
+ raise ProtocolError, "Zstd frame missing Frame_Content_Size (#{e.message})"
222
+ rescue RZstd::OutputSizeLimitError => e
223
+ raise ProtocolError, "declared FCS exceeds limit (#{e.message})"
224
+ rescue RZstd::DecompressError => e
225
+ raise ProtocolError, "decompression failed: #{e.message}"
226
+ end
227
+
228
+
229
+ def install_recv_dict(wire)
230
+ raise ProtocolError, "dict frame too short" if wire.bytesize < 8
231
+
232
+ id = wire.byteslice(4, 4).unpack1("V")
233
+ # Bytes of the entry being overwritten, if any (0 for a new id).
234
+ replaced = (@recv_dict_sizes ||= {}).fetch(id, 0)
235
+
236
+ if !@recv_dicts.key?(id) && @recv_dicts.size >= MAX_DICTS
237
+ raise ProtocolError, "recv-side dict count would exceed #{MAX_DICTS}"
238
+ end
239
+ total = @recv_total_bytes - replaced + wire.bytesize
240
+ if total > MAX_DICTS_TOTAL_BYTES
241
+ raise ProtocolError, "recv-side dict bytes would exceed #{MAX_DICTS_TOTAL_BYTES}"
242
+ end
243
+
244
+ dict = RZstd::Dictionary.new(wire.b)
245
+ unless dict.id == id
246
+ raise ProtocolError, "dict header id mismatch"
247
+ end
248
+
249
+ # Idempotent overwrite: re-shipping a known id replaces the entry
250
+ # and its bytes instead of counting them twice.
251
+ @recv_dicts[id] = dict
252
+ @recv_dict_sizes[id] = wire.bytesize
253
+ @recv_total_bytes = total
254
+ end
255
+
256
+
257
+ # Parses the Zstandard `Frame_Content_Size` field from a frame
258
+ # header. Returns the FCS as an Integer, or `nil` if absent.
259
+ # Per the Zstandard spec, FCS is absent iff
260
+ # `Single_Segment_flag == 0 && FCS_flag == 0`.
261
+ def parse_frame_content_size(wire)
262
+ return nil if wire.bytesize < 5
263
+ fhd = wire.getbyte(4)
264
+ did_flag = fhd & 0x03
265
+ single_seg = (fhd >> 5) & 0x01
266
+ fcs_flag = (fhd >> 6) & 0x03
267
+
268
+ return nil if fcs_flag == 0 && single_seg == 0
269
+
270
+ off = 5 + (single_seg == 0 ? 1 : 0) + [0, 1, 2, 4][did_flag]
271
+ case fcs_flag
272
+ when 0
273
+ return nil if wire.bytesize < off + 1
274
+ wire.getbyte(off)
275
+ when 1
276
+ return nil if wire.bytesize < off + 2
277
+ wire.byteslice(off, 2).unpack1("v") + 256
278
+ when 2
279
+ return nil if wire.bytesize < off + 4
280
+ wire.byteslice(off, 4).unpack1("V")
281
+ when 3
282
+ return nil if wire.bytesize < off + 8
283
+ lo, hi = wire.byteslice(off, 8).unpack("VV")
284
+ (hi << 32) | lo
285
+ end
286
+ end
287
+
288
+
289
+ # Parses the `Dictionary_ID` field from a Zstd frame header.
290
+ # Returns the id as an Integer (0 if the frame carries no
291
+ # Dictionary_ID field), or `nil` if the header is truncated.
292
+ def parse_frame_dict_id(wire)
293
+ return nil if wire.bytesize < 5
294
+ fhd = wire.getbyte(4)
295
+ did_flag = fhd & 0x03
296
+ single_seg = (fhd >> 5) & 0x01
297
+
298
+ off = 5 + (single_seg == 0 ? 1 : 0)
299
+ case did_flag
300
+ when 0
301
+ 0
302
+ when 1
303
+ return nil if wire.bytesize < off + 1
304
+ wire.getbyte(off)
305
+ when 2
306
+ return nil if wire.bytesize < off + 2
307
+ wire.byteslice(off, 2).unpack1("v")
308
+ when 3
309
+ return nil if wire.bytesize < off + 4
310
+ wire.byteslice(off, 4).unpack1("V")
311
+ end
312
+ end
313
+ end
314
+ end
315
+ end
data/lib/nnq/zstd/version.rb ADDED
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module NNQ
4
+ module Zstd
5
+ VERSION = "0.1.0"
6
+ end
7
+ end
data/lib/nnq/zstd/wrapper.rb ADDED
@@ -0,0 +1,112 @@
1
+ # frozen_string_literal: true
2
+
3
+ module NNQ
4
+ module Zstd
5
+ # Socket decorator that transparently runs every outbound body
6
+ # through the Codec's encoder and every inbound wire message
7
+ # through its decoder. Delegates any unknown method to the wrapped
8
+ # socket, so the wrapper quacks like an `NNQ::Socket`.
9
+ class Wrapper
10
+ attr_reader :codec
11
+
12
+
13
+ def initialize(socket, level:, dicts:)
14
+ @sock = socket
15
+ @codec = Codec.new(
16
+ level: level,
17
+ dicts: dicts,
18
+ recv_max_size: recv_max_size_from(socket),
19
+ )
20
+ start_dict_monitor!
21
+ end
22
+
23
+
24
+ def send(body)
25
+ send_with_codec(body) { |wire| @sock.send(wire) }
26
+ end
27
+
28
+
29
+ def send_reply(body)
30
+ send_with_codec(body) { |wire| @sock.send_reply(wire) }
31
+ end
32
+
33
+
34
+ def send_survey(body)
35
+ send_with_codec(body) { |wire| @sock.send_survey(wire) }
36
+ end
37
+
38
+
39
+ def send_request(body)
40
+ send_with_codec(body) { |wire| @sock.send_request(wire) }
41
+ end
42
+
43
+
44
+ # Loops internally until a real payload arrives or the socket
45
+ # closes. Dict frames are silently installed and discarded.
46
+ def receive
47
+ loop do
48
+ raw = @sock.receive
49
+ return raw if raw.nil?
50
+ decoded = @codec.decode(raw)
51
+ return decoded unless decoded.nil?
52
+ end
53
+ end
54
+
55
+
56
+ def close
57
+ begin
58
+ @monitor_task&.stop
59
+ rescue StandardError
60
+ # Monitor task may already be gone; fine.
61
+ end
62
+ @sock.close
63
+ end
64
+
65
+
66
+ def respond_to_missing?(name, include_private = false)
67
+ @sock.respond_to?(name, include_private)
68
+ end
69
+
70
+
71
+ def method_missing(name, *args, **kwargs, &block)
72
+ if @sock.respond_to?(name)
73
+ @sock.public_send(name, *args, **kwargs, &block)
74
+ else
75
+ super
76
+ end
77
+ end
78
+
79
+
80
+ private
81
+
82
+
83
+ def send_with_codec(body)
84
+ wire, dict_frames = @codec.encode(body)
85
+ dict_frames.each { |df| yield(df) }
86
+ yield(wire)
87
+ end
88
+
89
+
90
+ def recv_max_size_from(socket)
91
+ socket.options.recv_maxsz
92
+ rescue NoMethodError
93
+ nil
94
+ end
95
+
96
+
97
+ # Best-effort: when a new peer connects, requeue every known
98
+ # dict so the next encode call re-ships them in front of its
99
+ # payload. Uses the underlying socket's monitor stream.
100
+ def start_dict_monitor!
101
+ return unless @sock.respond_to?(:monitor)
102
+
103
+ @monitor_task = @sock.monitor do |event|
104
+ @codec.requeue_all_dicts_for_shipping! if event.type == :connected
105
+ end
106
+ rescue StandardError
107
+ # Monitor unavailable on this socket; dict re-shipping on
108
+ # reconnect just won't fire. Not fatal.
109
+ end
110
+ end
111
+ end
112
+ end
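The wrapper's delegation pattern (override the send path, forward everything else through `method_missing`) can be shown in isolation. `FakeSocket` and `PrefixingDecorator` are hypothetical stand-ins that use no NNQ or Zstd code; the decorator merely prepends a fixed prefix where the real wrapper would run the codec.

```ruby
# Hypothetical socket stand-in that records what was sent.
class FakeSocket
  attr_reader :sent, :closed

  def initialize
    @sent = []
    @closed = false
  end

  def send(body)
    @sent << body
  end

  def close
    @closed = true
  end
end

# Overrides `send`, delegates every other public method to the target.
class PrefixingDecorator
  def initialize(target, prefix)
    @target = target
    @prefix = prefix
  end

  def send(body)
    @target.send(@prefix + body)
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private)
  end

  def method_missing(name, *args, **kwargs, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, **kwargs, &block)
    else
      super
    end
  end
end

socket    = FakeSocket.new
decorated = PrefixingDecorator.new(socket, "\x00\x00\x00\x00")
decorated.send("hi")
decorated.close                      # forwarded via method_missing
p socket.sent.first.bytesize         # 6
p decorated.respond_to?(:close)      # true
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` truthful for the forwarded methods.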
data/lib/nnq/zstd.rb ADDED
@@ -0,0 +1,45 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "nnq"
4
+ require "rzstd"
5
+ require "set"
6
+
7
+ require_relative "zstd/version"
8
+
9
+ module NNQ
10
+ module Zstd
11
+ class ProtocolError < StandardError; end
12
+ end
13
+ end
14
+
15
+ require_relative "zstd/codec"
16
+ require_relative "zstd/wrapper"
17
+
18
+ module NNQ
19
+ module Zstd
20
+
21
+ RESERVED_DICT_ID_LOW_MAX = 32_767
22
+ RESERVED_DICT_ID_HIGH_MIN = 2**31
23
+ USER_DICT_ID_RANGE = (32_768..(2**31 - 1)).freeze
24
+
25
+ # Wraps an NNQ::Socket with transparent Zstd compression.
26
+ #
27
+ # Both peers must wrap their sockets; there is no negotiation. See
28
+ # RFC.md for the wire protocol.
29
+ #
30
+ # @param socket [NNQ::Socket]
31
+ # @param level [Integer] Zstd level (default -3; fast strategy)
32
+ # @param dict [String, Array<String>, nil] pre-built dictionary
33
+ # bytes. If provided, training is skipped and all supplied dicts
34
+ # are shipped to peers on the wire. Each buffer MUST be a valid
35
+ # Zstd dictionary (ZDICT magic + header).
36
+ #
37
+ # A receive-only "passive" decoder needs no special flag — wrap
38
+ # the socket and just never call `send`. Training is driven by
39
+ # outbound traffic, so a socket that only receives naturally
40
+ # skips training and dict shipping.
41
+ def self.wrap(socket, level: -3, dict: nil)
42
+ Wrapper.new(socket, level: level, dicts: Array(dict).compact)
43
+ end
44
+ end
45
+ end
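`wrap` accepts `nil`, a single String, or an Array for `dict:` and normalizes it with `Array(dict).compact`. The lines below show what that idiom yields for each shape; `normalize_dicts` is a hypothetical name for the expression used in `wrap`.

```ruby
# Same normalization as in `wrap`: always an Array, with nils removed.
def normalize_dicts(dict)
  Array(dict).compact
end

p normalize_dicts(nil)               # []
p normalize_dicts("dict-bytes")      # ["dict-bytes"]
p normalize_dicts(["a", nil, "b"])   # ["a", "b"]
```

An empty result is what leaves sender-side training enabled in the codec, since training is only skipped when at least one dictionary is supplied.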
metadata ADDED
@@ -0,0 +1,78 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: nnq-zstd
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Patrik Wenger
8
+ bindir: bin
9
+ cert_chain: []
10
+ date: 1980-01-02 00:00:00.000000000 Z
11
+ dependencies:
12
+ - !ruby/object:Gem::Dependency
13
+ name: nnq
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '0.5'
19
+ type: :runtime
20
+ prerelease: false
21
+ version_requirements: !ruby/object:Gem::Requirement
22
+ requirements:
23
+ - - "~>"
24
+ - !ruby/object:Gem::Version
25
+ version: '0.5'
26
+ - !ruby/object:Gem::Dependency
27
+ name: rzstd
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '0.3'
33
+ type: :runtime
34
+ prerelease: false
35
+ version_requirements: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - "~>"
38
+ - !ruby/object:Gem::Version
39
+ version: '0.3'
40
+ description: Wraps any NNQ::Socket with per-message Zstd compression, bounded decompression,
41
+ in-band dictionary shipping, and sender-side dictionary training. No negotiation;
42
+ both peers must wrap.
43
+ email:
44
+ - paddor@gmail.com
45
+ executables: []
46
+ extensions: []
47
+ extra_rdoc_files: []
48
+ files:
49
+ - CHANGELOG.md
50
+ - LICENSE
51
+ - README.md
52
+ - RFC.md
53
+ - lib/nnq/zstd.rb
54
+ - lib/nnq/zstd/codec.rb
55
+ - lib/nnq/zstd/version.rb
56
+ - lib/nnq/zstd/wrapper.rb
57
+ homepage: https://github.com/paddor/nnq-zstd
58
+ licenses:
59
+ - ISC
60
+ metadata: {}
61
+ rdoc_options: []
62
+ require_paths:
63
+ - lib
64
+ required_ruby_version: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '4.0'
69
+ required_rubygems_version: !ruby/object:Gem::Requirement
70
+ requirements:
71
+ - - ">="
72
+ - !ruby/object:Gem::Version
73
+ version: '0'
74
+ requirements: []
75
+ rubygems_version: 4.0.6
76
+ specification_version: 4
77
+ summary: Transparent Zstd compression wrapper for NNQ sockets
78
+ test_files: []