vibe_zstd 1.1.1 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9b3326bfa52942e1f7ee95578bbbff2cd87647748a086742551d562eda6d94f0
- data.tar.gz: b594dade59dab715722477dc6d39eaa7768a39505fef2354c494090462f03afd
+ metadata.gz: 5eb8d0ac2293ee84d588b9eb237b0c3b4623989c4db8692ad1ab325fd8047695
+ data.tar.gz: 1732b409b13fd9e99fecb7ebf8d026f9b6d0119012641c96e66faa3d51c0f5ce
  SHA512:
- metadata.gz: bb5a4e27578f337ef0a72c133344c8d3f4e229b250f07c771d534bf65ffa40c0588d167e3cf01140d5baa1627e6866080b9710c059f15a59245b48eb7de026e8
- data.tar.gz: 26f0ac03864044c25068cf00c8039d78d7ac677ec1b61298f0bf09fc8918a48a0105785607a53c12bf2c6f5c7a1c3f47d6dc64aefa4ef27e03803509f3717d80
+ metadata.gz: 65cc971c6400d69adaca95b0c520c8bd8ec22648bdb57218f5ac833bd0a63db6908c953b4fca127753f2257f8502b34b89eb383c38a8f315431e4051ff7a9401
+ data.tar.gz: 836b8a52c28f8cd0d79ed4a8aa80a286f51dda58e1e483320f4544b2a63ded6e83cdccd3bfea122d267c9a44cb43dabbd2c12e69aeb69162d7d4f4ee7ecb60c3
data/CHANGELOG.md CHANGED
@@ -7,6 +7,41 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [1.3.0] - 2026-06-11
+
+ ### Security
+ - Fixed use-after-free: `CompressWriter` and `DecompressReader` now retain their dictionary object. Previously only the raw `ZSTD_CDict*`/`ZSTD_DDict*` pointer was stored, so a dictionary passed as `dict:` without the caller holding their own reference could be garbage-collected while the stream still used it.
+ - `DCtx#decompress` now raises `RuntimeError` ("Truncated frame") when an unknown-content-size frame ends mid-stream, instead of silently returning partial output. The known-size path already rejected truncated input; the streaming path now matches.
+ - Dictionary training (`train_dict`, `train_dict_cover`, `train_dict_fast_cover`, `finalize_dictionary`) no longer crashes or risks a heap overflow when samples are non-String objects responding to `to_str`, or when a malicious `to_str` mutates other samples mid-validation. Converted samples are retained and the copy is capacity-checked.
+ - Source strings are now locked (`rb_str_locktmp`) while the GVL is released during `CCtx#compress`, `DCtx#decompress`, and across `CompressWriter#write`'s IO calls, preventing a use-after-free read if another thread (or re-entrant IO code) mutates the string mid-operation. Unlocking is async-exception-safe via `rb_ensure`.
+ - `DecompressReader` snapshots each input chunk (`rb_str_new_frozen`), so IOs that reuse/mutate the buffer string they return can no longer invalidate the decoder's input pointer between reads.
+ - `DecompressReader#read` raises `TypeError` when the underlying IO's `read` returns a non-String (non-`to_str`-able) object, instead of crashing the VM.
+ - `CCtx.new` / `DCtx.new` raise `ArgumentError` on non-Symbol keyword keys (e.g. `CCtx.new("level" => 3)`) instead of hitting undefined behavior.
+
+ ### Fixed
+ - `DCtx#decompress` (unknown-size path): the C output buffer and per-call dictionary reference are now released via `rb_ensure` on every exit path, so an async exception (e.g. `Timeout`) can no longer leak the buffer or leave the dictionary referenced on the context.
+ - `DecompressReader#read(0)` returns `""` without latching EOF, matching IO semantics. Previously it returned `nil` and marked the stream finished.
+ - `DecompressReader#gets` no longer mixes character indexes with byte sizes, fixing line splitting with multibyte separators.
+ - Build: added `ext/vibe_zstd/depend` so editing the split implementation files (`cctx.c`, `dctx.c`, `dict.c`, `streaming.c`, `frames.c`) or project headers triggers recompilation of the extension.
+
+ ### Changed
+ - `DecompressReader#read(n)` caps its initial allocation (~128KB) and grows geometrically up to `n`, instead of preallocating the full requested size up front (`read(1_000_000_000)` on a small stream no longer allocates 1GB).
+ - `VibeZstd::ThreadLocal` uses true thread-local storage (`Thread#thread_variable_get/set`) instead of fiber-local `Thread.current[]`, so fiber-based servers (Falcon, async) reuse one context pool per OS thread rather than churning a fresh pool per fiber.
+ - README: prominent warning recommending `max_decompressed_size` when decompressing untrusted input.
+
+ ## [1.2.0] - 2026-06-06
+
+ ### Added
+ - `DCtx#format` / `#format=` (`ZSTD_d_format`) and magicless-format decompression. Frames produced with `format: 1` (`ZSTD_f_zstd1_magicless`) can now be decompressed by setting `format: 1` on the decompression side.
+ - Opt-in decompressed-size limit on `DCtx#decompress`, configurable per-call (`max_decompressed_size:` / `max_size:`), per-instance (`DCtx#max_decompressed_size=`, alias `max_size=`), and as a class default (`DCtx.default_max_decompressed_size=`). Resolved per-call → instance → class → unlimited. Exceeding the limit raises `VibeZstd::DecompressedSizeExceeded` (a subclass of `VibeZstd::Error`). Off by default, preserving existing behavior.
+ - `VibeZstd.compress` / `VibeZstd.decompress` now accept context (sticky) parameters as keyword arguments (e.g. `checksum_flag:`, `window_log:`, `workers:`, `format:`), applying them to a fresh context. Per-call options are still passed to the operation.
+
+ ### Fixed
+ - `CCtx#compress` now honors parameters configured on the context (`compression_level`, `checksum_flag`, `window_log`, `workers`, `format`, etc.). It previously used `ZSTD_compressCCtx`, which ignores all sticky parameters, so context configuration was silently discarded and one-shot compression always ran at the default level.
+ - `DCtx#decompress` now applies the dictionary on the unknown-content-size path. Dictionary frames produced by `CompressWriter` (which never pledges a size) previously failed to decompress with "Dictionary mismatch".
+ - `VibeZstd.read_skippable_frame` caps its allocation to the bytes actually present instead of trusting the frame's content-size header, preventing a tiny truncated input from forcing a multi-gigabyte allocation.
+ - Passing an unknown keyword to `VibeZstd.compress` / `VibeZstd.decompress` now raises `NoMethodError` instead of being silently ignored.
+
  ## [1.1.1] - 2026-03-25
 
  ### Fixed
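The 1.3.0 `VibeZstd::ThreadLocal` change rests on a plain-Ruby distinction: `Thread.current[]` is fiber-local, while `Thread#thread_variable_get` / `thread_variable_set` are shared by every fiber on the same thread. The standalone sketch below shows the difference by simulating three fiber-handled requests on one thread. The `KEY` name and both helper methods are hypothetical stand-ins, not the gem's internals.

```ruby
# Hypothetical storage key; not the gem's real key name.
KEY = :vibe_zstd_pool

# Fiber-local slot: Thread#[] is scoped to the current Fiber.
def fiber_local_pool
  Thread.current[KEY] ||= Object.new
end

# Thread-local slot: thread variables are shared by all Fibers on the thread.
def thread_local_pool
  Thread.current.thread_variable_get(KEY) ||
    Thread.current.thread_variable_set(KEY, Object.new)
end

# Simulate a fiber-based server handling three requests on one OS thread.
fiber_pools  = Array.new(3) { Fiber.new { fiber_local_pool }.resume }
thread_pools = Array.new(3) { Fiber.new { thread_local_pool }.resume }

puts fiber_pools.uniq.size   # => 3 (a fresh pool per fiber)
puts thread_pools.uniq.size  # => 1 (one pool shared across fibers)
```

With fiber-local storage every new fiber starts with an empty slot, which is the "fresh pool per fiber" churn the changelog describes.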
@@ -54,6 +89,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - Thread pool support for parallel compression
  - Memory-efficient API for large files
 
+ [1.2.0]: https://github.com/kreynolds/vibe_zstd/compare/v1.1.1...v1.2.0
  [1.1.1]: https://github.com/kreynolds/vibe_zstd/compare/v1.1.0...v1.1.1
  [1.1.0]: https://github.com/kreynolds/vibe_zstd/compare/v1.0.2...v1.1.0
  [1.0.2]: https://github.com/kreynolds/vibe_zstd/compare/v1.0.1...v1.0.2
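The 1.3.0 `DecompressReader#read(n)` entry describes a capped initial allocation that grows geometrically up to `n`. The sketch below models that policy as the list of buffer capacities a single call would allocate. `capacity_schedule` is a hypothetical helper, doubling is an assumed growth factor, and `INITIAL_CAP` assumes the changelog's "~128KB" means exactly 128 KiB.

```ruby
# Assumed value for the changelog's "~128KB" initial cap.
INITIAL_CAP = 128 * 1024

# Hypothetical model: buffer capacities a read(requested) call would allocate
# when the stream can still produce `available` bytes.
def capacity_schedule(requested, available)
  return [] if requested.zero?          # read(0) allocates nothing
  target = [requested, available].min   # bytes the call can actually return
  cap = [requested, INITIAL_CAP].min    # capped initial allocation
  caps = [cap]
  while cap < target
    cap = [cap * 2, requested].min      # grow geometrically, never past n
    caps << cap
  end
  caps
end

p capacity_schedule(1_000_000_000, 10)  # => [131072]
p capacity_schedule(300_000, 10**9)     # => [131072, 262144, 300000]
```

Under the pre-1.3.0 behavior the changelog describes, the first call would instead have preallocated the full 1,000,000,000 bytes before discovering the stream held only 10.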
data/README.md CHANGED
@@ -134,6 +134,33 @@ decompressed = VibeZstd.decompress(compressed)
  compressed = VibeZstd.compress(data, level: 5)
  ```
 
+ The convenience methods accept the same options as `CCtx#compress` /
+ `DCtx#decompress`, plus any context (sticky) parameter, which is applied to the
+ internally-created context:
+
+ ```ruby
+ # Per-call options (level, dict, pledged_size / dict, initial_capacity, max_size)
+ compressed = VibeZstd.compress(data, level: 9, dict: cdict)
+ decompressed = VibeZstd.decompress(compressed, dict: ddict, max_size: 10 * 1024 * 1024)
+
+ # Context parameters work too (checksum_flag, window_log, workers, format, ...)
+ compressed = VibeZstd.compress(data, checksum_flag: true, window_log: 20)
+
+ # An unknown keyword raises NoMethodError instead of being silently ignored.
+ ```
+
+ > [!IMPORTANT]
+ > **Decompressing untrusted input?** Always set an output-size limit. By default
+ > there is no cap, so a tiny malicious frame can demand an enormous allocation
+ > (a "decompression bomb"):
+ >
+ > ```ruby
+ > VibeZstd.decompress(untrusted, max_decompressed_size: 50 * 1024 * 1024)
+ > ```
+ >
+ > See [Limiting Decompressed Size](#limiting-decompressed-size) for per-instance
+ > and process-wide defaults.
+
  ### Using Contexts (Recommended)
 
  For multiple operations, create reusable contexts:
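The README says the convenience methods take per-call options plus any context parameter, and that an unknown keyword raises `NoMethodError`. One plausible routing that yields exactly that behavior is sketched below in plain Ruby: per-call keys (taken from the README comment) go to the operation, and every other key is sent to a `key=` setter on a fresh context. `FakeCCtx`, `PER_CALL_COMPRESS`, and `convenience_compress` are hypothetical; the sketch returns the context for inspection, whereas the real method returns compressed data.

```ruby
# Per-call option names taken from the README comment for VibeZstd.compress.
PER_CALL_COMPRESS = %i[level dict pledged_size].freeze

# Stand-in compression context (hypothetical); exposes two sticky setters.
class FakeCCtx
  attr_accessor :checksum_flag, :window_log
  attr_reader :last_call

  # Records the per-call options it received instead of compressing.
  def compress(data, **opts)
    @last_call = opts
    data
  end
end

# Hypothetical routing: per-call keys go to the operation, everything else
# is treated as a context parameter and sent to its "key=" setter.
def convenience_compress(data, **kwargs)
  per_call, ctx_params = kwargs.partition { |key, _| PER_CALL_COMPRESS.include?(key) }.map(&:to_h)
  ctx = FakeCCtx.new
  # An unknown key has no setter, so public_send raises NoMethodError.
  ctx_params.each { |key, value| ctx.public_send("#{key}=", value) }
  ctx.compress(data, **per_call)
  ctx
end

ctx = convenience_compress("payload", level: 9, checksum_flag: true)
p ctx.checksum_flag  # => true
```

Dispatching through setters means the set of accepted context keywords automatically tracks the setters the context defines, with no separate allow-list to keep in sync.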
@@ -550,6 +577,50 @@ VibeZstd::DCtx.default_initial_capacity = nil
  - **Large data (> 1MB)**: Set to `1_048_576` or higher
  - **Known-size frames**: Not applicable (size read from frame header)
 
+ #### Limiting Decompressed Size
+
+ When decompressing untrusted data, an attacker-controlled frame can declare or
+ expand to an enormous output (a "decompression bomb"). Set an opt-in output-size
+ limit; exceeding it raises `VibeZstd::DecompressedSizeExceeded` (a subclass of
+ `VibeZstd::Error`). It is off by default, so existing behavior is unchanged.
+
+ ```ruby
+ # Per call (alias: max_size)
+ VibeZstd::DCtx.new.decompress(data, max_decompressed_size: 50 * 1024 * 1024)
+
+ # Per instance
+ dctx = VibeZstd::DCtx.new(max_decompressed_size: 50 * 1024 * 1024)
+
+ # Class default for all new instances
+ VibeZstd::DCtx.default_max_decompressed_size = 100 * 1024 * 1024
+ VibeZstd::DCtx.default_max_decompressed_size = nil # unlimited again
+
+ # Resolution order: per-call → instance → class default → unlimited
+ begin
+ dctx.decompress(untrusted)
+ rescue VibeZstd::DecompressedSizeExceeded => e
+ warn "rejected oversized payload: #{e.message}"
+ end
+ ```
+
+ For frames with a known content size the limit is checked against the declared
+ size *before* allocating; for unknown-size frames the output buffer never grows
+ past the limit. This complements `window_log_max`, which bounds decoder *window*
+ memory but not total output size.
+
+ #### Magicless Frames
+
+ Frames compressed with `format: 1` (`ZSTD_f_zstd1_magicless`) omit the 4-byte
+ magic number. Decompress them by setting the same format on the decompression
+ side:
+
+ ```ruby
+ compressed = VibeZstd.compress(data, format: 1) # CCtx format parameter
+ original = VibeZstd.decompress(compressed, format: 1) # DCtx#format / #format=
+ ```
+
+ A magicless `DCtx` cannot read ordinary (magic-prefixed) frames, and vice versa.
+
  ### Memory Estimation
 
  Estimate memory usage before creating contexts:
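The magicless section says `format: 1` frames omit the 4-byte magic number. A standard Zstandard frame begins with the magic number 0xFD2FB528 stored little-endian (bytes `28 B5 2F FD`), and skippable frames use 0x184D2A50 through 0x184D2A5F, so a header check like the sketch below can tell the layouts apart. `frame_kind` and `strip_magic` are hypothetical helpers working on synthetic bytes; this is not a decoder and not how the gem selects a format.

```ruby
# Zstandard frame magic number, stored little-endian on the wire.
ZSTD_MAGIC = 0xFD2FB528
# Magic numbers reserved for skippable frames.
SKIPPABLE_MAGICS = 0x184D2A50..0x184D2A5F

# Classifies a buffer by its first four bytes (header check only, no decoding).
def frame_kind(bytes)
  return :too_short if bytes.bytesize < 4
  magic = bytes.unpack1("V")               # 32-bit unsigned, little-endian
  return :zstd if magic == ZSTD_MAGIC
  return :skippable if SKIPPABLE_MAGICS.cover?(magic)
  :unknown                                 # e.g. a magicless frame
end

# Hypothetical helper: a magicless frame is the same frame minus the magic.
def strip_magic(frame)
  raise ArgumentError, "not a magic-prefixed zstd frame" unless frame_kind(frame) == :zstd
  frame.byteslice(4..)
end

# Synthetic bytes: the magic followed by placeholder header bytes.
standard = [ZSTD_MAGIC].pack("V") + ("\x00".b * 6)
p standard.bytes.first(4).map { |b| format("%02X", b) }  # => ["28", "B5", "2F", "FD"]
p frame_kind(standard)               # => :zstd
p frame_kind(strip_magic(standard))  # => :unknown
```

Because nothing in a magicless frame identifies its format, the decoder has to be told via `format: 1`; a default decoder sees unrecognized leading bytes, which is why the two modes cannot read each other's frames.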
@@ -810,8 +881,9 @@ end
  ### Module Methods
 
  ```ruby
- VibeZstd.compress(data, level: nil, dict: nil)
- VibeZstd.decompress(data, dict: nil)
+ # Per-call options plus any context (sticky) parameter as a keyword.
+ VibeZstd.compress(data, level: nil, dict: nil, pledged_size: nil, **ctx_params)
+ VibeZstd.decompress(data, dict: nil, initial_capacity: nil, max_decompressed_size: nil, **ctx_params)
  VibeZstd.frame_content_size(data)
  VibeZstd.compress_bound(size)
  VibeZstd.train_dict(samples, max_dict_size: 112640)
@@ -839,6 +911,7 @@ cctx.content_size_flag = 1
  cctx.compression_level = 9
  cctx.window_log = 20
  cctx.workers = 4
+ cctx.format = 1 # ZSTD_f_zstd1_magicless (omit the 4-byte magic number)
  # ... and many more
 
  # Class methods
@@ -850,13 +923,16 @@ VibeZstd::CCtx.estimate_memory(level)
 
  ```ruby
  dctx = VibeZstd::DCtx.new(**params)
- dctx.decompress(data, dict: nil, initial_capacity: nil)
+ dctx.decompress(data, dict: nil, initial_capacity: nil, max_decompressed_size: nil)
  dctx.use_prefix(prefix_data)
  dctx.initial_capacity = 1_048_576
  dctx.window_log_max = 20
+ dctx.max_decompressed_size = 50 * 1024 * 1024 # alias: max_size; raises DecompressedSizeExceeded
+ dctx.format = 1 # ZSTD_d_format (magicless frames)
 
  # Class methods
  VibeZstd::DCtx.default_initial_capacity = value
+ VibeZstd::DCtx.default_max_decompressed_size = value # 0/nil = unlimited
  VibeZstd::DCtx.parameter_bounds(param)
  VibeZstd::DCtx.frame_content_size(data)
  VibeZstd::DCtx.estimate_memory
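The README documents two things about the decompressed-size limit: it resolves per-call → instance → class default → unlimited, and it is enforced against the declared size for known-size frames and against accumulated output for unknown-size frames. The plain-Ruby model below restates both rules. All names are hypothetical, and treating `nil` as "not set at this level" and a resolved `0` as unlimited are assumptions extrapolated from the `0/nil = unlimited` note on the class default.

```ruby
# Hypothetical stand-in for VibeZstd::DecompressedSizeExceeded.
class DecompressedSizeExceeded < StandardError; end

# Per-call -> instance -> class default -> unlimited.
# Assumption: nil means "not set here"; a resolved 0 means unlimited.
def resolve_limit(per_call, instance, class_default)
  limit = [per_call, instance, class_default].find { |v| !v.nil? }
  limit.nil? || limit.zero? ? nil : limit
end

# Known-size frame: reject from the declared size, before allocating output.
def check_declared!(declared_size, limit)
  return if limit.nil? || declared_size <= limit
  raise DecompressedSizeExceeded, "frame declares #{declared_size} bytes, limit is #{limit}"
end

# Unknown-size frame: refuse to let accumulated output grow past the limit.
def collect_chunks(chunks, limit)
  out = +""
  chunks.each do |chunk|
    if limit && out.bytesize + chunk.bytesize > limit
      raise DecompressedSizeExceeded, "output would exceed #{limit} bytes"
    end
    out << chunk
  end
  out
end

p resolve_limit(nil, 50 * 1024 * 1024, 100 * 1024 * 1024)  # => 52428800
```

The known-size check is what stops a tiny frame that merely declares a huge size; the running check covers frames that omit the size field and only reveal their length while decoding.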
data/ext/vibe_zstd/cctx.c CHANGED
@@ -7,6 +7,11 @@ extern rb_data_type_t vibe_zstd_cctx_type;
  // Helper to set CCtx parameter from Ruby keyword argument
  static int
  vibe_zstd_cctx_init_param_iter(VALUE key, VALUE value, VALUE self) {
+ // Reject non-Symbol keys early; SYM2ID on a non-Symbol is undefined behavior.
+ if (!SYMBOL_P(key)) {
+ rb_raise(rb_eArgError, "keyword key must be a Symbol, got %"PRIsVALUE, rb_inspect(key));
+ }
+
  // Build the setter method name: key + "="
  const char* key_str = rb_id2name(SYM2ID(key));
  char setter[256];
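The hunk above validates each keyword key before building a `key=` setter name from it. The same validate-then-dispatch pattern is sketched below in plain Ruby so its observable behavior is easy to see: a non-Symbol key raises `ArgumentError`, and an unknown Symbol reaches a setter that does not exist and raises `NoMethodError`. `SketchCtx` is a hypothetical stand-in, not `VibeZstd::CCtx`; in Ruby the guard only sharpens the error, whereas in the C code it prevents undefined behavior.

```ruby
# Stand-in for a context constructor that accepts **params (hypothetical).
class SketchCtx
  attr_accessor :compression_level, :checksum_flag

  def initialize(**params)
    params.each do |key, value|
      # Same guard as the 1.3.0 C change: only Symbols may name a parameter.
      unless key.is_a?(Symbol)
        raise ArgumentError, "keyword key must be a Symbol, got #{key.inspect}"
      end
      # Build the setter name (key + "=") and dispatch to it; an unknown
      # parameter has no setter, so public_send raises NoMethodError.
      public_send("#{key}=", value)
    end
  end
end

p SketchCtx.new(compression_level: 9).compression_level  # => 9
[{ "level" => 3 }, { bogus: 1 }].each do |params|
  SketchCtx.new(**params)
rescue ArgumentError, NoMethodError => e
  puts e.class  # ArgumentError, then NoMethodError
end
```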
@@ -45,35 +50,40 @@ vibe_zstd_cctx_estimate_memory(VALUE self, VALUE level) {
  // Releasing the GVL allows other Ruby threads to run during CPU-intensive compression.
  typedef struct {
  ZSTD_CCtx* cctx;
- ZSTD_CDict* cdict;
  const void* src;
  size_t srcSize;
  void* dst;
  size_t dstCapacity;
- int compressionLevel;
  size_t result;
  } compress_args;
 
  // Compress without holding Ruby's GVL
  // Called via rb_thread_call_without_gvl to allow parallel Ruby thread execution
- // during CPU-intensive compression operations
+ // during CPU-intensive compression operations.
+ //
+ // Uses ZSTD_compress2 (the advanced one-shot API) so that "sticky" parameters
+ // configured on the context (compression_level, checksum_flag, window_log,
+ // workers, long_distance_matching, etc.) are honored. The legacy ZSTD_compressCCtx
+ // silently ignores all sticky parameters, which made context configuration a no-op.
  static void*
  compress_without_gvl(void* arg) {
  compress_args* args = arg;
- if (args->cdict) {
- args->result = ZSTD_compress_usingCDict(args->cctx, args->dst, args->dstCapacity, args->src, args->srcSize, args->cdict);
- } else {
- args->result = ZSTD_compressCCtx(args->cctx, args->dst, args->dstCapacity, args->src, args->srcSize, args->compressionLevel);
- }
+ args->result = ZSTD_compress2(args->cctx, args->dst, args->dstCapacity, args->src, args->srcSize);
  return NULL;
  }
 
  // CCtx compress - Compress data using this context
  //
- // Supports per-operation parameters via keyword arguments:
- // - level: Compression level (overrides context setting for this operation)
- // - dict: CDict to use for compression
- // - pledged_size: Expected input size for optimization (optional)
+ // Honors all parameters configured on the context (sticky parameters), e.g.
+ // compression_level, checksum_flag, window_log, workers, etc.
+ //
+ // Supports per-operation overrides via keyword arguments:
+ // - level: Compression level for this call only (restored afterward)
+ // - dict: CDict to use for this call only (un-referenced afterward)
+ // - pledged_size: Expected input size (enforced; resets after the frame)
+ //
+ // Per-call overrides are applied around the compression and then restored so
+ // repeated one-shot calls on the same context remain independent.
  //
  // Uses ZSTD_compressBound to allocate worst-case output buffer size,
  // which is the recommended approach for one-shot compression.
@@ -86,19 +96,20 @@ vibe_zstd_cctx_compress(int argc, VALUE* argv, VALUE self) {
  TypedData_Get_Struct(self, vibe_zstd_cctx, &vibe_zstd_cctx_type, cctx);
  StringValue(data);
 
- // Extract keyword arguments
- int lvl = ZSTD_defaultCLevel();
+ // Extract keyword arguments (all optional, all per-call overrides)
+ int has_level = 0;
+ int lvl = 0;
  ZSTD_CDict* cdict = NULL;
+ int has_pledged = 0;
  unsigned long long pledged_size = ZSTD_CONTENTSIZE_UNKNOWN;
 
  if (!NIL_P(options)) {
- // Handle level keyword argument
  VALUE level_val = rb_hash_aref(options, ID2SYM(rb_intern("level")));
  if (!NIL_P(level_val)) {
+ has_level = 1;
  lvl = NUM2INT(level_val);
  }
 
- // Handle dict keyword argument
  VALUE dict_val = rb_hash_aref(options, ID2SYM(rb_intern("dict")));
  if (!NIL_P(dict_val)) {
  vibe_zstd_cdict* cdict_struct;
@@ -106,18 +117,44 @@ vibe_zstd_cctx_compress(int argc, VALUE* argv, VALUE self) {
  cdict = cdict_struct->cdict;
  }
 
- // Handle pledged_size keyword argument
  VALUE pledged_size_val = rb_hash_aref(options, ID2SYM(rb_intern("pledged_size")));
  if (!NIL_P(pledged_size_val)) {
+ has_pledged = 1;
  pledged_size = NUM2ULL(pledged_size_val);
  }
  }
 
- // Set pledged size if provided
- if (pledged_size != ZSTD_CONTENTSIZE_UNKNOWN) {
- size_t result = ZSTD_CCtx_setPledgedSrcSize(cctx->cctx, pledged_size);
- if (ZSTD_isError(result)) {
- rb_raise(rb_eRuntimeError, "Failed to set pledged_size %llu: %s", pledged_size, ZSTD_getErrorName(result));
+ // Apply per-call compression level override without permanently mutating the
+ // context's configured level. The previous value is captured and restored.
+ int prev_level = 0;
+ if (has_level) {
+ size_t gp = ZSTD_CCtx_getParameter(cctx->cctx, ZSTD_c_compressionLevel, &prev_level);
+ if (ZSTD_isError(gp)) {
+ rb_raise(rb_eRuntimeError, "Failed to read compression level: %s", ZSTD_getErrorName(gp));
+ }
+ size_t sp = ZSTD_CCtx_setParameter(cctx->cctx, ZSTD_c_compressionLevel, lvl);
+ if (ZSTD_isError(sp)) {
+ rb_raise(rb_eArgError, "Invalid level %d: %s", lvl, ZSTD_getErrorName(sp));
+ }
+ }
+
+ // Reference a per-call dictionary; un-referenced after compression so the
+ // context returns to no-dictionary mode for subsequent calls.
+ if (cdict) {
+ size_t rc = ZSTD_CCtx_refCDict(cctx->cctx, cdict);
+ if (ZSTD_isError(rc)) {
+ if (has_level) ZSTD_CCtx_setParameter(cctx->cctx, ZSTD_c_compressionLevel, prev_level);
+ rb_raise(rb_eRuntimeError, "Failed to set dictionary: %s", ZSTD_getErrorName(rc));
+ }
+ }
+
+ // Set pledged size if provided (resets to UNKNOWN automatically after the frame)
+ if (has_pledged) {
+ size_t sps = ZSTD_CCtx_setPledgedSrcSize(cctx->cctx, pledged_size);
+ if (ZSTD_isError(sps)) {
+ if (cdict) ZSTD_CCtx_refCDict(cctx->cctx, NULL);
+ if (has_level) ZSTD_CCtx_setParameter(cctx->cctx, ZSTD_c_compressionLevel, prev_level);
+ rb_raise(rb_eRuntimeError, "Failed to set pledged_size %llu: %s", pledged_size, ZSTD_getErrorName(sps));
  }
  }
 
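The hunk above captures the context's compression level, applies a per-call override, and restores the saved value so later calls see the sticky setting again. The plain-Ruby sketch below shows the same capture/override/restore pattern on a hypothetical `LevelCtx` stand-in. One deliberate difference: the C code restores state explicitly on the success path and on each error path it handles, while the sketch swaps in Ruby's `ensure`, which also covers exceptions raised during the operation itself.

```ruby
# Stand-in context with one sticky parameter (hypothetical, not CCtx).
class LevelCtx
  attr_accessor :compression_level

  def initialize(level)
    @compression_level = level
  end

  # "Compresses" by reporting the level in effect when it runs.
  def compress(data)
    [compression_level, data]
  end
end

# Apply a per-call level for one operation, then restore the sticky value.
def compress_with_level(ctx, data, level: nil)
  return ctx.compress(data) if level.nil?  # no override requested
  previous = ctx.compression_level          # capture (cf. ZSTD_CCtx_getParameter)
  ctx.compression_level = level             # override (cf. ZSTD_CCtx_setParameter)
  begin
    ctx.compress(data)
  ensure
    ctx.compression_level = previous        # restore even if compress raises
  end
end

ctx = LevelCtx.new(3)
p compress_with_level(ctx, "x", level: 19)  # => [19, "x"]
p ctx.compression_level                     # => 3
```

Restoring the level is what keeps repeated one-shot calls on a shared context independent: a `level: 19` call must not silently change the level used by the next call that passes no override.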
@@ -126,15 +163,24 @@ vibe_zstd_cctx_compress(int argc, VALUE* argv, VALUE self) {
  VALUE result_str = rb_str_new(NULL, dstCapacity);
  compress_args args = {
  .cctx = cctx->cctx,
- .cdict = cdict,
  .src = RSTRING_PTR(data),
  .srcSize = srcSize,
  .dst = RSTRING_PTR(result_str),
  .dstCapacity = dstCapacity,
- .compressionLevel = lvl,
  .result = 0
  };
- rb_thread_call_without_gvl(compress_without_gvl, &args, NULL, NULL);
+ // Lock the source string for the duration of the GVL-released compression.
+ // Without this, another Ruby thread holding the same String object could
+ // modify or reallocate it while compression reads from its buffer, causing
+ // a use-after-free read. The helper unlocks via rb_ensure so the string
+ // is never left permanently locked, even if an async exception (e.g.
+ // Timeout, Thread#raise) is delivered when the GVL is reacquired.
+ vibe_zstd_nogvl_with_str_locked(compress_without_gvl, &args, data);
+
+ // Restore context state so repeated one-shot calls remain independent.
+ if (cdict) ZSTD_CCtx_refCDict(cctx->cctx, NULL);
+ if (has_level) ZSTD_CCtx_setParameter(cctx->cctx, ZSTD_c_compressionLevel, prev_level);
+
  if (ZSTD_isError(args.result)) {
  rb_raise(rb_eRuntimeError, "Compression failed: %s", ZSTD_getErrorName(args.result));
  }
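The locking comment above guards against a String being mutated while native code still reads its buffer. Pure Ruby cannot call `rb_str_locktmp` (a locked string makes mutation attempts raise until it is unlocked), but the underlying hazard, holding a reference to a buffer that someone else keeps rewriting, is easy to reproduce. The sketch below contrasts aliasing a reused buffer with taking a frozen snapshot, which is the technique the 1.3.0 changelog describes for `DecompressReader` input chunks (`rb_str_new_frozen`); `ReusingReader` and `collect` are hypothetical names.

```ruby
# A reader that, like some IO implementations, reuses one buffer String.
class ReusingReader
  def initialize(chunks)
    @chunks = chunks
    @buf = +""
  end

  # Returns the same String object every time, overwritten in place.
  def read
    chunk = @chunks.shift or return nil
    @buf.replace(chunk)
  end
end

def collect(reader, snapshot:)
  seen = []
  while (chunk = reader.read)
    # Without a snapshot, every entry aliases the reader's shared buffer.
    seen << (snapshot ? chunk.dup.freeze : chunk)
  end
  seen
end

p collect(ReusingReader.new(%w[aa bb cc]), snapshot: false)  # => ["cc", "cc", "cc"]
p collect(ReusingReader.new(%w[aa bb cc]), snapshot: true)   # => ["aa", "bb", "cc"]
```

Snapshotting costs a copy but decouples the consumer from the producer's buffer; locking avoids the copy and instead makes a concurrent mutation fail loudly while the lock is held.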