multi_compress 0.3.3 → 0.3.5
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -0
- data/GET_STARTED.md +6 -4
- data/README.md +2 -2
- data/ext/multi_compress/extconf.rb +57 -2
- data/ext/multi_compress/multi_compress.c +208 -120
- data/lib/multi_compress/version.rb +1 -1
- data/lib/multi_compress.rb +80 -41
- metadata +4 -4
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 9f30991dd0d788507bb43885eacc3ae6c131722b0c77428778e7e3176d6ef220
|
|
4
|
+
data.tar.gz: 53ccbd3e9bf75b8b5eb74a8a6d523baab825ec3492ac730dbe2f739732b724e2
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: ad1f3ad3d2ba84e5eec72c1c0b5fed2dade4b3f1dd96efc11b6454ec9f9af78e4829e070c7310283d6500ed71a4a9895334a9d91bd6a0c86f4e947a98c5b0706
|
|
7
|
+
data.tar.gz: 11369cc33cae0ecb46cf0e9c84e0bc35c72bd2f7cb1cec46425c5cbbc27dfcc9c2037d5e5ab75a4e23c721229577b7ede3aea7bfe7327644024d1be7af229c57
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,25 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.3.5]
|
|
4
|
+
|
|
5
|
+
### Changed
|
|
6
|
+
- Lowered the minimum supported Ruby version to **2.7.1**.
|
|
7
|
+
|
|
8
|
+
### Fixed
|
|
9
|
+
- Made Fiber Scheduler integration conditional on the presence of Ruby's `ruby/fiber/scheduler.h` C API.
|
|
10
|
+
Ruby 2.7.1 now builds and installs without the scheduler header; scheduler-aware execution remains enabled automatically on supported newer Rubies.
|
|
11
|
+
|
|
12
|
+
### Notes
|
|
13
|
+
- On Ruby 2.7.1, all public compression, decompression, streaming, dictionary, and IO APIs are available.
|
|
14
|
+
Fiber Scheduler coordination is unavailable on that runtime, so operations use the existing direct/NOGVL execution paths.
|
|
15
|
+
|
|
16
|
+
## [0.3.4]
|
|
17
|
+
|
|
18
|
+
### Changed
|
|
19
|
+
- Improved one-shot Zstd performance by reusing per-thread CCtx/DCtx.
|
|
20
|
+
- On deterministic arm64-darwin benchmarks, total zstd roundtrip improved by ~11–19% on ~10KB payloads,
|
|
21
|
+
~1–4% on medium payloads, and ~2–6% on log-like large payloads, with identical compressed sizes.
|
|
22
|
+
|
|
3
23
|
## [0.3.3]
|
|
4
24
|
|
|
5
25
|
### Changed
|
data/GET_STARTED.md
CHANGED
|
@@ -188,7 +188,7 @@ end
|
|
|
188
188
|
|
|
189
189
|
## Fiber-friendly Execution
|
|
190
190
|
|
|
191
|
-
Starting with **v0.2.0**, MultiCompress is
|
|
191
|
+
Starting with **v0.2.0**, MultiCompress is fiber-friendly on Ruby runtimes that expose the Fiber Scheduler C API and plays nicely with `Fiber::Scheduler`-based runtimes like [async](https://github.com/socketry/async) and [falcon](https://github.com/socketry/falcon). Ruby 2.7.1 is supported through the same public API, using direct/NOGVL execution because that runtime does not expose this scheduler API.
|
|
192
192
|
|
|
193
193
|
### The Problem It Solves
|
|
194
194
|
|
|
@@ -196,7 +196,7 @@ Compression is CPU-bound work. Historically, calling `zstd`/`lz4`/`brotli` from
|
|
|
196
196
|
|
|
197
197
|
### How It Works
|
|
198
198
|
|
|
199
|
-
When MultiCompress detects an active `Fiber::Scheduler`, it:
|
|
199
|
+
When the compiled Ruby runtime exposes the scheduler API and MultiCompress detects an active `Fiber::Scheduler`, it:
|
|
200
200
|
|
|
201
201
|
1. Spawns a **dedicated worker thread** via `rb_thread_create` to run the compression with the GVL released.
|
|
202
202
|
2. Parks the calling fiber with `rb_fiber_scheduler_block(scheduler, blocker, Qnil)`.
|
|
@@ -325,8 +325,10 @@ end
|
|
|
325
325
|
|
|
326
326
|
### Requirements
|
|
327
327
|
|
|
328
|
-
- Ruby **>=
|
|
329
|
-
-
|
|
328
|
+
- Ruby **>= 2.7.1**
|
|
329
|
+
- For Fiber Scheduler cooperation: a Ruby runtime exposing the Fiber Scheduler C API and a running `Fiber::Scheduler` — typically provided by `Async { ... }` or Falcon's web server
|
|
330
|
+
|
|
331
|
+
Ruby 2.7.1 supports all compression, decompression, streaming, dictionary, and IO APIs. It does not expose the Fiber Scheduler C API, so operations use the normal direct/NOGVL execution paths instead of scheduler coordination.
|
|
330
332
|
|
|
331
333
|
### No Code Changes Required
|
|
332
334
|
|
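The runtime split described in GET_STARTED.md can be observed from plain Ruby. The following is a minimal sketch, assuming only core Ruby: `Fiber.scheduler` exists on Ruby 3.0 and later and is absent on 2.7. The helper name `fiber_scheduler_active?` is hypothetical and is not part of the MultiCompress API; the extension itself decides at compile time, based on the presence of `ruby/fiber/scheduler.h`.

```ruby
# Hypothetical helper, not part of the MultiCompress API.
# Returns true only when this Ruby exposes Fiber.scheduler (3.0+)
# and a scheduler is currently set for the calling thread.
def fiber_scheduler_active?
  return false unless Fiber.respond_to?(:scheduler)

  !Fiber.scheduler.nil?
end

puts "Fiber scheduler active: #{fiber_scheduler_active?}"
```

Outside an `Async { ... }` block or a similar scheduler-driven context this reports `false`, which corresponds to the direct/NOGVL paths mentioned above.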
data/README.md
CHANGED
|
@@ -33,7 +33,7 @@ Bundled library versions in the current release:
|
|
|
33
33
|
- **Zero external dependencies**: All C libraries are vendored and compiled
|
|
34
34
|
- **Unified API**: Same interface for all algorithms — just change the `algo:` parameter
|
|
35
35
|
- **Performance first**: Direct bindings to C libraries, minimal overhead
|
|
36
|
-
- **Fiber-friendly**:
|
|
36
|
+
- **Fiber-friendly when available**: On Ruby runtimes exposing the Fiber Scheduler C API, compression and decompression cooperate with an active `Fiber::Scheduler` — safe to use under `async`, `falcon`, or similar runtimes without blocking the event loop. On Ruby 2.7.1, the same API works through the normal direct/NOGVL execution paths. See [GET_STARTED.md](GET_STARTED.md) for details and examples.
|
|
37
37
|
- **Memory efficient**: Streaming support for large datasets, proper resource cleanup
|
|
38
38
|
- **Operationally focused**: Clear errors, comprehensive tests, and streaming support for practical workloads
|
|
39
39
|
|
|
@@ -174,7 +174,7 @@ Or use the build script:
|
|
|
174
174
|
|
|
175
175
|
## Requirements
|
|
176
176
|
|
|
177
|
-
- Ruby >=
|
|
177
|
+
- Ruby >= 2.7.1
|
|
178
178
|
- C compiler (gcc, clang)
|
|
179
179
|
|
|
180
180
|
## License
|
data/ext/multi_compress/extconf.rb
CHANGED
|
@@ -6,6 +6,8 @@ USE_SYSTEM = arg_config("--use-system-libraries") ||
|
|
|
6
6
|
ENV["COMPRESS_USE_SYSTEM_LIBRARIES"]
|
|
7
7
|
FORCE_VENDORED = arg_config("--force-vendored") ||
|
|
8
8
|
ENV["COMPRESS_FORCE_VENDORED"]
|
|
9
|
+
DISABLE_ZSTD_ASM = arg_config("--disable-zstd-asm") ||
|
|
10
|
+
ENV["MULTI_COMPRESS_DISABLE_ZSTD_ASM"] == "1"
|
|
9
11
|
|
|
10
12
|
ZSTD_SUBDIRS = %w[lib/common lib/compress lib/decompress lib/dictBuilder].freeze
|
|
11
13
|
BROTLI_SUBDIRS = %w[c/common c/enc c/dec].freeze
|
|
@@ -53,6 +55,15 @@ def find_compress_c_dir
|
|
|
53
55
|
&.then { |path| File.expand_path(path) } || __dir__
|
|
54
56
|
end
|
|
55
57
|
|
|
58
|
+
def zstd_asm_supported?
|
|
59
|
+
case RUBY_PLATFORM
|
|
60
|
+
when /x86_64|amd64/
|
|
61
|
+
!RUBY_PLATFORM.include?("mswin") && !RUBY_PLATFORM.include?("mingw")
|
|
62
|
+
else
|
|
63
|
+
false
|
|
64
|
+
end
|
|
65
|
+
end
|
|
66
|
+
|
|
56
67
|
def configure_system_libraries
|
|
57
68
|
puts "Building with SYSTEM libraries"
|
|
58
69
|
|
|
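The platform gate added in this hunk can be exercised without touching `RUBY_PLATFORM`. This is a sketch only: `asm_supported_on?` is a hypothetical variant that takes the platform string as an argument, and the sample strings are illustrative rather than an exhaustive list of real platform names.

```ruby
# Hypothetical variant of zstd_asm_supported? that takes the platform
# string as an argument so it can be tried on any machine.
def asm_supported_on?(platform)
  return false unless platform.match?(/x86_64|amd64/)

  # x86-64 Windows toolchains are excluded, as in the diff.
  !platform.include?("mswin") && !platform.include?("mingw")
end

%w[x86_64-linux x86_64-darwin23 arm64-darwin23 x64-mingw-ucrt].each do |platform|
  puts format("%-16s %s", platform, asm_supported_on?(platform))
end
```

Only x86-64 platforms that are not Windows toolchains pass the check, so every other platform never adds the `.S` file to the source list.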
@@ -98,7 +109,10 @@ def configure_vendored_libraries(vendor_dir)
|
|
|
98
109
|
puts " #{all_vendor_srcs.length} vendored C files"
|
|
99
110
|
|
|
100
111
|
add_include_dirs(zstd_dir, lz4_dir, brotli_dir)
|
|
101
|
-
|
|
112
|
+
if DISABLE_ZSTD_ASM
|
|
113
|
+
$CPPFLAGS += " -DZSTD_DISABLE_ASM"
|
|
114
|
+
puts " ZSTD ASM Huffman decoder disabled (--disable-zstd-asm or MULTI_COMPRESS_DISABLE_ZSTD_ASM=1)"
|
|
115
|
+
end
|
|
102
116
|
|
|
103
117
|
vpath_dirs = build_vpath_dirs(zstd_dir, lz4_dir, brotli_dir)
|
|
104
118
|
|
|
@@ -106,9 +120,14 @@ def configure_vendored_libraries(vendor_dir)
|
|
|
106
120
|
|
|
107
121
|
compress_c_dir = find_compress_c_dir
|
|
108
122
|
|
|
109
|
-
|
|
123
|
+
c_srcs = all_vendor_srcs.reject { |s| s.end_with?(".S") }
|
|
124
|
+
asm_srcs = all_vendor_srcs.select { |s| s.end_with?(".S") }
|
|
125
|
+
|
|
126
|
+
$srcs = ["multi_compress.c"] + c_srcs.map { |s| File.basename(s) }
|
|
110
127
|
$VPATH = [compress_c_dir] + vpath_dirs
|
|
111
128
|
|
|
129
|
+
$multi_compress_asm_srcs = asm_srcs
|
|
130
|
+
|
|
112
131
|
$warnflags = ""
|
|
113
132
|
|
|
114
133
|
vpath_dirs
|
|
@@ -117,6 +136,11 @@ end
|
|
|
117
136
|
def collect_vendor_sources(zstd_dir, lz4_dir, brotli_dir)
|
|
118
137
|
zstd_srcs = ZSTD_SUBDIRS.flat_map { |d| Dir[File.join(zstd_dir, d, "*.c")] }
|
|
119
138
|
|
|
139
|
+
unless DISABLE_ZSTD_ASM
|
|
140
|
+
asm = File.join(zstd_dir, "lib", "decompress", "huf_decompress_amd64.S")
|
|
141
|
+
zstd_srcs << asm if File.exist?(asm) && zstd_asm_supported?
|
|
142
|
+
end
|
|
143
|
+
|
|
120
144
|
lz4_srcs = LZ4_SOURCES.filter_map do |f|
|
|
121
145
|
path = File.join(lz4_dir, "lib", f)
|
|
122
146
|
path if File.exist?(path)
|
|
@@ -178,6 +202,36 @@ def patch_makefile_vpath!(vpath_dirs)
|
|
|
178
202
|
puts " Patched Makefile with #{vpath_dirs.length} VPATH entries"
|
|
179
203
|
end
|
|
180
204
|
|
|
205
|
+
def patch_makefile_asm!(asm_srcs)
|
|
206
|
+
return if asm_srcs.nil? || asm_srcs.empty?
|
|
207
|
+
|
|
208
|
+
makefile = File.read("Makefile")
|
|
209
|
+
return if makefile.include?("# vendored asm")
|
|
210
|
+
|
|
211
|
+
asm_dirs = asm_srcs.map { |s| File.dirname(s) }.uniq
|
|
212
|
+
vpath_lines = asm_dirs.map { |d| "vpath %.S #{d}" }.join("\n")
|
|
213
|
+
|
|
214
|
+
asm_objs = asm_srcs.map { |s| File.basename(s, ".S") + ".o" }
|
|
215
|
+
obj_append = asm_objs.join(" ")
|
|
216
|
+
|
|
217
|
+
unless makefile.sub!(/^(OBJS\s*=\s*[^\n]+?)(\s*)$/) { "#{Regexp.last_match(1)} #{obj_append}#{Regexp.last_match(2)}" }
|
|
218
|
+
makefile << "\nOBJS = #{obj_append}\n"
|
|
219
|
+
end
|
|
220
|
+
|
|
221
|
+
pattern_rule = <<~MAKE
|
|
222
|
+
# vendored asm
|
|
223
|
+
#{vpath_lines}
|
|
224
|
+
%.o: %.S
|
|
225
|
+
\t$(ECHO) compiling $(<)
|
|
226
|
+
\t$(Q) $(CC) $(INCFLAGS) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<
|
|
227
|
+
MAKE
|
|
228
|
+
|
|
229
|
+
makefile << "\n#{pattern_rule}\n"
|
|
230
|
+
|
|
231
|
+
File.write("Makefile", makefile)
|
|
232
|
+
puts " Patched Makefile with #{asm_srcs.length} ASM source(s): #{asm_objs.join(", ")}"
|
|
233
|
+
end
|
|
234
|
+
|
|
181
235
|
# --- Main ---
|
|
182
236
|
|
|
183
237
|
VENDOR_DIR = find_vendor_dir
|
|
@@ -211,3 +265,4 @@ have_library("pthread") unless RUBY_PLATFORM.include?("darwin")
|
|
|
211
265
|
create_makefile("multi_compress/multi_compress")
|
|
212
266
|
|
|
213
267
|
patch_makefile_vpath!(vpath_dirs) if VENDORED && !USE_SYSTEM && vpath_dirs
|
|
268
|
+
patch_makefile_asm!($multi_compress_asm_srcs) if VENDORED && !USE_SYSTEM && $multi_compress_asm_srcs
|
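The `OBJS` rewrite inside `patch_makefile_asm!` is easier to follow on a small string. The sketch below reuses the regular expression from the diff in a hypothetical helper, `append_objs`, that works on text instead of a Makefile on disk; the sample Makefile content is invented for illustration.

```ruby
# Hypothetical helper mirroring the OBJS rewrite in patch_makefile_asm!.
# It returns a new string and never touches the filesystem.
def append_objs(makefile_text, extra_objs)
  addition = extra_objs.join(" ")
  pattern = /^(OBJS\s*=\s*[^\n]+?)(\s*)$/

  if makefile_text.match?(pattern)
    # Keep the existing object list and any trailing whitespace intact.
    makefile_text.sub(pattern) do
      "#{Regexp.last_match(1)} #{addition}#{Regexp.last_match(2)}"
    end
  else
    # No OBJS line found: fall back to appending one, as the diff does.
    "#{makefile_text}\nOBJS = #{addition}\n"
  end
end

sample = "SRCS = multi_compress.c\nOBJS = multi_compress.o zstd_common.o\nall: $(DLLIB)\n"
puts append_objs(sample, ["huf_decompress_amd64.o"])
```

The lazy `[^\n]+?` followed by `(\s*)$` keeps trailing whitespace out of the first capture, so the new object name lands directly after the last existing one.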
data/ext/multi_compress/multi_compress.c
CHANGED
|
@@ -1,7 +1,10 @@
|
|
|
1
1
|
#include <ruby.h>
|
|
2
2
|
#include <ruby/encoding.h>
|
|
3
3
|
#include <ruby/thread.h>
|
|
4
|
+
|
|
5
|
+
#ifdef HAVE_RUBY_FIBER_SCHEDULER_H
|
|
4
6
|
#include <ruby/fiber/scheduler.h>
|
|
7
|
+
#endif
|
|
5
8
|
#include <brotli/decode.h>
|
|
6
9
|
#include <brotli/encode.h>
|
|
7
10
|
#include <lz4.h>
|
|
@@ -95,6 +98,97 @@ typedef enum { LZ4_FORMAT_BLOCK = 0, LZ4_FORMAT_FRAME = 1 } lz4_format_t;
|
|
|
95
98
|
|
|
96
99
|
#define MC_NUM_ALGOS 3
|
|
97
100
|
|
|
101
|
+
static pthread_once_t zstd_tls_once = PTHREAD_ONCE_INIT;
|
|
102
|
+
static pthread_key_t zstd_cctx_key;
|
|
103
|
+
static pthread_key_t zstd_dctx_key;
|
|
104
|
+
|
|
105
|
+
static void zstd_tls_free_cctx(void *ptr) {
|
|
106
|
+
if (ptr)
|
|
107
|
+
ZSTD_freeCCtx((ZSTD_CCtx *)ptr);
|
|
108
|
+
}
|
|
109
|
+
|
|
110
|
+
static void zstd_tls_free_dctx(void *ptr) {
|
|
111
|
+
if (ptr)
|
|
112
|
+
ZSTD_freeDCtx((ZSTD_DCtx *)ptr);
|
|
113
|
+
}
|
|
114
|
+
|
|
115
|
+
static void zstd_tls_init(void) {
|
|
116
|
+
if (pthread_key_create(&zstd_cctx_key, zstd_tls_free_cctx) != 0)
|
|
117
|
+
abort();
|
|
118
|
+
if (pthread_key_create(&zstd_dctx_key, zstd_tls_free_dctx) != 0)
|
|
119
|
+
abort();
|
|
120
|
+
}
|
|
121
|
+
|
|
122
|
+
static ZSTD_CCtx *zstd_tls_get_cctx(void) {
|
|
123
|
+
pthread_once(&zstd_tls_once, zstd_tls_init);
|
|
124
|
+
|
|
125
|
+
ZSTD_CCtx *cctx = (ZSTD_CCtx *)pthread_getspecific(zstd_cctx_key);
|
|
126
|
+
if (cctx)
|
|
127
|
+
return cctx;
|
|
128
|
+
|
|
129
|
+
cctx = ZSTD_createCCtx();
|
|
130
|
+
if (!cctx)
|
|
131
|
+
return NULL;
|
|
132
|
+
|
|
133
|
+
if (pthread_setspecific(zstd_cctx_key, cctx) != 0) {
|
|
134
|
+
ZSTD_freeCCtx(cctx);
|
|
135
|
+
return NULL;
|
|
136
|
+
}
|
|
137
|
+
|
|
138
|
+
return cctx;
|
|
139
|
+
}
|
|
140
|
+
|
|
141
|
+
static ZSTD_DCtx *zstd_tls_get_dctx(void) {
|
|
142
|
+
pthread_once(&zstd_tls_once, zstd_tls_init);
|
|
143
|
+
|
|
144
|
+
ZSTD_DCtx *dctx = (ZSTD_DCtx *)pthread_getspecific(zstd_dctx_key);
|
|
145
|
+
if (dctx)
|
|
146
|
+
return dctx;
|
|
147
|
+
|
|
148
|
+
dctx = ZSTD_createDCtx();
|
|
149
|
+
if (!dctx)
|
|
150
|
+
return NULL;
|
|
151
|
+
|
|
152
|
+
if (pthread_setspecific(zstd_dctx_key, dctx) != 0) {
|
|
153
|
+
ZSTD_freeDCtx(dctx);
|
|
154
|
+
return NULL;
|
|
155
|
+
}
|
|
156
|
+
|
|
157
|
+
return dctx;
|
|
158
|
+
}
|
|
159
|
+
|
|
160
|
+
static size_t zstd_compress_cached(char *dst, size_t dst_cap, const char *src, size_t src_len,
|
|
161
|
+
int level, ZSTD_CDict *cdict, int *ctx_error) {
|
|
162
|
+
*ctx_error = 0;
|
|
163
|
+
|
|
164
|
+
ZSTD_CCtx *cctx = zstd_tls_get_cctx();
|
|
165
|
+
if (!cctx) {
|
|
166
|
+
*ctx_error = 1;
|
|
167
|
+
return 0;
|
|
168
|
+
}
|
|
169
|
+
|
|
170
|
+
if (cdict)
|
|
171
|
+
return ZSTD_compress_usingCDict(cctx, dst, dst_cap, src, src_len, cdict);
|
|
172
|
+
|
|
173
|
+
return ZSTD_compressCCtx(cctx, dst, dst_cap, src, src_len, level);
|
|
174
|
+
}
|
|
175
|
+
|
|
176
|
+
static size_t zstd_decompress_cached(void *dst, size_t dst_cap, const void *src, size_t src_len,
|
|
177
|
+
ZSTD_DDict *ddict, int *ctx_error) {
|
|
178
|
+
*ctx_error = 0;
|
|
179
|
+
|
|
180
|
+
ZSTD_DCtx *dctx = zstd_tls_get_dctx();
|
|
181
|
+
if (!dctx) {
|
|
182
|
+
*ctx_error = 1;
|
|
183
|
+
return 0;
|
|
184
|
+
}
|
|
185
|
+
|
|
186
|
+
if (ddict)
|
|
187
|
+
return ZSTD_decompress_usingDDict(dctx, dst, dst_cap, src, src_len, ddict);
|
|
188
|
+
|
|
189
|
+
return ZSTD_decompressDCtx(dctx, dst, dst_cap, src, src_len);
|
|
190
|
+
}
|
|
191
|
+
|
|
98
192
|
_Static_assert(ALGO_BROTLI == MC_NUM_ALGOS - 1,
|
|
99
193
|
"compress_algo_t must be contiguous [0..MC_NUM_ALGOS-1]");
|
|
100
194
|
|
|
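The per-thread context reuse added in this hunk follows a get-or-create pattern on thread-local storage. The following is a Ruby analogue of that idea, not the extension's code: `FakeContext` and `thread_context` are hypothetical names, and `Thread#thread_variable_get` stands in for `pthread_getspecific`.

```ruby
# Stand-in for an expensive native context such as ZSTD_CCtx (hypothetical).
class FakeContext
  @created = 0
  class << self
    attr_accessor :created
  end

  def initialize
    self.class.created += 1
  end
end

# Return the calling thread's context, creating it on first use only.
def thread_context
  ctx = Thread.current.thread_variable_get(:fake_ctx)
  return ctx if ctx

  ctx = FakeContext.new
  Thread.current.thread_variable_set(:fake_ctx, ctx)
  ctx
end

3.times { thread_context }                     # reuses one context on this thread
Thread.new { 2.times { thread_context } }.join # a second thread gets its own
puts "contexts created: #{FakeContext.created}"
```

Each thread pays the allocation cost once, which is the effect the changelog attributes to reusing per-thread CCtx/DCtx for one-shot calls.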
@@ -439,8 +533,6 @@ static inline VALUE rb_binary_str_buf_new(long capa) {
|
|
|
439
533
|
static inline VALUE rb_binary_str_buf_reserve(long capa) {
|
|
440
534
|
VALUE str = rb_str_buf_new(capa);
|
|
441
535
|
rb_enc_associate(str, binary_encoding);
|
|
442
|
-
if (capa > 0)
|
|
443
|
-
rb_str_modify_expand(str, capa + 1);
|
|
444
536
|
return str;
|
|
445
537
|
}
|
|
446
538
|
|
|
@@ -527,10 +619,16 @@ static inline void enforce_output_and_ratio_limits(size_t total_output, size_t t
|
|
|
527
619
|
}
|
|
528
620
|
|
|
529
621
|
static VALUE current_fiber_scheduler(void) {
|
|
622
|
+
#ifdef HAVE_RUBY_FIBER_SCHEDULER_H
|
|
530
623
|
VALUE sched = rb_fiber_scheduler_current();
|
|
624
|
+
|
|
531
625
|
if (sched == Qnil || sched == Qfalse)
|
|
532
626
|
return Qnil;
|
|
627
|
+
|
|
533
628
|
return sched;
|
|
629
|
+
#else
|
|
630
|
+
return Qnil;
|
|
631
|
+
#endif
|
|
534
632
|
}
|
|
535
633
|
|
|
536
634
|
static int has_fiber_scheduler(void) {
|
|
@@ -689,13 +787,19 @@ static inline size_t fiber_maybe_yield(size_t bytes_since_yield, size_t just_pro
|
|
|
689
787
|
return bytes_since_yield;
|
|
690
788
|
}
|
|
691
789
|
|
|
692
|
-
#define
|
|
693
|
-
|
|
790
|
+
#define DICT_ZSTD_MIN_LEVEL 1
|
|
791
|
+
#define DICT_ZSTD_MAX_LEVEL 22
|
|
792
|
+
#define DICT_CDICT_CACHE_SIZE (DICT_ZSTD_MAX_LEVEL + 1)
|
|
793
|
+
_Static_assert(DICT_CDICT_CACHE_SIZE > DICT_ZSTD_MAX_LEVEL,
|
|
794
|
+
"CDict cache needs one slot for every accepted zstd level");
|
|
694
795
|
|
|
695
|
-
|
|
696
|
-
|
|
697
|
-
|
|
698
|
-
|
|
796
|
+
#if defined(__GNUC__) || defined(__clang__)
|
|
797
|
+
#define MC_HAS_ATOMIC_PTR 1
|
|
798
|
+
#define MC_ATOMIC_LOAD_PTR(ptr) __atomic_load_n((ptr), __ATOMIC_ACQUIRE)
|
|
799
|
+
#define MC_ATOMIC_STORE_PTR(ptr, val) __atomic_store_n((ptr), (val), __ATOMIC_RELEASE)
|
|
800
|
+
#else
|
|
801
|
+
#define MC_HAS_ATOMIC_PTR 0
|
|
802
|
+
#endif
|
|
699
803
|
|
|
700
804
|
struct dictionary_s {
|
|
701
805
|
compress_algo_t algo;
|
|
@@ -703,8 +807,7 @@ struct dictionary_s {
|
|
|
703
807
|
size_t size;
|
|
704
808
|
pthread_mutex_t cache_mutex;
|
|
705
809
|
|
|
706
|
-
|
|
707
|
-
int cdict_cache_count;
|
|
810
|
+
ZSTD_CDict *cdict_cache[DICT_CDICT_CACHE_SIZE];
|
|
708
811
|
|
|
709
812
|
ZSTD_DDict *ddict;
|
|
710
813
|
};
|
|
@@ -713,9 +816,9 @@ static void dict_free(void *ptr) {
|
|
|
713
816
|
dictionary_t *dict = (dictionary_t *)ptr;
|
|
714
817
|
if (!dict)
|
|
715
818
|
return;
|
|
716
|
-
for (int i =
|
|
717
|
-
if (dict->cdict_cache[i]
|
|
718
|
-
ZSTD_freeCDict(dict->cdict_cache[i]
|
|
819
|
+
for (int i = DICT_ZSTD_MIN_LEVEL; i <= DICT_ZSTD_MAX_LEVEL; i++) {
|
|
820
|
+
if (dict->cdict_cache[i])
|
|
821
|
+
ZSTD_freeCDict(dict->cdict_cache[i]);
|
|
719
822
|
}
|
|
720
823
|
if (dict->ddict)
|
|
721
824
|
ZSTD_freeDDict(dict->ddict);
|
|
@@ -732,9 +835,9 @@ static size_t dict_memsize(const void *ptr) {
|
|
|
732
835
|
|
|
733
836
|
size_t total = sizeof(dictionary_t) + d->size;
|
|
734
837
|
if (d->algo == ALGO_ZSTD) {
|
|
735
|
-
for (int i =
|
|
736
|
-
if (d->cdict_cache[i]
|
|
737
|
-
total += ZSTD_sizeof_CDict(d->cdict_cache[i]
|
|
838
|
+
for (int i = DICT_ZSTD_MIN_LEVEL; i <= DICT_ZSTD_MAX_LEVEL; i++) {
|
|
839
|
+
if (d->cdict_cache[i])
|
|
840
|
+
total += ZSTD_sizeof_CDict(d->cdict_cache[i]);
|
|
738
841
|
}
|
|
739
842
|
if (d->ddict)
|
|
740
843
|
total += ZSTD_sizeof_DDict(d->ddict);
|
|
@@ -756,17 +859,18 @@ static VALUE dict_alloc(VALUE klass) {
|
|
|
756
859
|
}
|
|
757
860
|
|
|
758
861
|
static ZSTD_CDict *dict_get_cdict(dictionary_t *dict, int level) {
|
|
759
|
-
|
|
862
|
+
if (MC_UNLIKELY(level < DICT_ZSTD_MIN_LEVEL || level > DICT_ZSTD_MAX_LEVEL))
|
|
863
|
+
rb_raise(eLevelError, "zstd level must be %d..%d, got %d", DICT_ZSTD_MIN_LEVEL,
|
|
864
|
+
DICT_ZSTD_MAX_LEVEL, level);
|
|
760
865
|
|
|
866
|
+
ZSTD_CDict *existing;
|
|
867
|
+
#if MC_HAS_ATOMIC_PTR
|
|
868
|
+
existing = MC_ATOMIC_LOAD_PTR(&dict->cdict_cache[level]);
|
|
869
|
+
#else
|
|
761
870
|
pthread_mutex_lock(&dict->cache_mutex);
|
|
762
|
-
|
|
763
|
-
if (dict->cdict_cache[i].level == level) {
|
|
764
|
-
existing = dict->cdict_cache[i].cdict;
|
|
765
|
-
break;
|
|
766
|
-
}
|
|
767
|
-
}
|
|
871
|
+
existing = dict->cdict_cache[level];
|
|
768
872
|
pthread_mutex_unlock(&dict->cache_mutex);
|
|
769
|
-
|
|
873
|
+
#endif
|
|
770
874
|
if (existing)
|
|
771
875
|
return existing;
|
|
772
876
|
|
|
@@ -775,34 +879,31 @@ static ZSTD_CDict *dict_get_cdict(dictionary_t *dict, int level) {
|
|
|
775
879
|
return NULL;
|
|
776
880
|
|
|
777
881
|
pthread_mutex_lock(&dict->cache_mutex);
|
|
778
|
-
|
|
779
|
-
|
|
780
|
-
|
|
781
|
-
|
|
782
|
-
|
|
783
|
-
|
|
784
|
-
|
|
785
|
-
}
|
|
786
|
-
|
|
787
|
-
if (dict->cdict_cache_count >= DICT_CDICT_CACHE_SIZE) {
|
|
882
|
+
existing = dict->cdict_cache[level];
|
|
883
|
+
if (!existing) {
|
|
884
|
+
#if MC_HAS_ATOMIC_PTR
|
|
885
|
+
MC_ATOMIC_STORE_PTR(&dict->cdict_cache[level], cdict);
|
|
886
|
+
#else
|
|
887
|
+
dict->cdict_cache[level] = cdict;
|
|
888
|
+
#endif
|
|
788
889
|
pthread_mutex_unlock(&dict->cache_mutex);
|
|
789
|
-
|
|
790
|
-
rb_raise(eError, "zstd dictionary cdict cache exhausted");
|
|
890
|
+
return cdict;
|
|
791
891
|
}
|
|
792
892
|
|
|
793
|
-
dict->cdict_cache[dict->cdict_cache_count].level = level;
|
|
794
|
-
dict->cdict_cache[dict->cdict_cache_count].cdict = cdict;
|
|
795
|
-
dict->cdict_cache_count++;
|
|
796
893
|
pthread_mutex_unlock(&dict->cache_mutex);
|
|
797
|
-
|
|
894
|
+
ZSTD_freeCDict(cdict);
|
|
895
|
+
return existing;
|
|
798
896
|
}
|
|
799
897
|
|
|
800
898
|
static ZSTD_DDict *dict_get_ddict(dictionary_t *dict) {
|
|
801
899
|
ZSTD_DDict *existing;
|
|
802
|
-
|
|
900
|
+
#if MC_HAS_ATOMIC_PTR
|
|
901
|
+
existing = MC_ATOMIC_LOAD_PTR(&dict->ddict);
|
|
902
|
+
#else
|
|
803
903
|
pthread_mutex_lock(&dict->cache_mutex);
|
|
804
904
|
existing = dict->ddict;
|
|
805
905
|
pthread_mutex_unlock(&dict->cache_mutex);
|
|
906
|
+
#endif
|
|
806
907
|
if (existing)
|
|
807
908
|
return existing;
|
|
808
909
|
|
|
@@ -811,12 +912,17 @@ static ZSTD_DDict *dict_get_ddict(dictionary_t *dict) {
|
|
|
811
912
|
return NULL;
|
|
812
913
|
|
|
813
914
|
pthread_mutex_lock(&dict->cache_mutex);
|
|
814
|
-
|
|
915
|
+
existing = dict->ddict;
|
|
916
|
+
if (!existing) {
|
|
917
|
+
#if MC_HAS_ATOMIC_PTR
|
|
918
|
+
MC_ATOMIC_STORE_PTR(&dict->ddict, created);
|
|
919
|
+
#else
|
|
815
920
|
dict->ddict = created;
|
|
921
|
+
#endif
|
|
816
922
|
pthread_mutex_unlock(&dict->cache_mutex);
|
|
817
923
|
return created;
|
|
818
924
|
}
|
|
819
|
-
|
|
925
|
+
|
|
820
926
|
pthread_mutex_unlock(&dict->cache_mutex);
|
|
821
927
|
ZSTD_freeDDict(created);
|
|
822
928
|
return existing;
|
|
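Both `dict_get_cdict` and `dict_get_ddict` now use the same shape: look up, build outside the lock, then recheck under the lock and discard the loser. A Ruby analogue of that shape is sketched below. `LevelCache` is a hypothetical class; it always reads under the mutex, which corresponds to the non-atomic fallback path in the diff rather than the `__atomic_load_n` fast path.

```ruby
# Hypothetical Ruby analogue of a level-indexed cache guarded by a mutex.
class LevelCache
  MIN_LEVEL = 1
  MAX_LEVEL = 22

  def initialize
    @slots = Array.new(MAX_LEVEL + 1) # index == level; slot 0 stays unused
    @mutex = Mutex.new
  end

  # Yields to build a value only when the slot looks empty, then publishes
  # it under the lock unless another thread stored one first.
  def fetch(level)
    unless level.between?(MIN_LEVEL, MAX_LEVEL)
      raise ArgumentError, "level must be #{MIN_LEVEL}..#{MAX_LEVEL}, got #{level}"
    end

    existing = @mutex.synchronize { @slots[level] }
    return existing if existing

    created = yield(level) # built outside the lock
    @mutex.synchronize do
      @slots[level] ||= created # keep the winner; a losing value is dropped
    end
  end
end

cache = LevelCache.new
first = cache.fetch(3) { |lvl| "cdict-for-level-#{lvl}" }
second = cache.fetch(3) { raise "builder should not run again" }
puts first.equal?(second)
```

Indexing by level removes the old linear scan and the "cache exhausted" error, since every accepted level has its own slot.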
@@ -835,19 +941,8 @@ typedef struct {
|
|
|
835
941
|
|
|
836
942
|
static void *zstd_compress_nogvl(void *arg) {
|
|
837
943
|
zstd_compress_args_t *a = (zstd_compress_args_t *)arg;
|
|
838
|
-
|
|
839
|
-
|
|
840
|
-
if (!cctx) {
|
|
841
|
-
a->error = 1;
|
|
842
|
-
return NULL;
|
|
843
|
-
}
|
|
844
|
-
a->result =
|
|
845
|
-
ZSTD_compress_usingCDict(cctx, a->dst, a->dst_cap, a->src, a->src_len, a->cdict);
|
|
846
|
-
ZSTD_freeCCtx(cctx);
|
|
847
|
-
} else {
|
|
848
|
-
a->result = ZSTD_compress(a->dst, a->dst_cap, a->src, a->src_len, a->level);
|
|
849
|
-
}
|
|
850
|
-
a->error = 0;
|
|
944
|
+
a->result =
|
|
945
|
+
zstd_compress_cached(a->dst, a->dst_cap, a->src, a->src_len, a->level, a->cdict, &a->error);
|
|
851
946
|
return NULL;
|
|
852
947
|
}
|
|
853
948
|
|
|
@@ -863,19 +958,7 @@ typedef struct {
|
|
|
863
958
|
|
|
864
959
|
static void *zstd_decompress_nogvl(void *arg) {
|
|
865
960
|
zstd_decompress_args_t *a = (zstd_decompress_args_t *)arg;
|
|
866
|
-
|
|
867
|
-
ZSTD_DCtx *dctx = ZSTD_createDCtx();
|
|
868
|
-
if (!dctx) {
|
|
869
|
-
a->error = 1;
|
|
870
|
-
return NULL;
|
|
871
|
-
}
|
|
872
|
-
a->result =
|
|
873
|
-
ZSTD_decompress_usingDDict(dctx, a->dst, a->dst_cap, a->src, a->src_len, a->ddict);
|
|
874
|
-
ZSTD_freeDCtx(dctx);
|
|
875
|
-
} else {
|
|
876
|
-
a->result = ZSTD_decompress(a->dst, a->dst_cap, a->src, a->src_len);
|
|
877
|
-
}
|
|
878
|
-
a->error = 0;
|
|
961
|
+
a->result = zstd_decompress_cached(a->dst, a->dst_cap, a->src, a->src_len, a->ddict, &a->error);
|
|
879
962
|
return NULL;
|
|
880
963
|
}
|
|
881
964
|
|
|
@@ -1168,18 +1251,8 @@ static void *brotli_decompress_stream_fiber_nogvl(void *arg) {
|
|
|
1168
1251
|
|
|
1169
1252
|
static void *zstd_fiber_compress_nogvl(void *arg) {
|
|
1170
1253
|
zstd_fiber_compress_t *a = (zstd_fiber_compress_t *)arg;
|
|
1171
|
-
|
|
1172
|
-
|
|
1173
|
-
if (!cctx) {
|
|
1174
|
-
a->error = 1;
|
|
1175
|
-
return NULL;
|
|
1176
|
-
}
|
|
1177
|
-
a->result =
|
|
1178
|
-
ZSTD_compress_usingCDict(cctx, a->dst, a->dst_cap, a->src, a->src_len, a->cdict);
|
|
1179
|
-
ZSTD_freeCCtx(cctx);
|
|
1180
|
-
} else {
|
|
1181
|
-
a->result = ZSTD_compress(a->dst, a->dst_cap, a->src, a->src_len, a->level);
|
|
1182
|
-
}
|
|
1254
|
+
a->result =
|
|
1255
|
+
zstd_compress_cached(a->dst, a->dst_cap, a->src, a->src_len, a->level, a->cdict, &a->error);
|
|
1183
1256
|
return NULL;
|
|
1184
1257
|
}
|
|
1185
1258
|
|
|
@@ -1226,16 +1299,11 @@ static VALUE compress_compress(int argc, VALUE *argv, VALUE self) {
|
|
|
1226
1299
|
|
|
1227
1300
|
if (slen < policy->gvl_unlock_threshold) {
|
|
1228
1301
|
VALUE dst = rb_binary_str_buf_reserve(bound);
|
|
1229
|
-
|
|
1230
|
-
|
|
1231
|
-
|
|
1232
|
-
|
|
1233
|
-
|
|
1234
|
-
csize = ZSTD_compress_usingCDict(cctx, RSTRING_PTR(dst), bound, src, slen, cdict);
|
|
1235
|
-
ZSTD_freeCCtx(cctx);
|
|
1236
|
-
} else {
|
|
1237
|
-
csize = ZSTD_compress(RSTRING_PTR(dst), bound, src, slen, level);
|
|
1238
|
-
}
|
|
1302
|
+
int ctx_error = 0;
|
|
1303
|
+
size_t csize =
|
|
1304
|
+
zstd_compress_cached(RSTRING_PTR(dst), bound, src, slen, level, cdict, &ctx_error);
|
|
1305
|
+
if (ctx_error)
|
|
1306
|
+
rb_raise(eMemError, "zstd: failed to create context");
|
|
1239
1307
|
if (ZSTD_isError(csize))
|
|
1240
1308
|
rb_raise(eError, "zstd compress: %s", ZSTD_getErrorName(csize));
|
|
1241
1309
|
rb_str_set_len(dst, (long)csize);
|
|
@@ -1416,8 +1484,18 @@ static VALUE compress_compress(int argc, VALUE *argv, VALUE self) {
|
|
|
1416
1484
|
rb_raise(eMemError, "brotli: failed to prepare dictionary");
|
|
1417
1485
|
}
|
|
1418
1486
|
|
|
1419
|
-
if (!BrotliEncoderSetParameter(enc, BROTLI_PARAM_QUALITY, level)
|
|
1420
|
-
|
|
1487
|
+
if (!BrotliEncoderSetParameter(enc, BROTLI_PARAM_QUALITY, level)) {
|
|
1488
|
+
BrotliEncoderDestroyPreparedDictionary(pd);
|
|
1489
|
+
BrotliEncoderDestroyInstance(enc);
|
|
1490
|
+
rb_raise(eError, "brotli: failed to set quality parameter");
|
|
1491
|
+
}
|
|
1492
|
+
if (!BrotliEncoderSetParameter(enc, BROTLI_PARAM_SIZE_HINT,
|
|
1493
|
+
slen > UINT32_MAX ? UINT32_MAX : (uint32_t)slen)) {
|
|
1494
|
+
BrotliEncoderDestroyPreparedDictionary(pd);
|
|
1495
|
+
BrotliEncoderDestroyInstance(enc);
|
|
1496
|
+
rb_raise(eError, "brotli: failed to set size hint parameter");
|
|
1497
|
+
}
|
|
1498
|
+
if (!BrotliEncoderAttachPreparedDictionary(enc, pd)) {
|
|
1421
1499
|
BrotliEncoderDestroyPreparedDictionary(pd);
|
|
1422
1500
|
BrotliEncoderDestroyInstance(enc);
|
|
1423
1501
|
rb_raise(eError, "brotli: failed to attach dictionary");
|
|
@@ -1576,20 +1654,19 @@ static VALUE compress_decompress(int argc, VALUE *argv, VALUE self) {
|
|
|
1576
1654
|
} else {
|
|
1577
1655
|
VALUE dst = rb_binary_str_buf_reserve((size_t)frame_size);
|
|
1578
1656
|
|
|
1657
|
+
ZSTD_DDict *ddict = NULL;
|
|
1579
1658
|
if (dict) {
|
|
1580
|
-
|
|
1659
|
+
ddict = dict_get_ddict(dict);
|
|
1581
1660
|
if (!ddict)
|
|
1582
1661
|
rb_raise(eMemError, "zstd: failed to create ddict");
|
|
1583
|
-
ZSTD_DCtx *dctx = ZSTD_createDCtx();
|
|
1584
|
-
if (!dctx)
|
|
1585
|
-
rb_raise(eMemError, "zstd: failed to create dctx");
|
|
1586
|
-
dsize = ZSTD_decompress_usingDDict(dctx, RSTRING_PTR(dst), (size_t)frame_size,
|
|
1587
|
-
src, slen, ddict);
|
|
1588
|
-
ZSTD_freeDCtx(dctx);
|
|
1589
|
-
} else {
|
|
1590
|
-
dsize = ZSTD_decompress(RSTRING_PTR(dst), (size_t)frame_size, src, slen);
|
|
1591
1662
|
}
|
|
1592
1663
|
|
|
1664
|
+
int ctx_error = 0;
|
|
1665
|
+
dsize = zstd_decompress_cached(RSTRING_PTR(dst), (size_t)frame_size, src, slen,
|
|
1666
|
+
ddict, &ctx_error);
|
|
1667
|
+
if (ctx_error)
|
|
1668
|
+
rb_raise(eMemError, "zstd: failed to create dctx");
|
|
1669
|
+
|
|
1593
1670
|
if (ZSTD_isError(dsize))
|
|
1594
1671
|
rb_raise(eDataError, "zstd decompress: %s", ZSTD_getErrorName(dsize));
|
|
1595
1672
|
enforce_output_and_ratio_limits(dsize, slen, limits.max_output_size,
|
|
@@ -1601,16 +1678,21 @@ static VALUE compress_decompress(int argc, VALUE *argv, VALUE self) {
|
|
|
1601
1678
|
}
|
|
1602
1679
|
}
|
|
1603
1680
|
|
|
1604
|
-
ZSTD_DCtx *dctx =
|
|
1681
|
+
ZSTD_DCtx *dctx = zstd_tls_get_dctx();
|
|
1605
1682
|
if (!dctx)
|
|
1606
1683
|
rb_raise(eMemError, "zstd: failed to create dctx");
|
|
1607
1684
|
|
|
1685
|
+
{
|
|
1686
|
+
size_t r = ZSTD_DCtx_reset(dctx, ZSTD_reset_session_and_parameters);
|
|
1687
|
+
if (ZSTD_isError(r))
|
|
1688
|
+
rb_raise(eError, "zstd dctx reset: %s", ZSTD_getErrorName(r));
|
|
1689
|
+
}
|
|
1690
|
+
|
|
1608
1691
|
if (dict) {
|
|
1609
1692
|
ZSTD_DDict *ddict = dict_get_ddict(dict);
|
|
1610
1693
|
if (ddict) {
|
|
1611
1694
|
size_t r = ZSTD_DCtx_refDDict(dctx, ddict);
|
|
1612
1695
|
if (ZSTD_isError(r)) {
|
|
1613
|
-
ZSTD_freeDCtx(dctx);
|
|
1614
1696
|
rb_raise(eError, "zstd dict ref: %s", ZSTD_getErrorName(r));
|
|
1615
1697
|
}
|
|
1616
1698
|
}
|
|
@@ -1629,7 +1711,6 @@ static VALUE compress_decompress(int argc, VALUE *argv, VALUE self) {
|
|
|
1629
1711
|
while (input.pos < input.size) {
|
|
1630
1712
|
if (total_out >= alloc_size) {
|
|
1631
1713
|
if (alloc_size >= limits.max_output_size) {
|
|
1632
|
-
ZSTD_freeDCtx(dctx);
|
|
1633
1714
|
rb_raise(eDataError, "decompressed output exceeds limit (%zu bytes)",
|
|
1634
1715
|
limits.max_output_size);
|
|
1635
1716
|
}
|
|
@@ -1642,7 +1723,6 @@ static VALUE compress_decompress(int argc, VALUE *argv, VALUE self) {
|
|
|
1642
1723
|
|
|
1643
1724
|
size_t remaining_budget = limits.max_output_size - total_out;
|
|
1644
1725
|
if (remaining_budget == 0) {
|
|
1645
|
-
ZSTD_freeDCtx(dctx);
|
|
1646
1726
|
rb_raise(eDataError, "decompressed output exceeds limit (%zu bytes)",
|
|
1647
1727
|
limits.max_output_size);
|
|
1648
1728
|
}
|
|
@@ -1654,7 +1734,6 @@ static VALUE compress_decompress(int argc, VALUE *argv, VALUE self) {
|
|
|
1654
1734
|
ZSTD_outBuffer output = {RSTRING_PTR(dst) + total_out, out_cap, 0};
|
|
1655
1735
|
size_t ret = ZSTD_decompressStream(dctx, &output, &input);
|
|
1656
1736
|
if (ZSTD_isError(ret)) {
|
|
1657
|
-
ZSTD_freeDCtx(dctx);
|
|
1658
1737
|
rb_raise(eDataError, "zstd decompress: %s", ZSTD_getErrorName(ret));
|
|
1659
1738
|
}
|
|
1660
1739
|
total_out = checked_add_size(total_out, output.pos,
|
|
@@ -1665,7 +1744,6 @@ static VALUE compress_decompress(int argc, VALUE *argv, VALUE self) {
|
|
|
1665
1744
|
break;
|
|
1666
1745
|
}
|
|
1667
1746
|
|
|
1668
|
-
ZSTD_freeDCtx(dctx);
|
|
1669
1747
|
rb_str_set_len(dst, total_out);
|
|
1670
1748
|
RB_GC_GUARD(data);
|
|
1671
1749
|
RB_GC_GUARD(dict_val);
|
|
@@ -1875,7 +1953,7 @@ static void crc32_init_tables(void) {
|
|
|
1875
1953
|
for (uint32_t i = 0; i < 256; i++) {
|
|
1876
1954
|
uint32_t crc = i;
|
|
1877
1955
|
for (int j = 0; j < 8; j++) {
|
|
1878
|
-
crc = (crc >> 1) ^ (
|
|
1956
|
+
crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
|
|
1879
1957
|
}
|
|
1880
1958
|
crc32_tables[0][i] = crc;
|
|
1881
1959
|
}
|
|
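The branchless table step in this hunk can be checked against Ruby's bundled `Zlib`. The sketch below builds a single 256-entry table with the same expression and uses it for a byte-at-a-time CRC-32; the method names are hypothetical, and the C code's `crc32_tables[0]` is only the first of its tables, so this is a simplification.

```ruby
require "zlib"

# Build the 256-entry table with the same branchless step as the C line:
# the mask is all ones when the low bit is set, zero otherwise.
def crc32_table
  (0...256).map do |i|
    crc = i
    8.times do
      mask = (0 - (crc & 1)) & 0xFFFFFFFF # emulate uint32_t wraparound
      crc = (crc >> 1) ^ (0xEDB88320 & mask)
    end
    crc
  end
end

# Plain byte-at-a-time CRC-32 using that table.
def crc32(bytes, table = crc32_table)
  crc = 0xFFFFFFFF
  bytes.each_byte { |b| crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8) }
  crc ^ 0xFFFFFFFF
end

puts crc32("123456789") == Zlib.crc32("123456789")
```

Masking with `0u - (crc & 1u)` applies the polynomial only when the low bit is set, without a conditional branch in the inner loop.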
@@ -2189,6 +2267,10 @@ static VALUE lz4_compress_ring_block(deflater_t *d) {
|
|
|
2189
2267
|
|
|
2190
2268
|
write_le_u32((uint8_t *)out, (uint32_t)src_size);
|
|
2191
2269
|
|
|
2270
|
+
/* Keep blocks independently decodable. Switching to LZ4_*_continue would
|
|
2271
|
+
* require a coordinated format/decoder change that preserves dictionaries
|
|
2272
|
+
* across blocks.
|
|
2273
|
+
*/
|
|
2192
2274
|
int csize;
|
|
2193
2275
|
if (d->level > 1) {
|
|
2194
2276
|
csize = LZ4_compress_HC(block_start, out + 8, src_size, bound, d->level);
|
|
@@ -2979,13 +3061,24 @@ static VALUE inflater_write(VALUE self, VALUE chunk) {
|
|
|
2979
3061
|
}
|
|
2980
3062
|
case ALGO_LZ4: {
|
|
2981
3063
|
size_t data_len = inf->lz4_buf.len - inf->lz4_buf.offset;
|
|
2982
|
-
size_t needed =
|
|
3064
|
+
size_t needed =
|
|
3065
|
+
checked_add_size(data_len, slen, "lz4 stream input buffer exceeds representable size");
|
|
2983
3066
|
|
|
2984
|
-
if (
|
|
2985
|
-
|
|
2986
|
-
|
|
2987
|
-
|
|
2988
|
-
|
|
3067
|
+
if (needed > inf->lz4_buf.cap) {
|
|
3068
|
+
size_t new_cap = needed > SIZE_MAX / 2 ? needed : needed * 2;
|
|
3069
|
+
if (inf->lz4_buf.offset > 0) {
|
|
3070
|
+
char *new_buf = ALLOC_N(char, new_cap);
|
|
3071
|
+
if (data_len > 0)
|
|
3072
|
+
memcpy(new_buf, inf->lz4_buf.buf + inf->lz4_buf.offset, data_len);
|
|
3073
|
+
xfree(inf->lz4_buf.buf);
|
|
3074
|
+
inf->lz4_buf.buf = new_buf;
|
|
3075
|
+
inf->lz4_buf.offset = 0;
|
|
3076
|
+
inf->lz4_buf.len = data_len;
|
|
3077
|
+
inf->lz4_buf.cap = new_cap;
|
|
3078
|
+
} else {
|
|
3079
|
+
REALLOC_N(inf->lz4_buf.buf, char, new_cap);
|
|
3080
|
+
inf->lz4_buf.cap = new_cap;
|
|
3081
|
+
}
|
|
2989
3082
|
} else if (inf->lz4_buf.offset > inf->lz4_buf.cap / 2) {
|
|
2990
3083
|
if (data_len > 0)
|
|
2991
3084
|
memmove(inf->lz4_buf.buf, inf->lz4_buf.buf + inf->lz4_buf.offset, data_len);
|
|
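The two overflow guards in this hunk (a checked addition, then a doubling that is skipped when it would wrap) can be modeled with Ruby integers. This is a sketch under the assumption of a 64-bit `size_t`; `checked_add` and `grown_capacity` are hypothetical names, and raising `RangeError` is a choice made here, not necessarily what the extension raises.

```ruby
SIZE_MAX = 2**64 - 1 # assumption: a 64-bit size_t

# Add two sizes, refusing results that a size_t could not hold.
def checked_add(a, b, message)
  raise RangeError, message if a > SIZE_MAX - b

  a + b
end

# Double the needed size unless doubling itself would overflow.
def grown_capacity(needed)
  needed > SIZE_MAX / 2 ? needed : needed * 2
end

needed = checked_add(100, 28, "lz4 stream input buffer too large")
puts grown_capacity(needed)
puts grown_capacity(SIZE_MAX - 1) == SIZE_MAX - 1
```

The removed code computed `needed * 2` unconditionally, which could wrap for very large inputs; the new version falls back to the exact size in that case.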
@@ -2993,11 +3086,6 @@ static VALUE inflater_write(VALUE self, VALUE chunk) {
|
|
|
2993
3086
|
inf->lz4_buf.len = data_len;
|
|
2994
3087
|
}
|
|
2995
3088
|
|
|
2996
|
-
needed = inf->lz4_buf.len + slen;
|
|
2997
|
-
if (needed > inf->lz4_buf.cap) {
|
|
2998
|
-
inf->lz4_buf.cap = needed * 2;
|
|
2999
|
-
REALLOC_N(inf->lz4_buf.buf, char, inf->lz4_buf.cap);
|
|
3000
|
-
}
|
|
3001
3089
|
memcpy(inf->lz4_buf.buf + inf->lz4_buf.len, src, slen);
|
|
3002
3090
|
inf->lz4_buf.len += slen;
|
|
3003
3091
|
|
data/lib/multi_compress.rb
CHANGED
|
@@ -97,17 +97,19 @@ module MultiCompress
|
|
|
97
97
|
end
|
|
98
98
|
|
|
99
99
|
def self.zstd(data, level: nil)
|
|
100
|
-
compress(data, algo: :zstd,
|
|
100
|
+
compress(data, algo: :zstd, level: level)
|
|
101
101
|
end
|
|
102
102
|
|
|
103
103
|
def self.lz4(data, level: nil, format: nil)
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
104
|
+
if format
|
|
105
|
+
compress(data, algo: :lz4, level: level, format: format)
|
|
106
|
+
else
|
|
107
|
+
compress(data, algo: :lz4, level: level)
|
|
108
|
+
end
|
|
107
109
|
end
|
|
108
110
|
|
|
109
111
|
def self.brotli(data, level: nil)
|
|
110
|
-
compress(data, algo: :brotli,
|
|
112
|
+
compress(data, algo: :brotli, level: level)
|
|
111
113
|
end
|
|
112
114
|
|
|
113
115
|
def self.decompress(data, **opts)
|
|
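Note on the hunk above: the rewritten helpers forward `level:` unchanged (including `nil`) and pass `format:` only when the caller supplied one. A minimal sketch of that forwarding, assuming a hypothetical `Demo` module whose `compress` stub just returns the keyword options it received; the `:frame` value is a placeholder and not a confirmed format name.

```ruby
# Hypothetical stand-in for the real module: `compress` only echoes the
# keyword options it receives so the forwarding behaviour is visible.
module Demo
  def self.compress(_data, **opts)
    opts
  end

  # Same shape as the new helper: level is always forwarded,
  # format is forwarded only when it is set.
  def self.lz4(data, level: nil, format: nil)
    if format
      compress(data, algo: :lz4, level: level, format: format)
    else
      compress(data, algo: :lz4, level: level)
    end
  end
end

p Demo.lz4("abc").keys                  # [:algo, :level]
p Demo.lz4("abc", format: :frame).keys  # [:algo, :level, :format]
```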
@@ -119,9 +121,11 @@ module MultiCompress
|
|
|
119
121
|
end
|
|
120
122
|
|
|
121
123
|
def self.lz4_decompress(data, format: nil)
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
124
|
+
if format
|
|
125
|
+
decompress(data, algo: :lz4, format: format)
|
|
126
|
+
else
|
|
127
|
+
decompress(data, algo: :lz4)
|
|
128
|
+
end
|
|
125
129
|
end
|
|
126
130
|
|
|
127
131
|
def self.brotli_decompress(data)
|
|
@@ -132,17 +136,13 @@ module MultiCompress
|
|
|
132
136
|
EXTENSION_MAP[File.extname(path).downcase]
|
|
133
137
|
end
|
|
134
138
|
|
|
135
|
-
def self.level_opts(level)
|
|
136
|
-
level ? { level: level } : {}
|
|
137
|
-
end
|
|
138
|
-
|
|
139
139
|
def self.resolved_one_shot_options(opts)
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
140
|
+
return opts.merge(max_output_size: config.max_output_size) unless opts.key?(:max_output_size)
|
|
141
|
+
|
|
142
|
+
opts
|
|
143
143
|
end
|
|
144
144
|
|
|
145
|
-
private_class_method :
|
|
145
|
+
private_class_method :resolved_one_shot_options
|
|
146
146
|
|
|
147
147
|
module InflaterDefaults
|
|
148
148
|
def initialize(*args, **opts)
|
|
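Note on the hunk above: `resolved_one_shot_options` now fills in `max_output_size` only when the key is absent, so an explicitly passed value (even `nil`) is left alone. A small sketch of that `key?` check, assuming a hypothetical constant in place of `config.max_output_size`, whose real value this diff does not show.

```ruby
# Hypothetical default; in the gem this value comes from config.max_output_size.
DEFAULT_MAX_OUTPUT_SIZE = 512 * 1024 * 1024

# Fill in the default only when the caller did not pass the key at all.
def resolved_one_shot_options(opts)
  return opts.merge(max_output_size: DEFAULT_MAX_OUTPUT_SIZE) unless opts.key?(:max_output_size)

  opts
end

p resolved_one_shot_options(algo: :zstd)[:max_output_size]                        # 536870912
p resolved_one_shot_options(algo: :zstd, max_output_size: nil)[:max_output_size]  # nil
```

Using `key?` rather than a truthiness test is what distinguishes "not given" from "given as `nil`".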
@@ -248,6 +248,7 @@ module MultiCompress
|
|
|
248
248
|
|
|
249
249
|
class Reader
|
|
250
250
|
CHUNK_SIZE = 8192
|
|
251
|
+
BUFFER_COMPACT_THRESHOLD = 64 * 1024
|
|
251
252
|
|
|
252
253
|
def self.open(path_or_io, algo: nil, dictionary: nil, **opts, &block)
|
|
253
254
|
io, algo, owned = resolve_io(path_or_io, algo, mode: "rb")
|
|
@@ -264,12 +265,13 @@ module MultiCompress
|
|
|
264
265
|
end
|
|
265
266
|
|
|
266
267
|
def initialize(io, algo: nil, dictionary: nil, **opts)
|
|
267
|
-
@io
|
|
268
|
-
@inflater
|
|
269
|
-
@closed
|
|
270
|
-
@owned_io
|
|
271
|
-
@buffer
|
|
272
|
-
@
|
|
268
|
+
@io = io
|
|
269
|
+
@inflater = Inflater.new(algo: algo, dictionary: dictionary, **opts)
|
|
270
|
+
@closed = false
|
|
271
|
+
@owned_io = false
|
|
272
|
+
@buffer = +"".b
|
|
273
|
+
@buffer_pos = 0
|
|
274
|
+
@eof = false
|
|
273
275
|
end
|
|
274
276
|
|
|
275
277
|
def read(length = nil)
|
|
@@ -281,12 +283,12 @@ module MultiCompress
|
|
|
281
283
|
|
|
282
284
|
def gets(separator = "\n")
|
|
283
285
|
ensure_open!
|
|
284
|
-
return nil if @eof &&
|
|
286
|
+
return nil if @eof && buffer_empty?
|
|
285
287
|
|
|
286
|
-
fill_buffer_until {
|
|
288
|
+
fill_buffer_until { buffer_includes?(separator) }
|
|
287
289
|
|
|
288
|
-
return extract_line(separator) if
|
|
289
|
-
return consume_buffer unless
|
|
290
|
+
return extract_line(separator) if buffer_includes?(separator)
|
|
291
|
+
return consume_buffer unless buffer_empty?
|
|
290
292
|
|
|
291
293
|
nil
|
|
292
294
|
end
|
|
@@ -305,7 +307,7 @@ module MultiCompress
|
|
|
305
307
|
end
|
|
306
308
|
|
|
307
309
|
def eof?
|
|
308
|
-
@eof &&
|
|
310
|
+
@eof && buffer_empty?
|
|
309
311
|
end
|
|
310
312
|
|
|
311
313
|
def each_line
|
|
@@ -334,11 +336,44 @@ module MultiCompress
|
|
|
334
336
|
raise StreamError, "reader is closed" if @closed
|
|
335
337
|
end
|
|
336
338
|
|
|
339
|
+
def buffer_size
|
|
340
|
+
@buffer.bytesize - @buffer_pos
|
|
341
|
+
end
|
|
342
|
+
|
|
343
|
+
def buffer_empty?
|
|
344
|
+
@buffer_pos >= @buffer.bytesize
|
|
345
|
+
end
|
|
346
|
+
|
|
347
|
+
def buffer_append(data)
|
|
348
|
+
compact_buffer_if_needed
|
|
349
|
+
@buffer << data
|
|
350
|
+
end
|
|
351
|
+
|
|
352
|
+
def compact_buffer_if_needed
|
|
353
|
+
return if @buffer_pos == 0
|
|
354
|
+
|
|
355
|
+
total = @buffer.bytesize
|
|
356
|
+
return unless @buffer_pos >= BUFFER_COMPACT_THRESHOLD && @buffer_pos * 2 >= total
|
|
357
|
+
|
|
358
|
+
@buffer = @buffer.byteslice(@buffer_pos, total - @buffer_pos)
|
|
359
|
+
@buffer_pos = 0
|
|
360
|
+
end
|
|
361
|
+
|
|
362
|
+
def buffer_includes?(separator)
|
|
363
|
+
idx = @buffer.index(separator, @buffer_pos)
|
|
364
|
+
!idx.nil?
|
|
365
|
+
end
|
|
366
|
+
|
|
337
367
|
def read_all
|
|
338
|
-
return nil if @eof &&
|
|
368
|
+
return nil if @eof && buffer_empty?
|
|
339
369
|
|
|
340
|
-
result =
|
|
370
|
+
result = if buffer_empty?
|
|
371
|
+
+"".b
|
|
372
|
+
else
|
|
373
|
+
@buffer.byteslice(@buffer_pos, @buffer.bytesize - @buffer_pos) || +"".b
|
|
374
|
+
end
|
|
341
375
|
@buffer.clear
|
|
376
|
+
@buffer_pos = 0
|
|
342
377
|
|
|
343
378
|
until @eof
|
|
344
379
|
chunk = read_compressed_chunk
|
|
@@ -355,15 +390,16 @@ module MultiCompress
|
|
|
355
390
|
end
|
|
356
391
|
|
|
357
392
|
def read_exactly(length)
|
|
358
|
-
return nil if @eof &&
|
|
393
|
+
return nil if @eof && buffer_empty?
|
|
359
394
|
|
|
360
|
-
fill_buffer_until {
|
|
395
|
+
fill_buffer_until { buffer_size >= length }
|
|
361
396
|
|
|
362
|
-
if
|
|
363
|
-
result
|
|
364
|
-
@
|
|
397
|
+
if buffer_size >= length
|
|
398
|
+
result = @buffer.byteslice(@buffer_pos, length)
|
|
399
|
+
@buffer_pos += length
|
|
400
|
+
compact_buffer_if_needed
|
|
365
401
|
result
|
|
366
|
-
elsif
|
|
402
|
+
elsif !buffer_empty?
|
|
367
403
|
consume_buffer
|
|
368
404
|
end
|
|
369
405
|
end
|
|
@@ -376,20 +412,23 @@ module MultiCompress
|
|
|
376
412
|
break
|
|
377
413
|
end
|
|
378
414
|
decompressed = @inflater.write(chunk)
|
|
379
|
-
|
|
415
|
+
buffer_append(decompressed) if decompressed
|
|
380
416
|
end
|
|
381
417
|
end
|
|
382
418
|
|
|
383
419
|
def extract_line(separator)
|
|
384
|
-
idx = @buffer.index(separator)
|
|
385
|
-
|
|
386
|
-
|
|
420
|
+
idx = @buffer.index(separator, @buffer_pos)
|
|
421
|
+
end_pos = idx + separator.bytesize
|
|
422
|
+
result = @buffer.byteslice(@buffer_pos, end_pos - @buffer_pos)
|
|
423
|
+
@buffer_pos = end_pos
|
|
424
|
+
compact_buffer_if_needed
|
|
387
425
|
result
|
|
388
426
|
end
|
|
389
427
|
|
|
390
428
|
def consume_buffer
|
|
391
|
-
result
|
|
392
|
-
@buffer
|
|
429
|
+
result = @buffer.byteslice(@buffer_pos, @buffer.bytesize - @buffer_pos) || +"".b
|
|
430
|
+
@buffer.clear
|
|
431
|
+
@buffer_pos = 0
|
|
393
432
|
result
|
|
394
433
|
end
|
|
395
434
|
|
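Note on the `Reader` changes above: reads now advance `@buffer_pos` instead of slicing the front off `@buffer` each time, and the consumed prefix is dropped only once it is at least `BUFFER_COMPACT_THRESHOLD` bytes and at least half of the buffer. The standalone sketch below illustrates that rule. The class `OffsetBuffer` and its methods `take` and `allocated` are hypothetical names for illustration, not part of the gem.

```ruby
# Illustrative read buffer: consume by advancing an offset, compact rarely.
# Names are hypothetical; only the compaction rule follows the diff.
class OffsetBuffer
  COMPACT_THRESHOLD = 64 * 1024

  def initialize
    @buffer = +"".b
    @pos = 0
  end

  # Number of unread bytes.
  def size
    @buffer.bytesize - @pos
  end

  def append(data)
    compact_if_needed
    @buffer << data
  end

  # Return up to `length` unread bytes and advance the read offset.
  def take(length)
    result = @buffer.byteslice(@pos, length) || +"".b
    @pos += result.bytesize
    compact_if_needed
    result
  end

  # Total bytes held, including already-consumed ones.
  def allocated
    @buffer.bytesize
  end

  private

  # Drop consumed bytes only once they reach 64 KiB and half the buffer.
  def compact_if_needed
    return if @pos == 0

    total = @buffer.bytesize
    return unless @pos >= COMPACT_THRESHOLD && @pos * 2 >= total

    @buffer = @buffer.byteslice(@pos, total - @pos)
    @pos = 0
  end
end

buf = OffsetBuffer.new
buf.append("a" * 100_000)
buf.take(10)
puts buf.allocated  # 100000 (consumed prefix is below the threshold, nothing copied)
buf.take(70_000)
puts buf.allocated  # 29990 (offset passed 64 KiB and half the buffer, so it was compacted)
```

The two-part condition keeps small reads from triggering a copy on every call while still bounding how much dead data the buffer can carry.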
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: multi_compress
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.3.
|
|
4
|
+
version: 0.3.5
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Roman Haydarov
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: exe
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2026-
|
|
11
|
+
date: 2026-07-01 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: bundler
|
|
@@ -336,14 +336,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
|
336
336
|
requirements:
|
|
337
337
|
- - ">="
|
|
338
338
|
- !ruby/object:Gem::Version
|
|
339
|
-
version:
|
|
339
|
+
version: 2.7.1
|
|
340
340
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
341
341
|
requirements:
|
|
342
342
|
- - ">="
|
|
343
343
|
- !ruby/object:Gem::Version
|
|
344
344
|
version: '0'
|
|
345
345
|
requirements: []
|
|
346
|
-
rubygems_version: 3.
|
|
346
|
+
rubygems_version: 3.4.22
|
|
347
347
|
signing_key:
|
|
348
348
|
specification_version: 4
|
|
349
349
|
summary: 'Modern fiber-friendly compression for Ruby: zstd, lz4, brotli in one gem'
|