deflate-ruby 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (100)
  1. checksums.yaml +7 -0
  2. data/CLAUDE.md +138 -0
  3. data/LICENSE.txt +21 -0
  4. data/README.md +117 -0
  5. data/ext/deflate_ruby/deflate_ruby.c +301 -0
  6. data/ext/deflate_ruby/extconf.rb +34 -0
  7. data/ext/deflate_ruby/libdeflate/CMakeLists.txt +270 -0
  8. data/ext/deflate_ruby/libdeflate/COPYING +22 -0
  9. data/ext/deflate_ruby/libdeflate/NEWS.md +494 -0
  10. data/ext/deflate_ruby/libdeflate/README.md +228 -0
  11. data/ext/deflate_ruby/libdeflate/common_defs.h +747 -0
  12. data/ext/deflate_ruby/libdeflate/lib/adler32.c +162 -0
  13. data/ext/deflate_ruby/libdeflate/lib/arm/adler32_impl.h +358 -0
  14. data/ext/deflate_ruby/libdeflate/lib/arm/cpu_features.c +230 -0
  15. data/ext/deflate_ruby/libdeflate/lib/arm/cpu_features.h +214 -0
  16. data/ext/deflate_ruby/libdeflate/lib/arm/crc32_impl.h +600 -0
  17. data/ext/deflate_ruby/libdeflate/lib/arm/crc32_pmull_helpers.h +156 -0
  18. data/ext/deflate_ruby/libdeflate/lib/arm/crc32_pmull_wide.h +226 -0
  19. data/ext/deflate_ruby/libdeflate/lib/arm/matchfinder_impl.h +78 -0
  20. data/ext/deflate_ruby/libdeflate/lib/bt_matchfinder.h +342 -0
  21. data/ext/deflate_ruby/libdeflate/lib/cpu_features_common.h +93 -0
  22. data/ext/deflate_ruby/libdeflate/lib/crc32.c +262 -0
  23. data/ext/deflate_ruby/libdeflate/lib/crc32_multipliers.h +377 -0
  24. data/ext/deflate_ruby/libdeflate/lib/crc32_tables.h +587 -0
  25. data/ext/deflate_ruby/libdeflate/lib/decompress_template.h +777 -0
  26. data/ext/deflate_ruby/libdeflate/lib/deflate_compress.c +4129 -0
  27. data/ext/deflate_ruby/libdeflate/lib/deflate_compress.h +15 -0
  28. data/ext/deflate_ruby/libdeflate/lib/deflate_constants.h +56 -0
  29. data/ext/deflate_ruby/libdeflate/lib/deflate_decompress.c +1208 -0
  30. data/ext/deflate_ruby/libdeflate/lib/gzip_compress.c +90 -0
  31. data/ext/deflate_ruby/libdeflate/lib/gzip_constants.h +45 -0
  32. data/ext/deflate_ruby/libdeflate/lib/gzip_decompress.c +144 -0
  33. data/ext/deflate_ruby/libdeflate/lib/hc_matchfinder.h +401 -0
  34. data/ext/deflate_ruby/libdeflate/lib/ht_matchfinder.h +234 -0
  35. data/ext/deflate_ruby/libdeflate/lib/lib_common.h +106 -0
  36. data/ext/deflate_ruby/libdeflate/lib/matchfinder_common.h +224 -0
  37. data/ext/deflate_ruby/libdeflate/lib/riscv/matchfinder_impl.h +97 -0
  38. data/ext/deflate_ruby/libdeflate/lib/utils.c +141 -0
  39. data/ext/deflate_ruby/libdeflate/lib/x86/adler32_impl.h +134 -0
  40. data/ext/deflate_ruby/libdeflate/lib/x86/adler32_template.h +518 -0
  41. data/ext/deflate_ruby/libdeflate/lib/x86/cpu_features.c +183 -0
  42. data/ext/deflate_ruby/libdeflate/lib/x86/cpu_features.h +169 -0
  43. data/ext/deflate_ruby/libdeflate/lib/x86/crc32_impl.h +160 -0
  44. data/ext/deflate_ruby/libdeflate/lib/x86/crc32_pclmul_template.h +495 -0
  45. data/ext/deflate_ruby/libdeflate/lib/x86/decompress_impl.h +57 -0
  46. data/ext/deflate_ruby/libdeflate/lib/x86/matchfinder_impl.h +122 -0
  47. data/ext/deflate_ruby/libdeflate/lib/zlib_compress.c +82 -0
  48. data/ext/deflate_ruby/libdeflate/lib/zlib_constants.h +21 -0
  49. data/ext/deflate_ruby/libdeflate/lib/zlib_decompress.c +104 -0
  50. data/ext/deflate_ruby/libdeflate/libdeflate-config.cmake.in +3 -0
  51. data/ext/deflate_ruby/libdeflate/libdeflate.h +411 -0
  52. data/ext/deflate_ruby/libdeflate/libdeflate.pc.in +18 -0
  53. data/ext/deflate_ruby/libdeflate/programs/CMakeLists.txt +105 -0
  54. data/ext/deflate_ruby/libdeflate/programs/benchmark.c +696 -0
  55. data/ext/deflate_ruby/libdeflate/programs/checksum.c +218 -0
  56. data/ext/deflate_ruby/libdeflate/programs/config.h.in +19 -0
  57. data/ext/deflate_ruby/libdeflate/programs/gzip.c +688 -0
  58. data/ext/deflate_ruby/libdeflate/programs/prog_util.c +521 -0
  59. data/ext/deflate_ruby/libdeflate/programs/prog_util.h +225 -0
  60. data/ext/deflate_ruby/libdeflate/programs/test_checksums.c +200 -0
  61. data/ext/deflate_ruby/libdeflate/programs/test_custom_malloc.c +155 -0
  62. data/ext/deflate_ruby/libdeflate/programs/test_incomplete_codes.c +385 -0
  63. data/ext/deflate_ruby/libdeflate/programs/test_invalid_streams.c +130 -0
  64. data/ext/deflate_ruby/libdeflate/programs/test_litrunlen_overflow.c +72 -0
  65. data/ext/deflate_ruby/libdeflate/programs/test_overread.c +95 -0
  66. data/ext/deflate_ruby/libdeflate/programs/test_slow_decompression.c +472 -0
  67. data/ext/deflate_ruby/libdeflate/programs/test_trailing_bytes.c +151 -0
  68. data/ext/deflate_ruby/libdeflate/programs/test_util.c +237 -0
  69. data/ext/deflate_ruby/libdeflate/programs/test_util.h +61 -0
  70. data/ext/deflate_ruby/libdeflate/programs/tgetopt.c +118 -0
  71. data/ext/deflate_ruby/libdeflate/scripts/android_build.sh +118 -0
  72. data/ext/deflate_ruby/libdeflate/scripts/android_tests.sh +69 -0
  73. data/ext/deflate_ruby/libdeflate/scripts/benchmark.sh +10 -0
  74. data/ext/deflate_ruby/libdeflate/scripts/checksum.sh +10 -0
  75. data/ext/deflate_ruby/libdeflate/scripts/checksum_benchmarks.sh +253 -0
  76. data/ext/deflate_ruby/libdeflate/scripts/cmake-helper.sh +17 -0
  77. data/ext/deflate_ruby/libdeflate/scripts/deflate_benchmarks.sh +119 -0
  78. data/ext/deflate_ruby/libdeflate/scripts/exec_tests.sh +38 -0
  79. data/ext/deflate_ruby/libdeflate/scripts/gen-release-archives.sh +37 -0
  80. data/ext/deflate_ruby/libdeflate/scripts/gen_bitreverse_tab.py +19 -0
  81. data/ext/deflate_ruby/libdeflate/scripts/gen_crc32_multipliers.c +199 -0
  82. data/ext/deflate_ruby/libdeflate/scripts/gen_crc32_tables.c +105 -0
  83. data/ext/deflate_ruby/libdeflate/scripts/gen_default_litlen_costs.py +44 -0
  84. data/ext/deflate_ruby/libdeflate/scripts/gen_offset_slot_map.py +29 -0
  85. data/ext/deflate_ruby/libdeflate/scripts/gzip_tests.sh +523 -0
  86. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/deflate_compress/corpus/0 +0 -0
  87. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/deflate_compress/fuzz.c +95 -0
  88. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/deflate_decompress/corpus/0 +3 -0
  89. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/deflate_decompress/fuzz.c +62 -0
  90. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/fuzz.sh +108 -0
  91. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/gzip_decompress/corpus/0 +0 -0
  92. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/gzip_decompress/fuzz.c +19 -0
  93. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/zlib_decompress/corpus/0 +3 -0
  94. data/ext/deflate_ruby/libdeflate/scripts/libFuzzer/zlib_decompress/fuzz.c +19 -0
  95. data/ext/deflate_ruby/libdeflate/scripts/run_tests.sh +416 -0
  96. data/ext/deflate_ruby/libdeflate/scripts/toolchain-i686-w64-mingw32.cmake +8 -0
  97. data/ext/deflate_ruby/libdeflate/scripts/toolchain-x86_64-w64-mingw32.cmake +8 -0
  98. data/lib/deflate_ruby/version.rb +5 -0
  99. data/lib/deflate_ruby.rb +71 -0
  100. metadata +191 -0
data/ext/deflate_ruby/libdeflate/README.md +228 -0
# Overview

libdeflate is a library for fast, whole-buffer DEFLATE-based compression and
decompression.

The supported formats are:

- DEFLATE (raw)
- zlib (a.k.a. DEFLATE with a zlib wrapper)
- gzip (a.k.a. DEFLATE with a gzip wrapper)

libdeflate is heavily optimized. It is significantly faster than the zlib
library, both for compression and decompression, and especially on x86 and ARM
processors. In addition, libdeflate provides optional high compression modes
that achieve a better compression ratio than zlib's "level 9".

libdeflate itself is a library. The following command-line programs that use
this library are also included:

* `libdeflate-gzip`, a program which can be a drop-in replacement for standard
  `gzip` under some circumstances. Note that `libdeflate-gzip` has some
  limitations; it is provided for convenience and is **not** meant to be the
  main use case of libdeflate. It needs a lot of memory to process large files,
  and it omits support for some infrequently-used options of GNU gzip.

* `benchmark`, a test program that does round-trip compression and decompression
  of the provided data, and measures the compression and decompression speed.
  It can use libdeflate, zlib, or a combination of the two.

* `checksum`, a test program that checksums the provided data with Adler-32 or
  CRC-32, and optionally measures the speed. It can use libdeflate or zlib.
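
For reference, here is what the two checksums measured by the `checksum` program compute. The sketch below is a minimal, unoptimized bitwise version in C. It is not libdeflate's code: libdeflate's own `libdeflate_adler32()` and `libdeflate_crc32()` use much faster, architecture-optimized implementations. The names `ref_adler32` and `ref_crc32` are hypothetical. They follow the `(running value, buffer, length)` argument order of the libdeflate functions, with initial values of 1 and 0 respectively.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 (RFC 1950): two running sums modulo 65521, the largest prime
 * below 2^16, packed as (s2 << 16) | s1. Pass 1 as the initial value. */
uint32_t ref_adler32(uint32_t adler, const void *buffer, size_t len)
{
    const uint8_t *p = buffer;
    uint32_t s1 = adler & 0xFFFF;
    uint32_t s2 = adler >> 16;

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + p[i]) % 65521;
        s2 = (s2 + s1) % 65521;
    }
    return (s2 << 16) | s1;
}

/* CRC-32 as used by gzip (RFC 1952): reflected polynomial 0xEDB88320,
 * processed one bit at a time. Pass 0 as the initial value; the bit
 * inversion before and after is done here, so calls can be chained. */
uint32_t ref_crc32(uint32_t crc, const void *buffer, size_t len)
{
    const uint8_t *p = buffer;

    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1)));
    }
    return ~crc;
}
```

Each function takes the previous return value as its first argument, so a checksum can be computed incrementally over several buffers. For example, `ref_crc32(ref_crc32(0, a, alen), b, blen)` equals the CRC-32 of `a` followed by `b`.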

For the release notes, see the [NEWS file](NEWS.md).

## Table of Contents

- [Building](#building)
  - [Using CMake](#using-cmake)
  - [Directly integrating the library sources](#directly-integrating-the-library-sources)
  - [Supported compilers](#supported-compilers)
- [API](#api)
- [Bindings for other programming languages](#bindings-for-other-programming-languages)
- [DEFLATE vs. zlib vs. gzip](#deflate-vs-zlib-vs-gzip)
- [Compression levels](#compression-levels)
- [Motivation](#motivation)
- [License](#license)

# Building

## Using CMake

libdeflate uses [CMake](https://cmake.org/). It can be built just like any
other CMake project, e.g. with:

    cmake -B build && cmake --build build

By default the following targets are built:

- The static library (normally called `libdeflate.a`)
- The shared library (normally called `libdeflate.so`)
- The `libdeflate-gzip` program, including its alias `libdeflate-gunzip`

Besides the standard CMake build and installation options, there are some
libdeflate-specific build options. See `CMakeLists.txt` for the list of these
options. To set an option, add `-DOPTION=VALUE` to the `cmake` command.

Prebuilt Windows binaries can be downloaded from
https://github.com/ebiggers/libdeflate/releases.

## Directly integrating the library sources

Although the official build system is CMake, care has been taken to keep the
library source files compilable directly, without a prerequisite configuration
step. Therefore, it is also fine to just add the library source files directly
to your application, without using CMake.

You should compile both `lib/*.c` and `lib/*/*.c`. You don't need to worry
about excluding irrelevant architecture-specific code, as this is already
handled in the source files themselves using `#ifdef`s.

If you are doing a freestanding build with `-ffreestanding`, you must add
`-DFREESTANDING` as well (matching what the `CMakeLists.txt` does).

## Supported compilers

- gcc: v4.9 and later
- clang: v3.9 and later (upstream), Xcode 8 and later (Apple)
- MSVC: Visual Studio 2015 and later
- Other compilers: any other C99-compatible compiler should work, though if your
  compiler pretends to be gcc, clang, or MSVC, it needs to be sufficiently
  compatible with the compiler it pretends to be.

The above are the minimums, but using a newer compiler allows more of the
architecture-optimized code to be built. libdeflate is most heavily optimized
for gcc and clang, but MSVC is supported fairly well now too.

The recommended optimization flag is `-O2`, and the `CMakeLists.txt` sets this
for release builds. `-O3` is fine too, but often `-O2` actually gives better
results. It's unnecessary to add flags such as `-mavx2` or `/arch:AVX2`, though
you can do so if you want to. Most of the relevant optimized functions are
built regardless of such flags, and appropriate ones are selected at runtime.
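
Runtime selection like this is commonly implemented with a function pointer that initially points at a dispatcher. On the first call, the dispatcher checks CPU features, picks an implementation, and overwrites the pointer, so later calls go straight to the chosen function. libdeflate's checksum code follows a similar pattern: see for example `lib/crc32.c`, with CPU feature detection in the `lib/*/cpu_features.c` files. The details differ, though. The sketch below makes simplifying assumptions. The workload (`byte_sum`) and both implementations are hypothetical stand-ins. The feature check uses the GCC/clang builtin `__builtin_cpu_supports()` on x86 only and falls back to the generic code everywhere else.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_func_t)(const uint8_t *p, size_t len);

/* Portable implementation: always available. */
static uint32_t sum_generic(const uint8_t *p, size_t len)
{
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += p[i];
    return s;
}

/* Stand-in for an optimized variant. Real code would compile this with
 * target-specific attributes and intrinsics; here it is just an unrolled
 * loop so that the sketch stays portable. */
static uint32_t sum_unrolled(const uint8_t *p, size_t len)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < len; i++)
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}

/* Assumed feature check: GCC/clang builtin on x86, "no" elsewhere. */
static int cpu_has_fast_path(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2");
#else
    return 0;
#endif
}

static uint32_t dispatch_sum(const uint8_t *p, size_t len);

/* Starts at the dispatcher, which replaces it on first use. */
static sum_func_t volatile sum_impl = dispatch_sum;

static uint32_t dispatch_sum(const uint8_t *p, size_t len)
{
    sum_func_t f = cpu_has_fast_path() ? sum_unrolled : sum_generic;
    sum_impl = f;       /* later calls skip the feature check */
    return f(p, len);
}

/* Public entry point: a single indirect call per invocation. */
uint32_t byte_sum(const uint8_t *p, size_t len)
{
    return sum_impl(p, len);
}
```

The point of the design is that the feature check runs once, not on every call, and that no compiler flag like `-mavx2` is needed for the portable fallback to remain usable on older CPUs.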

# API

libdeflate has a simple API that is not zlib-compatible. You can create
compressors and decompressors and use them to compress or decompress buffers.
See `libdeflate.h` for details.

There is currently no support for streaming. This has been considered, but it
always significantly increases complexity and slows down fast paths.
Unfortunately, at this point it remains a future TODO. So: if your application
compresses data in "chunks", say, less than 1 MB in size, then libdeflate is a
great choice for you; that's what it's designed to do. This is perfect for
certain use cases such as transparent filesystem compression. But if your
application compresses large files as a single compressed stream, similarly to
the `gzip` program, then libdeflate isn't for you.

Note that with chunk-based compression, you generally should have the
uncompressed size of each chunk stored outside of the compressed data itself.
This enables you to allocate an output buffer of the correct size without
guessing. However, libdeflate's decompression routines do optionally provide
the actual number of output bytes in case you need it.
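
One simple way to store each chunk's uncompressed size outside the compressed data is a small fixed-size header in front of every chunk. The sketch below assumes a hypothetical layout, not something defined by libdeflate or by any of the three formats. The header holds a 4-byte little-endian uncompressed size and a 4-byte little-endian compressed size, followed by the compressed bytes. The names `chunk_header`, `put_chunk_header`, and `get_chunk_header` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HEADER_SIZE 8

/* Hypothetical per-chunk header stored in front of each compressed chunk. */
struct chunk_header {
    uint32_t uncompressed_size; /* exact output buffer size to allocate */
    uint32_t compressed_size;   /* number of compressed bytes that follow */
};

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes the header into out[0 .. CHUNK_HEADER_SIZE). */
void put_chunk_header(uint8_t *out, const struct chunk_header *h)
{
    put_le32(out, h->uncompressed_size);
    put_le32(out + 4, h->compressed_size);
}

/* Parses a header from `in`. Returns 0 if fewer than CHUNK_HEADER_SIZE
 * bytes are available or if the declared compressed payload would run
 * past the end of the buffer; returns 1 otherwise. */
int get_chunk_header(const uint8_t *in, size_t avail, struct chunk_header *h)
{
    if (avail < CHUNK_HEADER_SIZE)
        return 0;
    h->uncompressed_size = get_le32(in);
    h->compressed_size = get_le32(in + 4);
    return h->compressed_size <= avail - CHUNK_HEADER_SIZE;
}
```

On the read side, `uncompressed_size` gives the exact output size to pass to a decompression routine, so there is no guessing and no retry loop with growing buffers. Fixed-width little-endian fields keep the layout independent of the host's byte order.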

Windows developers: note that the calling convention of libdeflate.dll is
"cdecl". (libdeflate v1.4 through v1.12 used "stdcall" instead.)

# Bindings for other programming languages

The libdeflate project itself only provides a C library. If you need to use
libdeflate from a programming language other than C or C++, consider using the
following bindings:

* C#: [LibDeflate.NET](https://github.com/jzebedee/LibDeflate.NET)
* Go: [go-libdeflate](https://github.com/4kills/go-libdeflate)
* Java: [libdeflate-java](https://github.com/astei/libdeflate-java)
* Julia: [LibDeflate.jl](https://github.com/jakobnissen/LibDeflate.jl)
* Nim: [libdeflate-nim](https://github.com/gemesa/libdeflate-nim)
* Perl: [Gzip::Libdeflate](https://github.com/benkasminbullock/gzip-libdeflate)
* PHP: [ext-libdeflate](https://github.com/pmmp/ext-libdeflate)
* Python: [deflate](https://github.com/dcwatson/deflate)
* Ruby: [libdeflate-ruby](https://github.com/kaorimatz/libdeflate-ruby)
* Rust: [libdeflater](https://github.com/adamkewley/libdeflater)

Note: these are third-party projects which haven't necessarily been vetted by
the authors of libdeflate. Please direct all questions, bugs, and improvements
for these bindings to their authors.

Also, unfortunately many of these bindings bundle or pin an old version of
libdeflate. To avoid known issues in old versions and to improve performance,
before using any of these bindings please ensure that the bundled or pinned
version of libdeflate has been upgraded to the latest release.

# DEFLATE vs. zlib vs. gzip

The DEFLATE format ([rfc1951](https://www.ietf.org/rfc/rfc1951.txt)), the zlib
format ([rfc1950](https://www.ietf.org/rfc/rfc1950.txt)), and the gzip format
([rfc1952](https://www.ietf.org/rfc/rfc1952.txt)) are commonly confused with
each other as well as with the [zlib software library](http://zlib.net), which
actually supports all three formats. libdeflate (this library) also supports
all three formats.

Briefly, DEFLATE is a raw compressed stream, whereas zlib and gzip are different
wrappers for this stream. Both zlib and gzip include checksums (Adler-32 and
CRC-32, respectively), but gzip can also include extra information such as the
original filename. Generally, you should choose a format as follows:

- If you are compressing whole files with no subdivisions, similar to the `gzip`
  program, you probably should use the gzip format.
- Otherwise, if you don't need the features of the gzip header and footer but do
  still want a checksum for corruption detection, you probably should use the
  zlib format.
- Otherwise, you probably should use raw DEFLATE. This is ideal if you don't
  need checksums, e.g. because they're simply not needed for your use case or
  because you already compute your own checksums that are stored separately from
  the compressed stream.

Note that gzip and zlib streams can be distinguished from each other based on
their starting bytes, but this is not necessarily true of raw DEFLATE streams.
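
The starting-byte check can be sketched as follows. Per RFC 1952, a gzip stream begins with the magic bytes 0x1F 0x8B followed by compression method 8 (DEFLATE). Per RFC 1950, a zlib stream begins with a two-byte header in which the low nibble of the first byte (CM) is 8, the high nibble (CINFO, the window size) is at most 7, and the 16-bit big-endian value of both bytes is a multiple of 31. Raw DEFLATE has no signature at all. `guess_format` and its enum are hypothetical names; this is not a libdeflate API.

```c
#include <stddef.h>
#include <stdint.h>

enum stream_format { FORMAT_UNKNOWN, FORMAT_GZIP, FORMAT_ZLIB };

/* Guesses the container format from the first bytes of a buffer.
 * FORMAT_UNKNOWN does not mean "invalid": raw DEFLATE has no signature,
 * so it can only be attempted, not recognized. */
enum stream_format guess_format(const uint8_t *p, size_t len)
{
    /* gzip: ID1 = 0x1F, ID2 = 0x8B, CM = 8 (DEFLATE). */
    if (len >= 3 && p[0] == 0x1F && p[1] == 0x8B && p[2] == 8)
        return FORMAT_GZIP;

    /* zlib: CM = 8, CINFO <= 7, and (CMF * 256 + FLG) % 31 == 0.
     * Raw DEFLATE data can pass this test by coincidence, so treat
     * the result as a strong hint rather than a guarantee. */
    if (len >= 2 && (p[0] & 0x0F) == 8 && (p[0] >> 4) <= 7 &&
        (((unsigned)p[0] << 8) | p[1]) % 31 == 0)
        return FORMAT_ZLIB;

    return FORMAT_UNKNOWN;
}
```

A gzip stream can never be mistaken for zlib here, because 0x1F has a CM nibble of 15 rather than 8. The converse ambiguity, raw DEFLATE that happens to look like a zlib header, is exactly why the text above says raw streams cannot be reliably distinguished.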

# Compression levels

An often-underappreciated fact of compression formats such as DEFLATE is that
there are an enormous number of different ways that a given input could be
compressed. Different algorithms and different amounts of computation time will
result in different compression ratios, while remaining equally compatible with
the decompressor.

For this reason, the commonly used zlib library provides nine compression
levels. Level 1 is the fastest but provides the worst compression; level 9
provides the best compression but is the slowest. It defaults to level 6.
libdeflate follows the same design but aims to improve on both zlib's
performance *and* compression ratio at every compression level. In addition,
libdeflate's levels go [up to 12](https://xkcd.com/670/) to make room for a
minimum-cost-path based algorithm (sometimes called "optimal parsing") that can
significantly improve on zlib's compression ratio.
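
To make "minimum-cost-path" concrete, treat each input position as a node in a graph. Each position has one edge for emitting a literal and one edge for every usable LZ77 match starting there. The cheapest encoding of the input is then the cheapest path from the first position to the end, which dynamic programming can find by working backwards from the end of the input. The toy sketch below assumes a fixed cost model (9 bits per literal, 24 bits per match of any length from 3 to 258) and a brute-force match search. These numbers and the function names are illustrative assumptions. libdeflate's actual implementation in `lib/deflate_compress.c` is considerably more involved; for instance, its costs are derived from Huffman codes rather than fixed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LITERAL_COST 9     /* assumed cost of one literal, in bits */
#define MATCH_COST 24      /* assumed flat cost of one match, in bits */
#define MIN_MATCH 3        /* DEFLATE's minimum match length */
#define MAX_MATCH 258      /* DEFLATE's maximum match length */
#define MAX_OFFSET 32768   /* DEFLATE's maximum match distance */

/* Longest match at position i against any earlier position, found by brute
 * force over all offsets. Overlapping matches (offset < length) are allowed,
 * as in DEFLATE. Every shorter length at the same offset also matches. */
static size_t longest_match(const uint8_t *in, size_t n, size_t i)
{
    size_t best = 0;
    size_t max_len = (n - i < MAX_MATCH) ? n - i : MAX_MATCH;

    for (size_t off = 1; off <= i && off <= MAX_OFFSET; off++) {
        size_t len = 0;
        while (len < max_len && in[i + len] == in[i + len - off])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* Minimum total cost (in bits, under the toy model) of encoding in[0..n)
 * as literals and matches. cost[i] is the cheapest way to encode the
 * suffix starting at i. Returns SIZE_MAX if allocation fails. */
size_t optimal_parse_cost(const uint8_t *in, size_t n)
{
    size_t *cost = malloc((n + 1) * sizeof(*cost));
    size_t result;

    if (cost == NULL)
        return SIZE_MAX;
    cost[n] = 0;  /* nothing left to encode */
    for (size_t i = n; i-- > 0;) {
        size_t maxlen = longest_match(in, n, i);

        cost[i] = LITERAL_COST + cost[i + 1];      /* literal edge */
        for (size_t len = MIN_MATCH; len <= maxlen; len++) {
            size_t c = MATCH_COST + cost[i + len]; /* match edge */
            if (c < cost[i])
                cost[i] = c;
        }
    }
    result = cost[0];
    free(cost);
    return result;
}
```

The backward pass is what distinguishes this approach from greedy parsing. A greedy parser always takes the longest match at the current position. The cost table, in contrast, lets the parser choose a shorter match or a literal whenever that leads to a cheaper path overall.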

If you are using DEFLATE (or zlib, or gzip) in your application, you should test
different levels to see which works best for your application.

# Motivation

Despite DEFLATE's widespread use mainly through the zlib library, in the
compression community this format from the early 1990s is often considered
obsolete. And in a few significant ways, it is.

So why implement DEFLATE at all, instead of focusing entirely on
bzip2/LZMA/xz/LZ4/LZX/ZSTD/Brotli/LZHAM/LZFSE/[insert cool new format here]?

To do something better, you need to understand what came before. And it turns
out that most ideas from DEFLATE are still relevant. Many of the newer formats
share a structure similar to DEFLATE's, with different tweaks. The effects of
trivial but very useful tweaks, such as increasing the sliding window size, are
often confused with the effects of nontrivial but less useful tweaks. And
actually, many of these formats are similar enough that common algorithms and
optimizations (e.g. those dealing with LZ77 matchfinding) can be reused.

In addition, comparing compressors fairly is difficult because the performance
of a compressor depends heavily on optimizations which are not intrinsic to the
compression format itself. In this respect, the zlib library sometimes compares
poorly to certain newer code because zlib is not well optimized for modern
processors. libdeflate addresses this by providing an optimized DEFLATE
implementation which can be used for benchmarking purposes. And, of course,
real applications can use it as well.

# License

libdeflate is [MIT-licensed](COPYING).

I am not aware of any patents or patent applications relevant to libdeflate.