cabriolet 0.1.2 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.adoc +700 -38
- data/lib/cabriolet/algorithm_factory.rb +250 -0
- data/lib/cabriolet/base_compressor.rb +206 -0
- data/lib/cabriolet/binary/bitstream.rb +154 -14
- data/lib/cabriolet/binary/bitstream_writer.rb +129 -17
- data/lib/cabriolet/binary/chm_structures.rb +2 -2
- data/lib/cabriolet/binary/hlp_structures.rb +258 -37
- data/lib/cabriolet/binary/lit_structures.rb +231 -65
- data/lib/cabriolet/binary/oab_structures.rb +17 -1
- data/lib/cabriolet/cab/command_handler.rb +226 -0
- data/lib/cabriolet/cab/compressor.rb +35 -43
- data/lib/cabriolet/cab/decompressor.rb +14 -19
- data/lib/cabriolet/cab/extractor.rb +140 -31
- data/lib/cabriolet/chm/command_handler.rb +227 -0
- data/lib/cabriolet/chm/compressor.rb +7 -3
- data/lib/cabriolet/chm/decompressor.rb +39 -21
- data/lib/cabriolet/chm/parser.rb +5 -2
- data/lib/cabriolet/cli/base_command_handler.rb +127 -0
- data/lib/cabriolet/cli/command_dispatcher.rb +140 -0
- data/lib/cabriolet/cli/command_registry.rb +83 -0
- data/lib/cabriolet/cli.rb +356 -607
- data/lib/cabriolet/compressors/base.rb +1 -1
- data/lib/cabriolet/compressors/lzx.rb +241 -54
- data/lib/cabriolet/compressors/mszip.rb +35 -3
- data/lib/cabriolet/compressors/quantum.rb +34 -45
- data/lib/cabriolet/decompressors/base.rb +1 -1
- data/lib/cabriolet/decompressors/lzss.rb +13 -3
- data/lib/cabriolet/decompressors/lzx.rb +70 -33
- data/lib/cabriolet/decompressors/mszip.rb +126 -39
- data/lib/cabriolet/decompressors/quantum.rb +3 -2
- data/lib/cabriolet/errors.rb +3 -0
- data/lib/cabriolet/file_entry.rb +156 -0
- data/lib/cabriolet/file_manager.rb +144 -0
- data/lib/cabriolet/hlp/command_handler.rb +282 -0
- data/lib/cabriolet/hlp/compressor.rb +28 -238
- data/lib/cabriolet/hlp/decompressor.rb +107 -147
- data/lib/cabriolet/hlp/parser.rb +52 -101
- data/lib/cabriolet/hlp/quickhelp/compression_stream.rb +138 -0
- data/lib/cabriolet/hlp/quickhelp/compressor.rb +626 -0
- data/lib/cabriolet/hlp/quickhelp/decompressor.rb +558 -0
- data/lib/cabriolet/hlp/quickhelp/huffman_stream.rb +74 -0
- data/lib/cabriolet/hlp/quickhelp/huffman_tree.rb +167 -0
- data/lib/cabriolet/hlp/quickhelp/parser.rb +274 -0
- data/lib/cabriolet/hlp/winhelp/btree_builder.rb +289 -0
- data/lib/cabriolet/hlp/winhelp/compressor.rb +400 -0
- data/lib/cabriolet/hlp/winhelp/decompressor.rb +192 -0
- data/lib/cabriolet/hlp/winhelp/parser.rb +484 -0
- data/lib/cabriolet/hlp/winhelp/zeck_lz77.rb +271 -0
- data/lib/cabriolet/huffman/tree.rb +85 -1
- data/lib/cabriolet/kwaj/command_handler.rb +213 -0
- data/lib/cabriolet/kwaj/compressor.rb +7 -3
- data/lib/cabriolet/kwaj/decompressor.rb +18 -12
- data/lib/cabriolet/lit/command_handler.rb +221 -0
- data/lib/cabriolet/lit/compressor.rb +633 -38
- data/lib/cabriolet/lit/decompressor.rb +518 -152
- data/lib/cabriolet/lit/parser.rb +670 -0
- data/lib/cabriolet/models/hlp_file.rb +130 -29
- data/lib/cabriolet/models/hlp_header.rb +105 -17
- data/lib/cabriolet/models/lit_header.rb +212 -25
- data/lib/cabriolet/models/szdd_header.rb +10 -2
- data/lib/cabriolet/models/winhelp_header.rb +127 -0
- data/lib/cabriolet/oab/command_handler.rb +257 -0
- data/lib/cabriolet/oab/compressor.rb +17 -8
- data/lib/cabriolet/oab/decompressor.rb +41 -10
- data/lib/cabriolet/offset_calculator.rb +81 -0
- data/lib/cabriolet/plugin.rb +233 -0
- data/lib/cabriolet/plugin_manager.rb +453 -0
- data/lib/cabriolet/plugin_validator.rb +422 -0
- data/lib/cabriolet/system/io_system.rb +3 -0
- data/lib/cabriolet/system/memory_handle.rb +17 -4
- data/lib/cabriolet/szdd/command_handler.rb +217 -0
- data/lib/cabriolet/szdd/compressor.rb +15 -11
- data/lib/cabriolet/szdd/decompressor.rb +18 -9
- data/lib/cabriolet/version.rb +1 -1
- data/lib/cabriolet.rb +67 -17
- metadata +33 -2
data/README.adoc (CHANGED)
@@ -1,6 +1,4 @@
-= Cabriolet
-:toc: left
-:toclevels: 3
+= Cabriolet: Working with Microsoft Compression Formats in Pure Ruby
 
 image:https://img.shields.io/gem/v/cabriolet.svg[RubyGems Version, link=https://rubygems.org/gems/cabriolet]
 image:https://img.shields.io/github/license/omnizip/cabriolet.svg[License]
@@ -10,15 +8,88 @@ format files.
 
 == Introduction
 
-Cabriolet extracts and creates Microsoft
+Cabriolet extracts and creates Microsoft compression files and related
 compression formats using pure Ruby.
 
-This gem
-Microsoft compression formats for both extraction (decompression) and
-(compression).
+This gem aims to cover the features of libmspack and cabextract, implementing
+all Microsoft compression formats for both extraction (decompression) and
+creation (compression).
 
 NOTE: No C extensions required, works on any platform where Ruby runs.
 
+== Supported formats
+
+Cabriolet provides complete bidirectional support (compression and
+decompression) for seven Microsoft compression formats (HLP has two variants):
+
+CAB (Microsoft Cabinet)::
+Microsoft Cabinet files (.CAB) are archive files used extensively in Windows
+software distribution, updates, and installations. They support multiple
+compression algorithms (None, LZSS, MSZIP, LZX, Quantum), multi-part spanning,
+and can store multiple files with full metadata preservation including
+timestamps and attributes. Cabriolet provides complete CAB support including
+multi-part cabinet sets, embedded cabinet search, and salvage mode for corrupted
+files.
+
+CHM (Compiled HTML Help)::
+Compiled HTML Help files (.CHM) are Microsoft's compressed help file format used
+in Windows applications since Windows 98. CHM files use an internal file system
+to store HTML pages, images, stylesheets, and a full-text search index, all
+compressed with LZX. Cabriolet can extract CHM contents to recreate the original
+HTML documentation, and create new CHM files from HTML sources with proper
+compression and indexing.
+
+SZDD (Single-File LZSS)::
+SZDD is Microsoft's single-file compression format used primarily in Windows
+installation media and DOS utilities. Files compressed with SZDD typically have
+the last character of their extension replaced with an underscore (e.g., .TX_
+for .TXT). SZDD uses LZSS MODE_EXPAND compression with a 4KB sliding window.
+Cabriolet supports both normal SZDD format and the QBasic variant, with
+automatic filename reconstruction during extraction.
+
+KWAJ (Installation File)::
+KWAJ format (.KWJ) is used in Microsoft installation packages to compress
+individual files. It supports multiple compression methods including
+uncompressed storage, XOR encryption (0xFF), SZDD (LZSS), and MSZIP. KWAJ files
+can embed the original filename and uncompressed size in the header. Cabriolet
+provides full KWAJ support for all compression methods and can preserve or
+reconstruct original filenames.
+
+DOS Help (QuickHelp)::
+QuickHelp (.HLP) is the DOS-based help file format used in Microsoft development
+tools like QuickC, QuickBASIC, and early Visual C++. Identified by the signature
+0x4C 0x4E ("LN"), QuickHelp files contain help topics compressed with optional
+Huffman coding and LZSS MODE_MSHELP compression. Topics are organized with
+context strings for navigation. Cabriolet fully supports creating and extracting
+QuickHelp files with all compression options.
+
+Windows Help (WinHelp)::
+Windows Help (.HLP) is the help file format used in Windows 3.x through Windows
+XP, distinct from DOS Help/QuickHelp. WinHelp files are identified by magic
+numbers 0x35F3 (version 3.x) or 0x3F5F (version 4.x) and use an internal file
+system containing |SYSTEM (metadata), |TOPIC (compressed help text), and
+optionally B-tree indexes. Topics are compressed with Zeck LZ77, a custom LZ77
+variant with a 4KB sliding window and variable-length matches (3-271 bytes).
+Cabriolet provides complete support for both WinHelp 3.x and 4.x formats with
+bidirectional Zeck LZ77 compression.
+
+LIT (Microsoft Reader eBooks)::
+LIT is Microsoft's proprietary eBook format for the Microsoft Reader
+application. LIT files use a complex internal structure with directory systems
+(IFCM/AOLL), a manifest with content-type mappings, and a NameList with UTF-16LE
+encoding. Content is typically compressed with LZX. Cabriolet supports reading
+and creating non-encrypted LIT files; DRM-protected (DES-encrypted) LIT files
+are intentionally not supported, because DRM circumvention is not a goal of this
+project.
+
+OAB (Offline Address Book)::
+Offline Address Book files (.OAB) are used by Microsoft Outlook and Exchange
+Server to provide offline access to address book data. OAB files are compressed
+with LZX and support incremental updates through patch files that contain only
+changes from a base version. Cabriolet can extract full OAB files, apply
+incremental patches, create new OAB files, and generate incremental patches
+between versions.
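The SZDD entry in the Supported formats list says SZDD uses LZSS MODE_EXPAND with a 4KB sliding window and that Cabriolet reconstructs the original filename. A minimal sketch of that decoding path follows, written under stated assumptions rather than taken from Cabriolet's source: the header layout (8-byte magic, mode byte `A`, the missing filename character, and a 32-bit little-endian uncompressed size) and the window rules (buffer pre-filled with spaces, writing starting at offset 4096 - 16, control bits read least-significant first, 1 = literal, 0 = two-byte match of length 3-18) follow libmspack's description of the format. `parse_szdd_header`, `lzss_expand`, `expand_szdd`, and `reconstruct_name` are hypothetical helper names; the QBasic variant and the other LZSS modes are not handled.

```ruby
# Hypothetical SZDD expansion sketch (normal SZDD variant only).
SZDD_MAGIC = "SZDD\x88\xF0\x27\x33".b
LZSS_WINDOW = 4096

# Parse the 14-byte header: magic (8), mode byte (1), missing filename
# character (1), uncompressed size as a 32-bit little-endian integer (4).
def parse_szdd_header(bytes)
  bytes = bytes.b
  raise ArgumentError, "not an SZDD file" unless bytes.byteslice(0, 8) == SZDD_MAGIC

  mode, missing_char, size = bytes.byteslice(8, 6).unpack("aaV")
  { mode: mode, missing_char: missing_char, size: size, data_offset: 14 }
end

# LZSS MODE_EXPAND: 4KB window pre-filled with spaces, write position starting
# at 4096 - 16. Control bytes are read LSB first: 1 = literal byte,
# 0 = two-byte match (12-bit window position, 4-bit length + 3).
def lzss_expand(input, out_size)
  bytes = input.bytes
  window = Array.new(LZSS_WINDOW, 0x20)
  mask = LZSS_WINDOW - 1
  pos = LZSS_WINDOW - 16
  out = String.new(encoding: Encoding::BINARY)
  i = 0

  # Append one byte to the output and to the sliding window.
  emit = lambda do |byte|
    out << byte
    window[pos] = byte
    pos = (pos + 1) & mask
  end

  while i < bytes.size && out.bytesize < out_size
    control = bytes[i]
    i += 1
    8.times do |bit|
      break if i >= bytes.size || out.bytesize >= out_size

      if control[bit] == 1
        emit.call(bytes[i])
        i += 1
      else
        break if i + 1 >= bytes.size # truncated match pair

        lo, hi = bytes[i], bytes[i + 1]
        i += 2
        match_pos = lo | ((hi & 0xF0) << 4)
        ((hi & 0x0F) + 3).times do
          emit.call(window[match_pos])
          match_pos = (match_pos + 1) & mask
        end
      end
    end
  end
  out.byteslice(0, out_size)
end

def expand_szdd(bytes)
  header = parse_szdd_header(bytes)
  lzss_expand(bytes.b.byteslice(header[:data_offset]..) || "".b, header[:size])
end

# "README.TX_" with missing char "T" becomes "README.TXT".
def reconstruct_name(name, missing_char)
  return name if missing_char.nil? || missing_char == "\0" || !name.end_with?("_")

  name[0...-1] + missing_char
end
```

Usage would look like `expand_szdd(File.binread("SETUP.EX_"))`. Writing every output byte back into the window is what lets a match overlap its own output, which is how LZSS encodes runs.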
+
 
 === Features
 
@@ -49,7 +120,7 @@ NOTE: No C extensions required, works on any platform where Ruby runs.
 ** Metadata preservation (timestamps, attributes)
 
 * **Pure Ruby** - No compilation needed, works everywhere
-* **Comprehensive testing** -
+* **Comprehensive testing** - 1,225 test examples, 0 failures
 * **Complete CLI** - 30+ commands for all operations
 
 === Architecture
@@ -70,6 +141,190 @@ Application Layer (CLI/API)
 
 For complete architecture, see link:ARCHITECTURE.md[Architecture Documentation].
 
+== Comparison with libmspack
+
+Cabriolet is a pure Ruby alternative to https://www.cabextract.org.uk/libmspack/[libmspack], the reference C implementation for Microsoft compression formats. This comparison helps you choose the right tool for your needs.
+
+=== Feature Comparison
+
+[cols="2,1,1,2"]
+|===
+|Feature |Cabriolet |libmspack |Notes
+
+4+h|**Formats**
+
+|CAB (Microsoft Cabinet)
+|✅
+|✅
+|Both support all compression types
+
+|CHM (Compiled HTML Help)
+|✅
+|✅
+|Full bidirectional support
+
+|SZDD (Single-file LZSS)
+|✅
+|✅
+|Including QBasic variant
+
+|KWAJ (Installation files)
+|✅
+|✅
+|All compression methods
+
+|HLP (Windows Help)
+|✅
+|❌
+|Cabriolet-only: QuickHelp + WinHelp 3.x/4.x
+
+|LIT (Microsoft Reader)
+|✅
+|✅
+|Non-DRM files only
+
+|OAB (Offline Address Book)
+|✅
+|✅
+|Including incremental patches
+
+4+h|**Compression Algorithms**
+
+|None (uncompressed)
+|✅
+|✅
+|
+
+|LZSS (4KB window)
+|✅
+|✅
+|3 modes: EXPAND, MSHELP, QBASIC
+
+|MSZIP (DEFLATE)
+|✅
+|✅
+|RFC 1951 compatible
+
+|LZX (advanced)
+|✅
+|✅
+|Intel E8 preprocessing, 32KB-2MB windows
+
+|Quantum (arithmetic)
+|✅
+|✅
+|Decompression production-ready; compression experimental
+
+4+h|**Operations**
+
+|Decompression
+|✅
+|✅
+|
+
+|Compression
+|✅
+|⚠️
+|libmspack has limited compression support
+
+|Multi-part cabinets
+|✅
+|✅
+|Spanning and merging
+
+|Embedded cabinet search
+|✅
+|✅
+|
+
+|Salvage mode
+|✅
+|✅
+|Corrupted file recovery
+
+|Checksum verification
+|✅
+|✅
+|
+
+4+h|**Platform & Integration**
+
+|Pure Ruby / No compilation
+|✅
+|❌
+|Cabriolet works everywhere Ruby runs
+
+|C library performance
+|❌
+|✅
+|libmspack is faster for large files
+
+|Ruby native integration
+|✅
+|⚠️
+|libmspack requires FFI bindings
+
+|JRuby / TruffleRuby
+|✅
+|❌
+|Cabriolet works on all Ruby implementations
+
+|Windows native
+|✅
+|⚠️
+|libmspack needs compilation on Windows
+|===
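The LZX row of the feature comparison mentions Intel E8 preprocessing. A minimal sketch of the idea, assuming the transform as libmspack's LZX decoder applies it: for each 0xE8 byte (an x86 CALL opcode) before the last 10 bytes of the data, the following signed 32-bit little-endian operand is converted between absolute and relative form using the current position and a translation size. The decode direction mirrors that logic; the encode direction is derived here as its inverse. The helper names are hypothetical, and framing details (how positions and the translation size are supplied per 32KB frame) are left out.

```ruby
# Sketch of LZX Intel E8 call translation (hypothetical helper names).
# Positions are byte offsets into +buf+ plus +start_pos+; +translation_size+
# is the "file size" parameter of the transform.

# Shared scanner: visits each 0xE8 byte before the last 10 bytes, passes the
# following signed 32-bit little-endian operand to the block, writes the
# block's result back, and skips past the operand.
def e8_scan!(buf, start_pos)
  raise ArgumentError, "expected a binary string" unless buf.encoding == Encoding::BINARY

  i = 0
  limit = buf.bytesize - 10
  while i < limit
    if buf.getbyte(i) == 0xE8
      value = buf.byteslice(i + 1, 4).unpack1("l<")
      buf[i + 1, 4] = [yield(value, start_pos + i)].pack("l<")
      i += 5
    else
      i += 1
    end
  end
  buf
end

# Decoder direction (as in libmspack's LZX decoder): absolute -> relative.
def e8_decode!(buf, translation_size, start_pos = 0)
  e8_scan!(buf, start_pos) do |abs, cur|
    next abs unless abs >= -cur && abs < translation_size

    abs >= 0 ? abs - cur : abs + translation_size
  end
end

# Encoder direction, derived as the inverse: relative -> absolute.
def e8_encode!(buf, translation_size, start_pos = 0)
  e8_scan!(buf, start_pos) do |rel, cur|
    next rel unless rel >= -cur && rel < translation_size

    rel < translation_size - cur ? rel + cur : rel - translation_size
  end
end
```

Because the scanner always skips the four operand bytes after an 0xE8, the encode and decode passes visit exactly the same positions, which is what makes the round trip lossless. Repeated calls to the same target become identical absolute values, which LZ matching then compresses well.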
+
+=== When to Use Cabriolet
+
+* **Pure Ruby environment** - No compilation or native dependencies needed
+* **Cross-platform deployment** - Works identically on Linux, macOS, and Windows
+* **Alternative Ruby implementations** - JRuby, TruffleRuby, etc.
+* **HLP file support** - Only Cabriolet supports Windows Help files
+* **Compression support** - Full bidirectional support for all formats
+* **Simplicity** - Single gem install, no system dependencies
+
+=== When to Use libmspack
+
+* **Maximum performance** - C implementation is faster for large files
+* **Existing C/C++ codebase** - Native integration without Ruby
+* **Memory-constrained environments** - Lower memory overhead
+* **Battle-tested stability** - 20+ years of production use
+
+=== Performance Comparison
+
+[cols="1,1,1"]
+|===
+|Operation |Cabriolet |libmspack
+
+|Small CAB (<1MB)
+|~50ms
+|~10ms
+
+|Large CAB (100MB)
+|~5s
+|~1s
+
+|CHM extraction
+|~100ms
+|~20ms
+
+|Memory usage
+|Higher
+|Lower
+|===
+
+NOTE: Performance varies by file content and compression type. For most applications, Cabriolet's performance is adequate. Use libmspack via https://github.com/davispuh/ruby-libmspack[FFI bindings] if raw speed is critical.
+
+=== libmspack Compatibility
+
+Cabriolet maintains **100% compatibility** with libmspack's behavior through extensive parity testing:
+
+* **73 libmspack parity tests** - All passing
+* **Identical output** - MD5-verified extraction results
+* **Same error handling** - Compatible error conditions
+* **CVE coverage** - Tests for known vulnerabilities (CVE-2014-9732, CVE-2015-4467, etc.)
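The compatibility list mentions MD5-verified extraction results. One way to run that kind of check on your own archives is to extract the same file with Cabriolet and with a reference tool such as cabextract into two directories, then compare per-file MD5 digests. The helpers below (`md5_tree`, `tree_differences`) are hypothetical and use only Ruby's standard library; they are not the project's parity test suite.

```ruby
require "digest/md5"

# Map every regular file under +root+ (path relative to root) to its MD5.
def md5_tree(root)
  Dir.glob("**/*", File::FNM_DOTMATCH, base: root).each_with_object({}) do |rel, digests|
    path = File.join(root, rel)
    digests[rel] = Digest::MD5.file(path).hexdigest if File.file?(path)
  end
end

# Relative paths that exist on only one side or whose digests differ.
# An empty array means the two extraction trees match.
def tree_differences(dir_a, dir_b)
  a = md5_tree(dir_a)
  b = md5_tree(dir_b)
  (a.keys | b.keys).sort.reject { |rel| a[rel] == b[rel] }
end

# Example: tree_differences("out/cabriolet", "out/cabextract")
```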
+
 == Installation
 
 Add to your Gemfile:
@@ -321,20 +576,33 @@ cabriolet kwaj-info setup.kwj
 
 ==== HLP (Windows Help) operations
 
-
+Cabriolet supports both HLP format variants:
+
+* **QuickHelp** - DOS-based format (0x4C 0x4E signature)
+* **Windows Help** - Windows 3.x/4.x format (0x35F3/0x3F5F signatures)
+
+===== Extract HLP file (auto-detects format)
 
 [source,shell]
 ----
 cabriolet hlp-extract help.hlp output/
 ----
 
-===== Create
+===== Create QuickHelp file
 
 [source,shell]
 ----
 cabriolet hlp-create output.hlp topic1.txt topic2.txt
 ----
 
+===== Create Windows Help file (3.x or 4.x)
+
+[source,shell]
+----
+cabriolet hlp-create output.hlp topic1.txt topic2.txt --format winhelp3
+cabriolet hlp-create output.hlp topic1.txt topic2.txt --format winhelp4
+----
+
 ===== Show HLP information
 
 [source,shell]
@@ -664,35 +932,84 @@ bytes = compressor.compress("file.exe", "file.kwj",
 
 ==== HLP (Windows Help) operations
 
-===== Extract HLP file
+===== Extract HLP file (auto-detects format)
 
 [source,ruby]
 ----
+# Works with both QuickHelp and Windows Help formats
 decompressor = Cabriolet::HLP::Decompressor.new
-
+header = decompressor.open("help.hlp")
+
+# Format is automatically detected
+case header
+when Cabriolet::Models::HLPHeader
+  puts "QuickHelp format (DOS)"
+when Cabriolet::Models::WinHelpHeader
+  puts "Windows Help format (#{header.version_string})"
+end
 
 # Extract files
-
-decompressor.extract_file(file, "output/#{file.filename}")
-end
+decompressor.extract_all(header, "output/")
 ----
 
-===== Create
+===== Create QuickHelp file
 
 [source,ruby]
 ----
 compressor = Cabriolet::HLP::Compressor.new
 
-# Add
-compressor.
-compressor.
+# Add topics
+compressor.add_data("Topic 1 text", "topic1")
+compressor.add_data("Topic 2 text", "topic2")
+
+# Generate QuickHelp format (DOS)
+bytes = compressor.generate("help.hlp",
+                            database_name: "MyHelp",
+                            control_character: 0x3A) # ':'
+----
+
+===== Create Windows Help file
 
-
-
+[source,ruby]
+----
+# Create WinHelp 3.x format file
+compressor = Cabriolet::HLP::WinHelp::Compressor.new
+
+# Add system metadata
+compressor.add_system_file(
+  title: "My Help File",
+  copyright: "Copyright 2025",
+  contents: "contents.hlp")
+
+# Add topics (automatically compressed with Zeck LZ77)
+compressor.add_topic_file(["Topic 1 text", "Topic 2 text"], compress: true)
+
+# Generate WinHelp 3.x or 4.x
+bytes = compressor.generate("help.hlp", version: :winhelp3)
+# or version: :winhelp4 for WinHelp 4.x format
+----
+
+===== Extract Windows Help internal files
+
+[source,ruby]
 ----
+decompressor = Cabriolet::HLP::WinHelp::Decompressor.new("help.hlp")
+header = decompressor.parse
 
-
-
+# List internal files (|SYSTEM, |TOPIC, etc.)
+puts decompressor.internal_filenames
+
+# Extract specific internal file
+system_data = decompressor.extract_system_file
+topic_data = decompressor.extract_topic_file
+
+# Decompress topics (expected_size is the uncompressed size; not defined here)
+if topic_data
+  decompressed = decompressor.decompress_topic(topic_data, expected_size)
+end
+----
+
+NOTE: Windows Help format has limited public documentation. The implementation is based on reverse engineering and the helpdeco project.
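The HLP examples rely on automatic format detection. A sketch of signature-based sniffing across formats covered in this README follows, assuming the commonly documented leading magic bytes: `MSCF` for CAB, `ITSF` for CHM, `ITOLITLS` for LIT, the 8-byte SZDD, QBasic SZDD, and KWAJ signatures used by libmspack, and the QuickHelp `LN` signature stated in the HLP operations section. `sniff_format` is a hypothetical helper, not Cabriolet's detector. WinHelp and OAB are deliberately left out, and the 2-byte `LN` check is weak enough that a real detector should validate further header fields.

```ruby
# Hypothetical signature sniffer. Longer signatures are checked first;
# WinHelp and OAB are not covered.
FORMAT_SIGNATURES = [
  ["ITOLITLS".b, :lit],
  ["SZDD\x88\xF0\x27\x33".b, :szdd],
  ["SZ \x88\xF0\x27\x33\xD1".b, :szdd_qbasic],
  ["KWAJ\x88\xF0\x27\xD1".b, :kwaj],
  ["MSCF".b, :cab],
  ["ITSF".b, :chm],
  ["LN".b, :quickhelp] # weak 2-byte check; validate more fields in real code
].freeze

# Returns a format symbol, or nil when no signature matches.
def sniff_format(header_bytes)
  bytes = header_bytes.b
  match = FORMAT_SIGNATURES.find { |magic, _format| bytes.start_with?(magic) }
  match&.last
end

# Example: sniff_format(File.binread("help.hlp", 16))
```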
 
 ==== LIT (eBook) operations
 
@@ -805,6 +1122,298 @@ custom_io = CustomIOSystem.new
 decompressor = Cabriolet::CAB::Decompressor.new(custom_io)
 ----
 
+=== Custom Algorithm Registration
+
+Cabriolet allows you to register custom compression/decompression algorithms with the link:lib/cabriolet/algorithm_factory.rb[`AlgorithmFactory`]. This enables:
+
+* **Custom implementations** of standard algorithms for optimization
+* **Experimental algorithms** for research and development
+* **Format-specific variations** of compression algorithms
+* **Testing environments** with isolated algorithm sets
+
+==== Registering a Custom Algorithm
+
+[source,ruby]
+----
+# Define your custom algorithm (must inherit from Base)
+class MyOptimizedLZX < Cabriolet::Decompressors::Base
+  def decompress(input_size, output_size)
+    # Your optimized implementation
+    data = @input.read(input_size)
+    decompressed_data = data # replace with your custom decompression logic
+    @output.write(decompressed_data)
+    output_size
+  end
+end
+
+# Register globally
+Cabriolet.algorithm_factory.register(
+  :optimized_lzx,
+  MyOptimizedLZX,
+  category: :decompressor,
+  priority: 10 # Higher priority = preferred over built-ins
+)
+
+# Use in extraction (automatically uses your custom algorithm)
+decompressor = Cabriolet::CAB::Decompressor.new("archive.cab")
+# When extracting LZX folders, your algorithm will be used
+----
+
+==== Per-Instance Custom Factory
+
+For isolated testing or experimentation without affecting global state:
+
+[source,ruby]
+----
+# Create custom factory without built-in algorithms
+custom_factory = Cabriolet::AlgorithmFactory.new(auto_register: false)
+
+# Register only your algorithms
+custom_factory.register(:my_algo, MyAlgorithm, category: :decompressor)
+
+# Create decompressor instances with custom factory
+# (Note: Not all format handlers currently support custom factories)
+decompressor = Cabriolet::CAB::Decompressor.new
+# Custom factory usage would be implemented by format handlers
+----
+
+==== Replacing Built-in Algorithms
+
+You can replace built-in algorithms with optimized versions:
+
+[source,ruby]
+----
+# Unregister the built-in
+Cabriolet.algorithm_factory.unregister(:lzss, :decompressor)
+
+# Register your optimized version
+Cabriolet.algorithm_factory.register(
+  :lzss,
+  MyOptimizedLZSS,
+  category: :decompressor,
+  priority: 10
+)
+
+# All future LZSS decompression will use your implementation
+----
+
+==== Format-Specific Algorithms
+
+Register algorithms that only apply to specific formats:
+
+[source,ruby]
+----
+# Register CAB-specific LZX variant
+Cabriolet.algorithm_factory.register(
+  :cab_lzx,
+  CABOptimizedLZX,
+  category: :decompressor,
+  format: :cab # Only used for CAB files
+)
+
+# Register CHM-specific variant
+Cabriolet.algorithm_factory.register(
+  :chm_lzx,
+  CHMOptimizedLZX,
+  category: :decompressor,
+  format: :chm # Only used for CHM files
+)
+----
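The registration examples combine a `priority:` option with an optional `format:` scope. To make one plausible selection rule concrete, here is a self-contained model: among entries with the requested name and category, drop those scoped to a different format, then pick the highest priority, preferring a format-specific entry on ties. `MiniRegistry` is hypothetical and is not how Cabriolet's `AlgorithmFactory` is implemented; unlike the examples, which give format-specific variants distinct names such as `:cab_lzx`, this model keys variants by a shared name so the tie-breaking is visible.

```ruby
# Hypothetical model of priority- and format-aware algorithm lookup.
# This is NOT Cabriolet's AlgorithmFactory; it illustrates one policy.
class MiniRegistry
  Entry = Struct.new(:name, :klass, :category, :priority, :format)

  def initialize
    @entries = []
  end

  def register(name, klass, category:, priority: 0, format: nil)
    @entries << Entry.new(name, klass, category, priority, format)
    self
  end

  def unregister(name, category)
    @entries.reject! { |e| e.name == name && e.category == category }
    self
  end

  # Keep entries for this name/category that are generic (format nil) or
  # scoped to the requested format; the highest priority wins, and on a tie
  # a format-specific entry beats a generic one. Returns nil if none match.
  def resolve(name, category:, format: nil)
    candidates = @entries.select do |e|
      e.name == name && e.category == category && (e.format.nil? || e.format == format)
    end
    candidates.max_by { |e| [e.priority, e.format ? 1 : 0] }&.klass
  end
end
```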
+
+==== Algorithm Requirements
+
+Custom algorithms must:
+
+* **Inherit from the appropriate base class**:
+** `Cabriolet::Compressors::Base` for compressors
+** `Cabriolet::Decompressors::Base` for decompressors
+
+* **Implement required methods**:
+** Decompressors: `decompress(input_size, output_size)`
+** Compressors: `compress()`
+
+* **Use provided instance variables**:
+** `@input` - Input handle (read operations)
+** `@output` - Output handle (write operations)
+** `@io_system` - I/O system for operations
+** `@buffer_size` - Buffer size for operations
+
+**Example custom decompressor**:
+
+[source,ruby]
+----
+class CustomAlgorithm < Cabriolet::Decompressors::Base
+  def decompress(input_size, output_size)
+    # Read compressed data
+    compressed = @input.read(input_size)
+
+    # Your decompression logic
+    decompressed = my_decompress_logic(compressed)
+
+    # Write decompressed data
+    @output.write(decompressed)
+
+    # Return bytes written
+    decompressed.bytesize
+  end
+
+  private
+
+  def my_decompress_logic(data)
+    data # placeholder: return the decompressed bytes as a String
+  end
+end
+----
+
+**Example custom compressor**:
+
+[source,ruby]
+----
+class CustomCompressor < Cabriolet::Compressors::Base
+  def compress
+    # Read uncompressed data
+    data = @input.read
+
+    # Your compression logic
+    compressed = my_compress_logic(data)
+
+    # Write compressed data
+    @output.write(compressed)
+
+    # Return bytes written
+    compressed.bytesize
+  end
+
+  private
+
+  def my_compress_logic(data)
+    data # placeholder: return the compressed bytes as a String
+  end
+end
+----
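To exercise the `decompress(input_size, output_size)` contract (read from `@input`, write to `@output`, return the number of bytes written) without the gem installed, the sketch below uses a hypothetical `StandInBase` in place of `Cabriolet::Decompressors::Base`, whose real constructor is not shown in this README. The algorithm is KWAJ's XOR method from the format list, where each stored byte is the original byte XORed with 0xFF, so the same transform both encodes and decodes.

```ruby
require "stringio"

# Hypothetical stand-in for Cabriolet::Decompressors::Base so the sketch runs
# without the gem; the real base class's constructor may differ.
class StandInBase
  def initialize(input, output)
    @input = input   # responds to #read(n)
    @output = output # responds to #write(string)
  end
end

# KWAJ XOR method: every stored byte is the original byte XOR 0xFF.
class XorFFDecompressor < StandInBase
  def decompress(input_size, output_size)
    stored = @input.read(input_size) || "".b
    plain = stored.bytes.map { |b| b ^ 0xFF }.pack("C*").byteslice(0, output_size)
    @output.write(plain)
    plain.bytesize # bytes written
  end
end

# Usage with in-memory handles:
#   input  = StringIO.new("Hi!".bytes.map { |b| b ^ 0xFF }.pack("C*"))
#   output = StringIO.new(+"")
#   XorFFDecompressor.new(input, output).decompress(3, 3)
```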
+
+==== Use Cases
+
+**Performance optimization**::
+Replace built-in algorithms with platform-optimized versions (e.g., using native extensions for specific platforms)
+
+**Research and development**::
+Test experimental compression algorithms without modifying the core library
+
+**Format variations**::
+Implement format-specific optimizations or variations of standard algorithms
+
+**Testing**::
+Create isolated test environments with mock or simplified algorithms
+
+=== Plugin Architecture
+
+Cabriolet includes a plugin system for distributing and loading extensions as Ruby gems.
+
+==== Installing Plugins
+
+Plugins are distributed as Ruby gems with the naming pattern `cabriolet-plugin-*`:
+
+[source,shell]
+----
+gem install cabriolet-plugin-bzip2
+----
+
+==== Loading Plugins
+
+Plugins are automatically discovered from installed gems:
+
+[source,ruby]
+----
+require 'cabriolet'
+
+# Discover all installed plugins
+Cabriolet.plugin_manager.discover_plugins
+
+# Load and activate a specific plugin
+Cabriolet.plugin_manager.load_plugin('bzip2')
+Cabriolet.plugin_manager.activate_plugin('bzip2')
+
+# Or auto-activate all plugins
+Cabriolet.plugin_manager.auto_activate_plugins
+----
+
+==== Listing Plugins
+
+[source,ruby]
+----
+# List all plugins
+plugins = Cabriolet.plugin_manager.list_plugins
+
+# List only active plugins
+active = Cabriolet.plugin_manager.list_plugins(state: :active)
+
+# Check if a plugin is active
+if Cabriolet.plugin_manager.plugin_active?('bzip2')
+  puts "BZip2 plugin is active"
+end
+----
+
+==== Creating Plugins
+
+To create your own plugin, see the example plugins:
+
+* `examples/plugins/cabriolet-plugin-example/` - Simple ROT13 example
+* `examples/plugins/cabriolet-plugin-bzip2/` - Advanced BZip2 example
+
+Basic plugin structure:
+
+[source,ruby]
+----
+class MyPlugin < Cabriolet::Plugin
+  def metadata
+    {
+      name: "my-plugin",
+      version: "1.0.0",
+      author: "Your Name",
+      description: "My custom compression algorithm",
+      cabriolet_version: "~> 0.1"
+    }
+  end
+
+  def setup
+    # Register your algorithms
+    register_algorithm(:my_algo, MyCompressor, category: :compressor)
+    register_algorithm(:my_algo, MyDecompressor, category: :decompressor)
+  end
+end
+----
+
+==== Plugin Configuration
+
+Configure plugins via `~/.cabriolet/plugins.yml`:
+
+[source,yaml]
+----
+discovery:
+  auto_discover: true
+  auto_load: true
+  auto_activate: true
+
+plugins:
+  bzip2:
+    enabled: true
+    config:
+      compression_level: 9
+----
+
+==== Plugin Safety
+
+All plugins are validated before loading:
+
+* ✓ Inheritance validation
+* ✓ Metadata validation
+* ✓ Version compatibility checking
+* ✓ Dependency resolution
+* ✓ Safety scanning
+
+Failed plugins are isolated and don't affect Cabriolet or other plugins.
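The plugin safety checks include metadata validation and version compatibility checking. A sketch of those two checks for a metadata hash shaped like the `MyPlugin#metadata` example follows. It uses RubyGems' `Gem::Requirement` and `Gem::Version`, which implement the `~>` operator, so `"~> 0.1"` accepts 0.2.0 while `"~> 0.1.2"` does not. `validate_plugin_metadata` and the required-key list are assumptions, not the behavior of Cabriolet's plugin validator.

```ruby
require "rubygems" # Gem::Requirement / Gem::Version (normally preloaded)

# Assumed list of required metadata keys (hypothetical).
REQUIRED_PLUGIN_KEYS = %i[name version author description cabriolet_version].freeze

# Returns an array of problems; an empty array means the metadata passed.
def validate_plugin_metadata(metadata, running_version)
  errors = REQUIRED_PLUGIN_KEYS
    .select { |key| metadata[key].to_s.strip.empty? }
    .map { |key| "missing #{key}" }

  requirement = metadata[:cabriolet_version]
  return errors if requirement.to_s.strip.empty?

  begin
    unless Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(running_version))
      errors << "requires cabriolet #{requirement}, running #{running_version}"
    end
  rescue Gem::Requirement::BadRequirementError => e
    errors << "invalid cabriolet_version (#{e.message})"
  end
  errors
end
```

Returning a list of problems instead of raising on the first one fits the isolation goal: a loader can log every reason a plugin was skipped and carry on with the others.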
+
 === Error Handling
 
 ==== Common errors
@@ -1101,6 +1710,26 @@ bundle exec rubocop -A # Auto-correct
 
 == Known limitations
 
+For complete details on known issues and workarounds, see
+link:KNOWN_ISSUES.md[Known Issues].
+
+=== LZX Compression
+
+LZX compression is **production ready** for most use cases:
+
+* ✅ **CHM files**: 100% working, all features
+* ✅ **Single-folder CAB**: 100% working
+* ✅ **Decompression**: UNCOMPRESSED blocks fully supported
+* ✅ **Compression**: UNCOMPRESSED blocks fully supported
+* ⚠️ **Multi-folder CAB**: Files at non-zero offsets in the second and later folders
+** Affects: <5% of CAB files
+** Workaround: Use salvage mode or extract folders separately
+** Status: Deferred to a later release
+* ⚠️ **VERBATIM/ALIGNED blocks**: Compression not yet implemented
+** Affects: Advanced CHM creation
+** Decompression: Working
+** Status: Planned for a later release
+
 === Quantum compression
 
 Quantum compression is **functional but experimental**:
@@ -1122,10 +1751,54 @@ Quantum compression is **functional but experimental**:
 
 === HLP/LIT/OAB Formats
 
-*
-*
-
-
+* LIT format has no public specification (implementation based on libmspack)
+* HLP format supports both QuickHelp (DOS) and Windows Help (3.x/4.x)
+** QuickHelp format fully documented, production ready
+** Windows Help format based on reverse engineering, production ready
+* OAB format has limited documentation (implementation based on libmspack)
+* All formats are fully functional for basic operations
+* Edge cases for advanced features may exist
+
+=== Not yet supported
+
+The following features are documented as pending (64 specs total):
+
+**Multi-file extraction** (6 specs):
+- MSZIP folders with multiple files
+- LZX folders with multiple files
+- Requires: State reuse implementation (4-6 hours)
+- Status: In progress
+
+**LZX VERBATIM/ALIGNED compression** (7 specs):
+- CHM round-trip compression
+- Optimal LZX compression
+- Decompression works; compression still needs Huffman tree encoding
+- Status: Deferred to a later release
+
+**Quantum edge cases** (22 specs):
+- Very long matches (14+ bytes)
+- Complex pattern encoding
+- Frame boundary cases
+- Note: Core functionality is validated against libmspack; these specs are likely over-cautious
+- Status: Low priority, optional refinement
+
+**LIT extraction tests** (4 specs):
+- Tests need adjustment for directory model
+- Parser works correctly
+- Status: Test refactoring needed (1-2 hours)
+
+**QuickHelp real files** (4 specs):
+- Real file extraction tests
+- Fixture investigation needed
+- Status: Low priority
+
+**Edge cases** (21 specs):
+- 1-byte search buffer
+- Various format-specific edge cases
+- Window size variations
+- Status: Low priority, optional enhancements
+
+**Total pending**: 64 specs (5% of test suite)
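The pending multi-file extraction item points at state reuse for MSZIP folders. In MSZIP, each block starts with the two-byte `CK` signature followed by DEFLATE data, and decompressing a block continues from the previous block's history (up to 32KB), so extracting a file in the middle of a folder depends on the blocks before it. The sketch below models that with Ruby's `Zlib` raw DEFLATE streams and a preset dictionary holding the previous block's output. The helper names are hypothetical, and this illustrates the state-carrying idea, not Cabriolet's MSZIP code.

```ruby
require "zlib"

MSZIP_BLOCK_SIZE = 32_768 # max uncompressed bytes per MSZIP block

# Each block is "CK" + a raw DEFLATE stream; the previous block's
# uncompressed bytes are supplied as a preset dictionary (the shared history).
def mszip_compress(data)
  data = data.b
  history = "".b
  (0...data.bytesize).step(MSZIP_BLOCK_SIZE).map do |offset|
    chunk = data.byteslice(offset, MSZIP_BLOCK_SIZE)
    deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
    deflater.set_dictionary(history) unless history.empty?
    block = "CK".b + deflater.deflate(chunk, Zlib::FINISH)
    deflater.close
    history = chunk
    block
  end
end

# Decompressing block N needs block N-1's output as history, so a reader
# cannot start in the middle of a folder without replaying earlier blocks.
def mszip_decompress(blocks)
  history = "".b
  blocks.each_with_object("".b) do |block, out|
    raise ArgumentError, "missing CK signature" unless block.start_with?("CK")

    inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
    inflater.set_dictionary(history) unless history.empty?
    chunk = inflater.inflate(block.byteslice(2..))
    inflater.close
    out << chunk
    history = chunk
  end
end
```

Negative window bits select raw DEFLATE (no zlib header or checksum), which matches how MSZIP frames its data; the dictionary is the only state carried between blocks in this model.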
 
 
 == Troubleshooting
@@ -1151,17 +1824,6 @@ decompressor.fix_mszip = true
 decompressor.salvage = true
 ----
 
-=== Performance issues
-
-Problem:: Slow extraction
-
-Solution:: Increase buffer size:
-
-[source,ruby]
------
-decompressor.buffer_size = 16384
------
-
 
 == Specifications
 