cabriolet 0.1.0

Files changed (78)
  1. checksums.yaml +7 -0
  2. data/ARCHITECTURE.md +799 -0
  3. data/CHANGELOG.md +44 -0
  4. data/LICENSE +29 -0
  5. data/README.adoc +1207 -0
  6. data/exe/cabriolet +6 -0
  7. data/lib/cabriolet/auto.rb +173 -0
  8. data/lib/cabriolet/binary/bitstream.rb +148 -0
  9. data/lib/cabriolet/binary/bitstream_writer.rb +180 -0
  10. data/lib/cabriolet/binary/chm_structures.rb +213 -0
  11. data/lib/cabriolet/binary/hlp_structures.rb +66 -0
  12. data/lib/cabriolet/binary/kwaj_structures.rb +74 -0
  13. data/lib/cabriolet/binary/lit_structures.rb +107 -0
  14. data/lib/cabriolet/binary/oab_structures.rb +112 -0
  15. data/lib/cabriolet/binary/structures.rb +56 -0
  16. data/lib/cabriolet/binary/szdd_structures.rb +60 -0
  17. data/lib/cabriolet/cab/compressor.rb +382 -0
  18. data/lib/cabriolet/cab/decompressor.rb +510 -0
  19. data/lib/cabriolet/cab/extractor.rb +357 -0
  20. data/lib/cabriolet/cab/parser.rb +264 -0
  21. data/lib/cabriolet/chm/compressor.rb +513 -0
  22. data/lib/cabriolet/chm/decompressor.rb +436 -0
  23. data/lib/cabriolet/chm/parser.rb +254 -0
  24. data/lib/cabriolet/cli.rb +776 -0
  25. data/lib/cabriolet/compressors/base.rb +34 -0
  26. data/lib/cabriolet/compressors/lzss.rb +250 -0
  27. data/lib/cabriolet/compressors/lzx.rb +581 -0
  28. data/lib/cabriolet/compressors/mszip.rb +315 -0
  29. data/lib/cabriolet/compressors/quantum.rb +446 -0
  30. data/lib/cabriolet/constants.rb +75 -0
  31. data/lib/cabriolet/decompressors/base.rb +39 -0
  32. data/lib/cabriolet/decompressors/lzss.rb +138 -0
  33. data/lib/cabriolet/decompressors/lzx.rb +726 -0
  34. data/lib/cabriolet/decompressors/mszip.rb +390 -0
  35. data/lib/cabriolet/decompressors/none.rb +27 -0
  36. data/lib/cabriolet/decompressors/quantum.rb +456 -0
  37. data/lib/cabriolet/errors.rb +39 -0
  38. data/lib/cabriolet/format_detector.rb +156 -0
  39. data/lib/cabriolet/hlp/compressor.rb +272 -0
  40. data/lib/cabriolet/hlp/decompressor.rb +198 -0
  41. data/lib/cabriolet/hlp/parser.rb +131 -0
  42. data/lib/cabriolet/huffman/decoder.rb +79 -0
  43. data/lib/cabriolet/huffman/encoder.rb +108 -0
  44. data/lib/cabriolet/huffman/tree.rb +138 -0
  45. data/lib/cabriolet/kwaj/compressor.rb +479 -0
  46. data/lib/cabriolet/kwaj/decompressor.rb +237 -0
  47. data/lib/cabriolet/kwaj/parser.rb +183 -0
  48. data/lib/cabriolet/lit/compressor.rb +255 -0
  49. data/lib/cabriolet/lit/decompressor.rb +250 -0
  50. data/lib/cabriolet/models/cabinet.rb +81 -0
  51. data/lib/cabriolet/models/chm_file.rb +28 -0
  52. data/lib/cabriolet/models/chm_header.rb +67 -0
  53. data/lib/cabriolet/models/chm_section.rb +38 -0
  54. data/lib/cabriolet/models/file.rb +119 -0
  55. data/lib/cabriolet/models/folder.rb +102 -0
  56. data/lib/cabriolet/models/folder_data.rb +21 -0
  57. data/lib/cabriolet/models/hlp_file.rb +45 -0
  58. data/lib/cabriolet/models/hlp_header.rb +37 -0
  59. data/lib/cabriolet/models/kwaj_header.rb +98 -0
  60. data/lib/cabriolet/models/lit_header.rb +55 -0
  61. data/lib/cabriolet/models/oab_header.rb +95 -0
  62. data/lib/cabriolet/models/szdd_header.rb +72 -0
  63. data/lib/cabriolet/modifier.rb +326 -0
  64. data/lib/cabriolet/oab/compressor.rb +353 -0
  65. data/lib/cabriolet/oab/decompressor.rb +315 -0
  66. data/lib/cabriolet/parallel.rb +333 -0
  67. data/lib/cabriolet/repairer.rb +288 -0
  68. data/lib/cabriolet/streaming.rb +221 -0
  69. data/lib/cabriolet/system/file_handle.rb +107 -0
  70. data/lib/cabriolet/system/io_system.rb +87 -0
  71. data/lib/cabriolet/system/memory_handle.rb +105 -0
  72. data/lib/cabriolet/szdd/compressor.rb +217 -0
  73. data/lib/cabriolet/szdd/decompressor.rb +184 -0
  74. data/lib/cabriolet/szdd/parser.rb +127 -0
  75. data/lib/cabriolet/validator.rb +332 -0
  76. data/lib/cabriolet/version.rb +5 -0
  77. data/lib/cabriolet.rb +104 -0
  78. metadata +157 -0
data/README.adoc ADDED
@@ -0,0 +1,1207 @@
1
+ = Cabriolet
2
+ :toc: left
3
+ :toclevels: 3
4
+
5
+ image:https://img.shields.io/gem/v/cabriolet.svg[RubyGems Version, link=https://rubygems.org/gems/cabriolet]
6
+ image:https://img.shields.io/github/license/omnizip/cabriolet.svg[License]
7
+
8
+ Pure Ruby implementation for extracting and creating Microsoft compression
9
+ format files.
10
+
11
+ == Introduction
12
+
13
+ Cabriolet extracts and creates Microsoft Cabinet (.CAB) files and related
14
+ compression formats using pure Ruby.
15
+
16
+ This gem fully covers the features of libmspack and cabextract, implementing all
17
+ Microsoft compression formats for both extraction (decompression) and creation
18
+ (compression).
19
+
20
+ NOTE: No C extensions are required; Cabriolet works on any platform where Ruby runs.
21
+
22
+
23
+ === Features
24
+
25
+ * **Full format support** for all 7 Microsoft compression formats
26
+ ** CAB (Microsoft Cabinet)
27
+ ** CHM (Compiled HTML Help)
28
+ ** SZDD (Single-file LZSS compression)
29
+ ** KWAJ (Installation file compression)
30
+ ** HLP (Windows Help)
31
+ ** LIT (Microsoft Reader eBooks)
32
+ ** OAB (Offline Address Book)
33
+
34
+ * **Bidirectional operations** (compress and decompress)
35
+ * **All compression algorithms**
36
+ ** None (uncompressed storage)
37
+ ** LZSS (4KB sliding window, 3 modes)
38
+ ** MSZIP (DEFLATE/RFC 1951)
39
+ ** LZX (advanced with Intel E8 preprocessing)
40
+ ** Quantum (adaptive arithmetic coding)
41
+
42
+ * **Advanced features**
43
+ ** Multi-part cabinet sets (spanning, merging)
44
+ ** Embedded cabinet search
45
+ ** Salvage mode for corrupted files
46
+ ** Custom I/O handlers
47
+ ** Progress callbacks
48
+ ** Checksum verification
49
+ ** Metadata preservation (timestamps, attributes)
50
+
51
+ * **Pure Ruby** - No compilation needed, works everywhere
52
+ * **Comprehensive testing** - 914 test examples, 0 failures
53
+ * **Complete CLI** - 30+ commands for all operations
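As a rough illustration of how several of the formats listed above can be told apart, the sketch below matches the leading magic bytes of a file. The signatures shown are the commonly documented ones for CAB, CHM, SZDD, KWAJ and LIT; `MAGIC_BYTES` and `guess_format` are hypothetical names and not Cabriolet's own format-detection API, and HLP and OAB are left out on purpose.

```ruby
# Hypothetical helper, not part of Cabriolet: guess a container format
# from the first bytes of a file.
MAGIC_BYTES = {
  cab: "MSCF".b,
  chm: "ITSF".b,
  szdd: "SZDD\x88\xF0\x27\x33".b,
  kwaj: "KWAJ\x88\xF0\x27\xD1".b,
  lit: "ITOLITLS".b
}.freeze

def guess_format(data)
  head = data.b # compare as raw bytes
  MAGIC_BYTES.each do |format, magic|
    return format if head.start_with?(magic)
  end
  nil # unknown or unsupported by this sketch
end

p guess_format("MSCF\x00\x00\x00\x00")
p guess_format("plain text")
```

A signature match only suggests a format; a real parser still has to validate the header that follows.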
54
+
55
+ === Architecture
56
+
57
+ .High-level architecture
58
+ [source]
59
+ ----
60
+ Application Layer (CLI/API)
61
+
62
+ Format Layer (CAB, CHM, SZDD, KWAJ, HLP, LIT, OAB)
63
+
64
+ Algorithm Layer (None, LZSS, MSZIP, LZX, Quantum)
65
+
66
+ Binary I/O Layer (BinData structures, Bitstreams)
67
+
68
+ System Layer (I/O abstraction, file/memory handles)
69
+ ----
70
+
71
+ For complete architecture, see link:ARCHITECTURE.md[Architecture Documentation].
72
+
73
+ == Installation
74
+
75
+ Add to your Gemfile:
76
+
77
+ [source,ruby]
78
+ ----
79
+ gem "cabriolet"
80
+ ----
81
+
82
+ Or install directly:
83
+
84
+ [source,shell]
85
+ ----
86
+ gem install cabriolet
87
+ ----
88
+
89
+ For detailed installation instructions, see
90
+ link:docs/getting-started/installation.adoc[Installation Guide].
91
+
92
+ == System requirements
93
+
94
+ * Ruby 2.7 or higher
95
+ * Operating Systems: Linux, macOS, Windows
96
+ * Dependencies: bindata (~> 2.5), thor (~> 1.3)
97
+
98
+ == Usage
99
+
100
+ === Command line interface
101
+
102
+ ==== CAB (Cabinet) operations
103
+
104
+ ===== List contents
105
+
106
+ [source,shell]
107
+ ----
108
+ cabriolet list example.cab
109
+ ----
110
+
111
+ .Example output
112
+ [example]
113
+ ====
114
+ [source]
115
+ ----
116
+ Cabinet: example.cab (Set ID: 12345, Index: 0)
117
+ Folders: 1, Files: 2
118
+ Files:
119
+ README.txt (1,234 bytes)
120
+ data.bin (45,678 bytes)
121
+ ----
122
+ ====
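The set ID, index, folder count and file count shown in that listing come from the fixed-size cabinet header. The following is a minimal sketch of reading those fields with `String#unpack`, assuming the standard 36-byte CFHEADER layout; `CabHeader` and `parse_cab_header` are hypothetical names used for illustration and are not how Cabriolet (which uses BinData structures) parses the header.

```ruby
# Hypothetical sketch: decode the fixed part of a CAB header (CFHEADER).
CabHeader = Struct.new(:signature, :cabinet_size, :files_offset,
                       :version_minor, :version_major,
                       :folder_count, :file_count, :flags,
                       :set_id, :cabinet_index)

# a4 = 4-byte signature, V = 32-bit LE, C = byte, v = 16-bit LE
CFHEADER_FORMAT = "a4VVVVVCCvvvvv"
CFHEADER_SIZE = 36

def parse_cab_header(bytes)
  raise ArgumentError, "header too short" if bytes.bytesize < CFHEADER_SIZE

  fields = bytes.unpack(CFHEADER_FORMAT)
  raise ArgumentError, "not a CAB file" unless fields[0] == "MSCF"

  # Indexes 1, 3 and 5 are reserved fields and are skipped.
  CabHeader.new(fields[0], fields[2], fields[4], *fields[6..12])
end

# Build a synthetic header so the example runs without a real .cab file.
sample = ["MSCF", 0, 100_000, 0, 44, 0, 3, 1, 1, 2, 0, 12_345, 0]
         .pack(CFHEADER_FORMAT)
header = parse_cab_header(sample)
puts "Set ID: #{header.set_id}, Index: #{header.cabinet_index}"
puts "Folders: #{header.folder_count}, Files: #{header.file_count}"
```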
123
+
124
+ ===== Extract all files
125
+
126
+ [source,shell]
127
+ ----
128
+ cabriolet extract example.cab
129
+ ----
130
+
131
+ ===== Extract to specific directory
132
+
133
+ [source,shell]
134
+ ----
135
+ cabriolet extract example.cab --output /path/to/output
136
+ ----
137
+
138
+ ===== Test cabinet integrity
139
+
140
+ [source,shell]
141
+ ----
142
+ cabriolet test example.cab
143
+ ----
144
+
145
+ ===== Show detailed information
146
+
147
+ [source,shell]
148
+ ----
149
+ cabriolet info example.cab
150
+ ----
151
+
152
+ .Example output
153
+ [example]
154
+ ====
155
+ [source]
156
+ ----
157
+ Cabinet Information
158
+ ==================================================
159
+ Filename: example.cab
160
+ Set ID: 12345
161
+ Set Index: 0
162
+ Size: 100,000 bytes
163
+ Folders: 2
164
+ Files: 15
165
+
166
+ Folders:
167
+ [0] MSZIP (5 blocks)
168
+ [1] LZX (3 blocks)
169
+
170
+ Files:
171
+ README.txt
172
+ Size: 1,234 bytes
173
+ Modified: 2024-01-15 10:30:00
174
+ Attributes: archive
175
+ ...
176
+ ----
177
+ ====
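The `Modified` value in such output is derived from the MS-DOS packed date and time words stored with each cabinet entry. A small sketch of that decoding follows; `decode_dos_datetime` is a hypothetical helper, not a Cabriolet method, and it returns plain integers to stay independent of time zones.

```ruby
# Hypothetical helper: unpack MS-DOS date and time words.
#   date: bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day
#   time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds / 2
def decode_dos_datetime(date, time)
  year   = ((date >> 9) & 0x7F) + 1980
  month  = (date >> 5) & 0x0F
  day    = date & 0x1F
  hour   = (time >> 11) & 0x1F
  minute = (time >> 5) & 0x3F
  second = (time & 0x1F) * 2
  [year, month, day, hour, minute, second]
end

date_word = (44 << 9) | (1 << 5) | 15 # 2024-01-15
time_word = (10 << 11) | (30 << 5)    # 10:30:00
p decode_dos_datetime(date_word, time_word)
```

Because seconds are stored divided by two, timestamps round-trip only to even seconds.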
178
+
179
+ ===== Search for embedded CABs
180
+
181
+ [source,shell]
182
+ ----
183
+ cabriolet search installer.exe --verbose
184
+ ----
185
+
186
+ .Example output
187
+ [example]
188
+ ====
189
+ [source]
190
+ ----
191
+ Cabinet found at offset 1024
192
+ Files: 50, Folders: 1
193
+ Cabinet found at offset 524288
194
+ Files: 20, Folders: 1
195
+
196
+ Total: 2 cabinet(s) found
197
+ ----
198
+ ====
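Conceptually, an embedded-cabinet search starts by looking for the `MSCF` signature inside a larger file. The sketch below shows only that first step; `candidate_cab_offsets` is a hypothetical helper, and a real search (including Cabriolet's, presumably) must still validate the header found at each candidate offset, since the four bytes can occur by chance.

```ruby
# Hypothetical helper: list every byte offset where "MSCF" occurs.
CAB_SIGNATURE = "MSCF".b

def candidate_cab_offsets(data)
  data = data.b # treat the input as raw bytes
  offsets = []
  pos = 0
  while (idx = data.index(CAB_SIGNATURE, pos))
    offsets << idx
    pos = idx + 1 # keep scanning after this hit
  end
  offsets
end

# Synthetic "installer": padding, a signature, more padding, a signature.
blob = ("\x00".b * 16) + "MSCF".b + ("\x00".b * 8) + "MSCF".b
p candidate_cab_offsets(blob)
```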
199
+
200
+ ===== Create CAB file
201
+
202
+ [source,shell]
203
+ ----
204
+ cabriolet create output.cab file1.txt file2.txt
205
+ cabriolet create output.cab *.txt --compression mszip
206
+ cabriolet create output.cab files/ --compression lzx
207
+ ----
208
+
209
+ **Compression options**:
210
+
211
+ * `none` - Uncompressed storage
212
+ * `lzss` - LZSS compression (default for small files)
213
+ * `mszip` - MSZIP/DEFLATE compression (recommended)
214
+ * `lzx` - LZX compression (best ratio, slower)
215
+ * `quantum` - Quantum compression (experimental)
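To make the "MSZIP/DEFLATE" option more concrete: an MSZIP block is, in essence, a two-byte `CK` signature followed by a raw DEFLATE stream. The sketch below round-trips a single block with Ruby's standard `zlib`; `mszip_pack_block` and `mszip_unpack_block` are hypothetical names, and the sketch ignores the per-block size limit and the dictionary carried across blocks in real cabinets.

```ruby
require "zlib"

MSZIP_SIGNATURE = "CK".b

# Hypothetical helper: wrap data as one MSZIP-style block.
def mszip_pack_block(data)
  # Negative window bits select a raw DEFLATE stream (no zlib header).
  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  payload = deflater.deflate(data, Zlib::FINISH)
  deflater.close
  MSZIP_SIGNATURE + payload
end

# Hypothetical helper: unwrap one MSZIP-style block.
def mszip_unpack_block(block)
  unless block.start_with?(MSZIP_SIGNATURE)
    raise ArgumentError, "missing CK signature"
  end

  inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  data = inflater.inflate(block.byteslice(2..))
  inflater.close
  data
end

original = "hello cabinet " * 100
block = mszip_pack_block(original)
puts "#{original.bytesize} bytes -> #{block.bytesize} bytes"
puts mszip_unpack_block(block) == original.b
```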
216
+
217
+ ==== CHM (HTML Help) operations
218
+
219
+ ===== List CHM contents
220
+
221
+ [source,shell]
222
+ ----
223
+ cabriolet chm-list help.chm
224
+ ----
225
+
226
+ ===== Extract CHM files
227
+
228
+ [source,shell]
229
+ ----
230
+ cabriolet chm-extract help.chm output/
231
+ ----
232
+
233
+ ===== Show CHM information
234
+
235
+ [source,shell]
236
+ ----
237
+ cabriolet chm-info help.chm
238
+ ----
239
+
240
+ ===== Create CHM file
241
+
242
+ [source,shell]
243
+ ----
244
+ cabriolet chm-create help.chm index.html page1.html page2.html
245
+ cabriolet chm-create help.chm docs/*.html --window-bits 16
246
+ ----
247
+
248
+ **Options**:
249
+
250
+ * `--window-bits` - LZX window size (15-21, default: 16)
251
+ * `--verbose` - Enable verbose output
252
+
253
+ ==== SZDD operations
254
+
255
+ ===== Expand SZDD file
256
+
257
+ [source,shell]
258
+ ----
259
+ cabriolet expand file.tx_
260
+ cabriolet expand file.tx_ output.txt
261
+ ----
262
+
263
+ ===== Compress to SZDD
264
+
265
+ [source,shell]
266
+ ----
267
+ cabriolet compress file.txt
268
+ cabriolet compress file.txt --missing-char t
269
+ cabriolet compress file.txt --format qbasic
270
+ ----
271
+
272
+ **Options**:
273
+
274
+ * `--missing-char` - Last character of original filename
275
+ * `--format` - Format type (`normal` or `qbasic`)
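The `--missing-char` option exists because an SZDD file name such as `file.tx_` replaces the last character of the original name with an underscore. A minimal sketch of restoring the name is shown below; `restore_szdd_name` is a hypothetical helper and not part of Cabriolet.

```ruby
# Hypothetical helper: rebuild the original file name from the compressed
# name and the stored missing character (which may be absent).
def restore_szdd_name(compressed_name, missing_char)
  return compressed_name unless compressed_name.end_with?("_")
  return compressed_name if missing_char.nil? || missing_char.empty?

  compressed_name.chomp("_") + missing_char
end

puts restore_szdd_name("file.tx_", "t")
puts restore_szdd_name("setup.ex_", "e")
```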
276
+
277
+ ===== Show SZDD information
278
+
279
+ [source,shell]
280
+ ----
281
+ cabriolet szdd-info file.tx_
282
+ ----
283
+
284
+ ==== KWAJ operations
285
+
286
+ ===== Extract KWAJ file
287
+
288
+ [source,shell]
289
+ ----
290
+ cabriolet kwaj-extract setup.kwj
291
+ cabriolet kwaj-extract setup.kwj output.exe
292
+ ----
293
+
294
+ ===== Compress to KWAJ
295
+
296
+ [source,shell]
297
+ ----
298
+ cabriolet kwaj-compress file.exe
299
+ cabriolet kwaj-compress file.exe --compression szdd --include-length
300
+ cabriolet kwaj-compress file.exe --filename original.exe
301
+ ----
302
+
303
+ **Compression options**:
304
+
305
+ * `none` - Uncompressed
306
+ * `xor` - XOR encryption (0xFF)
307
+ * `szdd` - LZSS compression (default)
308
+ * `mszip` - MSZIP compression
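The `xor` method is the simplest of these: every byte is XORed with 0xFF, so applying the same transform twice restores the input. A short sketch, with `xor_ff` as a hypothetical helper name:

```ruby
# Hypothetical helper: XOR every byte with 0xFF (its own inverse).
def xor_ff(data)
  data.bytes.map { |byte| byte ^ 0xFF }.pack("C*")
end

scrambled = xor_ff("MZ")
p scrambled.bytes     # the two input bytes with every bit flipped
p xor_ff(scrambled)   # back to the original bytes
```

This is obfuscation rather than compression or real encryption; the output is the same size as the input.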
309
+
310
+ **Other options**:
311
+
312
+ * `--include-length` - Include uncompressed length in header
313
+ * `--filename` - Embed original filename
314
+
315
+ ===== Show KWAJ information
316
+
317
+ [source,shell]
318
+ ----
319
+ cabriolet kwaj-info setup.kwj
320
+ ----
321
+
322
+ ==== HLP (Windows Help) operations
323
+
324
+ ===== Extract HLP file
325
+
326
+ [source,shell]
327
+ ----
328
+ cabriolet hlp-extract help.hlp output/
329
+ ----
330
+
331
+ ===== Create HLP file
332
+
333
+ [source,shell]
334
+ ----
335
+ cabriolet hlp-create output.hlp topic1.txt topic2.txt
336
+ ----
337
+
338
+ ===== Show HLP information
339
+
340
+ [source,shell]
341
+ ----
342
+ cabriolet hlp-info help.hlp
343
+ ----
344
+
345
+ ==== LIT (eBook) operations
346
+
347
+ ===== Extract LIT file
348
+
349
+ [source,shell]
350
+ ----
351
+ cabriolet lit-extract book.lit output/
352
+ ----
353
+
354
+ NOTE: DES-encrypted (DRM-protected) LIT files are not supported. For encrypted
355
+ files, use Microsoft Reader or convert to another format first.
356
+
357
+ ===== Create LIT file
358
+
359
+ [source,shell]
360
+ ----
361
+ cabriolet lit-create book.lit chapter1.html chapter2.html
362
+ ----
363
+
364
+ ===== Show LIT information
365
+
366
+ [source,shell]
367
+ ----
368
+ cabriolet lit-info book.lit
369
+ ----
370
+
371
+ ==== OAB (Address Book) operations
372
+
373
+ ===== Extract OAB file
374
+
375
+ [source,shell]
376
+ ----
377
+ cabriolet oab-extract contacts.lzx output.oab
378
+ cabriolet oab-extract patch.lzx output.oab --base contacts.oab
379
+ ----
380
+
381
+ **Options**:
382
+
383
+ * `--base` - Base file for incremental patch application
384
+
385
+ ===== Create OAB file
386
+
387
+ [source,shell]
388
+ ----
389
+ cabriolet oab-create contacts.oab output.lzx
390
+ cabriolet oab-create new.oab patch.lzx --base old.oab
391
+ ----
392
+
393
+ **Options**:
394
+
395
+ * `--base` - Create incremental patch
396
+ * `--block-size` - LZX block size (default: 32768)
397
+
398
+ ===== Show OAB information
399
+
400
+ [source,shell]
401
+ ----
402
+ cabriolet oab-info contacts.lzx
403
+ ----
404
+
405
+ ==== Global Options
406
+
407
+ All commands support:
408
+
409
+ * `--verbose, -v` - Enable verbose output
410
+ * `--help, -h` - Show command help
411
+
412
+ === Ruby API
413
+
414
+ ==== CAB operations
415
+
416
+ ===== Basic extraction
417
+
418
+ [source,ruby]
419
+ ----
420
+ require "cabriolet"
421
+
422
+ # Open and extract
423
+ decompressor = Cabriolet::CAB::Decompressor.new
424
+ cabinet = decompressor.open("example.cab")
425
+
426
+ # List files
427
+ cabinet.files.each do |file|
428
+   puts "#{file.filename}: #{file.length} bytes"
429
+ end
430
+
431
+ # Extract single file
432
+ file = cabinet.files.first
433
+ decompressor.extract_file(file, "output.txt")
434
+
435
+ # Extract all files
436
+ decompressor.extract_all(cabinet, "output/")
437
+ ----
438
+
439
+ ===== Advanced extraction options
440
+
441
+ [source,ruby]
442
+ ----
443
+ decompressor = Cabriolet::CAB::Decompressor.new
444
+ decompressor.salvage = true # Enable salvage mode
445
+ decompressor.fix_mszip = true # Enable MSZIP error recovery
446
+ decompressor.buffer_size = 8192 # Set buffer size
447
+
448
+ cabinet = decompressor.open("example.cab")
449
+ decompressor.extract_all(cabinet, "output/")
450
+ ----
451
+
452
+ ===== Multi-part cabinets
453
+
454
+ [source,ruby]
455
+ ----
456
+ decompressor = Cabriolet::CAB::Decompressor.new
457
+
458
+ # Open first cabinet
459
+ cab1 = decompressor.open("disk1.cab")
460
+
461
+ # Open and append subsequent parts
462
+ cab2 = decompressor.open("disk2.cab")
463
+ decompressor.append(cab1, cab2)
464
+
465
+ cab3 = decompressor.open("disk3.cab")
466
+ decompressor.append(cab2, cab3)
467
+
468
+ # Extract from merged cabinet set
469
+ decompressor.extract_all(cab1, "output/")
470
+ ----
471
+
472
+ ===== Search for embedded cabinets
473
+
474
+ [source,ruby]
475
+ ----
476
+ decompressor = Cabriolet::CAB::Decompressor.new
477
+ cabinet = decompressor.search("installer.exe")
478
+
479
+ while cabinet
480
+   puts "Cabinet at offset #{cabinet.base_offset}"
481
+   puts " Files: #{cabinet.file_count}"
482
+
483
+   # Extract this cabinet
484
+   decompressor.extract_all(cabinet, "output_#{cabinet.base_offset}/")
485
+
486
+   # Move to next found cabinet
487
+   cabinet = cabinet.next
488
+ end
489
+ ----
490
+
491
+ ===== Create CAB file
492
+
493
+ [source,ruby]
494
+ ----
495
+ compressor = Cabriolet::CAB::Compressor.new
496
+
497
+ # Add files
498
+ compressor.add_file("README.txt")
499
+ compressor.add_file("data.bin", "custom/path.bin")
500
+
501
+ # Generate cabinet
502
+ bytes = compressor.generate("output.cab",
503
+ compression: :mszip,
504
+ set_id: 12345,
505
+ cabinet_index: 0)
506
+
507
+ puts "Created output.cab (#{bytes} bytes)"
508
+ ----
509
+
510
+ **Compression options**:
511
+
512
+ * `:none` - No compression
513
+ * `:lzss` - LZSS compression
514
+ * `:mszip` - MSZIP/DEFLATE compression (recommended)
515
+ * `:lzx` - LZX compression (best ratio)
516
+ * `:quantum` - Quantum compression (experimental)
517
+
518
+ ==== CHM operations
519
+
520
+ ===== Extract CHM files
521
+
522
+ [source,ruby]
523
+ ----
524
+ decompressor = Cabriolet::CHM::Decompressor.new
525
+ chm = decompressor.open("help.chm")
526
+
527
+ # List files
528
+ chm.files&.each do |file|
528
+   puts file.filename
529
+ end
530
+
531
+ # Extract single file
532
+ file = chm.files&.first
533
+ decompressor.extract(file, "output.html") if file
534
+
535
+ # Extract all files
536
+ chm.files&.each do |file|
537
+   output_path = File.join("output", file.filename)
538
+   FileUtils.mkdir_p(File.dirname(output_path)) # needs require "fileutils"
539
+   decompressor.extract(file, output_path)
540
+ end
542
+ ----
543
+
544
+ ===== Fast CHM parsing
545
+
546
+ [source,ruby]
547
+ ----
548
+ decompressor = Cabriolet::CHM::Decompressor.new
549
+
550
+ # Quick open (headers only, no file enumeration)
551
+ chm = decompressor.fast_open("help.chm")
552
+
553
+ # Find specific file quickly
554
+ file = Cabriolet::Models::CHMFile.new
554
+ result = decompressor.fast_find(chm, "/index.html", file)
555
+
556
+ if file.length > 0
557
+   decompressor.extract(file, "index.html")
558
+ end
560
+ ----
561
+
562
+ ===== Create CHM file
563
+
564
+ [source,ruby]
565
+ ----
566
+ compressor = Cabriolet::CHM::Compressor.new
567
+
568
+ # Add files
569
+ compressor.add_file("index.html", "/index.html", section: :compressed)
570
+ compressor.add_file("image.png", "/images/image.png", section: :uncompressed)
571
+
572
+ # Generate CHM
573
+ bytes = compressor.generate("help.chm",
574
+ window_bits: 16,
575
+ language_id: 0x0409)
576
+
577
+ puts "Created help.chm (#{bytes} bytes)"
578
+ ----
579
+
580
+ **Options**:
581
+
582
+ * `window_bits` - LZX window size (15-21, default: 16)
583
+ * `language_id` - Language identifier (default: 0x0409 for English US)
584
+ * `timestamp` - Custom timestamp (default: current time)
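The `window_bits` value is the base-2 logarithm of the LZX window size, so the documented range of 15 to 21 corresponds to windows from 32 KB to 2 MB. A small sketch of that mapping, where `lzx_window_size` is a hypothetical helper and not a Cabriolet method:

```ruby
LZX_WINDOW_BITS = (15..21).freeze

# Hypothetical helper: convert window_bits to a window size in bytes.
def lzx_window_size(window_bits)
  unless LZX_WINDOW_BITS.cover?(window_bits)
    raise ArgumentError, "window_bits must be between 15 and 21"
  end

  1 << window_bits
end

LZX_WINDOW_BITS.each do |bits|
  puts "window_bits #{bits}: #{lzx_window_size(bits) / 1024} KB"
end
```

A larger window lets the compressor reference data further back, at the cost of more memory on both sides.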
585
+
586
+ ==== SZDD operations
587
+
588
+ ===== Expand SZDD file
589
+
590
+ [source,ruby]
591
+ ----
592
+ decompressor = Cabriolet::SZDD::Decompressor.new
593
+
594
+ # Open and get header
595
+ header = decompressor.open("file.tx_")
596
+
597
+ puts "Format: #{header.format_name}"
598
+ puts "Length: #{header.length} bytes"
599
+ puts "Missing char: #{header.missing_char}" if header.missing_char
600
+
601
+ # Extract
602
+ decompressor.extract(header, "file.txt")
603
+
604
+ # Or one-shot
605
+ decompressor.decompress("file.tx_", "file.txt")
606
+ ----
607
+
608
+ ===== Compress to SZDD
609
+
610
+ [source,ruby]
611
+ ----
612
+ compressor = Cabriolet::SZDD::Compressor.new
613
+
614
+ # Compress file
615
+ bytes = compressor.compress("file.txt", "file.tx_",
616
+ missing_char: "t",
617
+ format: :normal)
618
+
619
+ # Or compress data from memory
620
+ bytes = compressor.compress_data("Hello, world!", "output.tx_")
621
+ ----
622
+
623
+ **Format options**:
624
+
625
+ * `:normal` - Standard SZDD format (MS-DOS compatible)
626
+ * `:qbasic` - QBasic SZDD format
627
+
628
+ ==== KWAJ operations
629
+
630
+ ===== Extract KWAJ file
631
+
632
+ [source,ruby]
633
+ ----
634
+ decompressor = Cabriolet::KWAJ::Decompressor.new
635
+
636
+ # Open and get header
637
+ header = decompressor.open("setup.kwj")
638
+
639
+ puts "Compression: #{header.compression_name}"
640
+ puts "Length: #{header.length} bytes" if header.length
641
+ puts "Filename: #{header.filename}" if header.filename
642
+
643
+ # Extract
644
+ decompressor.extract(header, "setup.kwj", "output.exe")
645
+
646
+ # Or one-shot
647
+ decompressor.decompress("setup.kwj", "setup.exe")
648
+ ----
649
+
650
+ ===== Compress to KWAJ
651
+
652
+ [source,ruby]
653
+ ----
654
+ compressor = Cabriolet::KWAJ::Compressor.new
655
+
656
+ # Compress file
657
+ bytes = compressor.compress("file.exe", "file.kwj",
658
+ compression: :szdd,
659
+ include_length: true,
660
+ filename: "original.exe")
661
+
662
+ # Compression options: :none, :xor, :szdd, :mszip
663
+ ----
664
+
665
+ ==== HLP (Windows Help) operations
666
+
667
+ ===== Extract HLP file
668
+
669
+ [source,ruby]
670
+ ----
671
+ decompressor = Cabriolet::HLP::Decompressor.new
672
+ hlp = decompressor.open("help.hlp")
673
+
674
+ # Extract files
675
+ hlp.files.each do |file|
676
+   decompressor.extract_file(file, "output/#{file.filename}")
677
+ end
678
+ ----
679
+
680
+ ===== Create HLP file
681
+
682
+ [source,ruby]
683
+ ----
684
+ compressor = Cabriolet::HLP::Compressor.new
685
+
686
+ # Add files
687
+ compressor.add_file("topic1.txt", "topic1")
688
+ compressor.add_file("topic2.txt", "topic2")
689
+
690
+ # Generate HLP
691
+ bytes = compressor.generate("help.hlp")
692
+ ----
693
+
694
+ NOTE: The HLP format has no public specification. The implementation is based
695
+ on the libmspack source code.
696
+
697
+ ==== LIT (eBook) operations
698
+
699
+ ===== Extract LIT file
700
+
701
+ [source,ruby]
702
+ ----
703
+ decompressor = Cabriolet::LIT::Decompressor.new
704
+
705
+ begin
705
+   lit = decompressor.open("book.lit")
706
+
707
+   if lit.encrypted
708
+     raise NotImplementedError, "LIT file is DRM-encrypted. Decryption not supported."
709
+   end
710
+
711
+   # Extract files
712
+   lit.files.each do |file|
713
+     decompressor.extract_file(file, "output/#{file.filename}")
714
+   end
715
+ rescue NotImplementedError => e
716
+   puts "Error: #{e.message}"
717
+ end
719
+ ----
720
+
721
+ ===== Create LIT file
722
+
723
+ [source,ruby]
724
+ ----
725
+ compressor = Cabriolet::LIT::Compressor.new
726
+
727
+ compressor.add_file("content.html", "/content.html")
728
+ bytes = compressor.generate("book.lit")
729
+ ----
730
+
731
+ **Limitations**:
732
+
733
+ * DES encryption (DRM) is intentionally not supported
734
+ * For encrypted LIT files, decrypt with Microsoft Reader first
735
+
736
+ ==== OAB (Offline Address Book) operations
737
+
738
+ ===== Extract OAB file
739
+
740
+ [source,ruby]
741
+ ----
742
+ decompressor = Cabriolet::OAB::Decompressor.new
743
+
744
+ # Extract full file
745
+ decompressor.decompress("contacts.lzx", "contacts.oab")
746
+
747
+ # Apply incremental patch
748
+ decompressor.decompress_incremental("patch.lzx", "base.oab", "new.oab")
749
+ ----
750
+
751
+ ===== Create OAB file
752
+
753
+ [source,ruby]
754
+ ----
755
+ compressor = Cabriolet::OAB::Compressor.new
756
+
757
+ # Compress full file
758
+ compressor.compress("contacts.oab", "contacts.lzx")
759
+
760
+ # Create incremental patch
761
+ compressor.compress_incremental("new.oab", "old.oab", "patch.lzx")
762
+ ----
763
+
764
+ === Custom I/O Handlers
765
+
766
+ ==== In-memory operations
767
+
768
+ [source,ruby]
769
+ ----
770
+ # Create custom I/O system
771
+ memory_io = Cabriolet::System::IOSystem.new
772
+
773
+ # Process entirely in memory
774
+ decompressor = Cabriolet::CAB::Decompressor.new(memory_io)
775
+
776
+ # Load CAB data
777
+ cab_data = File.binread("example.cab")
778
+ input = Cabriolet::System::MemoryHandle.new(cab_data)
779
+ cabinet = decompressor.parser.parse_handle(input, "example.cab")
780
+
781
+ # Extract to memory
782
+ file = cabinet.files.first
783
+ output = Cabriolet::System::MemoryHandle.new("", Cabriolet::Constants::MODE_WRITE)
784
+ # ... extract to memory handle
785
+ ----
786
+
787
+ ==== Custom I/O system
788
+
789
+ [source,ruby]
790
+ ----
791
+ class CustomIOSystem < Cabriolet::System::IOSystem
791
+   def open(filename, mode)
792
+     # Custom open logic
793
+   end
794
+
795
+   def read(handle, bytes)
796
+     # Custom read logic
797
+   end
798
+
799
+   # ... implement other methods
800
+ end
802
+
803
+ # Use custom I/O
804
+ custom_io = CustomIOSystem.new
805
+ decompressor = Cabriolet::CAB::Decompressor.new(custom_io)
806
+ ----
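To show what such an I/O object might look like in practice, here is a self-contained sketch of an in-memory implementation built on the standard `StringIO`. It is only an illustration: `InMemoryIO` does not subclass `Cabriolet::System::IOSystem`, and the `write` and `close` methods and the `:read`/`:write` mode symbols are assumptions rather than the gem's documented interface.

```ruby
require "stringio"

# Hypothetical in-memory I/O system keyed by file name.
class InMemoryIO
  def initialize
    @files = {} # name => binary String buffer
  end

  def open(filename, mode)
    if mode == :write
      buffer = String.new # empty, mutable, binary-encoded
      @files[filename] = buffer
      StringIO.new(buffer) # writes land directly in the buffer
    else
      data = @files.fetch(filename) do
        raise IOError, "no such entry: #{filename}"
      end
      StringIO.new(data.dup)
    end
  end

  def read(handle, bytes)
    handle.read(bytes)
  end

  def write(handle, data)
    handle.write(data)
  end

  def close(handle)
    handle.close
  end
end

io = InMemoryIO.new
out = io.open("demo.bin", :write)
io.write(out, "MSCF")
io.close(out)

input = io.open("demo.bin", :read)
p io.read(input, 2)
p io.read(input, 10)
```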
807
+
808
+ === Error Handling
809
+
810
+ ==== Common errors
811
+
812
+ [source,ruby]
813
+ ----
814
+ begin
814
+   decompressor = Cabriolet::CAB::Decompressor.new
815
+   cabinet = decompressor.open("example.cab")
816
+   decompressor.extract_all(cabinet, "output/")
817
+ rescue Cabriolet::IOError => e
818
+   puts "I/O error: #{e.message}"
819
+ rescue Cabriolet::ParseError => e
820
+   puts "Parse error: #{e.message}"
821
+ rescue Cabriolet::ChecksumError => e
822
+   puts "Checksum failed: #{e.message}"
823
+ rescue Cabriolet::DecompressionError => e
824
+   puts "Decompression error: #{e.message}"
825
+ rescue Cabriolet::Error => e
826
+   puts "General error: #{e.message}"
827
+ end
829
+ ----
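The order of the `rescue` clauses above matters: Ruby picks the first clause that matches, so the most general class has to come last. The sketch below demonstrates this with a stand-in hierarchy; the `Demo` module is hypothetical, and it assumes (as the example implies but the README does not state) that the specific errors inherit from a common base error.

```ruby
# Hypothetical hierarchy standing in for the library's error classes.
module Demo
  class Error < StandardError; end
  class ParseError < Error; end
  class ChecksumError < Error; end
end

def classify
  yield
  :ok
rescue Demo::ParseError
  :parse_error   # specific clause, checked first
rescue Demo::Error
  :general_error # catch-all for every other Demo error
end

p classify { raise Demo::ParseError, "bad header" }
p classify { raise Demo::ChecksumError, "mismatch" }
p classify { :nothing_raised }
```

If the `Demo::Error` clause came first, it would swallow every subclass and the specific handlers would never run.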
830
+
831
+ ==== Salvage mode for corrupted files
832
+
833
+ [source,ruby]
834
+ ----
835
+ decompressor = Cabriolet::CAB::Decompressor.new
836
+ decompressor.salvage = true # Enable error recovery
837
+
838
+ # Will skip bad files and continue
839
+ cabinet = decompressor.open("corrupted.cab")
840
+ decompressor.extract_all(cabinet, "output/")
841
+ ----
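The "skip bad files and continue" behaviour can be pictured as a loop that records failures instead of aborting. The following sketch shows that pattern in isolation; `extract_with_salvage` and the block-based extractor are hypothetical and say nothing about how Cabriolet implements salvage mode internally.

```ruby
# Hypothetical sketch of a salvage-style loop. The block performs the
# actual extraction and may raise for damaged entries.
def extract_with_salvage(names, salvage: false)
  extracted = []
  failed = []
  names.each do |name|
    begin
      yield name
      extracted << name
    rescue StandardError => e
      raise unless salvage # strict mode: stop at the first error

      failed << [name, e.message]
    end
  end
  { extracted: extracted, failed: failed }
end

result = extract_with_salvage(%w[a.txt bad.bin c.txt], salvage: true) do |name|
  raise "corrupt data block" if name == "bad.bin"
end
p result
```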
842
+
843
+ ==== Fix MSZIP errors
844
+
845
+ [source,ruby]
846
+ ----
847
+ decompressor = Cabriolet::CAB::Decompressor.new
848
+ decompressor.fix_mszip = true # Ignore MSZIP checksums, recover from errors
849
+
850
+ cabinet = decompressor.open("example.cab")
851
+ decompressor.extract_all(cabinet, "output/")
852
+ ----
853
+
854
+ === API Reference
855
+
856
+ ==== Cabriolet::CAB::Decompressor
857
+
858
+ Main class for CAB file operations.
859
+
860
+ ===== Class methods
861
+
862
+ `new(io_system = nil)`::
863
+ Creates a new decompressor instance.
864
+ +
865
+ Parameters:::
866
+ `io_system`::: Optional custom I/O system implementation
867
+ +
868
+ Returns:::
869
+ `Cabriolet::CAB::Decompressor`::: New decompressor instance
870
+
871
+ ===== Instance methods
872
+
873
+ `open(filename)`::
874
+ Opens and parses a CAB file.
875
+ +
876
+ Parameters:::
877
+ `filename`::: Path to CAB file
878
+ +
879
+ Returns:::
880
+ `Cabriolet::Models::Cabinet`::: Parsed cabinet object
881
+ +
882
+ Raises:::
883
+ `Cabriolet::ParseError`::: If file is not valid CAB format
884
+ `Cabriolet::IOError`::: If file cannot be opened
885
+
886
+ `extract_file(file, output_path, **options)`::
887
+ Extracts a single file from the cabinet.
888
+ +
889
+ Parameters:::
890
+ `file`::: `Cabriolet::Models::File` object
891
+ `output_path`::: Where to write the file
892
+ `options`::: Optional hash (salvage, overwrite, etc.)
893
+ +
894
+ Returns:::
895
+ `Integer`::: Number of bytes extracted
896
+
897
+ `extract_all(cabinet, output_dir, **options)`::
898
+ Extracts all files from the cabinet.
899
+ +
900
+ Parameters:::
901
+ `cabinet`::: `Cabriolet::Models::Cabinet` object
902
+ `output_dir`::: Directory to extract to
903
+ `options`::: Optional hash
904
+ +
905
+ Returns:::
906
+ `Integer`::: Number of files extracted
907
+
908
+ `search(filename)`::
909
+ Searches for embedded cabinets in a file.
910
+ +
911
+ Parameters:::
912
+ `filename`::: File to search
913
+ +
914
+ Returns:::
915
+ `Cabriolet::Models::Cabinet`::: First found cabinet (use `.next` for others)
916
+ `nil`::: If no cabinets found
917
+
918
+ `append(cabinet, next_cabinet)`::
919
+ Merges two cabinets in a multi-part set.
920
+ +
921
+ Parameters:::
922
+ `cabinet`::: First cabinet
923
+ `next_cabinet`::: Next cabinet in sequence
924
+ +
925
+ Returns:::
926
+ `void`
927
+
928
+ ===== Attributes
929
+
930
+ `buffer_size`::
931
+ I/O buffer size in bytes (default: 4096)
932
+
933
+ `salvage`::
934
+ Enable salvage mode for corrupted files (default: false)
935
+
936
+ `fix_mszip`::
937
+ Enable MSZIP error recovery (default: false)
938
+
939
+ ==== Cabriolet::CAB::Compressor
940
+
941
+ Class for creating CAB files.
942
+
943
+ ===== Instance methods
944
+
945
+ `add_file(source_path, cab_path = nil)`::
946
+ Adds a file to the cabinet.
947
+ +
948
+ Parameters:::
949
+ `source_path`::: Path to source file
950
+ `cab_path`::: Path within cabinet (optional, defaults to basename)
951
+
952
+ `generate(output_file, **options)`::
953
+ Generates the cabinet file.
954
+ +
955
+ Parameters:::
956
+ `output_file`::: Path to output CAB file
957
+ `options`::: Hash with compression, set_id, etc.
958
+ +
959
+ Returns:::
960
+ `Integer`::: Bytes written
961
+
962
+ **Example**:
963
+ [source,ruby]
964
+ ----
965
+ compressor = Cabriolet::CAB::Compressor.new
966
+ compressor.add_file("file1.txt")
967
+ compressor.add_file("file2.txt")
968
+ bytes = compressor.generate("output.cab", compression: :mszip)
969
+ ----
970
+
971
+ ==== Compression Algorithm Status
972
+
973
+ [cols="1,1,1,3"]
974
+ |===
975
+ | Algorithm | Decompression | Compression | Notes
976
+
977
+ | **None**
978
+ | ✅ Working
979
+ | ✅ Working
980
+ | Uncompressed storage
981
+
982
+ | **LZSS**
983
+ | ✅ Working
984
+ | ✅ Working
985
+ | 4KB sliding window, 3 modes (EXPAND, MSHELP, QBASIC)
986
+
987
+ | **MSZIP**
988
+ | ✅ Working
989
+ | ✅ Working
990
+ | DEFLATE/RFC 1951, fixed Huffman
991
+
992
+ | **LZX**
993
+ | ✅ Working
994
+ | ✅ Working
995
+ | UNCOMPRESSED blocks, 32KB-2MB window
996
+
997
+ | **Quantum**
998
+ | ✅ Working
999
+ | ⚠️ Functional
1000
+ | Literals + short matches work. Complex patterns pending.
1001
+ |===
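For readers unfamiliar with the LZSS entry in the table, the sketch below decodes the variant commonly described for SZDD files: a 4096-byte window pre-filled with spaces, control bytes read least-significant bit first, a set bit meaning a literal byte and a clear bit meaning a two-byte (offset, length) match. These details and the default start position are stated here as assumptions drawn from general descriptions of the format, and `lzss_decode` is a hypothetical function rather than Cabriolet's decompressor.

```ruby
LZSS_WINDOW_SIZE = 4096

# Hypothetical decoder for SZDD-style LZSS payloads (header not included).
def lzss_decode(input, start_pos: LZSS_WINDOW_SIZE - 16)
  window = Array.new(LZSS_WINDOW_SIZE, 0x20) # pre-filled with spaces
  pos = start_pos
  out = []
  bytes = input.bytes
  i = 0

  while i < bytes.size
    control = bytes[i]
    i += 1
    8.times do
      break if i >= bytes.size

      if (control & 1) == 1
        byte = bytes[i] # literal: copy one byte through
        i += 1
        out << byte
        window[pos] = byte
        pos = (pos + 1) % LZSS_WINDOW_SIZE
      else
        break if i + 1 >= bytes.size # truncated match

        lo = bytes[i]
        hi = bytes[i + 1]
        i += 2
        match_pos = lo | ((hi & 0xF0) << 4) # 12-bit window offset
        length = (hi & 0x0F) + 3            # 3..18 bytes
        length.times do
          byte = window[match_pos]
          out << byte
          window[pos] = byte
          pos = (pos + 1) % LZSS_WINDOW_SIZE
          match_pos = (match_pos + 1) % LZSS_WINDOW_SIZE
        end
      end
      control >>= 1
    end
  end
  out.pack("C*")
end

# Three literals "abc", then a 3-byte match pointing back at them.
payload = [0x07, 0x61, 0x62, 0x63, 0xF0, 0xF0].pack("C*")
p lzss_decode(payload)
```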
1002
+
1003
+ === Configuration Options
1004
+
1005
+ ==== Buffer Sizes
1006
+
1007
+ [source,ruby]
1008
+ ----
1009
+ # Set default buffer size globally
1010
+ Cabriolet.default_buffer_size = 8192
1011
+
1012
+ # Or per decompressor
1013
+ decompressor.buffer_size = 16384
1014
+ ----
1015
+
1016
+ ==== Verbose Output
1017
+
1018
+ [source,ruby]
1019
+ ----
1020
+ # Enable verbose output globally
1021
+ Cabriolet.verbose = true
1022
+
1023
+ # Or use --verbose flag in CLI
1024
+ # cabriolet extract file.cab --verbose
1025
+ ----
1026
+
1027
+ === Compression Algorithm Selection Guide
1028
+
1029
+ [cols="1,1,1,1,3"]
1030
+ |===
1031
+ | Algorithm | Ratio | Speed | Complexity | Use Case
1032
+
1033
+ | **None**
1034
+ | 1:1
1035
+ | Fastest
1036
+ | Trivial
1037
+ | Already compressed data, testing
1038
+
1039
+ | **LZSS**
1040
+ | 2-3:1
1041
+ | Fast
1042
+ | Low
1043
+ | Small files, compatibility
1044
+
1045
+ | **MSZIP**
1046
+ | 3-5:1
1047
+ | Medium
1048
+ | Medium
1049
+ | **Recommended** for most uses
1050
+
1051
+ | **LZX**
1052
+ | 5-10:1
1053
+ | Slow
1054
+ | High
1055
+ | Large files, best compression
1056
+
1057
+ | **Quantum**
1058
+ | 4-8:1
1059
+ | Medium
1060
+ | Very High
1061
+ | Experimental, use with caution
1062
+ |===
1063
+
1064
+ === Return values
1065
+
1066
+ Each method either returns a value on success or raises an exception on failure:
1067
+
1068
+ * **Decompression methods**: Return bytes extracted (`extract_all` returns the number of files extracted) or raise an error
1069
+ * **Compression methods**: Return bytes written or raise error
1070
+ * **Parse methods**: Return model objects or raise `ParseError`
1071
+ * **File operations**: Return file handles or raise `IOError`
1072
+
1073
+ == Development
1074
+
1075
+ === Building from source
1076
+
1077
+ [source,shell]
1078
+ ----
1079
+ git clone https://github.com/omnizip/cabriolet.git
1080
+ cd cabriolet
1081
+ bundle install
1082
+ bundle exec rake
1083
+ ----
1084
+
1085
+ === Running tests
1086
+
1087
+ [source,shell]
1088
+ ----
1089
+ bundle exec rspec
1090
+ ----
1091
+
1092
+
1093
+ === Running RuboCop
1094
+
1095
+ [source,shell]
1096
+ ----
1097
+ bundle exec rubocop
1098
+ bundle exec rubocop -A # Auto-correct
1099
+ ----
1100
+
1101
+
1102
+ == Known limitations
1103
+
1104
+ === Quantum compression
1105
+
1106
+ Quantum compression is **functional but experimental**:
1107
+
1108
+ * ✅ **Decompression**: Fully working, production ready
1109
+ * ✅ **Compression**: Working for:
1110
+ ** Simple literals
1111
+ ** Short matches (3-4 bytes)
1112
+ ** Basic patterns
1113
+ * ⚠️ **Limitations**:
1114
+ ** Complex repeated patterns may fail
1115
+ ** Very long matches (14+ bytes) have encoding issues
1116
+ ** Recommended: Use LZSS, MSZIP, or LZX instead
1117
+
1118
+ === LIT Format
1119
+
1120
+ * DES encryption (DRM) intentionally not supported
1121
+ * For DRM-protected LIT files, decrypt with Microsoft Reader first
1122
+
1123
+ === HLP/LIT/OAB Formats
1124
+
1125
+ * No public format specifications available
1126
+ * Implementation based on libmspack source code
1127
+ * Cannot be fully validated without real test files
1128
+ * Basic functionality working, edge cases may exist
1129
+
1130
+
1131
+ == Troubleshooting
1132
+
1133
+ === Extraction failures
1134
+
1135
+ Problem:: Invalid CAB signature
1136
+
1137
+ Solution:: The file may not be a CAB, or it may be corrupted. Try salvage mode:
1138
+
1139
+ [source,shell]
1140
+ ----
1141
+ cabriolet extract --salvage corrupted.cab
1142
+ ----
1143
+
1144
+ Problem:: Checksum mismatch
1145
+
1146
+ Solution:: Enable error recovery:
1147
+
1148
+ [source,ruby]
1149
+ ----
1150
+ decompressor.fix_mszip = true
1151
+ decompressor.salvage = true
1152
+ ----
1153
+
1154
+ === Performance issues
1155
+
1156
+ Problem:: Slow extraction
1157
+
1158
+ Solution:: Increase buffer size:
1159
+
1160
+ [source,ruby]
1161
+ ----
1162
+ decompressor.buffer_size = 16384
1163
+ ----
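The reason a larger buffer can help is that data is moved in fewer, bigger reads. The sketch below counts read calls for two buffer sizes using in-memory streams; `copy_in_chunks` is a hypothetical helper, and whether a larger buffer actually speeds up a given extraction depends on the workload.

```ruby
require "stringio"

# Hypothetical helper: copy input to output in buffer_size chunks and
# report how many reads were needed.
def copy_in_chunks(input, output, buffer_size: 4096)
  reads = 0
  while (chunk = input.read(buffer_size)) # nil at end of input
    output.write(chunk)
    reads += 1
  end
  reads
end

data = "x" * 40_000
[4096, 16_384].each do |size|
  reads = copy_in_chunks(StringIO.new(data), StringIO.new(String.new),
                         buffer_size: size)
  puts "buffer #{size}: #{reads} reads"
end
```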
1164
+
1165
+
1166
+ == Specifications
1167
+
1168
+ * https://en.wikipedia.org/wiki/Cabinet_(file_format)[Microsoft Cabinet File Format - Wikipedia]
1169
+ * https://www.rfc-editor.org/info/rfc1951[RFC 1951: DEFLATE Compressed Data Format Specification version 1.3, May 1996]
1170
+ * https://learn.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-patch/cc78752a-b4af-4eee-88cb-01f4d8a4c2bf[[MS-PATCH\]: LZX DELTA Compression and Decompression]
1171
+
1172
+
1173
+ == Acknowledgments
1174
+
1175
+ A special thank you to Stuart Caie (aka Kyzer) who created the original
1176
+ libmspack and cabextract projects, and their contributors for:
1177
+
1178
+ * Comprehensive CAB format implementation
1179
+ * Excellent test coverage and test fixtures
1180
+ * Clear format documentation
1181
+
1182
+ Link to the libmspack/cabextract project:
1183
+ https://www.cabextract.org.uk/libmspack/
1184
+
1185
+ Cabriolet is inspired by and builds upon the foundation laid by these projects.
1186
+
1187
+ If performance is critical, Cabriolet is not the best choice. Consider using
1188
+ https://github.com/davispuh/ruby-libmspack[libmspack via FFI] for optimized
1189
+ speed.
1190
+
1191
+
1192
+ == License
1193
+
1194
+ BSD 3-Clause License. See link:LICENSE[LICENSE] file for details.
1195
+
1196
+ Some test fixtures are from third-party projects. Test fixtures are **NOT**
1197
+ distributed with the gem and are only used for development and testing purposes.
1198
+
1199
+ These fixtures are sourced from the respective projects and retain their
1200
+ original licenses:
1201
+
1202
+ * Test fixtures in `spec/fixtures/libmspack/` are from the libmspack project
1203
+ (LGPL 2.1).
1204
+
1205
+ * Test fixtures in `spec/fixtures/cabextract/` are from cabextract (GPL 2.0+).
1206
+
1207
+ See fixture directories for individual attribution files.