cabriolet 0.1.0 → 0.1.2
- checksums.yaml +4 -4
- data/lib/cabriolet/platform.rb +27 -0
- data/lib/cabriolet/version.rb +1 -1
- data/lib/cabriolet.rb +1 -1
- metadata +2 -3
- data/ARCHITECTURE.md +0 -799
- data/CHANGELOG.md +0 -44
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: edd5ee62d319aad301cffc319c13a9ec24c311aff284f77fd5a0612f598c14bd
+  data.tar.gz: b5bee141d6c010f845ff58104f4e99217c5f7c0fbbec5af94911e100fa2864e5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 14f0bd63386aa18cf45dc0d4524513ee6cca16eb4537084244279c5f61c9b089dc519888083d498f0f0f1bdbd909716278e7787e9df28f594a93f2588ce61341
+  data.tar.gz: 007c90c8cb3aec1c43ff240810d532ed4ab237ec0b95713121536e64d100cced3dd150b1be76d8ff92fb1ed4a3ac236d4effd7e3f4cb4f9f35638c32109e456b
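
The checksums hunk only swaps digest values. As a rough sketch of how such digests could be checked locally, the code below compares each file against its recorded SHA256 and SHA512 values. It assumes the `.gem` archive has already been unpacked and `checksums.yaml` decompressed, so that `metadata.gz` and `data.tar.gz` sit in one directory. The helper name `verify_checksums` and the directory layout are assumptions made for illustration, not part of the gem.

```ruby
require "digest"
require "yaml"

# Digest classes keyed by the section names used in checksums.yaml.
ALGORITHMS = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512,
}.freeze

# Hypothetical helper: compare files in `dir` against a hash shaped like
# checksums.yaml (algorithm => { file name => hex digest }).
# Returns a list of "ALGORITHM file" strings for every mismatch.
def verify_checksums(dir, checksums)
  mismatches = []
  checksums.each do |algorithm, files|
    digest_class = ALGORITHMS.fetch(algorithm)
    files.each do |name, expected|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      mismatches << "#{algorithm} #{name}" unless actual == expected
    end
  end
  mismatches
end

# Usage (paths are placeholders):
#   checksums = YAML.safe_load(File.read("unpacked/checksums.yaml"))
#   verify_checksums("unpacked", checksums) # empty array when all match
```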
data/lib/cabriolet/platform.rb
ADDED
@@ -0,0 +1,27 @@
+# frozen_string_literal: true
+
+module Cabriolet
+  # Platform detection for handling OS-specific behavior
+  module Platform
+    # Check if running on Windows
+    #
+    # @return [Boolean] true if on Windows (including MinGW, Cygwin)
+    def self.windows?
+      RUBY_PLATFORM =~ /mswin|mingw|cygwin/
+    end
+
+    # Check if running on Unix-like system
+    #
+    # @return [Boolean] true if on Unix (Linux, macOS, BSD, etc.)
+    def self.unix?
+      !windows?
+    end
+
+    # Check if platform supports Unix file permissions
+    #
+    # @return [Boolean] true if platform supports chmod with Unix permission bits
+    def self.supports_unix_permissions?
+      unix?
+    end
+  end
+end
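
A note on the new `Platform` module: `RUBY_PLATFORM =~ /.../` evaluates to an Integer index or `nil` rather than a strict `true`/`false`. That works in conditionals but differs from the documented `[Boolean]` return type. The sketch below is not code from the gem. It mirrors the three predicates in a hypothetical `PlatformSketch` module that takes the platform string as a parameter (so both branches can be exercised on any host), uses `Regexp#match?` to return a strict Boolean, and shows how a caller might guard `File.chmod` during extraction. `apply_mode` is an assumed helper name.

```ruby
# Hypothetical stand-in for Cabriolet::Platform, parameterised on the
# platform string so both branches can be exercised on one machine.
module PlatformSketch
  WINDOWS_PATTERN = /mswin|mingw|cygwin/

  def self.windows?(platform = RUBY_PLATFORM)
    WINDOWS_PATTERN.match?(platform) # strict true/false, unlike =~
  end

  def self.unix?(platform = RUBY_PLATFORM)
    !windows?(platform)
  end

  def self.supports_unix_permissions?(platform = RUBY_PLATFORM)
    unix?(platform)
  end
end

# Assumed helper: apply Unix permission bits only where they are meaningful.
# Returns true when chmod was attempted, false when it was skipped.
def apply_mode(path, mode, platform = RUBY_PLATFORM)
  return false unless PlatformSketch.supports_unix_permissions?(platform)

  File.chmod(mode, path)
  true
end
```

Passing the platform string explicitly is only a testing convenience; real callers would rely on the default.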
data/lib/cabriolet/version.rb
CHANGED
data/lib/cabriolet.rb
CHANGED
@@ -1,7 +1,7 @@
 # frozen_string_literal: true
 
 require_relative "cabriolet/version"
-require_relative "cabriolet/
+require_relative "cabriolet/platform"
 require_relative "cabriolet/constants"
 
 # Cabriolet is a pure Ruby library for extracting Microsoft Cabinet (.CAB) files,
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: cabriolet
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.2
 platform: ruby
 authors:
 - Ribose Inc.
@@ -49,8 +49,6 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
-- ARCHITECTURE.md
-- CHANGELOG.md
 - LICENSE
 - README.adoc
 - exe/cabriolet
@@ -115,6 +113,7 @@ files:
 - lib/cabriolet/oab/compressor.rb
 - lib/cabriolet/oab/decompressor.rb
 - lib/cabriolet/parallel.rb
+- lib/cabriolet/platform.rb
 - lib/cabriolet/repairer.rb
 - lib/cabriolet/streaming.rb
 - lib/cabriolet/system/file_handle.rb
data/ARCHITECTURE.md
DELETED
@@ -1,799 +0,0 @@
-# Cabriolet Architecture Plan
-
-## Overview
-
-**Cabriolet** is a pure Ruby gem for extracting Microsoft compression formats, focusing primarily on CAB (Cabinet) files. This implementation is a Ruby port of libmspack and cabextract.
-
-## Goals
-
-1. **Pure Ruby Implementation**: No C extensions, fully portable
-2. **Full CAB Format Support**: Handle all compression methods (MSZIP, LZX, Quantum)
-3. **Extensible Design**: Easy to add support for CHM, LIT, HLP formats later
-4. **Well-Tested**: Comprehensive test coverage using libmspack test files
-5. **Performance**: Optimized for reasonable performance while maintaining readability
-
-## Source Material
-
-- **libmspack**: https://github.com/kyz/libmspack (LGPL 2.1)
-- **Location**: `/Users/mulgogi/src/external/libmspack`
-- **Primary Files to Port**:
-  - `mspack/cabd.c` - CAB decompressor
-  - `mspack/lzxd.c` - LZX decompression
-  - `mspack/mszipd.c` - MSZIP decompression
-  - `mspack/qtmd.c` - Quantum decompression
-  - `mspack/lzssd.c` - LZSS decompression
-  - `mspack/system.c` - I/O abstraction
-
-## Architecture
-
-### High-Level Structure
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│                        Cabriolet Gem                        │
-├─────────────────────────────────────────────────────────────┤
-│                                                             │
-│    ┌──────────────┐  ┌──────────────┐   ┌──────────────┐    │
-│    │     CLI      │  │   Cabinet    │   │    Models    │    │
-│    │     Tool     │  │  Extractor   │   │   (Lutaml)   │    │
-│    └──────────────┘  └──────────────┘   └──────────────┘    │
-│            │                 │                  │           │
-│            └─────────────────┴──────────────────┘           │
-│                              │                              │
-│     ┌────────────────────────┴────────────────────────┐     │
-│     │             CAB Decompressor (Core)             │     │
-│     ├─────────────────────────────────────────────────┤     │
-│     │  • Cabinet Parser                               │     │
-│     │  • Folder/File Management                       │     │
-│     │  • Decompression Strategy Selection             │     │
-│     └─────────────────────────────────────────────────┘     │
-│                              │                              │
-│     ┌────────────────────────┴────────────────────────┐     │
-│     │            Decompression Algorithms             │     │
-│     ├─────────────────────────────────────────────────┤     │
-│     │  • MSZIP (Deflate)                              │     │
-│     │  • LZX                                          │     │
-│     │  • Quantum                                      │     │
-│     │  • LZSS                                         │     │
-│     │  • None (Uncompressed)                          │     │
-│     └─────────────────────────────────────────────────┘     │
-│                              │                              │
-│     ┌────────────────────────┴────────────────────────┐     │
-│     │                Foundation Layer                 │     │
-│     ├─────────────────────────────────────────────────┤     │
-│     │  • System I/O Abstraction                       │     │
-│     │  • Binary I/O (Endianness handling)             │     │
-│     │  • Bitstream Reader                             │     │
-│     │  • Huffman Tree Decoder                         │     │
-│     └─────────────────────────────────────────────────┘     │
-│                                                             │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### Directory Structure
-
-```
-cabriolet/
-├── lib/
-│   └── cabriolet/
-│       ├── version.rb
-│       ├── errors.rb
-│       ├── constants.rb
-│       │
-│       ├── system/                  # System abstraction layer
-│       │   ├── io_system.rb         # File I/O abstraction
-│       │   ├── file_handle.rb       # File handle wrapper
-│       │   └── memory_handle.rb     # In-memory I/O
-│       │
-│       ├── binary/                  # Binary I/O utilities
-│       │   ├── reader.rb            # Binary data reader
-│       │   ├── bitstream.rb         # Bitstream reader
-│       │   └── endian.rb            # Endianness handling
-│       │
-│       ├── huffman/                 # Huffman decoding
-│       │   ├── tree.rb              # Huffman tree structure
-│       │   └── decoder.rb           # Huffman decoder
-│       │
-│       ├── models/                  # Data models (Lutaml::Model)
-│       │   ├── cabinet.rb           # Cabinet structure
-│       │   ├── folder.rb            # Folder structure
-│       │   └── file.rb              # File structure
-│       │
-│       ├── decompressors/           # Decompression algorithms
-│       │   ├── base.rb              # Base decompressor
-│       │   ├── none.rb              # No compression
-│       │   ├── lzss.rb              # LZSS algorithm
-│       │   ├── mszip.rb             # MSZIP (deflate)
-│       │   ├── lzx.rb               # LZX algorithm
-│       │   └── quantum.rb           # Quantum algorithm
-│       │
-│       ├── cab/                     # CAB format support
-│       │   ├── parser.rb            # CAB file parser
-│       │   ├── decompressor.rb      # Main decompressor
-│       │   └── extractor.rb         # File extraction
-│       │
-│       └── cli.rb                   # Command-line interface
-│
-├── spec/
-│   ├── fixtures/                    # Test CAB files
-│   ├── system/
-│   ├── binary/
-│   ├── huffman/
-│   ├── models/
-│   ├── decompressors/
-│   └── cab/
-│
-├── exe/
-│   └── cabriolet                    # CLI executable
-│
-├── ARCHITECTURE.md
-├── README.adoc
-├── CHANGELOG.md
-├── LICENSE
-├── Gemfile
-└── cabriolet.gemspec
-```
-
-## Core Components
-
-### 1. System Abstraction Layer
-
-**Purpose**: Abstract file I/O, memory management, and system calls
-
-**Files**:
-- `system/io_system.rb` - Main I/O abstraction
-- `system/file_handle.rb` - File operations wrapper
-- `system/memory_handle.rb` - In-memory operations
-
-**Design**:
-```ruby
-module Cabriolet
-  module System
-    class IOSystem
-      def open(filename, mode)
-        # Returns FileHandle or MemoryHandle
-      end
-
-      def close(handle)
-        # Closes the handle
-      end
-
-      def read(handle, bytes)
-        # Reads bytes from handle
-      end
-
-      def write(handle, data)
-        # Writes data to handle
-      end
-
-      def seek(handle, offset, whence)
-        # Seeks to position
-      end
-
-      def tell(handle)
-        # Returns current position
-      end
-    end
-  end
-end
-```
-
-### 2. Binary I/O Layer
-
-**Purpose**: Handle binary data reading with proper endianness
-
-**Files**:
-- `binary/reader.rb` - Binary data reader
-- `binary/bitstream.rb` - Bitstream operations
-- `binary/endian.rb` - Endian conversion utilities
-
-**Key Features**:
-- Little-endian integer reading (CAB uses little-endian)
-- Bitstream reading for compressed data
-- Buffer management
-
-**Design**:
-```ruby
-module Cabriolet
-  module Binary
-    class Reader
-      def read_uint16_le
-        # Read 16-bit little-endian unsigned integer
-      end
-
-      def read_uint32_le
-        # Read 32-bit little-endian unsigned integer
-      end
-
-      def read_bytes(count)
-        # Read raw bytes
-      end
-    end
-
-    class Bitstream
-      def initialize(io_system, file_handle, buffer_size)
-        # Initialize bitstream reader
-      end
-
-      def read_bits(num_bits)
-        # Read specified number of bits
-      end
-
-      def byte_align
-        # Align to byte boundary
-      end
-    end
-  end
-end
-```
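
The `Reader` stubs in the deleted plan leave the method bodies empty. One possible way to fill them in is `String#unpack1` with the `v` and `V` directives, which decode 16-bit and 32-bit little-endian unsigned integers. This is a sketch under assumptions: the class name `LittleEndianReader`, the constructor taking any IO-like object that responds to `read`, and the `EOFError` on short reads are all illustrative choices, not taken from the gem.

```ruby
require "stringio"

# Sketch of a little-endian reader; the class name is hypothetical.
class LittleEndianReader
  def initialize(io)
    @io = io
  end

  # Read exactly `count` raw bytes, or raise if the stream ends early.
  def read_bytes(count)
    data = @io.read(count)
    raise EOFError, "wanted #{count} bytes" if data.nil? || data.bytesize < count

    data
  end

  # "v" = 16-bit unsigned integer, little-endian
  def read_uint16_le
    read_bytes(2).unpack1("v")
  end

  # "V" = 32-bit unsigned integer, little-endian
  def read_uint32_le
    read_bytes(4).unpack1("V")
  end
end

reader = LittleEndianReader.new(StringIO.new("MSCF\x03\x01".b))
signature = reader.read_uint32_le # bytes "MSCF" read as 0x4643534D
version = reader.read_uint16_le   # bytes 03 01 read as 0x0103
```

Wrapping a plain IO object keeps the sketch independent of the plan's `IOSystem` abstraction, which would add one more layer of indirection.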
-
-### 3. Huffman Decoding
-
-**Purpose**: Decode Huffman-encoded data streams
-
-**Files**:
-- `huffman/tree.rb` - Huffman tree construction
-- `huffman/decoder.rb` - Decoding logic
-
-**Design**:
-```ruby
-module Cabriolet
-  module Huffman
-    class Tree
-      def initialize(lengths, num_symbols)
-        # Build Huffman tree from code lengths
-      end
-
-      def build_table(table_bits)
-        # Build fast decode table
-      end
-    end
-
-    class Decoder
-      def decode_symbol(bitstream, table)
-        # Decode one symbol from bitstream
-      end
-    end
-  end
-end
-```
-
-### 4. Data Models
-
-**Purpose**: Represent CAB file structures
-
-**Files**:
-- `models/cabinet.rb`
-- `models/folder.rb`
-- `models/file.rb`
-
-**Design** (Plain Ruby classes):
-```ruby
-module Cabriolet
-  module Models
-    class Cabinet
-      attr_accessor :filename, :length, :set_id, :set_index, :flags
-      attr_accessor :folders, :files, :next_cabinet, :prev_cabinet
-      attr_accessor :base_offset, :header_resv, :prevname, :nextname
-      attr_accessor :previnfo, :nextinfo
-
-      def initialize
-        @folders = []
-        @files = []
-      end
-    end
-
-    class Folder
-      attr_accessor :comp_type, :num_blocks, :data_offset
-      attr_accessor :next, :data_cab, :merge_prev, :merge_next
-
-      def initialize
-        @data_cab = nil
-        @merge_prev = nil
-        @merge_next = nil
-      end
-    end
-
-    class File
-      attr_accessor :filename, :length, :offset, :folder
-      attr_accessor :attribs, :date, :time
-      attr_accessor :time_h, :time_m, :time_s
-      attr_accessor :date_d, :date_m, :date_y
-      attr_accessor :next
-
-      def initialize
-        @next = nil
-      end
-    end
-  end
-end
-```
-
-### 5. Decompressors
-
-**Purpose**: Implement compression algorithms
-
-**Base Class**:
-```ruby
-module Cabriolet
-  module Decompressors
-    class Base
-      def initialize(io_system, input_handle, output_handle, buffer_size)
-        @io_system = io_system
-        @input = input_handle
-        @output = output_handle
-        @buffer_size = buffer_size
-      end
-
-      def decompress(bytes)
-        # Abstract method - implemented by subclasses
-        raise NotImplementedError
-      end
-    end
-  end
-end
-```
-
-**Subclasses**:
-
-1. **LZSS** (`decompressors/lzss.rb`):
-   - Window size: 4096 bytes
-   - Used by SZDD, KWAJ formats
-   - Simple sliding window compression
-
-2. **MSZIP** (`decompressors/mszip.rb`):
-   - Deflate algorithm (RFC 1951)
-   - 32KB sliding window
-   - Huffman coding + LZ77
-
-3. **LZX** (`decompressors/lzx.rb`):
-   - Window sizes: 32KB to 2MB
-   - Intel E8 preprocessing
-   - Multiple Huffman trees
-
-4. **Quantum** (`decompressors/quantum.rb`):
-   - Proprietary format
-   - Complex algorithm
-   - Huffman coding + sliding window
-
-5. **None** (`decompressors/none.rb`):
-   - Simple copy operation
-   - No decompression
-
-### 6. CAB Format Support
-
-**Parser** (`cab/parser.rb`):
-```ruby
-module Cabriolet
-  module CAB
-    class Parser
-      def initialize(io_system)
-        @io_system = io_system
-      end
-
-      def parse(filename)
-        # Parse CAB file headers
-        # Returns Cabinet model
-      end
-
-      private
-
-      def read_header(handle)
-        # Read CFHEADER structure
-      end
-
-      def read_folders(handle, count)
-        # Read CFFOLDER structures
-      end
-
-      def read_files(handle, count)
-        # Read CFFILE structures
-      end
-    end
-  end
-end
-```
-
-**Decompressor** (`cab/decompressor.rb`):
-```ruby
-module Cabriolet
-  module CAB
-    class Decompressor
-      def initialize(io_system = nil)
-        @io_system = io_system || System::IOSystem.new
-        @parser = Parser.new(@io_system)
-      end
-
-      def open(filename)
-        # Open and parse CAB file
-        @parser.parse(filename)
-      end
-
-      def extract(file, output_filename)
-        # Extract a single file
-      end
-
-      def extract_all(output_directory)
-        # Extract all files
-      end
-
-      private
-
-      def select_decompressor(comp_type)
-        # Select appropriate decompressor
-      end
-    end
-  end
-end
-```
-
-**Extractor** (`cab/extractor.rb`):
-```ruby
-module Cabriolet
-  module CAB
-    class Extractor
-      def initialize(cabinet, io_system)
-        @cabinet = cabinet
-        @io_system = io_system
-      end
-
-      def extract_file(file, output_path)
-        # Extract single file from cabinet
-      end
-    end
-  end
-end
-```
-
-### 7. CLI Tool
-
-**Design** (`cli.rb`):
-```ruby
-require 'thor'
-
-module Cabriolet
-  class CLI < Thor
-    desc 'list FILE', 'List contents of CAB file'
-    def list(file)
-      # List all files in cabinet
-    end
-
-    desc 'extract FILE [OUTPUT_DIR]', 'Extract CAB file'
-    option :verbose, type: :boolean, aliases: '-v'
-    def extract(file, output_dir = '.')
-      # Extract files
-    end
-
-    desc 'info FILE', 'Show CAB file information'
-    def info(file)
-      # Show detailed cabinet info
-    end
-
-    desc 'test FILE', 'Test CAB file integrity'
-    def test(file)
-      # Test file integrity
-    end
-  end
-end
-```
-
-## Implementation Phases
-
-### Phase 1: Foundation (Weeks 1-2)
-
-- [x] Project setup (Gemfile, gemspec, RSpec)
-- [ ] System abstraction layer
-- [ ] Binary I/O utilities
-- [ ] Bitstream reader
-- [ ] Basic error handling
-
-### Phase 2: Format Support (Weeks 3-4)
-
-- [ ] CAB format constants
-- [ ] Data models with Lutaml::Model
-- [ ] CAB parser (headers, folders, files)
-- [ ] Cabinet search functionality
-
-### Phase 3: Basic Decompression (Weeks 5-6)
-
-- [ ] Base decompressor class
-- [ ] None decompressor (uncompressed)
-- [ ] LZSS decompressor
-- [ ] Basic extraction workflow
-
-### Phase 4: MSZIP Support (Weeks 7-8)
-
-- [ ] Huffman tree builder
-- [ ] Huffman decoder
-- [ ] MSZIP/Deflate decompressor
-- [ ] Integration with CAB extractor
-
-### Phase 5: LZX Support (Weeks 9-11)
-
-- [ ] LZX constants and structures
-- [ ] LZX bitstream handling
-- [ ] LZX Huffman trees
-- [ ] Intel E8 transformation
-- [ ] LZX decompressor
-
-### Phase 6: Quantum Support (Weeks 12-13)
-
-- [ ] Quantum algorithm research
-- [ ] Quantum decompressor
-- [ ] Special case handling
-
-### Phase 7: Testing & Polish (Weeks 14-15)
-
-- [ ] Comprehensive test suite
-- [ ] Performance optimization
-- [ ] Documentation
-- [ ] CLI refinement
-
-### Phase 8: Extended Formats (Future)
-
-- [ ] CHM (HTML Help) format support
-- [ ] LIT (eBook) format support
-- [ ] HLP (Help) format support
-
-## CAB Format Specification
-
-### File Structure
-
-```
-┌──────────────────────────────┐
-│ CFHEADER (36+ bytes)         │  Cabinet header
-├──────────────────────────────┤
-│ Reserved area (optional)     │
-├──────────────────────────────┤
-│ Previous cabinet name        │  (if flags & 0x01)
-├──────────────────────────────┤
-│ Next cabinet name            │  (if flags & 0x02)
-├──────────────────────────────┤
-│ CFFOLDER[1] (8+ bytes)       │  Folder entries
-│ CFFOLDER[2]                  │
-│ ...                          │
-├──────────────────────────────┤
-│ CFFILE[1] (16+ bytes)        │  File entries
-│ CFFILE[2]                    │
-│ ...                          │
-├──────────────────────────────┤
-│ CFDATA[1] (8+ bytes)         │  Data blocks
-│ Compressed data[1]           │
-│ CFDATA[2]                    │
-│ Compressed data[2]           │
-│ ...                          │
-└──────────────────────────────┘
-```
-
-### CFHEADER Structure
-
-```
-Offset  Size  Description
-------  ----  -----------
-0       4     Signature (0x4643534D = "MSCF")
-4       4     Reserved
-8       4     Cabinet file size
-12      4     Reserved
-16      4     Files offset
-20      4     Reserved
-24      1     Minor version
-25      1     Major version
-26      2     Number of folders
-28      2     Number of files
-30      2     Flags
-32      2     Set ID
-34      2     Cabinet index
-```
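
The fixed 36-byte CFHEADER layout in the table maps directly onto a single `String#unpack` format string. The following is a sketch only: the `CabHeader` struct, its field names, and the `parse_cfheader` helper are made up for illustration, and the optional trailing fields (reserved area, previous and next cabinet names) are ignored.

```ruby
# Hypothetical value object for the fixed part of CFHEADER.
CabHeader = Struct.new(:cabinet_size, :files_offset, :minor_version,
                       :major_version, :num_folders, :num_files,
                       :flags, :set_id, :cabinet_index)

CFHEADER_SIZE = 36
# a4 = 4 raw bytes, V = uint32 LE, C = uint8, v = uint16 LE
CFHEADER_FORMAT = "a4VVVVVCCvvvvv"

# Decode the first 36 bytes of a cabinet into a CabHeader.
def parse_cfheader(bytes)
  raise ArgumentError, "CFHEADER needs #{CFHEADER_SIZE} bytes" if bytes.bytesize < CFHEADER_SIZE

  signature, _res1, size, _res2, files_offset, _res3,
    minor, major, folders, files, flags, set_id, index =
    bytes.unpack(CFHEADER_FORMAT)
  raise ArgumentError, "not a cabinet (bad signature)" unless signature == "MSCF"

  CabHeader.new(size, files_offset, minor, major, folders, files,
                flags, set_id, index)
end

# Build a synthetic header with the same format string to exercise the parser.
sample = ["MSCF", 0, 1024, 0, 44, 0, 3, 1, 1, 2, 0, 0x1234, 0].pack(CFHEADER_FORMAT)
header = parse_cfheader(sample)
```

Using one format string for both `pack` and `unpack` keeps the offsets in a single place, which makes it easy to compare against the table.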
-
-### Compression Types
-
-| Type | Value | Description |
-|------|-------|-------------|
-| None | 0 | No compression |
-| MSZIP | 1 | MSZIP (deflate) |
-| Quantum | 2 | Quantum compression |
-| LZX | 3 | LZX compression |
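
A sketch of how the values in this table might drive the `select_decompressor` step from the plan. It assumes the compression method occupies the low four bits of the CFFOLDER compression field, with LZX and Quantum parameters stored in the higher bits; that mask is how libmspack reads the field, but treat it as an assumption here. The symbols returned stand in for decompressor classes and are hypothetical.

```ruby
# Assumed mask: compression method lives in the low four bits.
COMP_TYPE_MASK = 0x000F

COMPRESSION_TYPES = {
  0 => :none,
  1 => :mszip,
  2 => :quantum,
  3 => :lzx,
}.freeze

# Map a raw CFFOLDER compression field to a method name,
# raising for values the table does not list.
def compression_method(comp_type)
  COMPRESSION_TYPES.fetch(comp_type & COMP_TYPE_MASK) do |value|
    raise ArgumentError, "unsupported compression type #{value}"
  end
end
```

For example, `compression_method(0x1503)` masks the field down to `3` and yields `:lzx`.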
-
-## Testing Strategy
-
-### Unit Tests
-
-- Each decompressor tested independently
-- Binary I/O utilities tested with known data
-- Huffman decoder tested with sample trees
-- Parser tested with valid/invalid CAB files
-
-### Integration Tests
-
-- Full extraction of known CAB files
-- Multi-cabinet spanning tests
-- Error recovery tests
-- Performance benchmarks
-
-### Test Data
-
-#### libmspack Test Fixtures
-
-Copy test files from libmspack to `spec/fixtures/libmspack/`:
-
-```bash
-# Directory structure
-spec/fixtures/libmspack/
-├── README.adoc                                  # License acknowledgment
-└── cabd/
-    ├── normal_2files_1folder.cab                # Basic CAB
-    ├── mszip_lzx_qtm.cab                        # Multiple compression
-    ├── multi_basic_pt1.cab                      # Multi-part cabinet
-    ├── multi_basic_pt2.cab
-    ├── cve-2010-2800-mszip-infinite-loop.cab    # Security test
-    └── ...
-```
-
-#### Test Coverage Strategy
-
-Each RSpec file tests its corresponding class:
-- **Unit Tests**: Test each class in isolation with mocks/stubs
-- **Integration Tests**: Test component interactions
-- **End-to-End Tests**: Full extraction workflow with real CAB files
-
-**Example RSpec structure**:
-```ruby
-# spec/decompressors/lzx_spec.rb
-RSpec.describe Cabriolet::Decompressors::LZX do
-  describe '#initialize' do
-    # Test initialization
-  end
-
-  describe '#decompress' do
-    context 'with valid LZX data' do
-      # Test decompression
-    end
-
-    context 'with corrupted data' do
-      # Test error handling
-    end
-  end
-end
-```
-
-## Error Handling
-
-### Error Classes
-
-```ruby
-module Cabriolet
-  class Error < StandardError; end
-
-  class IOError < Error; end
-  class ParseError < Error; end
-  class DecompressionError < Error; end
-  class ChecksumError < Error; end
-  class UnsupportedFormatError < Error; end
-end
-```
-
-### Error Strategy
-
-1. **Graceful degradation**: Attempt partial extraction on errors
-2. **Clear messages**: Provide actionable error information
-3. **Salvage mode**: Optional parameter to skip errors and extract what's possible
-4. **Validation**: Verify checksums and data integrity
-
-## Performance Considerations
-
-1. **Buffer Sizes**: Default 4KB buffers, configurable
-2. **Memory Usage**: Stream-based processing, avoid loading entire files
-3. **Lookup Tables**: Pre-computed Huffman decode tables
-4. **Ruby Optimization**:
-   - Use byte arrays instead of strings where appropriate
-   - Minimize object allocation in hot paths
-   - Use bitwise operations efficiently
-
-## Documentation
-
-### README.adoc Structure
-
-```asciidoc
-= Cabriolet
-
-Pure Ruby implementation of Microsoft CAB file extraction.
-
-== Features
-
-* Full CAB format support
-* Multiple compression algorithms
-* No C extensions required
-* CLI tool included
-
-== Installation
-
-== Usage
-
-=== Library
-
-=== Command Line
-
-== Architecture
-
-== Development
-
-== License
-```
-
-### Documentation
-
-See [`DOCUMENTATION_PLAN.md`](DOCUMENTATION_PLAN.md:1) for complete documentation architecture.
-
-**Documentation Structure**:
-- `docs/getting-started/` - Installation, quick start, first extraction
-- `docs/user-guide/` - Basic usage, advanced usage, CLI/API reference
-- `docs/formats/` - CAB format, compression algorithms (MSZIP, LZX, Quantum, LZSS)
-- `docs/technical/` - Architecture, system abstraction, binary I/O, Huffman coding
-- `docs/developer/` - Contributing, code style, testing, extending
-- `docs/appendix/` - Glossary, CAB spec, troubleshooting, FAQ
-
-**Standard Document Format**:
-Every document follows: Purpose → References → Concepts → Body → Bibliography
-
-**Cross-Cutting Documentation**:
-- Common options shared between CLI and API documented once
-- Each compression format gets detailed explanation
-- Progressive disclosure: basic → intermediate → advanced
-
-## Dependencies
-
-### Runtime
-
-- `bindata` (~> 2.5) - For binary data structures
-- `thor` (~> 1.3) - For CLI
-
-### Development
-
-- `rspec` - Testing framework
-- `rake` - Build tool
-- `rubocop` - Code style
-- `yard` - Documentation
-
-## Licensing
-
-### Cabriolet License
-
-**BSD 3-Clause License**
-
-The Cabriolet gem itself is released under the BSD 3-Clause License, allowing:
-- Commercial use
-- Modification
-- Distribution
-- Private use
-
-With conditions:
-- License and copyright notice must be included
-- No liability or warranty
-
-### Test Fixtures License
-
-The test fixtures in `spec/fixtures/libmspack/` are from the libmspack project and remain under the **LGPL 2.1** license. These are used solely for testing and validation purposes and are not distributed as part of the gem's runtime code.
-
-A `spec/fixtures/libmspack/README.adoc` file will acknowledge:
-- Copyright by Stuart Caie and libmspack contributors
-- LGPL 2.1 licensing of test files
-- Gratitude to the libmspack project for excellent test coverage
-
-### Implementation Notes
-
-This is a clean-room implementation based on:
-1. Public CAB file format specifications (Microsoft documentation)
-2. Algorithm specifications (LZX, deflate/RFC 1951, etc.)
-3. Test-driven development using publicly available test files
-
-The implementation does not copy code from libmspack but reimplements the algorithms in Ruby based on specifications and format documentation.
-
-## Success Criteria
-
-1. Successfully extract all test CAB files from libmspack test suite
-2. Handle all compression methods (MSZIP, LZX, Quantum, LZSS, None)
-3. Support multi-part cabinet sets
-4. Achieve reasonable performance (within 3-5x of native C implementation)
-5. Zero C extension dependencies
-6. Comprehensive test coverage (>90%)
-7. Well-documented API and CLI
data/CHANGELOG.md
DELETED
@@ -1,44 +0,0 @@
-# Changelog
-
-All notable changes to this project will be documented in this file.
-
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-
-## [Unreleased]
-
-### Added
-
-- Initial project structure and architecture
-- System abstraction layer (IOSystem, FileHandle, MemoryHandle)
-- Binary I/O utilities (BinData structures, Bitstream)
-- Domain models (Cabinet, Folder, File)
-- Decompressor base classes and stubs
-- CAB parser and extractor framework
-- CLI tool with Thor
-- Comprehensive documentation plan
-- Test infrastructure with RSpec
-
-### Changed
-
-- Nothing yet
-
-### Deprecated
-
-- Nothing yet
-
-### Removed
-
-- Nothing yet
-
-### Fixed
-
-- Nothing yet
-
-### Security
-
-- Nothing yet
-
-## [0.1.0] - TBD
-
-- Initial release (planned)