omnizip 0.3.1

Files changed (511)
  1. checksums.yaml +7 -0
  2. data/.rspec +3 -0
  3. data/.rubocop.yml +32 -0
  4. data/.rubocop_todo.yml +754 -0
  5. data/COPYING +502 -0
  6. data/Gemfile +17 -0
  7. data/LICENSE +12 -0
  8. data/README.adoc +1045 -0
  9. data/Rakefile +12 -0
  10. data/benchmark/README.md +260 -0
  11. data/benchmark/benchmark_suite.rb +125 -0
  12. data/benchmark/compression_bench.rb +181 -0
  13. data/benchmark/filter_bench.rb +180 -0
  14. data/benchmark/models/benchmark_result.rb +59 -0
  15. data/benchmark/models/comparison_result.rb +69 -0
  16. data/benchmark/profile_suite.rb +167 -0
  17. data/benchmark/reporter.rb +150 -0
  18. data/benchmark/run_benchmarks.rb +66 -0
  19. data/benchmark/test_data.rb +137 -0
  20. data/config/formats/rar3_spec.yml +91 -0
  21. data/config/formats/rar5_spec.yml +102 -0
  22. data/docs/.github/workflows/docs.yml +142 -0
  23. data/docs/.gitignore +21 -0
  24. data/docs/.lychee.toml +67 -0
  25. data/docs/Gemfile +13 -0
  26. data/docs/RAR_WRITE_SUPPORT.md +26 -0
  27. data/docs/README.md +101 -0
  28. data/docs/_config.yml +112 -0
  29. data/docs/assets/logo.svg +1 -0
  30. data/docs/assets/omnizip-logo.pdf +1540 -11
  31. data/docs/comparison/feature-matrix.adoc +694 -0
  32. data/docs/comparison/index.adoc +113 -0
  33. data/docs/comparison/vs-7zip.adoc +309 -0
  34. data/docs/comparison/vs-peazip.adoc +77 -0
  35. data/docs/comparison/vs-rubyzip.adoc +342 -0
  36. data/docs/comparison/vs-winrar.adoc +100 -0
  37. data/docs/compatibility.adoc +579 -0
  38. data/docs/concepts/index.adoc +129 -0
  39. data/docs/developer/architecture.adoc +256 -0
  40. data/docs/developer/contributing.adoc +158 -0
  41. data/docs/developer/index.adoc +25 -0
  42. data/docs/developer/testing.adoc +212 -0
  43. data/docs/getting-started/basic-usage.adoc +271 -0
  44. data/docs/getting-started/index.adoc +42 -0
  45. data/docs/getting-started/installation.adoc +138 -0
  46. data/docs/getting-started/quick-start.adoc +185 -0
  47. data/docs/getting-started/your-first-archive.adoc +218 -0
  48. data/docs/guides/advanced-features/encryption.adoc +300 -0
  49. data/docs/guides/advanced-features/index.adoc +49 -0
  50. data/docs/guides/advanced-features/parallel-processing.adoc +246 -0
  51. data/docs/guides/advanced-features/progress-tracking.adoc +320 -0
  52. data/docs/guides/advanced-features/streaming.adoc +212 -0
  53. data/docs/guides/archive-formats/gzip-format.adoc +107 -0
  54. data/docs/guides/archive-formats/index.adoc +130 -0
  55. data/docs/guides/archive-formats/rar-format.adoc +104 -0
  56. data/docs/guides/archive-formats/rar5.adoc +521 -0
  57. data/docs/guides/archive-formats/seven-zip-format.adoc +35 -0
  58. data/docs/guides/archive-formats/tar-format.adoc +106 -0
  59. data/docs/guides/archive-formats/xz-format.adoc +118 -0
  60. data/docs/guides/archive-formats/zip-format.adoc +35 -0
  61. data/docs/guides/compression-algorithms/bzip2.adoc +113 -0
  62. data/docs/guides/compression-algorithms/deflate.adoc +319 -0
  63. data/docs/guides/compression-algorithms/index.adoc +190 -0
  64. data/docs/guides/compression-algorithms/lzma.adoc +398 -0
  65. data/docs/guides/compression-algorithms/lzma2.adoc +327 -0
  66. data/docs/guides/compression-algorithms/ppmd.adoc +316 -0
  67. data/docs/guides/compression-algorithms/zstandard.adoc +361 -0
  68. data/docs/guides/creating-archives.adoc +354 -0
  69. data/docs/guides/extracting-archives.adoc +53 -0
  70. data/docs/guides/format-conversion.adoc +64 -0
  71. data/docs/guides/index.adoc +49 -0
  72. data/docs/guides/migration-rubyzip.adoc +217 -0
  73. data/docs/guides/parity-archives.adoc +605 -0
  74. data/docs/guides/performance-tuning.adoc +88 -0
  75. data/docs/index.adoc +218 -0
  76. data/docs/lychee.toml +67 -0
  77. data/docs/reference/api/overview.adoc +188 -0
  78. data/docs/reference/cli/compress-command.adoc +114 -0
  79. data/docs/reference/cli/overview.adoc +140 -0
  80. data/docs/reference/index.adoc +26 -0
  81. data/docs/resources/faq.adoc +185 -0
  82. data/docs/resources/quick-reference.adoc +222 -0
  83. data/docs/troubleshooting/index.adoc +208 -0
  84. data/examples/api_comparison.rb +205 -0
  85. data/examples/deflate64_example.rb +96 -0
  86. data/examples/par2_demo.rb +121 -0
  87. data/examples/quick_start_native.rb +150 -0
  88. data/examples/quick_start_rubyzip.rb +115 -0
  89. data/examples/rubyzip_compatibility_demo.rb +194 -0
  90. data/exe/omnizip +27 -0
  91. data/lib/omnizip/algorithm.rb +130 -0
  92. data/lib/omnizip/algorithm_registry.rb +86 -0
  93. data/lib/omnizip/algorithms/.keep +0 -0
  94. data/lib/omnizip/algorithms/bzip2/bwt.rb +225 -0
  95. data/lib/omnizip/algorithms/bzip2/decoder.rb +193 -0
  96. data/lib/omnizip/algorithms/bzip2/encoder.rb +237 -0
  97. data/lib/omnizip/algorithms/bzip2/huffman.rb +206 -0
  98. data/lib/omnizip/algorithms/bzip2/mtf.rb +101 -0
  99. data/lib/omnizip/algorithms/bzip2/rle.rb +151 -0
  100. data/lib/omnizip/algorithms/bzip2.rb +130 -0
  101. data/lib/omnizip/algorithms/deflate/constants.rb +28 -0
  102. data/lib/omnizip/algorithms/deflate/decoder.rb +38 -0
  103. data/lib/omnizip/algorithms/deflate/encoder.rb +46 -0
  104. data/lib/omnizip/algorithms/deflate.rb +128 -0
  105. data/lib/omnizip/algorithms/deflate64/constants.rb +45 -0
  106. data/lib/omnizip/algorithms/deflate64/decoder.rb +153 -0
  107. data/lib/omnizip/algorithms/deflate64/encoder.rb +98 -0
  108. data/lib/omnizip/algorithms/deflate64/huffman_coder.rb +354 -0
  109. data/lib/omnizip/algorithms/deflate64/lz77_encoder.rb +142 -0
  110. data/lib/omnizip/algorithms/deflate64.rb +109 -0
  111. data/lib/omnizip/algorithms/lzma/bit_model.rb +120 -0
  112. data/lib/omnizip/algorithms/lzma/constants.rb +112 -0
  113. data/lib/omnizip/algorithms/lzma/decoder.rb +148 -0
  114. data/lib/omnizip/algorithms/lzma/dictionary.rb +69 -0
  115. data/lib/omnizip/algorithms/lzma/distance_coder.rb +415 -0
  116. data/lib/omnizip/algorithms/lzma/encoder.rb +142 -0
  117. data/lib/omnizip/algorithms/lzma/length_coder.rb +260 -0
  118. data/lib/omnizip/algorithms/lzma/literal_decoder.rb +320 -0
  119. data/lib/omnizip/algorithms/lzma/literal_encoder.rb +210 -0
  120. data/lib/omnizip/algorithms/lzma/lzip_decoder.rb +341 -0
  121. data/lib/omnizip/algorithms/lzma/lzma_alone_decoder.rb +192 -0
  122. data/lib/omnizip/algorithms/lzma/lzma_state.rb +128 -0
  123. data/lib/omnizip/algorithms/lzma/match.rb +32 -0
  124. data/lib/omnizip/algorithms/lzma/match_finder.rb +205 -0
  125. data/lib/omnizip/algorithms/lzma/match_finder_config.rb +142 -0
  126. data/lib/omnizip/algorithms/lzma/match_finder_factory.rb +88 -0
  127. data/lib/omnizip/algorithms/lzma/optimal_encoder.rb +130 -0
  128. data/lib/omnizip/algorithms/lzma/probability_models.rb +72 -0
  129. data/lib/omnizip/algorithms/lzma/range_coder.rb +85 -0
  130. data/lib/omnizip/algorithms/lzma/range_decoder.rb +434 -0
  131. data/lib/omnizip/algorithms/lzma/range_encoder.rb +194 -0
  132. data/lib/omnizip/algorithms/lzma/state.rb +127 -0
  133. data/lib/omnizip/algorithms/lzma/xz_buffered_range_encoder.rb +325 -0
  134. data/lib/omnizip/algorithms/lzma/xz_encoder.rb +426 -0
  135. data/lib/omnizip/algorithms/lzma/xz_encoder_fast.rb +645 -0
  136. data/lib/omnizip/algorithms/lzma/xz_match_finder_adapter.rb +227 -0
  137. data/lib/omnizip/algorithms/lzma/xz_price_calculator.rb +169 -0
  138. data/lib/omnizip/algorithms/lzma/xz_probability_models.rb +261 -0
  139. data/lib/omnizip/algorithms/lzma/xz_range_encoder.rb +223 -0
  140. data/lib/omnizip/algorithms/lzma/xz_range_encoder_exact.rb +331 -0
  141. data/lib/omnizip/algorithms/lzma/xz_state.rb +116 -0
  142. data/lib/omnizip/algorithms/lzma/xz_utils_decoder.rb +2055 -0
  143. data/lib/omnizip/algorithms/lzma.rb +238 -0
  144. data/lib/omnizip/algorithms/lzma2/chunk_manager.rb +182 -0
  145. data/lib/omnizip/algorithms/lzma2/constants.rb +41 -0
  146. data/lib/omnizip/algorithms/lzma2/encoder.rb +147 -0
  147. data/lib/omnizip/algorithms/lzma2/lzma2_chunk.rb +161 -0
  148. data/lib/omnizip/algorithms/lzma2/properties.rb +179 -0
  149. data/lib/omnizip/algorithms/lzma2/simple_lzma2_encoder.rb +127 -0
  150. data/lib/omnizip/algorithms/lzma2/xz_encoder_adapter.rb +85 -0
  151. data/lib/omnizip/algorithms/lzma2.rb +141 -0
  152. data/lib/omnizip/algorithms/ppmd7/constants.rb +74 -0
  153. data/lib/omnizip/algorithms/ppmd7/context.rb +154 -0
  154. data/lib/omnizip/algorithms/ppmd7/decoder.rb +126 -0
  155. data/lib/omnizip/algorithms/ppmd7/encoder.rb +163 -0
  156. data/lib/omnizip/algorithms/ppmd7/model.rb +248 -0
  157. data/lib/omnizip/algorithms/ppmd7/symbol_state.rb +57 -0
  158. data/lib/omnizip/algorithms/ppmd7.rb +116 -0
  159. data/lib/omnizip/algorithms/ppmd8/constants.rb +61 -0
  160. data/lib/omnizip/algorithms/ppmd8/context.rb +34 -0
  161. data/lib/omnizip/algorithms/ppmd8/decoder.rb +107 -0
  162. data/lib/omnizip/algorithms/ppmd8/encoder.rb +138 -0
  163. data/lib/omnizip/algorithms/ppmd8/model.rb +250 -0
  164. data/lib/omnizip/algorithms/ppmd8/restoration_method.rb +78 -0
  165. data/lib/omnizip/algorithms/ppmd8.rb +82 -0
  166. data/lib/omnizip/algorithms/ppmd_base.rb +138 -0
  167. data/lib/omnizip/algorithms/sevenzip_lzma2.rb +123 -0
  168. data/lib/omnizip/algorithms/xz_lzma2.rb +118 -0
  169. data/lib/omnizip/algorithms/zstandard/constants.rb +25 -0
  170. data/lib/omnizip/algorithms/zstandard/decoder.rb +46 -0
  171. data/lib/omnizip/algorithms/zstandard/encoder.rb +51 -0
  172. data/lib/omnizip/algorithms/zstandard.rb +138 -0
  173. data/lib/omnizip/buffer/memory_archive.rb +251 -0
  174. data/lib/omnizip/buffer/memory_extractor.rb +224 -0
  175. data/lib/omnizip/buffer.rb +176 -0
  176. data/lib/omnizip/checksum_registry.rb +114 -0
  177. data/lib/omnizip/checksums/crc32.rb +100 -0
  178. data/lib/omnizip/checksums/crc64.rb +101 -0
  179. data/lib/omnizip/checksums/crc_base.rb +158 -0
  180. data/lib/omnizip/checksums/verifier.rb +131 -0
  181. data/lib/omnizip/chunked/memory_manager.rb +194 -0
  182. data/lib/omnizip/chunked/reader.rb +78 -0
  183. data/lib/omnizip/chunked/writer.rb +120 -0
  184. data/lib/omnizip/chunked.rb +129 -0
  185. data/lib/omnizip/cli/output_formatter.rb +104 -0
  186. data/lib/omnizip/cli.rb +572 -0
  187. data/lib/omnizip/commands/.keep +0 -0
  188. data/lib/omnizip/commands/archive_create_command.rb +427 -0
  189. data/lib/omnizip/commands/archive_extract_command.rb +272 -0
  190. data/lib/omnizip/commands/archive_list_command.rb +218 -0
  191. data/lib/omnizip/commands/archive_repair_command.rb +131 -0
  192. data/lib/omnizip/commands/archive_verify_command.rb +117 -0
  193. data/lib/omnizip/commands/compress_command.rb +117 -0
  194. data/lib/omnizip/commands/decompress_command.rb +120 -0
  195. data/lib/omnizip/commands/list_command.rb +53 -0
  196. data/lib/omnizip/commands/metadata_command.rb +153 -0
  197. data/lib/omnizip/commands/parity_create_command.rb +122 -0
  198. data/lib/omnizip/commands/parity_repair_command.rb +122 -0
  199. data/lib/omnizip/commands/parity_verify_command.rb +124 -0
  200. data/lib/omnizip/commands/profile_list_command.rb +56 -0
  201. data/lib/omnizip/commands/profile_show_command.rb +44 -0
  202. data/lib/omnizip/convenience.rb +359 -0
  203. data/lib/omnizip/converter/conversion_registry.rb +49 -0
  204. data/lib/omnizip/converter/conversion_strategy.rb +121 -0
  205. data/lib/omnizip/converter/seven_zip_to_zip_strategy.rb +97 -0
  206. data/lib/omnizip/converter/zip_to_seven_zip_strategy.rb +112 -0
  207. data/lib/omnizip/converter.rb +105 -0
  208. data/lib/omnizip/crypto/aes256/cipher.rb +100 -0
  209. data/lib/omnizip/crypto/aes256/constants.rb +28 -0
  210. data/lib/omnizip/crypto/aes256/key_derivation.rb +101 -0
  211. data/lib/omnizip/crypto/aes256.rb +102 -0
  212. data/lib/omnizip/error.rb +106 -0
  213. data/lib/omnizip/eta/exponential_smoothing_estimator.rb +98 -0
  214. data/lib/omnizip/eta/moving_average_estimator.rb +99 -0
  215. data/lib/omnizip/eta/rate_calculator.rb +104 -0
  216. data/lib/omnizip/eta/sample_history.rb +143 -0
  217. data/lib/omnizip/eta/time_estimator.rb +106 -0
  218. data/lib/omnizip/eta.rb +63 -0
  219. data/lib/omnizip/extraction/filter_chain.rb +177 -0
  220. data/lib/omnizip/extraction/glob_pattern.rb +140 -0
  221. data/lib/omnizip/extraction/pattern_matcher.rb +70 -0
  222. data/lib/omnizip/extraction/predicate_pattern.rb +52 -0
  223. data/lib/omnizip/extraction/regex_pattern.rb +50 -0
  224. data/lib/omnizip/extraction/selective_extractor.rb +240 -0
  225. data/lib/omnizip/extraction.rb +111 -0
  226. data/lib/omnizip/file_type/mime_classifier.rb +144 -0
  227. data/lib/omnizip/file_type.rb +113 -0
  228. data/lib/omnizip/filter.rb +139 -0
  229. data/lib/omnizip/filter_pipeline.rb +108 -0
  230. data/lib/omnizip/filter_registry.rb +166 -0
  231. data/lib/omnizip/filters/bcj.rb +279 -0
  232. data/lib/omnizip/filters/bcj2/constants.rb +53 -0
  233. data/lib/omnizip/filters/bcj2/decoder.rb +200 -0
  234. data/lib/omnizip/filters/bcj2/encoder.rb +61 -0
  235. data/lib/omnizip/filters/bcj2/stream_data.rb +93 -0
  236. data/lib/omnizip/filters/bcj2.rb +99 -0
  237. data/lib/omnizip/filters/bcj_arm.rb +176 -0
  238. data/lib/omnizip/filters/bcj_arm64.rb +244 -0
  239. data/lib/omnizip/filters/bcj_ia64.rb +196 -0
  240. data/lib/omnizip/filters/bcj_ppc.rb +190 -0
  241. data/lib/omnizip/filters/bcj_sparc.rb +176 -0
  242. data/lib/omnizip/filters/bcj_x86.rb +193 -0
  243. data/lib/omnizip/filters/delta.rb +196 -0
  244. data/lib/omnizip/filters/filter_base.rb +72 -0
  245. data/lib/omnizip/filters/registry.rb +123 -0
  246. data/lib/omnizip/filters/xz_delta.rb +258 -0
  247. data/lib/omnizip/format_detector.rb +162 -0
  248. data/lib/omnizip/format_registry.rb +59 -0
  249. data/lib/omnizip/formats/.keep +0 -0
  250. data/lib/omnizip/formats/bzip2_file.rb +172 -0
  251. data/lib/omnizip/formats/cpio/constants.rb +55 -0
  252. data/lib/omnizip/formats/cpio/entry.rb +385 -0
  253. data/lib/omnizip/formats/cpio/reader.rb +196 -0
  254. data/lib/omnizip/formats/cpio/writer.rb +234 -0
  255. data/lib/omnizip/formats/cpio.rb +140 -0
  256. data/lib/omnizip/formats/format_spec_loader.rb +230 -0
  257. data/lib/omnizip/formats/gzip.rb +238 -0
  258. data/lib/omnizip/formats/iso/directory_builder.rb +297 -0
  259. data/lib/omnizip/formats/iso/directory_record.rb +152 -0
  260. data/lib/omnizip/formats/iso/joliet.rb +204 -0
  261. data/lib/omnizip/formats/iso/path_table.rb +125 -0
  262. data/lib/omnizip/formats/iso/reader.rb +197 -0
  263. data/lib/omnizip/formats/iso/rock_ridge.rb +349 -0
  264. data/lib/omnizip/formats/iso/volume_builder.rb +320 -0
  265. data/lib/omnizip/formats/iso/volume_descriptor.rb +168 -0
  266. data/lib/omnizip/formats/iso/writer.rb +530 -0
  267. data/lib/omnizip/formats/iso.rb +140 -0
  268. data/lib/omnizip/formats/lzip.rb +175 -0
  269. data/lib/omnizip/formats/lzma_alone.rb +171 -0
  270. data/lib/omnizip/formats/rar/archive_repairer.rb +243 -0
  271. data/lib/omnizip/formats/rar/archive_verifier.rb +195 -0
  272. data/lib/omnizip/formats/rar/block_parser.rb +243 -0
  273. data/lib/omnizip/formats/rar/compression/bit_stream.rb +180 -0
  274. data/lib/omnizip/formats/rar/compression/dispatcher.rb +217 -0
  275. data/lib/omnizip/formats/rar/compression/lz77_huffman/decoder.rb +216 -0
  276. data/lib/omnizip/formats/rar/compression/lz77_huffman/encoder.rb +158 -0
  277. data/lib/omnizip/formats/rar/compression/lz77_huffman/huffman_builder.rb +217 -0
  278. data/lib/omnizip/formats/rar/compression/lz77_huffman/huffman_coder.rb +189 -0
  279. data/lib/omnizip/formats/rar/compression/lz77_huffman/match_finder.rb +135 -0
  280. data/lib/omnizip/formats/rar/compression/lz77_huffman/sliding_window.rb +165 -0
  281. data/lib/omnizip/formats/rar/compression/ppmd/context.rb +105 -0
  282. data/lib/omnizip/formats/rar/compression/ppmd/decoder.rb +219 -0
  283. data/lib/omnizip/formats/rar/compression/ppmd/encoder.rb +262 -0
  284. data/lib/omnizip/formats/rar/compression_method_registry.rb +106 -0
  285. data/lib/omnizip/formats/rar/constants.rb +82 -0
  286. data/lib/omnizip/formats/rar/decompressor.rb +238 -0
  287. data/lib/omnizip/formats/rar/external_writer.rb +312 -0
  288. data/lib/omnizip/formats/rar/header.rb +192 -0
  289. data/lib/omnizip/formats/rar/license_validator.rb +109 -0
  290. data/lib/omnizip/formats/rar/models/rar_archive.rb +77 -0
  291. data/lib/omnizip/formats/rar/models/rar_entry.rb +65 -0
  292. data/lib/omnizip/formats/rar/models/rar_volume.rb +56 -0
  293. data/lib/omnizip/formats/rar/parity_handler.rb +292 -0
  294. data/lib/omnizip/formats/rar/rar5/compression/lzma.rb +202 -0
  295. data/lib/omnizip/formats/rar/rar5/compression/lzss.rb +578 -0
  296. data/lib/omnizip/formats/rar/rar5/compression/store.rb +60 -0
  297. data/lib/omnizip/formats/rar/rar5/crc32.rb +39 -0
  298. data/lib/omnizip/formats/rar/rar5/encryption/aes256_cbc.rb +97 -0
  299. data/lib/omnizip/formats/rar/rar5/encryption/encryption_header.rb +114 -0
  300. data/lib/omnizip/formats/rar/rar5/encryption/encryption_manager.rb +166 -0
  301. data/lib/omnizip/formats/rar/rar5/encryption/key_derivation.rb +97 -0
  302. data/lib/omnizip/formats/rar/rar5/header.rb +187 -0
  303. data/lib/omnizip/formats/rar/rar5/models/encryption_options.rb +74 -0
  304. data/lib/omnizip/formats/rar/rar5/models/recovery_options.rb +63 -0
  305. data/lib/omnizip/formats/rar/rar5/models/solid_options.rb +63 -0
  306. data/lib/omnizip/formats/rar/rar5/models/volume_options.rb +74 -0
  307. data/lib/omnizip/formats/rar/rar5/multi_volume/ARCHITECTURE.md +290 -0
  308. data/lib/omnizip/formats/rar/rar5/multi_volume/volume_manager.rb +264 -0
  309. data/lib/omnizip/formats/rar/rar5/multi_volume/volume_splitter.rb +155 -0
  310. data/lib/omnizip/formats/rar/rar5/multi_volume/volume_writer.rb +194 -0
  311. data/lib/omnizip/formats/rar/rar5/solid/solid_encoder.rb +109 -0
  312. data/lib/omnizip/formats/rar/rar5/solid/solid_manager.rb +142 -0
  313. data/lib/omnizip/formats/rar/rar5/solid/solid_stream.rb +121 -0
  314. data/lib/omnizip/formats/rar/rar5/vint.rb +65 -0
  315. data/lib/omnizip/formats/rar/rar5/writer.rb +466 -0
  316. data/lib/omnizip/formats/rar/rar_format_base.rb +241 -0
  317. data/lib/omnizip/formats/rar/reader.rb +366 -0
  318. data/lib/omnizip/formats/rar/recovery_record.rb +245 -0
  319. data/lib/omnizip/formats/rar/volume_manager.rb +168 -0
  320. data/lib/omnizip/formats/rar/writer.rb +431 -0
  321. data/lib/omnizip/formats/rar.rb +205 -0
  322. data/lib/omnizip/formats/rar3/compressor.rb +73 -0
  323. data/lib/omnizip/formats/rar3/decompressor.rb +66 -0
  324. data/lib/omnizip/formats/rar3/reader.rb +386 -0
  325. data/lib/omnizip/formats/rar3/writer.rb +219 -0
  326. data/lib/omnizip/formats/rar5/compressor.rb +73 -0
  327. data/lib/omnizip/formats/rar5/decompressor.rb +66 -0
  328. data/lib/omnizip/formats/rar5/reader.rb +342 -0
  329. data/lib/omnizip/formats/rar5/writer.rb +214 -0
  330. data/lib/omnizip/formats/seven_zip/coder_chain.rb +150 -0
  331. data/lib/omnizip/formats/seven_zip/constants.rb +126 -0
  332. data/lib/omnizip/formats/seven_zip/encoded_header.rb +114 -0
  333. data/lib/omnizip/formats/seven_zip/encrypted_header.rb +142 -0
  334. data/lib/omnizip/formats/seven_zip/file_collector.rb +144 -0
  335. data/lib/omnizip/formats/seven_zip/header.rb +106 -0
  336. data/lib/omnizip/formats/seven_zip/header_encryptor.rb +134 -0
  337. data/lib/omnizip/formats/seven_zip/header_writer.rb +466 -0
  338. data/lib/omnizip/formats/seven_zip/models/coder_info.rb +30 -0
  339. data/lib/omnizip/formats/seven_zip/models/file_entry.rb +58 -0
  340. data/lib/omnizip/formats/seven_zip/models/folder.rb +69 -0
  341. data/lib/omnizip/formats/seven_zip/models/stream_info.rb +42 -0
  342. data/lib/omnizip/formats/seven_zip/parser.rb +660 -0
  343. data/lib/omnizip/formats/seven_zip/reader.rb +458 -0
  344. data/lib/omnizip/formats/seven_zip/split_archive_reader.rb +632 -0
  345. data/lib/omnizip/formats/seven_zip/split_archive_writer.rb +315 -0
  346. data/lib/omnizip/formats/seven_zip/stream_compressor.rb +151 -0
  347. data/lib/omnizip/formats/seven_zip/stream_decompressor.rb +162 -0
  348. data/lib/omnizip/formats/seven_zip/writer.rb +740 -0
  349. data/lib/omnizip/formats/seven_zip.rb +93 -0
  350. data/lib/omnizip/formats/tar/constants.rb +73 -0
  351. data/lib/omnizip/formats/tar/entry.rb +94 -0
  352. data/lib/omnizip/formats/tar/header.rb +168 -0
  353. data/lib/omnizip/formats/tar/reader.rb +121 -0
  354. data/lib/omnizip/formats/tar/writer.rb +216 -0
  355. data/lib/omnizip/formats/tar.rb +84 -0
  356. data/lib/omnizip/formats/xz/reader.rb +116 -0
  357. data/lib/omnizip/formats/xz.rb +237 -0
  358. data/lib/omnizip/formats/xz_impl/block_decoder.rb +754 -0
  359. data/lib/omnizip/formats/xz_impl/block_encoder.rb +306 -0
  360. data/lib/omnizip/formats/xz_impl/block_header.rb +210 -0
  361. data/lib/omnizip/formats/xz_impl/block_header_parser.rb +186 -0
  362. data/lib/omnizip/formats/xz_impl/constants.rb +49 -0
  363. data/lib/omnizip/formats/xz_impl/index_decoder.rb +174 -0
  364. data/lib/omnizip/formats/xz_impl/index_encoder.rb +122 -0
  365. data/lib/omnizip/formats/xz_impl/stream_decoder.rb +468 -0
  366. data/lib/omnizip/formats/xz_impl/stream_encoder.rb +99 -0
  367. data/lib/omnizip/formats/xz_impl/stream_footer.rb +81 -0
  368. data/lib/omnizip/formats/xz_impl/stream_footer_parser.rb +117 -0
  369. data/lib/omnizip/formats/xz_impl/stream_header.rb +55 -0
  370. data/lib/omnizip/formats/xz_impl/stream_header_parser.rb +108 -0
  371. data/lib/omnizip/formats/xz_impl/vli.rb +128 -0
  372. data/lib/omnizip/formats/xz_impl/writer.rb +421 -0
  373. data/lib/omnizip/formats/zip/central_directory_header.rb +195 -0
  374. data/lib/omnizip/formats/zip/constants.rb +69 -0
  375. data/lib/omnizip/formats/zip/end_of_central_directory.rb +133 -0
  376. data/lib/omnizip/formats/zip/local_file_header.rb +138 -0
  377. data/lib/omnizip/formats/zip/reader.rb +250 -0
  378. data/lib/omnizip/formats/zip/unix_extra_field.rb +153 -0
  379. data/lib/omnizip/formats/zip/writer.rb +375 -0
  380. data/lib/omnizip/formats/zip/zip64_end_of_central_directory.rb +104 -0
  381. data/lib/omnizip/formats/zip/zip64_end_of_central_directory_locator.rb +66 -0
  382. data/lib/omnizip/formats/zip/zip64_extra_field.rb +114 -0
  383. data/lib/omnizip/formats/zip.rb +50 -0
  384. data/lib/omnizip/implementations/base/lzma2_decoder_base.rb +75 -0
  385. data/lib/omnizip/implementations/base/lzma2_encoder_base.rb +128 -0
  386. data/lib/omnizip/implementations/base/lzma_decoder_base.rb +83 -0
  387. data/lib/omnizip/implementations/base/lzma_encoder_base.rb +108 -0
  388. data/lib/omnizip/implementations/base/state_machine_base.rb +182 -0
  389. data/lib/omnizip/implementations/seven_zip/lzma/decoder.rb +421 -0
  390. data/lib/omnizip/implementations/seven_zip/lzma/encoder.rb +465 -0
  391. data/lib/omnizip/implementations/seven_zip/lzma/match_finder.rb +288 -0
  392. data/lib/omnizip/implementations/seven_zip/lzma/range_decoder.rb +200 -0
  393. data/lib/omnizip/implementations/seven_zip/lzma/range_encoder.rb +197 -0
  394. data/lib/omnizip/implementations/seven_zip/lzma/state_machine.rb +141 -0
  395. data/lib/omnizip/implementations/seven_zip/lzma2/encoder.rb +519 -0
  396. data/lib/omnizip/implementations/xz_utils/lzma2/decoder.rb +723 -0
  397. data/lib/omnizip/implementations/xz_utils/lzma2/encoder.rb +750 -0
  398. data/lib/omnizip/io/buffered_input.rb +146 -0
  399. data/lib/omnizip/io/buffered_output.rb +105 -0
  400. data/lib/omnizip/io/stream_manager.rb +115 -0
  401. data/lib/omnizip/link_handler/hard_link.rb +79 -0
  402. data/lib/omnizip/link_handler/symbolic_link.rb +74 -0
  403. data/lib/omnizip/link_handler.rb +124 -0
  404. data/lib/omnizip/metadata/archive_metadata.rb +114 -0
  405. data/lib/omnizip/metadata/entry_metadata.rb +146 -0
  406. data/lib/omnizip/metadata/metadata_editor.rb +171 -0
  407. data/lib/omnizip/metadata/metadata_registry.rb +64 -0
  408. data/lib/omnizip/metadata/metadata_validator.rb +99 -0
  409. data/lib/omnizip/metadata.rb +57 -0
  410. data/lib/omnizip/models/.keep +0 -0
  411. data/lib/omnizip/models/algorithm_metadata.rb +73 -0
  412. data/lib/omnizip/models/compression_options.rb +71 -0
  413. data/lib/omnizip/models/conversion_options.rb +87 -0
  414. data/lib/omnizip/models/conversion_result.rb +135 -0
  415. data/lib/omnizip/models/eta_result.rb +46 -0
  416. data/lib/omnizip/models/extraction_rule.rb +115 -0
  417. data/lib/omnizip/models/filter_chain.rb +144 -0
  418. data/lib/omnizip/models/filter_config.rb +183 -0
  419. data/lib/omnizip/models/match_result.rb +124 -0
  420. data/lib/omnizip/models/optimization_suggestion.rb +91 -0
  421. data/lib/omnizip/models/parallel_options.rb +104 -0
  422. data/lib/omnizip/models/performance_result.rb +79 -0
  423. data/lib/omnizip/models/profile_report.rb +82 -0
  424. data/lib/omnizip/models/progress_options.rb +38 -0
  425. data/lib/omnizip/models/split_options.rb +116 -0
  426. data/lib/omnizip/optimization_registry.rb +81 -0
  427. data/lib/omnizip/parallel/job_queue.rb +209 -0
  428. data/lib/omnizip/parallel/job_scheduler.rb +203 -0
  429. data/lib/omnizip/parallel/parallel_compressor.rb +347 -0
  430. data/lib/omnizip/parallel/parallel_extractor.rb +329 -0
  431. data/lib/omnizip/parallel/worker_pool.rb +223 -0
  432. data/lib/omnizip/parallel.rb +149 -0
  433. data/lib/omnizip/parity/chunked_block_processor.rb +196 -0
  434. data/lib/omnizip/parity/galois16.rb +145 -0
  435. data/lib/omnizip/parity/models/creator_packet.rb +73 -0
  436. data/lib/omnizip/parity/models/file_description_packet.rb +133 -0
  437. data/lib/omnizip/parity/models/ifsc_packet.rb +123 -0
  438. data/lib/omnizip/parity/models/main_packet.rb +128 -0
  439. data/lib/omnizip/parity/models/packet.rb +156 -0
  440. data/lib/omnizip/parity/models/packet_registry.rb +109 -0
  441. data/lib/omnizip/parity/models/recovery_slice_packet.rb +78 -0
  442. data/lib/omnizip/parity/par2_creator.rb +531 -0
  443. data/lib/omnizip/parity/par2_repairer.rb +407 -0
  444. data/lib/omnizip/parity/par2_verifier.rb +364 -0
  445. data/lib/omnizip/parity/par2cmdline_algorithm.rb +110 -0
  446. data/lib/omnizip/parity/par2cmdline_coefficients.rb +78 -0
  447. data/lib/omnizip/parity/reed_solomon_decoder.rb +266 -0
  448. data/lib/omnizip/parity/reed_solomon_encoder.rb +111 -0
  449. data/lib/omnizip/parity/reed_solomon_matrix.rb +342 -0
  450. data/lib/omnizip/parity.rb +186 -0
  451. data/lib/omnizip/password/encryption_registry.rb +65 -0
  452. data/lib/omnizip/password/encryption_strategy.rb +96 -0
  453. data/lib/omnizip/password/password_validator.rb +129 -0
  454. data/lib/omnizip/password/winzip_aes_strategy.rb +192 -0
  455. data/lib/omnizip/password/zip_crypto_strategy.rb +141 -0
  456. data/lib/omnizip/password.rb +87 -0
  457. data/lib/omnizip/pipe/stream_compressor.rb +124 -0
  458. data/lib/omnizip/pipe/stream_decompressor.rb +174 -0
  459. data/lib/omnizip/pipe.rb +121 -0
  460. data/lib/omnizip/platform/ntfs_streams.rb +201 -0
  461. data/lib/omnizip/platform.rb +189 -0
  462. data/lib/omnizip/profile/archive_profile.rb +39 -0
  463. data/lib/omnizip/profile/balanced_profile.rb +33 -0
  464. data/lib/omnizip/profile/binary_profile.rb +36 -0
  465. data/lib/omnizip/profile/compression_profile.rb +158 -0
  466. data/lib/omnizip/profile/custom_profile.rb +157 -0
  467. data/lib/omnizip/profile/fast_profile.rb +33 -0
  468. data/lib/omnizip/profile/maximum_profile.rb +33 -0
  469. data/lib/omnizip/profile/profile_detector.rb +110 -0
  470. data/lib/omnizip/profile/profile_registry.rb +161 -0
  471. data/lib/omnizip/profile/text_profile.rb +36 -0
  472. data/lib/omnizip/profile.rb +190 -0
  473. data/lib/omnizip/profiler/memory_profiler.rb +66 -0
  474. data/lib/omnizip/profiler/method_profiler.rb +49 -0
  475. data/lib/omnizip/profiler/report_generator.rb +169 -0
  476. data/lib/omnizip/profiler.rb +204 -0
  477. data/lib/omnizip/progress/callback_reporter.rb +36 -0
  478. data/lib/omnizip/progress/console_reporter.rb +62 -0
  479. data/lib/omnizip/progress/log_reporter.rb +91 -0
  480. data/lib/omnizip/progress/operation_progress.rb +118 -0
  481. data/lib/omnizip/progress/progress_bar.rb +156 -0
  482. data/lib/omnizip/progress/progress_reporter.rb +40 -0
  483. data/lib/omnizip/progress/progress_tracker.rb +190 -0
  484. data/lib/omnizip/progress/silent_reporter.rb +24 -0
  485. data/lib/omnizip/progress.rb +127 -0
  486. data/lib/omnizip/rubyzip_compat.rb +63 -0
  487. data/lib/omnizip/temp/safe_extract.rb +168 -0
  488. data/lib/omnizip/temp/temp_file.rb +124 -0
  489. data/lib/omnizip/temp/temp_file_pool.rb +109 -0
  490. data/lib/omnizip/temp.rb +181 -0
  491. data/lib/omnizip/version.rb +5 -0
  492. data/lib/omnizip/zip/entry.rb +156 -0
  493. data/lib/omnizip/zip/file.rb +485 -0
  494. data/lib/omnizip/zip/input_stream.rb +273 -0
  495. data/lib/omnizip/zip/output_stream.rb +324 -0
  496. data/lib/omnizip.rb +156 -0
  497. data/readme-docs/advanced-features.adoc +515 -0
  498. data/readme-docs/api-usage.adoc +444 -0
  499. data/readme-docs/architecture.adoc +449 -0
  500. data/readme-docs/archive-formats.adoc +479 -0
  501. data/readme-docs/cli-usage.adoc +222 -0
  502. data/readme-docs/compression-algorithms.adoc +442 -0
  503. data/readme-docs/compression-profiles.adoc +247 -0
  504. data/readme-docs/encryption-checksums.adoc +328 -0
  505. data/readme-docs/format-converter.adoc +325 -0
  506. data/readme-docs/installation.adoc +228 -0
  507. data/readme-docs/par2-archives.adoc +608 -0
  508. data/readme-docs/performance-profiler.adoc +389 -0
  509. data/readme-docs/preprocessing-filters.adoc +280 -0
  510. data/xz-file-format-1.2.1.txt +1174 -0
  511. metadata +617 -0
@@ -0,0 +1,389 @@
= Performance Profiler
:toc:
:toclevels: 3

== Purpose

The Performance Profiler measures the execution time and memory usage of compression code, identifies bottlenecks, and suggests optimizations to improve compression performance.

== Features

* **Method profiling** - Track execution time and call counts
* **Memory profiling** - Monitor allocation and retention
* **Hot path analysis** - Identify performance bottlenecks
* **Optimization suggestions** - AI-powered recommendations
* **Report generation** - Formatted profiling reports

== Basic Profiling

=== Profile a Block of Code

[source,ruby]
----
require "omnizip"

# Simple profiling
result = Omnizip::Profiler.profile do
  Omnizip::Formats::SevenZip::Writer.new('archive.7z') do |zip|
    zip.add_file('large_file.dat')
  end
end

puts "Execution time: #{result.total_time}s"
puts "Memory allocated: #{result.memory_allocated} bytes"
----
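
Omnizip's internal measurement code is not described here. The sketch below is a rough, stdlib-only illustration of what timing a block and counting its allocations involves, using `Process.clock_gettime` and `GC.stat`. `profile_block` and `BlockProfile` are hypothetical names and are not part of Omnizip. The sketch counts allocated _objects_ rather than bytes, so its figures are not comparable to `memory_allocated`.

[source,ruby]
----
# Hypothetical stdlib-only sketch; not Omnizip's implementation.
BlockProfile = Struct.new(:total_time, :allocated_objects, :value)

# Runs the block once and records its elapsed time and the number of
# Ruby objects allocated while it ran.
def profile_block
  objects_before = GC.stat(:total_allocated_objects)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  value = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  objects_after = GC.stat(:total_allocated_objects)
  BlockProfile.new(elapsed, objects_after - objects_before, value)
end

result = profile_block { Array.new(1_000) { |i| "item-#{i}" } }
puts "Execution time: #{result.total_time.round(6)}s"
puts "Objects allocated: #{result.allocated_objects}"
----

The sketch uses a monotonic clock instead of `Time.now`, so system clock adjustments during the run cannot distort the measured duration.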

=== Profile with Custom Name

Create a named `Profiler` instance to label individual operations and collect their timings into a single report.

[source,ruby]
----
require "omnizip"

profiler = Omnizip::Profiler.new(profile_name: "compression_test")

profiler.profile("LZMA compression") do
  algorithm = Omnizip::AlgorithmRegistry.get(:lzma).new(level: 9)
  File.open('input.txt', 'rb') do |input|
    File.open('output.lzma', 'wb') do |output|
      algorithm.compress(input, output)
    end
  end
end

# Get profiling report
report = profiler.report
puts "Total execution time: #{report.total_execution_time}s"
----
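
Profiling named operations comes down to keeping one accumulator per label. The following sketch illustrates that idea under stated assumptions. `SectionProfiler` is a hypothetical class written for illustration and is not the implementation of `Omnizip::Profiler`. It defines the total execution time as the sum of all section times.

[source,ruby]
----
# Hypothetical stdlib-only sketch; not how Omnizip::Profiler is implemented.
class SectionProfiler
  Section = Struct.new(:name, :total_time, :call_count)

  def initialize
    @sections = {}
  end

  # Times the block, adds the elapsed seconds and one call to the named
  # section, and returns the block's own value.
  def profile(name)
    started = now
    value = yield
    elapsed = now - started
    section = (@sections[name] ||= Section.new(name, 0.0, 0))
    section.total_time += elapsed
    section.call_count += 1
    value
  end

  # Sections in the order they were first profiled.
  def sections
    @sections.values
  end

  # Sum of all section times; only meaningful when sections do not nest.
  def total_execution_time
    sections.sum(&:total_time)
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

profiler = SectionProfiler.new
3.times { profiler.profile("sum_of_squares") { (1..10_000).sum { |n| n * n } } }
profiler.profile("sleep") { sleep 0.01 }

profiler.sections.each do |section|
  puts "#{section.name}: #{section.call_count} call(s), #{section.total_time.round(4)}s"
end
puts "Total execution time: #{profiler.total_execution_time.round(4)}s"
----

Because the total is a plain sum, it is only meaningful when profiled blocks do not nest. A block profiled inside another would be counted twice.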
53
+
54
+ == Hot Path Analysis
55
+
56
+ === Identify Performance Bottlenecks
57
+
58
+ [source,ruby]
59
+ ----
60
+ profiler = Omnizip::Profiler.new
61
+
62
+ # Profile multiple operations
63
+ data = profiler.profile("read_file") { File.read('data.txt') }
64
+ compressed = profiler.profile("compress") { compress_data(data) } # compress_data: your own routine
65
+ profiler.profile("write_file") { File.write('output.dat', compressed) }
66
+
67
+ # Analyze hot paths (operations >10% of total time)
68
+ hot_paths = profiler.analyze_hot_paths(threshold_percentage: 10.0)
69
+
70
+ hot_paths.each do |operation|
71
+ puts "Hot path: #{operation.operation_name}"
72
+ puts " Time: #{operation.total_time}s"
73
+ puts " Percentage: #{(operation.total_time / profiler.report.total_execution_time * 100).round(1)}%"
74
+ end
75
+ ----
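The hot-path rule is a share-of-total filter: an operation qualifies when its time exceeds the given percentage of the total. Below is a stdlib-only sketch of that calculation. It uses a hypothetical `hot_paths` helper over a hash of illustrative timings and is not the code behind `analyze_hot_paths`.

```ruby
# Hypothetical helper: given per-operation timings in seconds, return
# [name, seconds, percentage] for every operation above threshold_percentage
# of the total, slowest first.
def hot_paths(timings, threshold_percentage: 10.0)
  total = timings.values.sum
  return [] if total.zero?

  timings
    .map { |name, seconds| [name, seconds, seconds / total * 100] }
    .select { |_, _, pct| pct > threshold_percentage }
    .sort_by { |_, seconds, _| -seconds }
end

# Illustrative timings, not measurements
timings = { 'read_file' => 0.05, 'compress' => 1.80, 'write_file' => 0.15 }
hot_paths(timings).each do |name, seconds, pct|
  puts "Hot path: #{name} (#{seconds}s, #{pct.round(1)}%)"
end
```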
76
+
77
+ == Bottleneck Identification
78
+
79
+ === Find CPU and Memory Bottlenecks
80
+
81
+ [source,ruby]
82
+ ----
83
+ profiler = Omnizip::Profiler.new
84
+
85
+ # Profile compression pipeline
86
+ transformed = profiler.profile("BWT") { bwt_transform(data) }
87
+ encoded = profiler.profile("MTF") { mtf_encode(transformed) }
88
+ profiler.profile("Huffman") { huffman_encode(encoded) }
89
+
90
+ # Identify bottlenecks
91
+ bottlenecks = profiler.identify_bottlenecks
92
+
93
+ bottlenecks.each do |bottleneck|
94
+ case bottleneck[:type]
95
+ when :cpu
96
+ puts "CPU bottleneck: #{bottleneck[:operation]}"
97
+ puts " Time: #{bottleneck[:time]}s"
98
+ puts " Severity: #{bottleneck[:severity]}"
99
+ when :memory
100
+ puts "Memory bottleneck: #{bottleneck[:operation]}"
101
+ puts " Allocated: #{bottleneck[:allocated]} bytes"
102
+ when :gc
103
+ puts "GC bottleneck: #{bottleneck[:operation]}"
104
+ puts " GC pressure: #{bottleneck[:gc_pressure]}"
105
+ end
106
+ end
107
+ ----
108
+
109
+ == Optimization Suggestions
110
+
111
+ === Generate Improvement Recommendations
112
+
113
+ [source,ruby]
114
+ ----
115
+ profiler = Omnizip::Profiler.new
116
+
117
+ # Run profiling
118
+ 10.times do |i|
119
+ profiler.profile("iteration_#{i}") do
120
+ # Compression operations
121
+ end
122
+ end
123
+
124
+ # Generate suggestions
125
+ suggestions = profiler.generate_suggestions
126
+
127
+ suggestions.each do |suggestion|
128
+ puts "\n#{suggestion.title}"
129
+ puts " #{suggestion.description}"
130
+ puts " Severity: #{suggestion.severity}"
131
+ puts " Category: #{suggestion.category}"
132
+ puts " Estimated impact: #{(suggestion.impact_estimate * 100).round(1)}%"
133
+
134
+ if suggestion.recommendation
135
+ puts " Recommendation: #{suggestion.recommendation}"
136
+ end
137
+ end
138
+ ----
139
+
140
+ == Profiling Reports
141
+
142
+ === Generate Detailed Reports
143
+
144
+ [source,ruby]
145
+ ----
146
+ profiler = Omnizip::Profiler.new(profile_name: "BZip2 Compression")
147
+
148
+ # Profile operations
149
+ profiler.profile("initialization") { setup_compressor }
150
+ transformed = profiler.profile("bwt_transform") { bwt.transform(data) }
151
+ encoded = profiler.profile("mtf_encoding") { mtf.encode(transformed) }
152
+ profiler.profile("huffman_coding") { huffman.encode(encoded) }
153
+ profiler.profile("finalization") { write_output }
154
+
155
+ # Get detailed report
156
+ report = profiler.report
157
+
158
+ puts "=== Profiling Report: #{report.profile_name} ==="
159
+ puts "\nTotal execution time: #{report.total_execution_time}s"
160
+ puts "Total memory allocated: #{report.total_memory_allocated} bytes"
161
+ puts "Operations profiled: #{report.results.size}"
162
+
163
+ puts "\n=== Slowest Operations ==="
164
+ report.slowest_operations(limit: 3).each do |op|
165
+ percentage = (op.total_time / report.total_execution_time * 100).round(1)
166
+ puts " #{op.operation_name}: #{op.total_time}s (#{percentage}%)"
167
+ end
168
+
169
+ puts "\n=== Memory Intensive Operations ==="
170
+ report.memory_intensive_operations(limit: 3).each do |op|
171
+ mb = (op.memory_allocated.to_f / (1024 * 1024)).round(2)
172
+ puts " #{op.operation_name}: #{mb}MB"
173
+ end
174
+ ----
175
+
176
+ == Method Profiling
177
+
178
+ === Profile Specific Methods
179
+
180
+ [source,ruby]
181
+ ----
182
+ profiler = Omnizip::Profiler.new
183
+
184
+ # Register method profiler
185
+ method_profiler = Omnizip::Profiler::MethodProfiler.new
186
+ profiler.register_profiler(:method, method_profiler)
187
+
188
+ # Profile method calls (input and output are IO objects, such as StringIO instances)
189
+ algorithm = Omnizip::AlgorithmRegistry.get(:lzma).new
190
+ profiler.profile_method(algorithm, :compress, input, output)
191
+
192
+ # Check results
193
+ results = profiler.report.results.find { |r| r.operation_name.include?('compress') }
194
+ puts "Method calls: #{results.call_count}"
195
+ puts "Average time per call: #{results.average_time}s"
196
+ ----
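Method-level profiling amounts to wrapping a method so that each call is counted and timed. One way to do that in plain Ruby is to prepend an anonymous module whose method calls `super`, and the sketch below assumes that approach. `MethodStats`, `profile_method!` and `FakeCompressor` are hypothetical names, and Omnizip's `MethodProfiler` may work differently.

```ruby
# Hypothetical per-method statistics (not Omnizip's MethodProfiler)
class MethodStats
  attr_reader :call_count, :total_time

  def initialize
    @call_count = 0
    @total_time = 0.0
  end

  def record(seconds)
    @call_count += 1
    @total_time += seconds
  end

  def average_time
    @call_count.zero? ? 0.0 : @total_time / @call_count
  end
end

# Prepend a module that times every call to klass#method_name, then
# forwards to the original method via super
def profile_method!(klass, method_name, stats)
  wrapper = Module.new do
    define_method(method_name) do |*args, **kwargs, &block|
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      begin
        super(*args, **kwargs, &block)
      ensure
        stats.record(Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
      end
    end
  end
  klass.prepend(wrapper)
end

# Stand-in class for illustration
class FakeCompressor
  def compress(data)
    data.reverse
  end
end

stats = MethodStats.new
profile_method!(FakeCompressor, :compress, stats)
3.times { FakeCompressor.new.compress('payload') }
puts "Method calls: #{stats.call_count}"
puts "Average time per call: #{stats.average_time}s"
```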
197
+
198
+ == Memory Profiling
199
+
200
+ === Track Memory Allocation
201
+
202
+ [source,ruby]
203
+ ----
204
+ profiler = Omnizip::Profiler.new
205
+
206
+ # Register memory profiler
207
+ memory_profiler = Omnizip::Profiler::MemoryProfiler.new
208
+ profiler.register_profiler(:memory, memory_profiler)
209
+
210
+ # Profile with memory tracking
211
+ profiler.profile("data_processing", profiler_type: :memory) do
212
+ data = Array.new(1_000_000) { rand }
213
+ data.map { |x| x * 2 }
214
+ end
215
+
216
+ # Check memory usage
217
+ report = profiler.report
218
+ puts "Memory allocated: #{report.total_memory_allocated} bytes"
219
+ puts "Memory retained: #{report.total_memory_retained} bytes"
220
+ ----
221
+
222
+ == Examples
223
+
224
+ === Example 1: Find Compression Bottleneck
225
+
226
+ [source,ruby]
227
+ ----
228
+ def profile_compression(file_path)
229
+ profiler = Omnizip::Profiler.new(profile_name: "Compression Analysis")
230
+
231
+ # Profile each stage
232
+ data = profiler.profile("read_input") do
233
+ File.read(file_path)
234
+ end
235
+
236
+ compressed = profiler.profile("compress") do
237
+ algorithm = Omnizip::AlgorithmRegistry.get(:bzip2).new(level: 9)
238
+ output = StringIO.new
239
+ algorithm.compress(StringIO.new(data), output)
240
+ output.string
241
+ end
242
+
243
+ profiler.profile("write_output") do
244
+ File.write("#{file_path}.bz2", compressed)
245
+ end
246
+
247
+ # Analyze results
248
+ report = profiler.report
249
+ puts "\n=== Compression Profile ==="
250
+ puts "Total time: #{report.total_execution_time}s"
251
+
252
+ report.results.each do |result|
253
+ percentage = (result.total_time / report.total_execution_time * 100).round(1)
254
+ puts "#{result.operation_name}: #{result.total_time}s (#{percentage}%)"
255
+ end
256
+
257
+ # Generate optimization suggestions
258
+ suggestions = profiler.generate_suggestions
259
+ if suggestions.any?
260
+ puts "\n=== Optimization Suggestions ==="
261
+ suggestions.first(3).each { |s| puts "- #{s.title}" }
262
+ end
263
+ end
264
+
265
+ profile_compression('large_file.txt')
266
+ ----
267
+
268
+ === Example 2: Compare Algorithm Performance
269
+
270
+ [source,ruby]
271
+ ----
272
+ def compare_algorithms(data, algorithms)
273
+ results = {}
274
+
275
+ algorithms.each do |algo_name|
276
+ profiler = Omnizip::Profiler.new(profile_name: algo_name.to_s)
277
+
278
+ compressed_size = profiler.profile("compression") do
279
+ algorithm = Omnizip::AlgorithmRegistry.get(algo_name).new(level: 6)
280
+ output = StringIO.new
281
+ algorithm.compress(StringIO.new(data), output)
282
+ output.size
283
+ end
284
+
285
+ results[algo_name] = {
286
+ time: profiler.report.total_execution_time,
287
+ size: compressed_size,
288
+ ratio: (1 - compressed_size.to_f / data.size) * 100
289
+ }
290
+ end
291
+
292
+ # Print comparison
293
+ puts "\n=== Algorithm Comparison ==="
294
+ puts "Original size: #{data.size} bytes\n\n"
295
+
296
+ results.sort_by { |_, v| v[:time] }.each do |algo, stats|
297
+ puts "#{algo}:"
298
+ puts " Time: #{stats[:time].round(3)}s"
299
+ puts " Size: #{stats[:size]} bytes"
300
+ puts " Space savings: #{stats[:ratio].round(1)}%"
301
+ end
302
+ end
303
+
304
+ data = File.binread('test_file.dat')
305
+ compare_algorithms(data, [:deflate, :lzma, :bzip2, :zstd])
306
+ ----
307
+
308
+ === Example 3: Memory Leak Detection
309
+
310
+ [source,ruby]
311
+ ----
312
+ def detect_memory_leaks(iterations = 100, threshold_bytes: 1_000_000)
313
+ profiler = Omnizip::Profiler.new(profile_name: "Memory Leak Detection")
314
+ memory_profiler = Omnizip::Profiler::MemoryProfiler.new
315
+ profiler.register_profiler(:memory, memory_profiler)
316
+
317
+ iterations.times do |i|
318
+ profiler.profile("iteration_#{i}", profiler_type: :memory) do
319
+ # Suspect operation
320
+ data = Array.new(10_000) { rand }
321
+ compress_data(data)
322
+ end
323
+
324
+ # Check every 10 iterations
325
+ next unless ((i + 1) % 10).zero?
326
+
327
+ # The report's totals cover every iteration profiled so far, so allocated
328
+ # memory grows even without a leak. Retained memory that keeps climbing
329
+ # is the more useful signal.
330
+ retained = profiler.report.total_memory_retained
331
+ average = retained / (i + 1)
332
+
333
+ puts "After #{i + 1} iterations: #{retained} bytes retained " \
334
+ "(~#{average} bytes per iteration)"
335
+
336
+ # threshold_bytes is an illustrative default; tune it for your workload
337
+ if retained > threshold_bytes
338
+ puts "⚠️ Potential memory leak detected!"
339
+ break
340
+ end
341
+ end
342
+ end
343
+
344
+ detect_memory_leaks
345
+ ----
346
+
347
+ == Profiler Configuration
348
+
349
+ === Enable/Disable Profiling
350
+
351
+ [source,ruby]
352
+ ----
353
+ profiler = Omnizip::Profiler.new
354
+
355
+ # Disable profiling (blocks still run, but nothing is recorded)
356
+ profiler.disable!
357
+
358
+ # Operations run without profiling
359
+ profiler.profile("operation") { slow_operation } # Runs, but is not recorded
360
+
361
+ # Re-enable profiling
362
+ profiler.enable!
363
+
364
+ # Operations are now profiled again
365
+ profiler.profile("operation") { slow_operation } # Profiled
366
+ ----
367
+
368
+ === Reset Profiler State
369
+
370
+ [source,ruby]
371
+ ----
372
+ profiler = Omnizip::Profiler.new
373
+
374
+ # Collect some data
375
+ profiler.profile("op1") { operation1 }
376
+ profiler.profile("op2") { operation2 }
377
+
378
+ # Reset profiler
379
+ profiler.reset!
380
+
381
+ # Start fresh profiling
382
+ profiler.profile("op3") { operation3 } # Previous data cleared
383
+ ----
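The enable/disable and reset semantics described above can be modeled with a small class. `MiniProfiler` is a hypothetical sketch, not Omnizip's profiler. A disabled instance still runs the block and returns its value but records nothing, and `reset!` clears the recorded timings.

```ruby
# Hypothetical minimal profiler illustrating enable!/disable!/reset!
class MiniProfiler
  attr_reader :timings

  def initialize
    @enabled = true
    @timings = Hash.new(0.0)
  end

  def enable!
    @enabled = true
  end

  def disable!
    @enabled = false
  end

  def reset!
    @timings.clear
  end

  # Always returns the block's value, so callers behave the same either way
  def profile(name)
    return yield unless @enabled

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      yield
    ensure
      @timings[name] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
end

profiler = MiniProfiler.new
profiler.disable!
profiler.profile('op1') { :skipped }  # runs, but is not recorded
profiler.enable!
profiler.profile('op2') { :recorded } # recorded
recorded = profiler.timings.keys
puts recorded.inspect # => ["op2"]
profiler.reset!       # timings are now empty
```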
384
+
385
+ == See Also
386
+
387
+ * link:../README.adoc#performance[Performance Analysis]
388
+ * link:compression-profiles.adoc[Compression Profiles]
389
+ * link:advanced-features.adoc[Advanced Features]
@@ -0,0 +1,280 @@
1
+ = Preprocessing Filters Guide
2
+ :toc:
3
+ :toclevels: 3
4
+
5
+ == Purpose
6
+
7
+ This document covers preprocessing filters that improve compression of specific data types, particularly executable files and multimedia data.
8
+
9
+ == Supported Filters
10
+
11
+ [cols="20,15,65",options="header"]
12
+ |===
13
+ |Filter |ID |Description
14
+
15
+ |BCJ x86 |0x04 |Branch conversion for x86 executables
16
+ |BCJ2 |0x0303011B |Advanced 4-stream x86 filter (7z coder ID; BCJ2 has no .xz filter ID)
17
+ |BCJ ARM |0x07 |ARM executable filter
18
+ |BCJ ARM64 |0x0A |ARM64/AArch64 filter
19
+ |BCJ PPC |0x05 |PowerPC filter
20
+ |BCJ IA-64 |0x06 |Itanium filter
21
+ |BCJ SPARC |0x09 |SPARC filter
22
+ |Delta |0x03 |Delta encoding (configurable distance)
23
+ |===
24
+
25
+ == BCJ (Branch-Call-Jump) Filters
26
+
27
+ === General
28
+
29
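+ Branch-Call-Jump filters improve compression of executable files by converting the relative targets of branch instructions to absolute addresses. Calls to the same function from different places in the code have different relative offsets but the same absolute address, so after conversion the same target appears as a repeated byte sequence that an LZ-based compressor can match.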
30
+
31
+ === Supported Architectures
32
+
33
+ * **BCJ x86** - Intel/AMD x86 (32-bit and 64-bit)
34
+ * **BCJ ARM** - ARM 32-bit executables
35
+ * **BCJ ARM64** - ARM 64-bit (AArch64) executables
36
+ * **BCJ PPC** - PowerPC executables
37
+ * **BCJ SPARC** - SPARC executables
38
+ * **BCJ IA-64** - Intel Itanium executables
39
+
40
+ === How It Works
41
+
42
+ . Scans binary code for branch/call instructions
43
+ . Converts relative offsets to absolute addresses
44
+ . Makes patterns more regular and compressible
45
+ . Decoder reverses the transformation after decompression
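The steps above can be sketched for the simplest case, x86 `CALL rel32` (opcode `E8`). This is a simplified illustration with hypothetical helper names, not Omnizip's filter. Buffer offsets stand in for addresses, and the real x86 BCJ filter also converts `JMP` and applies extra checks before converting an operand. The example places two calls to the same target at different offsets.

```ruby
# Simplified x86 BCJ sketch (hypothetical helpers, not Omnizip's filter).
# Only E8 (CALL rel32) operands are converted.
def bcj_x86_convert(data, encode)
  buf = data.b # binary copy; the caller's string is not modified
  i = 0
  while i + 5 <= buf.bytesize
    if buf.getbyte(i) == 0xE8
      operand = buf.byteslice(i + 1, 4).unpack1('L<')
      next_ip = i + 5 # x86 relative targets are measured from the next instruction
      value = encode ? operand + next_ip : operand - next_ip
      buf[i + 1, 4] = [value & 0xFFFF_FFFF].pack('L<') # wrap to 32 bits
      i += 5 # skip the operand so encoder and decoder visit the same positions
    else
      i += 1
    end
  end
  buf
end

def bcj_x86_encode(data)
  bcj_x86_convert(data, true)
end

def bcj_x86_decode(data)
  bcj_x86_convert(data, false)
end

# Two CALLs to target 0x100, at offsets 0 and 10, separated by NOPs (0x90)
code = [0xE8].pack('C') + [0x100 - 5].pack('L<') +
       ([0x90] * 5).pack('C*') +
       [0xE8].pack('C') + [0x100 - 15].pack('L<')
encoded = bcj_x86_encode(code)
puts encoded.byteslice(1, 4).unpack1('L<').to_s(16) # => 100
```

After encoding, both operands contain `0x100`, so the compressor sees the same four bytes twice. Decoding subtracts the same offsets to restore the original.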
46
+
47
+ === Usage
48
+
49
+ [source,ruby]
50
+ ----
51
+ # Use with filter pipeline
52
+ pipeline = Omnizip::FilterPipeline.new
53
+ pipeline.add_filter(:bcj_x86) # For x86 executables
54
+
55
+ # Apply before compression
56
+ filtered_data = pipeline.encode(executable_data)
57
+ algorithm.compress(StringIO.new(filtered_data), output)
58
+
59
+ # Different architectures
60
+ pipeline_arm = Omnizip::FilterPipeline.new
61
+ pipeline_arm.add_filter(:bcj_arm)
62
+
63
+ pipeline_arm64 = Omnizip::FilterPipeline.new
64
+ pipeline_arm64.add_filter(:bcj_arm64)
65
+ ----
66
+
67
+ === Typical Improvements
68
+
69
+ Using BCJ filters on executable files typically improves compression by:
70
+
71
+ * **10-30%** for x86/x64 executables
72
+ * **15-35%** for ARM executables
73
+ * **20-40%** for stripped executables (no debug symbols)
74
+
75
+ == BCJ2 Filter
76
+
77
+ === General
78
+
79
+ BCJ2 provides advanced 4-stream filtering for x86 code, achieving better compression than standard BCJ. It splits the filtered data into four separate streams that can be compressed independently.
80
+
81
+ === How It Works
82
+
83
+ . Analyzes x86 code structure
84
+ . Splits into 4 streams:
85
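+ * Main stream (all bytes except the moved branch targets)
86
+ * Call stream (absolute targets of CALL instructions)
87
+ * Jump stream (absolute targets of JMP and conditional jump instructions)
88
+ * Range-coder stream (range-coded flags recording which branch instructions were converted)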
89
+ . Each stream is compressed separately
90
+ . Decoder merges streams during decompression
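The split-and-merge idea can be sketched in plain Ruby. This is a simplified model under stated assumptions, not BCJ2 itself or Omnizip's code. Only `E8` (CALL) and `E9` (JMP) are handled, and an operand is moved out only when its absolute target falls inside the buffer. The fourth stream stores one plain byte per flag instead of range-coded bits. `bcj2_split` and `bcj2_merge` are hypothetical names.

```ruby
# Simplified BCJ2-style split: returns [main, call, jump, flags] as binary strings
def bcj2_split(data)
  data = data.b
  main = ''.b
  call = ''.b
  jump = ''.b
  flags = ''.b
  i = 0
  while i < data.bytesize
    byte = data.getbyte(i)
    main << byte
    i += 1
    next unless [0xE8, 0xE9].include?(byte)

    if i + 4 <= data.bytesize
      target = (data.byteslice(i, 4).unpack1('L<') + i + 4) & 0xFFFF_FFFF
      if target < data.bytesize # plausible target inside this buffer: move it out
        (byte == 0xE8 ? call : jump) << [target].pack('L<')
        flags << 1
        i += 4
        next
      end
    end
    flags << 0 # not converted; the operand bytes stay in the main stream
  end
  [main, call, jump, flags]
end

# Rebuild the original bytes by re-inserting relative operands
def bcj2_merge(main, call, jump, flags)
  out = ''.b
  call_pos = 0
  jump_pos = 0
  flag_pos = 0
  main.each_byte do |byte|
    out << byte
    next unless [0xE8, 0xE9].include?(byte)

    flag = flags.getbyte(flag_pos)
    flag_pos += 1
    next unless flag == 1

    if byte == 0xE8
      target = call.byteslice(call_pos, 4).unpack1('L<')
      call_pos += 4
    else
      target = jump.byteslice(jump_pos, 4).unpack1('L<')
      jump_pos += 4
    end
    # The operand ends 4 bytes after the opcode just written
    out << [(target - (out.bytesize + 4)) & 0xFFFF_FFFF].pack('L<')
  end
  out
end

# Two CALLs (E8) to offset 0x20 from different positions, padded with NOPs (90)
code = [0xE8].pack('C') + [0x20 - 5].pack('L<') + ([0x90] * 5).pack('C*') +
       [0xE8].pack('C') + [0x20 - 15].pack('L<') + ([0x90] * 25).pack('C*')
main, call, jump, flags = bcj2_split(code)
puts call.unpack('L<*').inspect # => [32, 32]
```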
91
+
92
+ === Usage
93
+
94
+ [source,ruby]
95
+ ----
96
+ filter = Omnizip::FilterRegistry.get(:bcj2).new
97
+ encoded_streams = filter.encode(x86_code)
98
+ # Returns 4 separate streams for optimal compression
99
+
100
+ # Compress each stream
101
+ encoded_streams.each_with_index do |stream, idx|
102
+ algorithm.compress(StringIO.new(stream), output_files[idx])
103
+ end
104
+ ----
105
+
106
+ === When to Use BCJ2
107
+
108
+ **Use BCJ2 when:**
109
+
110
+ * Maximum compression is needed
111
+ * Processing large x86 executables
112
+ * Archive size is critical (software distribution)
113
+
114
+ **Use standard BCJ when:**
115
+
116
+ * Simplicity is preferred
117
+ * Working with mixed content
118
+ * Speed is more important than maximum compression
119
+
120
+ === Typical Improvements
121
+
122
+ BCJ2 typically achieves:
123
+
124
+ * **5-10% better** compression than standard BCJ
125
+ * **15-40%** better than no filter
126
+ * Best results on large executables (> 1MB)
127
+
128
+ == Delta Filter
129
+
130
+ === General
131
+
132
+ Delta encoding is effective for multimedia files and time-series data where values a fixed number of bytes apart (for example, successive samples of the same audio channel) differ only slightly. It replaces each byte with its difference from the byte a configurable distance earlier.
133
+
134
+ === How It Works
135
+
136
+ . Subtracts from each byte the byte `distance` positions earlier (modulo 256)
137
+ . Stores these deltas instead of the original values
138
+ . Turns slowly changing data into runs of small, repetitive values that compress well
139
+ . Uses the `distance` parameter to match the stride to the sample or record width
140
+
141
+ === Configuration
142
+
143
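+ The `distance` parameter sets the stride in bytes. Set it to the size of one sample or record so that each byte is compared with the matching byte of the previous one:
144
+
145
+ * **distance=1:** 8-bit samples (8-bit audio, grayscale images)
146
+ * **distance=2:** 16-bit mono audio samples
147
+ * **distance=4:** 32-bit values, or 16-bit stereo audio frames
148
+ * **distance=N:** Any other stride, such as 3 for 24-bit RGB pixels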
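Below is a sketch of the transformation and its inverse. It assumes byte-wise subtraction modulo 256, with the history before the first byte treated as zero. `delta_encode` and `delta_decode` are hypothetical helpers, not Omnizip's API. The example input interleaves two byte sequences, which is the layout that `distance=2` targets.

```ruby
# Hypothetical byte-wise delta filter with a configurable distance (a sketch,
# not Omnizip's implementation). Each output byte is the input byte minus the
# byte `distance` positions earlier, modulo 256; bytes before the start count as 0.
def delta_encode(data, distance)
  bytes = data.bytes
  deltas = bytes.each_index.map do |i|
    previous = i >= distance ? bytes[i - distance] : 0
    (bytes[i] - previous) & 0xFF
  end
  deltas.pack('C*')
end

# Decoding adds each delta to the already-restored byte `distance` positions back
def delta_decode(data, distance)
  out = []
  data.bytes.each_with_index do |delta, i|
    previous = i >= distance ? out[i - distance] : 0
    out << ((delta + previous) & 0xFF)
  end
  out.pack('C*')
end

# Two interleaved sequences (A1 A2 A3 A4 and B1 B3 B5 B7)
input = ['A1B1A2B3A3B5A4B7'].pack('H*')
encoded = delta_encode(input, 2)
puts encoded.unpack1('H*') # => a1b1010201020102
```

With distance 2, each sequence is differenced against itself. The A bytes become a run of `01` and the B bytes a run of `02`, and this regularity is what the compressor exploits.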
149
+
150
+ === Usage
151
+
152
+ [source,ruby]
153
+ ----
154
+ # Basic delta filter (distance=1)
155
+ filter = Omnizip::FilterRegistry.get(:delta).new(distance: 1)
156
+ filtered = filter.encode(audio_data)
157
+
158
+ # For 16-bit audio samples
159
+ filter_16bit = Omnizip::FilterRegistry.get(:delta).new(distance: 2)
160
+ filtered_audio = filter_16bit.encode(audio_samples)
161
+
162
+ # For database dumps with aligned records
163
+ filter_db = Omnizip::FilterRegistry.get(:delta).new(distance: 4)
164
+ filtered_db = filter_db.encode(database_dump)
165
+ ----
166
+
167
+ === Best Use Cases
168
+
169
+ **Excellent for:**
170
+
171
+ * Uncompressed audio (WAV, raw PCM)
172
+ * Time-series sensor data
173
+ * Database dumps with sequential indices
174
+ * Bitmap images (BMP, uncompressed TIFF)
175
+
176
+ **Not suitable for:**
177
+
178
+ * Already compressed data (MP3, JPEG)
179
+ * Random data
180
+ * Text files
181
+ * Encrypted data
182
+
183
+ === Typical Improvements
184
+
185
+ Delta filter can achieve:
186
+
187
+ * **50-80%** better compression on audio waveforms
188
+ * **30-60%** better on time-series data
189
+ * **20-40%** better on bitmap images
190
+ * **Minimal improvement** on text or compressed data
191
+
192
+ == Filter Chaining
193
+
194
+ === General
195
+
196
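+ Multiple filters can be chained. Order matters: each filter receives the output of the previous one, so a filter that must recognize structure in the original data, such as BCJ finding branch instructions, should run first. Decoding undoes the filters in reverse order.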
197
+
198
+ === Usage
199
+
200
+ [source,ruby]
201
+ ----
202
+ # Chain multiple filters (illustrates the API; for x86 executables the recommended chain below is BCJ x86 alone)
203
+ pipeline = Omnizip::FilterPipeline.new
204
+ pipeline.add_filter(:bcj_x86) # First: convert branches
205
+ pipeline.add_filter(:delta, distance: 1) # Then: delta encode
206
+
207
+ # Apply chain
208
+ filtered_data = pipeline.encode(executable_data)
209
+ algorithm.compress(StringIO.new(filtered_data), output)
210
+ ----
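A toy pipeline shows why the ordering rule matters. `SketchPipeline` and its two stand-in filters are hypothetical. One filter adds 1 to each byte and the other is a distance-1 delta. They exist only to show that decoding must undo the filters in reverse order, and they are not Omnizip's `FilterPipeline` or its filters.

```ruby
# Hypothetical filter chain (not Omnizip's FilterPipeline): encode applies
# filters in the order they were added; decode undoes them in reverse order.
class SketchPipeline
  Filter = Struct.new(:name, :encoder, :decoder)

  def initialize
    @filters = []
  end

  def add_filter(name, encoder, decoder)
    @filters << Filter.new(name, encoder, decoder)
    self
  end

  def encode(data)
    @filters.reduce(data) { |acc, filter| filter.encoder.call(acc) }
  end

  def decode(data)
    @filters.reverse.reduce(data) { |acc, filter| filter.decoder.call(acc) }
  end
end

# Stand-in filters: add 1 to every byte, and byte-wise delta (distance 1)
increment = ->(s) { s.bytes.map { |b| (b + 1) & 0xFF }.pack('C*') }
decrement = ->(s) { s.bytes.map { |b| (b - 1) & 0xFF }.pack('C*') }
delta = lambda do |s|
  prev = 0
  s.bytes.map { |b| ((b - prev) & 0xFF).tap { prev = b } }.pack('C*')
end
undelta = lambda do |s|
  prev = 0
  s.bytes.map { |d| prev = (prev + d) & 0xFF }.pack('C*')
end

pipeline = SketchPipeline.new
pipeline.add_filter(:increment, increment, decrement)
        .add_filter(:delta, delta, undelta)

encoded = pipeline.encode('abc'.b)
puts encoded.bytes.inspect    # => [98, 1, 1]
puts pipeline.decode(encoded) # => abc
```

Undoing the filters in the wrong order (decrement first, then the inverse delta) turns the encoded bytes into `aaa` instead of `abc`.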
211
+
212
+ === Recommended Filter Chains
213
+
214
+ **For x86 executables:**
215
+ ```
216
+ BCJ x86 → LZMA2
217
+ ```
218
+
219
+ **For ARM executables:**
220
+ ```
221
+ BCJ ARM → LZMA2
222
+ ```
223
+
224
+ **For large x86 binaries (maximum compression):**
225
+ ```
226
+ BCJ2 (4 streams) → LZMA2 (each stream)
227
+ ```
228
+
229
+ **For uncompressed 16-bit mono audio:**
230
+ ```
231
+ Delta (distance=2) → LZMA2 or BZip2
232
+ ```
233
+
234
+ **For 8-bit grayscale bitmap images:**
235
+ ```
236
+ Delta (distance=1) → LZMA2
237
+ ```
238
+
239
+ == Performance Considerations
240
+
241
+ === Processing Overhead
242
+
243
+ Filters add minimal overhead:
244
+
245
+ * **BCJ filters:** < 5% processing time
246
+ * **Delta filter:** < 3% processing time
247
+ * **BCJ2:** 10-15% processing time (4-stream handling)
248
+
249
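+ For executables and uncompressed multimedia, the size reduction usually outweighs this cost. On data the filter does not suit, such as already compressed or encrypted files, the filter adds processing time without improving compression.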
250
+
251
+ === Memory Usage
252
+
253
+ * **BCJ filters:** Minimal (< 1MB)
254
+ * **Delta filter:** Minimal (< 1MB)
255
+ * **BCJ2:** Moderate (needs buffer for 4 streams)
256
+
257
+ == Integration with .7z Archives
258
+
259
+ When writing .7z archives, filters added to the writer are applied to the files it stores, and they are reversed automatically on extraction:
260
+
261
+ [source,ruby]
262
+ ----
263
+ # Create .7z with BCJ filter
264
+ writer = Omnizip::Formats::SevenZip::Writer.new('programs.7z')
265
+ writer.add_filter(:bcj_x86) # Applies to all files
266
+ writer.add_file('program.exe')
267
+ writer.close
268
+
269
+ # The filter is reversed automatically during extraction
270
+ reader = Omnizip::Formats::SevenZip::Reader.new('programs.7z')
271
+ reader.extract_all('output/') # BCJ filter automatically reversed
272
+ reader.close
273
+ ----
274
+
275
+ == See Also
276
+
277
+ * link:compression-algorithms.adoc[Compression Algorithms]
278
+ * link:api-usage.adoc[Library API Usage]
279
+ * link:archive-formats.adoc[Archive Formats]
280
+ * link:../README.adoc[Main README]