bio-velvet_underground 0.0.1

Files changed (286)
  1. checksums.yaml +7 -0
  2. data/.document +5 -0
  3. data/.gitmodules +3 -0
  4. data/.travis.yml +13 -0
  5. data/Gemfile +19 -0
  6. data/LICENSE.txt +20 -0
  7. data/README.md +53 -0
  8. data/Rakefile +51 -0
  9. data/VERSION +1 -0
  10. data/ext/bioruby.patch +60 -0
  11. data/ext/mkrf_conf.rb +50 -0
  12. data/ext/src/Makefile +125 -0
  13. data/ext/src/src/allocArray.c +305 -0
  14. data/ext/src/src/allocArray.h +86 -0
  15. data/ext/src/src/autoOpen.c +107 -0
  16. data/ext/src/src/autoOpen.h +18 -0
  17. data/ext/src/src/binarySequences.c +813 -0
  18. data/ext/src/src/binarySequences.h +125 -0
  19. data/ext/src/src/concatenatedGraph.c +233 -0
  20. data/ext/src/src/concatenatedGraph.h +30 -0
  21. data/ext/src/src/concatenatedPreGraph.c +262 -0
  22. data/ext/src/src/concatenatedPreGraph.h +29 -0
  23. data/ext/src/src/correctedGraph.c +2642 -0
  24. data/ext/src/src/correctedGraph.h +32 -0
  25. data/ext/src/src/dfib.c +509 -0
  26. data/ext/src/src/dfib.h +69 -0
  27. data/ext/src/src/dfibHeap.c +89 -0
  28. data/ext/src/src/dfibHeap.h +39 -0
  29. data/ext/src/src/dfibpriv.h +105 -0
  30. data/ext/src/src/fib.c +628 -0
  31. data/ext/src/src/fib.h +78 -0
  32. data/ext/src/src/fibHeap.c +79 -0
  33. data/ext/src/src/fibHeap.h +41 -0
  34. data/ext/src/src/fibpriv.h +110 -0
  35. data/ext/src/src/globals.h +153 -0
  36. data/ext/src/src/graph.c +3983 -0
  37. data/ext/src/src/graph.h +233 -0
  38. data/ext/src/src/graphReConstruction.c +1472 -0
  39. data/ext/src/src/graphReConstruction.h +30 -0
  40. data/ext/src/src/graphStats.c +2167 -0
  41. data/ext/src/src/graphStats.h +72 -0
  42. data/ext/src/src/kmer.c +652 -0
  43. data/ext/src/src/kmer.h +73 -0
  44. data/ext/src/src/kmerOccurenceTable.c +236 -0
  45. data/ext/src/src/kmerOccurenceTable.h +44 -0
  46. data/ext/src/src/kseq.h +223 -0
  47. data/ext/src/src/locallyCorrectedGraph.c +557 -0
  48. data/ext/src/src/locallyCorrectedGraph.h +40 -0
  49. data/ext/src/src/passageMarker.c +677 -0
  50. data/ext/src/src/passageMarker.h +137 -0
  51. data/ext/src/src/preGraph.c +1717 -0
  52. data/ext/src/src/preGraph.h +106 -0
  53. data/ext/src/src/preGraphConstruction.c +990 -0
  54. data/ext/src/src/preGraphConstruction.h +26 -0
  55. data/ext/src/src/readCoherentGraph.c +557 -0
  56. data/ext/src/src/readCoherentGraph.h +30 -0
  57. data/ext/src/src/readSet.c +1734 -0
  58. data/ext/src/src/readSet.h +67 -0
  59. data/ext/src/src/recycleBin.c +199 -0
  60. data/ext/src/src/recycleBin.h +58 -0
  61. data/ext/src/src/roadMap.c +342 -0
  62. data/ext/src/src/roadMap.h +65 -0
  63. data/ext/src/src/run.c +318 -0
  64. data/ext/src/src/run.h +52 -0
  65. data/ext/src/src/run2.c +712 -0
  66. data/ext/src/src/scaffold.c +1876 -0
  67. data/ext/src/src/scaffold.h +64 -0
  68. data/ext/src/src/shortReadPairs.c +1243 -0
  69. data/ext/src/src/shortReadPairs.h +32 -0
  70. data/ext/src/src/splay.c +259 -0
  71. data/ext/src/src/splay.h +43 -0
  72. data/ext/src/src/splayTable.c +1315 -0
  73. data/ext/src/src/splayTable.h +31 -0
  74. data/ext/src/src/tightString.c +362 -0
  75. data/ext/src/src/tightString.h +82 -0
  76. data/ext/src/src/utility.c +199 -0
  77. data/ext/src/src/utility.h +98 -0
  78. data/ext/src/third-party/zlib-1.2.3/ChangeLog +855 -0
  79. data/ext/src/third-party/zlib-1.2.3/FAQ +339 -0
  80. data/ext/src/third-party/zlib-1.2.3/INDEX +51 -0
  81. data/ext/src/third-party/zlib-1.2.3/Makefile +154 -0
  82. data/ext/src/third-party/zlib-1.2.3/Makefile.in +154 -0
  83. data/ext/src/third-party/zlib-1.2.3/README +125 -0
  84. data/ext/src/third-party/zlib-1.2.3/adler32.c +149 -0
  85. data/ext/src/third-party/zlib-1.2.3/algorithm.txt +209 -0
  86. data/ext/src/third-party/zlib-1.2.3/amiga/Makefile.pup +66 -0
  87. data/ext/src/third-party/zlib-1.2.3/amiga/Makefile.sas +65 -0
  88. data/ext/src/third-party/zlib-1.2.3/as400/bndsrc +132 -0
  89. data/ext/src/third-party/zlib-1.2.3/as400/compile.clp +123 -0
  90. data/ext/src/third-party/zlib-1.2.3/as400/readme.txt +111 -0
  91. data/ext/src/third-party/zlib-1.2.3/as400/zlib.inc +331 -0
  92. data/ext/src/third-party/zlib-1.2.3/compress.c +79 -0
  93. data/ext/src/third-party/zlib-1.2.3/configure +459 -0
  94. data/ext/src/third-party/zlib-1.2.3/contrib/README.contrib +71 -0
  95. data/ext/src/third-party/zlib-1.2.3/contrib/ada/buffer_demo.adb +106 -0
  96. data/ext/src/third-party/zlib-1.2.3/contrib/ada/mtest.adb +156 -0
  97. data/ext/src/third-party/zlib-1.2.3/contrib/ada/read.adb +156 -0
  98. data/ext/src/third-party/zlib-1.2.3/contrib/ada/readme.txt +65 -0
  99. data/ext/src/third-party/zlib-1.2.3/contrib/ada/test.adb +463 -0
  100. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib-streams.adb +225 -0
  101. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib-streams.ads +114 -0
  102. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib-thin.adb +141 -0
  103. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib-thin.ads +450 -0
  104. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib.adb +701 -0
  105. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib.ads +328 -0
  106. data/ext/src/third-party/zlib-1.2.3/contrib/ada/zlib.gpr +20 -0
  107. data/ext/src/third-party/zlib-1.2.3/contrib/asm586/README.586 +43 -0
  108. data/ext/src/third-party/zlib-1.2.3/contrib/asm586/match.S +364 -0
  109. data/ext/src/third-party/zlib-1.2.3/contrib/asm686/README.686 +34 -0
  110. data/ext/src/third-party/zlib-1.2.3/contrib/asm686/match.S +329 -0
  111. data/ext/src/third-party/zlib-1.2.3/contrib/blast/Makefile +8 -0
  112. data/ext/src/third-party/zlib-1.2.3/contrib/blast/README +4 -0
  113. data/ext/src/third-party/zlib-1.2.3/contrib/blast/blast.c +444 -0
  114. data/ext/src/third-party/zlib-1.2.3/contrib/blast/blast.h +71 -0
  115. data/ext/src/third-party/zlib-1.2.3/contrib/blast/test.pk +0 -0
  116. data/ext/src/third-party/zlib-1.2.3/contrib/blast/test.txt +1 -0
  117. data/ext/src/third-party/zlib-1.2.3/contrib/delphi/ZLib.pas +557 -0
  118. data/ext/src/third-party/zlib-1.2.3/contrib/delphi/ZLibConst.pas +11 -0
  119. data/ext/src/third-party/zlib-1.2.3/contrib/delphi/readme.txt +76 -0
  120. data/ext/src/third-party/zlib-1.2.3/contrib/delphi/zlibd32.mak +93 -0
  121. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib.build +33 -0
  122. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib.chm +0 -0
  123. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib.sln +21 -0
  124. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/AssemblyInfo.cs +58 -0
  125. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/ChecksumImpl.cs +202 -0
  126. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/CircularBuffer.cs +83 -0
  127. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/CodecBase.cs +198 -0
  128. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/Deflater.cs +106 -0
  129. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/DotZLib.cs +288 -0
  130. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/DotZLib.csproj +141 -0
  131. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/GZipStream.cs +301 -0
  132. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/Inflater.cs +105 -0
  133. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/DotZLib/UnitTests.cs +274 -0
  134. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/LICENSE_1_0.txt +23 -0
  135. data/ext/src/third-party/zlib-1.2.3/contrib/dotzlib/readme.txt +58 -0
  136. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/README +1 -0
  137. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/infback9.c +608 -0
  138. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/infback9.h +37 -0
  139. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/inffix9.h +107 -0
  140. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/inflate9.h +47 -0
  141. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/inftree9.c +323 -0
  142. data/ext/src/third-party/zlib-1.2.3/contrib/infback9/inftree9.h +55 -0
  143. data/ext/src/third-party/zlib-1.2.3/contrib/inflate86/inffas86.c +1157 -0
  144. data/ext/src/third-party/zlib-1.2.3/contrib/inflate86/inffast.S +1368 -0
  145. data/ext/src/third-party/zlib-1.2.3/contrib/iostream/test.cpp +24 -0
  146. data/ext/src/third-party/zlib-1.2.3/contrib/iostream/zfstream.cpp +329 -0
  147. data/ext/src/third-party/zlib-1.2.3/contrib/iostream/zfstream.h +128 -0
  148. data/ext/src/third-party/zlib-1.2.3/contrib/iostream2/zstream.h +307 -0
  149. data/ext/src/third-party/zlib-1.2.3/contrib/iostream2/zstream_test.cpp +25 -0
  150. data/ext/src/third-party/zlib-1.2.3/contrib/iostream3/README +35 -0
  151. data/ext/src/third-party/zlib-1.2.3/contrib/iostream3/TODO +17 -0
  152. data/ext/src/third-party/zlib-1.2.3/contrib/iostream3/test.cc +50 -0
  153. data/ext/src/third-party/zlib-1.2.3/contrib/iostream3/zfstream.cc +479 -0
  154. data/ext/src/third-party/zlib-1.2.3/contrib/iostream3/zfstream.h +466 -0
  155. data/ext/src/third-party/zlib-1.2.3/contrib/masm686/match.asm +413 -0
  156. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/bld_ml64.bat +2 -0
  157. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/gvmat64.asm +513 -0
  158. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/gvmat64.obj +0 -0
  159. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/inffas8664.c +186 -0
  160. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/inffasx64.asm +392 -0
  161. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/inffasx64.obj +0 -0
  162. data/ext/src/third-party/zlib-1.2.3/contrib/masmx64/readme.txt +28 -0
  163. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/bld_ml32.bat +2 -0
  164. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/gvmat32.asm +972 -0
  165. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/gvmat32.obj +0 -0
  166. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/gvmat32c.c +62 -0
  167. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/inffas32.asm +1083 -0
  168. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/inffas32.obj +0 -0
  169. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/mkasm.bat +3 -0
  170. data/ext/src/third-party/zlib-1.2.3/contrib/masmx86/readme.txt +21 -0
  171. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/ChangeLogUnzip +67 -0
  172. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/Makefile +25 -0
  173. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/crypt.h +132 -0
  174. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/ioapi.c +177 -0
  175. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/ioapi.h +75 -0
  176. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/iowin32.c +270 -0
  177. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/iowin32.h +21 -0
  178. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/miniunz.c +585 -0
  179. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/minizip.c +420 -0
  180. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/mztools.c +281 -0
  181. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/mztools.h +31 -0
  182. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/unzip.c +1598 -0
  183. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/unzip.h +354 -0
  184. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/zip.c +1219 -0
  185. data/ext/src/third-party/zlib-1.2.3/contrib/minizip/zip.h +235 -0
  186. data/ext/src/third-party/zlib-1.2.3/contrib/pascal/example.pas +599 -0
  187. data/ext/src/third-party/zlib-1.2.3/contrib/pascal/readme.txt +76 -0
  188. data/ext/src/third-party/zlib-1.2.3/contrib/pascal/zlibd32.mak +93 -0
  189. data/ext/src/third-party/zlib-1.2.3/contrib/pascal/zlibpas.pas +236 -0
  190. data/ext/src/third-party/zlib-1.2.3/contrib/puff/Makefile +8 -0
  191. data/ext/src/third-party/zlib-1.2.3/contrib/puff/README +63 -0
  192. data/ext/src/third-party/zlib-1.2.3/contrib/puff/puff.c +837 -0
  193. data/ext/src/third-party/zlib-1.2.3/contrib/puff/puff.h +31 -0
  194. data/ext/src/third-party/zlib-1.2.3/contrib/puff/zeros.raw +0 -0
  195. data/ext/src/third-party/zlib-1.2.3/contrib/testzlib/testzlib.c +275 -0
  196. data/ext/src/third-party/zlib-1.2.3/contrib/testzlib/testzlib.txt +10 -0
  197. data/ext/src/third-party/zlib-1.2.3/contrib/untgz/Makefile +14 -0
  198. data/ext/src/third-party/zlib-1.2.3/contrib/untgz/Makefile.msc +17 -0
  199. data/ext/src/third-party/zlib-1.2.3/contrib/untgz/untgz.c +674 -0
  200. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/readme.txt +73 -0
  201. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/miniunz.vcproj +126 -0
  202. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/minizip.vcproj +126 -0
  203. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/testzlib.vcproj +126 -0
  204. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/zlib.rc +32 -0
  205. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/zlibstat.vcproj +246 -0
  206. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/zlibvc.def +92 -0
  207. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/zlibvc.sln +78 -0
  208. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc7/zlibvc.vcproj +445 -0
  209. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/miniunz.vcproj +566 -0
  210. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/minizip.vcproj +563 -0
  211. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/testzlib.vcproj +948 -0
  212. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/testzlibdll.vcproj +567 -0
  213. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/zlib.rc +32 -0
  214. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/zlibstat.vcproj +870 -0
  215. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/zlibvc.def +92 -0
  216. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/zlibvc.sln +144 -0
  217. data/ext/src/third-party/zlib-1.2.3/contrib/vstudio/vc8/zlibvc.vcproj +1219 -0
  218. data/ext/src/third-party/zlib-1.2.3/crc32.c +423 -0
  219. data/ext/src/third-party/zlib-1.2.3/crc32.h +441 -0
  220. data/ext/src/third-party/zlib-1.2.3/deflate.c +1736 -0
  221. data/ext/src/third-party/zlib-1.2.3/deflate.h +331 -0
  222. data/ext/src/third-party/zlib-1.2.3/example.c +565 -0
  223. data/ext/src/third-party/zlib-1.2.3/examples/README.examples +42 -0
  224. data/ext/src/third-party/zlib-1.2.3/examples/fitblk.c +233 -0
  225. data/ext/src/third-party/zlib-1.2.3/examples/gun.c +693 -0
  226. data/ext/src/third-party/zlib-1.2.3/examples/gzappend.c +500 -0
  227. data/ext/src/third-party/zlib-1.2.3/examples/gzjoin.c +448 -0
  228. data/ext/src/third-party/zlib-1.2.3/examples/gzlog.c +413 -0
  229. data/ext/src/third-party/zlib-1.2.3/examples/gzlog.h +58 -0
  230. data/ext/src/third-party/zlib-1.2.3/examples/zlib_how.html +523 -0
  231. data/ext/src/third-party/zlib-1.2.3/examples/zpipe.c +191 -0
  232. data/ext/src/third-party/zlib-1.2.3/examples/zran.c +404 -0
  233. data/ext/src/third-party/zlib-1.2.3/gzio.c +1026 -0
  234. data/ext/src/third-party/zlib-1.2.3/infback.c +623 -0
  235. data/ext/src/third-party/zlib-1.2.3/inffast.c +318 -0
  236. data/ext/src/third-party/zlib-1.2.3/inffast.h +11 -0
  237. data/ext/src/third-party/zlib-1.2.3/inffixed.h +94 -0
  238. data/ext/src/third-party/zlib-1.2.3/inflate.c +1368 -0
  239. data/ext/src/third-party/zlib-1.2.3/inflate.h +115 -0
  240. data/ext/src/third-party/zlib-1.2.3/inftrees.c +329 -0
  241. data/ext/src/third-party/zlib-1.2.3/inftrees.h +55 -0
  242. data/ext/src/third-party/zlib-1.2.3/make_vms.com +461 -0
  243. data/ext/src/third-party/zlib-1.2.3/minigzip.c +322 -0
  244. data/ext/src/third-party/zlib-1.2.3/msdos/Makefile.bor +109 -0
  245. data/ext/src/third-party/zlib-1.2.3/msdos/Makefile.dj2 +104 -0
  246. data/ext/src/third-party/zlib-1.2.3/msdos/Makefile.emx +69 -0
  247. data/ext/src/third-party/zlib-1.2.3/msdos/Makefile.msc +106 -0
  248. data/ext/src/third-party/zlib-1.2.3/msdos/Makefile.tc +94 -0
  249. data/ext/src/third-party/zlib-1.2.3/old/Makefile.riscos +151 -0
  250. data/ext/src/third-party/zlib-1.2.3/old/README +3 -0
  251. data/ext/src/third-party/zlib-1.2.3/old/descrip.mms +48 -0
  252. data/ext/src/third-party/zlib-1.2.3/old/os2/Makefile.os2 +136 -0
  253. data/ext/src/third-party/zlib-1.2.3/old/os2/zlib.def +51 -0
  254. data/ext/src/third-party/zlib-1.2.3/old/visual-basic.txt +160 -0
  255. data/ext/src/third-party/zlib-1.2.3/old/zlib.html +971 -0
  256. data/ext/src/third-party/zlib-1.2.3/projects/README.projects +41 -0
  257. data/ext/src/third-party/zlib-1.2.3/projects/visualc6/README.txt +73 -0
  258. data/ext/src/third-party/zlib-1.2.3/projects/visualc6/example.dsp +278 -0
  259. data/ext/src/third-party/zlib-1.2.3/projects/visualc6/minigzip.dsp +278 -0
  260. data/ext/src/third-party/zlib-1.2.3/projects/visualc6/zlib.dsp +609 -0
  261. data/ext/src/third-party/zlib-1.2.3/projects/visualc6/zlib.dsw +59 -0
  262. data/ext/src/third-party/zlib-1.2.3/qnx/package.qpg +141 -0
  263. data/ext/src/third-party/zlib-1.2.3/trees.c +1219 -0
  264. data/ext/src/third-party/zlib-1.2.3/trees.h +128 -0
  265. data/ext/src/third-party/zlib-1.2.3/uncompr.c +61 -0
  266. data/ext/src/third-party/zlib-1.2.3/win32/DLL_FAQ.txt +397 -0
  267. data/ext/src/third-party/zlib-1.2.3/win32/Makefile.bor +107 -0
  268. data/ext/src/third-party/zlib-1.2.3/win32/Makefile.emx +69 -0
  269. data/ext/src/third-party/zlib-1.2.3/win32/Makefile.gcc +141 -0
  270. data/ext/src/third-party/zlib-1.2.3/win32/Makefile.msc +126 -0
  271. data/ext/src/third-party/zlib-1.2.3/win32/VisualC.txt +3 -0
  272. data/ext/src/third-party/zlib-1.2.3/win32/zlib.def +60 -0
  273. data/ext/src/third-party/zlib-1.2.3/win32/zlib1.rc +39 -0
  274. data/ext/src/third-party/zlib-1.2.3/zconf.h +332 -0
  275. data/ext/src/third-party/zlib-1.2.3/zconf.in.h +332 -0
  276. data/ext/src/third-party/zlib-1.2.3/zlib.3 +159 -0
  277. data/ext/src/third-party/zlib-1.2.3/zlib.h +1357 -0
  278. data/ext/src/third-party/zlib-1.2.3/zutil.c +318 -0
  279. data/ext/src/third-party/zlib-1.2.3/zutil.h +269 -0
  280. data/lib/bio-velvet_underground.rb +12 -0
  281. data/lib/bio-velvet_underground/external/VERSION +1 -0
  282. data/lib/bio-velvet_underground/velvet_underground.rb +72 -0
  283. data/spec/binary_sequence_store_spec.rb +27 -0
  284. data/spec/data/1/CnyUnifiedSeq +0 -0
  285. data/spec/spec_helper.rb +31 -0
  286. metadata +456 -0
@@ -0,0 +1,154 @@
+ # Makefile for zlib
+ # Copyright (C) 1995-2005 Jean-loup Gailly.
+ # For conditions of distribution and use, see copyright notice in zlib.h
+
+ # To compile and test, type:
+ # ./configure; make test
+ # The call of configure is optional if you don't have special requirements
+ # If you wish to build zlib as a shared library, use: ./configure -s
+
+ # To use the asm code, type:
+ # cp contrib/asm?86/match.S ./match.S
+ # make LOC=-DASMV OBJA=match.o
+
+ # To install /usr/local/lib/libz.* and /usr/local/include/zlib.h, type:
+ # make install
+ # To install in $HOME instead of /usr/local, use:
+ # make install prefix=$HOME
+
+ CC=cc
+
+ CFLAGS=-O
+ #CFLAGS=-O -DMAX_WBITS=14 -DMAX_MEM_LEVEL=7
+ #CFLAGS=-g -DDEBUG
+ #CFLAGS=-O3 -Wall -Wwrite-strings -Wpointer-arith -Wconversion \
+ # -Wstrict-prototypes -Wmissing-prototypes
+
+ LDFLAGS=libz.a
+ LDSHARED=$(CC)
+ CPP=$(CC) -E
+
+ LIBS=libz.a
+ SHAREDLIB=libz.so
+ SHAREDLIBV=libz.so.1.2.3
+ SHAREDLIBM=libz.so.1
+
+ AR=ar rc
+ RANLIB=ranlib
+ TAR=tar
+ SHELL=/bin/sh
+ EXE=
+
+ prefix = /usr/local
+ exec_prefix = ${prefix}
+ libdir = ${exec_prefix}/lib
+ includedir = ${prefix}/include
+ mandir = ${prefix}/share/man
+ man3dir = ${mandir}/man3
+
+ OBJS = adler32.o compress.o crc32.o gzio.o uncompr.o deflate.o trees.o \
+ zutil.o inflate.o infback.o inftrees.o inffast.o
+
+ OBJA =
+ # to use the asm code: make OBJA=match.o
+
+ TEST_OBJS = example.o minigzip.o
+
+ all: example$(EXE) minigzip$(EXE)
+
+ check: test
+ test: all
+ @LD_LIBRARY_PATH=.:$(LD_LIBRARY_PATH) ; export LD_LIBRARY_PATH; \
+ echo hello world | ./minigzip | ./minigzip -d || \
+ echo ' *** minigzip test FAILED ***' ; \
+ if ./example; then \
+ echo ' *** zlib test OK ***'; \
+ else \
+ echo ' *** zlib test FAILED ***'; \
+ fi
+
+ libz.a: $(OBJS) $(OBJA)
+ $(AR) $@ $(OBJS) $(OBJA)
+ -@ ($(RANLIB) $@ || true) >/dev/null 2>&1
+
+ match.o: match.S
+ $(CPP) match.S > _match.s
+ $(CC) -c _match.s
+ mv _match.o match.o
+ rm -f _match.s
+
+ $(SHAREDLIBV): $(OBJS)
+ $(LDSHARED) -o $@ $(OBJS)
+ rm -f $(SHAREDLIB) $(SHAREDLIBM)
+ ln -s $@ $(SHAREDLIB)
+ ln -s $@ $(SHAREDLIBM)
+
+ example$(EXE): example.o $(LIBS)
+ $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
+
+ minigzip$(EXE): minigzip.o $(LIBS)
+ $(CC) $(CFLAGS) -o $@ minigzip.o $(LDFLAGS)
+
+ install: $(LIBS)
+ -@if [ ! -d $(exec_prefix) ]; then mkdir -p $(exec_prefix); fi
+ -@if [ ! -d $(includedir) ]; then mkdir -p $(includedir); fi
+ -@if [ ! -d $(libdir) ]; then mkdir -p $(libdir); fi
+ -@if [ ! -d $(man3dir) ]; then mkdir -p $(man3dir); fi
+ cp zlib.h zconf.h $(includedir)
+ chmod 644 $(includedir)/zlib.h $(includedir)/zconf.h
+ cp $(LIBS) $(libdir)
+ cd $(libdir); chmod 755 $(LIBS)
+ -@(cd $(libdir); $(RANLIB) libz.a || true) >/dev/null 2>&1
+ cd $(libdir); if test -f $(SHAREDLIBV); then \
+ rm -f $(SHAREDLIB) $(SHAREDLIBM); \
+ ln -s $(SHAREDLIBV) $(SHAREDLIB); \
+ ln -s $(SHAREDLIBV) $(SHAREDLIBM); \
+ (ldconfig || true) >/dev/null 2>&1; \
+ fi
+ cp zlib.3 $(man3dir)
+ chmod 644 $(man3dir)/zlib.3
+ # The ranlib in install is needed on NeXTSTEP which checks file times
+ # ldconfig is for Linux
+
+ uninstall:
+ cd $(includedir); \
+ cd $(libdir); rm -f libz.a; \
+ if test -f $(SHAREDLIBV); then \
+ rm -f $(SHAREDLIBV) $(SHAREDLIB) $(SHAREDLIBM); \
+ fi
+ cd $(man3dir); rm -f zlib.3
+
+ mostlyclean: clean
+ clean:
+ rm -f *.o *~ example$(EXE) minigzip$(EXE) \
+ libz.* foo.gz so_locations \
+ _match.s maketree contrib/infback9/*.o
+
+ maintainer-clean: distclean
+ distclean: clean
+ cp -p Makefile.in Makefile
+ cp -p zconf.in.h zconf.h
+ rm -f .DS_Store
+
+ tags:
+ etags *.[ch]
+
+ depend:
+ makedepend -- $(CFLAGS) -- *.[ch]
+
+ # DO NOT DELETE THIS LINE -- make depend depends on it.
+
+ adler32.o: zlib.h zconf.h
+ compress.o: zlib.h zconf.h
+ crc32.o: crc32.h zlib.h zconf.h
+ deflate.o: deflate.h zutil.h zlib.h zconf.h
+ example.o: zlib.h zconf.h
+ gzio.o: zutil.h zlib.h zconf.h
+ inffast.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h
+ inflate.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h
+ infback.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h
+ inftrees.o: zutil.h zlib.h zconf.h inftrees.h
+ minigzip.o: zlib.h zconf.h
+ trees.o: deflate.h zutil.h zlib.h zconf.h trees.h
+ uncompr.o: zlib.h zconf.h
+ zutil.o: zutil.h zlib.h zconf.h
@@ -0,0 +1,125 @@
+ ZLIB DATA COMPRESSION LIBRARY
+
+ zlib 1.2.3 is a general purpose data compression library. All the code is
+ thread safe. The data format used by the zlib library is described by RFCs
+ (Request for Comments) 1950 to 1952 in the files
+ http://www.ietf.org/rfc/rfc1950.txt (zlib format), rfc1951.txt (deflate format)
+ and rfc1952.txt (gzip format). These documents are also available in other
+ formats from ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html
+
+ All functions of the compression library are documented in the file zlib.h
+ (volunteer to write man pages welcome, contact zlib@gzip.org). A usage example
+ of the library is given in the file example.c which also tests that the library
+ is working correctly. Another example is given in the file minigzip.c. The
+ compression library itself is composed of all source files except example.c and
+ minigzip.c.
+
+ To compile all files and run the test program, follow the instructions given at
+ the top of Makefile. In short "make test; make install" should work for most
+ machines. For Unix: "./configure; make test; make install". For MSDOS, use one
+ of the special makefiles such as Makefile.msc. For VMS, use make_vms.com.
+
+ Questions about zlib should be sent to <zlib@gzip.org>, or to Gilles Vollant
+ <info@winimage.com> for the Windows DLL version. The zlib home page is
+ http://www.zlib.org or http://www.gzip.org/zlib/ Before reporting a problem,
+ please check this site to verify that you have the latest version of zlib;
+ otherwise get the latest version and check whether the problem still exists or
+ not.
+
+ PLEASE read the zlib FAQ http://www.gzip.org/zlib/zlib_faq.html before asking
+ for help.
+
+ Mark Nelson <markn@ieee.org> wrote an article about zlib for the Jan. 1997
+ issue of Dr. Dobb's Journal; a copy of the article is available in
+ http://dogma.net/markn/articles/zlibtool/zlibtool.htm
+
+ The changes made in version 1.2.3 are documented in the file ChangeLog.
+
+ Unsupported third party contributions are provided in directory "contrib".
+
+ A Java implementation of zlib is available in the Java Development Kit
+ http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html
+ See the zlib home page http://www.zlib.org for details.
+
+ A Perl interface to zlib written by Paul Marquess <pmqs@cpan.org> is in the
+ CPAN (Comprehensive Perl Archive Network) sites
+ http://www.cpan.org/modules/by-module/Compress/
+
+ A Python interface to zlib written by A.M. Kuchling <amk@amk.ca> is
+ available in Python 1.5 and later versions, see
+ http://www.python.org/doc/lib/module-zlib.html
+
+ A zlib binding for TCL written by Andreas Kupries <a.kupries@westend.com> is
+ availlable at http://www.oche.de/~akupries/soft/trf/trf_zip.html
+
+ An experimental package to read and write files in .zip format, written on top
+ of zlib by Gilles Vollant <info@winimage.com>, is available in the
+ contrib/minizip directory of zlib.
+
+
+ Notes for some targets:
+
+ - For Windows DLL versions, please see win32/DLL_FAQ.txt
+
+ - For 64-bit Irix, deflate.c must be compiled without any optimization. With
+ -O, one libpng test fails. The test works in 32 bit mode (with the -n32
+ compiler flag). The compiler bug has been reported to SGI.
+
+ - zlib doesn't work with gcc 2.6.3 on a DEC 3000/300LX under OSF/1 2.1 it works
+ when compiled with cc.
+
+ - On Digital Unix 4.0D (formely OSF/1) on AlphaServer, the cc option -std1 is
+ necessary to get gzprintf working correctly. This is done by configure.
+
+ - zlib doesn't work on HP-UX 9.05 with some versions of /bin/cc. It works with
+ other compilers. Use "make test" to check your compiler.
+
+ - gzdopen is not supported on RISCOS, BEOS and by some Mac compilers.
+
+ - For PalmOs, see http://palmzlib.sourceforge.net/
+
+ - When building a shared, i.e. dynamic library on Mac OS X, the library must be
+ installed before testing (do "make install" before "make test"), since the
+ library location is specified in the library.
+
+
+ Acknowledgments:
+
+ The deflate format used by zlib was defined by Phil Katz. The deflate
+ and zlib specifications were written by L. Peter Deutsch. Thanks to all the
+ people who reported problems and suggested various improvements in zlib;
+ they are too numerous to cite here.
+
+ Copyright notice:
+
+ (C) 1995-2004 Jean-loup Gailly and Mark Adler
+
+ This software is provided 'as-is', without any express or implied
+ warranty. In no event will the authors be held liable for any damages
+ arising from the use of this software.
+
+ Permission is granted to anyone to use this software for any purpose,
+ including commercial applications, and to alter it and redistribute it
+ freely, subject to the following restrictions:
+
+ 1. The origin of this software must not be misrepresented; you must not
+ claim that you wrote the original software. If you use this software
+ in a product, an acknowledgment in the product documentation would be
+ appreciated but is not required.
+ 2. Altered source versions must be plainly marked as such, and must not be
+ misrepresented as being the original software.
+ 3. This notice may not be removed or altered from any source distribution.
+
+ Jean-loup Gailly Mark Adler
+ jloup@gzip.org madler@alumni.caltech.edu
+
+ If you use the zlib library in a product, we would appreciate *not*
+ receiving lengthy legal documents to sign. The sources are provided
+ for free but without warranty of any kind. The library has been
+ entirely written by Jean-loup Gailly and Mark Adler; it does not
+ include third-party code.
+
+ If you redistribute modified sources, we would appreciate that you include
+ in the file ChangeLog history information documenting your changes. Please
+ read the FAQ for more information on the distribution of modified source
+ versions.
@@ -0,0 +1,149 @@
+ /* adler32.c -- compute the Adler-32 checksum of a data stream
+ * Copyright (C) 1995-2004 Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+ /* @(#) $Id$ */
+
+ #define ZLIB_INTERNAL
+ #include "zlib.h"
+
+ #define BASE 65521UL /* largest prime smaller than 65536 */
+ #define NMAX 5552
+ /* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+
+ #define DO1(buf,i) {adler += (buf)[i]; sum2 += adler;}
+ #define DO2(buf,i) DO1(buf,i); DO1(buf,i+1);
+ #define DO4(buf,i) DO2(buf,i); DO2(buf,i+2);
+ #define DO8(buf,i) DO4(buf,i); DO4(buf,i+4);
+ #define DO16(buf) DO8(buf,0); DO8(buf,8);
+
+ /* use NO_DIVIDE if your processor does not do division in hardware */
+ #ifdef NO_DIVIDE
+ # define MOD(a) \
+ do { \
+ if (a >= (BASE << 16)) a -= (BASE << 16); \
+ if (a >= (BASE << 15)) a -= (BASE << 15); \
+ if (a >= (BASE << 14)) a -= (BASE << 14); \
+ if (a >= (BASE << 13)) a -= (BASE << 13); \
+ if (a >= (BASE << 12)) a -= (BASE << 12); \
+ if (a >= (BASE << 11)) a -= (BASE << 11); \
+ if (a >= (BASE << 10)) a -= (BASE << 10); \
+ if (a >= (BASE << 9)) a -= (BASE << 9); \
+ if (a >= (BASE << 8)) a -= (BASE << 8); \
+ if (a >= (BASE << 7)) a -= (BASE << 7); \
+ if (a >= (BASE << 6)) a -= (BASE << 6); \
+ if (a >= (BASE << 5)) a -= (BASE << 5); \
+ if (a >= (BASE << 4)) a -= (BASE << 4); \
+ if (a >= (BASE << 3)) a -= (BASE << 3); \
+ if (a >= (BASE << 2)) a -= (BASE << 2); \
+ if (a >= (BASE << 1)) a -= (BASE << 1); \
+ if (a >= BASE) a -= BASE; \
+ } while (0)
+ # define MOD4(a) \
+ do { \
+ if (a >= (BASE << 4)) a -= (BASE << 4); \
+ if (a >= (BASE << 3)) a -= (BASE << 3); \
+ if (a >= (BASE << 2)) a -= (BASE << 2); \
+ if (a >= (BASE << 1)) a -= (BASE << 1); \
+ if (a >= BASE) a -= BASE; \
+ } while (0)
+ #else
+ # define MOD(a) a %= BASE
+ # define MOD4(a) a %= BASE
+ #endif
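Review note: the `NO_DIVIDE` branch above replaces `a %= BASE` with a chain of conditional subtractions of `BASE << k`, for k from 16 down to 0. The following is a minimal sketch of the same idea written as a loop, assuming a 32-bit unsigned input. The name `mod_base_no_divide` is hypothetical and is not part of zlib.

```c
#include <stdint.h>

#define BASE 65521u /* largest prime smaller than 65536, as in adler32.c */

/* Hypothetical helper (not in zlib): reduce a modulo BASE using only
 * comparisons, shifts, and subtractions, mirroring the unrolled MOD macro. */
uint32_t mod_base_no_divide(uint32_t a)
{
    int shift;

    /* BASE << 16 is 4294049856, which still fits in 32 bits. After the
     * step for a given shift, a < (BASE << shift), so the last step
     * (shift 0) leaves a < BASE. */
    for (shift = 16; shift >= 0; shift--) {
        uint32_t step = (uint32_t)BASE << shift;
        if (a >= step)
            a -= step;
    }
    return a;
}
```

The unrolled macro in the diff avoids the loop counter; `MOD4` starts at shift 4, where the caller knows only a few multiples of `BASE` have accumulated.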
55
+
56
+ /* ========================================================================= */
57
+ uLong ZEXPORT adler32(adler, buf, len)
58
+ uLong adler;
59
+ const Bytef *buf;
60
+ uInt len;
61
+ {
62
+ unsigned long sum2;
63
+ unsigned n;
64
+
65
+ /* split Adler-32 into component sums */
66
+ sum2 = (adler >> 16) & 0xffff;
67
+ adler &= 0xffff;
68
+
69
+ /* in case user likes doing a byte at a time, keep it fast */
70
+ if (len == 1) {
71
+ adler += buf[0];
72
+ if (adler >= BASE)
73
+ adler -= BASE;
74
+ sum2 += adler;
75
+ if (sum2 >= BASE)
76
+ sum2 -= BASE;
77
+ return adler | (sum2 << 16);
78
+ }
79
+
80
+ /* initial Adler-32 value (deferred check for len == 1 speed) */
81
+ if (buf == Z_NULL)
82
+ return 1L;
83
+
84
+ /* in case short lengths are provided, keep it somewhat fast */
85
+ if (len < 16) {
86
+ while (len--) {
87
+ adler += *buf++;
88
+ sum2 += adler;
89
+ }
90
+ if (adler >= BASE)
91
+ adler -= BASE;
92
+ MOD4(sum2); /* only added so many BASE's */
93
+ return adler | (sum2 << 16);
94
+ }
95
+
96
+ /* do length NMAX blocks -- requires just one modulo operation */
97
+ while (len >= NMAX) {
98
+ len -= NMAX;
99
+ n = NMAX / 16; /* NMAX is divisible by 16 */
100
+ do {
101
+ DO16(buf); /* 16 sums unrolled */
102
+ buf += 16;
103
+ } while (--n);
104
+ MOD(adler);
105
+ MOD(sum2);
106
+ }
107
+
108
+ /* do remaining bytes (less than NMAX, still just one modulo) */
109
+ if (len) { /* avoid modulos if none remaining */
110
+ while (len >= 16) {
111
+ len -= 16;
112
+ DO16(buf);
113
+ buf += 16;
114
+ }
115
+ while (len--) {
116
+ adler += *buf++;
117
+ sum2 += adler;
118
+ }
119
+ MOD(adler);
120
+ MOD(sum2);
121
+ }
122
+
123
+ /* return recombined sums */
124
+ return adler | (sum2 << 16);
125
+ }
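The unrolled loop above defers the modulo until NMAX bytes have been summed. A minimal reference sketch, assuming BASE is 65521 (it is defined earlier in the file, outside this hunk), reduces after every byte instead and is handy for cross-checking the optimized routine. The names `ADLER_BASE`, `adler32_simple` and `adler32_nmax` are hypothetical and not part of zlib; `adler32_nmax` simply evaluates the bound quoted in the NMAX comment.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* assumed value of BASE: largest prime below 2^16 */

/* Reference Adler-32: reduce modulo ADLER_BASE after every byte. */
static uint32_t adler32_simple(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffff;          /* running byte sum */
    uint32_t s2 = (adler >> 16) & 0xffff;  /* running sum of s1 */
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}

/* Largest n with 255*n*(n+1)/2 + (n+1)*(ADLER_BASE-1) <= 2^32-1. */
static unsigned adler32_nmax(void)
{
    uint64_t n = 0;
    /* keep stepping while n+1 still satisfies the bound */
    while (255u * (n + 1) * (n + 2) / 2 + (n + 2) * (ADLER_BASE - 1) <= 0xffffffffu)
        n++;
    return (unsigned)n;
}
```

Starting from the initial value 1, `adler32_simple(1, buf, len)` should agree with the optimized `adler32()` on any input, since both compute the same two sums modulo BASE.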
126
+
127
+ /* ========================================================================= */
128
+ uLong ZEXPORT adler32_combine(adler1, adler2, len2)
129
+ uLong adler1;
130
+ uLong adler2;
131
+ z_off_t len2;
132
+ {
133
+ unsigned long sum1;
134
+ unsigned long sum2;
135
+ unsigned rem;
136
+
137
+ /* the derivation of this formula is left as an exercise for the reader */
138
+ rem = (unsigned)(len2 % BASE);
139
+ sum1 = adler1 & 0xffff;
140
+ sum2 = rem * sum1;
141
+ MOD(sum2);
142
+ sum1 += (adler2 & 0xffff) + BASE - 1;
143
+ sum2 += ((adler1 >> 16) & 0xffff) + ((adler2 >> 16) & 0xffff) + BASE - rem;
144
+ if (sum1 >= BASE) sum1 -= BASE;
144
+ if (sum1 >= BASE) sum1 -= BASE;
145
+ if (sum2 >= (BASE << 1)) sum2 -= (BASE << 1);
146
+ if (sum2 >= BASE) sum2 -= BASE;
148
+ return sum1 | (sum2 << 16);
149
+ }
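The formula left as an exercise can be read off from how the two sums evolve. Writing A for the low half and B for the high half, concatenating X and Y gives A(XY) = A(X) + A(Y) - 1 and B(XY) = B(X) + B(Y) + len2 * (A(X) - 1), all modulo BASE. The sketch below checks that identity with plain `%` operations; it assumes BASE is 65521, and the names `ADLER_BASE`, `adler32_simple` and `adler32_combine_sketch` are hypothetical, not zlib API.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* assumed value of BASE */

/* Byte-at-a-time reference checksum used to verify the combination. */
static uint32_t adler32_simple(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffff;
    uint32_t s2 = (adler >> 16) & 0xffff;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}

/* Combine the checksums of X and Y (Y is len2 bytes long) into that of XY. */
static uint32_t adler32_combine_sketch(uint32_t adler1, uint32_t adler2, uint64_t len2)
{
    uint64_t rem = len2 % ADLER_BASE;
    uint64_t a1 = adler1 & 0xffff, b1 = (adler1 >> 16) & 0xffff;
    uint64_t a2 = adler2 & 0xffff, b2 = (adler2 >> 16) & 0xffff;

    /* A(XY) = A(X) + A(Y) - 1; adding ADLER_BASE keeps the value non-negative */
    uint64_t a = (a1 + a2 + ADLER_BASE - 1) % ADLER_BASE;
    /* B(XY) = B(X) + B(Y) + len2 * (A(X) - 1) */
    uint64_t b = (b1 + b2 + rem * a1 + ADLER_BASE - rem) % ADLER_BASE;

    return (uint32_t)((b << 16) | a);
}
```

The 64-bit intermediates are a simplification for readability; the routine above reaches the same result with conditional subtractions.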
@@ -0,0 +1,209 @@
1
+ 1. Compression algorithm (deflate)
2
+
3
+ The deflation algorithm used by gzip (also zip and zlib) is a variation of
4
+ LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in
5
+ the input data. The second occurrence of a string is replaced by a
6
+ pointer to the previous string, in the form of a pair (distance,
7
+ length). Distances are limited to 32K bytes, and lengths are limited
8
+ to 258 bytes. When a string does not occur anywhere in the previous
9
+ 32K bytes, it is emitted as a sequence of literal bytes. (In this
10
+ description, `string' must be taken as an arbitrary sequence of bytes,
11
+ and is not restricted to printable characters.)
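To make the (distance, length) idea concrete, here is a small sketch of the reverse direction: expanding a token stream of literals and back-references into bytes. The `lz_token` layout and `lz_expand` are hypothetical illustrations, not the deflate bit format or any zlib interface.

```c
#include <stddef.h>

/* One LZ77 token: either a literal byte or a (distance, length) pair. */
typedef struct {
    int is_match;           /* 0: literal, 1: back-reference */
    unsigned char literal;  /* valid when is_match == 0 */
    size_t distance;        /* how far back the copy starts */
    size_t length;          /* how many bytes to copy */
} lz_token;

/* Expand tokens into out[0..cap). Returns bytes written, or 0 on a bad token. */
static size_t lz_expand(const lz_token *tok, size_t ntok,
                        unsigned char *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < ntok; i++) {
        if (!tok[i].is_match) {
            if (n >= cap) return 0;
            out[n++] = tok[i].literal;
        } else {
            if (tok[i].distance == 0 || tok[i].distance > n) return 0;
            if (tok[i].length > cap - n) return 0;
            /* copy byte by byte so a match may overlap its own output */
            for (size_t k = 0; k < tok[i].length; k++, n++)
                out[n] = out[n - tok[i].distance];
        }
    }
    return n;
}
```

Three literals `a`, `b`, `c` followed by the pair (distance 3, length 6) expand to `abcabcabc`: the copy overlaps the bytes it is producing, which is how a short pair can describe a long run.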
12
+
13
+ Literals or match lengths are compressed with one Huffman tree, and
14
+ match distances are compressed with another tree. The trees are stored
15
+ in a compact form at the start of each block. The blocks can have any
16
+ size (except that the compressed data for one block must fit in
17
+ available memory). A block is terminated when deflate() determines that
18
+ it would be useful to start another block with fresh trees. (This is
19
+ somewhat similar to the behavior of LZW-based _compress_.)
20
+
21
+ Duplicated strings are found using a hash table. All input strings of
22
+ length 3 are inserted in the hash table. A hash index is computed for
23
+ the next 3 bytes. If the hash chain for this index is not empty, all
24
+ strings in the chain are compared with the current input string, and
25
+ the longest match is selected.
26
+
27
+ The hash chains are searched starting with the most recent strings, to
28
+ favor small distances and thus take advantage of the Huffman encoding.
29
+ The hash chains are singly linked. There are no deletions from the
30
+ hash chains; the algorithm simply discards matches that are too old.
31
+
32
+ To avoid a worst-case situation, very long hash chains are arbitrarily
33
+ truncated at a certain length, determined by a runtime option (level
34
+ parameter of deflateInit). So deflate() does not always find the longest
35
+ possible match but generally finds a match which is long enough.
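The hash-chain search described in the last three paragraphs can be sketched as follows. This is a simplified illustration, not the code in deflate.c: the hash function, the table sizes and the names `chains_t`, `chains_insert` and `longest_match` are assumptions made for the example, and the whole input is kept in one small buffer instead of a sliding window.

```c
#include <stddef.h>

#define MIN_MATCH 3
#define MAX_MATCH 258
#define WSIZE     32768            /* maximum match distance */
#define HASH_BITS 12               /* hypothetical table size for this sketch */
#define HASH_SIZE (1u << HASH_BITS)
#define MAX_INPUT 4096             /* sketch only handles inputs up to this size */
#define NIL       (-1)

typedef struct {
    int head[HASH_SIZE];   /* most recent position for each hash value */
    int prev[MAX_INPUT];   /* previous position with the same hash */
} chains_t;

/* Hypothetical hash of the next three bytes. */
static unsigned hash3(const unsigned char *p)
{
    return (((unsigned)p[0] << 8) ^ ((unsigned)p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
}

static void chains_init(chains_t *c)
{
    for (size_t i = 0; i < HASH_SIZE; i++)
        c->head[i] = NIL;
}

/* Record the 3-byte string at pos; the caller guarantees pos + 3 <= len. */
static void chains_insert(chains_t *c, const unsigned char *buf, int pos)
{
    unsigned h = hash3(buf + pos);
    c->prev[pos] = c->head[h];     /* singly linked: point at the older entry */
    c->head[h] = pos;              /* newest entry goes first */
}

/* Longest match for the string at pos (not yet inserted), or 0 if none.
 * At most max_chain candidates are examined, newest first. */
static int longest_match(const chains_t *c, const unsigned char *buf, int len,
                         int pos, int max_chain, int *match_pos)
{
    int best = 0;
    int limit = len - pos;
    if (limit > MAX_MATCH) limit = MAX_MATCH;
    if (limit < MIN_MATCH) return 0;

    for (int cand = c->head[hash3(buf + pos)];
         cand != NIL && max_chain-- > 0;
         cand = c->prev[cand]) {
        if (pos - cand > WSIZE) break;   /* too old; the rest of the chain is older */
        int n = 0;
        while (n < limit && buf[cand + n] == buf[pos + n]) n++;
        if (n > best) { best = n; *match_pos = cand; }
    }
    return best >= MIN_MATCH ? best : 0;
}
```

The `max_chain` argument plays the role of the chain-length cutoff: a small value trades match quality for speed, which is the tradeoff the level parameter controls.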
36
+
37
+ deflate() also defers the selection of matches with a lazy evaluation
38
+ mechanism. After a match of length N has been found, deflate() searches for
39
+ a longer match at the next input byte. If a longer match is found, the
40
+ previous match is truncated to a length of one (thus producing a single
41
+ literal byte) and the process of lazy evaluation begins again. Otherwise,
42
+ the original match is kept, and the next match search is attempted only N
43
+ steps later.
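A rough sketch of lazy evaluation, under simplifying assumptions: the match finder below is a brute-force scan rather than a hash chain, and the names `token_t`, `brute_match` and `lazy_parse` are hypothetical. It only illustrates the decision rule described above (emit a literal and retry when the next position offers a longer match).

```c
#include <stddef.h>

typedef struct {
    int is_match;        /* 0: literal, 1: (dist, length) pair */
    unsigned char lit;   /* literal byte when is_match == 0 */
    int length, dist;    /* match description when is_match == 1 */
} token_t;

/* Brute-force longest match at pos; returns 0 when shorter than 3 bytes. */
static int brute_match(const unsigned char *buf, int len, int pos, int *dist)
{
    int best = 0;
    for (int cand = pos - 1; cand >= 0; cand--) {     /* most recent first */
        int n = 0;
        while (pos + n < len && n < 258 && buf[cand + n] == buf[pos + n]) n++;
        if (n > best) { best = n; *dist = pos - cand; }
    }
    return best >= 3 ? best : 0;
}

/* Parse buf into tokens with lazy matching; out needs room for len tokens. */
static int lazy_parse(const unsigned char *buf, int len, token_t *out)
{
    int pos = 0, ntok = 0;
    while (pos < len) {
        int dist = 0;
        int n = brute_match(buf, len, pos, &dist);
        if (n == 0) {                                  /* no match: literal */
            out[ntok++] = (token_t){0, buf[pos], 0, 0};
            pos++;
            continue;
        }
        for (;;) {                                     /* lazy evaluation */
            int ndist = 0;
            int next = (pos + 1 < len) ? brute_match(buf, len, pos + 1, &ndist) : 0;
            if (next <= n) break;                      /* keep the current match */
            out[ntok++] = (token_t){0, buf[pos], 0, 0}; /* truncate it to one literal */
            pos++;
            n = next;
            dist = ndist;
        }
        out[ntok++] = (token_t){1, 0, n, dist};
        pos += n;                                      /* next search N steps later */
    }
    return ntok;
}
```

On the input `abc-bcde-abcde`, the final `abcde` first matches `abc` (length 3), but the next byte starts a length-4 match against `bcde`, so the sketch emits the literal `a` followed by a (distance 6, length 4) pair.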
44
+
45
+ The lazy match evaluation is also subject to a runtime parameter. If
46
+ the current match is long enough, deflate() reduces the search for a longer
47
+ match, thus speeding up the whole process. If compression ratio is more
48
+ important than speed, deflate() attempts a complete second search even if
49
+ the first match is already long enough.
50
+
51
+ The lazy match evaluation is not performed for the fastest compression
52
+ modes (level parameter 1 to 3). For these fast modes, new strings
53
+ are inserted in the hash table only when no match was found, or
54
+ when the match is not too long. This degrades the compression ratio
55
+ but saves time since there are both fewer insertions and fewer searches.
56
+
57
+
58
+ 2. Decompression algorithm (inflate)
59
+
60
+ 2.1 Introduction
61
+
62
+ The key question is how to represent a Huffman code (or any prefix code) so
63
+ that you can decode fast. The most important characteristic is that shorter
64
+ codes are much more common than longer codes, so pay attention to decoding the
65
+ short codes fast, and let the long codes take longer to decode.
66
+
67
+ inflate() sets up a first level table that covers some number of bits of
68
+ input less than the length of the longest code. It gets that many bits from the
69
+ stream, and looks it up in the table. The table will tell if the next
70
+ code is that many bits or less and how many, and if it is, it will tell
71
+ the value, else it will point to the next level table for which inflate()
72
+ grabs more bits and tries to decode a longer code.
73
+
74
+ How many bits to make the first lookup is a tradeoff between the time it
75
+ takes to decode and the time it takes to build the table. If building the
76
+ table took no time (and if you had infinite memory), then there would only
77
+ be a first level table to cover all the way to the longest code. However,
78
+ building the table ends up taking a lot longer for more bits since short
79
+ codes are replicated many times in such a table. What inflate() does is
80
+ simply to make the number of bits in the first table a variable, and then
81
+ to set that variable for the maximum speed.
82
+
83
+ For inflate, which has 286 possible codes for the literal/length tree, the size
84
+ of the first table is nine bits. Also the distance trees have 30 possible
85
+ values, and the size of the first table is six bits. Note that for each of
86
+ those cases, the table ended up one bit longer than the ``average'' code
87
+ length, i.e. the code length of an approximately flat code which would be a
88
+ little more than eight bits for 286 symbols and a little less than five bits
89
+ for 30 symbols.
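For reference, the flat-code lengths mentioned above work out as follows (a flat code over n symbols needs about log2 n bits per symbol):

```latex
\log_2 286 \approx 8.16 \quad (\text{first table: } 9 \text{ bits}), \qquad
\log_2 30 \approx 4.91 \quad (\text{first table: } 6 \text{ bits})
```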
90
+
91
+
92
+ 2.2 More details on the inflate table lookup
93
+
94
+ Ok, you want to know what this cleverly obfuscated inflate tree actually
95
+ looks like. You are correct that it's not a Huffman tree. It is simply a
96
+ lookup table for the first, let's say, nine bits of a Huffman symbol. The
97
+ symbol could be as short as one bit or as long as 15 bits. If a particular
98
+ symbol is shorter than nine bits, then that symbol's translation is duplicated
99
+ in all those entries that start with that symbol's bits. For example, if the
100
+ symbol is four bits, then it's duplicated 32 times in a nine-bit table. If a
101
+ symbol is nine bits long, it appears in the table once.
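The duplication count follows from the number of free bits left in the index: a symbol of k bits fills every entry of an r-bit table whose first k bits match it.

```latex
\text{entries}(k, r) = 2^{\,r-k}, \qquad
2^{\,9-4} = 32, \qquad 2^{\,9-9} = 1
```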
102
+
103
+ If the symbol is longer than nine bits, then that entry in the table points
104
+ to another similar table for the remaining bits. Again, there are duplicated
105
+ entries as needed. The idea is that most of the time the symbol will be short
106
+ and there will only be one table look up. (That's the whole idea behind data
107
+ compression in the first place.) For the less frequent long symbols, there
108
+ will be two lookups. If you had a compression method with really long
109
+ symbols, you could have as many levels of lookups as is efficient. For
110
+ inflate, two is enough.
111
+
112
+ So a table entry either points to another table (in which case nine bits in
113
+ the above example are gobbled), or it contains the translation for the symbol
114
+ and the number of bits to gobble. Then you start again with the next
115
+ ungobbled bit.
116
+
117
+ You may wonder: why not just have one lookup table for however many bits the
118
+ longest symbol is? The reason is that if you do that, you end up spending
119
+ more time filling in duplicate symbol entries than you do actually decoding,
119
+ at least for deflate's output, which generates new trees every few tens of
121
+ kbytes. You can imagine that filling in a 2^15 entry table for a 15-bit code
122
+ would take too long if you're only decoding several thousand symbols. At the
123
+ other extreme, you could make a new table for every bit in the code. In fact,
124
+ that's essentially a Huffman tree. But then you spend too much time
125
+ traversing the tree while decoding, even for short symbols.
126
+
127
+ So the number of bits for the first lookup table is a tradeoff of the time to
128
+ fill out the table vs. the time spent looking at the second level and above of
129
+ the table.
130
+
131
+ Here is an example, scaled down:
132
+
133
+ The code being decoded, with 10 symbols, from 1 to 6 bits long:
134
+
135
+ A: 0
136
+ B: 10
137
+ C: 1100
138
+ D: 11010
139
+ E: 11011
140
+ F: 11100
141
+ G: 11101
142
+ H: 11110
143
+ I: 111110
144
+ J: 111111
145
+
146
+ Let's make the first table three bits long (eight entries):
147
+
148
+ 000: A,1
149
+ 001: A,1
150
+ 010: A,1
151
+ 011: A,1
152
+ 100: B,2
153
+ 101: B,2
154
+ 110: -> table X (gobble 3 bits)
155
+ 111: -> table Y (gobble 3 bits)
156
+
157
+ Each entry is what the bits decode as and how many bits that is, i.e. how
158
+ many bits to gobble. Or the entry points to another table, with the number of
159
+ bits to gobble implicit in the size of the table.
160
+
161
+ Table X is two bits long since the longest code starting with 110 is five bits
162
+ long:
163
+
164
+ 00: C,1
165
+ 01: C,1
166
+ 10: D,2
167
+ 11: E,2
168
+
169
+ Table Y is three bits long since the longest code starting with 111 is six
170
+ bits long:
171
+
172
+ 000: F,2
173
+ 001: F,2
174
+ 010: G,2
175
+ 011: G,2
176
+ 100: H,2
177
+ 101: H,2
178
+ 110: I,3
179
+ 111: J,3
180
+
181
+ So what we have here are three tables with a total of 20 entries that had to
182
+ be constructed. That's compared to 64 entries for a single table. Or
183
+ compared to 16 entries for a Huffman tree (six two entry tables and one four
184
+ entry table). Assuming that the code ideally represents the probability of
185
+ the symbols, it takes on the average 1.25 lookups per symbol. That's compared
186
+ to one lookup for the single table, or 1.66 lookups per symbol for the
187
+ Huffman tree.
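The scaled-down example can be written out directly. The sketch below hard-codes the three tables exactly as listed and decodes a string of '0'/'1' characters read left to right; the `entry` layout, `peek`, `decode` and `expected_lookups` are hypothetical names for illustration and do not reflect the structures in inftrees.c.

```c
#include <stddef.h>
#include <string.h>

#define ROOT_BITS 3

typedef struct entry {
    char sym;                  /* decoded symbol, or 0 when the entry is a link */
    int bits;                  /* symbol: bits to gobble; link: index width of sub-table */
    const struct entry *sub;   /* sub-table for a link, NULL otherwise */
} entry;

static const entry table_x[4] = {
    {'C', 1, NULL}, {'C', 1, NULL}, {'D', 2, NULL}, {'E', 2, NULL}
};
static const entry table_y[8] = {
    {'F', 2, NULL}, {'F', 2, NULL}, {'G', 2, NULL}, {'G', 2, NULL},
    {'H', 2, NULL}, {'H', 2, NULL}, {'I', 3, NULL}, {'J', 3, NULL}
};
static const entry root[8] = {
    {'A', 1, NULL}, {'A', 1, NULL}, {'A', 1, NULL}, {'A', 1, NULL},
    {'B', 2, NULL}, {'B', 2, NULL}, {0, 2, table_x}, {0, 3, table_y}
};

/* Read n bits starting at pos without consuming them; pad with zeros at the end. */
static unsigned peek(const char *bits, size_t nbits, size_t pos, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++) {
        v <<= 1;
        if (pos + (size_t)i < nbits && bits[pos + i] == '1') v |= 1;
    }
    return v;
}

/* Decode a '0'/'1' string into out (NUL-terminated); counts table lookups. */
static size_t decode(const char *bits, char *out, size_t *lookups)
{
    size_t nbits = strlen(bits), pos = 0, nsym = 0;
    *lookups = 0;
    while (pos < nbits) {
        const entry *e = &root[peek(bits, nbits, pos, ROOT_BITS)];
        ++*lookups;
        if (e->sub) {                     /* long code: gobble 3 bits, go one level down */
            pos += ROOT_BITS;
            e = &e->sub[peek(bits, nbits, pos, e->bits)];
            ++*lookups;
        }
        pos += (size_t)e->bits;           /* gobble only the bits the symbol used */
        out[nsym++] = e->sym;
    }
    out[nsym] = '\0';
    return nsym;
}

/* Average lookups per symbol if each symbol has probability 2^-length. */
static double expected_lookups(void)
{
    static const int len[10] = {1, 2, 4, 5, 5, 5, 5, 5, 6, 6};
    double avg = 0.0;
    for (int i = 0; i < 10; i++)
        avg += (len[i] <= ROOT_BITS ? 1.0 : 2.0) / (double)(1 << len[i]);
    return avg;
}
```

Decoding the ten codes concatenated in order A through J takes 18 lookups (one each for A and B, two for each of the others), and `expected_lookups()` reproduces the 1.25 figure quoted above.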
188
+
189
+ There, I think that gives you a picture of what's going on. For inflate, the
190
+ meaning of a particular symbol is often more than just a letter. It can be a
191
+ byte (a "literal"), or it can be either a length or a distance which
192
+ indicates a base value and a number of bits to fetch after the code that is
193
+ added to the base value. Or it might be the special end-of-block code. The
194
+ data structures created in inftrees.c try to encode all that information
195
+ compactly in the tables.
196
+
197
+
198
+ Jean-loup Gailly Mark Adler
199
+ jloup@gzip.org madler@alumni.caltech.edu
200
+
201
+
202
+ References:
203
+
204
+ [LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
205
+ Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
206
+ pp. 337-343, May 1977.
207
+
208
+ ``DEFLATE Compressed Data Format Specification'' available at
209
+ http://www.ietf.org/rfc/rfc1951.txt