minimap2 0.2.22.0 → 0.2.24.1

Files changed (101)
  1. checksums.yaml +4 -4
  2. data/README.md +60 -76
  3. data/ext/Rakefile +55 -0
  4. data/ext/cmappy/cmappy.c +129 -0
  5. data/ext/cmappy/cmappy.h +44 -0
  6. data/ext/minimap2/FAQ.md +46 -0
  7. data/ext/minimap2/LICENSE.txt +24 -0
  8. data/ext/minimap2/MANIFEST.in +10 -0
  9. data/ext/minimap2/Makefile +132 -0
  10. data/ext/minimap2/Makefile.simde +97 -0
  11. data/ext/minimap2/NEWS.md +821 -0
  12. data/ext/minimap2/README.md +403 -0
  13. data/ext/minimap2/align.c +1020 -0
  14. data/ext/minimap2/bseq.c +169 -0
  15. data/ext/minimap2/bseq.h +64 -0
  16. data/ext/minimap2/code_of_conduct.md +30 -0
  17. data/ext/minimap2/cookbook.md +243 -0
  18. data/ext/minimap2/esterr.c +64 -0
  19. data/ext/minimap2/example.c +63 -0
  20. data/ext/minimap2/format.c +559 -0
  21. data/ext/minimap2/hit.c +466 -0
  22. data/ext/minimap2/index.c +775 -0
  23. data/ext/minimap2/kalloc.c +205 -0
  24. data/ext/minimap2/kalloc.h +76 -0
  25. data/ext/minimap2/kdq.h +132 -0
  26. data/ext/minimap2/ketopt.h +120 -0
  27. data/ext/minimap2/khash.h +615 -0
  28. data/ext/minimap2/krmq.h +474 -0
  29. data/ext/minimap2/kseq.h +256 -0
  30. data/ext/minimap2/ksort.h +153 -0
  31. data/ext/minimap2/ksw2.h +184 -0
  32. data/ext/minimap2/ksw2_dispatch.c +96 -0
  33. data/ext/minimap2/ksw2_extd2_sse.c +402 -0
  34. data/ext/minimap2/ksw2_exts2_sse.c +416 -0
  35. data/ext/minimap2/ksw2_extz2_sse.c +313 -0
  36. data/ext/minimap2/ksw2_ll_sse.c +152 -0
  37. data/ext/minimap2/kthread.c +159 -0
  38. data/ext/minimap2/kthread.h +15 -0
  39. data/ext/minimap2/kvec.h +105 -0
  40. data/ext/minimap2/lchain.c +369 -0
  41. data/ext/minimap2/main.c +459 -0
  42. data/ext/minimap2/map.c +714 -0
  43. data/ext/minimap2/minimap.h +410 -0
  44. data/ext/minimap2/minimap2.1 +725 -0
  45. data/ext/minimap2/misc/README.md +179 -0
  46. data/ext/minimap2/misc/mmphase.js +335 -0
  47. data/ext/minimap2/misc/paftools.js +3149 -0
  48. data/ext/minimap2/misc.c +162 -0
  49. data/ext/minimap2/mmpriv.h +132 -0
  50. data/ext/minimap2/options.c +234 -0
  51. data/ext/minimap2/pe.c +177 -0
  52. data/ext/minimap2/python/README.rst +196 -0
  53. data/ext/minimap2/python/cmappy.h +152 -0
  54. data/ext/minimap2/python/cmappy.pxd +153 -0
  55. data/ext/minimap2/python/mappy.pyx +273 -0
  56. data/ext/minimap2/python/minimap2.py +39 -0
  57. data/ext/minimap2/sdust.c +213 -0
  58. data/ext/minimap2/sdust.h +25 -0
  59. data/ext/minimap2/seed.c +131 -0
  60. data/ext/minimap2/setup.py +55 -0
  61. data/ext/minimap2/sketch.c +143 -0
  62. data/ext/minimap2/splitidx.c +84 -0
  63. data/ext/minimap2/sse2neon/emmintrin.h +1689 -0
  64. data/ext/minimap2/test/MT-human.fa +278 -0
  65. data/ext/minimap2/test/MT-orang.fa +276 -0
  66. data/ext/minimap2/test/q-inv.fa +4 -0
  67. data/ext/minimap2/test/q2.fa +2 -0
  68. data/ext/minimap2/test/t-inv.fa +127 -0
  69. data/ext/minimap2/test/t2.fa +2 -0
  70. data/ext/minimap2/tex/Makefile +21 -0
  71. data/ext/minimap2/tex/bioinfo.cls +930 -0
  72. data/ext/minimap2/tex/blasr-mc.eval +17 -0
  73. data/ext/minimap2/tex/bowtie2-s3.sam.eval +28 -0
  74. data/ext/minimap2/tex/bwa-s3.sam.eval +52 -0
  75. data/ext/minimap2/tex/bwa.eval +55 -0
  76. data/ext/minimap2/tex/eval2roc.pl +33 -0
  77. data/ext/minimap2/tex/graphmap.eval +4 -0
  78. data/ext/minimap2/tex/hs38-simu.sh +10 -0
  79. data/ext/minimap2/tex/minialign.eval +49 -0
  80. data/ext/minimap2/tex/minimap2.bib +460 -0
  81. data/ext/minimap2/tex/minimap2.tex +724 -0
  82. data/ext/minimap2/tex/mm2-s3.sam.eval +62 -0
  83. data/ext/minimap2/tex/mm2-update.tex +240 -0
  84. data/ext/minimap2/tex/mm2.approx.eval +12 -0
  85. data/ext/minimap2/tex/mm2.eval +13 -0
  86. data/ext/minimap2/tex/natbib.bst +1288 -0
  87. data/ext/minimap2/tex/natbib.sty +803 -0
  88. data/ext/minimap2/tex/ngmlr.eval +38 -0
  89. data/ext/minimap2/tex/roc.gp +60 -0
  90. data/ext/minimap2/tex/snap-s3.sam.eval +62 -0
  91. data/ext/minimap2.patch +19 -0
  92. data/lib/minimap2/aligner.rb +4 -4
  93. data/lib/minimap2/alignment.rb +11 -11
  94. data/lib/minimap2/ffi/constants.rb +20 -16
  95. data/lib/minimap2/ffi/functions.rb +5 -0
  96. data/lib/minimap2/ffi.rb +4 -5
  97. data/lib/minimap2/version.rb +2 -2
  98. data/lib/minimap2.rb +51 -15
  99. metadata +97 -79
  100. data/lib/minimap2/ffi_helper.rb +0 -53
  101. data/vendor/libminimap2.so +0 -0
@@ -0,0 +1,821 @@
1
+ Release 2.24-r1122 (26 December 2021)
2
+ -------------------------------------
3
+
4
+ This release improves alignment around long poorly aligned regions. Older
5
+ minimap2 may chain through such regions in rare cases which may result in
6
+ missing alignments later. The issue has become worse since the change of
7
+ the chaining algorithm in v2.19. v2.23 implements an incomplete remedy. This
8
+ release provides a better solution with a X-drop-like heuristic and by enabling
9
+ two-bandwidth chaining in the assembly mode.
10
+
11
+ (2.24: 26 December 2021, r1122)
12
+
13
+
14
+
15
+ Release 2.23-r1111 (18 November 2021)
16
+ -------------------------------------
17
+
18
+ Notable changes:
19
+
20
+ * Bugfix: fixed missing alignments around long inversions (#806 and #816).
21
+ This bug affected v2.19 through v2.22.
22
+
23
+ * Improvement: avoid extremely long mapping time for pathologic reads with
24
+ highly repeated k-mers not in the reference (#771). Use --q-occ-frac=0
25
+ to disable the new heuristic.
26
+
27
+ * Change: use --cap-kalloc=1g by default.
28
+
29
+ (2.23: 18 November 2021, r1111)
30
+
31
+
32
+
33
+ Release 2.22-r1101 (7 August 2021)
34
+ ----------------------------------
35
+
36
+ When choosing the best alignment, this release uses logarithm gap penalty and
37
+ query-specific mismatch penalty. It improves the sensitivity to long INDELs in
38
+ repetitive regions.
39
+
40
+ Other notable changes:
41
+
42
+ * Bugfix: fixed an indirect memory leak that may waste a large amount of
43
+ memory given a highly repetitive reference such as a 16S RNA database (#749).
44
+ All versions of minimap2 have this issue.
45
+
46
+ * New feature: added --cap-kalloc to reduce the peak memory. This option is
47
+ not enabled by default but may become the default in future releases.
48
+
49
+ Known issue:
50
+
51
+ * Minimap2 may take a long time to map a read (#771). So far it is not clear
52
+ if this happens in v2.18 and earlier versions.
53
+
54
+ (2.22: 7 August 2021, r1101)
55
+
56
+
57
+
58
+ Release 2.21-r1071 (6 July 2021)
59
+ --------------------------------
60
+
61
+ This release fixed a regression in short-read mapping introduced in v2.19
62
+ (#776). It also fixed invalid comparisons of uninitialized variables, though
63
+ these are harmless (#752). Long-read alignment should be identical to v2.20.
64
+
65
+ (2.21: 6 July 2021, r1071)
66
+
67
+
68
+
69
+ Release 2.20-r1061 (27 May 2021)
70
+ --------------------------------
71
+
72
+ This release fixed a bug in the Python module and improves the command-line
73
+ compatibility with v2.18. In v2.19, if `-r` is specified with an `asm*` preset,
74
+ users would get alignments more fragmented than v2.18. This could be an issue
75
+ for existing pipelines specifying `-r`. This release resolves this issue.
76
+
77
+ (2.20: 27 May 2021, r1061)
78
+
79
+
80
+
81
+ Release 2.19-r1057 (26 May 2021)
82
+ --------------------------------
83
+
84
+ This release includes a few important improvements backported from unimap:
85
+
86
+ * Improvement: more contiguous alignment through long INDELs. This is enabled
87
+ by the minigraph chaining algorithm. All `asm*` presets now use the new
88
+ algorithm. They can find INDELs up to 100kb and may be faster for
89
+ chromosome-long contigs. The default mode and `map*` presets use this
90
+ algorithm to replace the long-join heuristic.
91
+
92
+ * Improvement: better alignment in highly repetitive regions by rescuing
93
+ high-occurrence seeds. If the distance between two adjacent seeds is too
94
+ large, attempt to choose a fraction of high-occurrence seeds in-between.
95
+ Minimap2 now produces fewer clippings and alignment break points in long
96
+ satellite regions.
97
+
98
+ * Improvement: allow specifying an interval of k-mer occurrences with `-U`.
99
+ For repeat-rich genomes, the automatic k-mer occurrence threshold determined
100
+ by `-f` may be too large and makes alignment impractically slow. The new
101
+ option protects against such cases. Enabled for `asm*` and `map-hifi`.
102
+
103
+ * New feature: added the `map-hifi` preset for mapping PacBio High-Fidelity
104
+ (HiFi) reads.
105
+
106
+ * Change to the default: apply `--cap-sw-mem=100m` for genomic alignment.
107
+
108
+ * Bugfix: minimap2 could not generate an index file with `-xsr` (#734).
109
+
110
+ This release represents the most significant algorithmic change since v2.1 in
111
+ 2017. With features backported from unimap, minimap2 now has similar power to
112
+ unimap for contig alignment. Unimap will remain an experimental project and is
113
+ no longer recommended over minimap2. Sorry for reverting the recommendation in
114
+ such a short time.
115
+
116
+ (2.19: 26 May 2021, r1057)
117
+
118
+
119
+
120
+ Release 2.18-r1015 (9 April 2021)
121
+ ---------------------------------
122
+
123
+ This release fixes multiple rare bugs in minimap2 and adds additional
124
+ functionality to paftools.js.
125
+
126
+ Changes to minimap2:
127
+
128
+ * Bugfix: a rare segfault caused by an off-by-one error (#489)
129
+
130
+ * Bugfix: minimap2 segfaulted due to an uninitialized variable (#622 and #625).
131
+
132
+ * Bugfix: minimap2 parsed spaces as field separators in BED (#721). This led
133
+ to issues when the BED name column contains spaces.
134
+
135
+ * Bugfix: minimap2 `--split-prefix` did not work with long reference names
136
+ (#394).
137
+
138
+ * Bugfix: option `--junc-bonus` didn't work (#513)
139
+
140
+ * Bugfix: minimap2 didn't return 1 on I/O errors (#532)
141
+
142
+ * Bugfix: the `de:f` tag (sequence divergence) could be negative if there were
143
+ ambiguous bases
144
+
145
+ * Bugfix: fixed two undefined behaviors caused by calling memcpy() on
146
+ zero-length blocks (#443)
147
+
148
+ * Bugfix: there were duplicated SAM @SQ lines if option `--split-prefix` is in
149
+ use (#400 and #527)
150
+
151
+ * Bugfix: option -K had to be smaller than 2 billion (#491). This was caused
152
+ by a 32-bit integer overflow.
153
+
154
+ * Improvement: optionally compile against SIMDe (#597). Minimap2 should work
155
+ with IBM POWER CPUs, though this has not been tested. To compile with SIMDe,
156
+ please use `make -f Makefile.simde`.
157
+
158
+ * Improvement: more informative error message for I/O errors (#454) and for
159
+ FASTQ parsing errors (#510)
160
+
161
+ * Improvement: abort given malformatted RG line (#541)
162
+
163
+ * Improvement: better formula to estimate the `dv:f` tag (approximate sequence
164
+ divergence). See DOI:10.1101/2021.01.15.426881.
165
+
166
+ * New feature: added the `--mask-len` option to fine control the removal of
167
+ redundant hits (#659). The default behavior is unchanged.
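The BED fix (#721) listed above comes down to splitting records on tabs only, rather than on any whitespace. A minimal Python sketch of the difference, using a made-up BED record; `parse_bed_line` is a hypothetical helper for illustration and is not minimap2's actual parser:

```python
def parse_bed_line(line, tab_only=True):
    """Split one BED record into fields.

    tab_only=True  : tabs are the only separator (the fixed behavior).
    tab_only=False : any whitespace separates fields (mimics the old bug).
    """
    line = line.rstrip("\n")
    return line.split("\t") if tab_only else line.split()


# Made-up record whose name column (4th field) contains a space.
record = "chr1\t100\t200\tgene A\t0\t+\n"

fixed = parse_bed_line(record, tab_only=True)   # name stays "gene A"
old = parse_bed_line(record, tab_only=False)    # name is cut to "gene"
```

With whitespace splitting, every column after the name shifts by one, which is why a name containing spaces could corrupt the score and strand fields.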
168
+
169
+ Changes to mappy:
170
+
171
+ * Bugfix: mappy caused segmentation fault if the reference index is not
172
+ present (#413).
173
+
174
+ * Bugfix: fixed a memory leak via 238b6bb3
175
+
176
+ * Change: always require Cython to compile the mappy module (#723). Older
177
+ mappy packages at PyPI bundled the C source code generated by Cython such
178
+ that end users did not need to install Cython to compile mappy. However, as
179
+ Python 3.9 is breaking backward compatibility, older mappy does not work
180
+ with Python 3.9 anymore. We have to add this Cython dependency as a
181
+ workaround.
182
+
183
+ Changes to paftools.js:
184
+
185
+ * Bugfix: the "part10-" line from asmgene was wrong (#581)
186
+
187
+ * Improvement: compatibility with GTF files from GenBank (#422)
188
+
189
+ * New feature: asmgene also checks missing multi-copy genes
190
+
191
+ * New feature: added the misjoin command to evaluate large-scale misjoins and
192
+ megabase-long inversions.
193
+
194
+ Despite the many bug fixes and minor improvements, the core algorithm
195
+ stays the same. This version of minimap2 produces nearly identical alignments
196
+ to v2.17 except in very rare corner cases.
197
+
198
+ Now unimap is recommended over minimap2 for aligning long contigs against a
199
+ reference genome. It often takes less wall-clock time and is much more
200
+ sensitive to long insertions and deletions.
201
+
202
+ (2.18: 9 April 2021, r1015)
203
+
204
+
205
+
206
+ Release 2.17-r941 (4 May 2019)
207
+ ------------------------------
208
+
209
+ Changes since the last release:
210
+
211
+ * Fixed flawed CIGARs like `5I6D7I` (#392).
212
+
213
+ * Bugfix: TLEN should be 0 when either end is unmapped (#373 and #365).
214
+
215
+ * Bugfix: mappy is unable to write index (#372).
216
+
217
+ * Added option `--junc-bed` to load known gene annotations in the BED12
218
+ format. Minimap2 prefers annotated junctions over novel junctions (#197 and
219
+ #348). GTF can be converted to BED12 with `paftools.js gff2bed`.
220
+
221
+ * Added option `--sam-hit-only` to suppress unmapped hits in SAM (#377).
222
+
223
+ * Added preset `splice:hq` for high-quality CCS or mRNA sequences. It applies
224
+ better scoring and improves the sensitivity to small exons. This preset may
225
+ introduce false small introns, but the overall accuracy should be higher.
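The TLEN fix above (#373 and #365) follows the SAM convention that the template length is 0 when it cannot be determined. A small Python sketch of that rule, assuming 0-based half-open coordinates on the same reference; `template_length` is a hypothetical function, not code taken from minimap2:

```python
def template_length(start1, end1, start2, end2, mapped1=True, mapped2=True):
    """Return the signed TLEN for the first read of a pair.

    Coordinates are 0-based, half-open, and assumed to be on the same
    reference sequence. The result is 0 if either end is unmapped.
    """
    if not (mapped1 and mapped2):
        return 0
    span = max(end1, end2) - min(start1, start2)
    # The leftmost segment gets a plus sign, the other a minus sign.
    return span if start1 <= start2 else -span
```

For example, `template_length(100, 250, 300, 450)` gives 350, while passing `mapped2=False` gives 0 regardless of the coordinates.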
226
+
227
+ This version produces nearly identical alignments to v2.16, except for CIGARs
228
+ affected by the bug mentioned above.
229
+
230
+ (2.17: 5 May 2019, r941)
231
+
232
+
233
+
234
+ Release 2.16-r922 (28 February 2019)
235
+ ------------------------------------
236
+
237
+ This release is 50% faster for mapping ultra-long nanopore reads at comparable
238
+ accuracy. For short-read mapping, long-read overlapping and ordinary long-read
239
+ mapping, the performance and accuracy remain similar. This speedup is achieved
240
+ with a new heuristic to limit the number of chaining iterations (#324). Users
241
+ can disable the heuristic by increasing a new option `--max-chain-iter` to a
242
+ huge number.
243
+
244
+ Other changes to minimap2:
245
+
246
+ * Implemented option `--paf-no-hit` to output unmapped query sequences in PAF.
247
+ The strand and reference name columns are both `*` at an unmapped line. The
248
+ hidden option is available in earlier minimap2 but had a different 2-column
249
+ output format instead of PAF.
250
+
251
+ * Fixed a bug that leads to wrongly calculated `de` tags when ambiguous bases
252
+ are involved (#309). This bug only affects v2.15.
253
+
254
+ * Fixed a bug when parsing command-line option `--splice` (#344). This bug was
255
+ introduced in v2.13.
256
+
257
+ * Fixed two division-by-zero cases (#326). They don't affect final alignments
258
+ because the results of the divisions are not used in either case.
259
+
260
+ * Added an option `-o` to output alignments to a specified file. It is still
261
+ recommended to use UNIX pipes for on-the-fly conversion or compression.
262
+
263
+ * Output a new `rl` tag to give the length of query regions harboring
264
+ repetitive seeds.
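Downstream scripts that read PAF produced with `--paf-no-hit` need to skip or count the unmapped records. A minimal Python sketch that relies only on the rule stated above (strand and reference name are both `*`); the two sample lines are made up for illustration:

```python
def is_unmapped_paf(line):
    """Return True if a PAF line describes an unmapped query.

    In PAF, column 5 is the strand and column 6 the target name (1-based).
    """
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 6 and fields[4] == "*" and fields[5] == "*"


# Made-up records: one mapped read and one unmapped read.
paf_lines = [
    "read1\t1000\t10\t990\t+\tchr1\t50000\t2010\t2990\t950\t980\t60",
    "read2\t800\t0\t0\t*\t*\t0\t0\t0\t0\t0\t0",
]
unmapped_names = [l.split("\t")[0] for l in paf_lines if is_unmapped_paf(l)]
```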
265
+
266
+ Changes to paftools.js:
267
+
268
+ * Added a new option to convert the MD tag to the long form of the cs tag.
269
+
270
+ Changes to mappy:
271
+
272
+ * Added the `mappy.Aligner.seq_names` method to return sequence names (#312).
273
+
274
+ For NA12878 ultra-long reads, this release changes the alignments of <0.1% of
275
+ reads in comparison to v2.15. All these reads have highly fragmented alignments
276
+ and are likely to be problematic anyway. For shorter or well aligned reads,
277
+ this release should produce mostly identical alignments to v2.15.
278
+
279
+ (2.16: 28 February 2019, r922)
280
+
281
+
282
+
283
+ Release 2.15-r905 (10 January 2019)
284
+ -----------------------------------
285
+
286
+ Changes to minimap2:
287
+
288
+ * Fixed a rare segmentation fault when option -H is in use (#307). This may
289
+ happen when there are very long homopolymers towards the 5'-end of a read.
290
+
291
+ * Fixed wrong CIGARs when option --eqx is used (#266).
292
+
293
+ * Fixed a typo in the base encoding table (#264). This should have no
294
+ practical effect.
295
+
296
+ * Fixed a typo in the example code (#265).
297
+
298
+ * Improved the C++ compatibility by removing "register" (#261). However,
299
+ minimap2 still can't be compiled in the pedantic C++ mode (#306).
300
+
301
+ * Output a new "de" tag for gap-compressed sequence divergence.
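To make "gap-compressed" concrete: the sketch below assumes the common definition in which every gap counts as a single difference regardless of its length, and computes the divergence from a CIGAR that uses the `=`/`X` operators. It ignores ambiguous bases, and `gap_compressed_divergence` is a hypothetical name, so treat it as an approximation of the idea rather than minimap2's exact formula:

```python
import re


def gap_compressed_divergence(cigar):
    """Estimate gap-compressed divergence from an =/X style CIGAR string.

    Each insertion or deletion contributes one difference, however long.
    """
    matches = mismatches = gap_opens = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "=":
            matches += n
        elif op == "X":
            mismatches += n
        elif op in ("I", "D"):
            gap_opens += 1  # the gap length is deliberately ignored
    total = matches + mismatches + gap_opens
    return 1.0 - matches / total if total else 0.0
```

For `90=2X10I8=` this counts 98 matches, 2 mismatches and 1 gap, so the 10-base insertion weighs the same as a single mismatch.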
302
+
303
+ Changes to paftools.js:
304
+
305
+ * Added "asmgene" to evaluate the completeness of an assembly by measuring the
306
+ uniquely mapped single-copy genes. This command borrows the idea from BUSCO.
307
+
308
+ * Added "vcfpair" to call a phased VCF from phased whole-genome assemblies. An
309
+ earlier version of this script is used to produce the ground truth for the
310
+ syndip benchmark [PMID:30013044].
311
+
312
+ This release produces identical alignment coordinates and CIGARs in comparison
313
+ to v2.14. Users are advised to upgrade due to the several bug fixes.
314
+
315
+ (2.15: 10 January 2019, r905)
316
+
317
+
318
+
319
+ Release 2.14-r883 (5 November 2018)
320
+ -----------------------------------
321
+
322
+ Notable changes:
323
+
324
+ * Fixed two minor bugs caused by typos (#254 and #266).
325
+
326
+ * Fixed a bug that made minimap2 abort when --eqx was used together with --MD
327
+ or --cs (#257).
328
+
329
+ * Added --cap-sw-mem to cap the size of DP matrices (#259). Base alignment may
330
+ take a lot of memory in the splicing mode. This may lead to issues when we
331
+ run minimap2 on a cluster with a hard memory limit. The new option avoids
332
+ unlimited memory usage at the cost of missing a few long introns.
333
+
334
+ * Conforming to C99 and C11 when possible (#261).
335
+
336
+ * Warn about malformatted FASTA or FASTQ (#252 and #255).
337
+
338
+ This release occasionally produces base alignments different from v2.13. The
339
+ overall alignment accuracy remains similar.
340
+
341
+ (2.14: 5 November 2018, r883)
342
+
343
+
344
+
345
+ Release 2.13-r850 (11 October 2018)
346
+ -----------------------------------
347
+
348
+ Changes to minimap2:
349
+
350
+ * Fixed wrongly formatted SAM when -L is in use (#231 and #233).
351
+
352
+ * Fixed an integer overflow in rare cases.
353
+
354
+ * Added --hard-mask-level to fine control split alignments (#244).
355
+
356
+ * Made --MD work with spliced alignment (#139).
357
+
358
+ * Replaced musl's getopt with ketopt for portability.
359
+
360
+ * Log peak memory usage on exit.
361
+
362
+ This release should produce alignments identical to v2.12 and v2.11.
363
+
364
+ (2.13: 11 October 2018, r850)
365
+
366
+
367
+
368
+ Release 2.12-r827 (6 August 2018)
369
+ ---------------------------------
370
+
371
+ Changes to minimap2:
372
+
373
+ * Added option --split-prefix to write proper alignments (correct mapping
374
+ quality and clustered query sequences) given a multi-part index (#141 and
375
+ #189; mostly by @hasindu2008).
376
+
377
+ * Fixed a memory leak when option -y is in use.
378
+
379
+ Changes to mappy:
380
+
381
+ * Support the MD/cs tag (#183 and #203).
382
+
383
+ * Allow mappy to index a single sequence, to add extra flags and to change the
384
+ scoring system.
385
+
386
+ Minimap2 should produce alignments identical to v2.11.
387
+
388
+ (2.12: 6 August 2018, r827)
389
+
390
+
391
+
392
+ Release 2.11-r797 (20 June 2018)
393
+ --------------------------------
394
+
395
+ Changes to minimap2:
396
+
397
+ * Improved alignment accuracy in low-complexity regions for SV calling. Thanks to
398
+ @armintoepfer for multiple offline examples.
399
+
400
+ * Added option --eqx to encode sequence match/mismatch with the =/X CIGAR
401
+ operators (#156, #157 and #175).
402
+
403
+ * When compiled with VC++, minimap2 generated wrong alignments due to a
404
+ comparison between a signed integer and an unsigned integer (#184). Also
405
+ fixed warnings reported by "clang -Wextra".
406
+
407
+ * Fixed incorrect anchor filtering due to a missing 64- to 32-bit cast.
408
+
409
+ * Fixed incorrect mapping quality for inversions (#148).
410
+
411
+ * Fixed incorrect alignment involving ambiguous bases (#155).
412
+
413
+ * Fixed incorrect presets: option `-r 2000` is intended to be used with
414
+ ava-ont, not ava-pb. The bug was introduced in 2.10.
415
+
416
+ * Fixed a bug when --for-only/--rev-only is used together with --sr or
417
+ --heap-sort=yes (#166).
418
+
419
+ * Fixed option -Y that was not working in the previous releases.
420
+
421
+ * Added option --lj-min-ratio to fine control the alignment of long gaps
422
+ found by the "long-join" heuristic (#128).
423
+
424
+ * Exposed `mm_idx_is_idx`, `mm_idx_load` and `mm_idx_dump` C APIs (#177).
425
+ Also fixed a bug when indexing without reference names (this feature is not
426
+ exposed to the command line).
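The `--eqx` option mentioned above replaces each `M` operator with runs of `=` (match) and `X` (mismatch). The following Python sketch shows the same transformation done after the fact, given the query and reference segments covered by the alignment. It is an illustration under simplifying assumptions (case-insensitive comparison, no special handling of ambiguous bases), and `to_eqx` is a hypothetical helper, not a minimap2 or mappy function:

```python
import re
from itertools import groupby


def to_eqx(cigar, query, ref):
    """Rewrite M operators in a CIGAR as =/X runs.

    `query` and `ref` are the sequences spanned by the alignment,
    both starting at the first aligned (or clipped) base.
    """
    out = []
    qi = ri = 0  # current offsets into query and reference
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "M":
            flags = ["=" if query[qi + k].upper() == ref[ri + k].upper() else "X"
                     for k in range(n)]
            for symbol, run in groupby(flags):
                out.append(f"{len(list(run))}{symbol}")
            qi += n
            ri += n
        else:
            out.append(f"{n}{op}")
            if op in ("I", "S"):        # consumes the query only
                qi += n
            elif op in ("D", "N"):      # consumes the reference only
                ri += n
            elif op in ("=", "X"):      # consumes both
                qi += n
                ri += n
    return "".join(out)
```

For instance, `to_eqx("4M1I3M", "ACGTTACG", "ACGAACG")` yields `3=1X1I3=`.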
427
+
428
+ Changes to mappy:
429
+
430
+ * Added `__version__` (#165).
431
+
432
+ * Exposed the maximum fragment length parameter to mappy (#174).
433
+
434
+ Changes to paftools:
435
+
436
+ * Don't crash when there is no "cg" tag (#153).
437
+
438
+ * Fixed wrong coverage report by "paftools.js call" (#145).
439
+
440
+ This version may produce slightly different base-level alignment. The overall
441
+ alignment statistics should remain similar.
442
+
443
+ (2.11: 20 June 2018, r797)
444
+
445
+
446
+
447
+ Release 2.10-r761 (27 March 2018)
448
+ ---------------------------------
449
+
450
+ Changes to minimap2:
451
+
452
+ * Optionally output the MD tag for compatibility with existing tools (#63,
453
+ #118 and #137).
454
+
455
+ * Use SSE compiler flags more precisely to prevent compiling errors on certain
456
+ machines (#127).
457
+
458
+ * Added option --min-occ-floor to set a minimum occurrence threshold. Presets
459
+ intended for assembly-to-reference alignment set this option to 100. This
460
+ option alleviates issues with regions having high copy numbers (#107).
461
+
462
+ * Exit with non-zero code on file writing errors (e.g. disk full; #103 and
463
+ #132).
464
+
465
+ * Added option -y to copy FASTA/FASTQ comments in query sequences to the
466
+ output (#136).
467
+
468
+ * Added the asm20 preset for alignments between genomes at 5-10% sequence
469
+ divergence.
470
+
471
+ * Changed the band-width in the ava-ont preset from 500 to 2000. Oxford
472
+ Nanopore reads may contain long deletion sequencing errors that break
473
+ chaining.
474
+
475
+ Changes to mappy, the Python binding:
476
+
477
+ * Fixed a typo in Align.seq() (#126).
478
+
479
+ Changes to paftools.js, the companion script:
480
+
481
+ * Command sam2paf now converts the MD tag to cs.
482
+
483
+ * Support VCF output for assembly-to-reference variant calling (#109).
484
+
485
+ This version should produce identical alignment for read overlapping, RNA-seq
486
+ read mapping, and genomic read mapping. We have also added a cookbook to show
487
+ the variety of uses of minimap2 on real datasets. Please see cookbook.md in the
488
+ minimap2 source code directory.
489
+
490
+ (2.10: 27 March 2018, r761)
491
+
492
+
493
+
494
+ Release 2.9-r720 (23 February 2018)
495
+ -----------------------------------
496
+
497
+ This release fixed multiple minor bugs.
498
+
499
+ * Fixed two bugs that lead to incorrect inversion alignment. Also improved the
500
+ sensitivity to small inversions by using double Z-drop cutoff (#112).
501
+
502
+ * Fixed an issue that may leave the end of a query sequence unmapped (#104).
503
+
504
+ * Added a mappy API to retrieve sequences from the index (#126) and to reverse
505
+ complement DNA sequences. Fixed a bug where the `best_n` parameter did not
506
+ work (#117).
507
+
508
+ * Avoided segmentation fault given incorrect FASTQ input (#111).
509
+
510
+ * Combined all auxiliary javascripts to paftools.js. Fixed several bugs in
511
+ these scripts at the same time.
512
+
513
+ (2.9: 24 February 2018, r720)
514
+
515
+
516
+
517
+ Release 2.8-r672 (1 February 2018)
518
+ ----------------------------------
519
+
520
+ Notable changes in this release include:
521
+
522
+ * Speed up short-read alignment by ~10%. The overall mapping accuracy stays
523
+ the same, but the output alignments are not always identical to v2.7 due to
524
+ unstable sorting employed during chaining. Long-read alignment is not
525
+ affected by this change as the speedup is short-read specific.
526
+
527
+ * Mappy now supports paired-end short-read alignment (#87). Please see
528
+ python/README.rst for details.
529
+
530
+ * Added option --for-only and --rev-only to perform alignment against the
531
+ forward or the reverse strand of the reference genome only (#91).
532
+
533
+ * Alleviated the issue with undesired diagonal alignment in the self mapping
534
+ mode (#10). Even if the output is not ideal, it should not interfere with
535
+ other alignments. Fully resolving the issue is intricate and may require
536
+ additional heuristic thresholds.
537
+
538
+ * Enhanced error checking against incorrect input (#92 and #96).
539
+
540
+ For long query sequences, minimap2 should output identical alignments to v2.7.
541
+
542
+ (2.8: 1 February 2018, r672)
543
+
544
+
545
+
546
+ Release 2.7-r654 (9 January 2018)
547
+ ---------------------------------
548
+
549
+ This release fixed a bug in the splice mode and added a few minor features:
550
+
551
+ * Fixed a bug that occasionally takes an intron as a long deletion in the
552
+ splice mode. This was caused by wrong backtracking at the last CIGAR
553
+ operator. The current fix eliminates the error, but it is not optimal in
554
+ that it often produces a wrong junction when the last operator is an intron.
555
+ A future version of minimap2 may improve upon this.
556
+
557
+ * Support high-end ARM CPUs that implement the NEON instruction set (#81).
558
+ This enables minimap2 to work on Raspberry Pi 3 and Odroid XU4.
559
+
560
+ * Added a C API to construct a minimizer index from a set of C strings (#80).
561
+
562
+ * Check scoring specified on the command line (#79). Due to the 8-bit limit,
563
+ excessively large score penalties cause minimap2 to fail.
564
+
565
+ For genomic sequences, minimap2 should give identical alignments to v2.6.
566
+
567
+ (2.7: 9 January 2018, r654)
568
+
569
+
570
+
571
+ Release 2.6-r623 (12 December 2017)
572
+ -----------------------------------
573
+
574
+ This release adds several features and fixes two minor bugs:
575
+
576
+ * Optionally build an index without sequences. This helps to reduce the
577
+ peak memory for read overlapping and is automatically applied when
578
+ base-level alignment is not requested.
579
+
580
+ * Approximately estimate per-base sequence divergence (i.e. 1-identity)
581
+ without performing base-level alignment, using a MashMap-like method. The
582
+ estimate is written to a new dv:f tag.
583
+
584
+ * Reduced the number of tiny terminal exons in RNA-seq alignment. The current
585
+ setting is conservative. Increase --end-seed-pen to drop more such exons.
586
+
587
+ * Reduced the peak memory when aligning long query sequences.
588
+
589
+ * Fixed a bug that is caused by HPC minimizers longer than 256bp. This should
590
+ have no effect in practice, but it is recommended to rebuild HPC indices if
591
+ possible.
592
+
593
+ * Fixed a bug when identifying identical hits (#71). This should only affect
594
+ artifactual references consisting of nearly identical sequences.
595
+
596
+ For genomic sequences, minimap2 should give nearly identical alignments to
597
+ v2.5, except the new dv:f tag.
598
+
599
+ (2.6: 12 December 2017, r623)
600
+
601
+
602
+
603
+ Release 2.5-r572 (11 November 2017)
604
+ -----------------------------------
605
+
606
+ This release fixes several bugs and brings a couple of minor improvements:
607
+
608
+ * Fixed a severe bug that leads to incorrect mapping coordinates in rare
609
+ corner cases.
610
+
611
+ * Fixed underestimated mapping quality for chimeric alignments when the whole
612
+ query sequence contains many repetitive minimizers, and for chimeric
613
+ alignments caused by Z-drop.
614
+
615
+ * Fixed two bugs in Python binding: incorrect strand field (#57) and incorrect
616
+ sequence names for Python3 (#55).
617
+
618
+ * Improved mapping accuracy for highly overlapping paired ends.
619
+
620
+ * Added option -Y to use soft clipping for supplementary alignments (#56).
621
+
622
+ (2.5: 11 November 2017, r572)
623
+
624
+
625
+
626
+ Release 2.4-r555 (6 November 2017)
627
+ ----------------------------------
628
+
629
+ As planned, this release focuses on fine tuning the base algorithm. Notable
630
+ changes include
631
+
632
+ * Changed the mapping quality scale to match the scale of BWA-MEM. This makes
633
+ minimap2 and BWA-MEM achieve similar sensitivity-specificity balance on real
634
+ short-read data.
635
+
636
+ * Improved the accuracy of splice alignment by modeling one additional base
637
+ close to the GT-AG signal. This model is used by default with `-x splice`.
638
+ For SIRV control data, however, it is recommended to add `--splice-flank=no`
639
+ to disable this feature as the SIRV splice signals are slightly different.
640
+
641
+ * Tuned the parameters for Nanopore Direct RNA reads. The recommended command
642
+ line is `-axsplice -k14 -uf` (#46).
643
+
644
+ * Fixed a segmentation fault when aligning PacBio reads (#47 and #48). This
645
+ bug is very rare but it affects all versions of minimap2. It is also
646
+ recommended to re-index reference genomes created with `map-pb`. For human,
647
+ two minimizers in an old index are wrong.
648
+
649
+ * Changed option `-L` in sync with the final decision of hts-specs: a fake
650
+ CIGAR takes the form of `<readLen>S<refLen>N`. Note that `-L` only enables
651
+ future tools to recognize long CIGARs. It is not possible for older tools to
652
+ work with such alignments in BAM (#43 and #51).
653
+
654
+ * Fixed a tiny issue whereby minimap2 may waste 8 bytes per candidate
655
+ alignment.
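The fake CIGAR described above for `-L` exists because BAM can store at most 65535 CIGAR operations per record. A minimal Python sketch of how such a placeholder could be built from a real CIGAR string; `fake_cigar` and its `max_ops` parameter are hypothetical names, and the sketch does not write the real CIGAR to an auxiliary tag as a full implementation would need to:

```python
import re


def fake_cigar(cigar, max_ops=65535):
    """Return `<readLen>S<refLen>N` if the CIGAR has more than max_ops operations.

    Shorter CIGARs are returned unchanged.
    """
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if len(ops) <= max_ops:
        return cigar
    # Operators that consume the read: M, I, S, = and X.
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    # Operators that consume the reference: M, D, N, = and X.
    ref_len = sum(int(n) for n, op in ops if op in "MDN=X")
    return f"{read_len}S{ref_len}N"
```

Lowering `max_ops` makes the behavior easy to see on a short input: `fake_cigar("5M2I3M1D4M", max_ops=3)` returns `14S13N`.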
656
+
657
+ The minimap2 technical note hosted at arXiv has also been updated to reflect
658
+ recent changes.
659
+
660
+ (2.4: 6 November 2017, r555)
661
+
662
+
663
+
664
+ Release 2.3-r531 (22 October 2017)
665
+ ----------------------------------
666
+
667
+ This release comes with many improvements and bug fixes:
668
+
669
+ * The **sr** preset now supports paired-end short-read alignment. Minimap2 is
670
+ 3-4 times as fast as BWA-MEM, but is slightly less accurate on simulated
671
+ reads.
672
+
673
+ * Meticulous improvements to assembly-to-assembly alignment (special thanks to
674
+ Alexey Gurevich from the QUAST team): a) apply a small penalty to matches
675
+ between ambiguous bases; b) reduce missing alignments due to spurious
676
+ overlaps; c) introduce the short form of the `cs` tag, an improvement to the
677
+ SAM MD tag.
678
+
679
+ * Make sure gaps are always left-aligned.
680
+
681
+ * Recognize `U` bases from Oxford Nanopore Direct RNA-seq (#33).
682
+
683
+ * Fixed slightly wrong chaining score. Fixed slightly inaccurate coordinates
684
+ for split alignment.
685
+
686
+ * Fixed multiple reported bugs: 1) wrong reference name for inversion
687
+ alignment (#30); 2) redundant SQ lines when multiple query files are
688
+ specified (#39); 3) non-functioning option `-K` (#36).
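The short form of the `cs` tag introduced above encodes an alignment as a string of operations. The Python sketch below tallies them, assuming the operator syntax documented in the current minimap2 manual (`:N` for N identical bases, `*xy` for a substitution, `+seq` for an insertion, `-seq` for a deletion, `~xxNyy` for an intron); `summarize_cs` is a hypothetical helper written for illustration:

```python
import re

CS_TOKEN = re.compile(
    r":(\d+)"                          # run of identical bases
    r"|\*([a-z])([a-z])"               # substitution: ref base, query base
    r"|\+([a-z]+)"                     # insertion to the reference
    r"|-([a-z]+)"                      # deletion from the reference
    r"|~([a-z]{2})(\d+)([a-z]{2})"     # intron with splice signals
)


def summarize_cs(cs):
    """Count matched, mismatched, inserted, deleted and intronic bases."""
    stats = {"match": 0, "mismatch": 0, "ins": 0, "del": 0, "intron": 0}
    pos = 0
    for m in CS_TOKEN.finditer(cs):
        if m.start() != pos:
            raise ValueError(f"unexpected text at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            stats["match"] += int(m.group(1))
        elif m.group(2) is not None:
            stats["mismatch"] += 1
        elif m.group(4) is not None:
            stats["ins"] += len(m.group(4))
        elif m.group(5) is not None:
            stats["del"] += len(m.group(5))
        else:
            stats["intron"] += int(m.group(7))
    if pos != len(cs):
        raise ValueError(f"unexpected text at offset {pos}")
    return stats
```

For the string `:6-ata:10+gtc:4*at:3` this reports 23 matched bases, 1 mismatch, 3 inserted bases and 3 deleted bases.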
689
+
690
+ This release has implemented all the major features I planned five months ago,
691
+ with the addition of spliced long-read alignment. The next couple of releases
692
+ will focus on fine tuning of the base algorithms.
693
+
694
+ (2.3: 22 October 2017, r531)
695
+
696
+
697
+
698
+ Release 2.2-r409 (17 September 2017)
699
+ ------------------------------------
700
+
701
+ This is a feature release. It improves single-end short-read alignment and
702
+ comes with Python bindings. Detailed changes include:
703
+
704
+ * Added the **sr** preset for single-end short-read alignment. In this mode,
705
+ minimap2 runs faster than BWA-MEM, but is slightly less accurate on
706
+ simulated data sets. Paired-end alignment is not supported as of now.
707
+
708
+ * Improved mapping quality estimate with more accurate identification of
709
+ repetitive hits. This mainly helps short-read alignment.
710
+
711
+ * Implemented **mappy**, a Python binding for minimap2, which is available
712
+ from PyPI and can be installed with `pip install --user mappy`. Python users
713
+ can perform read alignment without the minimap2 executable.
714
+
715
+ * Restructured the indexing APIs and documented key minimap2 APIs in the
716
+ header file minimap.h. Updated example.c with the new APIs. Old APIs still
717
+ work but may become deprecated in the future.
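
As an illustration of aligning reads from Python without the executable, here is a minimal sketch. It assumes the interface described in the current mappy documentation, where `mappy.Aligner(path, preset=...)` builds or loads an index and `Aligner.map(seq)` yields hits with attributes such as `ctg`, `r_st`, `r_en`, `strand` and `mapq`; the interface shipped with this particular release may have differed. The helpers `best_hits` and `load_aligner` are hypothetical names introduced for this example.

```python
def best_hits(aligner, reads):
    """Map each (name, sequence) pair and keep its highest-MAPQ hit.

    `aligner` is any object with a map(seq) method that yields hits
    carrying a `mapq` attribute, which is assumed to match mappy.Aligner.
    Reads without any hit are reported as None.
    """
    result = {}
    for name, seq in reads:
        hits = list(aligner.map(seq))
        result[name] = max(hits, key=lambda h: h.mapq) if hits else None
    return result


def load_aligner(reference, preset="sr"):
    """Build an aligner for `reference` (a FASTA or index path)."""
    import mappy  # imported lazily so best_hits() works without mappy installed

    aligner = mappy.Aligner(reference, preset=preset)
    if not aligner:
        raise RuntimeError("failed to load or build the index: %s" % reference)
    return aligner
```

With mappy installed, `best_hits(load_aligner("ref.fa"), [("read1", "ACGT...")])` would return one hit (or `None`) per read; `ref.fa` is a placeholder path. Passing the aligner in as an argument keeps the selection logic testable with a stand-in object.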
718
+
719
+ This release may output alignments different from the previous version, though
720
+ the overall alignment statistics, such as the number of aligned bases and long
721
+ gaps, remain close.
722
+
723
+ (2.2: 17 September 2017, r409)
724
+
725
+
726
+
727
+ Release 2.1.1-r341 (6 September 2017)
728
+ -------------------------------------
729
+
730
+ This is a maintenance release that is expected to output alignments identical to
731
+ v2.1. Detailed changes include:
732
+
733
+ * Support CPU dispatch. By default, minimap2 is compiled with both SSE2 and
734
+ SSE4 based implementations of alignment and automatically chooses the right
735
+ one at runtime. This avoids unexpected errors on older CPUs (#21).
736
+
737
+ * Improved Windows support as requested by Oxford Nanopore (#19). Minimap2
738
+ now avoids variable-length stacked arrays, eliminates alloca(), ships with
739
+ getopt_long() and provides timing functions implemented with Windows APIs.
740
+
741
+ * Fixed a potential segmentation fault when specifying -k/-w/-H with
742
+ multi-part index (#23).
743
+
744
+ * Fixed two memory leaks in example.c.
745
+
746
+ (2.1.1: 6 September 2017, r341)
747
+
748
+
749
+
750
+ Release 2.1-r311 (25 August 2017)
751
+ ---------------------------------
752
+
753
+ This release adds spliced alignment for long noisy RNA-seq reads. On a SMRT
754
+ Iso-Seq and an Oxford Nanopore data set, minimap2 appears to outperform
755
+ traditional mRNA aligners. For DNA alignment, this release gives almost
756
+ identical output to v2.0. Other changes include:
757
+
758
+ * Added option `-R` to set the read group header line in SAM.
759
+
760
+ * Optionally output the `cs:Z` tag in PAF to encode both the query and the
761
+ reference sequences in the alignment.
762
+
763
+ * Fixed an issue where DP alignment uses excessive memory.
764
+
765
+ The minimap2 technical report has been updated with more details and the
766
+ evaluation of spliced alignment:
767
+
768
+ * Li, H. (2017). Minimap2: fast pairwise alignment for long nucleotide
769
+ sequences. [arXiv:1708.01492v2](https://arxiv.org/abs/1708.01492v2).
770
+
771
+ (2.1: 25 August 2017, r311)
772
+
773
+
774
+
775
+ Release 2.0-r275 (8 August 2017)
776
+ --------------------------------
777
+
778
+ This release is identical to version 2.0rc1, except the version number. It is
779
+ described and evaluated in the following technical report:
780
+
781
+ * Li, H. (2017). Minimap2: fast pairwise alignment for long DNA sequences.
782
+ [arXiv:1708.01492v1](https://arxiv.org/abs/1708.01492v1).
783
+
784
+ (2.0: 8 August 2017, r275)
785
+
786
+
787
+
788
+ Release 2.0rc1-r232 (30 July 2017)
789
+ ----------------------------------
790
+
791
+ This release improves the accuracy of long-read alignment and adds several
792
+ minor features.
793
+
794
+ * Improved mapping quality estimate for short alignments containing few seed
795
+ hits.
796
+
797
+ * Fixed a minor bug that affects the chaining accuracy towards the ends of a
798
+ chain. Changed the gap cost for chaining to reduce false seeding.
799
+
800
+ * Skip potentially wrong seeding and apply dynamic programming more frequently.
801
+ This slightly increases run time, but greatly reduces false long gaps.
802
+
803
+ * Perform local alignment at Z-drop break point to recover potential inversion
804
+ alignment. Output the SA tag in the SAM format. Added scripts to evaluate
805
+ mapping accuracy for reads simulated with pbsim.
806
+
807
+ This release completes features intended for v2.0. No major features will be
808
+ added to the master branch before the final v2.0.
809
+
810
+ (2.0rc1: 30 July 2017, r232)
811
+
812
+
813
+
814
+ Release r191 (19 July 2017)
815
+ ---------------------------
816
+
817
+ This is the first public release of minimap2, an aligner for long reads and
818
+ assemblies. This release has a few issues and is generally not recommended for
819
+ production use.
820
+
821
+ (19 July 2017, r191)