minimap2 0.2.23.0 → 0.2.23.1

Files changed (96)
  1. checksums.yaml +4 -4
  2. data/README.md +60 -76
  3. data/ext/Rakefile +41 -0
  4. data/ext/cmappy/cmappy.c +129 -0
  5. data/ext/cmappy/cmappy.h +44 -0
  6. data/ext/minimap2/FAQ.md +46 -0
  7. data/ext/minimap2/LICENSE.txt +24 -0
  8. data/ext/minimap2/MANIFEST.in +10 -0
  9. data/ext/minimap2/Makefile +132 -0
  10. data/ext/minimap2/Makefile.simde +97 -0
  11. data/ext/minimap2/NEWS.md +807 -0
  12. data/ext/minimap2/README.md +403 -0
  13. data/ext/minimap2/align.c +1020 -0
  14. data/ext/minimap2/bseq.c +169 -0
  15. data/ext/minimap2/bseq.h +64 -0
  16. data/ext/minimap2/code_of_conduct.md +30 -0
  17. data/ext/minimap2/cookbook.md +243 -0
  18. data/ext/minimap2/esterr.c +64 -0
  19. data/ext/minimap2/example.c +63 -0
  20. data/ext/minimap2/format.c +559 -0
  21. data/ext/minimap2/hit.c +466 -0
  22. data/ext/minimap2/index.c +775 -0
  23. data/ext/minimap2/kalloc.c +205 -0
  24. data/ext/minimap2/kalloc.h +76 -0
  25. data/ext/minimap2/kdq.h +132 -0
  26. data/ext/minimap2/ketopt.h +120 -0
  27. data/ext/minimap2/khash.h +615 -0
  28. data/ext/minimap2/krmq.h +474 -0
  29. data/ext/minimap2/kseq.h +256 -0
  30. data/ext/minimap2/ksort.h +153 -0
  31. data/ext/minimap2/ksw2.h +184 -0
  32. data/ext/minimap2/ksw2_dispatch.c +96 -0
  33. data/ext/minimap2/ksw2_extd2_sse.c +402 -0
  34. data/ext/minimap2/ksw2_exts2_sse.c +416 -0
  35. data/ext/minimap2/ksw2_extz2_sse.c +313 -0
  36. data/ext/minimap2/ksw2_ll_sse.c +152 -0
  37. data/ext/minimap2/kthread.c +159 -0
  38. data/ext/minimap2/kthread.h +15 -0
  39. data/ext/minimap2/kvec.h +105 -0
  40. data/ext/minimap2/lchain.c +344 -0
  41. data/ext/minimap2/main.c +455 -0
  42. data/ext/minimap2/map.c +714 -0
  43. data/ext/minimap2/minimap.h +409 -0
  44. data/ext/minimap2/minimap2.1 +722 -0
  45. data/ext/minimap2/misc/README.md +179 -0
  46. data/ext/minimap2/misc/mmphase.js +335 -0
  47. data/ext/minimap2/misc/paftools.js +3149 -0
  48. data/ext/minimap2/misc.c +162 -0
  49. data/ext/minimap2/mmpriv.h +131 -0
  50. data/ext/minimap2/options.c +233 -0
  51. data/ext/minimap2/pe.c +177 -0
  52. data/ext/minimap2/python/README.rst +196 -0
  53. data/ext/minimap2/python/cmappy.h +152 -0
  54. data/ext/minimap2/python/cmappy.pxd +153 -0
  55. data/ext/minimap2/python/mappy.pyx +273 -0
  56. data/ext/minimap2/python/minimap2.py +39 -0
  57. data/ext/minimap2/sdust.c +213 -0
  58. data/ext/minimap2/sdust.h +25 -0
  59. data/ext/minimap2/seed.c +131 -0
  60. data/ext/minimap2/setup.py +55 -0
  61. data/ext/minimap2/sketch.c +143 -0
  62. data/ext/minimap2/splitidx.c +84 -0
  63. data/ext/minimap2/sse2neon/emmintrin.h +1689 -0
  64. data/ext/minimap2/test/MT-human.fa +278 -0
  65. data/ext/minimap2/test/MT-orang.fa +276 -0
  66. data/ext/minimap2/test/q-inv.fa +4 -0
  67. data/ext/minimap2/test/q2.fa +2 -0
  68. data/ext/minimap2/test/t-inv.fa +127 -0
  69. data/ext/minimap2/test/t2.fa +2 -0
  70. data/ext/minimap2/tex/Makefile +21 -0
  71. data/ext/minimap2/tex/bioinfo.cls +930 -0
  72. data/ext/minimap2/tex/blasr-mc.eval +17 -0
  73. data/ext/minimap2/tex/bowtie2-s3.sam.eval +28 -0
  74. data/ext/minimap2/tex/bwa-s3.sam.eval +52 -0
  75. data/ext/minimap2/tex/bwa.eval +55 -0
  76. data/ext/minimap2/tex/eval2roc.pl +33 -0
  77. data/ext/minimap2/tex/graphmap.eval +4 -0
  78. data/ext/minimap2/tex/hs38-simu.sh +10 -0
  79. data/ext/minimap2/tex/minialign.eval +49 -0
  80. data/ext/minimap2/tex/minimap2.bib +460 -0
  81. data/ext/minimap2/tex/minimap2.tex +724 -0
  82. data/ext/minimap2/tex/mm2-s3.sam.eval +62 -0
  83. data/ext/minimap2/tex/mm2-update.tex +240 -0
  84. data/ext/minimap2/tex/mm2.approx.eval +12 -0
  85. data/ext/minimap2/tex/mm2.eval +13 -0
  86. data/ext/minimap2/tex/natbib.bst +1288 -0
  87. data/ext/minimap2/tex/natbib.sty +803 -0
  88. data/ext/minimap2/tex/ngmlr.eval +38 -0
  89. data/ext/minimap2/tex/roc.gp +60 -0
  90. data/ext/minimap2/tex/snap-s3.sam.eval +62 -0
  91. data/ext/minimap2.patch +19 -0
  92. data/{vendor → ext/vendor}/libminimap2.so +0 -0
  93. data/lib/minimap2/ffi/functions.rb +5 -0
  94. data/lib/minimap2/version.rb +1 -1
  95. data/lib/minimap2.rb +32 -0
  96. metadata +94 -4
@@ -0,0 +1,807 @@
1
+ Release 2.23-r1111 (18 November 2021)
2
+ -------------------------------------
3
+
4
+ Notable changes:
5
+
6
+ * Bugfix: fixed missing alignments around long inversions (#806 and #816).
7
+ This bug affected v2.19 through v2.22.
8
+
9
+ * Improvement: avoid extremely long mapping time for pathologic reads with
10
+ highly repeated k-mers not in the reference (#771). Use --q-occ-frac=0
11
+ to disable the new heuristic.
12
+
13
+ * Change: use --cap-kalloc=1g by default.
14
+
15
+ (2.23: 18 November 2021, r1111)
16
+
17
+
18
+
19
+ Release 2.22-r1101 (7 August 2021)
20
+ ----------------------------------
21
+
22
+ When choosing the best alignment, this release uses logarithm gap penalty and
23
+ query-specific mismatch penalty. It improves the sensitivity to long INDELs in
24
+ repetitive regions.
25
+
26
+ Other notable changes:
27
+
28
+ * Bugfix: fixed an indirect memory leak that may waste a large amount of
29
+ memory given highly repetitive reference such as a 16S RNA database (#749).
30
+ All versions of minimap2 have this issue.
31
+
32
+ * New feature: added --cap-kalloc to reduce the peak memory. This option is
33
+ not enabled by default but may become the default in future releases.
34
+
35
+ Known issue:
36
+
37
+ * Minimap2 may take a long time to map a read (#771). So far it is not clear
38
+ if this happens to v2.18 and earlier versions.
39
+
40
+ (2.22: 7 August 2021, r1101)
41
+
42
+
43
+
44
+ Release 2.21-r1071 (6 July 2021)
45
+ --------------------------------
46
+
47
+ This release fixed a regression in short-read mapping introduced in v2.19
48
+ (#776). It also fixed invalid comparisons of uninitialized variables, though
49
+ these are harmless (#752). Long-read alignment should be identical to v2.20.
50
+
51
+ (2.21: 6 July 2021, r1071)
52
+
53
+
54
+
55
+ Release 2.20-r1061 (27 May 2021)
56
+ --------------------------------
57
+
58
+ This release fixed a bug in the Python module and improves the command-line
59
+ compatibility with v2.18. In v2.19, if `-r` is specified with an `asm*` preset,
60
+ users would get alignments more fragmented than v2.18. This could be an issue
61
+ for existing pipelines specifying `-r`. This release resolves this issue.
62
+
63
+ (2.20: 27 May 2021, r1061)
64
+
65
+
66
+
67
+ Release 2.19-r1057 (26 May 2021)
68
+ --------------------------------
69
+
70
+ This release includes a few important improvements backported from unimap:
71
+
72
+ * Improvement: more contiguous alignment through long INDELs. This is enabled
73
+ by the minigraph chaining algorithm. All `asm*` presets now use the new
74
+ algorithm. They can find INDELs up to 100kb and may be faster for
75
+ chromosome-long contigs. The default mode and `map*` presets use this
76
+ algorithm to replace the long-join heuristic.
77
+
78
+ * Improvement: better alignment in highly repetitive regions by rescuing
79
+ high-occurrence seeds. If the distance between two adjacent seeds is too
80
+ large, attempt to choose a fraction of high-occurrence seeds in-between.
81
+ Minimap2 now produces fewer clippings and alignment break points in long
82
+ satellite regions.
83
+
84
+ * Improvement: allow specifying an interval of k-mer occurrences with `-U`.
85
+ For repeat-rich genomes, the automatic k-mer occurrence threshold determined
86
+ by `-f` may be too large and makes alignment impractically slow. The new
87
+ option protects against such cases. Enabled for `asm*` and `map-hifi`.
88
+
89
+ * New feature: added the `map-hifi` preset for mapping PacBio High-Fidelity
90
+ (HiFi) reads.
91
+
92
+ * Change to the default: apply `--cap-sw-mem=100m` for genomic alignment.
93
+
94
+ * Bugfix: minimap2 could not generate an index file with `-xsr` (#734).
95
+
96
+ This release represents the most significant algorithmic change since v2.1 in
97
+ 2017. With features backported from unimap, minimap2 now has similar power to
98
+ unimap for contig alignment. Unimap will remain an experimental project and is
99
+ no longer recommended over minimap2. Sorry for reversing the recommendation so
100
+ soon.
101
+
102
+ (2.19: 26 May 2021, r1057)
103
+
104
+
105
+
106
+ Release 2.18-r1015 (9 April 2021)
107
+ ---------------------------------
108
+
109
+ This release fixes multiple rare bugs in minimap2 and adds additional
110
+ functionality to paftools.js.
111
+
112
+ Changes to minimap2:
113
+
114
+ * Bugfix: a rare segfault caused by an off-by-one error (#489)
115
+
116
+ * Bugfix: minimap2 segfaulted due to an uninitialized variable (#622 and #625).
117
+
118
+ * Bugfix: minimap2 parsed spaces as field separators in BED (#721). This led
119
+ to issues when the BED name column contains spaces.
120
+
121
+ * Bugfix: minimap2 `--split-prefix` did not work with long reference names
122
+ (#394).
123
+
124
+ * Bugfix: option `--junc-bonus` didn't work (#513)
125
+
126
+ * Bugfix: minimap2 didn't return 1 on I/O errors (#532)
127
+
128
+ * Bugfix: the `de:f` tag (sequence divergence) could be negative if there were
129
+ ambiguous bases
130
+
131
+ * Bugfix: fixed two undefined behaviors caused by calling memcpy() on
132
+ zero-length blocks (#443)
133
+
134
+ * Bugfix: there were duplicated SAM @SQ lines if option `--split-prefix` is in
135
+ use (#400 and #527)
136
+
137
+ * Bugfix: option -K had to be smaller than 2 billion (#491). This was caused
138
+ by a 32-bit integer overflow.
139
+
140
+ * Improvement: optionally compile against SIMDe (#597). Minimap2 should work
141
+ with IBM POWER CPUs, though this has not been tested. To compile with SIMDe,
142
+ please use `make -f Makefile.simde`.
143
+
144
+ * Improvement: more informative error message for I/O errors (#454) and for
145
+ FASTQ parsing errors (#510)
146
+
147
+ * Improvement: abort given malformatted RG line (#541)
148
+
149
+ * Improvement: better formula to estimate the `dv:f` tag (approximate sequence
150
+ divergence). See DOI:10.1101/2021.01.15.426881.
151
+
152
+ * New feature: added the `--mask-len` option to fine control the removal of
153
+ redundant hits (#659). The default behavior is unchanged.
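The `-K` fix in the list above (#491) is a classic 32-bit overflow. The sketch below is an illustration only, not minimap2's parser: `parse_size` is a hypothetical helper that assumes decimal `k`/`m`/`g` suffixes, and `as_int32`/`as_int64` emulate storing the parsed value in a signed 32-bit or 64-bit field.

```python
import ctypes

# Assumed decimal multipliers for size suffixes (hypothetical helper, not minimap2 code).
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_size(text):
    """Parse strings such as '500M' or '3g' into a plain integer."""
    t = text.strip().lower()
    mult = 1
    if t and t[-1] in SUFFIX:
        mult = SUFFIX[t[-1]]
        t = t[:-1]
    return int(float(t) * mult)


def as_int32(value):
    """Emulate storing the value in a signed 32-bit integer (wraps silently)."""
    return ctypes.c_int32(value).value


def as_int64(value):
    """Emulate storing the value in a signed 64-bit integer."""
    return ctypes.c_int64(value).value


if __name__ == "__main__":
    for arg in ("500M", "3g"):
        n = parse_size(arg)
        print(arg, "int32:", as_int32(n), "int64:", as_int64(n))
```

Any value above 2,147,483,647 wraps to a negative number in the 32-bit case, which matches the "smaller than 2 billion" limit described in the changelog entry.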
154
+
155
+ Changes to mappy:
156
+
157
+ * Bugfix: mappy caused segmentation fault if the reference index is not
158
+ present (#413).
159
+
160
+ * Bugfix: fixed a memory leak via 238b6bb3
161
+
162
+ * Change: always require Cython to compile the mappy module (#723). Older
163
+ mappy packages at PyPI bundled the C source code generated by Cython such
164
+ that end users did not need to install Cython to compile mappy. However, as
165
+ Python 3.9 is breaking backward compatibility, older mappy does not work
166
+ with Python 3.9 anymore. We have to add this Cython dependency as a
167
+ workaround.
168
+
169
+ Changes to paftools.js:
170
+
171
+ * Bugfix: the "part10-" line from asmgene was wrong (#581)
172
+
173
+ * Improvement: compatibility with GTF files from GenBank (#422)
174
+
175
+ * New feature: asmgene also checks missing multi-copy genes
176
+
177
+ * New feature: added the misjoin command to evaluate large-scale misjoins and
178
+ megabase-long inversions.
179
+
180
+ Despite the many bug fixes and minor improvements, the core algorithm
181
+ stays the same. This version of minimap2 produces nearly identical alignments
182
+ to v2.17 except in very rare corner cases.
183
+
184
+ Now unimap is recommended over minimap2 for aligning long contigs against a
185
+ reference genome. It often takes less wall-clock time and is much more
186
+ sensitive to long insertions and deletions.
187
+
188
+ (2.18: 9 April 2021, r1015)
189
+
190
+
191
+
192
+ Release 2.17-r941 (4 May 2019)
193
+ ------------------------------
194
+
195
+ Changes since the last release:
196
+
197
+ * Fixed flawed CIGARs like `5I6D7I` (#392).
198
+
199
+ * Bugfix: TLEN should be 0 when either end is unmapped (#373 and #365).
200
+
201
+ * Bugfix: mappy is unable to write index (#372).
202
+
203
+ * Added option `--junc-bed` to load known gene annotations in the BED12
204
+ format. Minimap2 prefers annotated junctions over novel junctions (#197 and
205
+ #348). GTF can be converted to BED12 with `paftools.js gff2bed`.
206
+
207
+ * Added option `--sam-hit-only` to suppress unmapped hits in SAM (#377).
208
+
209
+ * Added preset `splice:hq` for high-quality CCS or mRNA sequences. It applies
210
+ better scoring and improves the sensitivity to small exons. This preset may
211
+ introduce false small introns, but the overall accuracy should be higher.
212
+
213
+ This version produces nearly identical alignments to v2.16, except for CIGARs
214
+ affected by the bug mentioned above.
215
+
216
+ (2.17: 5 May 2019, r941)
217
+
218
+
219
+
220
+ Release 2.16-r922 (28 February 2019)
221
+ ------------------------------------
222
+
223
+ This release is 50% faster for mapping ultra-long nanopore reads at comparable
224
+ accuracy. For short-read mapping, long-read overlapping and ordinary long-read
225
+ mapping, the performance and accuracy remain similar. This speedup is achieved
226
+ with a new heuristic to limit the number of chaining iterations (#324). Users
227
+ can disable the heuristic by increasing a new option `--max-chain-iter` to a
228
+ huge number.
229
+
230
+ Other changes to minimap2:
231
+
232
+ * Implemented option `--paf-no-hit` to output unmapped query sequences in PAF.
233
+ The strand and reference name columns are both `*` at an unmapped line. The
234
+ hidden option is available in earlier minimap2 but had a different 2-column
235
+ output format instead of PAF.
236
+
237
+ * Fixed a bug that leads to wrongly calculated `de` tags when ambiguous bases
238
+ are involved (#309). This bug only affects v2.15.
239
+
240
+ * Fixed a bug when parsing command-line option `--splice` (#344). This bug was
241
+ introduced in v2.13.
242
+
243
+ * Fixed two division-by-zero cases (#326). They don't affect final alignments
244
+ because the results of the divisions are not used in either case.
245
+
246
+ * Added an option `-o` to output alignments to a specified file. It is still
247
+ recommended to use UNIX pipes for on-the-fly conversion or compression.
248
+
249
+ * Output a new `rl` tag to give the length of query regions harboring
250
+ repetitive seeds.
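The `--paf-no-hit` entry above says an unmapped query has `*` in both the strand and the reference name columns. A minimal sketch of how a downstream script might separate such lines, assuming the standard PAF column order (strand is column 5, target name is column 6); the two sample lines are made up for illustration.

```python
def is_unmapped_paf(line):
    """Return True if a PAF line has '*' in both the strand and target-name columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated PAF columns")
    return fields[4] == "*" and fields[5] == "*"


def split_paf(lines):
    """Split PAF lines into (mapped query names, unmapped query names)."""
    mapped, unmapped = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t", 1)[0]
        (unmapped if is_unmapped_paf(line) else mapped).append(name)
    return mapped, unmapped


if __name__ == "__main__":
    sample = [
        "read1\t1000\t10\t990\t+\tchr1\t5000\t100\t1080\t950\t980\t60",
        "read2\t800\t0\t0\t*\t*\t0\t0\t0\t0\t0\t0",
    ]
    print(split_paf(sample))
```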
251
+
252
+ Changes to paftools.js:
253
+
254
+ * Added a new option to convert the MD tag to the long form of the cs tag.
255
+
256
+ Changes to mappy:
257
+
258
+ * Added the `mappy.Aligner.seq_names` method to return sequence names (#312).
259
+
260
+ For NA12878 ultra-long reads, this release changes the alignments of <0.1% of
261
+ reads in comparison to v2.15. All these reads have highly fragmented alignments
262
+ and are likely to be problematic anyway. For shorter or well aligned reads,
263
+ this release should produce mostly identical alignments to v2.15.
264
+
265
+ (2.16: 28 February 2019, r922)
266
+
267
+
268
+
269
+ Release 2.15-r905 (10 January 2019)
270
+ -----------------------------------
271
+
272
+ Changes to minimap2:
273
+
274
+ * Fixed a rare segmentation fault when option -H is in use (#307). This may
275
+ happen when there are very long homopolymers towards the 5'-end of a read.
276
+
277
+ * Fixed wrong CIGARs when option --eqx is used (#266).
278
+
279
+ * Fixed a typo in the base encoding table (#264). This should have no
280
+ practical effect.
281
+
282
+ * Fixed a typo in the example code (#265).
283
+
284
+ * Improved the C++ compatibility by removing "register" (#261). However,
285
+ minimap2 still can't be compiled in the pedantic C++ mode (#306).
286
+
287
+ * Output a new "de" tag for gap-compressed sequence divergence.
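To make "gap-compressed" concrete: each run of inserted or deleted bases counts as one event, however long it is. The sketch below computes such a divergence from a CIGAR that uses `=`/`X` operators. It is an illustration under that definition, ignores ambiguous bases, and is not claimed to reproduce minimap2's `de` value exactly; the function name is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_compressed_divergence(eqx_cigar):
    """Divergence where every insertion/deletion run counts as a single event."""
    matches = mismatches = gap_opens = 0
    for length, op in CIGAR_RE.findall(eqx_cigar):
        n = int(length)
        if op == "=":
            matches += n
        elif op == "X":
            mismatches += n
        elif op in "ID":
            gap_opens += 1  # one event per gap, regardless of its length
        elif op == "M":
            raise ValueError("need =/X operators to tell matches from mismatches")
        # N, S, H and P do not contribute to the divergence
    events = matches + mismatches + gap_opens
    if events == 0:
        raise ValueError("CIGAR contains no aligned bases")
    return 1.0 - matches / events


if __name__ == "__main__":
    # 98 matches, 2 mismatches, one 10-base deletion -> 3 events out of 101
    print(gap_compressed_divergence("90=2X10D8="))
```

Counting the 10-base deletion base by base would instead give 12 differences over 110 alignment columns, so a single long gap no longer dominates the estimate.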
288
+
289
+ Changes to paftools.js:
290
+
291
+ * Added "asmgene" to evaluate the completeness of an assembly by measuring the
292
+ uniquely mapped single-copy genes. This command borrows the idea from BUSCO.
293
+
294
+ * Added "vcfpair" to call a phased VCF from phased whole-genome assemblies. An
295
+ earlier version of this script is used to produce the ground truth for the
296
+ syndip benchmark [PMID:30013044].
297
+
298
+ This release produces identical alignment coordinates and CIGARs in comparison
299
+ to v2.14. Users are advised to upgrade due to the several bug fixes.
300
+
301
+ (2.15: 10 January 2019, r905)
302
+
303
+
304
+
305
+ Release 2.14-r883 (5 November 2018)
306
+ -----------------------------------
307
+
308
+ Notable changes:
309
+
310
+ * Fixed two minor bugs caused by typos (#254 and #266).
311
+
312
+ * Fixed a bug that made minimap2 abort when --eqx was used together with --MD
313
+ or --cs (#257).
314
+
315
+ * Added --cap-sw-mem to cap the size of DP matrices (#259). Base alignment may
316
+ take a lot of memory in the splicing mode. This may lead to issues when we
317
+ run minimap2 on a cluster with a hard memory limit. The new option avoids
318
+ unlimited memory usage at the cost of missing a few long introns.
319
+
320
+ * Conforming to C99 and C11 when possible (#261).
321
+
322
+ * Warn about malformatted FASTA or FASTQ (#252 and #255).
323
+
324
+ This release occasionally produces base alignments different from v2.13. The
325
+ overall alignment accuracy remains similar.
326
+
327
+ (2.14: 5 November 2018, r883)
328
+
329
+
330
+
331
+ Release 2.13-r850 (11 October 2018)
332
+ -----------------------------------
333
+
334
+ Changes to minimap2:
335
+
336
+ * Fixed wrongly formatted SAM when -L is in use (#231 and #233).
337
+
338
+ * Fixed an integer overflow in rare cases.
339
+
340
+ * Added --hard-mask-level to fine control split alignments (#244).
341
+
342
+ * Made --MD work with spliced alignment (#139).
343
+
344
+ * Replaced musl's getopt with ketopt for portability.
345
+
346
+ * Log peak memory usage on exit.
347
+
348
+ This release should produce alignments identical to v2.12 and v2.11.
349
+
350
+ (2.13: 11 October 2018, r850)
351
+
352
+
353
+
354
+ Release 2.12-r827 (6 August 2018)
355
+ ---------------------------------
356
+
357
+ Changes to minimap2:
358
+
359
+ * Added option --split-prefix to write proper alignments (correct mapping
360
+ quality and clustered query sequences) given a multi-part index (#141 and
361
+ #189; mostly by @hasindu2008).
362
+
363
+ * Fixed a memory leak when option -y is in use.
364
+
365
+ Changes to mappy:
366
+
367
+ * Support the MD/cs tag (#183 and #203).
368
+
369
+ * Allow mappy to index a single sequence, to add extra flags and to change the
370
+ scoring system.
371
+
372
+ Minimap2 should produce alignments identical to v2.11.
373
+
374
+ (2.12: 6 August 2018, r827)
375
+
376
+
377
+
378
+ Release 2.11-r797 (20 June 2018)
379
+ --------------------------------
380
+
381
+ Changes to minimap2:
382
+
383
+ * Improved alignment accuracy in low-complexity regions for SV calling. Thank
384
+ @armintoepfer for multiple offline examples.
385
+
386
+ * Added option --eqx to encode sequence match/mismatch with the =/X CIGAR
387
+ operators (#156, #157 and #175).
388
+
389
+ * When compiled with VC++, minimap2 generated wrong alignments due to a
390
+ comparison between a signed integer and an unsigned integer (#184). Also
391
+ fixed warnings reported by "clang -Wextra".
392
+
393
+ * Fixed incorrect anchor filtering due to a missing 64- to 32-bit cast.
394
+
395
+ * Fixed incorrect mapping quality for inversions (#148).
396
+
397
+ * Fixed incorrect alignment involving ambiguous bases (#155).
398
+
399
+ * Fixed incorrect presets: option `-r 2000` is intended to be used with
400
+ ava-ont, not ava-pb. The bug was introduced in 2.10.
401
+
402
+ * Fixed a bug when --for-only/--rev-only is used together with --sr or
403
+ --heap-sort=yes (#166).
404
+
405
+ * Fixed option -Y that was not working in the previous releases.
406
+
407
+ * Added option --lj-min-ratio to fine control the alignment of long gaps
408
+ found by the "long-join" heuristic (#128).
409
+
410
+ * Exposed `mm_idx_is_idx`, `mm_idx_load` and `mm_idx_dump` C APIs (#177).
411
+ Also fixed a bug when indexing without reference names (this feature is not
412
+ exposed to the command line).
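The `--eqx` option listed above replaces `M` in a CIGAR with `=` (sequence match) and `X` (mismatch). The sketch below shows the idea on plain strings. It is not minimap2's implementation: `cigar_to_eqx` is a hypothetical helper that assumes `query` begins at the first base consumed by the CIGAR, `ref` begins at the first aligned reference base, and bases are compared case-insensitively with no special handling of ambiguous bases.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_eqx(cigar, query, ref):
    """Rewrite M operators as runs of '=' (match) and 'X' (mismatch)."""
    out = []  # list of [length, op], adjacent identical ops are merged
    qi = ri = 0  # current offsets into query and ref

    def push(n, op):
        if out and out[-1][1] == op:
            out[-1][0] += n
        else:
            out.append([n, op])

    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "M":
            for k in range(n):
                same = query[qi + k].upper() == ref[ri + k].upper()
                push(1, "=" if same else "X")
            qi += n
            ri += n
        else:
            push(n, op)
            if op in "IS":      # consume query only
                qi += n
            elif op in "DN":    # consume reference only
                ri += n
            elif op in "=X":    # consume both
                qi += n
                ri += n
    return "".join(f"{n}{op}" for n, op in out)


if __name__ == "__main__":
    print(cigar_to_eqx("8M", "ACGTACGT", "ACGAACGT"))
    print(cigar_to_eqx("3M1I2M", "ACGTAC", "ACGAT"))
```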
413
+
414
+ Changes to mappy:
415
+
416
+ * Added `__version__` (#165).
417
+
418
+ * Exposed the maximum fragment length parameter to mappy (#174).
419
+
420
+ Changes to paftools:
421
+
422
+ * Don't crash when there is no "cg" tag (#153).
423
+
424
+ * Fixed wrong coverage report by "paftools.js call" (#145).
425
+
426
+ This version may produce slightly different base-level alignment. The overall
427
+ alignment statistics should remain similar.
428
+
429
+ (2.11: 20 June 2018, r797)
430
+
431
+
432
+
433
+ Release 2.10-r761 (27 March 2018)
434
+ ---------------------------------
435
+
436
+ Changes to minimap2:
437
+
438
+ * Optionally output the MD tag for compatibility with existing tools (#63,
439
+ #118 and #137).
440
+
441
+ * Use SSE compiler flags more precisely to prevent compiling errors on certain
442
+ machines (#127).
443
+
444
+ * Added option --min-occ-floor to set a minimum occurrence threshold. Presets
445
+ intended for assembly-to-reference alignment set this option to 100. This
446
+ option alleviates issues with regions having high copy numbers (#107).
447
+
448
+ * Exit with non-zero code on file writing errors (e.g. disk full; #103 and
449
+ #132).
450
+
451
+ * Added option -y to copy FASTA/FASTQ comments in query sequences to the
452
+ output (#136).
453
+
454
+ * Added the asm20 preset for alignments between genomes at 5-10% sequence
455
+ divergence.
456
+
457
+ * Changed the band-width in the ava-ont preset from 500 to 2000. Oxford
458
+ Nanopore reads may contain long deletion sequencing errors that break
459
+ chaining.
460
+
461
+ Changes to mappy, the Python binding:
462
+
463
+ * Fixed a typo in Align.seq() (#126).
464
+
465
+ Changes to paftools.js, the companion script:
466
+
467
+ * Command sam2paf now converts the MD tag to cs.
468
+
469
+ * Support VCF output for assembly-to-reference variant calling (#109).
470
+
471
+ This version should produce identical alignment for read overlapping, RNA-seq
472
+ read mapping, and genomic read mapping. We have also added a cookbook to show
473
+ the variety of uses of minimap2 on real datasets. Please see cookbook.md in the
474
+ minimap2 source code directory.
475
+
476
+ (2.10: 27 March 2018, r761)
477
+
478
+
479
+
480
+ Release 2.9-r720 (23 February 2018)
481
+ -----------------------------------
482
+
483
+ This release fixed multiple minor bugs.
484
+
485
+ * Fixed two bugs that lead to incorrect inversion alignment. Also improved the
486
+ sensitivity to small inversions by using double Z-drop cutoff (#112).
487
+
488
+ * Fixed an issue that may leave the end of a query sequence unmapped (#104).
489
+
490
+ * Added a mappy API to retrieve sequences from the index (#126) and to reverse
491
+ complement DNA sequences. Fixed a bug where the `best_n` parameter did not
492
+ work (#117).
493
+
494
+ * Avoided segmentation fault given incorrect FASTQ input (#111).
495
+
496
+ * Combined all auxiliary javascripts to paftools.js. Fixed several bugs in
497
+ these scripts at the same time.
498
+
499
+ (2.9: 24 February 2018, r720)
500
+
501
+
502
+
503
+ Release 2.8-r672 (1 February 2018)
504
+ ----------------------------------
505
+
506
+ Notable changes in this release include:
507
+
508
+ * Speed up short-read alignment by ~10%. The overall mapping accuracy stays
509
+ the same, but the output alignments are not always identical to v2.7 due to
510
+ unstable sorting employed during chaining. Long-read alignment is not
511
+ affected by this change as the speedup is short-read specific.
512
+
513
+ * Mappy now supports paired-end short-read alignment (#87). Please see
514
+ python/README.rst for details.
515
+
516
+ * Added option --for-only and --rev-only to perform alignment against the
517
+ forward or the reverse strand of the reference genome only (#91).
518
+
519
+ * Alleviated the issue with undesired diagonal alignment in the self mapping
520
+ mode (#10). Even if the output is not ideal, it should not interfere with
521
+ other alignments. Fully resolving the issue is intricate and may require
522
+ additional heuristic thresholds.
523
+
524
+ * Enhanced error checking against incorrect input (#92 and #96).
525
+
526
+ For long query sequences, minimap2 should output identical alignments to v2.7.
527
+
528
+ (2.8: 1 February 2018, r672)
529
+
530
+
531
+
532
+ Release 2.7-r654 (9 January 2018)
533
+ ---------------------------------
534
+
535
+ This release fixed a bug in the splice mode and added a few minor features:
536
+
537
+ * Fixed a bug that occasionally takes an intron as a long deletion in the
538
+ splice mode. This was caused by wrong backtracking at the last CIGAR
539
+ operator. The current fix eliminates the error, but it is not optimal in
540
+ that it often produces a wrong junction when the last operator is an intron.
541
+ A future version of minimap2 may improve upon this.
542
+
543
+ * Support high-end ARM CPUs that implement the NEON instruction set (#81).
544
+ This enables minimap2 to work on Raspberry Pi 3 and Odroid XU4.
545
+
546
+ * Added a C API to construct a minimizer index from a set of C strings (#80).
547
+
548
+ * Check scoring specified on the command line (#79). Due to the 8-bit limit,
549
+ excessively large score penalties cause minimap2 to fail.
550
+
551
+ For genomic sequences, minimap2 should give identical alignments to v2.6.
552
+
553
+ (2.7: 9 January 2018, r654)
554
+
555
+
556
+
557
+ Release 2.6-r623 (12 December 2017)
558
+ -----------------------------------
559
+
560
+ This release adds several features and fixes two minor bugs:
561
+
562
+ * Optionally build an index without sequences. This helps to reduce the
563
+ peak memory for read overlapping and is automatically applied when
564
+ base-level alignment is not requested.
565
+
566
+ * Approximately estimate per-base sequence divergence (i.e. 1-identity)
567
+ without performing base-level alignment, using a MashMap-like method. The
568
+ estimate is written to a new dv:f tag.
569
+
570
+ * Reduced the number of tiny terminal exons in RNA-seq alignment. The current
571
+ setting is conservative. Increase --end-seed-pen to drop more such exons.
572
+
573
+ * Reduced the peak memory when aligning long query sequences.
574
+
575
+ * Fixed a bug that is caused by HPC minimizers longer than 256bp. This should
576
+ have no effect in practice, but it is recommended to rebuild HPC indices if
577
+ possible.
578
+
579
+ * Fixed a bug when identifying identical hits (#71). This should only affect
580
+ artifactual reference consisting of near identical sequences.
581
+
582
+ For genomic sequences, minimap2 should give nearly identical alignments to
583
+ v2.5, except the new dv:f tag.
584
+
585
+ (2.6: 12 December 2017, r623)
586
+
587
+
588
+
589
+ Release 2.5-r572 (11 November 2017)
590
+ -----------------------------------
591
+
592
+ This release fixes several bugs and brings a couple of minor improvements:
593
+
594
+ * Fixed a severe bug that leads to incorrect mapping coordinates in rare
595
+ corner cases.
596
+
597
+ * Fixed underestimated mapping quality for chimeric alignments when the whole
598
+ query sequence contains many repetitive minimizers, and for chimeric
599
+ alignments caused by Z-drop.
600
+
601
+ * Fixed two bugs in Python binding: incorrect strand field (#57) and incorrect
602
+ sequence names for Python3 (#55).
603
+
604
+ * Improved mapping accuracy for highly overlapping paired ends.
605
+
606
+ * Added option -Y to use soft clipping for supplementary alignments (#56).
607
+
608
+ (2.5: 11 November 2017, r572)
609
+
610
+
611
+
612
+ Release 2.4-r555 (6 November 2017)
613
+ ----------------------------------
614
+
615
+ As is planned, this release focuses on fine tuning the base algorithm. Notable
616
+ changes include
617
+
618
+ * Changed the mapping quality scale to match the scale of BWA-MEM. This makes
619
+ minimap2 and BWA-MEM achieve similar sensitivity-specificity balance on real
620
+ short-read data.
621
+
622
+ * Improved the accuracy of splice alignment by modeling one additional base
623
+ close to the GT-AG signal. This model is used by default with `-x splice`.
624
+ For SIRV control data, however, it is recommended to add `--splice-flank=no`
625
+ to disable this feature as the SIRV splice signals are slightly different.
626
+
627
+ * Tuned the parameters for Nanopore Direct RNA reads. The recommended command
628
+ line is `-axsplice -k14 -uf` (#46).
629
+
630
+ * Fixed a segmentation fault when aligning PacBio reads (#47 and #48). This
631
+ bug is very rare but it affects all versions of minimap2. It is also
632
+ recommended to re-index reference genomes created with `map-pb`. For human,
633
+ two minimizers in an old index are wrong.
634
+
635
+ * Changed option `-L` in sync with the final decision of hts-specs: a fake
636
+ CIGAR takes the form of `<readLen>S<refLen>N`. Note that `-L` only enables
637
+ future tools to recognize long CIGARs. It is not possible for older tools to
638
+ work with such alignments in BAM (#43 and #51).
639
+
640
+ * Fixed a tiny issue whereby minimap2 may waste 8 bytes per candidate
641
+ alignment.
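For the `-L` entry above, the fake CIGAR has the form `<readLen>S<refLen>N`. The sketch below derives it from a real CIGAR string. `fake_cigar` is a hypothetical helper, not minimap2 code, and it assumes the usual SAM conventions for which operators consume query bases and which consume reference bases.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

QUERY_OPS = set("MIS=X")  # operators that consume query bases
REF_OPS = set("MDN=X")    # operators that consume reference bases


def fake_cigar(real_cigar):
    """Return '<readLen>S<refLen>N' summarising a real CIGAR string."""
    read_len = ref_len = 0
    for length, op in CIGAR_RE.findall(real_cigar):
        n = int(length)
        if op in QUERY_OPS:
            read_len += n
        if op in REF_OPS:
            ref_len += n
    return f"{read_len}S{ref_len}N"


if __name__ == "__main__":
    # 10 + 100 + 5 + 50 query bases, 100 + 20 + 50 reference bases
    print(fake_cigar("10S100M5I20D50M"))
```

The placeholder keeps the read length and the reference span intact, so tools that only need coordinates can still work, while the real operations have to be stored elsewhere in the record.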
642
+
643
+ The minimap2 technical note hosted at arXiv has also been updated to reflect
644
+ recent changes.
645
+
646
+ (2.4: 6 November 2017, r555)
647
+
648
+
649
+
650
+ Release 2.3-r531 (22 October 2017)
651
+ ----------------------------------
652
+
653
+ This release comes with many improvements and bug fixes:
654
+
655
+ * The **sr** preset now supports paired-end short-read alignment. Minimap2 is
656
+ 3-4 times as fast as BWA-MEM, but is slightly less accurate on simulated
657
+ reads.
658
+
659
+ * Meticulous improvements to assembly-to-assembly alignment (special thanks to
660
+ Alexey Gurevich from the QUAST team): a) apply a small penalty to matches
661
+ between ambiguous bases; b) reduce missing alignments due to spurious
662
+ overlaps; c) introduce the short form of the `cs` tag, an improvement to the
663
+ SAM MD tag.
664
+
665
+ * Make sure gaps are always left-aligned.
666
+
667
+ * Recognize `U` bases from Oxford Nanopore Direct RNA-seq (#33).
668
+
669
+ * Fixed slightly wrong chaining score. Fixed slightly inaccurate coordinates
670
+ for split alignment.
671
+
672
+ * Fixed multiple reported bugs: 1) wrong reference name for inversion
673
+ alignment (#30); 2) redundant SQ lines when multiple query files are
674
+ specified (#39); 3) non-functioning option `-K` (#36).
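The short form of the `cs` tag mentioned in the list above encodes an alignment as a string of small events. The parser below is a sketch that assumes the grammar as I understand it from the minimap2 manual: `:N` for N identical bases, `*xy` for a substitution from reference base x to query base y, `+seq` for an insertion, `-seq` for a deletion and `~xxNyy` for an intron. The function name and the tuple layout are hypothetical.

```python
import re

CS_RE = re.compile(r":\d+|\*[a-z]{2}|[+\-][a-z]+|~[a-z]{2}\d+[a-z]{2}")


def parse_cs(cs):
    """Split a short-form cs string (without the 'cs:Z:' prefix) into events."""
    events = []
    pos = 0
    for m in CS_RE.finditer(cs):
        if m.start() != pos:
            raise ValueError(f"unexpected text at offset {pos}")
        pos = m.end()
        tok = m.group()
        kind = tok[0]
        if kind == ":":
            events.append(("match", int(tok[1:])))
        elif kind == "*":
            events.append(("sub", tok[1], tok[2]))  # (reference base, query base)
        elif kind == "+":
            events.append(("ins", tok[1:]))
        elif kind == "-":
            events.append(("del", tok[1:]))
        else:
            events.append(("intron", tok[1:]))
    if pos != len(cs):
        raise ValueError(f"unexpected text at offset {pos}")
    return events


if __name__ == "__main__":
    for event in parse_cs(":6-ata:10+gtc:4*at:3"):
        print(event)
```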
675
+
676
+ This release has implemented all the major features I planned five months ago,
677
+ with the addition of spliced long-read alignment. The next couple of releases
678
+ will focus on fine tuning of the base algorithms.
679
+
680
+ (2.3: 22 October 2017, r531)
681
+
682
+
683
+
684
+ Release 2.2-r409 (17 September 2017)
685
+ ------------------------------------
686
+
687
+ This is a feature release. It improves single-end short-read alignment and
688
+ comes with Python bindings. Detailed changes include:
689
+
690
+ * Added the **sr** preset for single-end short-read alignment. In this mode,
691
+ minimap2 runs faster than BWA-MEM, but is slightly less accurate on
692
+ simulated data sets. Paired-end alignment is not supported as of now.
693
+
694
+ * Improved mapping quality estimate with more accurate identification of
695
+ repetitive hits. This mainly helps short-read alignment.
696
+
697
+ * Implemented **mappy**, a Python binding for minimap2, which is available
698
+ from PyPI and can be installed with `pip install --user mappy`. Python users
699
+ can perform read alignment without the minimap2 executable.
700
+
701
+ * Restructured the indexing APIs and documented key minimap2 APIs in the
702
+ header file minimap.h. Updated example.c with the new APIs. Old APIs still
703
+ work but may be deprecated in the future.
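As a rough illustration of aligning reads from Python without the minimap2 executable, the sketch below keeps the highest-MAPQ hit per read. It assumes the interface described in the current mappy documentation (`mappy.Aligner`, `Aligner.map`, `mappy.fastx_read`, and hit attributes `ctg`, `r_st`, `r_en`, `strand`, `mapq`), which may differ in detail from the version shipped with this release. `best_hits` and `run_with_mappy` are hypothetical names, and the file paths are placeholders supplied by the caller.

```python
def best_hits(aligner, reads):
    """Map each (name, sequence) pair and keep its highest-MAPQ hit.

    `aligner` is any object whose map(seq) method yields hits with the
    attributes ctg, r_st, r_en, strand and mapq (as mappy hits do).
    """
    best = {}
    for name, seq in reads:
        top = None
        for hit in aligner.map(seq):
            if top is None or hit.mapq > top.mapq:
                top = hit
        if top is not None:  # reads without any hit are left out
            best[name] = (top.ctg, top.r_st, top.r_en, top.strand, top.mapq)
    return best


def run_with_mappy(ref_path, reads_path):
    """Build or load an index with mappy and map every read in a file."""
    import mappy as mp  # third-party: pip install --user mappy

    aligner = mp.Aligner(ref_path, preset="sr")  # short-read preset
    if not aligner:
        raise RuntimeError("failed to load or build the index")
    reads = ((name, seq) for name, seq, _qual in mp.fastx_read(reads_path))
    return best_hits(aligner, reads)
```

Separating `best_hits` from the mappy import keeps the selection logic usable with any aligner-like object, which also makes it easy to exercise without an index on disk.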
704
+
705
+ This release may output alignments different from the previous version, though
706
+ the overall alignment statistics, such as the number of aligned bases and long
707
+ gaps, remain close.
708
+
709
+ (2.2: 17 September 2017, r409)
710
+
711
+
712
+
713
+ Release 2.1.1-r341 (6 September 2017)
714
+ -------------------------------------
715
+
716
+ This is a maintenance release that is expected to output identical alignments to
717
+ v2.1. Detailed changes include:
718
+
719
+ * Support CPU dispatch. By default, minimap2 is compiled with both SSE2 and
720
+ SSE4-based implementations of alignment and automatically chooses the right
721
+ one at runtime. This avoids unexpected errors on older CPUs (#21).
722
+
723
+ * Improved Windows support as requested by Oxford Nanopore (#19). Minimap2
724
+ now avoids variable-length stack arrays, eliminates alloca(), ships with
725
+ getopt_long() and provides timing functions implemented with Windows APIs.
726
+
727
+ * Fixed a potential segmentation fault when specifying -k/-w/-H with
728
+ multi-part index (#23).
729
+
730
+ * Fixed two memory leaks in example.c.
731
+
732
+ (2.1.1: 6 September 2017, r341)
733
+
734
+
735
+
736
+ Release 2.1-r311 (25 August 2017)
737
+ ---------------------------------
738
+
739
+ This release adds spliced alignment for long noisy RNA-seq reads. On a SMRT
740
+ Iso-Seq and an Oxford Nanopore data set, minimap2 appears to outperform
741
+ traditional mRNA aligners. For DNA alignment, this release gives almost
742
+ identical output to v2.0. Other changes include:
743
+
744
+ * Added option `-R` to set the read group header line in SAM.
745
+
746
+ * Optionally output the `cs:Z` tag in PAF to encode both the query and the
747
+ reference sequences in the alignment.
748
+
749
+ * Fixed an issue where DP alignment uses excessive memory.
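To show how a `cs:Z` tag in PAF can encode both sequences, the sketch below pulls the tag out of a PAF line and rebuilds the aligned reference and query segments. It assumes the long-form syntax documented in the current minimap2 manual (`=SEQ` for matches, `*xy` for substitutions, `+seq` and `-seq` for insertions and deletions); the exact encoding produced by this release may differ. The function names are hypothetical.

```python
import re

# Long-form cs tokens (syntax assumed from the minimap2 manual). The ":N"
# alternative is matched only so that short-form input gets a clear error.
CS_LONG_TOKEN = re.compile(r"(=[A-Za-z]+|:\d+|\*[a-z]{2}|[+\-][a-z]+)")


def cs_tag_from_paf(line):
    """Return the value of the cs:Z tag in a PAF line, or None if absent."""
    fields = line.rstrip("\n").split("\t")
    for field in fields[12:]:  # the first 12 PAF columns are mandatory
        if field.startswith("cs:Z:"):
            return field[len("cs:Z:"):]
    return None


def rebuild_from_long_cs(cs):
    """Reconstruct (reference, query) aligned segments from a long-form cs."""
    tokens = CS_LONG_TOKEN.findall(cs)
    if "".join(tokens) != cs:
        raise ValueError("unrecognized characters in cs string: %r" % cs)
    ref, qry = [], []
    for tok in tokens:
        kind, body = tok[0], tok[1:]
        if kind == "=":  # identical bases on both sequences
            ref.append(body.upper())
            qry.append(body.upper())
        elif kind == "*":  # reference base, then query base
            ref.append(body[0].upper())
            qry.append(body[1].upper())
        elif kind == "+":  # bases present only in the query
            qry.append(body.upper())
        elif kind == "-":  # bases present only in the reference
            ref.append(body.upper())
        else:
            raise ValueError("short-form ':' tokens carry no base sequence")
    return "".join(ref), "".join(qry)
```

Applied to `=CGATCG-ata=AATAGAGTAG+gtc=GAAT*at=GCA`, the function returns two 27-base strings that differ only at the deletion, the insertion and the single substitution.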
750
+
751
+ The minimap2 technical report has been updated with more details and the
752
+ evaluation of spliced alignment:
753
+
754
+ * Li, H. (2017). Minimap2: fast pairwise alignment for long nucleotide
755
+ sequences. [arXiv:1708.01492v2](https://arxiv.org/abs/1708.01492v2).
756
+
757
+ (2.1: 25 August 2017, r311)
758
+
759
+
760
+
761
+ Release 2.0-r275 (8 August 2017)
762
+ --------------------------------
763
+
764
+ This release is identical to version 2.0rc1, except the version number. It is
765
+ described and evaluated in the following technical report:
766
+
767
+ * Li, H. (2017). Minimap2: fast pairwise alignment for long DNA sequences.
768
+ [arXiv:1708.01492v1](https://arxiv.org/abs/1708.01492v1).
769
+
770
+ (2.0: 8 August 2017, r275)
771
+
772
+
773
+
774
+ Release 2.0rc1-r232 (30 July 2017)
775
+ ----------------------------------
776
+
777
+ This release improves the accuracy of long-read alignment and adds several
778
+ minor features.
779
+
780
+ * Improved mapping quality estimate for short alignments containing few seed
781
+ hits.
782
+
783
+ * Fixed a minor bug that affects the chaining accuracy towards the ends of a
784
+ chain. Changed the gap cost for chaining to reduce false seeding.
785
+
786
+ * Skip potentially wrong seeding and apply dynamic programming more frequently.
787
+ This slightly increases run time, but greatly reduces false long gaps.
788
+
789
+ * Perform local alignment at Z-drop break point to recover potential inversion
790
+ alignment. Output the SA tag in the SAM format. Added scripts to evaluate
791
+ mapping accuracy for reads simulated with pbsim.
792
+
793
+ This release completes features intended for v2.0. No major features will be
794
+ added to the master branch before the final v2.0.
795
+
796
+ (2.0rc1: 30 July 2017, r232)
797
+
798
+
799
+
800
+ Release r191 (19 July 2017)
801
+ ---------------------------
802
+
803
+ This is the first public release of minimap2, an aligner for long reads and
804
+ assemblies. This release has a few issues and is generally not recommended for
805
+ production use.
806
+
807
+ (19 July 2017, r191)