minimap2 0.2.22.0 → 0.2.24.1
- checksums.yaml +4 -4
- data/README.md +60 -76
- data/ext/Rakefile +55 -0
- data/ext/cmappy/cmappy.c +129 -0
- data/ext/cmappy/cmappy.h +44 -0
- data/ext/minimap2/FAQ.md +46 -0
- data/ext/minimap2/LICENSE.txt +24 -0
- data/ext/minimap2/MANIFEST.in +10 -0
- data/ext/minimap2/Makefile +132 -0
- data/ext/minimap2/Makefile.simde +97 -0
- data/ext/minimap2/NEWS.md +821 -0
- data/ext/minimap2/README.md +403 -0
- data/ext/minimap2/align.c +1020 -0
- data/ext/minimap2/bseq.c +169 -0
- data/ext/minimap2/bseq.h +64 -0
- data/ext/minimap2/code_of_conduct.md +30 -0
- data/ext/minimap2/cookbook.md +243 -0
- data/ext/minimap2/esterr.c +64 -0
- data/ext/minimap2/example.c +63 -0
- data/ext/minimap2/format.c +559 -0
- data/ext/minimap2/hit.c +466 -0
- data/ext/minimap2/index.c +775 -0
- data/ext/minimap2/kalloc.c +205 -0
- data/ext/minimap2/kalloc.h +76 -0
- data/ext/minimap2/kdq.h +132 -0
- data/ext/minimap2/ketopt.h +120 -0
- data/ext/minimap2/khash.h +615 -0
- data/ext/minimap2/krmq.h +474 -0
- data/ext/minimap2/kseq.h +256 -0
- data/ext/minimap2/ksort.h +153 -0
- data/ext/minimap2/ksw2.h +184 -0
- data/ext/minimap2/ksw2_dispatch.c +96 -0
- data/ext/minimap2/ksw2_extd2_sse.c +402 -0
- data/ext/minimap2/ksw2_exts2_sse.c +416 -0
- data/ext/minimap2/ksw2_extz2_sse.c +313 -0
- data/ext/minimap2/ksw2_ll_sse.c +152 -0
- data/ext/minimap2/kthread.c +159 -0
- data/ext/minimap2/kthread.h +15 -0
- data/ext/minimap2/kvec.h +105 -0
- data/ext/minimap2/lchain.c +369 -0
- data/ext/minimap2/main.c +459 -0
- data/ext/minimap2/map.c +714 -0
- data/ext/minimap2/minimap.h +410 -0
- data/ext/minimap2/minimap2.1 +725 -0
- data/ext/minimap2/misc/README.md +179 -0
- data/ext/minimap2/misc/mmphase.js +335 -0
- data/ext/minimap2/misc/paftools.js +3149 -0
- data/ext/minimap2/misc.c +162 -0
- data/ext/minimap2/mmpriv.h +132 -0
- data/ext/minimap2/options.c +234 -0
- data/ext/minimap2/pe.c +177 -0
- data/ext/minimap2/python/README.rst +196 -0
- data/ext/minimap2/python/cmappy.h +152 -0
- data/ext/minimap2/python/cmappy.pxd +153 -0
- data/ext/minimap2/python/mappy.pyx +273 -0
- data/ext/minimap2/python/minimap2.py +39 -0
- data/ext/minimap2/sdust.c +213 -0
- data/ext/minimap2/sdust.h +25 -0
- data/ext/minimap2/seed.c +131 -0
- data/ext/minimap2/setup.py +55 -0
- data/ext/minimap2/sketch.c +143 -0
- data/ext/minimap2/splitidx.c +84 -0
- data/ext/minimap2/sse2neon/emmintrin.h +1689 -0
- data/ext/minimap2/test/MT-human.fa +278 -0
- data/ext/minimap2/test/MT-orang.fa +276 -0
- data/ext/minimap2/test/q-inv.fa +4 -0
- data/ext/minimap2/test/q2.fa +2 -0
- data/ext/minimap2/test/t-inv.fa +127 -0
- data/ext/minimap2/test/t2.fa +2 -0
- data/ext/minimap2/tex/Makefile +21 -0
- data/ext/minimap2/tex/bioinfo.cls +930 -0
- data/ext/minimap2/tex/blasr-mc.eval +17 -0
- data/ext/minimap2/tex/bowtie2-s3.sam.eval +28 -0
- data/ext/minimap2/tex/bwa-s3.sam.eval +52 -0
- data/ext/minimap2/tex/bwa.eval +55 -0
- data/ext/minimap2/tex/eval2roc.pl +33 -0
- data/ext/minimap2/tex/graphmap.eval +4 -0
- data/ext/minimap2/tex/hs38-simu.sh +10 -0
- data/ext/minimap2/tex/minialign.eval +49 -0
- data/ext/minimap2/tex/minimap2.bib +460 -0
- data/ext/minimap2/tex/minimap2.tex +724 -0
- data/ext/minimap2/tex/mm2-s3.sam.eval +62 -0
- data/ext/minimap2/tex/mm2-update.tex +240 -0
- data/ext/minimap2/tex/mm2.approx.eval +12 -0
- data/ext/minimap2/tex/mm2.eval +13 -0
- data/ext/minimap2/tex/natbib.bst +1288 -0
- data/ext/minimap2/tex/natbib.sty +803 -0
- data/ext/minimap2/tex/ngmlr.eval +38 -0
- data/ext/minimap2/tex/roc.gp +60 -0
- data/ext/minimap2/tex/snap-s3.sam.eval +62 -0
- data/ext/minimap2.patch +19 -0
- data/lib/minimap2/aligner.rb +4 -4
- data/lib/minimap2/alignment.rb +11 -11
- data/lib/minimap2/ffi/constants.rb +20 -16
- data/lib/minimap2/ffi/functions.rb +5 -0
- data/lib/minimap2/ffi.rb +4 -5
- data/lib/minimap2/version.rb +2 -2
- data/lib/minimap2.rb +51 -15
- metadata +97 -79
- data/lib/minimap2/ffi_helper.rb +0 -53
- data/vendor/libminimap2.so +0 -0
@@ -0,0 +1,821 @@
+Release 2.24-r1122 (26 December 2021)
+-------------------------------------
+
+This release improves alignment around long poorly aligned regions. Older
+minimap2 may chain through such regions in rare cases which may result in
+missing alignments later. The issue has become worse since the change of
+the chaining algorithm in v2.19. v2.23 implements an incomplete remedy. This
+release provides a better solution with an X-drop-like heuristic and by enabling
+two-bandwidth chaining in the assembly mode.
+
+(2.24: 26 December 2021, r1122)
+
+
+
+Release 2.23-r1111 (18 November 2021)
+-------------------------------------
+
+Notable changes:
+
+ * Bugfix: fixed missing alignments around long inversions (#806 and #816).
+   This bug affected v2.19 through v2.22.
+
+ * Improvement: avoid extremely long mapping time for pathologic reads with
+   highly repeated k-mers not in the reference (#771). Use --q-occ-frac=0
+   to disable the new heuristic.
+
+ * Change: use --cap-kalloc=1g by default.
+
+(2.23: 18 November 2021, r1111)
+
+
+
+Release 2.22-r1101 (7 August 2021)
+----------------------------------
+
+When choosing the best alignment, this release uses logarithm gap penalty and
+query-specific mismatch penalty. It improves the sensitivity to long INDELs in
+repetitive regions.
+
+Other notable changes:
+
+ * Bugfix: fixed an indirect memory leak that may waste a large amount of
+   memory given highly repetitive reference such as a 16S RNA database (#749).
+   All versions of minimap2 have this issue.
+
+ * New feature: added --cap-kalloc to reduce the peak memory. This option is
+   not enabled by default but may become the default in future releases.
+
+Known issue:
+
+ * Minimap2 may take a long time to map a read (#771). So far it is not clear
+   if this happens to v2.18 and earlier versions.
+
+(2.22: 7 August 2021, r1101)
+
+
+
+Release 2.21-r1071 (6 July 2021)
+--------------------------------
+
+This release fixed a regression in short-read mapping introduced in v2.19
+(#776). It also fixed invalid comparisons of uninitialized variables, though
+these are harmless (#752). Long-read alignment should be identical to v2.20.
+
+(2.21: 6 July 2021, r1071)
+
+
+
+Release 2.20-r1061 (27 May 2021)
+--------------------------------
+
+This release fixed a bug in the Python module and improves the command-line
+compatibility with v2.18. In v2.19, if `-r` is specified with an `asm*` preset,
+users would get alignments more fragmented than v2.18. This could be an issue
+for existing pipelines specifying `-r`. This release resolves this issue.
+
+(2.20: 27 May 2021, r1061)
+
+
+
+Release 2.19-r1057 (26 May 2021)
+--------------------------------
+
+This release includes a few important improvements backported from unimap:
+
+ * Improvement: more contiguous alignment through long INDELs. This is enabled
+   by the minigraph chaining algorithm. All `asm*` presets now use the new
+   algorithm. They can find INDELs up to 100kb and may be faster for
+   chromosome-long contigs. The default mode and `map*` presets use this
+   algorithm to replace the long-join heuristic.
+
+ * Improvement: better alignment in highly repetitive regions by rescuing
+   high-occurrence seeds. If the distance between two adjacent seeds is too
+   large, attempt to choose a fraction of high-occurrence seeds in-between.
+   Minimap2 now produces fewer clippings and alignment break points in long
+   satellite regions.
+
+ * Improvement: allow to specify an interval of k-mer occurrences with `-U`.
+   For repeat-rich genomes, the automatic k-mer occurrence threshold determined
+   by `-f` may be too large and makes alignment impractically slow. The new
+   option protects against such cases. Enabled for `asm*` and `map-hifi`.
+
+ * New feature: added the `map-hifi` preset for mapping PacBio High-Fidelity
+   (HiFi) reads.
+
+ * Change to the default: apply `--cap-sw-mem=100m` for genomic alignment.
+
+ * Bugfix: minimap2 could not generate an index file with `-xsr` (#734).
+
+This release represents the most significant algorithmic change since v2.1 in
+2017. With features backported from unimap, minimap2 now has similar power to
+unimap for contig alignment. Unimap will remain an experimental project and is
+no longer recommended over minimap2. Sorry for reverting the recommendation in
+such a short time.
+
+(2.19: 26 May 2021, r1057)
+
+
+
+Release 2.18-r1015 (9 April 2021)
+---------------------------------
+
+This release fixes multiple rare bugs in minimap2 and adds additional
+functionality to paftools.js.
+
+Changes to minimap2:
+
+ * Bugfix: a rare segfault caused by an off-by-one error (#489)
+
+ * Bugfix: minimap2 segfaulted due to an uninitialized variable (#622 and #625).
+
+ * Bugfix: minimap2 parsed spaces as field separators in BED (#721). This led
+   to issues when the BED name column contains spaces.
+
+ * Bugfix: minimap2 `--split-prefix` did not work with long reference names
+   (#394).
+
+ * Bugfix: option `--junc-bonus` didn't work (#513)
+
+ * Bugfix: minimap2 didn't return 1 on I/O errors (#532)
+
+ * Bugfix: the `de:f` tag (sequence divergence) could be negative if there were
+   ambiguous bases
+
+ * Bugfix: fixed two undefined behaviors caused by calling memcpy() on
+   zero-length blocks (#443)
+
+ * Bugfix: there were duplicated SAM @SQ lines if option `--split-prefix` is in
+   use (#400 and #527)
+
+ * Bugfix: option -K had to be smaller than 2 billion (#491). This was caused
+   by a 32-bit integer overflow.
+
+ * Improvement: optionally compile against SIMDe (#597). Minimap2 should work
+   with IBM POWER CPUs, though this has not been tested. To compile with SIMDe,
+   please use `make -f Makefile.simde`.
+
+ * Improvement: more informative error message for I/O errors (#454) and for
+   FASTQ parsing errors (#510)
+
+ * Improvement: abort given malformatted RG line (#541)
+
+ * Improvement: better formula to estimate the `dv:f` tag (approximate sequence
+   divergence). See DOI:10.1101/2021.01.15.426881.
+
+ * New feature: added the `--mask-len` option to fine control the removal of
+   redundant hits (#659). The default behavior is unchanged.
+
+Changes to mappy:
+
+ * Bugfix: mappy caused segmentation fault if the reference index is not
+   present (#413).
+
+ * Bugfix: fixed a memory leak via 238b6bb3
+
+ * Change: always require Cython to compile the mappy module (#723). Older
+   mappy packages at PyPI bundled the C source code generated by Cython such
+   that end users did not need to install Cython to compile mappy. However, as
+   Python 3.9 is breaking backward compatibility, older mappy does not work
+   with Python 3.9 anymore. We have to add this Cython dependency as a
+   workaround.
+
+Changes to paftools.js:
+
+ * Bugfix: the "part10-" line from asmgene was wrong (#581)
+
+ * Improvement: compatibility with GTF files from GenBank (#422)
+
+ * New feature: asmgene also checks missing multi-copy genes
+
+ * New feature: added the misjoin command to evaluate large-scale misjoins and
+   megabase-long inversions.
+
+Despite the many bug fixes and minor improvements, the core algorithm
+stays the same. This version of minimap2 produces nearly identical alignments
+to v2.17 except in very rare corner cases.
+
+Now unimap is recommended over minimap2 for aligning long contigs against a
+reference genome. It often takes less wall-clock time and is much more
+sensitive to long insertions and deletions.
+
+(2.18: 9 April 2021, r1015)
+
+
+
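The `-K` limit fixed in 2.18 comes down to a byte count not fitting in a 32-bit signed integer. Here is a minimal Python sketch of that wrap-around, using `ctypes.c_int32` to mimic a C `int32_t`. The helper name `as_int32` is hypothetical, and this does not reproduce minimap2's actual option parser.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def as_int32(n: int) -> int:
    """Return n as it would read back after being stored in a 32-bit signed int."""
    return ctypes.c_int32(n).value

# A value just under 2 billion survives; 2.5 billion wraps to a negative number.
print(as_int32(1_900_000_000))  # 1900000000
print(as_int32(2_500_000_000))  # -1794967296
```

Any value above `INT32_MAX` reads back as something other than what was requested, which is consistent with the "smaller than 2 billion" restriction described in the release note.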
+Release 2.17-r941 (4 May 2019)
+------------------------------
+
+Changes since the last release:
+
+ * Fixed flawed CIGARs like `5I6D7I` (#392).
+
+ * Bugfix: TLEN should be 0 when either end is unmapped (#373 and #365).
+
+ * Bugfix: mappy is unable to write index (#372).
+
+ * Added option `--junc-bed` to load known gene annotations in the BED12
+   format. Minimap2 prefers annotated junctions over novel junctions (#197 and
+   #348). GTF can be converted to BED12 with `paftools.js gff2bed`.
+
+ * Added option `--sam-hit-only` to suppress unmapped hits in SAM (#377).
+
+ * Added preset `splice:hq` for high-quality CCS or mRNA sequences. It applies
+   better scoring and improves the sensitivity to small exons. This preset may
+   introduce false small introns, but the overall accuracy should be higher.
+
+This version produces nearly identical alignments to v2.16, except for CIGARs
+affected by the bug mentioned above.
+
+(2.17: 5 May 2019, r941)
+
+
+
+Release 2.16-r922 (28 February 2019)
+------------------------------------
+
+This release is 50% faster for mapping ultra-long nanopore reads at comparable
+accuracy. For short-read mapping, long-read overlapping and ordinary long-read
+mapping, the performance and accuracy remain similar. This speedup is achieved
+with a new heuristic to limit the number of chaining iterations (#324). Users
+can disable the heuristic by increasing a new option `--max-chain-iter` to a
+huge number.
+
+Other changes to minimap2:
+
+ * Implemented option `--paf-no-hit` to output unmapped query sequences in PAF.
+   The strand and reference name columns are both `*` at an unmapped line. The
+   hidden option is available in earlier minimap2 but had a different 2-column
+   output format instead of PAF.
+
+ * Fixed a bug that leads to wrongly calculated `de` tags when ambiguous bases
+   are involved (#309). This bug only affects v2.15.
+
+ * Fixed a bug when parsing command-line option `--splice` (#344). This bug was
+   introduced in v2.13.
+
+ * Fixed two division-by-zero cases (#326). They don't affect final alignments
+   because the results of the divisions are not used in both cases.
+
+ * Added an option `-o` to output alignments to a specified file. It is still
+   recommended to use UNIX pipes for on-the-fly conversion or compression.
+
+ * Output a new `rl` tag to give the length of query regions harboring
+   repetitive seeds.
+
+Changes to paftools.js:
+
+ * Added a new option to convert the MD tag to the long form of the cs tag.
+
+Changes to mappy:
+
+ * Added the `mappy.Aligner.seq_names` method to return sequence names (#312).
+
+For NA12878 ultra-long reads, this release changes the alignments of <0.1% of
+reads in comparison to v2.15. All these reads have highly fragmented alignments
+and are likely to be problematic anyway. For shorter or well aligned reads,
+this release should produce mostly identical alignments to v2.15.
+
+(2.16: 28 February 2019, r922)
+
+
+
+Release 2.15-r905 (10 January 2019)
+-----------------------------------
+
+Changes to minimap2:
+
+ * Fixed a rare segmentation fault when option -H is in use (#307). This may
+   happen when there are very long homopolymers towards the 5'-end of a read.
+
+ * Fixed wrong CIGARs when option --eqx is used (#266).
+
+ * Fixed a typo in the base encoding table (#264). This should have no
+   practical effect.
+
+ * Fixed a typo in the example code (#265).
+
+ * Improved the C++ compatibility by removing "register" (#261). However,
+   minimap2 still can't be compiled in the pedantic C++ mode (#306).
+
+ * Output a new "de" tag for gap-compressed sequence divergence.
+
+Changes to paftools.js:
+
+ * Added "asmgene" to evaluate the completeness of an assembly by measuring the
+   uniquely mapped single-copy genes. This command borrows the idea of BUSCO.
+
+ * Added "vcfpair" to call a phased VCF from phased whole-genome assemblies. An
+   earlier version of this script is used to produce the ground truth for the
+   syndip benchmark [PMID:30013044].
+
+This release produces identical alignment coordinates and CIGARs in comparison
+to v2.14. Users are advised to upgrade due to the several bug fixes.
+
+(2.15: 10 January 2019, r905)
+
+
+
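As a rough illustration of the gap-compressed divergence behind the "de" tag introduced in 2.15: each insertion or deletion counts as one event regardless of its length. The sketch below assumes a CIGAR written with `=`/`X` operators (as produced by `--eqx`) and uses a hypothetical helper name. Ambiguous bases, which later release notes mention as a complication for this tag, are ignored here, so this is not minimap2's exact computation.

```python
import re

def gap_compressed_divergence(cigar: str) -> float:
    """Divergence where every insertion or deletion counts as a single event."""
    matches = mismatches = gaps = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "=":
            matches += n
        elif op == "X":
            mismatches += n
        elif op in ("I", "D"):
            gaps += 1  # gap-compressed: the length of the gap is ignored
        elif op == "M":
            # M hides whether a column is a match or a mismatch
            raise ValueError("CIGAR must use =/X operators, not M")
        # clipping (S, H), introns (N) and padding (P) do not count as events
    events = matches + mismatches + gaps
    return 1.0 - matches / events if events else 0.0

# 25 matches, 1 mismatch, 2 gaps -> 3 differing events out of 28
print(gap_compressed_divergence("10=1X5=3D4=2I6="))
```

Counting a 3-base deletion and a 2-base insertion as one event each keeps a single long indel from dominating the estimate.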
+Release 2.14-r883 (5 November 2018)
+-----------------------------------
+
+Notable changes:
+
+ * Fixed two minor bugs caused by typos (#254 and #266).
+
+ * Fixed a bug that made minimap2 abort when --eqx was used together with --MD
+   or --cs (#257).
+
+ * Added --cap-sw-mem to cap the size of DP matrices (#259). Base alignment may
+   take a lot of memory in the splicing mode. This may lead to issues when we
+   run minimap2 on a cluster with a hard memory limit. The new option avoids
+   unlimited memory usage at the cost of missing a few long introns.
+
+ * Conforming to C99 and C11 when possible (#261).
+
+ * Warn about malformatted FASTA or FASTQ (#252 and #255).
+
+This release occasionally produces base alignments different from v2.13. The
+overall alignment accuracy remains similar.
+
+(2.14: 5 November 2018, r883)
+
+
+
+Release 2.13-r850 (11 October 2018)
+-----------------------------------
+
+Changes to minimap2:
+
+ * Fixed wrongly formatted SAM when -L is in use (#231 and #233).
+
+ * Fixed an integer overflow in rare cases.
+
+ * Added --hard-mask-level to fine control split alignments (#244).
+
+ * Made --MD work with spliced alignment (#139).
+
+ * Replaced musl's getopt with ketopt for portability.
+
+ * Log peak memory usage on exit.
+
+This release should produce alignments identical to v2.12 and v2.11.
+
+(2.13: 11 October 2018, r850)
+
+
+
+Release 2.12-r827 (6 August 2018)
+---------------------------------
+
+Changes to minimap2:
+
+ * Added option --split-prefix to write proper alignments (correct mapping
+   quality and clustered query sequences) given a multi-part index (#141 and
+   #189; mostly by @hasindu2008).
+
+ * Fixed a memory leak when option -y is in use.
+
+Changes to mappy:
+
+ * Support the MD/cs tag (#183 and #203).
+
+ * Allow mappy to index a single sequence, to add extra flags and to change the
+   scoring system.
+
+Minimap2 should produce alignments identical to v2.11.
+
+(2.12: 6 August 2018, r827)
+
+
+
+Release 2.11-r797 (20 June 2018)
+--------------------------------
+
+Changes to minimap2:
+
+ * Improved alignment accuracy in low-complexity regions for SV calling. Thanks
+   to @armintoepfer for multiple offline examples.
+
+ * Added option --eqx to encode sequence match/mismatch with the =/X CIGAR
+   operators (#156, #157 and #175).
+
+ * When compiled with VC++, minimap2 generated wrong alignments due to a
+   comparison between a signed integer and an unsigned integer (#184). Also
+   fixed warnings reported by "clang -Wextra".
+
+ * Fixed incorrect anchor filtering due to a missing 64- to 32-bit cast.
+
+ * Fixed incorrect mapping quality for inversions (#148).
+
+ * Fixed incorrect alignment involving ambiguous bases (#155).
+
+ * Fixed incorrect presets: option `-r 2000` is intended to be used with
+   ava-ont, not ava-pb. The bug was introduced in 2.10.
+
+ * Fixed a bug when --for-only/--rev-only is used together with --sr or
+   --heap-sort=yes (#166).
+
+ * Fixed option -Y that was not working in the previous releases.
+
+ * Added option --lj-min-ratio to fine control the alignment of long gaps
+   found by the "long-join" heuristic (#128).
+
+ * Exposed `mm_idx_is_idx`, `mm_idx_load` and `mm_idx_dump` C APIs (#177).
+   Also fixed a bug when indexing without reference names (this feature is not
+   exposed to the command line).
+
+Changes to mappy:
+
+ * Added `__version__` (#165).
+
+ * Exposed the maximum fragment length parameter to mappy (#174).
+
+Changes to paftools:
+
+ * Don't crash when there is no "cg" tag (#153).
+
+ * Fixed wrong coverage report by "paftools.js call" (#145).
+
+This version may produce slightly different base-level alignment. The overall
+alignment statistics should remain similar.
+
+(2.11: 20 June 2018, r797)
+
+
+
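To make the `--eqx` option added in 2.11 concrete: an `M` run in a CIGAR covers both matches and mismatches, while `=` and `X` tell them apart. The sketch below re-encodes `M` runs given the two aligned sequences. The function name and the toy sequences are illustrative assumptions, and this is not minimap2's implementation.

```python
import re
from itertools import groupby

def m_to_eqx(cigar: str, ref: str, query: str) -> str:
    """Rewrite M operators as runs of '=' (match) and 'X' (mismatch)."""
    out = []
    r = q = 0  # current positions in ref and query
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "M":
            flags = ["=" if ref[r + i] == query[q + i] else "X" for i in range(n)]
            out.extend(f"{len(list(run))}{sym}" for sym, run in groupby(flags))
        else:
            out.append(f"{n}{op}")
        if op in "M=XDN":  # operators that consume the reference
            r += n
        if op in "M=XIS":  # operators that consume the query
            q += n
    return "".join(out)

print(m_to_eqx("4M1I3M", "ACGTACG", "ACTTGACG"))  # 2=1X1=1I3=
```

The mismatch at the third base becomes visible in the CIGAR itself, so downstream tools no longer need the MD or cs tag to locate it.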
+Release 2.10-r761 (27 March 2018)
+---------------------------------
+
+Changes to minimap2:
+
+ * Optionally output the MD tag for compatibility with existing tools (#63,
+   #118 and #137).
+
+ * Use SSE compiler flags more precisely to prevent compiling errors on certain
+   machines (#127).
+
+ * Added option --min-occ-floor to set a minimum occurrence threshold. Presets
+   intended for assembly-to-reference alignment set this option to 100. This
+   option alleviates issues with regions having high copy numbers (#107).
+
+ * Exit with non-zero code on file writing errors (e.g. disk full; #103 and
+   #132).
+
+ * Added option -y to copy FASTA/FASTQ comments in query sequences to the
+   output (#136).
+
+ * Added the asm20 preset for alignments between genomes at 5-10% sequence
+   divergence.
+
+ * Changed the band-width in the ava-ont preset from 500 to 2000. Oxford
+   Nanopore reads may contain long deletion sequencing errors that break
+   chaining.
+
+Changes to mappy, the Python binding:
+
+ * Fixed a typo in Align.seq() (#126).
+
+Changes to paftools.js, the companion script:
+
+ * Command sam2paf now converts the MD tag to cs.
+
+ * Support VCF output for assembly-to-reference variant calling (#109).
+
+This version should produce identical alignment for read overlapping, RNA-seq
+read mapping, and genomic read mapping. We have also added a cook book to show
+the various uses of minimap2 on real datasets. Please see cookbook.md in the
+minimap2 source code directory.
+
+(2.10: 27 March 2018, r761)
+
+
+
+Release 2.9-r720 (23 February 2018)
+-----------------------------------
+
+This release fixed multiple minor bugs.
+
+ * Fixed two bugs that lead to incorrect inversion alignment. Also improved the
+   sensitivity to small inversions by using double Z-drop cutoff (#112).
+
+ * Fixed an issue that may leave the end of a query sequence unmapped (#104).
+
+ * Added a mappy API to retrieve sequences from the index (#126) and to reverse
+   complement DNA sequences. Fixed a bug where the `best_n` parameter did not
+   work (#117).
+
+ * Avoided segmentation fault given incorrect FASTQ input (#111).
+
+ * Combined all auxiliary javascripts to paftools.js. Fixed several bugs in
+   these scripts at the same time.
+
+(2.9: 24 February 2018, r720)
+
+
+
+Release 2.8-r672 (1 February 2018)
+----------------------------------
+
+Notable changes in this release include:
+
+ * Speed up short-read alignment by ~10%. The overall mapping accuracy stays
+   the same, but the output alignments are not always identical to v2.7 due to
+   unstable sorting employed during chaining. Long-read alignment is not
+   affected by this change as the speedup is short-read specific.
+
+ * Mappy now supports paired-end short-read alignment (#87). Please see
+   python/README.rst for details.
+
+ * Added options --for-only and --rev-only to perform alignment against the
+   forward or the reverse strand of the reference genome only (#91).
+
+ * Alleviated the issue with undesired diagonal alignment in the self mapping
+   mode (#10). Even if the output is not ideal, it should not interfere with
+   other alignments. Fully resolving the issue is intricate and may require
+   additional heuristic thresholds.
+
+ * Enhanced error checking against incorrect input (#92 and #96).
+
+For long query sequences, minimap2 should output identical alignments to v2.7.
+
+(2.8: 1 February 2018, r672)
+
+
+
+Release 2.7-r654 (9 January 2018)
+---------------------------------
+
+This release fixed a bug in the splice mode and added a few minor features:
+
+ * Fixed a bug that occasionally takes an intron as a long deletion in the
+   splice mode. This was caused by wrong backtracking at the last CIGAR
+   operator. The current fix eliminates the error, but it is not optimal in
+   that it often produces a wrong junction when the last operator is an intron.
+   A future version of minimap2 may improve upon this.
+
+ * Support high-end ARM CPUs that implement the NEON instruction set (#81).
+   This enables minimap2 to work on Raspberry Pi 3 and Odroid XU4.
+
+ * Added a C API to construct a minimizer index from a set of C strings (#80).
+
+ * Check scoring specified on the command line (#79). Due to the 8-bit limit,
+   excessively large score penalties fail minimap2.
+
+For genomic sequences, minimap2 should give identical alignments to v2.6.
+
+(2.7: 9 January 2018, r654)
+
+
+
+Release 2.6-r623 (12 December 2017)
+-----------------------------------
+
+This release adds several features and fixes two minor bugs:
+
+ * Optionally build an index without sequences. This helps to reduce the
+   peak memory for read overlapping and is automatically applied when
+   base-level alignment is not requested.
+
+ * Approximately estimate per-base sequence divergence (i.e. 1-identity)
+   without performing base-level alignment, using a MashMap-like method. The
+   estimate is written to a new dv:f tag.
+
+ * Reduced the number of tiny terminal exons in RNA-seq alignment. The current
+   setting is conservative. Increase --end-seed-pen to drop more such exons.
+
+ * Reduced the peak memory when aligning long query sequences.
+
+ * Fixed a bug that is caused by HPC minimizers longer than 256bp. This should
+   have no effect in practice, but it is recommended to rebuild HPC indices if
+   possible.
+
+ * Fixed a bug when identifying identical hits (#71). This should only affect
+   artifactual reference consisting of near identical sequences.
+
+For genomic sequences, minimap2 should give nearly identical alignments to
+v2.5, except the new dv:f tag.
+
+(2.6: 12 December 2017, r623)
+
+
+
+Release 2.5-r572 (11 November 2017)
+-----------------------------------
+
+This release fixes several bugs and brings a couple of minor improvements:
+
+ * Fixed a severe bug that leads to incorrect mapping coordinates in rare
+   corner cases.
+
+ * Fixed underestimated mapping quality for chimeric alignments when the whole
+   query sequence contains many repetitive minimizers, and for chimeric
+   alignments caused by Z-drop.
+
+ * Fixed two bugs in Python binding: incorrect strand field (#57) and incorrect
+   sequence names for Python3 (#55).
+
+ * Improved mapping accuracy for highly overlapping paired ends.
+
+ * Added option -Y to use soft clipping for supplementary alignments (#56).
+
+(2.5: 11 November 2017, r572)
+
+
+
|
626
|
+
Release 2.4-r555 (6 November 2017)
|
627
|
+
----------------------------------
|
628
|
+
|
629
|
+
As is planned, this release focuses on fine tuning the base algorithm. Notable
|
630
|
+
changes include
|
631
|
+
|
632
|
+
* Changed the mapping quality scale to match the scale of BWA-MEM. This makes
|
633
|
+
minimap2 and BWA-MEM achieve similar sensitivity-specificity balance on real
|
634
|
+
short-read data.
|
635
|
+
|
636
|
+
* Improved the accuracy of splice alignment by modeling one additional base
|
637
|
+
close to the GT-AG signal. This model is used by default with `-x splice`.
|
638
|
+
For SIRV control data, however, it is recommended to add `--splice-flank=no`
|
639
|
+
to disable this feature as the SIRV splice signals are slightly different.
|
640
|
+
|
641
|
+
* Tuned the parameters for Nanopore Direct RNA reads. The recommended command
|
642
|
+
line is `-axsplice -k14 -uf` (#46).
|
643
|
+
|
644
|
+
* Fixed a segmentation fault when aligning PacBio reads (#47 and #48). This
|
645
|
+
bug is very rare but it affects all versions of minimap2. It is also
|
646
|
+
recommended to re-index reference genomes created with `map-pb`. For human,
|
647
|
+
two minimizers in an old index are wrong.
|
648
|
+
|
649
|
+
* Changed option `-L` in sync with the final decision of hts-specs: a fake
|
650
|
+
CIGAR takes the form of `<readLen>S<refLen>N`. Note that `-L` only enables
|
651
|
+
future tools to recognize long CIGARs. It is not possible for older tools to
|
652
|
+
work with such alignments in BAM (#43 and #51).
|
653
|
+
|
654
|
+
* Fixed a tiny issue whereby minimap2 may waste 8 bytes per candidate
|
655
|
+
alignment.
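
The `-L` item above describes a fake CIGAR of the form `<readLen>S<refLen>N`. The sketch below illustrates how such a placeholder could be derived from a real CIGAR string. It is not minimap2's code: the function names are hypothetical, and the 65535-operation threshold is the BAM limit from the SAM/BAM specification (which stores the real CIGAR in the `CG` tag), not something stated in these notes.

```python
import re

# BAM stores the CIGAR operation count in 16 bits (assumption taken from
# the SAM/BAM specification, not from this changelog).
BAM_MAX_CIGAR_OPS = 65535
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (number of ops, read length, reference length) for a CIGAR."""
    n_ops = read_len = ref_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        n_ops += 1
        if op in "MIS=X":   # operations that consume read bases
            read_len += n
        if op in "MDN=X":   # operations that consume reference bases
            ref_len += n
    return n_ops, read_len, ref_len


def bam_cigar_field(cigar):
    """Keep short CIGARs; replace over-long ones with <readLen>S<refLen>N."""
    n_ops, read_len, ref_len = cigar_lengths(cigar)
    if n_ops <= BAM_MAX_CIGAR_OPS:
        return cigar
    return f"{read_len}S{ref_len}N"


if __name__ == "__main__":
    print(bam_cigar_field("10M2I5M"))       # short CIGAR is kept as-is
    print(bam_cigar_field("1M1I" * 40000))  # 80000 ops, so a fake CIGAR
```

A tool that understands the convention would read the real CIGAR from the tag instead; an older tool only sees a fully soft-clipped read followed by a reference skip, which is why such alignments cannot be used by older software.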
|
656
|
+
|
657
|
+
The minimap2 technical note hosted at arXiv has also been updated to reflect
|
658
|
+
recent changes.
|
659
|
+
|
660
|
+
(2.4: 6 November 2017, r555)
|
661
|
+
|
662
|
+
|
663
|
+
|
664
|
+
Release 2.3-r531 (22 October 2017)
|
665
|
+
----------------------------------
|
666
|
+
|
667
|
+
This release comes with many improvements and bug fixes:
|
668
|
+
|
669
|
+
* The **sr** preset now supports paired-end short-read alignment. Minimap2 is
|
670
|
+
3-4 times as fast as BWA-MEM, but is slightly less accurate on simulated
|
671
|
+
reads.
|
672
|
+
|
673
|
+
* Meticulous improvements to assembly-to-assembly alignment (special thanks to
|
674
|
+
Alexey Gurevich from the QUAST team): a) apply a small penalty to matches
|
675
|
+
between ambiguous bases; b) reduce missing alignments due to spurious
|
676
|
+
overlaps; c) introduce the short form of the `cs` tag, an improvement to the
|
677
|
+
SAM MD tag.
|
678
|
+
|
679
|
+
* Make sure gaps are always left-aligned.
|
680
|
+
|
681
|
+
* Recognize `U` bases from Oxford Nanopore Direct RNA-seq (#33).
|
682
|
+
|
683
|
+
* Fixed slightly wrong chaining score. Fixed slightly inaccurate coordinates
|
684
|
+
for split alignment.
|
685
|
+
|
686
|
+
* Fixed multiple reported bugs: 1) wrong reference name for inversion
|
687
|
+
alignment (#30); 2) redundant SQ lines when multiple query files are
|
688
|
+
specified (#39); 3) non-functioning option `-K` (#36).
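
To illustrate what the short form of the `cs` tag mentioned above means, here is a minimal sketch that converts a long-form `cs` string into its short form by replacing each run of identical bases (`=SEQ`) with its length (`:LEN`). The syntax follows the `cs` description in the current minimap2 README. The function name is hypothetical, and this is not how minimap2 generates the tag internally.

```python
import re


def cs_long_to_short(cs_long):
    """Replace every '=SEQ' match run with ':LEN'; other ops are unchanged."""
    return re.sub(r"=([A-Za-z]+)", lambda m: f":{len(m.group(1))}", cs_long)


if __name__ == "__main__":
    long_cs = "=CGATCG-ata=AATAGAGTAG+gtc=GAAT*at=GCA"
    print(cs_long_to_short(long_cs))
```

The short form drops the matching bases, so it is more compact but needs the reference or query sequence to recover them, much like the SAM MD tag it improves on.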
|
689
|
+
|
690
|
+
This release has implemented all the major features I planned five months ago,
|
691
|
+
with the addition of spliced long-read alignment. The next couple of releases
|
692
|
+
will focus on fine tuning of the base algorithms.
|
693
|
+
|
694
|
+
(2.3: 22 October 2017, r531)
|
695
|
+
|
696
|
+
|
697
|
+
|
698
|
+
Release 2.2-r409 (17 September 2017)
|
699
|
+
------------------------------------
|
700
|
+
|
701
|
+
This is a feature release. It improves single-end short-read alignment and
|
702
|
+
comes with Python bindings. Detailed changes include:
|
703
|
+
|
704
|
+
* Added the **sr** preset for single-end short-read alignment. In this mode,
|
705
|
+
minimap2 runs faster than BWA-MEM, but is slightly less accurate on
|
706
|
+
simulated data sets. Paired-end alignment is not supported as of now.
|
707
|
+
|
708
|
+
* Improved mapping quality estimate with more accurate identification of
|
709
|
+
repetitive hits. This mainly helps short-read alignment.
|
710
|
+
|
711
|
+
* Implemented **mappy**, a Python binding for minimap2, which is available
|
712
|
+
from PyPI and can be installed with `pip install --user mappy`. Python users
|
713
|
+
can perform read alignment without the minimap2 executable.
|
714
|
+
|
715
|
+
* Restructured the indexing APIs and documented key minimap2 APIs in the
|
716
|
+
header file minimap.h. Updated example.c with the new APIs. Old APIs still
|
717
|
+
work but may become deprecated in the future.
|
718
|
+
|
719
|
+
This release may output alignments different from the previous version, though
|
720
|
+
the overall alignment statistics, such as the number of aligned bases and long
|
721
|
+
gaps, remain close.
|
722
|
+
|
723
|
+
(2.2: 17 September 2017, r409)
|
724
|
+
|
725
|
+
|
726
|
+
|
727
|
+
Release 2.1.1-r341 (6 September 2017)
|
728
|
+
-------------------------------------
|
729
|
+
|
730
|
+
This is a maintenance release that is expected to output identical alignment to
|
731
|
+
v2.1. Detailed changes include:
|
732
|
+
|
733
|
+
* Support CPU dispatch. By default, minimap2 is compiled with both SSE2 and
|
734
|
+
SSE4 based implementation of alignment and automatically chooses the right
|
735
|
+
one at runtime. This avoids unexpected errors on older CPUs (#21).
|
736
|
+
|
737
|
+
* Improved Windows support as requested by Oxford Nanopore (#19). Minimap2
|
738
|
+
now avoids variable-length stack arrays, eliminates alloca(), ships with
|
739
|
+
getopt_long() and provides timing functions implemented with Windows APIs.
|
740
|
+
|
741
|
+
* Fixed a potential segmentation fault when specifying -k/-w/-H with
|
742
|
+
multi-part index (#23).
|
743
|
+
|
744
|
+
* Fixed two memory leaks in example.c.
|
745
|
+
|
746
|
+
(2.1.1: 6 September 2017, r341)
|
747
|
+
|
748
|
+
|
749
|
+
|
750
|
+
Release 2.1-r311 (25 August 2017)
|
751
|
+
---------------------------------
|
752
|
+
|
753
|
+
This release adds spliced alignment for long noisy RNA-seq reads. On a SMRT
|
754
|
+
Iso-Seq and an Oxford Nanopore data set, minimap2 appears to outperform
|
755
|
+
traditional mRNA aligners. For DNA alignment, this release gives almost
|
756
|
+
identical output to v2.0. Other changes include:
|
757
|
+
|
758
|
+
* Added option `-R` to set the read group header line in SAM.
|
759
|
+
|
760
|
+
* Optionally output the `cs:Z` tag in PAF to encode both the query and the
|
761
|
+
reference sequences in the alignment.
|
762
|
+
|
763
|
+
* Fixed an issue where DP alignment uses excessive memory.
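
The claim that the `cs:Z` tag encodes both the query and the reference sequences can be checked with a small sketch. It assumes the long-form syntax documented in the current minimap2 README (`=SEQ` for matches, `*xy` for a reference-to-query substitution, `+seq` for an insertion, `-seq` for a deletion). The exact syntax emitted by v2.1 may have differed, the function name is hypothetical, and the intron operator `~` used for spliced alignment is deliberately not handled.

```python
import re

# One long-form cs operation: match run, substitution, insertion, deletion.
CS_OP = re.compile(r"=[A-Za-z]+|\*[a-z]{2}|\+[a-z]+|-[a-z]+")


def cs_to_sequences(cs):
    """Rebuild the aligned (reference, query) segments from a long-form cs."""
    ref, qry = [], []
    pos = 0
    for m in CS_OP.finditer(cs):
        if m.start() != pos:          # unparsed text between operations
            raise ValueError(f"unsupported cs operation at offset {pos}")
        pos = m.end()
        op, arg = m.group()[0], m.group()[1:]
        if op == "=":                 # identical bases on both sequences
            ref.append(arg.upper())
            qry.append(arg.upper())
        elif op == "*":               # first base is reference, second is query
            ref.append(arg[0].upper())
            qry.append(arg[1].upper())
        elif op == "+":               # insertion: bases present only in query
            qry.append(arg.upper())
        else:                         # deletion: bases present only in reference
            ref.append(arg.upper())
    if pos != len(cs):
        raise ValueError(f"unsupported cs operation at offset {pos}")
    return "".join(ref), "".join(qry)


if __name__ == "__main__":
    ref_seq, qry_seq = cs_to_sequences("=CGATCG-ata=AATAGAGTAG+gtc=GAAT*at=GCA")
    print(ref_seq)
    print(qry_seq)
```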
|
764
|
+
|
765
|
+
The minimap2 technical report has been updated with more details and the
|
766
|
+
evaluation of spliced alignment:
|
767
|
+
|
768
|
+
* Li, H. (2017). Minimap2: fast pairwise alignment for long nucleotide
|
769
|
+
sequences. [arXiv:1708.01492v2](https://arxiv.org/abs/1708.01492v2).
|
770
|
+
|
771
|
+
(2.1: 25 August 2017, r311)
|
772
|
+
|
773
|
+
|
774
|
+
|
775
|
+
Release 2.0-r275 (8 August 2017)
|
776
|
+
--------------------------------
|
777
|
+
|
778
|
+
This release is identical to version 2.0rc1, except the version number. It is
|
779
|
+
described and evaluated in the following technical report:
|
780
|
+
|
781
|
+
* Li, H. (2017). Minimap2: fast pairwise alignment for long DNA sequences.
|
782
|
+
[arXiv:1708.01492v1](https://arxiv.org/abs/1708.01492v1).
|
783
|
+
|
784
|
+
(2.0: 8 August 2017, r275)
|
785
|
+
|
786
|
+
|
787
|
+
|
788
|
+
Release 2.0rc1-r232 (30 July 2017)
|
789
|
+
----------------------------------
|
790
|
+
|
791
|
+
This release improves the accuracy of long-read alignment and adds several
|
792
|
+
minor features.
|
793
|
+
|
794
|
+
* Improved mapping quality estimate for short alignments containing few seed
|
795
|
+
hits.
|
796
|
+
|
797
|
+
* Fixed a minor bug that affects the chaining accuracy towards the ends of a
|
798
|
+
chain. Changed the gap cost for chaining to reduce false seeding.
|
799
|
+
|
800
|
+
* Skip potentially wrong seeding and apply dynamic programming more frequently.
|
801
|
+
This slightly increases run time, but greatly reduces false long gaps.
|
802
|
+
|
803
|
+
* Perform local alignment at Z-drop break point to recover potential inversion
|
804
|
+
alignment. Output the SA tag in the SAM format. Added scripts to evaluate
|
805
|
+
mapping accuracy for reads simulated with pbsim.
|
806
|
+
|
807
|
+
This release completes features intended for v2.0. No major features will be
|
808
|
+
added to the master branch before the final v2.0.
|
809
|
+
|
810
|
+
(2.0rc1: 30 July 2017, r232)
|
811
|
+
|
812
|
+
|
813
|
+
|
814
|
+
Release r191 (19 July 2017)
|
815
|
+
---------------------------
|
816
|
+
|
817
|
+
This is the first public release of minimap2, an aligner for long reads and
|
818
|
+
assemblies. This release has a few issues and is generally not recommended for
|
819
|
+
production use.
|
820
|
+
|
821
|
+
(19 July 2017, r191)
|