minimap2 0.2.22.0 → 0.2.24.1

Files changed (101)
  1. checksums.yaml +4 -4
  2. data/README.md +60 -76
  3. data/ext/Rakefile +55 -0
  4. data/ext/cmappy/cmappy.c +129 -0
  5. data/ext/cmappy/cmappy.h +44 -0
  6. data/ext/minimap2/FAQ.md +46 -0
  7. data/ext/minimap2/LICENSE.txt +24 -0
  8. data/ext/minimap2/MANIFEST.in +10 -0
  9. data/ext/minimap2/Makefile +132 -0
  10. data/ext/minimap2/Makefile.simde +97 -0
  11. data/ext/minimap2/NEWS.md +821 -0
  12. data/ext/minimap2/README.md +403 -0
  13. data/ext/minimap2/align.c +1020 -0
  14. data/ext/minimap2/bseq.c +169 -0
  15. data/ext/minimap2/bseq.h +64 -0
  16. data/ext/minimap2/code_of_conduct.md +30 -0
  17. data/ext/minimap2/cookbook.md +243 -0
  18. data/ext/minimap2/esterr.c +64 -0
  19. data/ext/minimap2/example.c +63 -0
  20. data/ext/minimap2/format.c +559 -0
  21. data/ext/minimap2/hit.c +466 -0
  22. data/ext/minimap2/index.c +775 -0
  23. data/ext/minimap2/kalloc.c +205 -0
  24. data/ext/minimap2/kalloc.h +76 -0
  25. data/ext/minimap2/kdq.h +132 -0
  26. data/ext/minimap2/ketopt.h +120 -0
  27. data/ext/minimap2/khash.h +615 -0
  28. data/ext/minimap2/krmq.h +474 -0
  29. data/ext/minimap2/kseq.h +256 -0
  30. data/ext/minimap2/ksort.h +153 -0
  31. data/ext/minimap2/ksw2.h +184 -0
  32. data/ext/minimap2/ksw2_dispatch.c +96 -0
  33. data/ext/minimap2/ksw2_extd2_sse.c +402 -0
  34. data/ext/minimap2/ksw2_exts2_sse.c +416 -0
  35. data/ext/minimap2/ksw2_extz2_sse.c +313 -0
  36. data/ext/minimap2/ksw2_ll_sse.c +152 -0
  37. data/ext/minimap2/kthread.c +159 -0
  38. data/ext/minimap2/kthread.h +15 -0
  39. data/ext/minimap2/kvec.h +105 -0
  40. data/ext/minimap2/lchain.c +369 -0
  41. data/ext/minimap2/main.c +459 -0
  42. data/ext/minimap2/map.c +714 -0
  43. data/ext/minimap2/minimap.h +410 -0
  44. data/ext/minimap2/minimap2.1 +725 -0
  45. data/ext/minimap2/misc/README.md +179 -0
  46. data/ext/minimap2/misc/mmphase.js +335 -0
  47. data/ext/minimap2/misc/paftools.js +3149 -0
  48. data/ext/minimap2/misc.c +162 -0
  49. data/ext/minimap2/mmpriv.h +132 -0
  50. data/ext/minimap2/options.c +234 -0
  51. data/ext/minimap2/pe.c +177 -0
  52. data/ext/minimap2/python/README.rst +196 -0
  53. data/ext/minimap2/python/cmappy.h +152 -0
  54. data/ext/minimap2/python/cmappy.pxd +153 -0
  55. data/ext/minimap2/python/mappy.pyx +273 -0
  56. data/ext/minimap2/python/minimap2.py +39 -0
  57. data/ext/minimap2/sdust.c +213 -0
  58. data/ext/minimap2/sdust.h +25 -0
  59. data/ext/minimap2/seed.c +131 -0
  60. data/ext/minimap2/setup.py +55 -0
  61. data/ext/minimap2/sketch.c +143 -0
  62. data/ext/minimap2/splitidx.c +84 -0
  63. data/ext/minimap2/sse2neon/emmintrin.h +1689 -0
  64. data/ext/minimap2/test/MT-human.fa +278 -0
  65. data/ext/minimap2/test/MT-orang.fa +276 -0
  66. data/ext/minimap2/test/q-inv.fa +4 -0
  67. data/ext/minimap2/test/q2.fa +2 -0
  68. data/ext/minimap2/test/t-inv.fa +127 -0
  69. data/ext/minimap2/test/t2.fa +2 -0
  70. data/ext/minimap2/tex/Makefile +21 -0
  71. data/ext/minimap2/tex/bioinfo.cls +930 -0
  72. data/ext/minimap2/tex/blasr-mc.eval +17 -0
  73. data/ext/minimap2/tex/bowtie2-s3.sam.eval +28 -0
  74. data/ext/minimap2/tex/bwa-s3.sam.eval +52 -0
  75. data/ext/minimap2/tex/bwa.eval +55 -0
  76. data/ext/minimap2/tex/eval2roc.pl +33 -0
  77. data/ext/minimap2/tex/graphmap.eval +4 -0
  78. data/ext/minimap2/tex/hs38-simu.sh +10 -0
  79. data/ext/minimap2/tex/minialign.eval +49 -0
  80. data/ext/minimap2/tex/minimap2.bib +460 -0
  81. data/ext/minimap2/tex/minimap2.tex +724 -0
  82. data/ext/minimap2/tex/mm2-s3.sam.eval +62 -0
  83. data/ext/minimap2/tex/mm2-update.tex +240 -0
  84. data/ext/minimap2/tex/mm2.approx.eval +12 -0
  85. data/ext/minimap2/tex/mm2.eval +13 -0
  86. data/ext/minimap2/tex/natbib.bst +1288 -0
  87. data/ext/minimap2/tex/natbib.sty +803 -0
  88. data/ext/minimap2/tex/ngmlr.eval +38 -0
  89. data/ext/minimap2/tex/roc.gp +60 -0
  90. data/ext/minimap2/tex/snap-s3.sam.eval +62 -0
  91. data/ext/minimap2.patch +19 -0
  92. data/lib/minimap2/aligner.rb +4 -4
  93. data/lib/minimap2/alignment.rb +11 -11
  94. data/lib/minimap2/ffi/constants.rb +20 -16
  95. data/lib/minimap2/ffi/functions.rb +5 -0
  96. data/lib/minimap2/ffi.rb +4 -5
  97. data/lib/minimap2/version.rb +2 -2
  98. data/lib/minimap2.rb +51 -15
  99. metadata +97 -79
  100. data/lib/minimap2/ffi_helper.rb +0 -53
  101. data/vendor/libminimap2.so +0 -0
@@ -0,0 +1,62 @@
1
+ Q 60 18579866 27 0.000001453 18579866
2
+ Q 59 27087 4 0.000001666 18606953
3
+ Q 58 21435 1 0.000001718 18628388
4
+ Q 57 45663 3 0.000001874 18674051
5
+ Q 56 36031 2 0.000001978 18710082
6
+ Q 55 18499 2 0.000002082 18728581
7
+ Q 54 14754 2 0.000002187 18743335
8
+ Q 53 25541 2 0.000002291 18768876
9
+ Q 52 26397 5 0.000002554 18795273
10
+ Q 51 15090 3 0.000002711 18810363
11
+ Q 50 13425 11 0.000003294 18823788
12
+ Q 49 15175 2 0.000003397 18838963
13
+ Q 48 19407 4 0.000003606 18858370
14
+ Q 47 11538 16 0.000004452 18869908
15
+ Q 46 12558 17 0.000005349 18882466
16
+ Q 45 40362 28 0.000006817 18922828
17
+ Q 44 10465 13 0.000007500 18933293
18
+ Q 43 10098 20 0.000008552 18943391
19
+ Q 42 10682 19 0.000009549 18954073
20
+ Q 41 9823 11 0.000010125 18963896
21
+ Q 40 9685 16 0.000010963 18973581
22
+ Q 39 10273 18 0.000011905 18983854
23
+ Q 38 9515 18 0.000012847 18993369
24
+ Q 37 9474 27 0.000014261 19002843
25
+ Q 36 10430 25 0.000015568 19013273
26
+ Q 35 9241 34 0.000017348 19022514
27
+ Q 34 9162 31 0.000018968 19031676
28
+ Q 33 10164 49 0.000021532 19041840
29
+ Q 32 9152 55 0.000024408 19050992
30
+ Q 31 9252 35 0.000026233 19060244
31
+ Q 30 9872 55 0.000029103 19070116
32
+ Q 29 8938 65 0.000032496 19079054
33
+ Q 28 8951 73 0.000036306 19088005
34
+ Q 27 9949 95 0.000041261 19097954
35
+ Q 26 9784 97 0.000046316 19107738
36
+ Q 25 10126 97 0.000051366 19117864
37
+ Q 24 11260 123 0.000057765 19129124
38
+ Q 23 10047 114 0.000063691 19139171
39
+ Q 22 9661 123 0.000070083 19148832
40
+ Q 21 10339 168 0.000078813 19159171
41
+ Q 20 17928 193 0.000088804 19177099
42
+ Q 19 9842 193 0.000098817 19186941
43
+ Q 18 14737 247 0.000111605 19201678
44
+ Q 17 10218 238 0.000123934 19211896
45
+ Q 16 10271 242 0.000136457 19222167
46
+ Q 15 12241 333 0.000153683 19234408
47
+ Q 14 9189 336 0.000171070 19243597
48
+ Q 13 9493 515 0.000197734 19253090
49
+ Q 12 11502 743 0.000236185 19264592
50
+ Q 11 8211 507 0.000262390 19272803
51
+ Q 10 9133 606 0.000293695 19281936
52
+ Q 9 10014 931 0.000341801 19291950
53
+ Q 8 8436 698 0.000377816 19300386
54
+ Q 7 8443 705 0.000414163 19308829
55
+ Q 6 10203 944 0.000462808 19319032
56
+ Q 5 6936 756 0.000501760 19325968
57
+ Q 4 6732 843 0.000545190 19332700
58
+ Q 3 8215 1104 0.000602040 19340915
59
+ Q 2 21201 5440 0.000882342 19362116
60
+ Q 1 82328 22186 0.002019600 19444444
61
+ Q 0 553853 371953 0.020562901 19998297
62
+ U 1703
@@ -0,0 +1,240 @@
1
+ \documentclass{bioinfo}
2
+ \copyrightyear{2021}
3
+ \pubyear{2021}
4
+
5
+ \usepackage{graphicx}
6
+ \usepackage{hyperref}
7
+ \usepackage{url}
8
+ \usepackage{amsmath}
9
+ \usepackage[ruled,vlined]{algorithm2e}
10
+ \newcommand\mycommfont[1]{\footnotesize\rmfamily{\it #1}}
11
+ \SetCommentSty{mycommfont}
12
+ \SetKwComment{Comment}{$\triangleright$\ }{}
13
+
14
+ \usepackage{natbib}
15
+ \bibliographystyle{apalike}
16
+
17
+ \DeclareMathOperator*{\argmax}{argmax}
18
+
19
+ \begin{document}
20
+ \firstpage{1}
21
+
22
+ \title[Improvements to minimap2]{New strategies to improve minimap2 alignment accuracy}
23
+ \author[Li]{Heng Li$^{1,2}$}
24
+ \address{$^1$Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA,
25
+ $^2$Harvard Medical School, 10 Shattuck St, Boston, MA 02215, USA}
26
+
27
+ \maketitle
28
+
29
+ \begin{abstract}
30
+
31
+ \section{Summary:} We present several recent improvements to minimap2, a
32
+ versatile pairwise aligner for nucleotide sequences. Now minimap2 v2.22 can
33
+ more accurately map long reads to highly repetitive regions and align through
34
+ insertions or deletions up to 100kb by default, addressing major weaknesses in
35
+ minimap2 v2.18 or earlier.
36
+
37
+ \section{Availability and implementation:}
38
+ \href{https://github.com/lh3/minimap2}{https://github.com/lh3/minimap2}
39
+
40
+ \section{Contact:} hli@ds.dfci.harvard.edu
41
+ \end{abstract}
42
+
43
+ \section{Introduction}
44
+ Minimap2~\citep{Li:2018ab} is widely used for mapping long sequence
45
+ reads and assembly contigs. \citet{Jain:2020aa} found minimap2 v2.18 or earlier occasionally
46
+ misaligned reads from highly repetitive regions as minimap2 ignored seeds of
47
+ high occurrence. They also noticed minimap2 may misplace reads with structural
48
+ variations (SVs) in such regions~\citep{Jain2020.11.01.363887}. These
49
+ misalignments have become a pressing issue with the advent of
50
+ telomere-to-telomere human assembly~\citep{Miga:2020aa}. Meanwhile, old minimap2
51
+ was unable to efficiently align long insertions/deletions (INDELs) and often
52
+ broke an alignment around variable-number tandem repeats (VNTRs). This has
53
+ inspired new chaining algorithms~\citep{Li:2020aa,Ren:2021aa} which are not
54
+ integrated into minimap2. Here we will describe recent efforts implemented
55
+ in v2.19 through v2.22 to improve mapping results.
56
+
57
+ \begin{methods}
58
+ \section{Methods}
59
+
60
+ \subsection{Rescuing high-occurrence $k$-mers}\label{sec:high-occ}
61
+ Minimap2 keeps all $k$-mer minimizers~\citep{Roberts:2004fv} during indexing. Its original
62
+ implementation only selected low-occurrence minimizers during mapping. The
63
+ cutoff is a few hundred for mapping long reads against a human genome. If a
64
+ read harbors only a few or even no low-occurrence minimizers, it will fail
65
+ chaining due to insufficient anchors.
66
+
67
+ To resolve this issue, we implemented a new heuristic to add additional
68
+ minimizers. Suppose we are looking at two adjacent low-occurrence $k$-mers
68
+ located at positions $x_1$ and $x_2$, respectively. If $|x_1-x_2|\ge L$,
70
+ minimap2 v2.22 additionally selects $\lfloor|x_1-x_2|/L\rfloor$ minimizers
71
+ of the lowest occurrence among minimizers between $x_1$ and $x_2$. Here
72
+ parameter $L$ controls the frequency of sampling. It defaults to 500.
73
+ This strategy adds necessary anchors at the cost of increasing total alignment
74
+ time by a few percent on real data.
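
The rescue heuristic described above can be sketched in Python. This is an illustration under stated assumptions, not the minimap2 implementation: the function name `rescue_minimizers`, the `(position, occurrence)` input layout and the `max_occ` cutoff argument are hypothetical, and the sketch only considers gaps between two adjacent low-occurrence minimizers, as in the text.

```python
def rescue_minimizers(minimizers, max_occ, L=500):
    """Select low-occurrence minimizers plus rescued ones in large gaps.

    minimizers: list of (position, occurrence) tuples sorted by position.
    max_occ:    occurrence cutoff (hypothetical parameter name).
    L:          sampling distance; the text gives 500 as the default.
    Returns the sorted positions of all selected minimizers.
    """
    # Indices of minimizers that pass the occurrence cutoff.
    low = [i for i, (_, occ) in enumerate(minimizers) if occ <= max_occ]
    selected = set(low)
    # Walk over adjacent pairs of low-occurrence minimizers.
    for a, b in zip(low, low[1:]):
        gap = minimizers[b][0] - minimizers[a][0]
        if gap >= L:
            n_extra = gap // L  # floor(|x1 - x2| / L)
            # Candidates strictly between the pair, lowest occurrence first.
            between = sorted(range(a + 1, b), key=lambda i: minimizers[i][1])
            selected.update(between[:n_extra])
    return sorted(minimizers[i][0] for i in selected)
```

For example, with a cutoff of 100 and two low-occurrence minimizers 1100bp apart, the two least frequent minimizers in between are added as extra anchors.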
75
+
76
+ \subsection{Aligning through longer INDELs}
77
+ The original minimap2 may fail to align long INDELs due to its chaining
78
+ heuristics. Briefly, minimap2 applies dynamic programming (DP) to chain
79
+ minimizer anchors. This is a quadratic algorithm, slow for chaining
80
+ contigs. For acceptable performance, the original minimap2 uses a 500bp band by
81
+ default, which means a gap longer than 500bp will stop chaining.
82
+ To align through longer gaps, older minimap2 implemented a long-join heuristic as follows.
83
+ If there is an INDEL longer than 500bp and the two chains around the INDEL
84
+ have no overlaps on either the query or the reference sequence, minimap2 may
85
+ join the two short chains later.
86
+ This heuristic may fail around VNTRs because short chains
87
+ often have overlaps in VNTRs. More subtly, minimap2 may escape the inner DP
88
+ loop early, again for performance, if the chaining result is not improved for
89
+ 50 iterations. When there is a copy number change in a long segmental
90
+ duplication, the early escape may break around the event even if users
91
+ specify a large band.
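
A toy version of banded DP chaining with an early escape may help make the two failure modes above concrete. It is a sketch under simplifying assumptions: the scoring (a flat per-base gap cost `gap_cost`) is not minimap2's actual chaining score, the band is applied to the difference between reference and query distances, and all names (`chain_anchors`, `max_stale`) are hypothetical.

```python
def chain_anchors(anchors, k=15, band=500, max_stale=50, gap_cost=0.01):
    """Chain anchors with a quadratic DP, a band and an early escape.

    anchors: list of (ref_pos, query_pos) sorted by ref_pos.
    Returns (best_score, chain) where chain lists anchor indices in order.
    """
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [float(k)] * n  # a chain of one anchor scores k
    prev = [-1] * n
    for i in range(n):
        xi, yi = anchors[i]
        stale = 0  # consecutive predecessors that did not improve score[i]
        for j in range(i - 1, -1, -1):
            xj, yj = anchors[j]
            dx, dy = xi - xj, yi - yj
            improved = False
            # Only collinear predecessors whose implied gap fits in the band.
            if dx > 0 and dy > 0 and abs(dx - dy) <= band:
                cand = score[j] + min(dx, dy, k) - gap_cost * abs(dx - dy)
                if cand > score[i]:
                    score[i], prev[i] = cand, j
                    improved = True
            if improved:
                stale = 0
            else:
                stale += 1
                if stale >= max_stale:
                    break  # early escape from the inner loop
    best = max(range(n), key=score.__getitem__)
    chain, i = [], best
    while i >= 0:
        chain.append(i)
        i = prev[i]
    return score[best], chain[::-1]
```

With a 500bp band, an anchor that sits behind a 1100bp deletion is left out of the chain; widening the band lets the same DP bridge it, at the cost of more work per anchor.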
92
+
93
+ In minigraph~\citep{Li:2020aa}, we developed a new chaining algorithm that
94
+ finds up to 1kb INDELs with DP-based chaining and goes through longer INDELs with a
95
+ subquadratic algorithm~\citep{DBLP:conf/wabi/AbouelhodaO03}. We ported the same
96
+ algorithm to minimap2 for contig mapping. For long-read mapping, the minigraph
97
+ algorithm is slower. Minimap2 v2.22 still uses the DP-based algorithm to
98
+ find short chains and then invokes the minigraph algorithm to rechain anchors in
99
+ these short chains. The rechaining step achieves the same goal as long-join
100
+ but is more reliable because it can resolve overlaps between short chains. The old
101
+ long-join heuristic has since been removed.
102
+
103
+ \subsection{Properly mapping long reads with SVs}
104
+ The original minimap2 ranks an alignment by its Smith-Waterman score and
105
+ outputs the best scoring alignment. However, when there are SVs on the read,
106
+ the best scoring alignment is sometimes not the correct alignment.
107
+ \citet{Jain2020.11.01.363887} resolved this dilemma by altering the mapping
108
+ algorithm.
109
+
110
+ In our view, this problem is rooted in inappropriate scoring: the affine-gap penalty
111
+ over-penalizes a long INDEL that was often evolutionarily created in one event.
112
+ We should not penalize an SV by a function linear in the SV length. Minimap2 v2.22 instead rescores
113
+ an alignment with the following scoring function. Suppose an alignment consists
114
+ of $M$ matching bases, $N$ substitutions and $G$ gap opens. We empirically
115
+ score the alignment with
116
+ $$
117
+ S=M-\frac{N+G}{2d}-\sum_{i=1}^G\log_2(1+g_i)
118
+ $$
119
+ where $g_i\ge1$ is the length of the $i$-th gap and
120
+ $$
121
+ d=\max\left\{\frac{N+G}{M+N+G},0.02\right\}
122
+ $$
123
+ It approximates per-base sequence divergence except with the smallest value set
124
+ to 2\%. As an analogy to affine-gap scoring, the matching score in our scheme
125
+ is 1, the mismatch and gap open penalties are both $1/(2d)$ and the gap extension
126
+ penalty is a logarithmic function of the gap length~\citep{Gu:1995wt}. Our scoring gives a long SV
127
+ a much milder penalty. In terms of time complexity, scoring an alignment is
128
+ linear in the length of the alignment. The time spent on rescoring is negligible in
129
+ practice.
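
The rescoring function can be written out directly from the two formulas above. This is a minimal sketch: the function name `rescore` and its argument layout are assumptions for illustration, not minimap2's internal API.

```python
import math


def rescore(n_match, n_sub, gap_lengths, min_div=0.02):
    """Compute S = M - (N + G) / (2d) - sum_i log2(1 + g_i).

    n_match:     M, the number of matching bases.
    n_sub:       N, the number of substitutions.
    gap_lengths: list of gap lengths g_i (each >= 1); G is its length.
    min_div:     floor on the divergence d (2% in the text).
    """
    n_gap = len(gap_lengths)
    total = n_match + n_sub + n_gap
    if total == 0:
        return 0.0  # empty alignment; convention chosen for this sketch
    d = max((n_sub + n_gap) / total, min_div)
    log_gap_penalty = sum(math.log2(1 + g) for g in gap_lengths)
    return n_match - (n_sub + n_gap) / (2 * d) - log_gap_penalty
```

With $M=9800$, $N=100$ and three gaps of lengths 1, 3 and 1023, the divergence is floored at 0.02, the mismatch and gap-open term is $103/0.04=2575$ and the logarithmic term is $1+2+10=13$, so the 1023bp gap contributes only 10 to the penalty.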
130
+
131
+ %If we assume sequences evolve under a duplication-mutation model, we may have a
132
+ %better way to choose the best alignment. If a long read can be mapped to $n$
133
+ %loci, we can take the read as the template and build a
134
+ %pseudo-multi-sequence-alignment (pMSA) of $n+1$ sequences. In this pMSA, we say
135
+ %a site on the read is informative if the $n$ reference subsequences differ at
136
+ %the position.
137
+
138
+ \end{methods}
139
+
140
+ \section{Results}
141
+
142
+ \begin{table}
143
+ \processtable{Evaluation of minimap2 v2.22}
144
+ {\footnotesize\label{tab:1}\begin{tabular}{p{4.2cm}rrrr}
145
+ \toprule
146
+ $[$Benchmark$]$ Metric & v2.22 & v2.18 & Winno & lra \\
147
+ \midrule
148
+ $[$sim-map$]$ \% mapped reads at Q10 & 97.9 & 97.6 & {\bf 99.0}& 97.3 \\
149
+ $[$sim-map$]$ err. rate at Q10 (phredQ) & {\bf 52} & {\bf 52} & 38 & 24 \\
150
+ $[$winno-cmp$]$ rate of diff. (phredQ) & {\bf 41} & 37 & truth & 18 \\
151
+ $[$winno-cmp$]$ CPU time (hour) & {\bf 5.0} & 5.3 & 71.8 & 13.1 \\
152
+ $[$winno-cmp$]$ peak RAM (GB) & 17.1 & 14.4 & {\bf 9.6} & 12.4 \\
153
+ $[$sim-sv$]$ \% false negative rate & {\bf 0.5} & 2.0 & {\bf 0.5} & 1.4 \\
154
+ $[$sim-sv$]$ \% false discovery rate & {\bf 0.0} & 0.1 & {\bf 0.0} & 0.1 \\
155
+ $[$real-sv-1k$]$ \% false negative rate & {\bf 7.3} & 20.0 & 13.0 & N/A \\
156
+ $[$real-sv-1k$]$ \% false discovery rate & 2.7 & {\bf 2.4} & 2.7 & N/A \\
157
+ \botrule
158
+ \end{tabular}}
159
+ {In $[$sim-map$]$, 152,713 reads were simulated from the CHM13 telomere-to-telomere assembly v1.1
160
+ (AC: GCA\_009914755.3) with pbsim2~\citep{Ono:2021aa}: ``pbsim2 -{}-hmm\_model R94.model -{}-length-min
161
+ 5000 -{}-length-mean 20000 -{}-accuracy-mean 0.95''. Alignments of mapping quality
162
+ 10 or higher were evaluated by ``paftools.js mapeval''. The mapping error rate
163
+ is measured in the phred scale: if the error rate is $e$, $-10\log_{10}e$ is
164
+ reported in the table. In $[$winno-cmp$]$, 1.39 million CHM13 HiFi reads from
165
+ SRR11292121 were mapped against the same CHM13 assembly. 99.3\% of them were mapped by Winnowmap2
166
+ at mapping quality 10 or higher and were taken as ground truth to evaluate
167
+ minimap2 and lra with ``paftools.js pafcmp''. $[$sim-sv$]$ simulated 1,000
168
+ 50bp to 1000bp INDELs from chr8 in CHM13 using SURVIVOR~\citep{Jeffares:2017aa} and simulated Nanopore
169
+ reads at 30-fold coverage with the same pbsim2 command line. SVs were called with
170
+ ``sniffles -q 10''~\citep{Sedlazeck:2018ab} and compared to the simulated truth with ``SURVIVOR eval
171
+ call.vcf truth.bed 50''. In $[$real-sv-1k$]$, small and long variants were
172
+ called by dipcall-0.3~\citep{Li:2018aa} for HG002 assemblies (AC: GCA\_018852605.1 and
173
+ GCA\_018852615.1) and compared to the GIAB truth~\citep{Zook:2020aa} using ``truvari -r 2000 -s
174
+ 1000 -S 400 -{}-multimatch -{}-passonly'' which sets the minimum INDEL size to 1kb in evaluation. }
175
+ \end{table}
176
+
177
+ We evaluated minimap2 v2.22 along with v2.18, Winnowmap2 v2.03 and lra v1.3.2
178
+ (Table~\ref{tab:1}), using the default setting of each mapper according to the input data types.
179
+ Both versions of minimap2 achieved high mapping accuracy on
180
+ simulated Nanopore reads (sim-map). Winnowmap2 aligned more reads at mapping
181
+ quality 10 or higher (mapQ10). However, it may occasionally assign a high mapping
182
+ quality to a read with multiple identical best alignments. This reduced its
183
+ mapping accuracy.
184
+
185
+ Lacking ground truth for real data, we took Winnowmap2 mapping as ground
186
+ truth to evaluate other mappers (winno-cmp in Table~\ref{tab:1}). Out of 1,378,092 reads with mapQ10
187
+ alignments by Winnowmap2, minimap2 v2.22 could map all of them. 118 reads, less
188
+ than 0.01\% of all reads, were mapped differently by v2.22. 51 of them have
189
+ multiple identical best alignments. We believe these are more likely to be
190
+ Winnowmap2 errors. Most of the remaining 67 (=118-51) reads have multiple
191
+ highly similar but not identical alignments.
192
+ Minimap2 v2.18 is less consistent with 275 differences including 30 unmapped
193
+ reads mappable by both Winnowmap2 and v2.22.
194
+
195
+ For the minimizer rescuing parameter $L$ in Section~\ref{sec:high-occ},
196
+ we set its default to 500 such that v2.22 has comparable performance to v2.18 given simulated PacBio and Nanopore human reads.
197
+ To see the effect of this parameter on real data, we tried several different $L$ values.
198
+ v2.22 gave 99 mapping differences at $L=200$,
199
+ 118 at $L=500$ (default), 167 at $L=750$ and 224 differences at $L=1000$ in comparison to Winnowmap2.
200
+ $L=200$ is 28\% slower than the default while $L=1000$ is 9\% faster.
201
+ Changing the default minimizer window size (option ``-w'')
202
+ and the initial minimizer occurrence cutoff (option ``-f'')
203
+ also affects performance and accuracy to a similar magnitude.
204
+
205
+ The two benchmarks above only evaluate read mappings when there are no variations between the reads and the reference.
206
+ To measure the mapping accuracy in the presence of SVs (sim-sv), we reproduced
207
+ the results of~\citet{Jain2020.11.01.363887}. Minimap2 v2.22 is as good as
208
+ Winnowmap2 now. Note that we set the Sniffles mapping quality
209
+ threshold to 10 to be consistent with the benchmarks above. If we used the
210
+ default threshold 20, v2.22 would miss five additional SVs (accounting for
211
+ 0.5\% of simulated SVs). For four out of these five missing SVs, minimap2 v2.22
212
+ mapped more variant reads than Winnowmap2. Sniffles did not call these SVs
213
+ because minimap2 tended to give them conservative mapping quality. It is worth
214
+ noting that the simulation here only considers a simple scenario in evolution.
215
+ Non-allelic gene conversions, which happen often in segmental
216
+ duplications~\citep{Harpak:2017aa}, would obscure the optimal mapping
217
+ strategies. How much such simple SV simulation informs real-world SV calling
218
+ remains a question.
219
+
220
+ To see if minimap2 v2.22 could improve long INDEL alignment, we ran dipcall on
221
+ contig-to-reference alignments and focused on INDELs longer than 1kb
222
+ (real-sv-1k). v2.22 is more sensitive at comparable specificity, confirming its
223
+ advantage in more contiguous alignment. We could not get dipcall to work well with lra,
224
+ so we did not report the numbers.
225
+
226
+ Minimap2 spends most computing time on base alignment. As recent improvements
227
+ in v2.22 incur little additional computing and do not change the base alignment
228
+ algorithm, the new version has similar performance to older versions. It is
229
+ consistently faster than Winnowmap2 by several times. Sometimes simple
230
+ heuristics can be as effective as more sophisticated yet slower solutions.
231
+
232
+ \section*{Acknowledgements}
233
+ We thank Arang Rhie and Chirag Jain for providing motivating examples for which
234
+ older minimap2 underperforms.
235
+
236
+ \paragraph{Funding\textcolon} This work is funded by NHGRI grant R01HG010040.
237
+
238
+ \bibliography{minimap2}
239
+
240
+ \end{document}
@@ -0,0 +1,12 @@
1
+ Q 60 32084 0 0.000000000 32084
2
+ Q 24 318 2 0.000061725 32402
3
+ Q 11 98 2 0.000123077 32500
4
+ Q 8 37 2 0.000184405 32537
5
+ Q 7 37 3 0.000276294 32574
6
+ Q 6 40 3 0.000367940 32614
7
+ Q 5 34 2 0.000428816 32648
8
+ Q 4 37 5 0.000581306 32685
9
+ Q 3 28 6 0.000764222 32713
10
+ Q 2 38 6 0.000946536 32751
11
+ Q 1 50 21 0.001585318 32801
12
+ Q 0 286 150 0.006105117 33087
@@ -0,0 +1,13 @@
1
+ Q 60 32477 0 0.000000000 32477
2
+ Q 22 16 1 0.000030776 32493
3
+ Q 21 44 1 0.000061468 32537
4
+ Q 19 73 1 0.000091996 32610
5
+ Q 14 66 1 0.000122414 32676
6
+ Q 10 26 3 0.000214054 32702
7
+ Q 8 14 1 0.000244529 32716
8
+ Q 7 13 2 0.000305539 32729
9
+ Q 6 47 1 0.000335611 32776
10
+ Q 3 10 1 0.000366010 32786
11
+ Q 2 20 2 0.000426751 32806
12
+ Q 1 248 94 0.003267381 33054
13
+ Q 0 31 17 0.003778147 33085
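
The `.eval` tables in this diff can be read programmatically. The sketch below assumes the columns are: the literal `Q`, a mapping quality, the number of reads at that quality, the number of those mapped wrongly, the cumulative error rate and the cumulative read count. This layout is inferred from the numbers themselves (for example, 7/32702 gives 0.000214054 at Q10 in the last table), not from documentation. The helper names are hypothetical; the phred conversion follows the $-10\log_{10}e$ definition in the table caption of the paper source.

```python
import math


def parse_eval(lines):
    """Parse 'Q mapq n_reads n_wrong cum_err cum_reads' rows.

    A leading '+' diff marker is tolerated; other lines are skipped.
    Returns a list of (mapq, n_reads, n_wrong, cum_err, cum_reads) tuples.
    """
    rows = []
    for line in lines:
        fields = line.lstrip("+ ").split()
        if len(fields) == 6 and fields[0] == "Q":
            rows.append((int(fields[1]), int(fields[2]), int(fields[3]),
                         float(fields[4]), int(fields[5])))
    return rows


def counts_at(rows, min_mapq):
    """Return (reads, wrong) summed over rows with mapq >= min_mapq."""
    kept = [r for r in rows if r[0] >= min_mapq]
    return sum(r[1] for r in kept), sum(r[2] for r in kept)


def phred(error_rate):
    """Convert an error rate e into the phred scale, -10 * log10(e)."""
    return -10 * math.log10(error_rate)
```

Calling `counts_at(rows, 10)` on the last table and passing `wrong / reads` to `phred` gives the error rate at mapping quality 10 or higher on the phred scale.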