minimap2 0.2.26.1 → 0.2.28.0

Sign up to get free protection for your applications and to get access to all the features.
@@ -1,4 +1,4 @@
1
- .TH minimap2 1 "29 April 2023" "minimap2-2.26 (r1175)" "Bioinformatics tools"
1
+ .TH minimap2 1 "12 March 2024" "minimap2-2.28 (r1209)" "Bioinformatics tools"
2
2
  .SH NAME
3
3
  .PP
4
4
  minimap2 - mapping and alignment between collections of DNA sequences
@@ -268,6 +268,11 @@ or more of the shorter chain [0.5]
268
268
  Use the minigraph chaining algorithm [no]. The minigraph algorithm is better
269
269
  for aligning contigs through long INDELs.
270
270
  .TP
271
+ .BI --rmq-inner \ NUM
272
+ Apply full dynamic programming for anchors within distance
273
+ .I NUM
274
+ [1000].
275
+ .TP
271
276
  .B --hard-mask-level
272
277
  Honor option
273
278
  .B -M
@@ -343,6 +348,10 @@ Matching score [2]
343
348
  .BI -B \ INT
344
349
  Mismatching penalty [4]
345
350
  .TP
351
+ .BI -b \ INT
352
+ Mismatching penalty for transitions [same as
353
+ .BR -B ].
354
+ .TP
346
355
  .BI -O \ INT1[,INT2]
347
356
  Gap open penalty [4,24]. If
348
357
  .I INT2
@@ -356,10 +365,19 @@ costs
356
365
  .RI min{ O1 + k * E1 , O2 + k * E2 }.
357
366
  In the splice mode, the second gap penalties are not used.
358
367
  .TP
368
+ .BI -J \ INT
369
+ Splice model [1]. 0 for the original minimap2 splice model that always penalizes non-GT-AG splicing;
370
+ 1 for the miniprot model that considers non-GT-AG. Option
371
+ .B -C
372
+ has no effect with the default
373
+ .BR -J1 .
374
+ .BR -J0 .
375
+ .TP
359
376
  .BI -C \ INT
360
377
  Cost for a non-canonical GT-AG splicing (effective with
361
- .BR --splice )
362
- [0]
378
+ .B --splice
379
+ .BR -J0 )
380
+ [0].
363
381
  .TP
364
382
  .BI -z \ INT1[,INT2]
365
383
  Truncate an alignment if the running alignment score drops too quickly along
@@ -450,7 +468,7 @@ Set 0 to disable [100m].
450
468
  .BI --cap-kalloc \ NUM
451
469
  Free thread-local kalloc memory reservoir if after the alignment the size of the reservoir above
452
470
  .IR NUM .
453
- Set 0 to disable [0].
471
+ Set 0 to disable [500m].
454
472
  .SS Input/output options
455
473
  .TP 10
456
474
  .B -a
@@ -506,6 +524,9 @@ Output =/X CIGAR operators for sequence match/mismatch.
506
524
  .B -Y
507
525
  In SAM output, use soft clipping for supplementary alignments.
508
526
  .TP
527
+ .B --secondary-seq
528
+ In SAM output, show query sequences for secondary alignments.
529
+ .TP
509
530
  .BI --seed \ INT
510
531
  Integer seed for randomizing equally best hits. Minimap2 hashes
511
532
  .I INT
@@ -566,15 +587,43 @@ are:
566
587
  Align noisy long reads of ~10% error rate to a reference genome. This is the
567
588
  default mode.
568
589
  .TP
590
+ .B lr:hq
591
+ Align accurate long reads (error rate <1%) to a reference genome
592
+ .RB ( -k19
593
+ .B -w19 -U50,500
594
+ .BR -g10k ).
595
+ This was recommended by ONT developers for recent Nanopore reads
596
+ produced with chemistry v14 that can reach ~99% in accuracy.
597
+ It was shown to work better for accurate Nanopore reads
598
+ than
599
+ .BR map-hifi .
600
+ .TP
569
601
  .B map-hifi
570
602
  Align PacBio high-fidelity (HiFi) reads to a reference genome
571
- .RB ( -k19
572
- .B -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1
603
+ .RB ( -xlr:hq
604
+ .B -A1 -B4 -O6,26 -E2,1
573
605
  .BR -s200 ).
606
+ It differs from
607
+ .B lr:hq
608
+ only in scoring. It has not been tested whether
609
+ .B lr:hq
610
+ would work better for PacBio HiFi reads.
574
611
  .TP
575
612
  .B map-pb
576
613
  Align older PacBio continuous long (CLR) reads to a reference genome
577
614
  .RB ( -Hk19 ).
615
+ Note that this data type is effectively deprecated by HiFi.
616
+ Unless you work on very old data, you probably want to use
617
+ .B map-hifi
618
+ or
619
+ .BR lr:hq .
620
+ .TP
621
+ .B map-iclr
622
+ Align Illumina Complete Long Reads (ICLR) to a reference genome
623
+ .RB ( -k19
624
+ .B -B6 -b4
625
+ .BR -O10,50 ).
626
+ This was recommended by Illumina developers.
578
627
  .TP
579
628
  .B asm5
580
629
  Long assembly to reference mapping
@@ -582,21 +631,21 @@ Long assembly to reference mapping
582
631
  .B -w19 -U50,500 --rmq -r1k,100k -g10k -A1 -B19 -O39,81 -E3,1 -s200 -z200
583
632
  .BR -N50 ).
584
633
  Typically, the alignment will not extend to regions with 5% or higher sequence
585
- divergence. Only use this preset if the average divergence is far below 5%.
634
+ divergence. Use this preset if the average divergence is not much higher than 0.1%.
586
635
  .TP
587
636
  .B asm10
588
637
  Long assembly to reference mapping
589
638
  .RB ( -k19
590
639
  .B -w19 -U50,500 --rmq -r1k,100k -g10k -A1 -B9 -O16,41 -E2,1 -s200 -z200
591
640
  .BR -N50 ).
592
- Up to 10% sequence divergence.
641
+ Use this if the average divergence is around 1%.
593
642
  .TP
594
643
  .B asm20
595
644
  Long assembly to reference mapping
596
645
  .RB ( -k19
597
646
  .B -w10 -U50,500 --rmq -r1k,100k -g10k -A1 -B4 -O6,26 -E2,1 -s200 -z200
598
647
  .BR -N50 ).
599
- Up to 20% sequence divergence.
648
+ Use this if the average divergence is around several percent.
600
649
  .TP
601
650
  .B splice
602
651
  Long-read spliced alignment
@@ -612,13 +661,13 @@ costs are different during chaining; 4) the computation of the
612
661
  tag ignores introns to demote hits to pseudogenes.
613
662
  .TP
614
663
  .B splice:hq
615
- Long-read splice alignment for PacBio CCS reads
664
+ Spliced alignment for accurate long RNA-seq reads such as PacBio iso-seq
616
665
  .RB ( -xsplice
617
666
  .B -C5 -O6,24
618
667
  .BR -B4 ).
619
668
  .TP
620
669
  .B sr
621
- Short single-end reads without splicing
670
+ Short-read alignment without splicing
622
671
  .RB ( -k21
623
672
  .B -w11 --sr --frag=yes -A2 -B8 -O12,32 -E2,1 -b0 -r100 -p.5 -N20 -f1000,5000 -n2 -m25
624
673
  .B -s40 -g100 -2K50m --heap-sort=yes
@@ -1,6 +1,6 @@
1
1
  #!/usr/bin/env k8
2
2
 
3
- var paftools_version = '2.26-r1175';
3
+ var paftools_version = '2.28-r1209';
4
4
 
5
5
  /*****************************
6
6
  ***** Library functions *****
@@ -133,26 +133,50 @@ Interval.find_ovlp = function(a, st, en)
133
133
 
134
134
  function fasta_read(fn)
135
135
  {
136
- var h = {}, gt = '>'.charCodeAt(0);
136
+ var h = {}, seqlen = [];
137
+ var buf = new Bytes();
137
138
  var file = fn == '-'? new File() : new File(fn);
138
- var buf = new Bytes(), seq = null, name = null, seqlen = [];
139
- while (file.readline(buf) >= 0) {
140
- if (buf[0] == gt) {
141
- if (seq != null && name != null) {
142
- seqlen.push([name, seq.length]);
143
- h[name] = seq;
144
- name = seq = null;
145
- }
146
- var m, line = buf.toString();
147
- if ((m = /^>(\S+)/.exec(line)) != null) {
148
- name = m[1];
149
- seq = new Bytes();
150
- }
151
- } else seq.set(buf);
152
- }
153
- if (seq != null && name != null) {
154
- seqlen.push([name, seq.length]);
155
- h[name] = seq;
139
+ if (typeof k8_version == "undefined") { // for k8-0.x
140
+ var seq = null, name = null, gt = '>'.charCodeAt(0);
141
+ while (file.readline(buf) >= 0) {
142
+ if (buf[0] == gt) {
143
+ if (seq != null && name != null) {
144
+ seqlen.push([name, seq.length]);
145
+ h[name] = seq;
146
+ name = seq = null;
147
+ }
148
+ var m, line = buf.toString();
149
+ if ((m = /^>(\S+)/.exec(line)) != null) {
150
+ name = m[1];
151
+ seq = new Bytes();
152
+ }
153
+ } else seq.set(buf);
154
+ }
155
+ if (seq != null && name != null) {
156
+ seqlen.push([name, seq.length]);
157
+ h[name] = seq;
158
+ }
159
+ } else { // for k8-1.x
160
+ var seq = null, name = null;
161
+ while (file.readline(buf) >= 0) {
162
+ var line = buf.toString();
163
+ if (line[0] == ">") {
164
+ if (seq != null && name != null) {
165
+ seqlen.push([name, seq.length]);
166
+ h[name] = new Uint8Array(seq.buffer);
167
+ name = seq = null;
168
+ }
169
+ var m;
170
+ if ((m = /^>(\S+)/.exec(line)) != null) {
171
+ name = m[1];
172
+ seq = new Bytes();
173
+ }
174
+ } else seq.set(line);
175
+ }
176
+ if (seq != null && name != null) {
177
+ seqlen.push([name, seq.length]);
178
+ h[name] = new Uint8Array(seq.buffer);
179
+ }
156
180
  }
157
181
  buf.destroy();
158
182
  file.close();
@@ -161,16 +185,27 @@ function fasta_read(fn)
161
185
 
162
186
  function fasta_free(fa)
163
187
  {
164
- for (var name in fa)
165
- fa[name].destroy();
188
+ if (typeof k8_version == "undefined")
189
+ for (var name in fa)
190
+ fa[name].destroy();
191
+ // FIXME: for k8-1.0, sequences are not freed. This is ok for now but not general.
166
192
  }
167
193
 
168
194
  Bytes.prototype.reverse = function()
169
195
  {
170
- for (var i = 0; i < this.length>>1; ++i) {
171
- var tmp = this[i];
172
- this[i] = this[this.length - i - 1];
173
- this[this.length - i - 1] = tmp;
196
+ if (typeof k8_version === "undefined") { // k8-0.x
197
+ for (var i = 0; i < this.length>>1; ++i) {
198
+ var tmp = this[i];
199
+ this[i] = this[this.length - i - 1];
200
+ this[this.length - i - 1] = tmp;
201
+ }
202
+ } else { // k8-1.x
203
+ var buf = new Uint8Array(this.buffer);
204
+ for (var i = 0; i < buf.length>>1; ++i) {
205
+ var tmp = buf[i];
206
+ buf[i] = buf[buf.length - i - 1];
207
+ buf[buf.length - i - 1] = tmp;
208
+ }
174
209
  }
175
210
  }
176
211
 
@@ -185,13 +220,24 @@ Bytes.prototype.revcomp = function()
185
220
  for (var i = 0; i < s1.length; ++i)
186
221
  Bytes.rctab[s1.charCodeAt(i)] = s2.charCodeAt(i);
187
222
  }
188
- for (var i = 0; i < this.length>>1; ++i) {
189
- var tmp = this[this.length - i - 1];
190
- this[this.length - i - 1] = Bytes.rctab[this[i]];
191
- this[i] = Bytes.rctab[tmp];
223
+ if (typeof k8_version === "undefined") { // k8-0.x
224
+ for (var i = 0; i < this.length>>1; ++i) {
225
+ var tmp = this[this.length - i - 1];
226
+ this[this.length - i - 1] = Bytes.rctab[this[i]];
227
+ this[i] = Bytes.rctab[tmp];
228
+ }
229
+ if (this.length&1)
230
+ this[this.length>>1] = Bytes.rctab[this[this.length>>1]];
231
+ } else { // k8-1.x
232
+ var buf = new Uint8Array(this.buffer);
233
+ for (var i = 0; i < buf.length>>1; ++i) {
234
+ var tmp = buf[buf.length - i - 1];
235
+ buf[buf.length - i - 1] = Bytes.rctab[buf[i]];
236
+ buf[i] = Bytes.rctab[tmp];
237
+ }
238
+ if (buf.length&1)
239
+ buf[buf.length>>1] = Bytes.rctab[buf[buf.length>>1]];
192
240
  }
193
- if (this.length&1)
194
- this[this.length>>1] = Bytes.rctab[this[this.length>>1]];
195
241
  }
196
242
 
197
243
  /********************
@@ -1694,15 +1740,17 @@ function paf_gff2bed(args)
1694
1740
 
1695
1741
  function paf_sam2paf(args)
1696
1742
  {
1697
- var c, pri_only = false, long_cs = false;
1698
- while ((c = getopt(args, "pL")) != null) {
1743
+ var c, pri_only = false, long_cs = false, pri_pri_only = false;
1744
+ while ((c = getopt(args, "pPL")) != null) {
1699
1745
  if (c == 'p') pri_only = true;
1746
+ else if (c == 'P') pri_pri_only = pri_only = true;
1700
1747
  else if (c == 'L') long_cs = true;
1701
1748
  }
1702
1749
  if (args.length == getopt.ind) {
1703
1750
  print("Usage: paftools.js sam2paf [options] <in.sam>");
1704
1751
  print("Options:");
1705
1752
  print(" -p convert primary or supplementary alignments only");
1753
+ print(" -P convert primary alignments only");
1706
1754
  print(" -L output the cs tag in the long form");
1707
1755
  exit(1);
1708
1756
  }
@@ -1729,6 +1777,7 @@ function paf_sam2paf(args)
1729
1777
  throw Error("at line " + lineno + ": inconsistent SEQ and QUAL lengths - " + t[9].length + " != " + t[10].length);
1730
1778
  if (t[2] == '*' || (flag&4) || t[5] == '*') continue;
1731
1779
  if (pri_only && (flag&0x100)) continue;
1780
+ if (pri_pri_only && (flag&0x900)) continue;
1732
1781
  var tlen = ctg_len[t[2]];
1733
1782
  if (tlen == null) throw Error("at line " + lineno + ": can't find the length of contig " + t[2]);
1734
1783
  // find tags
@@ -1841,7 +1890,10 @@ function paf_sam2paf(args)
1841
1890
  // optional tags
1842
1891
  var type = flag&0x100? 'S' : 'P';
1843
1892
  var tags = ["tp:A:" + type];
1844
- if (NM != null) tags.push("mm:i:"+mm);
1893
+ if (NM != null) {
1894
+ tags.push("NM:i:"+NM);
1895
+ tags.push("mm:i:"+mm);
1896
+ }
1845
1897
  tags.push("gn:i:"+(I[1]+D[1]), "go:i:"+(I[0]+D[0]), "cg:Z:" + t[5].replace(/\d+[SH]/g, ''));
1846
1898
  if (cs_str != null) tags.push("cs:Z:" + cs_str);
1847
1899
  else if (cs.length > 0) tags.push("cs:Z:" + cs.join(""));
@@ -2051,7 +2103,7 @@ function paf_mapeval(args)
2051
2103
  warn("Usage: paftools.js mapeval [options] <in.paf>|<in.sam>");
2052
2104
  warn("Options:");
2053
2105
  warn(" -r FLOAT mapping correct if overlap_length/union_length>FLOAT [" + ovlp_ratio + "]");
2054
- warn(" -Q INT print wrong mappings with mapQ>INT [don't print]");
2106
+ warn(" -Q INT print wrong mappings with mapQ>=INT [don't print]");
2055
2107
  warn(" -m INT 0: eval the longest aln only; 1: first aln only; 2: all primary aln [0]");
2056
2108
  exit(1);
2057
2109
  }
@@ -14,6 +14,7 @@
14
14
  #define MM_DBG_PRINT_SEED 0x4
15
15
  #define MM_DBG_PRINT_ALN_SEQ 0x8
16
16
  #define MM_DBG_PRINT_CHAIN 0x10
17
+ #define MM_DBG_SEED_FREQ 0x20
17
18
 
18
19
  #define MM_SEED_LONG_JOIN (1ULL<<40)
19
20
  #define MM_SEED_IGNORE (1ULL<<41)
@@ -79,8 +80,6 @@ int mm_idx_getseq2(const mm_idx_t *mi, int is_rev, uint32_t rid, uint32_t st, ui
79
80
  mm_reg1_t *mm_align_skeleton(void *km, const mm_mapopt_t *opt, const mm_idx_t *mi, int qlen, const char *qstr, int *n_regs_, mm_reg1_t *regs, mm128_t *a);
80
81
  mm_reg1_t *mm_gen_regs(void *km, uint32_t hash, int qlen, int n_u, uint64_t *u, mm128_t *a, int is_qstrand);
81
82
 
82
- mm128_t *mm_chain_dp(int max_dist_x, int max_dist_y, int bw, int max_skip, int max_iter, int min_cnt, int min_sc, float gap_scale,
83
- int is_cdna, int n_segs, int64_t n, mm128_t *a, int *n_u_, uint64_t **_u, void *km);
84
83
  mm128_t *mg_lchain_dp(int max_dist_x, int max_dist_y, int bw, int max_skip, int max_iter, int min_cnt, int min_sc, float chn_pen_gap, float chn_pen_skip,
85
84
  int is_cdna, int n_segs, int64_t n, mm128_t *a, int *n_u_, uint64_t **_u, void *km);
86
85
  mm128_t *mg_lchain_rmq(int max_dist, int max_dist_inner, int bw, int max_chn_skip, int cap_rmq_size, int min_cnt, int min_sc, float chn_pen_gap, float chn_pen_skip,
@@ -45,6 +45,7 @@ void mm_mapopt_init(mm_mapopt_t *opt)
45
45
  opt->alt_drop = 0.15f;
46
46
 
47
47
  opt->a = 2, opt->b = 4, opt->q = 4, opt->e = 2, opt->q2 = 24, opt->e2 = 1;
48
+ opt->transition = 0;
48
49
  opt->sc_ambi = 1;
49
50
  opt->zdrop = 400, opt->zdrop_inv = 200;
50
51
  opt->end_bonus = -1;
@@ -54,7 +55,7 @@ void mm_mapopt_init(mm_mapopt_t *opt)
54
55
  opt->max_clip_ratio = 1.0f;
55
56
  opt->mini_batch_size = 500000000;
56
57
  opt->max_sw_mat = 100000000;
57
- opt->cap_kalloc = 1000000000;
58
+ opt->cap_kalloc = 500000000;
58
59
 
59
60
  opt->rank_min_len = 500;
60
61
  opt->rank_frac = 0.9f;
@@ -90,7 +91,7 @@ int mm_set_opt(const char *preset, mm_idxopt_t *io, mm_mapopt_t *mo)
90
91
  if (preset == 0) {
91
92
  mm_idxopt_init(io);
92
93
  mm_mapopt_init(mo);
93
- } else if (strcmp(preset, "map-ont") == 0) { // this is the same as the default
94
+ } else if (strcmp(preset, "lr") == 0 || strcmp(preset, "map-ont") == 0) { // this is the same as the default
94
95
  } else if (strcmp(preset, "ava-ont") == 0) {
95
96
  io->flag = 0, io->k = 15, io->w = 5;
96
97
  mo->flag |= MM_F_ALL_CHAINS | MM_F_NO_DIAG | MM_F_NO_DUAL | MM_F_NO_LJOIN;
@@ -105,13 +106,30 @@ int mm_set_opt(const char *preset, mm_idxopt_t *io, mm_mapopt_t *mo)
105
106
  mo->min_chain_score = 100, mo->pri_ratio = 0.0f, mo->max_chain_skip = 25;
106
107
  mo->bw_long = mo->bw;
107
108
  mo->occ_dist = 0;
108
- } else if (strcmp(preset, "map-hifi") == 0 || strcmp(preset, "map-ccs") == 0) {
109
+ } else if (strcmp(preset, "lr:hq") == 0 || strcmp(preset, "map-hifi") == 0 || strcmp(preset, "map-ccs") == 0) {
109
110
  io->flag = 0, io->k = 19, io->w = 19;
110
111
  mo->max_gap = 10000;
111
- mo->a = 1, mo->b = 4, mo->q = 6, mo->q2 = 26, mo->e = 2, mo->e2 = 1;
112
- mo->occ_dist = 500;
113
112
  mo->min_mid_occ = 50, mo->max_mid_occ = 500;
114
- mo->min_dp_max = 200;
113
+ if (strcmp(preset, "map-hifi") == 0 || strcmp(preset, "map-ccs") == 0) {
114
+ mo->a = 1, mo->b = 4, mo->q = 6, mo->q2 = 26, mo->e = 2, mo->e2 = 1;
115
+ mo->min_dp_max = 200;
116
+ }
117
+ } else if (strcmp(preset, "lr:hqae") == 0) { // high-quality assembly evaluation
118
+ io->flag = 0, io->k = 25, io->w = 51;
119
+ mo->flag |= MM_F_RMQ;
120
+ mo->min_mid_occ = 50, mo->max_mid_occ = 500;
121
+ mo->rmq_inner_dist = 5000;
122
+ mo->occ_dist = 200;
123
+ mo->best_n = 100;
124
+ mo->chain_gap_scale = 5.0f;
125
+ } else if (strcmp(preset, "map-iclr-prerender") == 0) {
126
+ io->flag = 0, io->k = 15;
127
+ mo->b = 6, mo->transition = 1;
128
+ mo->q = 10, mo->q2 = 50;
129
+ } else if (strcmp(preset, "map-iclr") == 0) {
130
+ io->flag = 0, io->k = 19;
131
+ mo->b = 6, mo->transition = 4;
132
+ mo->q = 10, mo->q2 = 50;
115
133
  } else if (strncmp(preset, "asm", 3) == 0) {
116
134
  io->flag = 0, io->k = 19, io->w = 19;
117
135
  mo->bw = 1000, mo->bw_long = 100000;
@@ -156,7 +174,7 @@ int mm_set_opt(const char *preset, mm_idxopt_t *io, mm_mapopt_t *mo)
156
174
  mo->junc_bonus = 9;
157
175
  mo->zdrop = 200, mo->zdrop_inv = 100; // because mo->a is halved
158
176
  if (strcmp(preset, "splice:hq") == 0)
159
- mo->junc_bonus = 5, mo->b = 4, mo->q = 6, mo->q2 = 24;
177
+ mo->noncan = 5, mo->b = 4, mo->q = 6, mo->q2 = 24;
160
178
  } else return -1;
161
179
  return 0;
162
180
  }
@@ -77,7 +77,9 @@ This constructor accepts the following arguments:
77
77
 
78
78
  * **min_chain_score**: minimum chaing score
79
79
 
80
- * **bw**: chaining and alignment band width
80
+ * **bw**: chaining and alignment band width (initial chaining and extension)
81
+
82
+ * **bw_long**: chaining and alignment band width (RMQ-based rechaining and closing gaps)
81
83
 
82
84
  * **best_n**: max number of alignments to return
83
85
 
@@ -36,6 +36,7 @@ cdef extern from "minimap.h":
36
36
  float alt_drop
37
37
 
38
38
  int a, b, q, e, q2, e2
39
+ int transition
39
40
  int sc_ambi
40
41
  int noncan
41
42
  int junc_bonus
@@ -3,7 +3,7 @@ from libc.stdlib cimport free
3
3
  cimport cmappy
4
4
  import sys
5
5
 
6
- __version__ = '2.26'
6
+ __version__ = '2.28'
7
7
 
8
8
  cmappy.mm_reset_timer()
9
9
 
@@ -96,6 +96,7 @@ cdef class Alignment:
96
96
  a = [str(self._q_st), str(self._q_en), strand, self._ctg, str(self._ctg_len), str(self._r_st), str(self._r_en),
97
97
  str(self._mlen), str(self._blen), str(self._mapq), tp, ts, "cg:Z:" + self.cigar_str]
98
98
  if self._cs != "": a.append("cs:Z:" + self._cs)
99
+ if self._MD != "": a.append("MD:Z:" + self._MD)
99
100
  return "\t".join(a)
100
101
 
101
102
  cdef class ThreadBuffer:
@@ -112,7 +113,7 @@ cdef class Aligner:
112
113
  cdef cmappy.mm_idxopt_t idx_opt
113
114
  cdef cmappy.mm_mapopt_t map_opt
114
115
 
115
- def __cinit__(self, fn_idx_in=None, preset=None, k=None, w=None, min_cnt=None, min_chain_score=None, min_dp_score=None, bw=None, best_n=None, n_threads=3, fn_idx_out=None, max_frag_len=None, extra_flags=None, seq=None, scoring=None):
116
+ def __cinit__(self, fn_idx_in=None, preset=None, k=None, w=None, min_cnt=None, min_chain_score=None, min_dp_score=None, bw=None, bw_long=None, best_n=None, n_threads=3, fn_idx_out=None, max_frag_len=None, extra_flags=None, seq=None, scoring=None):
116
117
  self._idx = NULL
117
118
  cmappy.mm_set_opt(NULL, &self.idx_opt, &self.map_opt) # set the default options
118
119
  if preset is not None:
@@ -125,6 +126,7 @@ cdef class Aligner:
125
126
  if min_chain_score is not None: self.map_opt.min_chain_score = min_chain_score
126
127
  if min_dp_score is not None: self.map_opt.min_dp_max = min_dp_score
127
128
  if bw is not None: self.map_opt.bw = bw
129
+ if bw_long is not None: self.map_opt.bw_long = bw_long
128
130
  if best_n is not None: self.map_opt.best_n = best_n
129
131
  if max_frag_len is not None: self.map_opt.max_frag_len = max_frag_len
130
132
  if extra_flags is not None: self.map_opt.flag |= extra_flags
@@ -5,7 +5,7 @@ import getopt
5
5
  import mappy as mp
6
6
 
7
7
  def main(argv):
8
- opts, args = getopt.getopt(argv[1:], "x:n:m:k:w:r:c")
8
+ opts, args = getopt.getopt(argv[1:], "x:n:m:k:w:r:cM")
9
9
  if len(args) < 2:
10
10
  print("Usage: minimap2.py [options] <ref.fa>|<ref.mmi> <query.fq>")
11
11
  print("Options:")
@@ -16,10 +16,11 @@ def main(argv):
16
16
  print(" -w INT minimizer window length")
17
17
  print(" -r INT band width")
18
18
  print(" -c output the cs tag")
19
+ print(" -M output the MD tag")
19
20
  sys.exit(1)
20
21
 
21
22
  preset = min_cnt = min_sc = k = w = bw = None
22
- out_cs = False
23
+ out_cs = out_MD = False
23
24
  for opt, arg in opts:
24
25
  if opt == '-x': preset = arg
25
26
  elif opt == '-n': min_cnt = int(arg)
@@ -28,11 +29,12 @@ def main(argv):
28
29
  elif opt == '-k': k = int(arg)
29
30
  elif opt == '-w': w = int(arg)
30
31
  elif opt == '-c': out_cs = True
32
+ elif opt == '-M': out_MD = True
31
33
 
32
34
  a = mp.Aligner(args[0], preset=preset, min_cnt=min_cnt, min_chain_score=min_sc, k=k, w=w, bw=bw)
33
35
  if not a: raise Exception("ERROR: failed to load/build index file '{}'".format(args[0]))
34
36
  for name, seq, qual in mp.fastx_read(args[1]): # read one sequence
35
- for h in a.map(seq, cs=out_cs): # traverse hits
37
+ for h in a.map(seq, cs=out_cs, MD=out_MD): # traverse hits
36
38
  print('{}\t{}\t{}'.format(name, len(seq), h))
37
39
 
38
40
  if __name__ == "__main__":
data/ext/minimap2/seed.c CHANGED
@@ -112,7 +112,8 @@ mm_seed_t *mm_collect_matches(void *km, int *_n_m, int qlen, int max_occ, int ma
112
112
  }
113
113
  for (i = 0, n_m = 0, *rep_len = 0, *n_a = 0; i < n_m0; ++i) {
114
114
  mm_seed_t *q = &m[i];
115
- //fprintf(stderr, "X\t%d\t%d\t%d\n", q->q_pos>>1, q->n, q->flt);
115
+ if (mm_dbg_flag & MM_DBG_SEED_FREQ)
116
+ fprintf(stderr, "SF\t%d\t%d\t%d\n", q->q_pos>>1, q->n, q->flt);
116
117
  if (q->flt) {
117
118
  int en = (q->q_pos >> 1) + 1, st = en - q->q_span;
118
119
  if (st > rep_en) {
@@ -23,7 +23,7 @@ def readme():
23
23
 
24
24
  setup(
25
25
  name = 'mappy',
26
- version = '2.26',
26
+ version = '2.28',
27
27
  url = 'https://github.com/lh3/minimap2',
28
28
  description = 'Minimap2 python binding',
29
29
  long_description = readme(),
@@ -21,10 +21,11 @@ module Minimap2
21
21
  # * ava-ont : Nanopore read overlap
22
22
  # @param k [Integer] k-mer length, no larger than 28.
23
23
  # @param w [Integer] minimizer window size, no larger than 255.
24
- # @param min_cnt [Integer] mininum number of minimizers on a chain.
25
- # @param min_chain_score [Integer] minimum chaing score.
24
+ # @param min_cnt [Integer] minimum number of minimizers on a chain.
25
+ # @param min_chain_score [Integer] minimum chain score.
26
26
  # @param min_dp_score
27
- # @param bw [Integer] chaining and alignment band width.
27
+ # @param bw [Integer] chaining and alignment band width. (initial chaining and extension)
28
+ # @param bw_long [Integer] chaining and alignment band width (RMQ-based rechaining and closing gaps)
28
29
  # @param best_n [Integer] max number of alignments to return.
29
30
  # @param n_threads [Integer] number of indexing threads.
30
31
  # @param fn_idx_out [String] name of file to which the index is written.
@@ -47,6 +48,7 @@ module Minimap2
47
48
  min_chain_score: nil,
48
49
  min_dp_score: nil,
49
50
  bw: nil,
51
+ bw_long: nil,
50
52
  best_n: nil,
51
53
  n_threads: 3,
52
54
  fn_idx_out: nil,
@@ -72,6 +74,7 @@ module Minimap2
72
74
  map_opt[:min_chain_score] = min_chain_score if min_chain_score
73
75
  map_opt[:min_dp_max] = min_dp_score if min_dp_score
74
76
  map_opt[:bw] = bw if bw
77
+ map_opt[:bw_long] = bw_long if bw_long
75
78
  map_opt[:best_n] = best_n if best_n
76
79
  map_opt[:max_frag_len] = max_frag_len if max_frag_len
77
80
  map_opt[:flag] |= extra_flags if extra_flags
@@ -23,7 +23,7 @@ module Minimap2
23
23
  # @return [Integer] length of the matching bases in the alignment,
24
24
  # excluding ambiguous base matches.
25
25
  # @!attribute nm
26
- # @return [Integer] number of mismatches, gaps and ambiguous poistions in the alignment.
26
+ # @return [Integer] number of mismatches, gaps and ambiguous positions in the alignment.
27
27
  # @!attribute primary
28
28
  # @return [Integer] if the alignment is primary (typically the best and the first to generate)
29
29
  # @!attribute q_st
@@ -107,6 +107,7 @@ module Minimap2
107
107
  a = [@q_st, @q_en, strand, @ctg, @ctg_len, @r_st, @r_en,
108
108
  @mlen, @blen, @mapq, tp, ts, "cg:Z:#{@cigar_str}"]
109
109
  a << "cs:Z:#{@cs}" if @cs
110
+ a << "MD:Z:#{@md}" if @md
110
111
  a.join("\t")
111
112
  end
112
113
  end
@@ -40,6 +40,7 @@ module Minimap2
40
40
  NO_HASH_NAME = 0x400000000
41
41
  SPLICE_OLD = 0x800000000
42
42
  SECONDARY_SEQ = 0x1000000000 # output SEQ field for seqondary alignments using hard clipping
43
+ OUT_DS = 0x2000000000
43
44
 
44
45
  HPC = 0x1
45
46
  NO_SEQ = 0x2
@@ -109,8 +110,10 @@ module Minimap2
109
110
  :dp_score, :int32, # DP score
110
111
  :dp_max, :int32, # score of the max-scoring segment
111
112
  :dp_max2, :int32, # score of the best alternate mappings
113
+ :dp_max0, :int32, # DP score before mm_update_dp_max() adjustment
112
114
  :n_ambi_trans_strand, :uint32,
113
115
  :n_cigar, :uint32
116
+ # :cigar, :pointer # variable length array (see cigar method below)
114
117
 
115
118
  bit_field :n_ambi_trans_strand,
116
119
  :n_ambi, 30, # number of ambiguous bases
@@ -204,6 +207,7 @@ module Minimap2
204
207
  :e, :int, # gap-ext
205
208
  :q2, :int, # gap-open
206
209
  :e2, :int, # gap-ext
210
+ :transition, :int, # transition mismatch score (A:G, C:T)
207
211
  :sc_ambi, :int, # score when one or both bases are "N"
208
212
  :noncan, :int, # cost of non-canonical splicing sites
209
213
  :junc_bonus, :int,
@@ -223,7 +227,7 @@ module Minimap2
223
227
  :q_occ_frac, :float,
224
228
  :min_mid_occ, :int32,
225
229
  :max_mid_occ, :int32,
226
- :mid_occ, :int32, # ignore seeds with occurrences above this threshold
230
+ :mid_occ, :int32, # ignore seeds with occurrences above this threshold
227
231
  :max_occ, :int32,
228
232
  :max_max_occ, :int32,
229
233
  :occ_dist, :int32,
@@ -15,10 +15,11 @@ module Minimap2
15
15
  private_class_method :mm_set_opt_raw
16
16
 
17
17
  def self.mm_set_opt(preset, io, mo)
18
- ptr = if preset
19
- ::FFI::MemoryPointer.from_string(preset.to_s)
20
- else
18
+ ptr = case preset
19
+ when 0, nil
21
20
  ::FFI::Pointer.new(:int, 0)
21
+ else
22
+ ::FFI::MemoryPointer.from_string(preset.to_s)
22
23
  end
23
24
  mm_set_opt_raw(ptr, io, mo)
24
25
  end
@@ -77,5 +78,17 @@ module Minimap2
77
78
  :mm_gen_md, :mm_gen_MD, # Avoid uppercase letters in method names.
78
79
  [:pointer, :pointer, :pointer, Idx.by_ref, Reg1.by_ref, :string],
79
80
  :int
81
+
82
+ attach_function \
83
+ :mm_mapopt_init,
84
+ [MapOpt.by_ref],
85
+ :void
86
+
87
+ # mmpriv.h
88
+
89
+ attach_function \
90
+ :mm_idxopt_init,
91
+ [IdxOpt.by_ref],
92
+ :void
80
93
  end
81
94
  end
data/lib/minimap2/ffi.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  # bit fields
4
4
  require "ffi/bit_struct"
5
+
5
6
  module Minimap2
6
7
  # Native APIs
7
8
  module FFI