minimap2 0.2.26.1 → 0.2.28.0

@@ -1,4 +1,4 @@
1
- .TH minimap2 1 "29 April 2023" "minimap2-2.26 (r1175)" "Bioinformatics tools"
1
+ .TH minimap2 1 "12 March 2024" "minimap2-2.28 (r1209)" "Bioinformatics tools"
2
2
  .SH NAME
3
3
  .PP
4
4
  minimap2 - mapping and alignment between collections of DNA sequences
@@ -268,6 +268,11 @@ or more of the shorter chain [0.5]
268
268
  Use the minigraph chaining algorithm [no]. The minigraph algorithm is better
269
269
  for aligning contigs through long INDELs.
270
270
  .TP
271
+ .BI --rmq-inner \ NUM
272
+ Apply full dynamic programming for anchors within distance
273
+ .I NUM
274
+ [1000].
275
+ .TP
271
276
  .B --hard-mask-level
272
277
  Honor option
273
278
  .B -M
@@ -343,6 +348,10 @@ Matching score [2]
343
348
  .BI -B \ INT
344
349
  Mismatching penalty [4]
345
350
  .TP
351
+ .BI -b \ INT
352
+ Mismatching penalty for transitions [same as
353
+ .BR -B ].
354
+ .TP
346
355
  .BI -O \ INT1[,INT2]
347
356
  Gap open penalty [4,24]. If
348
357
  .I INT2
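Note on the new `-b` option: it scores transition mismatches (A<->G, C<->T) separately from other mismatches. The following is a minimal Python sketch of a substitution matrix built under that rule. The name `gen_ts_mat`, the dictionary layout and the handling of `N` are illustrative assumptions, not minimap2's actual code; the defaults follow the bracketed man-page values (`-A2 -B4`, with `-b` falling back to `-B`).

```python
BASES = "ACGTN"
# Purine<->purine and pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def gen_ts_mat(match=2, mismatch=4, transition=None, sc_ambi=1):
    """Return {(x, y): score} for all base pairs (hypothetical helper)."""
    if transition is None:          # -b unset: same penalty as -B
        transition = mismatch
    mat = {}
    for x in BASES:
        for y in BASES:
            if x == "N" or y == "N":
                score = -sc_ambi    # assumed: ambiguous bases get a small penalty
            elif x == y:
                score = match
            elif (x, y) in TRANSITIONS:
                score = -transition
            else:
                score = -mismatch
            mat[(x, y)] = score
    return mat

# Values taken from the map-iclr preset in this diff: -B6 -b4
iclr = gen_ts_mat(match=2, mismatch=6, transition=4)
```

With these values an A/G mismatch costs 4 while an A/C mismatch costs 6, which is the distinction the option introduces.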
@@ -356,10 +365,19 @@ costs
356
365
  .RI min{ O1 + k * E1 , O2 + k * E2 }.
357
366
  In the splice mode, the second gap penalties are not used.
358
367
  .TP
368
+ .BI -J \ INT
369
+ Splice model [1]. 0 for the original minimap2 splice model that always penalizes non-GT-AG splicing;
370
+ 1 for the miniprot model that considers non-GT-AG. Option
371
+ .B -C
372
+ has no effect with the default
373
+ .BR -J1 .
374
+ .BR -J0 .
375
+ .TP
359
376
  .BI -C \ INT
360
377
  Cost for a non-canonical GT-AG splicing (effective with
361
- .BR --splice )
362
- [0]
378
+ .B --splice
379
+ .BR -J0 )
380
+ [0].
363
381
  .TP
364
382
  .BI -z \ INT1[,INT2]
365
383
  Truncate an alignment if the running alignment score drops too quickly along
@@ -450,7 +468,7 @@ Set 0 to disable [100m].
450
468
  .BI --cap-kalloc \ NUM
451
469
  Free thread-local kalloc memory reservoir if after the alignment the size of the reservoir above
452
470
  .IR NUM .
453
- Set 0 to disable [0].
471
+ Set 0 to disable [500m].
454
472
  .SS Input/output options
455
473
  .TP 10
456
474
  .B -a
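Note on the `[500m]` default: the man page writes sizes with a suffix, while the C defaults elsewhere in this diff use plain integers (`cap_kalloc = 500000000`, `max_gap = 10000` for `-g10k`). A small Python sketch of that correspondence follows; `parse_num` and its suffix table are assumptions chosen to match those values, not a copy of minimap2's parser.

```python
def parse_num(text):
    """Convert strings such as '500m' or '10k' to an integer (hypothetical helper)."""
    suffixes = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip()
    if text and text[-1].lower() in suffixes:
        # Allow fractional prefixes such as '1.5k'; round to the nearest integer.
        return round(float(text[:-1]) * suffixes[text[-1].lower()])
    return round(float(text))

cap_kalloc = parse_num("500m")
max_gap = parse_num("10k")
```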
@@ -506,6 +524,9 @@ Output =/X CIGAR operators for sequence match/mismatch.
506
524
  .B -Y
507
525
  In SAM output, use soft clipping for supplementary alignments.
508
526
  .TP
527
+ .B --secondary-seq
528
+ In SAM output, show query sequences for secondary alignments.
529
+ .TP
509
530
  .BI --seed \ INT
510
531
  Integer seed for randomizing equally best hits. Minimap2 hashes
511
532
  .I INT
@@ -566,15 +587,43 @@ are:
566
587
  Align noisy long reads of ~10% error rate to a reference genome. This is the
567
588
  default mode.
568
589
  .TP
590
+ .B lr:hq
591
+ Align accurate long reads (error rate <1%) to a reference genome
592
+ .RB ( -k19
593
+ .B -w19 -U50,500
594
+ .BR -g10k ).
595
+ This was recommended by ONT developers for recent Nanopore reads
596
+ produced with chemistry v14 that can reach ~99% in accuracy.
597
+ It was shown to work better for accurate Nanopore reads
598
+ than
599
+ .BR map-hifi .
600
+ .TP
569
601
  .B map-hifi
570
602
  Align PacBio high-fidelity (HiFi) reads to a reference genome
571
- .RB ( -k19
572
- .B -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1
603
+ .RB ( -xlr:hq
604
+ .B -A1 -B4 -O6,26 -E2,1
573
605
  .BR -s200 ).
606
+ It differs from
607
+ .B lr:hq
608
+ only in scoring. It has not been tested whether
609
+ .B lr:hq
610
+ would work better for PacBio HiFi reads.
574
611
  .TP
575
612
  .B map-pb
576
613
  Align older PacBio continuous long (CLR) reads to a reference genome
577
614
  .RB ( -Hk19 ).
615
+ Note that this data type is effectively deprecated by HiFi.
616
+ Unless you work on very old data, you probably want to use
617
+ .B map-hifi
618
+ or
619
+ .BR lr:hq .
620
+ .TP
621
+ .B map-iclr
622
+ Align Illumina Complete Long Reads (ICLR) to a reference genome
623
+ .RB ( -k19
624
+ .B -B6 -b4
625
+ .BR -O10,50 ).
626
+ This was recommended by Illumina developers.
578
627
  .TP
579
628
  .B asm5
580
629
  Long assembly to reference mapping
@@ -582,21 +631,21 @@ Long assembly to reference mapping
582
631
  .B -w19 -U50,500 --rmq -r1k,100k -g10k -A1 -B19 -O39,81 -E3,1 -s200 -z200
583
632
  .BR -N50 ).
584
633
  Typically, the alignment will not extend to regions with 5% or higher sequence
585
- divergence. Only use this preset if the average divergence is far below 5%.
634
+ divergence. Use this preset if the average divergence is not much higher than 0.1%.
586
635
  .TP
587
636
  .B asm10
588
637
  Long assembly to reference mapping
589
638
  .RB ( -k19
590
639
  .B -w19 -U50,500 --rmq -r1k,100k -g10k -A1 -B9 -O16,41 -E2,1 -s200 -z200
591
640
  .BR -N50 ).
592
- Up to 10% sequence divergence.
641
+ Use this if the average divergence is around 1%.
593
642
  .TP
594
643
  .B asm20
595
644
  Long assembly to reference mapping
596
645
  .RB ( -k19
597
646
  .B -w10 -U50,500 --rmq -r1k,100k -g10k -A1 -B4 -O6,26 -E2,1 -s200 -z200
598
647
  .BR -N50 ).
599
- Up to 20% sequence divergence.
648
+ Use this if the average divergence is around several percent.
600
649
  .TP
601
650
  .B splice
602
651
  Long-read spliced alignment
@@ -612,13 +661,13 @@ costs are different during chaining; 4) the computation of the
612
661
  tag ignores introns to demote hits to pseudogenes.
613
662
  .TP
614
663
  .B splice:hq
615
- Long-read splice alignment for PacBio CCS reads
664
+ Spliced alignment for accurate long RNA-seq reads such as PacBio iso-seq
616
665
  .RB ( -xsplice
617
666
  .B -C5 -O6,24
618
667
  .BR -B4 ).
619
668
  .TP
620
669
  .B sr
621
- Short single-end reads without splicing
670
+ Short-read alignment without splicing
622
671
  .RB ( -k21
623
672
  .B -w11 --sr --frag=yes -A2 -B8 -O12,32 -E2,1 -b0 -r100 -p.5 -N20 -f1000,5000 -n2 -m25
624
673
  .B -s40 -g100 -2K50m --heap-sort=yes
@@ -1,6 +1,6 @@
1
1
  #!/usr/bin/env k8
2
2
 
3
- var paftools_version = '2.26-r1175';
3
+ var paftools_version = '2.28-r1209';
4
4
 
5
5
  /*****************************
6
6
  ***** Library functions *****
@@ -133,26 +133,50 @@ Interval.find_ovlp = function(a, st, en)
133
133
 
134
134
  function fasta_read(fn)
135
135
  {
136
- var h = {}, gt = '>'.charCodeAt(0);
136
+ var h = {}, seqlen = [];
137
+ var buf = new Bytes();
137
138
  var file = fn == '-'? new File() : new File(fn);
138
- var buf = new Bytes(), seq = null, name = null, seqlen = [];
139
- while (file.readline(buf) >= 0) {
140
- if (buf[0] == gt) {
141
- if (seq != null && name != null) {
142
- seqlen.push([name, seq.length]);
143
- h[name] = seq;
144
- name = seq = null;
145
- }
146
- var m, line = buf.toString();
147
- if ((m = /^>(\S+)/.exec(line)) != null) {
148
- name = m[1];
149
- seq = new Bytes();
150
- }
151
- } else seq.set(buf);
152
- }
153
- if (seq != null && name != null) {
154
- seqlen.push([name, seq.length]);
155
- h[name] = seq;
139
+ if (typeof k8_version == "undefined") { // for k8-0.x
140
+ var seq = null, name = null, gt = '>'.charCodeAt(0);
141
+ while (file.readline(buf) >= 0) {
142
+ if (buf[0] == gt) {
143
+ if (seq != null && name != null) {
144
+ seqlen.push([name, seq.length]);
145
+ h[name] = seq;
146
+ name = seq = null;
147
+ }
148
+ var m, line = buf.toString();
149
+ if ((m = /^>(\S+)/.exec(line)) != null) {
150
+ name = m[1];
151
+ seq = new Bytes();
152
+ }
153
+ } else seq.set(buf);
154
+ }
155
+ if (seq != null && name != null) {
156
+ seqlen.push([name, seq.length]);
157
+ h[name] = seq;
158
+ }
159
+ } else { // for k8-1.x
160
+ var seq = null, name = null;
161
+ while (file.readline(buf) >= 0) {
162
+ var line = buf.toString();
163
+ if (line[0] == ">") {
164
+ if (seq != null && name != null) {
165
+ seqlen.push([name, seq.length]);
166
+ h[name] = new Uint8Array(seq.buffer);
167
+ name = seq = null;
168
+ }
169
+ var m;
170
+ if ((m = /^>(\S+)/.exec(line)) != null) {
171
+ name = m[1];
172
+ seq = new Bytes();
173
+ }
174
+ } else seq.set(line);
175
+ }
176
+ if (seq != null && name != null) {
177
+ seqlen.push([name, seq.length]);
178
+ h[name] = new Uint8Array(seq.buffer);
179
+ }
156
180
  }
157
181
  buf.destroy();
158
182
  file.close();
@@ -161,16 +185,27 @@ function fasta_read(fn)
161
185
 
162
186
  function fasta_free(fa)
163
187
  {
164
- for (var name in fa)
165
- fa[name].destroy();
188
+ if (typeof k8_version == "undefined")
189
+ for (var name in fa)
190
+ fa[name].destroy();
191
+ // FIXME: for k8-1.0, sequences are not freed. This is ok for now but not general.
166
192
  }
167
193
 
168
194
  Bytes.prototype.reverse = function()
169
195
  {
170
- for (var i = 0; i < this.length>>1; ++i) {
171
- var tmp = this[i];
172
- this[i] = this[this.length - i - 1];
173
- this[this.length - i - 1] = tmp;
196
+ if (typeof k8_version === "undefined") { // k8-0.x
197
+ for (var i = 0; i < this.length>>1; ++i) {
198
+ var tmp = this[i];
199
+ this[i] = this[this.length - i - 1];
200
+ this[this.length - i - 1] = tmp;
201
+ }
202
+ } else { // k8-1.x
203
+ var buf = new Uint8Array(this.buffer);
204
+ for (var i = 0; i < buf.length>>1; ++i) {
205
+ var tmp = buf[i];
206
+ buf[i] = buf[buf.length - i - 1];
207
+ buf[buf.length - i - 1] = tmp;
208
+ }
174
209
  }
175
210
  }
176
211
 
@@ -185,13 +220,24 @@ Bytes.prototype.revcomp = function()
185
220
  for (var i = 0; i < s1.length; ++i)
186
221
  Bytes.rctab[s1.charCodeAt(i)] = s2.charCodeAt(i);
187
222
  }
188
- for (var i = 0; i < this.length>>1; ++i) {
189
- var tmp = this[this.length - i - 1];
190
- this[this.length - i - 1] = Bytes.rctab[this[i]];
191
- this[i] = Bytes.rctab[tmp];
223
+ if (typeof k8_version === "undefined") { // k8-0.x
224
+ for (var i = 0; i < this.length>>1; ++i) {
225
+ var tmp = this[this.length - i - 1];
226
+ this[this.length - i - 1] = Bytes.rctab[this[i]];
227
+ this[i] = Bytes.rctab[tmp];
228
+ }
229
+ if (this.length&1)
230
+ this[this.length>>1] = Bytes.rctab[this[this.length>>1]];
231
+ } else { // k8-1.x
232
+ var buf = new Uint8Array(this.buffer);
233
+ for (var i = 0; i < buf.length>>1; ++i) {
234
+ var tmp = buf[buf.length - i - 1];
235
+ buf[buf.length - i - 1] = Bytes.rctab[buf[i]];
236
+ buf[i] = Bytes.rctab[tmp];
237
+ }
238
+ if (buf.length&1)
239
+ buf[buf.length>>1] = Bytes.rctab[buf[buf.length>>1]];
192
240
  }
193
- if (this.length&1)
194
- this[this.length>>1] = Bytes.rctab[this[this.length>>1]];
195
241
  }
196
242
 
197
243
  /********************
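Note on the `revcomp` hunk above: both branches swap and complement paired bases from the two ends, then complement the middle base when the length is odd. A minimal Python sketch of the same in-place technique follows. `revcomp_inplace` and the four-letter complement table are illustrative assumptions; paftools.js builds its own `Bytes.rctab` table, which may cover more symbols.

```python
# Complement table for plain ACGT in both cases; other bytes map to themselves.
COMP = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")

def revcomp_inplace(buf: bytearray) -> None:
    """Reverse-complement `buf` in place (hypothetical helper)."""
    n = len(buf)
    for i in range(n >> 1):
        j = n - i - 1
        # Swap the two ends while complementing each base.
        buf[i], buf[j] = COMP[buf[j]], COMP[buf[i]]
    if n & 1:
        # Odd length: the middle base is never swapped, so complement it here.
        buf[n >> 1] = COMP[buf[n >> 1]]

odd = bytearray(b"ACGTA")
revcomp_inplace(odd)
even = bytearray(b"AACG")
revcomp_inplace(even)
```

Without the final `if`, an odd-length sequence would keep its middle base uncomplemented, which is why that step appears in both the k8-0.x and k8-1.x branches.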
@@ -1694,15 +1740,17 @@ function paf_gff2bed(args)
1694
1740
 
1695
1741
  function paf_sam2paf(args)
1696
1742
  {
1697
- var c, pri_only = false, long_cs = false;
1698
- while ((c = getopt(args, "pL")) != null) {
1743
+ var c, pri_only = false, long_cs = false, pri_pri_only = false;
1744
+ while ((c = getopt(args, "pPL")) != null) {
1699
1745
  if (c == 'p') pri_only = true;
1746
+ else if (c == 'P') pri_pri_only = pri_only = true;
1700
1747
  else if (c == 'L') long_cs = true;
1701
1748
  }
1702
1749
  if (args.length == getopt.ind) {
1703
1750
  print("Usage: paftools.js sam2paf [options] <in.sam>");
1704
1751
  print("Options:");
1705
1752
  print(" -p convert primary or supplementary alignments only");
1753
+ print(" -P convert primary alignments only");
1706
1754
  print(" -L output the cs tag in the long form");
1707
1755
  exit(1);
1708
1756
  }
@@ -1729,6 +1777,7 @@ function paf_sam2paf(args)
1729
1777
  throw Error("at line " + lineno + ": inconsistent SEQ and QUAL lengths - " + t[9].length + " != " + t[10].length);
1730
1778
  if (t[2] == '*' || (flag&4) || t[5] == '*') continue;
1731
1779
  if (pri_only && (flag&0x100)) continue;
1780
+ if (pri_pri_only && (flag&0x900)) continue;
1732
1781
  var tlen = ctg_len[t[2]];
1733
1782
  if (tlen == null) throw Error("at line " + lineno + ": can't find the length of contig " + t[2]);
1734
1783
  // find tags
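Note on the new `-P` option in `sam2paf`: `-p` skips records with SAM flag 0x100 (secondary), while `-P` also sets `pri_only` and additionally skips anything matching 0x900, that is, secondary (0x100) or supplementary (0x800). A minimal Python sketch of that filter follows; `keep_record` is a hypothetical name, and the separate unmapped-read check in the original is left out.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment

def keep_record(flag, pri_only=False, pri_pri_only=False):
    """Return True if a SAM record with this flag would be converted."""
    if pri_pri_only:
        pri_only = True  # in the diff, -P sets both variables
    if pri_only and (flag & SECONDARY):
        return False
    if pri_pri_only and (flag & (SECONDARY | SUPPLEMENTARY)):
        return False
    return True
```

Under `-p` a supplementary record (flag 0x800) is still converted; under `-P` it is dropped.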
@@ -1841,7 +1890,10 @@ function paf_sam2paf(args)
1841
1890
  // optional tags
1842
1891
  var type = flag&0x100? 'S' : 'P';
1843
1892
  var tags = ["tp:A:" + type];
1844
- if (NM != null) tags.push("mm:i:"+mm);
1893
+ if (NM != null) {
1894
+ tags.push("NM:i:"+NM);
1895
+ tags.push("mm:i:"+mm);
1896
+ }
1845
1897
  tags.push("gn:i:"+(I[1]+D[1]), "go:i:"+(I[0]+D[0]), "cg:Z:" + t[5].replace(/\d+[SH]/g, ''));
1846
1898
  if (cs_str != null) tags.push("cs:Z:" + cs_str);
1847
1899
  else if (cs.length > 0) tags.push("cs:Z:" + cs.join(""));
@@ -2051,7 +2103,7 @@ function paf_mapeval(args)
2051
2103
  warn("Usage: paftools.js mapeval [options] <in.paf>|<in.sam>");
2052
2104
  warn("Options:");
2053
2105
  warn(" -r FLOAT mapping correct if overlap_length/union_length>FLOAT [" + ovlp_ratio + "]");
2054
- warn(" -Q INT print wrong mappings with mapQ>INT [don't print]");
2106
+ warn(" -Q INT print wrong mappings with mapQ>=INT [don't print]");
2055
2107
  warn(" -m INT 0: eval the longest aln only; 1: first aln only; 2: all primary aln [0]");
2056
2108
  exit(1);
2057
2109
  }
@@ -14,6 +14,7 @@
14
14
  #define MM_DBG_PRINT_SEED 0x4
15
15
  #define MM_DBG_PRINT_ALN_SEQ 0x8
16
16
  #define MM_DBG_PRINT_CHAIN 0x10
17
+ #define MM_DBG_SEED_FREQ 0x20
17
18
 
18
19
  #define MM_SEED_LONG_JOIN (1ULL<<40)
19
20
  #define MM_SEED_IGNORE (1ULL<<41)
@@ -79,8 +80,6 @@ int mm_idx_getseq2(const mm_idx_t *mi, int is_rev, uint32_t rid, uint32_t st, ui
79
80
  mm_reg1_t *mm_align_skeleton(void *km, const mm_mapopt_t *opt, const mm_idx_t *mi, int qlen, const char *qstr, int *n_regs_, mm_reg1_t *regs, mm128_t *a);
80
81
  mm_reg1_t *mm_gen_regs(void *km, uint32_t hash, int qlen, int n_u, uint64_t *u, mm128_t *a, int is_qstrand);
81
82
 
82
- mm128_t *mm_chain_dp(int max_dist_x, int max_dist_y, int bw, int max_skip, int max_iter, int min_cnt, int min_sc, float gap_scale,
83
- int is_cdna, int n_segs, int64_t n, mm128_t *a, int *n_u_, uint64_t **_u, void *km);
84
83
  mm128_t *mg_lchain_dp(int max_dist_x, int max_dist_y, int bw, int max_skip, int max_iter, int min_cnt, int min_sc, float chn_pen_gap, float chn_pen_skip,
85
84
  int is_cdna, int n_segs, int64_t n, mm128_t *a, int *n_u_, uint64_t **_u, void *km);
86
85
  mm128_t *mg_lchain_rmq(int max_dist, int max_dist_inner, int bw, int max_chn_skip, int cap_rmq_size, int min_cnt, int min_sc, float chn_pen_gap, float chn_pen_skip,
@@ -45,6 +45,7 @@ void mm_mapopt_init(mm_mapopt_t *opt)
45
45
  opt->alt_drop = 0.15f;
46
46
 
47
47
  opt->a = 2, opt->b = 4, opt->q = 4, opt->e = 2, opt->q2 = 24, opt->e2 = 1;
48
+ opt->transition = 0;
48
49
  opt->sc_ambi = 1;
49
50
  opt->zdrop = 400, opt->zdrop_inv = 200;
50
51
  opt->end_bonus = -1;
@@ -54,7 +55,7 @@ void mm_mapopt_init(mm_mapopt_t *opt)
54
55
  opt->max_clip_ratio = 1.0f;
55
56
  opt->mini_batch_size = 500000000;
56
57
  opt->max_sw_mat = 100000000;
57
- opt->cap_kalloc = 1000000000;
58
+ opt->cap_kalloc = 500000000;
58
59
 
59
60
  opt->rank_min_len = 500;
60
61
  opt->rank_frac = 0.9f;
@@ -90,7 +91,7 @@ int mm_set_opt(const char *preset, mm_idxopt_t *io, mm_mapopt_t *mo)
90
91
  if (preset == 0) {
91
92
  mm_idxopt_init(io);
92
93
  mm_mapopt_init(mo);
93
- } else if (strcmp(preset, "map-ont") == 0) { // this is the same as the default
94
+ } else if (strcmp(preset, "lr") == 0 || strcmp(preset, "map-ont") == 0) { // this is the same as the default
94
95
  } else if (strcmp(preset, "ava-ont") == 0) {
95
96
  io->flag = 0, io->k = 15, io->w = 5;
96
97
  mo->flag |= MM_F_ALL_CHAINS | MM_F_NO_DIAG | MM_F_NO_DUAL | MM_F_NO_LJOIN;
@@ -105,13 +106,30 @@ int mm_set_opt(const char *preset, mm_idxopt_t *io, mm_mapopt_t *mo)
105
106
  mo->min_chain_score = 100, mo->pri_ratio = 0.0f, mo->max_chain_skip = 25;
106
107
  mo->bw_long = mo->bw;
107
108
  mo->occ_dist = 0;
108
- } else if (strcmp(preset, "map-hifi") == 0 || strcmp(preset, "map-ccs") == 0) {
109
+ } else if (strcmp(preset, "lr:hq") == 0 || strcmp(preset, "map-hifi") == 0 || strcmp(preset, "map-ccs") == 0) {
109
110
  io->flag = 0, io->k = 19, io->w = 19;
110
111
  mo->max_gap = 10000;
111
- mo->a = 1, mo->b = 4, mo->q = 6, mo->q2 = 26, mo->e = 2, mo->e2 = 1;
112
- mo->occ_dist = 500;
113
112
  mo->min_mid_occ = 50, mo->max_mid_occ = 500;
114
- mo->min_dp_max = 200;
113
+ if (strcmp(preset, "map-hifi") == 0 || strcmp(preset, "map-ccs") == 0) {
114
+ mo->a = 1, mo->b = 4, mo->q = 6, mo->q2 = 26, mo->e = 2, mo->e2 = 1;
115
+ mo->min_dp_max = 200;
116
+ }
117
+ } else if (strcmp(preset, "lr:hqae") == 0) { // high-quality assembly evaluation
118
+ io->flag = 0, io->k = 25, io->w = 51;
119
+ mo->flag |= MM_F_RMQ;
120
+ mo->min_mid_occ = 50, mo->max_mid_occ = 500;
121
+ mo->rmq_inner_dist = 5000;
122
+ mo->occ_dist = 200;
123
+ mo->best_n = 100;
124
+ mo->chain_gap_scale = 5.0f;
125
+ } else if (strcmp(preset, "map-iclr-prerender") == 0) {
126
+ io->flag = 0, io->k = 15;
127
+ mo->b = 6, mo->transition = 1;
128
+ mo->q = 10, mo->q2 = 50;
129
+ } else if (strcmp(preset, "map-iclr") == 0) {
130
+ io->flag = 0, io->k = 19;
131
+ mo->b = 6, mo->transition = 4;
132
+ mo->q = 10, mo->q2 = 50;
115
133
  } else if (strncmp(preset, "asm", 3) == 0) {
116
134
  io->flag = 0, io->k = 19, io->w = 19;
117
135
  mo->bw = 1000, mo->bw_long = 100000;
@@ -156,7 +174,7 @@ int mm_set_opt(const char *preset, mm_idxopt_t *io, mm_mapopt_t *mo)
156
174
  mo->junc_bonus = 9;
157
175
  mo->zdrop = 200, mo->zdrop_inv = 100; // because mo->a is halved
158
176
  if (strcmp(preset, "splice:hq") == 0)
159
- mo->junc_bonus = 5, mo->b = 4, mo->q = 6, mo->q2 = 24;
177
+ mo->noncan = 5, mo->b = 4, mo->q = 6, mo->q2 = 24;
160
178
  } else return -1;
161
179
  return 0;
162
180
  }
@@ -77,7 +77,9 @@ This constructor accepts the following arguments:
77
77
 
78
78
  * **min_chain_score**: minimum chaing score
79
79
 
80
- * **bw**: chaining and alignment band width
80
+ * **bw**: chaining and alignment band width (initial chaining and extension)
81
+
82
+ * **bw_long**: chaining and alignment band width (RMQ-based rechaining and closing gaps)
81
83
 
82
84
  * **best_n**: max number of alignments to return
83
85
 
@@ -36,6 +36,7 @@ cdef extern from "minimap.h":
36
36
  float alt_drop
37
37
 
38
38
  int a, b, q, e, q2, e2
39
+ int transition
39
40
  int sc_ambi
40
41
  int noncan
41
42
  int junc_bonus
@@ -3,7 +3,7 @@ from libc.stdlib cimport free
3
3
  cimport cmappy
4
4
  import sys
5
5
 
6
- __version__ = '2.26'
6
+ __version__ = '2.28'
7
7
 
8
8
  cmappy.mm_reset_timer()
9
9
 
@@ -96,6 +96,7 @@ cdef class Alignment:
96
96
  a = [str(self._q_st), str(self._q_en), strand, self._ctg, str(self._ctg_len), str(self._r_st), str(self._r_en),
97
97
  str(self._mlen), str(self._blen), str(self._mapq), tp, ts, "cg:Z:" + self.cigar_str]
98
98
  if self._cs != "": a.append("cs:Z:" + self._cs)
99
+ if self._MD != "": a.append("MD:Z:" + self._MD)
99
100
  return "\t".join(a)
100
101
 
101
102
  cdef class ThreadBuffer:
@@ -112,7 +113,7 @@ cdef class Aligner:
112
113
  cdef cmappy.mm_idxopt_t idx_opt
113
114
  cdef cmappy.mm_mapopt_t map_opt
114
115
 
115
- def __cinit__(self, fn_idx_in=None, preset=None, k=None, w=None, min_cnt=None, min_chain_score=None, min_dp_score=None, bw=None, best_n=None, n_threads=3, fn_idx_out=None, max_frag_len=None, extra_flags=None, seq=None, scoring=None):
116
+ def __cinit__(self, fn_idx_in=None, preset=None, k=None, w=None, min_cnt=None, min_chain_score=None, min_dp_score=None, bw=None, bw_long=None, best_n=None, n_threads=3, fn_idx_out=None, max_frag_len=None, extra_flags=None, seq=None, scoring=None):
116
117
  self._idx = NULL
117
118
  cmappy.mm_set_opt(NULL, &self.idx_opt, &self.map_opt) # set the default options
118
119
  if preset is not None:
@@ -125,6 +126,7 @@ cdef class Aligner:
125
126
  if min_chain_score is not None: self.map_opt.min_chain_score = min_chain_score
126
127
  if min_dp_score is not None: self.map_opt.min_dp_max = min_dp_score
127
128
  if bw is not None: self.map_opt.bw = bw
129
+ if bw_long is not None: self.map_opt.bw_long = bw_long
128
130
  if best_n is not None: self.map_opt.best_n = best_n
129
131
  if max_frag_len is not None: self.map_opt.max_frag_len = max_frag_len
130
132
  if extra_flags is not None: self.map_opt.flag |= extra_flags
@@ -5,7 +5,7 @@ import getopt
5
5
  import mappy as mp
6
6
 
7
7
  def main(argv):
8
- opts, args = getopt.getopt(argv[1:], "x:n:m:k:w:r:c")
8
+ opts, args = getopt.getopt(argv[1:], "x:n:m:k:w:r:cM")
9
9
  if len(args) < 2:
10
10
  print("Usage: minimap2.py [options] <ref.fa>|<ref.mmi> <query.fq>")
11
11
  print("Options:")
@@ -16,10 +16,11 @@ def main(argv):
16
16
  print(" -w INT minimizer window length")
17
17
  print(" -r INT band width")
18
18
  print(" -c output the cs tag")
19
+ print(" -M output the MD tag")
19
20
  sys.exit(1)
20
21
 
21
22
  preset = min_cnt = min_sc = k = w = bw = None
22
- out_cs = False
23
+ out_cs = out_MD = False
23
24
  for opt, arg in opts:
24
25
  if opt == '-x': preset = arg
25
26
  elif opt == '-n': min_cnt = int(arg)
@@ -28,11 +29,12 @@ def main(argv):
28
29
  elif opt == '-k': k = int(arg)
29
30
  elif opt == '-w': w = int(arg)
30
31
  elif opt == '-c': out_cs = True
32
+ elif opt == '-M': out_MD = True
31
33
 
32
34
  a = mp.Aligner(args[0], preset=preset, min_cnt=min_cnt, min_chain_score=min_sc, k=k, w=w, bw=bw)
33
35
  if not a: raise Exception("ERROR: failed to load/build index file '{}'".format(args[0]))
34
36
  for name, seq, qual in mp.fastx_read(args[1]): # read one sequence
35
- for h in a.map(seq, cs=out_cs): # traverse hits
37
+ for h in a.map(seq, cs=out_cs, MD=out_MD): # traverse hits
36
38
  print('{}\t{}\t{}'.format(name, len(seq), h))
37
39
 
38
40
  if __name__ == "__main__":
data/ext/minimap2/seed.c CHANGED
@@ -112,7 +112,8 @@ mm_seed_t *mm_collect_matches(void *km, int *_n_m, int qlen, int max_occ, int ma
112
112
  }
113
113
  for (i = 0, n_m = 0, *rep_len = 0, *n_a = 0; i < n_m0; ++i) {
114
114
  mm_seed_t *q = &m[i];
115
- //fprintf(stderr, "X\t%d\t%d\t%d\n", q->q_pos>>1, q->n, q->flt);
115
+ if (mm_dbg_flag & MM_DBG_SEED_FREQ)
116
+ fprintf(stderr, "SF\t%d\t%d\t%d\n", q->q_pos>>1, q->n, q->flt);
116
117
  if (q->flt) {
117
118
  int en = (q->q_pos >> 1) + 1, st = en - q->q_span;
118
119
  if (st > rep_en) {
@@ -23,7 +23,7 @@ def readme():
23
23
 
24
24
  setup(
25
25
  name = 'mappy',
26
- version = '2.26',
26
+ version = '2.28',
27
27
  url = 'https://github.com/lh3/minimap2',
28
28
  description = 'Minimap2 python binding',
29
29
  long_description = readme(),
@@ -21,10 +21,11 @@ module Minimap2
21
21
  # * ava-ont : Nanopore read overlap
22
22
  # @param k [Integer] k-mer length, no larger than 28.
23
23
  # @param w [Integer] minimizer window size, no larger than 255.
24
- # @param min_cnt [Integer] mininum number of minimizers on a chain.
25
- # @param min_chain_score [Integer] minimum chaing score.
24
+ # @param min_cnt [Integer] minimum number of minimizers on a chain.
25
+ # @param min_chain_score [Integer] minimum chain score.
26
26
  # @param min_dp_score
27
- # @param bw [Integer] chaining and alignment band width.
27
+ # @param bw [Integer] chaining and alignment band width. (initial chaining and extension)
28
+ # @param bw_long [Integer] chaining and alignment band width (RMQ-based rechaining and closing gaps)
28
29
  # @param best_n [Integer] max number of alignments to return.
29
30
  # @param n_threads [Integer] number of indexing threads.
30
31
  # @param fn_idx_out [String] name of file to which the index is written.
@@ -47,6 +48,7 @@ module Minimap2
47
48
  min_chain_score: nil,
48
49
  min_dp_score: nil,
49
50
  bw: nil,
51
+ bw_long: nil,
50
52
  best_n: nil,
51
53
  n_threads: 3,
52
54
  fn_idx_out: nil,
@@ -72,6 +74,7 @@ module Minimap2
72
74
  map_opt[:min_chain_score] = min_chain_score if min_chain_score
73
75
  map_opt[:min_dp_max] = min_dp_score if min_dp_score
74
76
  map_opt[:bw] = bw if bw
77
+ map_opt[:bw_long] = bw_long if bw_long
75
78
  map_opt[:best_n] = best_n if best_n
76
79
  map_opt[:max_frag_len] = max_frag_len if max_frag_len
77
80
  map_opt[:flag] |= extra_flags if extra_flags
@@ -23,7 +23,7 @@ module Minimap2
23
23
  # @return [Integer] length of the matching bases in the alignment,
24
24
  # excluding ambiguous base matches.
25
25
  # @!attribute nm
26
- # @return [Integer] number of mismatches, gaps and ambiguous poistions in the alignment.
26
+ # @return [Integer] number of mismatches, gaps and ambiguous positions in the alignment.
27
27
  # @!attribute primary
28
28
  # @return [Integer] if the alignment is primary (typically the best and the first to generate)
29
29
  # @!attribute q_st
@@ -107,6 +107,7 @@ module Minimap2
107
107
  a = [@q_st, @q_en, strand, @ctg, @ctg_len, @r_st, @r_en,
108
108
  @mlen, @blen, @mapq, tp, ts, "cg:Z:#{@cigar_str}"]
109
109
  a << "cs:Z:#{@cs}" if @cs
110
+ a << "MD:Z:#{@md}" if @md
110
111
  a.join("\t")
111
112
  end
112
113
  end
@@ -40,6 +40,7 @@ module Minimap2
40
40
  NO_HASH_NAME = 0x400000000
41
41
  SPLICE_OLD = 0x800000000
42
42
  SECONDARY_SEQ = 0x1000000000 # output SEQ field for seqondary alignments using hard clipping
43
+ OUT_DS = 0x2000000000
43
44
 
44
45
  HPC = 0x1
45
46
  NO_SEQ = 0x2
@@ -109,8 +110,10 @@ module Minimap2
109
110
  :dp_score, :int32, # DP score
110
111
  :dp_max, :int32, # score of the max-scoring segment
111
112
  :dp_max2, :int32, # score of the best alternate mappings
113
+ :dp_max0, :int32, # DP score before mm_update_dp_max() adjustment
112
114
  :n_ambi_trans_strand, :uint32,
113
115
  :n_cigar, :uint32
116
+ # :cigar, :pointer # variable length array (see cigar method below)
114
117
 
115
118
  bit_field :n_ambi_trans_strand,
116
119
  :n_ambi, 30, # number of ambiguous bases
@@ -204,6 +207,7 @@ module Minimap2
204
207
  :e, :int, # gap-ext
205
208
  :q2, :int, # gap-open
206
209
  :e2, :int, # gap-ext
210
+ :transition, :int, # transition mismatch score (A:G, C:T)
207
211
  :sc_ambi, :int, # score when one or both bases are "N"
208
212
  :noncan, :int, # cost of non-canonical splicing sites
209
213
  :junc_bonus, :int,
@@ -223,7 +227,7 @@ module Minimap2
223
227
  :q_occ_frac, :float,
224
228
  :min_mid_occ, :int32,
225
229
  :max_mid_occ, :int32,
226
- :mid_occ, :int32, # ignore seeds with occurrences above this threshold
230
+ :mid_occ, :int32, # ignore seeds with occurrences above this threshold
227
231
  :max_occ, :int32,
228
232
  :max_max_occ, :int32,
229
233
  :occ_dist, :int32,
@@ -15,10 +15,11 @@ module Minimap2
15
15
  private_class_method :mm_set_opt_raw
16
16
 
17
17
  def self.mm_set_opt(preset, io, mo)
18
- ptr = if preset
19
- ::FFI::MemoryPointer.from_string(preset.to_s)
20
- else
18
+ ptr = case preset
19
+ when 0, nil
21
20
  ::FFI::Pointer.new(:int, 0)
21
+ else
22
+ ::FFI::MemoryPointer.from_string(preset.to_s)
22
23
  end
23
24
  mm_set_opt_raw(ptr, io, mo)
24
25
  end
@@ -77,5 +78,17 @@ module Minimap2
77
78
  :mm_gen_md, :mm_gen_MD, # Avoid uppercase letters in method names.
78
79
  [:pointer, :pointer, :pointer, Idx.by_ref, Reg1.by_ref, :string],
79
80
  :int
81
+
82
+ attach_function \
83
+ :mm_mapopt_init,
84
+ [MapOpt.by_ref],
85
+ :void
86
+
87
+ # mmpriv.h
88
+
89
+ attach_function \
90
+ :mm_idxopt_init,
91
+ [IdxOpt.by_ref],
92
+ :void
80
93
  end
81
94
  end
data/lib/minimap2/ffi.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  # bit fields
4
4
  require "ffi/bit_struct"
5
+
5
6
  module Minimap2
6
7
  # Native APIs
7
8
  module FFI