minimap2 0.0.0 → 0.2.21

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2bedd8ea7d7cbcf1b91089982d695964fdd7efd0a3f5357bd2a4385e980eee26
4
- data.tar.gz: de3ba9df2cf9651ffa9a4b48d441ab8c8b51e56a9d75586f3d7385507c85c7c1
3
+ metadata.gz: 4bd850f529cb82950c16581735bdd74f232e0ef3490e5cb5b6f7045faa1fe696
4
+ data.tar.gz: 40d00cf14886a35f831b593d541cf9e72f8e5cf07d87be31116c215799449f62
5
5
  SHA512:
6
- metadata.gz: 0cb0a94415322856f59670e4f5ffb1e31c7f1e273438f5fe0b5e5074551d3f0019b349b77295d58a93d4838b1132919cb8f083442ee8b5bc7c5244a34cefbf3c
7
- data.tar.gz: 66c02083e183470297d4ee1afd0f26472956bdf77339b642ea7951ca2b9a72ba8747352c3e482a2e9810005c093cb98cb4ea8caa7ba6300c72b1f21f2b56abc7
6
+ metadata.gz: 669bd6d5a4eb0dc37f12ee4c0f9653bfe76afec70b8d592e291269cb97b90b493b398b8d68ebacb64ba2ce28187a32a32fdb3fb77ef070023ffa27983f479929
7
+ data.tar.gz: 12c2fd1ace06a7e6a1734cb27f09091851f3fe917714156b27a003a168815dbef83eabc00c56c701bdcd5f982db873346bca375b3e8f05764b7fb797d2d5c898
data/README.md CHANGED
@@ -1,30 +1,199 @@
1
- # Minimap2
1
+ # ruby-minimap2
2
+
3
+ [![Gem Version](https://img.shields.io/gem/v/minimap2?color=brightgreen)](https://rubygems.org/gems/minimap2)
4
+ [![CI](https://github.com/kojix2/ruby-minimap2/workflows/CI/badge.svg)](https://github.com/kojix2/ruby-minimap2/actions)
5
+ [![The MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.txt)
6
+ [![Docs Latest](https://img.shields.io/badge/docs-stable-blue.svg)](https://rubydoc.info/gems/minimap2)
7
+ [![DOI](https://zenodo.org/badge/325711305.svg)](https://zenodo.org/badge/latestdoi/325711305)
8
+
9
+
2
10
 
3
11
  :dna: [minimap2](https://github.com/lh3/minimap2) - the long-read mapper - for [Ruby](https://github.com/ruby/ruby)
4
12
 
5
13
  ## Installation
6
14
 
15
+ Run the following commands in order. You need to build minimap2 yourself, because the gem needs a shared library that also contains the cmappy functions.
16
+
17
+ Build
18
+
7
19
  ```sh
8
- gem install minimap2
20
+ git clone --recursive https://github.com/kojix2/ruby-minimap2
21
+ cd ruby-minimap2
22
+ bundle install
23
+ bundle exec rake minimap2:build
9
24
  ```
10
25
 
11
- ## Usage
26
+ Install
12
27
 
13
- ```sh
14
- # TODO
15
28
  ```
29
+ bundle exec rake install
30
+ ```
31
+
32
+ Ruby-minimap2 is [tested on Ubuntu and macOS](https://github.com/kojix2/ruby-minimap2/actions).
33
+
34
+ ## Quick Start
35
+
36
+ ```ruby
37
+ require "minimap2"
38
+ ```
39
+
40
+ Create aligner
41
+
42
+ ```ruby
43
+ aligner = Minimap2::Aligner.new("minimap2/test/MT-human.fa")
44
+ ```
45
+
46
+ Retrieve a subsequence from the index
47
+
48
+ ```ruby
49
+ seq = aligner.seq("MT_human", 100, 200)
50
+ ```
51
+
52
+ Mapping
53
+
54
+ ```ruby
55
+ hits = aligner.align(seq)
56
+ pp hits[0]
57
+ ```
58
+
59
+ ```
60
+ =>
61
+ #<Minimap2::Alignment:0x000055fe18223f50
62
+ @blen=100,
63
+ @cigar=[[100, 0]],
64
+ @cigar_str="100M",
65
+ @cs="",
66
+ @ctg="MT_human",
67
+ @ctg_len=16569,
68
+ @mapq=60,
69
+ @md="",
70
+ @mlen=100,
71
+ @nm=0,
72
+ @primary=1,
73
+ @q_en=100,
74
+ @q_st=0,
75
+ @r_en=200,
76
+ @r_st=100,
77
+ @read_num=1,
78
+ @strand=1,
79
+ @trans_strand=0>
80
+ ```
81
+
82
+ ## APIs Overview
83
+
84
+ The API is based on [Mappy](https://github.com/lh3/minimap2/tree/master/python), the official Python binding for minimap2.
85
+
86
+ Note: `Aligner#map` has been renamed to `Aligner#align`, because in Ruby `map` conventionally names an iterator method.
87
+
88
+ ```markdown
89
+ * Minimap2 module
90
+ - fastx_read Read fasta/fastq file.
91
+ - revcomp Reverse complement sequence.
92
+
93
+ * Aligner class
94
+ * attributes
95
+ - index Returns the value of attribute index.
96
+ - idx_opt Returns the value of attribute idx_opt.
97
+ - map_opt Returns the value of attribute map_opt.
98
+ * methods
99
+ - new(path, preset: nil) Create a new aligner. (presets: sr, map-pb, map-ont, map-hifi, splice, asm5, etc.)
100
+ - align Maps and returns alignments.
101
+ - seq Retrieve a subsequence from the index.
102
+
103
+ * Alignment class
104
+ * attributes
105
+ - ctg Returns name of the reference sequence the query is mapped to.
106
+ - ctg_len Returns total length of the reference sequence.
107
+ - r_st Returns start positions on the reference.
108
+ - r_en Returns end positions on the reference.
109
+ - strand Returns +1 if on the forward strand; -1 if on the reverse strand.
110
+ - trans_strand Returns transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown.
111
+ - blen Returns length of the alignment, including both alignment matches and gaps but excluding ambiguous bases.
112
+ - mlen Returns length of the matching bases in the alignment, excluding ambiguous base matches.
113
+ - nm Returns number of mismatches, gaps and ambiguous positions in the alignment.
114
+ - primary Returns 1 if the alignment is primary (typically the best and the first to generate), 0 otherwise.
115
+ - q_st Returns start positions on the query.
116
+ - q_en Returns end positions on the query.
117
+ - mapq Returns mapping quality.
118
+ - cigar Returns the CIGAR as an array of shape (n_cigar, 2). The two numbers give the length and the operator of each CIGAR operation.
119
+ - read_num Returns read number that the alignment corresponds to; 1 for the first read and 2 for the second read.
120
+ - cs Returns the cs tag.
121
+ - md Returns the MD tag as in the SAM format. It is an empty string unless `md: true` is passed to Aligner#align.
122
+ - cigar_str Returns CIGAR string.
123
+ * methods
124
+ - to_h Convert Alignment to hash.
125
+ - to_s Convert to the PAF format without the QueryName and QueryLength columns.
126
+
127
+ * FFI module
128
+ * IdxOpt class Indexing options.
129
+ * MapOpt class Mapping options.
130
+ ```
131
+
132
+ This list is not exhaustive. See the [RubyDoc.info documentation](https://rubydoc.info/gems/minimap2/) for more details.
16
133
 
134
+ ruby-minimap2 is built on top of [Ruby-FFI](https://github.com/ffi/ffi).
135
+ Native functions can be called through the `Minimap2::FFI` module, which also provides access to some C structs.
136
+
137
+ ```ruby
138
+ aligner.idx_opt.members
139
+ # => [:k, :w, :flag, :bucket_bits, :mini_batch_size, :batch_size]
140
+ aligner.idx_opt.values
141
+ # => [15, 10, 0, 14, 50000000, 9223372036854775807]
142
+ aligner.idx_opt[:k]
143
+ # => 15
144
+ aligner.idx_opt[:k] = 14
145
+ aligner.idx_opt[:k]
146
+ # => 14
147
+ ```
17
148
 
18
149
  ## Development
19
150
 
151
+ Fork the repository,
152
+ then clone it.
153
+
20
154
  ```sh
21
- # TODO
155
+ git clone --recursive https://github.com/kojix2/ruby-minimap2
156
+ # git clone https://github.com/kojix2/ruby-minimap2
157
+ # cd ruby-minimap2
158
+ # git submodule update -i
159
+ ```
160
+
161
+ Build Minimap2 and Mappy.
162
+
163
+ ```sh
164
+ cd ruby-minimap2
165
+ bundle install # Install dependent packages including Ruby-FFI
166
+ bundle exec rake minimap2:build
167
+ ```
168
+
169
+ A shared library will be created in the vendor directory.
170
+
171
+ ```
172
+ └── vendor
173
+ └── libminimap2.so
174
+ ```
175
+
176
+ Run tests.
177
+
178
+ ```
179
+ bundle exec rake test
22
180
  ```
23
181
 
24
182
  ## Contributing
25
183
 
26
- Bug reports and pull requests are welcome on GitHub at https://github.com/kojix2/ruby-minimap2.
184
+ ruby-minimap2 is still under development, and there is plenty of room for improvement. Pull requests are very welcome.
185
+
186
+ * [Report bugs](https://github.com/kojix2/ruby-minimap2/issues)
187
+ * Fix bugs and [submit pull requests](https://github.com/kojix2/ruby-minimap2/pulls)
188
+ * Write, clarify, or fix documentation
189
+ * Suggest or add new features
190
+ * Create tools based on ruby-minimap2
191
+ * Update minimap2 in github submodule
27
192
 
28
193
  ## License
29
194
 
30
195
  [MIT License](https://opensource.org/licenses/MIT).
196
+
197
+ ## Acknowledgements
198
+
199
+ I would like to thank Heng Li for making Minimap2, and all the readers who read the README to the end.
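The README says `Alignment#to_s` emits PAF without the QueryName and QueryLength columns. As a hedged illustration, the sketch below rebuilds such a line in plain Ruby from the field values printed in the Quick Start output, so it runs without the native library. `HitRecord` and `paf_columns` are hypothetical names, and appending `cs:Z:` only when the cs string is non-empty is an assumption about the intended behaviour.

```ruby
# Hypothetical stand-in for Minimap2::Alignment, holding only what #to_s reads.
HitRecord = Struct.new(:q_st, :q_en, :strand, :ctg, :ctg_len, :r_st, :r_en,
                       :mlen, :blen, :mapq, :primary, :trans_strand,
                       :cigar_str, :cs, keyword_init: true)

# Build PAF columns 3-12 plus tags, following the logic of Alignment#to_s.
def paf_columns(h)
  strand = if h.strand.positive? then '+'
           elsif h.strand.negative? then '-'
           else '?'
           end
  tp = h.primary.zero? ? 'tp:A:S' : 'tp:A:P'
  ts = if h.trans_strand.positive? then 'ts:A:+'
       elsif h.trans_strand.negative? then 'ts:A:-'
       else 'ts:A:.'
       end
  cols = [h.q_st, h.q_en, strand, h.ctg, h.ctg_len, h.r_st, h.r_en,
          h.mlen, h.blen, h.mapq, tp, ts, "cg:Z:#{h.cigar_str}"]
  cols << "cs:Z:#{h.cs}" unless h.cs.nil? || h.cs.empty? # skip empty cs
  cols.join("\t")
end

# Field values taken from the Quick Start output.
hit = HitRecord.new(q_st: 0, q_en: 100, strand: 1, ctg: 'MT_human', ctg_len: 16_569,
                    r_st: 100, r_en: 200, mlen: 100, blen: 100, mapq: 60,
                    primary: 1, trans_strand: 0, cigar_str: '100M', cs: '')
puts paf_columns(hit)
```

To obtain a full PAF record, the query name and query length would be prepended as the first two columns.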
data/lib/minimap2.rb CHANGED
@@ -1,6 +1,78 @@
1
- require_relative "minimap/version"
1
+ # frozen_string_literal: true
2
2
 
3
- module Minimap
3
+ # dependencies
4
+ require 'ffi'
5
+
6
+ # bit fields
7
+ require_relative 'minimap2/ffi_helper'
8
+
9
+ # modules
10
+ require_relative 'minimap2/aligner'
11
+ require_relative 'minimap2/alignment'
12
+ require_relative 'minimap2/version'
13
+
14
+ # Minimap2 mapper for long read sequences
15
+ # https://github.com/lh3/minimap2
16
+ # Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100.
17
+ # doi:10.1093/bioinformatics/bty191
18
+ module Minimap2
4
19
  class Error < StandardError; end
5
- # Your code goes here...
20
+
21
+ class << self
22
+ attr_accessor :ffi_lib
23
+ end
24
+
25
+ lib_name = ::FFI.map_library_name('minimap2')
26
+ self.ffi_lib = if ENV['MINIMAPDIR']
27
+ File.expand_path(lib_name, ENV['MINIMAPDIR'])
28
+ else
29
+ File.expand_path("../vendor/#{lib_name}", __dir__)
30
+ end
31
+
32
+ # friendlier error message
33
+ autoload :FFI, 'minimap2/ffi'
34
+
35
+ # methods from mappy
36
+ class << self
37
+ # Read fasta/fastq file.
38
+ # @param [String] file_path
39
+ # @param [Boolean] read_comment If false or nil, the comment will not be read.
40
+ # @yield [name, seq, qual, comment]
41
+ # Note: You can also use a generic library such as BioRuby instead of this method.
42
+
43
+ def fastx_read(file_path, read_comment = false)
44
+ path = File.expand_path(file_path)
45
+ ks = FFI.mm_fastx_open(path)
46
+ while FFI.kseq_read(ks) >= 0
47
+ qual = (ks[:qual][:l]).positive? ? ks[:qual][:s] : nil # reset for every record
48
+ name = ks[:name][:s]
49
+ seq = ks[:seq][:s]
50
+ if read_comment
51
+ comment = (ks[:comment][:l]).positive? ? ks[:comment][:s] : nil
52
+ yield [name, seq, qual, comment]
53
+ else
54
+ yield [name, seq, qual]
55
+ end
56
+ end
57
+ FFI.mm_fastx_close(ks)
58
+ end
59
+
60
+ # Reverse complement sequence.
61
+ # @param [String] seq
62
+ # @return [string] seq
63
+
64
+ def revcomp(seq)
65
+ l = seq.size
66
+ bseq = ::FFI::MemoryPointer.new(:char, l)
67
+ bseq.put_bytes(0, seq)
68
+ FFI.mappy_revcomp(l, bseq)
69
+ end
70
+
71
+ # Set verbosity level.
72
+ # @param [Integer] level
73
+
74
+ def verbose(level = -1)
75
+ FFI.mm_verbose_level(level)
76
+ end
77
+ end
6
78
  end
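`Minimap2.revcomp` copies the string into a native buffer and calls `mappy_revcomp`. For quick tests without the shared library, a pure-Ruby reverse complement such as the hypothetical `revcomp_ruby` below can serve as a reference for plain A/C/G/T/N input. It is only a sketch: other IUPAC ambiguity codes pass through unchanged, and whether the native helper treats them the same way is not verified here.

```ruby
# Pure-Ruby reverse complement (hypothetical helper, not part of the gem).
# Assumption: input uses only A, C, G, T, N in either case; any other
# character is left as-is after reversing.
def revcomp_ruby(seq)
  seq.reverse.tr('ACGTNacgtn', 'TGCANtgcan')
end

puts revcomp_ruby('ACGTN')
puts revcomp_ruby('aaccg')
```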
@@ -0,0 +1,235 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minimap2
4
+ class Aligner
5
+ attr_reader :idx_opt, :map_opt, :index
6
+
7
+ # Create a new aligner.
8
+ #
9
+ # @param fn_idx_in [String] index or sequence file name.
10
+ # @param seq [String] a single sequence to index.
11
+ # @param preset [String] minimap2 preset.
12
+ # * map-pb : PacBio CLR genomic reads
13
+ # * map-ont : Oxford Nanopore genomic reads
14
+ # * map-hifi : PacBio HiFi/CCS genomic reads (v2.19 or later)
15
+ # * asm20 : PacBio HiFi/CCS genomic reads (v2.18 or earlier)
16
+ # * sr : short genomic paired-end reads
17
+ # * splice : spliced long reads (strand unknown)
18
+ # * splice:hq : Final PacBio Iso-seq or traditional cDNA
19
+ # * asm5 : intra-species asm-to-asm alignment
20
+ # * ava-pb : PacBio read overlap
21
+ # * ava-ont : Nanopore read overlap
22
+ # @param k [Integer] k-mer length, no larger than 28.
23
+ # @param w [Integer] minimizer window size, no larger than 255.
24
+ # @param min_cnt [Integer] minimum number of minimizers on a chain.
25
+ # @param min_chain_score [Integer] minimum chaining score.
26
+ # @param min_dp_score [Integer] minimum DP score of the max-scoring segment.
27
+ # @param bw [Integer] chaining and alignment band width.
28
+ # @param best_n [Integer] max number of alignments to return.
29
+ # @param n_threads [Integer] number of indexing threads.
30
+ # @param fn_idx_out [String] name of file to which the index is written.
31
+ # This parameter has no effect if seq is set.
32
+ # @param max_frag_len [Integer]
33
+ # @param extra_flags [Integer] additional flags defined in minimap.h.
34
+ # @param scoring [Array] scoring system.
35
+ # It is an Array of 4, 6 or 7 positive integers.
36
+ # The first 4 elements specify match scoring, mismatch penalty, gap open and gap extension penalty.
37
+ # The 5th and 6th elements, if present, set long-gap open and long-gap extension penalty.
38
+ # The 7th sets a mismatch penalty involving ambiguous bases.
39
+
40
+ def initialize(
41
+ fn_idx_in = nil,
42
+ seq: nil,
43
+ preset: nil,
44
+ k: nil,
45
+ w: nil,
46
+ min_cnt: nil,
47
+ min_chain_score: nil,
48
+ min_dp_score: nil,
49
+ bw: nil,
50
+ best_n: nil,
51
+ n_threads: 3,
52
+ fn_idx_out: nil,
53
+ max_frag_len: nil,
54
+ extra_flags: nil,
55
+ scoring: nil
56
+ )
57
+
58
+ @idx_opt = FFI::IdxOpt.new
59
+ @map_opt = FFI::MapOpt.new
60
+
61
+ r = FFI.mm_set_opt(preset, idx_opt, map_opt)
62
+ raise ArgumentError, "Unknown preset name: #{preset}" if r == -1
63
+
64
+ # always perform alignment
65
+ map_opt[:flag] |= 4
66
+ idx_opt[:batch_size] = 0x7fffffffffffffff
67
+
68
+ # override preset options
69
+ idx_opt[:k] = k if k
70
+ idx_opt[:w] = w if w
71
+ map_opt[:min_cnt] = min_cnt if min_cnt
72
+ map_opt[:min_chain_score] = min_chain_score if min_chain_score
73
+ map_opt[:min_dp_max] = min_dp_score if min_dp_score
74
+ map_opt[:bw] = bw if bw
75
+ map_opt[:best_n] = best_n if best_n
76
+ map_opt[:max_frag_len] = max_frag_len if max_frag_len
77
+ map_opt[:flag] |= extra_flags if extra_flags
78
+ if scoring && scoring.size >= 4
79
+ map_opt[:a] = scoring[0]
80
+ map_opt[:b] = scoring[1]
81
+ map_opt[:q] = scoring[2]
82
+ map_opt[:e] = scoring[3]
83
+ map_opt[:q2] = map_opt[:q]
84
+ map_opt[:e2] = map_opt[:e]
85
+ if scoring.size >= 6
86
+ map_opt[:q2] = scoring[4]
87
+ map_opt[:e2] = scoring[5]
88
+ map_opt[:sc_ambi] = scoring[6] if scoring.size >= 7
89
+ end
90
+ end
91
+
92
+ if fn_idx_in
93
+ warn 'Since fn_idx_in is specified, the seq argument will be ignored.' if seq
94
+ reader = FFI.mm_idx_reader_open(fn_idx_in, idx_opt, fn_idx_out)
95
+
96
+ # The Ruby version raises an error here
97
+ raise "Cannot open : #{fn_idx_in}" if reader.null?
98
+
99
+ @index = FFI.mm_idx_reader_read(reader, n_threads)
100
+ FFI.mm_idx_reader_close(reader)
101
+ FFI.mm_mapopt_update(map_opt, index)
102
+ FFI.mm_idx_index_name(index)
103
+ elsif seq
104
+ @index = FFI.mappy_idx_seq(
105
+ idx_opt[:w], idx_opt[:k], idx_opt[:flag] & 1,
106
+ idx_opt[:bucket_bits], seq, seq.size
107
+ )
108
+ FFI.mm_mapopt_update(map_opt, index)
109
+ map_opt[:mid_occ] = 1000 # don't filter high-occ seeds
110
+ end
111
+ end
112
+
113
+ # Explicitly releases the memory of the index object.
114
+
115
+ def free_index
116
+ FFI.mm_idx_destroy(index) unless index.null?
117
+ end
118
+
119
+ # @param seq [String]
120
+ # @param seq2 [String]
121
+ # @param buf [FFI::TBuf]
122
+ # @param cs [true, false]
123
+ # @param md [true, false]
124
+ # @param max_frag_len [Integer]
125
+ # @param extra_flags [Integer]
126
+ # @note Name change: map -> align
127
+ # In the Ruby language, the name map means iterator.
128
+ # The original name is map, but here I use the method name align.
129
+ # @note The use of Enumerator is being considered. The method names may change again.
130
+ # @return [Array] alignments
131
+
132
+ def align(
133
+ seq, seq2 = nil,
134
+ buf: nil,
135
+ cs: false,
136
+ md: false,
137
+ max_frag_len: nil,
138
+ extra_flags: nil
139
+ )
140
+
141
+ return if index.null?
142
+
143
+ map_opt[:max_frag_len] = max_frag_len if max_frag_len
144
+ map_opt[:flag] |= extra_flags if extra_flags
145
+
146
+ buf ||= FFI::TBuf.new
147
+ km = FFI.mm_tbuf_get_km(buf)
148
+
149
+ n_regs_ptr = ::FFI::MemoryPointer.new :int
150
+ regs_ptr = FFI.mm_map_aux(index, seq, seq2, n_regs_ptr, buf, map_opt)
151
+ n_regs = n_regs_ptr.read_int
152
+
153
+ regs = Array.new(n_regs) do |i|
154
+ FFI::Reg1.new(regs_ptr + i * FFI::Reg1.size)
155
+ end
156
+
157
+ hit = FFI::Hit.new
158
+
159
+ cs_str = ::FFI::MemoryPointer.new(:pointer) # char ** for mm_gen_cs/mm_gen_MD, zero-filled (NULL)
160
+ m_cs_str = ::FFI::MemoryPointer.new :int
161
+
162
+ alignments = []
163
+
164
+ i = 0
165
+ begin
166
+ while i < n_regs
167
+ FFI.mm_reg2hitpy(index, regs[i], hit)
168
+
169
+ c = hit[:cigar32].read_array_of_uint32(hit[:n_cigar32])
170
+ cigar = c.map { |x| [x >> 4, x & 0xf] } # 32-bit CIGAR encoding -> Ruby array
171
+
172
+ _cs = ''
173
+ if cs
174
+ l_cs_str = FFI.mm_gen_cs(km, cs_str, m_cs_str, @index, regs[i], seq, 1)
175
+ _cs = cs_str.read_pointer.read_string(l_cs_str)
176
+ end
177
+
178
+ _md = ''
179
+ if md
180
+ l_cs_str = FFI.mm_gen_md(km, cs_str, m_cs_str, @index, regs[i], seq)
181
+ _md = cs_str.read_pointer.read_string(l_cs_str)
182
+ end
183
+
184
+ alignments << Alignment.new(hit, cigar, _cs, _md)
185
+
186
+ FFI.mm_free_reg1(regs[i])
187
+ i += 1
188
+ end
189
+ ensure
190
+ while i < n_regs
191
+ FFI.mm_free_reg1(regs[i])
192
+ i += 1
193
+ end
194
+ end
195
+ alignments
196
+ end
197
+
198
+ # Retrieve a subsequence from the index.
199
+ # @param name [String] name of the reference sequence.
200
+ # @param start [Integer] start position.
201
+ # @param stop [Integer] end position.
202
+
203
+ def seq(name, start = 0, stop = 0x7fffffff)
204
+ lp = ::FFI::MemoryPointer.new(:int)
205
+ s = FFI.mappy_fetch_seq(index, name, start, stop, lp)
206
+ l = lp.read_int
207
+ return nil if l.zero?
208
+
209
+ s.read_string(l)
210
+ end
211
+
212
+ # k-mer length, no larger than 28
213
+
214
+ def k
215
+ index[:k]
216
+ end
217
+
218
+ # minimizer window size, no larger than 255
219
+
220
+ def w
221
+ index[:w]
222
+ end
223
+
224
+ def n_seq
225
+ index[:n_seq]
226
+ end
227
+
228
+ def seq_names
229
+ ptr = index[:seq].to_ptr
230
+ Array.new(index[:n_seq]) do |i|
231
+ FFI::IdxSeq.new(ptr + i * FFI::IdxSeq.size)[:name]
232
+ end
233
+ end
234
+ end
235
+ end
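The `scoring:` option of `Aligner#initialize` fills up to seven `MapOpt` fields. The sketch below restates that mapping as a pure function returning a Hash (`scoring_fields` is a hypothetical name), which makes the defaulting rule easy to see: with only four values, the long-gap penalties `q2`/`e2` fall back to the ordinary gap-open and gap-extension penalties.

```ruby
# Hypothetical helper mirroring the scoring block in Aligner#initialize.
# Returns the MapOpt fields that would be set, as a plain Hash.
def scoring_fields(scoring)
  return {} unless scoring && scoring.size >= 4

  a, b, q, e, q2, e2, sc_ambi = scoring
  # Long-gap penalties default to the short-gap ones.
  fields = { a: a, b: b, q: q, e: e, q2: q, e2: e }
  if scoring.size >= 6
    fields[:q2] = q2
    fields[:e2] = e2
    fields[:sc_ambi] = sc_ambi if scoring.size >= 7
  end
  fields
end

p scoring_fields([2, 4, 4, 2])
p scoring_fields([2, 4, 4, 2, 24, 1, 1])
```

A five-element array behaves like a four-element one, because the fifth value is only read together with the sixth.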
@@ -0,0 +1,113 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minimap2
4
+ # Alignment result.
5
+ #
6
+ # @!attribute ctg
7
+ # @return [String] name of the reference sequence the query is mapped to.
8
+ # @!attribute ctg_len
9
+ # @return [Integer] total length of the reference sequence.
10
+ # @!attribute r_st
11
+ # @return [Integer] start positions on the reference.
12
+ # @!attribute r_en
13
+ # @return [Integer] end positions on the reference.
14
+ # @!attribute strand
15
+ # @return [Integer] +1 if on the forward strand; -1 if on the reverse strand.
16
+ # @!attribute trans_strand
17
+ # @return [Integer] transcript strand.
18
+ # +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown.
19
+ # @!attribute blen
20
+ # @return [Integer] length of the alignment, including both alignment matches and gaps
21
+ # but excluding ambiguous bases.
22
+ # @!attribute mlen
23
+ # @return [Integer] length of the matching bases in the alignment,
24
+ # excluding ambiguous base matches.
25
+ # @!attribute nm
26
+ # @return [Integer] number of mismatches, gaps and ambiguous positions in the alignment.
27
+ # @!attribute primary
28
+ # @return [Integer] 1 if the alignment is primary (typically the best and the first to generate), 0 otherwise.
29
+ # @!attribute q_st
30
+ # @return [Integer] start positions on the query.
31
+ # @!attribute q_en
32
+ # @return [Integer] end positions on the query.
33
+ # @!attribute mapq
34
+ # @return [Integer] mapping quality.
35
+ # @!attribute cigar
36
+ # @return [Array] CIGAR returned as an array of shape (n_cigar,2).
37
+ # The two numbers give the length and the operator of each CIGAR operation.
38
+ # @!attribute read_num
39
+ # @return [Integer] read number that the alignment corresponds to;
40
+ # 1 for the first read and 2 for the second read.
41
+ # @!attribute cs
42
+ # @return [String] the cs tag.
43
+ # @!attribute md
44
+ # @return [String] the MD tag as in the SAM format.
45
+ # It is an empty string unless `md: true` is passed to Aligner#align.
46
+ # @!attribute cigar_str
47
+ # @return [String] CIGAR string.
48
+
49
+ class Alignment
50
+ def self.keys
51
+ %i[ctg ctg_len r_st r_en strand trans_strand blen mlen nm primary
52
+ q_st q_en mapq cigar read_num cs md cigar_str]
53
+ end
54
+
55
+ attr_reader(*keys)
56
+
57
+ def initialize(h, cigar, cs = nil, md = nil)
58
+ @ctg = h[:ctg]
59
+ @ctg_len = h[:ctg_len]
60
+ @r_st = h[:ctg_start]
61
+ @r_en = h[:ctg_end]
62
+ @strand = h[:strand]
63
+ @trans_strand = h[:trans_strand]
64
+ @blen = h[:blen]
65
+ @mlen = h[:mlen]
66
+ @nm = h[:NM]
67
+ @primary = h[:is_primary]
68
+ @q_st = h[:qry_start]
69
+ @q_en = h[:qry_end]
70
+ @mapq = h[:mapq]
71
+ @cigar = cigar
72
+ @read_num = h[:seg_id] + 1
73
+ @cs = cs
74
+ @md = md
75
+
76
+ @cigar_str = cigar.map { |x| x[0].to_s + FFI::CIGAR_STR[x[1]] }.join
77
+ end
78
+
79
+ def primary?
80
+ @primary == 1
81
+ end
82
+
83
+ # Convert Alignment to hash.
84
+
85
+ def to_h
86
+ self.class.keys.map { |k| [k, __send__(k)] }.to_h
87
+ end
88
+
89
+ # Convert to the PAF format without the QueryName and QueryLength columns.
90
+
91
+ def to_s
92
+ strand = if @strand.positive?
93
+ '+'
94
+ elsif @strand.negative?
95
+ '-'
96
+ else
97
+ '?'
98
+ end
99
+ tp = @primary != 0 ? 'tp:A:P' : 'tp:A:S'
100
+ ts = if @trans_strand.positive?
101
+ 'ts:A:+'
102
+ elsif @trans_strand.negative?
103
+ 'ts:A:-'
104
+ else
105
+ 'ts:A:.'
106
+ end
107
+ a = [@q_st, @q_en, strand, @ctg, @ctg_len, @r_st, @r_en,
108
+ @mlen, @blen, @mapq, tp, ts, "cg:Z:#{@cigar_str}"]
109
+ a << "cs:Z:#{@cs}" unless @cs.nil? || @cs.empty?
110
+ a.join("\t")
111
+ end
112
+ end
113
+ end
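`Aligner#align` receives CIGAR operations in minimap2's packed 32-bit form (length in the upper 28 bits, operator code in the low 4 bits), and `Alignment` turns the resulting pairs into `cigar_str` using the `MIDNSHP=XB` operator table. The standalone sketch below reproduces both steps; `decode_cigar`, `cigar_string` and `CIGAR_OPS` are illustrative names, not part of the gem.

```ruby
CIGAR_OPS = 'MIDNSHP=XB' # same table as Minimap2::FFI::CIGAR_STR

# Decode packed ops (length << 4 | op) into [length, op] pairs.
def decode_cigar(packed)
  packed.map { |x| [x >> 4, x & 0xf] }
end

# Render [length, op] pairs as a CIGAR string, as Alignment#cigar_str does.
def cigar_string(pairs)
  pairs.map { |len, op| "#{len}#{CIGAR_OPS[op]}" }.join
end

packed = [(5 << 4) | 0, (2 << 4) | 1, (3 << 4) | 2] # 5M, 2I, 3D
pairs = decode_cigar(packed)
p pairs
puts cigar_string(pairs)
```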
@@ -0,0 +1,27 @@
1
+ # frozen_string_literal: true
2
+
3
+ # bit fields
4
+ require_relative 'ffi_helper'
5
+
6
+ module Minimap2
7
+ # Native APIs
8
+ module FFI
9
+ extend ::FFI::Library
10
+ begin
11
+ ffi_lib Minimap2.ffi_lib
12
+ rescue LoadError => e
13
+ raise LoadError, "Could not find #{Minimap2.ffi_lib} \n#{e}"
14
+ end
15
+
16
+ # Continue even if some functions are not found.
17
+ def self.attach_function(*)
18
+ super
19
+ rescue ::FFI::NotFoundError => e
20
+ warn e.message
21
+ end
22
+ end
23
+ end
24
+
25
+ require_relative 'ffi/constants'
26
+ require_relative 'ffi/functions'
27
+ require_relative 'ffi/mappy'
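`Minimap2::FFI` loads whatever path `Minimap2.ffi_lib` holds. In `minimap2.rb` that path comes from the `MINIMAPDIR` environment variable when it is set, and otherwise from the gem's `vendor` directory. The sketch below restates that choice as a pure function; `resolve_ffi_lib` is hypothetical, and the library file name is passed in rather than computed with `FFI.map_library_name`.

```ruby
# Hypothetical restatement of how minimap2.rb picks Minimap2.ffi_lib.
# lib_name: platform library file name, e.g. "libminimap2.so" on Linux.
# env:      a Hash standing in for ENV.
# lib_dir:  the directory of minimap2.rb (what __dir__ returns there).
def resolve_ffi_lib(lib_name, env, lib_dir)
  if env['MINIMAPDIR']
    File.expand_path(lib_name, env['MINIMAPDIR'])
  else
    File.expand_path("../vendor/#{lib_name}", lib_dir)
  end
end

puts resolve_ffi_lib('libminimap2.so', { 'MINIMAPDIR' => '/opt/minimap2' }, '/gems/minimap2/lib')
puts resolve_ffi_lib('libminimap2.so', {}, '/gems/minimap2/lib')
```

In practice this means a separately built library can be selected with, for example, `MINIMAPDIR=/path/to/build ruby script.rb`, provided that directory contains the platform's library file.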
@@ -0,0 +1,231 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minimap2
4
+ module FFI
5
+ # flags
6
+ NO_DIAG = 0x001 # no exact diagonal hit
7
+ NO_DUAL = 0x002 # skip pairs where query name is lexicographically larger than target name
8
+ CIGAR = 0x004
9
+ OUT_SAM = 0x008
10
+ NO_QUAL = 0x010
11
+ OUT_CG = 0x020
12
+ OUT_CS = 0x040
13
+ SPLICE = 0x080 # splice mode
14
+ SPLICE_FOR = 0x100 # match GT-AG
15
+ SPLICE_REV = 0x200 # match CT-AC, the reverse complement of GT-AG
16
+ NO_LJOIN = 0x400
17
+ OUT_CS_LONG = 0x800
18
+ SR = 0x1000
19
+ FRAG_MODE = 0x2000
20
+ NO_PRINT_2ND = 0x4000
21
+ TWO_IO_THREADS = 0x8000 # MM_F_2_IO_THREADS in minimap.h; Ruby constant names cannot start with a digit.
22
+ LONG_CIGAR = 0x10000
23
+ INDEPEND_SEG = 0x20000
24
+ SPLICE_FLANK = 0x40000
25
+ SOFTCLIP = 0x80000
26
+ FOR_ONLY = 0x100000
27
+ REV_ONLY = 0x200000
28
+ HEAP_SORT = 0x400000
29
+ ALL_CHAINS = 0x800000
30
+ OUT_MD = 0x1000000
31
+ COPY_COMMENT = 0x2000000
32
+ EQX = 0x4000000 # use =/X instead of M
33
+ PAF_NO_HIT = 0x8000000 # output unmapped reads to PAF
34
+ NO_END_FLT = 0x10000000
35
+ HARD_MLEVEL = 0x20000000
36
+ SAM_HIT_ONLY = 0x40000000
37
+ RMQ = 0x80000000 # MM_F_RMQ, written with an LL suffix in minimap.h
38
+
39
+ HPC = 0x1
40
+ NO_SEQ = 0x2
41
+ NO_NAME = 0x4
42
+
43
+ IDX_MAGIC = "MMI\2"
44
+
45
+ MAX_SEG = 255
46
+
47
+ CIGAR_STR = 'MIDNSHP=XB'
48
+
49
+ # emulate 128-bit integers
50
+ class MM128 < ::FFI::Struct
51
+ layout \
52
+ :x, :uint64_t,
53
+ :y, :uint64_t
54
+ end
55
+
56
+ # emulate 128-bit arrays
57
+ class MM128V < ::FFI::Struct
58
+ layout \
59
+ :n, :size_t,
60
+ :m, :size_t,
61
+ :a, MM128.ptr
62
+ end
63
+
64
+ # indexing option
65
+ class IdxOpt < ::FFI::Struct
66
+ layout \
67
+ :k, :short,
68
+ :w, :short,
69
+ :flag, :short,
70
+ :bucket_bits, :short,
71
+ :mini_batch_size, :int64_t,
72
+ :batch_size, :uint64_t
73
+ end
74
+
75
+ # mapping option
76
+ class MapOpt < ::FFI::Struct
77
+ layout \
78
+ :flag, :int64_t, # see MM_F_* macros
79
+ :seed, :int,
80
+ :sdust_thres, :int, # score threshold for SDUST; 0 to disable
81
+ :max_qlen, :int, # max query length
82
+ :bw, :int, # bandwidth
83
+ :bw_long, :int,
84
+ :max_gap, :int, # break a chain if there are no minimizers in a max_gap window
85
+ :max_gap_ref, :int,
86
+ :max_frag_len, :int,
87
+ :max_chain_skip, :int,
88
+ :max_chain_iter, :int,
89
+ :min_cnt, :int, # min number of minimizers on each chain
90
+ :min_chain_score, :int, # min chaining score
91
+ :chain_gap_scale, :float,
92
+ :rmq_size_cap, :int,
93
+ :rmq_inner_dist, :int,
94
+ :rmq_rescue_size, :int,
95
+ :rmq_rescue_ratio, :float,
96
+ :mask_level, :float,
97
+ :mask_len, :int,
98
+ :pri_ratio, :float,
99
+ :best_n, :int, # top best_n chains are subjected to DP alignment
100
+ :alt_drop, :float,
101
+ :a, :int, # matching score
102
+ :b, :int, # mismatch
103
+ :q, :int, # gap-open
104
+ :e, :int, # gap-ext
105
+ :q2, :int, # gap-open
106
+ :e2, :int, # gap-ext
107
+ :sc_ambi, :int, # score when one or both bases are "N"
108
+ :noncan, :int, # cost of non-canonical splicing sites
109
+ :junc_bonus, :int,
110
+ :zdrop, :int, # break alignment if alignment score drops too fast along the diagonal
111
+ :zdrop_inv, :int,
112
+ :end_bonus, :int,
113
+ :min_dp_max, :int, # drop an alignment if the score of the max scoring segment is below this threshold
114
+ :min_ksw_len, :int,
115
+ :anchor_ext_len, :int,
116
+ :anchor_ext_shift, :int,
117
+ :max_clip_ratio, :float, # drop an alignment if BOTH ends are clipped above this ratio
118
+ :pe_ori, :int,
119
+ :pe_bonus, :int,
120
+ :mid_occ_frac, :float, # only used by mm_mapopt_update(); see below
121
+ :min_mid_occ, :int32_t,
122
+ :mid_occ, :int32_t, # ignore seeds with occurrences above this threshold
123
+ :max_occ, :int32_t,
124
+ :mini_batch_size, :int64_t, # size of a batch of query bases to process in parallel
125
+ :max_sw_mat, :int64_t,
126
+ :split_prefix, :string
127
+ end
128
+
129
+ # minimap2 index
130
+ class IdxSeq < ::FFI::Struct
131
+ layout \
132
+ :name, :string, # name of the db sequence
133
+ :offset, :uint64_t, # offset in mm_idx_t::S
134
+ :len, :uint32_t, # length
135
+ :is_alt, :uint32_t
136
+ end
137
+
138
+ class Idx < ::FFI::Struct
139
+ layout \
140
+ :b, :int32_t,
141
+ :w, :int32_t,
142
+ :k, :int32_t,
143
+ :flag, :int32_t,
144
+ :n_seq, :uint32_t, # number of reference sequences
145
+ :index, :int32_t,
146
+ :n_alt, :int32_t,
147
+ :seq, IdxSeq.ptr, # sequence name, length and offset
148
+ :S, :pointer, # 4-bit packed sequence
149
+ :B, :pointer, # index (hidden)
150
+ :I, :pointer, # intervals (hidden)
151
+ :km, :pointer,
152
+ :h, :pointer
153
+ end
154
+
155
+ # index reader
156
+ class IdxReader < ::FFI::Struct
157
+ layout \
158
+ :is_idx, :int,
159
+ :n_parts, :int,
160
+ :idx_size, :int64_t,
161
+ :opt, IdxOpt,
162
+ :fp_out, :pointer, # FILE
163
+ :seq_or_idx, :pointer # FIXME: Union mm_bseq_files or FILE
164
+ end
165
+
166
+ # minimap2 alignment
167
+ class Extra < ::FFI::BitStruct
168
+ layout \
169
+ :capacity, :uint32, # the capacity of cigar[]
170
+ :dp_score, :int32, # DP score
171
+ :dp_max, :int32, # score of the max-scoring segment
172
+ :dp_max2, :int32, # score of the best alternate mappings
173
+ :n_ambi_trans_strand, :uint32,
174
+ :n_cigar, :uint32
175
+
176
+ bitfields :n_ambi_trans_strand,
177
+ :n_ambi, 30, # number of ambiguous bases
178
+ :trans_strand, 2 # transcript strand: 0 for unknown, 1 for +, 2 for -
179
+
180
+ # variable length array
181
+ def cigar
182
+ pointer.get_array_of_uint32(size, self[:n_cigar])
183
+ end
184
+ end
185
+
186
+ class Reg1 < ::FFI::BitStruct
187
+ layout \
188
+ :id, :int32_t, # ID for internal uses (see also parent below)
189
+ :cnt, :int32_t, # number of minimizers; if on the reverse strand
190
+ :rid, :int32_t, # reference index; if this is an alignment from inversion rescue
191
+ :score, :int32_t, # DP alignment score
192
+ :qs, :int32_t, # query start
193
+ :qe, :int32_t, # query end
194
+ :rs, :int32_t, # reference start
195
+ :re, :int32_t, # reference end
196
+ :parent, :int32_t, # parent==id if primary
197
+ :subsc, :int32_t, # best alternate mapping score
198
+ :as, :int32_t, # offset in the a[] array (for internal uses only)
199
+ :mlen, :int32_t, # seeded exact match length
200
+ :blen, :int32_t, # seeded alignment block length
201
+ :n_sub, :int32_t, # number of suboptimal mappings
202
+ :score0, :int32_t, # initial chaining score (before chain merging/spliting)
203
+ :fields, :uint32_t,
204
+ :hash, :uint32_t,
205
+ :div, :float,
206
+ :p, Extra.ptr
207
+
208
+ bitfields :fields,
209
+ :mapq, 8,
210
+ :split, 2,
211
+ :rev, 1,
212
+ :inv, 1,
213
+ :sam_pri, 1,
214
+ :proper_frag, 1,
215
+ :pe_thru, 1,
216
+ :seg_split, 1,
217
+ :seg_id, 8,
218
+ :split_inv, 1,
219
+ :is_alt, 1,
220
+ :dummy, 6
221
+ end
222
+
223
+ # memory buffer for thread-local storage during mapping
224
+ class TBuf < ::FFI::Struct
225
+ layout \
226
+ :km, :pointer,
227
+ :rep_len, :int,
228
+ :frag_gap, :int
229
+ end
230
+ end
231
+ end
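The `MM_F_*` flags above are single bits in `MapOpt#flag`. They are combined with `|` (as `Aligner#initialize` does with `map_opt[:flag] |= 4`, i.e. `CIGAR`) and tested with `&`. A small sketch using a subset of the constants; the `FLAGS` hash and `flag_names` helper are illustrative only:

```ruby
# A few mapping flags, with the values defined in Minimap2::FFI above.
FLAGS = {
  CIGAR: 0x004,
  OUT_CS: 0x040,
  SPLICE: 0x080,
  OUT_MD: 0x1000000,
  EQX: 0x4000000 # use =/X instead of M
}.freeze

# Return the names of the known flags that are set in a flag word.
def flag_names(flag)
  FLAGS.select { |_name, bit| (flag & bit) != 0 }.keys
end

flag = 0
flag |= FLAGS[:CIGAR] # always set by Aligner#initialize
flag |= FLAGS[:EQX]   # e.g. passed via extra_flags
p flag_names(flag)
```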
@@ -0,0 +1,76 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minimap2
4
+ module FFI
5
+ attach_function \
6
+ :mm_set_opt_raw, :mm_set_opt,
7
+ [:pointer, IdxOpt.by_ref, MapOpt.by_ref],
8
+ :int
9
+
10
+ private_class_method :mm_set_opt_raw
11
+
12
+ def self.mm_set_opt(preset, io, mo)
13
+ ptr = if preset
14
+ ::FFI::MemoryPointer.from_string(preset.to_s)
15
+ else
16
+ ::FFI::Pointer.new(:int, 0)
17
+ end
18
+ mm_set_opt_raw(ptr, io, mo)
19
+ end
20
+
21
+ attach_function \
22
+ :mm_idx_reader_open,
23
+ [:string, IdxOpt.by_ref, :string],
24
+ IdxReader.by_ref
25
+
26
+ attach_function \
27
+ :mm_idx_reader_read,
28
+ [IdxReader.by_ref, :int],
29
+ Idx.by_ref
30
+
31
+ attach_function \
32
+ :mm_idx_reader_close,
33
+ [IdxReader.by_ref],
34
+ :void
35
+
36
+ attach_function \
37
+ :mm_idx_destroy,
38
+ [Idx.by_ref],
39
+ :void
40
+
41
+ attach_function \
42
+ :mm_mapopt_update,
43
+ [MapOpt.by_ref, Idx.by_ref],
44
+ :void
45
+
46
+ attach_function \
47
+ :mm_idx_index_name,
48
+ [Idx.by_ref],
49
+ :int
50
+
51
+ attach_function \
52
+ :mm_tbuf_init,
53
+ [],
54
+ TBuf.by_ref
55
+
56
+ attach_function \
57
+ :mm_tbuf_destroy,
58
+ [TBuf.by_ref],
59
+ :void
60
+
61
+ attach_function \
62
+ :mm_tbuf_get_km,
63
+ [TBuf.by_ref],
64
+ :pointer
65
+
66
+ attach_function \
67
+ :mm_gen_cs,
68
+ [:pointer, :pointer, :pointer, Idx.by_ref, Reg1.by_ref, :string, :int],
69
+ :int
70
+
71
+ attach_function \
72
+ :mm_gen_md, :mm_gen_MD, # Avoid uppercase letters in method names.
73
+ [:pointer, :pointer, :pointer, Idx.by_ref, Reg1.by_ref, :string],
74
+ :int
75
+ end
76
+ end
@@ -0,0 +1,99 @@
1
+ # frozen_string_literal: true
2
+
3
+ # https://github.com/lh3/minimap2/blob/master/python/cmappy.h
4
+
5
+ module Minimap2
6
+ module FFI
7
+ class Hit < ::FFI::Struct
8
+ layout \
9
+ :ctg, :string,
10
+ :ctg_start, :int32_t,
11
+ :ctg_end, :int32_t,
12
+ :qry_start, :int32_t,
13
+ :qry_end, :int32_t,
14
+ :blen, :int32_t,
15
+ :mlen, :int32_t,
16
+ :NM, :int32_t,
17
+ :ctg_len, :int32_t,
18
+ :mapq, :uint8_t,
19
+ :is_primary, :uint8_t,
20
+ :strand, :int8_t,
21
+ :trans_strand, :int8_t,
22
+ :seg_id, :int32_t,
23
+ :n_cigar32, :int32_t,
24
+ :cigar32, :pointer
25
+ end
26
+
27
+ class KString < ::FFI::Struct
28
+ layout \
29
+ :l, :size_t,
30
+ :m, :size_t,
31
+ :s, :string
32
+ end
33
+
34
+ class KSeq < ::FFI::Struct
35
+ layout \
36
+ :name, KString,
37
+ :comment, KString,
38
+ :seq, KString,
39
+ :qual, KString,
40
+ :last_char, :int,
41
+ :f, :pointer # KStream
42
+ end
43
+
44
+ attach_function \
45
+ :mm_reg2hitpy,
46
+ [Idx.by_ref, Reg1.by_ref, Hit.by_ref],
47
+ :void
48
+
49
+ attach_function \
50
+ :mm_free_reg1,
51
+ [Reg1.by_ref],
52
+ :void
53
+
54
+ attach_function \
55
+ :mm_fastx_open,
56
+ [:string],
57
+ KSeq.by_ref
58
+
59
+ attach_function \
60
+ :mm_fastx_close,
61
+ [KSeq.by_ref],
62
+ :void
63
+
64
+ attach_function \
65
+ :mm_verbose_level,
66
+ [:int],
67
+ :int
68
+
69
+ attach_function \
70
+ :mm_reset_timer,
71
+ [],
72
+ :void
73
+
74
+ attach_function \
75
+ :mm_map_aux,
76
+ [Idx.by_ref, :string, :string, :pointer, TBuf.by_ref, MapOpt.by_ref],
77
+ :pointer # Reg1
78
+
79
+ attach_function \
80
+ :mappy_revcomp,
81
+ %i[int pointer],
82
+ :string
83
+
84
+ attach_function \
85
+ :mappy_fetch_seq,
86
+ [Idx.by_ref, :string, :int, :int, :pointer],
87
+ :pointer # Use pointer instead of string to read with a specified length
88
+
89
+ attach_function \
90
+ :mappy_idx_seq,
91
+ %i[int int int int string int],
92
+ Idx.by_ref
93
+
94
+ attach_function \
95
+ :kseq_read,
96
+ [KSeq.by_ref],
97
+ :int
98
+ end
99
+ end
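The `Hit` struct above exposes the alignment path as `n_cigar32` packed integers behind the `cigar32` pointer. A minimal pure-Ruby sketch of turning them into a CIGAR string, assuming minimap2's BAM-style packing (operation length in the high bits, operation code in the low 4 bits, codes indexing `MIDNSHP=XB`). The helpers `decode_cigar` and `encode_cigar` are hypothetical and not part of the gem; with a populated `Hit`, the integer array could likely be obtained with Ruby-FFI's `hit[:cigar32].read_array_of_uint32(hit[:n_cigar32])`.

```ruby
# frozen_string_literal: true

# Hypothetical helpers, not part of ruby-minimap2.
# Assumption: each uint32 packs (op_length << 4) | op_code, and op_code
# indexes into the minimap2 CIGAR alphabet below.
CIGAR_OPS = 'MIDNSHP=XB'

# Turn an Array of packed uint32 values into a CIGAR string like "10M2I5M".
def decode_cigar(cigar32)
  cigar32.map do |c|
    op = CIGAR_OPS[c & 0xf] || raise(ArgumentError, "unknown CIGAR op code #{c & 0xf}")
    "#{c >> 4}#{op}"
  end.join
end

# Inverse of decode_cigar, used here only to build example input.
def encode_cigar(pairs)
  pairs.map { |len, op| (len << 4) | CIGAR_OPS.index(op) }
end

packed = encode_cigar([[10, 'M'], [2, 'I'], [5, 'M']])
puts packed.inspect       # [160, 33, 80]
puts decode_cigar(packed) # 10M2I5M
```

Raising on an unknown code keeps a corrupted or misread pointer from silently producing a truncated CIGAR string.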
@@ -0,0 +1,53 @@
+ # frozen_string_literal: true
+
+ require 'ffi'
+
+ module FFI
+ class BitStruct < Struct
+ class << self
+ # def union_layout(*args)
+ # Class.new(FFI::Union) { layout(*args) }
+ # end
+
+ # def struct_layout(*args)
+ # Class.new(FFI::Struct) { layout(*args) }
+ # end
+
+ module BitFieldsModule
+ def [](name)
+ bit_fields = self.class.bit_fields_map
+ parent, start, width = bit_fields[name]
+ if parent
+ (super(parent) >> start) & ((1 << width) - 1)
+ else
+ super(name)
+ end
+ end
+ end
+ private_constant :BitFieldsModule
+
+ attr_reader :bit_fields_map
+
+ def bitfields(*args)
+ unless instance_variable_defined?(:@bit_fields_map)
+ @bit_fields_map = {}
+ prepend BitFieldsModule
+ end
+
+ parent = args.shift
+ labels = []
+ widths = []
+ args.each_slice(2) do |l, w|
+ labels << l
+ widths << w
+ end
+ starts = widths.inject([0]) do |result, w|
+ result << (result.last + w)
+ end
+ labels.zip(starts, widths).each do |l, s, w|
+ @bit_fields_map[l] = [parent, s, w]
+ end
+ end
+ end
+ end
+ end
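`bitfields` lays out the sub-fields of one integer member from the least significant bit upward, and `BitFieldsModule#[]` reads a sub-field by shifting right by its start bit and masking to its width. The pure-Ruby sketch below reproduces that arithmetic on a plain Integer so it runs without Ruby-FFI; `bit_layout` and `extract` are hypothetical names, and the packed value stands in for what `super(parent)` would return.

```ruby
# frozen_string_literal: true

# Hypothetical stand-ins for FFI::BitStruct.bitfields and BitFieldsModule#[].
# Build [label, start_bit, width] triples from alternating label/width args;
# each field starts where the previous one ended (LSB first).
def bit_layout(*args)
  start = 0
  args.each_slice(2).map do |label, width|
    entry = [label, start, width]
    start += width
    entry
  end
end

# Read one field: shift right by its start bit, then keep `width` bits.
def extract(packed, start, width)
  (packed >> start) & ((1 << width) - 1)
end

layout = bit_layout(:a, 3, :b, 5, :c, 8)
packed = 0b10101010_11011_101 # c = 10101010, b = 11011, a = 101
layout.each do |label, start, width|
  puts "#{label} = #{extract(packed, start, width)}"
end
# a = 5
# b = 27
# c = 170
```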
@@ -0,0 +1,6 @@
+ # frozen_string_literal: true
+
+ module Minimap2
+ # Minimap2-2.21 (r1071).
+ VERSION = '0.2.21'
+ end
Binary file
metadata CHANGED
@@ -1,15 +1,29 @@
  --- !ruby/object:Gem::Specification
  name: minimap2
  version: !ruby/object:Gem::Version
- version: 0.0.0
+ version: 0.2.21
  platform: ruby
  authors:
  - kojix2
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-12-31 00:00:00.000000000 Z
+ date: 2021-07-06 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: ffi
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
@@ -66,6 +80,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: tty-command
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description: minimap2
  email:
  - 2xijok@gmail.com
@@ -75,8 +103,16 @@ extra_rdoc_files: []
  files:
  - LICENSE.txt
  - README.md
- - lib/minimap/version.rb
  - lib/minimap2.rb
+ - lib/minimap2/aligner.rb
+ - lib/minimap2/alignment.rb
+ - lib/minimap2/ffi.rb
+ - lib/minimap2/ffi/constants.rb
+ - lib/minimap2/ffi/functions.rb
+ - lib/minimap2/ffi/mappy.rb
+ - lib/minimap2/ffi_helper.rb
+ - lib/minimap2/version.rb
+ - vendor/libminimap2.so
  homepage: https://github.com/kojix2/ruby-minimap2
  licenses:
  - MIT
@@ -96,7 +132,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.1.4
+ rubygems_version: 3.2.15
  signing_key:
  specification_version: 4
  summary: minimap2
@@ -1,5 +0,0 @@
- # frozen_string_literal: true
-
- module Minimap
- VERSION = "0.0.0"
- end