htslib 0.2.0 → 0.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 87b06f64c496150db76d689f2cd06ba61e8ff424511be781bf17abbdf5d92d6c
4
- data.tar.gz: 10126e01757aafb8fbb18c4f5d2f58541511226ddf922ffb253d29ccaf6c2027
3
+ metadata.gz: ea47a473d63b90fd2b1db4ecd65aa1ef38d47bbe5624e24ac583b5329301e5cb
4
+ data.tar.gz: aab96376972d7b0a587978484b22e41ee049ea742cba937cd57c66a9c0c42f2a
5
5
  SHA512:
6
- metadata.gz: 2bb5806ae23d58192a74c567767de193ed21f2b8f8f87cf2bef4ef9beccc62da684ce5a0b8a6a2b96afbdbd84fb13c14b27ea2738a4d70829b7d3c1a57396430
7
- data.tar.gz: 9e49efbb3bdce645013e0aaa33dec335204a0c62f48cd0ab4b5128314b97f6237b5faec6a0acf0059d8a276e26323d93ad53c4b5e7e2c36b273592b970b7dfe4
6
+ metadata.gz: 76d9aa699d0176557d8e44171018b132fc4df2c0e1f47a9784388693103799cf920c2ae437fb97f8429422a1edb5ebd43eae7b28a1f18452a5eeeac38244a981
7
+ data.tar.gz: f2f1d50099a66e71574c4019e66452dd100371cbdc7d08b144856bab8f59f04aac4e207585534f26e29a06342a7fafe5c3ea0fb0d0e2e8dd9ed1822ddbcbdfe8
data/README.md CHANGED
@@ -6,11 +6,11 @@
6
6
  [![DOI](https://zenodo.org/badge/247078205.svg)](https://zenodo.org/badge/latestdoi/247078205)
7
7
  [![Docs Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://rubydoc.info/gems/htslib)
8
8
 
9
- Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https://github.com/samtools/htslib), a C library for high-throughput sequencing data formats. It allows you to read and write file formats commonly used in genomics, such as [SAM, BAM, VCF, and BCF](http://samtools.github.io/hts-specs/) in the Ruby language.
9
+ Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https://github.com/samtools/htslib), a C library for high-throughput sequencing data formats. It allows you to read and write file formats commonly used in genomics, such as [SAM, BAM, VCF, and BCF](http://samtools.github.io/hts-specs/), in the Ruby language.
10
10
 
11
11
  :apple: Feel free to fork it out if you can develop it!
12
12
 
13
- :bowtie: alpha stage.
13
+ :bowtie: Alpha stage.
14
14
 
15
15
  ## Requirements
16
16
 
@@ -83,6 +83,19 @@ end
83
83
  bcf.close
84
84
  ```
85
85
 
86
+ <details>
87
+ <summary><b>Faidx</b></summary>
88
+
89
+ ```ruby
90
+ fa = HTS::Faidx.open("c.fa")
91
+
92
+ fa.fetch("chr1:1-10")
93
+
94
+ fa.close
95
+ ```
96
+
97
+ </details>
98
+
86
99
  ### Low level API
87
100
 
88
101
  `HTS::LibHTS` provides native C functions.
@@ -96,11 +109,11 @@ p b[:category]
96
109
  p b[:format]
97
110
  ```
98
111
 
99
- Note: htslib makes extensive use of macro functions for speed. you cannot use C macro functions in Ruby if they are not reimplemented in ruby-htslib. Only small number of C structs are implemented with FFI's ManagedStruct, which frees memory when Ruby's garbage collection fires. Other structs will need to be freed manually.
112
+ Note: htslib makes extensive use of macro functions for speed. You cannot use C macro functions in Ruby if they are not reimplemented in ruby-htslib. Only a small number of C structs are implemented with FFI's ManagedStruct, which frees memory when Ruby's garbage collection fires. Other structs will need to be freed manually.
100
113
 
101
114
  ### Need more speed?
102
115
 
103
- Try Crystal. [htslib.cr](https://github.com/bio-crystal/htslib.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib. Crsytal language is not as flexible as Ruby language. You can not use `eval` methods, and you must always be careful with the data types. Writing one-time scripts in Crystal or playing with REPL may not be as much fun. However, if you have a clear idea of what you want to do in your mind, have already written code in Ruby, and need to run them over and over, try creating a command line tool in Crystal. The Crystal language is fast, as fast as the Rust and C languages. It will give you great power to create tools.
116
+ Try Crystal. [htslib.cr](https://github.com/bio-crystal/htslib.cr) is implemented in the Crystal language and provides an API compatible with ruby-htslib. Crystal is not as flexible as Ruby. You cannot use `eval` methods, and you must always be careful with data types. Writing one-time scripts in Crystal or playing with a REPL may not be as much fun. However, if you have a clear idea of what you want to do, have already written the code in Ruby, and need to run it over and over, try creating a command-line tool in Crystal. Crystal is fast, comparable to Rust and C. It will give you great power to create tools.
104
117
 
105
118
  ## Documentation
106
119
 
@@ -125,8 +138,8 @@ Many macro functions are used in HTSlib. Since these macro functions cannot be c
125
138
 
126
139
  * Use the new version of Ruby to take full advantage of Ruby's potential.
127
140
  * This is possible because we have a small number of users. What a deal!
128
- * Remain compatibile with [htslib.cr](https://github.com/bio-crystal/htslib.cr).
129
- * The most difficult part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact Crystal can do this. But the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.
141
+ * Remain compatible with [htslib.cr](https://github.com/bio-crystal/htslib.cr).
142
+ * The most challenging part is the return value. In Crystal, methods are expected to return a single type, whereas Ruby methods that return values of several classes are very common. For example, a return value that may be one of six types (Int32, Int64, Float32, Float64, Nil, or String) is awkward in Crystal: the compiler can handle it as a union type, but the code gets a little messy. In Ruby, this is common and causes no problems.
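To make the return-value point concrete, here is a minimal Ruby sketch of a method whose result may be an Integer, a Float, a String, or nil. The helper `parse_field` is hypothetical and is not part of ruby-htslib or htslib.cr; it only illustrates why such a method is natural in Ruby, while a statically typed caller would have to handle a union type.

```ruby
# Hypothetical helper (not a ruby-htslib API): convert a text field into the
# most specific Ruby value. The return class depends on the input.
def parse_field(text)
  return nil if text.nil? || text == "."

  # Integer(...) and Float(...) return nil instead of raising when
  # `exception: false` is given, so the chain falls through to the raw text.
  Integer(text, 10, exception: false) ||
    Float(text, exception: false) ||
    text
end

depth   = parse_field("42")    # Integer
quality = parse_field("0.5")   # Float
filter  = parse_field("PASS")  # String
missing = parse_field(".")     # nil
```

A Ruby caller can use the result directly. An equivalent Crystal method would presumably need a declared union return type and explicit type checks at each call site.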
130
143
 
131
144
  In the script directory, there are several tools to help implement ruby-htslib. These tools may be forked into an independent repository in the future.
132
145
 
@@ -1,5 +1,11 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ # Q. Why is the file name auxi.rb and not aux.rb?
4
+ #
5
+ # A. This is for compatibility with Windows.
6
+ # In Windows, aux is a reserved device name.
7
+ # You cannot create a file named aux.
8
+
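The rename from aux.rb to auxi.rb can be checked mechanically. The sketch below is a hypothetical helper, not part of ruby-htslib: it assumes the classic Windows device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and flags any path whose base name, ignoring the extension, matches one of them.

```ruby
# Hypothetical check (not part of ruby-htslib): detect file names that
# collide with reserved Windows device names, regardless of extension.
WINDOWS_RESERVED_NAMES = (
  %w[CON PRN AUX NUL] + (1..9).flat_map { |i| ["COM#{i}", "LPT#{i}"] }
).freeze

def windows_reserved_name?(path)
  # File.basename(path, ".*") drops the directory part and the extension.
  base = File.basename(path, ".*").upcase
  WINDOWS_RESERVED_NAMES.include?(base)
end

old_name_reserved = windows_reserved_name?("lib/hts/bam/aux.rb")
new_name_reserved = windows_reserved_name?("lib/hts/bam/auxi.rb")
```

Under these assumptions the old file name is flagged and the new one is not.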
3
9
  module HTS
4
10
  class Bam < Hts
5
11
  # Auxiliary record data
@@ -2,7 +2,7 @@
2
2
 
3
3
  require_relative "flag"
4
4
  require_relative "cigar"
5
- require_relative "aux"
5
+ require_relative "auxi"
6
6
 
7
7
  module HTS
8
8
  class Bam < Hts
data/lib/hts/bam.rb CHANGED
@@ -13,7 +13,7 @@ module HTS
13
13
  class Bam
14
14
  include Enumerable
15
15
 
16
- attr_reader :file_name, :index_name, :mode, :header
16
+ attr_reader :file_name, :index_name, :mode, :header, :nthreads
17
17
 
18
18
  def self.open(*args, **kw)
19
19
  file = new(*args, **kw) # do not yield
@@ -28,7 +28,7 @@ module HTS
28
28
  end
29
29
 
30
30
  def initialize(file_name, mode = "r", index: nil, fai: nil, threads: nil,
31
- create_index: false)
31
+ build_index: false)
32
32
  if block_given?
33
33
  message = "HTS::Bam.new() does not take a block; please use HTS::Bam.open() instead"
34
34
  raise message
@@ -39,6 +39,7 @@ module HTS
39
39
  @file_name = file_name
40
40
  @index_name = index
41
41
  @mode = mode
42
+ @nthreads = threads
42
43
  @hts_file = LibHTS.hts_open(@file_name, mode)
43
44
 
44
45
  raise Errno::ENOENT, "Failed to open #{@file_name}" if @hts_file.null?
@@ -53,21 +54,24 @@ module HTS
53
54
  return if @mode[0] == "w"
54
55
 
55
56
  @header = Bam::Header.new(@hts_file)
56
- create_index(index) if create_index
57
+ build_index(index) if build_index
57
58
  @idx = load_index(index)
58
59
  @start_position = tell
59
60
  super # do nothing
60
61
  end
61
62
 
62
- def create_index(index_name = nil)
63
+ def build_index(index_name = nil, min_shift: 0)
63
64
  check_closed
64
65
 
65
- warn "Create index for #{@file_name} to #{index_name}"
66
- if index
67
- LibHTS.sam_index_build2(@file_name, index_name, -1)
66
+ if index_name
67
+ warn "Create index for #{@file_name} to #{index_name}"
68
68
  else
69
- LibHTS.sam_index_build(@file_name, -1)
69
+ warn "Create index for #{@file_name}"
70
70
  end
71
+ r = LibHTS.sam_index_build3(@file_name, index_name, min_shift, @nthreads)
72
+ raise "Failed to build index for #{@file_name}" if r < 0
73
+
74
+ self
71
75
  end
72
76
 
73
77
  def load_index(index_name = nil)
@@ -139,23 +143,51 @@ module HTS
139
143
  self
140
144
  end
141
145
 
142
- # query [WIP]
143
- def query(region)
146
+ def query(region, copy: false, &block)
147
+ if copy
148
+ query_copy(region, &block)
149
+ else
150
+ query_reuse(region, &block)
151
+ end
152
+ end
153
+
154
+ private def query_copy(region)
144
155
  check_closed
145
156
  raise "Index file is required to call the query method." unless index_loaded?
157
+ return to_enum(__method__, region) unless block_given?
146
158
 
147
159
  qiter = LibHTS.sam_itr_querys(@idx, header, region)
160
+
148
161
  begin
149
- bam1 = LibHTS.bam_init1
150
- slen = LibHTS.sam_itr_next(@hts_file, qiter, bam1)
151
- while slen > 0
152
- yield Record.new(bam1, header)
162
+ loop do
153
163
  bam1 = LibHTS.bam_init1
154
164
  slen = LibHTS.sam_itr_next(@hts_file, qiter, bam1)
165
+ break if slen == -1
166
+ raise if slen < -1
167
+
168
+ yield Record.new(bam1, header)
155
169
  end
156
170
  ensure
157
171
  LibHTS.hts_itr_destroy(qiter)
158
172
  end
173
+ self
174
+ end
175
+
176
+ private def query_reuse(region)
177
+ check_closed
178
+ raise "Index file is required to call the query method." unless index_loaded?
179
+ return to_enum(__method__, region) unless block_given?
180
+
181
+ qiter = LibHTS.sam_itr_querys(@idx, header, region)
182
+
183
+ bam1 = LibHTS.bam_init1
184
+ record = Record.new(bam1, header)
185
+ begin
186
+ yield record while LibHTS.sam_itr_next(@hts_file, qiter, bam1) > 0
187
+ ensure
188
+ LibHTS.hts_itr_destroy(qiter)
189
+ end
190
+ self
159
191
  end
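The difference between the copy and reuse modes matters when records are collected instead of being processed one at a time. The following is a pure-Ruby sketch with hypothetical stand-ins (`SketchRecord`, `SketchReader`) that do not call htslib. It only mirrors the pattern of the two private methods above: one allocates a record per iteration, the other overwrites a single record in place.

```ruby
# Hypothetical stand-ins: SketchRecord and SketchReader do not call htslib.
SketchRecord = Struct.new(:pos)

class SketchReader
  def initialize(positions)
    @positions = positions
  end

  # Copy mode: allocate a fresh record for every iteration.
  def each_copy
    return to_enum(__method__) unless block_given?

    @positions.each { |pos| yield SketchRecord.new(pos) }
    self
  end

  # Reuse mode: overwrite one record in place and yield it repeatedly.
  def each_reuse
    return to_enum(__method__) unless block_given?

    record = SketchRecord.new(nil)
    @positions.each do |pos|
      record.pos = pos
      yield record
    end
    self
  end
end

reader = SketchReader.new([10, 20, 30])

# Collecting records: copies stay distinct, reused records are one object.
copied = reader.each_copy.to_a
reused = reader.each_reuse.to_a

copied_positions   = copied.map(&:pos) # [10, 20, 30]
reused_positions   = reused.map(&:pos) # [30, 30, 30]

# Reading a value during iteration is safe in reuse mode.
streamed_positions = reader.each_reuse.map(&:pos) # [10, 20, 30]
```

In this sketch, reuse mode avoids one allocation per record, which is presumably why it is the default here. The copy mode is the safer choice whenever records must outlive the block.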
160
192
 
161
193
  # @!macro [attach] define_getter
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module HTS
4
+ class Bcf < Hts
5
+ class HeaderRecord
6
+ def initialize
7
+ @bcf_hrec
8
+ end
9
+ end
10
+ end
11
+ end
@@ -137,7 +137,7 @@ module HTS
137
137
 
138
138
  private
139
139
 
140
- def initialize_copy(orig)\
140
+ def initialize_copy(orig)
141
141
  @header = orig.header
142
142
  @bcf1 = LibHTS.bcf_dup(orig.struct)
143
143
  end
data/lib/hts/bcf.rb CHANGED
@@ -13,7 +13,7 @@ module HTS
13
13
  class Bcf < Hts
14
14
  include Enumerable
15
15
 
16
- attr_reader :file_name, :index_name, :mode, :header
16
+ attr_reader :file_name, :index_name, :mode, :header, :nthreads
17
17
 
18
18
  def self.open(*args, **kw)
19
19
  file = new(*args, **kw) # do not yield
@@ -28,7 +28,7 @@ module HTS
28
28
  end
29
29
 
30
30
  def initialize(file_name, mode = "r", index: nil, threads: nil,
31
- create_index: false)
31
+ build_index: false)
32
32
  if block_given?
33
33
  message = "HTS::Bcf.new() does not take a block; please use HTS::Bcf.open() instead"
34
34
  raise message
@@ -39,6 +39,7 @@ module HTS
39
39
  @file_name = file_name
40
40
  @index_name = index
41
41
  @mode = mode
42
+ @nthreads = threads
42
43
  @hts_file = LibHTS.hts_open(@file_name, mode)
43
44
 
44
45
  raise Errno::ENOENT, "Failed to open #{@file_name}" if @hts_file.null?
@@ -48,21 +49,24 @@ module HTS
48
49
  return if @mode[0] == "w"
49
50
 
50
51
  @header = Bcf::Header.new(@hts_file)
51
- create_index(index) if create_index
52
+ build_index(index) if build_index
52
53
  @idx = load_index(index)
53
54
  @start_position = tell
54
55
  super # do nothing
55
56
  end
56
57
 
57
- def create_index(index_name = nil)
58
+ def build_index(index_name = nil, min_shift: 14)
58
59
  check_closed
59
60
 
60
- warn "Create index for #{@file_name} to #{index_name}"
61
61
  if index_name
62
- LibHTS.bcf_index_build2(@file_name, index_name, -1)
62
+ warn "Create index for #{@file_name} to #{index_name}"
63
63
  else
64
- LibHTS.bcf_index_build(@file_name, -1)
64
+ warn "Create index for #{@file_name}"
65
65
  end
66
+ r = LibHTS.bcf_index_build3(@file_name, index_name, min_shift, @nthreads)
67
+ raise "Failed to build index for #{@file_name}" if r < 0
68
+
69
+ self
66
70
  end
67
71
 
68
72
  def load_index(index_name = nil)
@@ -132,8 +136,7 @@ module HTS
132
136
 
133
137
  private def each_record_reuse
134
138
  check_closed
135
- # Each does not always start at the beginning of the file.
136
- # This is the common behavior of IO objects in Ruby.
139
+
137
140
  return to_enum(__method__) unless block_given?
138
141
 
139
142
  bcf1 = LibHTS.bcf_init
@@ -142,6 +145,76 @@ module HTS
142
145
  self
143
146
  end
144
147
 
148
+ def query(...)
149
+ querys(...) # Fixme
150
+ end
151
+
152
+ # def queryi
153
+ # end
154
+
155
+ def querys(region, copy: false, &block)
156
+ if copy
157
+ querys_copy(region, &block)
158
+ else
159
+ querys_reuse(region, &block)
160
+ end
161
+ end
162
+
163
+ # private def queryi_copy
164
+ # end
165
+
166
+ # private def queryi_reuse
167
+ # end
168
+
169
+ private def querys_copy(region)
170
+ check_closed
171
+
172
+ raise "query is only available for BCF files" unless file_format == "bcf"
173
+ raise "Index file is required to call the query method." unless index_loaded?
174
+ return to_enum(__method__, region) unless block_given?
175
+
176
+ qitr = LibHTS.bcf_itr_querys(@idx, header, region)
177
+
178
+ begin
179
+ loop do
180
+ bcf1 = LibHTS.bcf_init
181
+ slen = LibHTS.hts_itr_next(@hts_file[:fp][:bgzf], qitr, bcf1, ::FFI::Pointer::NULL)
182
+ break if slen == -1
183
+ raise if slen < -1
184
+
185
+ yield Record.new(bcf1, header)
186
+ end
187
+ ensure
188
+ LibHTS.bcf_itr_destroy(qitr)
189
+ end
190
+ self
191
+ end
192
+
193
+ private def querys_reuse(region)
194
+ check_closed
195
+
196
+ raise "query is only available for BCF files" unless file_format == "bcf"
197
+ raise "Index file is required to call the query method." unless index_loaded?
198
+ return to_enum(__method__, region) unless block_given?
199
+
200
+ qitr = LibHTS.bcf_itr_querys(@idx, header, region)
201
+
202
+ bcf1 = LibHTS.bcf_init
203
+ record = Record.new(bcf1, header)
204
+ begin
205
+ loop do
206
+ slen = LibHTS.hts_itr_next(@hts_file[:fp][:bgzf], qitr, bcf1, ::FFI::Pointer::NULL)
207
+ break if slen == -1
208
+ raise if slen < -1
209
+
210
+ yield record
211
+ end
212
+ ensure
213
+ LibHTS.bcf_itr_destroy(qitr)
214
+ end
215
+ self
216
+ end
217
+
145
218
  # @!macro [attach] define_getter
146
219
  # @method $1
147
220
  # Get $1 array
@@ -197,7 +270,7 @@ module HTS
197
270
 
198
271
  def each_info(key)
199
272
  check_closed
200
- return to_enum(__method__) unless block
273
+ return to_enum(__method__, key) unless block
201
274
 
202
275
  each do |r|
203
276
  yield r.info(key)
@@ -206,7 +279,7 @@ module HTS
206
279
 
207
280
  def each_format(key)
208
281
  check_closed
209
- return to_enum(__method__) unless block
282
+ return to_enum(__method__, key) unless block_given?
210
283
 
211
284
  each do |r|
212
285
  yield r.format(key)
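Forwarding `key` to `to_enum` matters because the enumerator re-invokes the method later with exactly the arguments it was handed. The sketch below uses a hypothetical `SketchTable` class (not a ruby-htslib API) to contrast the two forms. The failure in the broken variant only appears once the enumerator is consumed.

```ruby
# Hypothetical class for illustration; it is not part of ruby-htslib.
class SketchTable
  def initialize(rows)
    @rows = rows
  end

  # Correct: the enumerator remembers `key` and passes it back on iteration.
  def each_value(key)
    return to_enum(__method__, key) unless block_given?

    @rows.each { |row| yield row[key] }
    self
  end

  # Broken: the enumerator calls the method again without any argument.
  def each_value_without_args(key)
    return to_enum(__method__) unless block_given?

    @rows.each { |row| yield row[key] }
    self
  end
end

table = SketchTable.new([{ dp: 10 }, { dp: 25 }])

values = table.each_value(:dp).to_a

lazy_enum = table.each_value_without_args(:dp) # no error yet
error = begin
  lazy_enum.to_a
  nil
rescue ArgumentError => e
  e
end
```

Here `values` holds the two depths, while `error` holds the `ArgumentError` raised when the broken enumerator calls the method with zero arguments.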
data/lib/hts/faidx.rb CHANGED
@@ -38,6 +38,11 @@ module HTS
38
38
  LibHTS.fai_destroy(@fai)
39
39
  end
40
40
 
41
+ # FIXME: This doesn't seem to work as expected
42
+ # def closed?
43
+ # @fai.null?
44
+ # end
45
+
41
46
  # the number of sequences in the index.
42
47
  def length
43
48
  LibHTS.faidx_nseq(@fai)
@@ -46,10 +51,7 @@ module HTS
46
51
 
47
52
  # return the length of the requested chromosome.
48
53
  def chrom_size(chrom)
49
- unless chrom.is_a?(String) || chrom.is_a?(Symbol)
50
- # FIXME
51
- raise ArgumentError, "Expect chrom to be String or Symbol"
52
- end
54
+ raise ArgumentError, "Expect chrom to be String or Symbol" unless chrom.is_a?(String) || chrom.is_a?(Symbol)
53
55
 
54
56
  chrom = chrom.to_s
55
57
  result = LibHTS.faidx_seq_len(@fai, chrom)
@@ -57,12 +59,41 @@ module HTS
57
59
  end
58
60
  alias chrom_length chrom_size
59
61
 
60
- # FIXME: naming and syntax
61
- # def cget; end
62
+ # return the names of the sequences in the index.
63
+ def chrom_names
64
+ Array.new(length) { |i| LibHTS.faidx_iseq(@fai, i) }
65
+ end
66
+
67
+ # @overload fetch(name)
68
+ # Fetch the sequence as a String.
69
+ # @param name [String] chr1:0-10
70
+ # @overload fetch(name, start, stop)
71
+ # Fetch the sequence as a String.
72
+ # @param name [String] the name of the chromosome
73
+ # @param start [Integer] the start position of the sequence (0-based)
74
+ # @param stop [Integer] the end position of the sequence (0-based)
75
+ # @return [String] the sequence
76
+
77
+ def seq(name, start = nil, stop = nil)
78
+ name = name.to_s
79
+ rlen = FFI::MemoryPointer.new(:int)
62
80
 
63
- # FIXME: naming and syntax
64
- # def get; end
81
+ if start.nil? && stop.nil?
82
+ result = LibHTS.fai_fetch(@fai, name, rlen)
83
+ else
84
+ start < 0 && raise(ArgumentError, "Expect start to be >= 0")
85
+ stop < 0 && raise(ArgumentError, "Expect stop to be >= 0")
86
+ start > stop && raise(ArgumentError, "Expect start to be <= stop")
65
87
 
66
- # __iter__
88
+ result = LibHTS.faidx_fetch_seq(@fai, name, start, stop, rlen)
89
+ end
90
+
91
+ case rlen.read_int
92
+ when -2 then raise "Invalid chromosome name: #{name}"
93
+ when -1 then raise "Error fetching sequence: #{name}:#{start}-#{stop}"
94
+ end
95
+
96
+ result
97
+ end
67
98
  end
68
99
  end
data/lib/hts/hts.rb CHANGED
@@ -69,13 +69,18 @@ module HTS
69
69
  @hts_file.nil? || @hts_file.null?
70
70
  end
71
71
 
72
- def set_threads(n)
73
- raise TypeError unless n.is_a(Integer)
74
-
75
- if n > 0
76
- r = LibHTS.hts_set_threads(@hts_file, n)
77
- raise "Failed to set number of threads: #{threads}" if r < 0
72
+ def set_threads(n = nil)
73
+ if n.nil?
74
+ require "etc"
75
+ n = [Etc.nprocessors - 1, 1].max
78
76
  end
77
+ raise TypeError unless n.is_a?(Integer)
78
+ raise ArgumentError, "Number of threads must be positive" if n < 1
79
+
80
+ r = LibHTS.hts_set_threads(@hts_file, n)
81
+ raise "Failed to set number of threads: #{n}" if r < 0
82
+
83
+ @nthreads = n
79
84
  self
80
85
  end
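The thread-count handling above can be exercised without htslib. This sketch extracts the same defaulting and validation steps into a hypothetical helper, `resolve_thread_count`, which is not part of ruby-htslib: with no argument it falls back to one less than the processor count (but at least 1), and it rejects non-integers and non-positive values.

```ruby
require "etc"

# Hypothetical helper mirroring the validation steps; it does not call htslib.
def resolve_thread_count(n = nil)
  # Default: leave one processor free, but never go below one thread.
  n = [Etc.nprocessors - 1, 1].max if n.nil?

  raise TypeError, "Number of threads must be an Integer" unless n.is_a?(Integer)
  raise ArgumentError, "Number of threads must be positive" if n < 1

  n
end

explicit_count = resolve_thread_count(4)
default_count  = resolve_thread_count
```

Validating before calling into the C library keeps the failure in Ruby, with a clear message, instead of passing an unusable value across the FFI boundary.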
81
86
 
@@ -195,7 +195,7 @@ module HTS
195
195
  :reg, :string,
196
196
  :intervals, :pointer, # hts_pair_pos_t
197
197
  :tid, :int,
198
- :count, :uint32_t,
198
+ :count, :uint32,
199
199
  :min_beg, :hts_pos_t,
200
200
  :max_end, :hts_pos_t
201
201
  end
@@ -352,7 +352,7 @@ module HTS
352
352
  :qpos, :int32_t,
353
353
  :indel, :int,
354
354
  :level, :int,
355
- :_flags, :uint32_t, # bit_fields
355
+ :_flags, :uint32, # bit_fields
356
356
  :cd, BamPileupCd,
357
357
  :cigar_ind, :int
358
358
 
@@ -403,6 +403,7 @@ module HTS
403
403
  :n, :int
404
404
  end
405
405
 
406
+ # Complete textual representation of a header line
406
407
  class BcfHrec < FFI::Struct
407
408
  layout \
408
409
  :type, :int,
@@ -413,21 +414,6 @@ module HTS
413
414
  :vals, :pointer
414
415
  end
415
416
 
416
- class BcfFmt < FFI::BitStruct
417
- layout \
418
- :id, :int,
419
- :n, :int,
420
- :size, :int,
421
- :type, :int,
422
- :p, :pointer, # uint8_t
423
- :p_len, :uint32,
424
- :_p_off_free, :uint32 # bit_fields
425
-
426
- bit_fields :_p_off_free,
427
- :p_off, 31,
428
- :p_free, 1
429
- end
430
-
431
417
  class BcfInfo < FFI::BitStruct
432
418
  layout \
433
419
  :key, :int,
@@ -477,6 +463,21 @@ module HTS
477
463
  :m, [:int, 3]
478
464
  end
479
465
 
466
+ class BcfFmt < FFI::BitStruct
467
+ layout \
468
+ :id, :int,
469
+ :n, :int,
470
+ :size, :int,
471
+ :type, :int,
472
+ :p, :pointer, # uint8_t
473
+ :p_len, :uint32,
474
+ :_p_off_free, :uint32 # bit_fields
475
+
476
+ bit_fields :_p_off_free,
477
+ :p_off, 31,
478
+ :p_free, 1
479
+ end
480
+
480
481
  class BcfDec < FFI::Struct
481
482
  layout \
482
483
  :m_fmt, :int,
@@ -505,8 +506,8 @@ module HTS
505
506
  :rlen, :hts_pos_t,
506
507
  :rid, :int32_t,
507
508
  :qual, :float,
508
- :_n_info_allele, :uint32_t,
509
- :_n_fmt_sample, :uint32_t,
509
+ :_n_info_allele, :uint32,
510
+ :_n_fmt_sample, :uint32,
510
511
  :shared, KString,
511
512
  :indiv, KString,
512
513
  :d, BcfDec,