htslib 0.2.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 87b06f64c496150db76d689f2cd06ba61e8ff424511be781bf17abbdf5d92d6c
4
- data.tar.gz: 10126e01757aafb8fbb18c4f5d2f58541511226ddf922ffb253d29ccaf6c2027
3
+ metadata.gz: d26f2d2af0e353b349235ec153e4127288ecd602362254dafd91e015f61dc0a2
4
+ data.tar.gz: 567d03329907f3f53424511f74a4c9fd6f745ef7c2b0c6c14e8dff6da957dbdb
5
5
  SHA512:
6
- metadata.gz: 2bb5806ae23d58192a74c567767de193ed21f2b8f8f87cf2bef4ef9beccc62da684ce5a0b8a6a2b96afbdbd84fb13c14b27ea2738a4d70829b7d3c1a57396430
7
- data.tar.gz: 9e49efbb3bdce645013e0aaa33dec335204a0c62f48cd0ab4b5128314b97f6237b5faec6a0acf0059d8a276e26323d93ad53c4b5e7e2c36b273592b970b7dfe4
6
+ metadata.gz: 4c51ee0ed3c259da9e5e7353ae92bcbc7faad622453c8875ff749e8bb0a861c6c28fa09df7dac79ca37d15a0e6f83afb200d4065028e7da63910eab8f9c6e7a4
7
+ data.tar.gz: c62b5c7e95ba91a9f465b6822170c0faa05cf9626af8c63c486ecb55edb1af7f3d841cfa8b2d5adcd9cd1ac6df75088b85ea3145ae78a645eb2b7e3390ca11bd
data/README.md CHANGED
@@ -6,11 +6,11 @@
6
6
  [![DOI](https://zenodo.org/badge/247078205.svg)](https://zenodo.org/badge/latestdoi/247078205)
7
7
  [![Docs Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://rubydoc.info/gems/htslib)
8
8
 
9
- Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https://github.com/samtools/htslib), a C library for high-throughput sequencing data formats. It allows you to read and write file formats commonly used in genomics, such as [SAM, BAM, VCF, and BCF](http://samtools.github.io/hts-specs/) in the Ruby language.
9
+ Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https://github.com/samtools/htslib), a C library for high-throughput sequencing data formats. It allows you to read and write file formats commonly used in genomics, such as [SAM, BAM, VCF, and BCF](http://samtools.github.io/hts-specs/), in the Ruby language.
10
10
 
11
11
  :apple: Feel free to fork it out if you can develop it!
12
12
 
13
- :bowtie: alpha stage.
13
+ :bowtie: Alpha stage.
14
14
 
15
15
  ## Requirements
16
16
 
@@ -83,6 +83,19 @@ end
83
83
  bcf.close
84
84
  ```
85
85
 
86
+ <details>
87
+ <summary><b>Faidx</b></summary>
88
+
89
+ ```ruby
90
+ fa = HTS::Faidx.open("c.fa")
91
+
92
+ fa.fetch("chr1:1-10")
93
+
94
+ fa.close
95
+ ```
96
+
97
+ </details>
98
+
86
99
  ### Low level API
87
100
 
88
101
  `HTS::LibHTS` provides native C functions.
@@ -96,11 +109,11 @@ p b[:category]
96
109
  p b[:format]
97
110
  ```
98
111
 
99
- Note: htslib makes extensive use of macro functions for speed. you cannot use C macro functions in Ruby if they are not reimplemented in ruby-htslib. Only small number of C structs are implemented with FFI's ManagedStruct, which frees memory when Ruby's garbage collection fires. Other structs will need to be freed manually.
112
+ Note: htslib makes extensive use of macro functions for speed. You cannot use C macro functions in Ruby if they are not reimplemented in ruby-htslib. Only small number of C structs are implemented with FFI's ManagedStruct, which frees memory when Ruby's garbage collection fires. Other structs will need to be freed manually.
100
113
 
101
114
  ### Need more speed?
102
115
 
103
- Try Crystal. [htslib.cr](https://github.com/bio-crystal/htslib.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib. Crsytal language is not as flexible as Ruby language. You can not use `eval` methods, and you must always be careful with the data types. Writing one-time scripts in Crystal or playing with REPL may not be as much fun. However, if you have a clear idea of what you want to do in your mind, have already written code in Ruby, and need to run them over and over, try creating a command line tool in Crystal. The Crystal language is fast, as fast as the Rust and C languages. It will give you great power to create tools.
116
+ Try Crystal. [htslib.cr](https://github.com/bio-crystal/htslib.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib. Crystal language is not as flexible as Ruby language. You can not use `eval` methods, and you must always be careful with the data types. Writing one-time scripts in Crystal or playing with REPL may not be as much fun. However, if you have a clear idea of what you want to do in your mind, have already written code in Ruby, and need to run them over and over, try creating a command line tool in Crystal. The Crystal language is fast, as fast as the Rust and C languages. It will give you great power to create tools.
104
117
 
105
118
  ## Documentation
106
119
 
@@ -125,8 +138,8 @@ Many macro functions are used in HTSlib. Since these macro functions cannot be c
125
138
 
126
139
  * Use the new version of Ruby to take full advantage of Ruby's potential.
127
140
  * This is possible because we have a small number of users. What a deal!
128
- * Remain compatibile with [htslib.cr](https://github.com/bio-crystal/htslib.cr).
129
- * The most difficult part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact Crystal can do this. But the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.
141
+ * Remain compatible with [htslib.cr](https://github.com/bio-crystal/htslib.cr).
142
+ * The most challenging part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact Crystal can do this. But the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.
130
143
 
131
144
  In the script directory, there are several tools to help implement ruby-htslib. These tools may be forked into independent repository in the future.
132
145
 
data/lib/hts/bam.rb CHANGED
@@ -139,23 +139,51 @@ module HTS
139
139
  self
140
140
  end
141
141
 
142
- # query [WIP]
143
- def query(region)
142
+ def query(region, copy: false, &block)
143
+ if copy
144
+ query_copy(region, &block)
145
+ else
146
+ query_reuse(region, &block)
147
+ end
148
+ end
149
+
150
+ private def query_copy(region)
144
151
  check_closed
145
152
  raise "Index file is required to call the query method." unless index_loaded?
153
+ return to_enum(__method__, region) unless block_given?
146
154
 
147
155
  qiter = LibHTS.sam_itr_querys(@idx, header, region)
156
+
148
157
  begin
149
- bam1 = LibHTS.bam_init1
150
- slen = LibHTS.sam_itr_next(@hts_file, qiter, bam1)
151
- while slen > 0
152
- yield Record.new(bam1, header)
158
+ loop do
153
159
  bam1 = LibHTS.bam_init1
154
160
  slen = LibHTS.sam_itr_next(@hts_file, qiter, bam1)
161
+ break if slen == -1
162
+ raise if slen < -1
163
+
164
+ yield Record.new(bam1, header)
155
165
  end
156
166
  ensure
157
167
  LibHTS.hts_itr_destroy(qiter)
158
168
  end
169
+ self
170
+ end
171
+
172
+ private def query_reuse(region)
173
+ check_closed
174
+ raise "Index file is required to call the query method." unless index_loaded?
175
+ return to_enum(__method__, region) unless block_given?
176
+
177
+ qiter = LibHTS.sam_itr_querys(@idx, header, region)
178
+
179
+ bam1 = LibHTS.bam_init1
180
+ record = Record.new(bam1, header)
181
+ begin
182
+ yield record while LibHTS.sam_itr_next(@hts_file, qiter, bam1) > 0
183
+ ensure
184
+ LibHTS.hts_itr_destroy(qiter)
185
+ end
186
+ self
159
187
  end
160
188
 
161
189
  # @!macro [attach] define_getter
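Note on the `copy:` option introduced above: `query_reuse` allocates one `bam1` buffer and one `Record` and yields that same object on every iteration, while `query_copy` allocates a new buffer per record. The following is a minimal pure-Ruby sketch of that difference. `Buffer`, `Wrapper`, and `Reader` are hypothetical stand-ins and are not part of ruby-htslib; no native library is involved.

```ruby
# Hypothetical stand-in for a native record buffer (plays the role of bam1).
Buffer = Struct.new(:pos)

# Hypothetical wrapper that plays the role of Record.new(bam1, header).
Wrapper = Struct.new(:buffer) do
  def pos
    buffer.pos
  end
end

# Hypothetical reader that iterates over a list of positions.
class Reader
  def initialize(positions)
    @positions = positions
  end

  # Reuse strategy: one buffer and one wrapper are overwritten on every step.
  def each_reuse
    return to_enum(__method__) unless block_given?

    buffer = Buffer.new
    record = Wrapper.new(buffer)
    @positions.each do |pos|
      buffer.pos = pos
      yield record
    end
    self
  end

  # Copy strategy: a fresh buffer and wrapper are allocated for every step.
  def each_copy
    return to_enum(__method__) unless block_given?

    @positions.each { |pos| yield Wrapper.new(Buffer.new(pos)) }
    self
  end
end

reader = Reader.new([10, 20, 30])
reused = reader.each_reuse.to_a
copied = reader.each_copy.to_a

p reused.map(&:pos) # all elements are the same object, so all show the last position
p copied.map(&:pos) # each element kept its own position
```

Under these assumptions, collecting reused records into an array keeps only the last state, which suggests `copy: true` is the safer choice whenever records must outlive the block.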
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module HTS
4
+ class Bcf < Hts
5
+ class HeaderRecord
6
+ def initialize
7
+ @bcf_hrec
8
+ end
9
+ end
10
+ end
11
+ end
data/lib/hts/bcf.rb CHANGED
@@ -132,8 +132,7 @@ module HTS
132
132
 
133
133
  private def each_record_reuse
134
134
  check_closed
135
- # Each does not always start at the beginning of the file.
136
- # This is the common behavior of IO objects in Ruby.
135
+
137
136
  return to_enum(__method__) unless block_given?
138
137
 
139
138
  bcf1 = LibHTS.bcf_init
@@ -142,6 +141,76 @@ module HTS
142
141
  self
143
142
  end
144
143
 
144
+ def query(...)
145
+ querys(...) # Fixme
146
+ end
147
+
148
+ # def queryi
149
+ # end
150
+
151
+ def querys(region, copy: false, &block)
152
+ if copy
153
+ querys_copy(region, &block)
154
+ else
155
+ querys_reuse(region, &block)
156
+ end
157
+ end
158
+
159
+ # private def queryi_copy
160
+ # end
161
+
162
+ # private def queryi_reuse
163
+ # end
164
+
165
+ private def querys_copy(region)
166
+ check_closed
167
+
168
+ raise "query is only available for BCF files" unless file_format == "bcf"
169
+ raise "Index file is required to call the query method." unless index_loaded?
170
+ return to_enum(__method__, region) unless block_given?
171
+
172
+ qitr = LibHTS.bcf_itr_querys(@idx, header, region)
173
+
174
+ begin
175
+ loop do
176
+ bcf1 = LibHTS.bcf_init
177
+ slen = LibHTS.hts_itr_next(@hts_file[:fp][:bgzf], qitr, bcf1, ::FFI::Pointer::NULL)
178
+ break if slen == -1
179
+ raise if slen < -1
180
+
181
+ yield Record.new(bcf1, header)
182
+ end
183
+ ensure
184
+ LibHTS.bcf_itr_destroy(qitr)
185
+ end
186
+ self
187
+ end
188
+
189
+ private def querys_reuse(region)
190
+ check_closed
191
+
192
+ raise "query is only available for BCF files" unless file_format == "bcf"
193
+ raise "Index file is required to call the query method." unless index_loaded?
194
+ return to_enum(__method__, region) unless block_given?
195
+
196
+ qitr = LibHTS.bcf_itr_querys(@idx, header, region)
197
+
198
+ bcf1 = LibHTS.bcf_init
199
+ record = Record.new(bcf1, header)
200
+ begin
201
+ loop do
202
+ slen = LibHTS.hts_itr_next(@hts_file[:fp][:bgzf], qitr, bcf1, ::FFI::Pointer::NULL)
203
+ break if slen == -1
204
+ raise if slen < -1
205
+
206
+ yield record
207
+ end
208
+ ensure
209
+ LibHTS.bcf_itr_destroy(qitr)
210
+ end
211
+ self
212
+ end
213
+
145
214
  # @!macro [attach] define_getter
146
215
  # @method $1
147
216
  # Get $1 array
@@ -197,7 +266,7 @@ module HTS
197
266
 
198
267
  def each_info(key)
199
268
  check_closed
200
- return to_enum(__method__) unless block
269
+ return to_enum(__method__, key) unless block
201
270
 
202
271
  each do |r|
203
272
  yield r.info(key)
@@ -206,7 +275,7 @@ module HTS
206
275
 
207
276
  def each_format(key)
208
277
  check_closed
209
- return to_enum(__method__) unless block
278
+ return to_enum(__method__, key) unless block
210
279
 
211
280
  each do |r|
212
281
  yield r.format(key)
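Note on the `to_enum(__method__, key)` change above: an enumerator created by `to_enum` replays the named method later, using only the arguments handed to `to_enum`. If `key` is not forwarded, the replayed call is made with no arguments. The following is a minimal sketch with a hypothetical `Records` class (not part of ruby-htslib) that shows both shapes.

```ruby
# Hypothetical container standing in for a record stream.
class Records
  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    return to_enum(__method__) unless block

    @rows.each(&block)
    self
  end

  # Forwarding `key` lets the enumerator replay the call with the same argument.
  def each_info(key, &block)
    return to_enum(__method__, key) unless block

    each { |row| block.call(row[key]) }
    self
  end

  # Shape before the fix: the enumerator replays the call with no arguments.
  def each_info_without_key(key, &block)
    return to_enum(__method__) unless block

    each { |row| block.call(row[key]) }
    self
  end
end

records = Records.new([{ "DP" => 10 }, { "DP" => 25 }])
depths = records.each_info("DP").to_a

# The enumerator is lazy, so the failure only appears when it is consumed.
error = begin
  records.each_info_without_key("DP").to_a
  nil
rescue ArgumentError => e
  e
end

p depths
p error.class
```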
data/lib/hts/faidx.rb CHANGED
@@ -38,6 +38,11 @@ module HTS
38
38
  LibHTS.fai_destroy(@fai)
39
39
  end
40
40
 
41
+ # FIXME: This doesn't seem to work as expected
42
+ # def closed?
43
+ # @fai.null?
44
+ # end
45
+
41
46
  # the number of sequences in the index.
42
47
  def length
43
48
  LibHTS.faidx_nseq(@fai)
@@ -46,10 +51,7 @@ module HTS
46
51
 
47
52
  # return the length of the requested chromosome.
48
53
  def chrom_size(chrom)
49
- unless chrom.is_a?(String) || chrom.is_a?(Symbol)
50
- # FIXME
51
- raise ArgumentError, "Expect chrom to be String or Symbol"
52
- end
54
+ raise ArgumentError, "Expect chrom to be String or Symbol" unless chrom.is_a?(String) || chrom.is_a?(Symbol)
53
55
 
54
56
  chrom = chrom.to_s
55
57
  result = LibHTS.faidx_seq_len(@fai, chrom)
@@ -57,12 +59,41 @@ module HTS
57
59
  end
58
60
  alias chrom_length chrom_size
59
61
 
60
- # FIXME: naming and syntax
61
- # def cget; end
62
+ # return the length of the requested chromosome.
63
+ def chrom_names
64
+ Array.new(length) { |i| LibHTS.faidx_iseq(@fai, i) }
65
+ end
66
+
67
+ # @overload fetch(name)
68
+ # Fetch the sequence as a String.
69
+ # @param name [String] chr1:0-10
70
+ # @overload fetch(name, start, stop)
71
+ # Fetch the sequence as a String.
72
+ # @param name [String] the name of the chromosome
73
+ # @param start [Integer] the start position of the sequence (0-based)
74
+ # @param stop [Integer] the end position of the sequence (0-based)
75
+ # @return [String] the sequence
76
+
77
+ def seq(name, start = nil, stop = nil)
78
+ name = name.to_s
79
+ rlen = FFI::MemoryPointer.new(:int)
62
80
 
63
- # FIXME: naming and syntax
64
- # def get; end
81
+ if start.nil? && stop.nil?
82
+ result = LibHTS.fai_fetch(@fai, name, rlen)
83
+ else
84
+ start < 0 && raise(ArgumentError, "Expect start to be >= 0")
85
+ stop < 0 && raise(ArgumentError, "Expect stop to be >= 0")
86
+ start > stop && raise(ArgumentError, "Expect start to be <= stop")
65
87
 
66
- # __iter__
88
+ result = LibHTS.faidx_fetch_seq(@fai, name, start, stop, rlen)
89
+ end
90
+
91
+ case rlen.read_int
92
+ when -2 then raise "Invalid chromosome name: #{name}"
93
+ when -1 then raise "Error fetching sequence: #{name}:#{start}-#{stop}"
94
+ end
95
+
96
+ result
97
+ end
67
98
  end
68
99
  end
@@ -403,6 +403,7 @@ module HTS
403
403
  :n, :int
404
404
  end
405
405
 
406
+ # Complete textual representation of a header line
406
407
  class BcfHrec < FFI::Struct
407
408
  layout \
408
409
  :type, :int,
@@ -413,21 +414,6 @@ module HTS
413
414
  :vals, :pointer
414
415
  end
415
416
 
416
- class BcfFmt < FFI::BitStruct
417
- layout \
418
- :id, :int,
419
- :n, :int,
420
- :size, :int,
421
- :type, :int,
422
- :p, :pointer, # uint8_t
423
- :p_len, :uint32,
424
- :_p_off_free, :uint32 # bit_fields
425
-
426
- bit_fields :_p_off_free,
427
- :p_off, 31,
428
- :p_free, 1
429
- end
430
-
431
417
  class BcfInfo < FFI::BitStruct
432
418
  layout \
433
419
  :key, :int,
@@ -477,6 +463,21 @@ module HTS
477
463
  :m, [:int, 3]
478
464
  end
479
465
 
466
+ class BcfFmt < FFI::BitStruct
467
+ layout \
468
+ :id, :int,
469
+ :n, :int,
470
+ :size, :int,
471
+ :type, :int,
472
+ :p, :pointer, # uint8_t
473
+ :p_len, :uint32,
474
+ :_p_off_free, :uint32 # bit_fields
475
+
476
+ bit_fields :_p_off_free,
477
+ :p_off, 31,
478
+ :p_free, 1
479
+ end
480
+
480
481
  class BcfDec < FFI::Struct
481
482
  layout \
482
483
  :m_fmt, :int,
@@ -30,7 +30,7 @@ module HTS
30
30
  end
31
31
 
32
32
  def bam_cigar_opchr(c)
33
- ("#{BAM_CIGAR_STR}??????")[bam_cigar_op(c)]
33
+ "#{BAM_CIGAR_STR}??????"[bam_cigar_op(c)]
34
34
  end
35
35
 
36
36
  def bam_cigar_gen(l, o)
@@ -91,10 +91,49 @@ module HTS
91
91
  end
92
92
 
93
93
  def bam_seqi(s, i)
94
- s[(i) >> 1].read_uint8 >> ((~i & 1) << 2) & 0xf
94
+ s[i >> 1].read_uint8 >> ((~i & 1) << 2) & 0xf
95
95
  end
96
96
 
97
- # def bam_set_seqi(s, i, b)
97
+ def bam_set_seqi(s, i, b)
98
+ s[i >> 1] = (s[i >> 1] & (0xf0 >> ((~i & 1) << 2))) | ((b) << ((~i & 1) << 2))
99
+ end
100
+
101
+ def sam_hdr_find_hd(h, ks)
102
+ sam_hdr_find_line_id(h, "HD", nil, nil, ks)
103
+ end
104
+
105
+ def sam_hdr_find_tag_hd(h, key, ks)
106
+ sam_hdr_find_tag_id(h, "HD", nil, nil, key, ks)
107
+ end
108
+
109
+ def sam_hdr_update_hd(h, *args)
110
+ sam_hdr_update_line(h, "HD", nil, nil, *args, nil)
111
+ end
112
+
113
+ def sam_hdr_remove_tag_hd(h, key)
114
+ sam_hdr_remove_tag_id(h, "HD", nil, nil, key)
115
+ end
116
+
117
+ BAM_USER_OWNS_STRUCT = 1
118
+ BAM_USER_OWNS_DATA = 2
119
+
120
+ alias bam_itr_destroy hts_itr_destroy
121
+ alias bam_itr_queryi sam_itr_queryi
122
+ alias bam_itr_querys sam_itr_querys
123
+ alias bam_itr_next sam_itr_next
124
+
125
+ def bam_index_load(fn)
126
+ hts_idx_load(fn, HTS_FMT_BAI)
127
+ end
128
+
129
+ alias bam_index_build sam_index_build
130
+
131
+ alias sam_itr_destroy hts_itr_destroy
132
+
133
+ alias sam_open hts_open
134
+ alias sam_open_format hts_open_format
135
+ alias sam_flush hts_flush
136
+ alias sam_close hts_close
98
137
  end
99
138
  end
100
139
  end
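Note on `bam_seqi` and `bam_set_seqi` above: BAM stores two 4-bit base codes per byte, with even indices in the high nibble and odd indices in the low nibble. The following is a minimal sketch of the same bit arithmetic on a plain Ruby `Array` of bytes instead of an FFI pointer. The helper names `seqi` and `set_seqi` are hypothetical; the code table is the `=ACMGRSVTWYHKDBN` order from the SAM specification.

```ruby
# Read the 4-bit code at index i from an Array of bytes.
def seqi(bytes, i)
  shift = (~i & 1) << 2 # 4 for even i (high nibble), 0 for odd i (low nibble)
  (bytes[i >> 1] >> shift) & 0xf
end

# Write the 4-bit code `base` at index i, preserving the other nibble.
def set_seqi(bytes, i, base)
  shift = (~i & 1) << 2
  keep = 0xf0 >> shift # mask selecting the nibble that must not change
  bytes[i >> 1] = (bytes[i >> 1] & keep) | (base << shift)
end

# Base codes in BAM order: "=" is 0, "A" is 1, "C" is 2, and so on.
CODES = "=ACMGRSVTWYHKDBN"

packed = [0, 0]
"ACGT".each_char.with_index { |c, i| set_seqi(packed, i, CODES.index(c)) }
decoded = Array.new(4) { |i| CODES[seqi(packed, i)] }.join

p packed  # two bytes holding four bases
p decoded
```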
@@ -139,15 +139,15 @@ module HTS
139
139
  end
140
140
 
141
141
  def bcf_gt_is_missing(val)
142
- ((val) >> 1 ? 0 : 1)
142
+ (val >> 1 ? 0 : 1)
143
143
  end
144
144
 
145
145
  def bcf_gt_is_phased(idx)
146
- ((idx) & 1)
146
+ (idx & 1)
147
147
  end
148
148
 
149
149
  def bcf_gt_allele(val)
150
- (((val) >> 1) - 1)
150
+ ((val >> 1) - 1)
151
151
  end
152
152
 
153
153
  def bcf_alleles2gt(a, b)
@@ -233,6 +233,32 @@ module HTS
233
233
  LibHTS::BcfIdpair.size * int_id # offset
234
234
  )[:val][:info][type] & 0xf
235
235
  end
236
+
237
+ # def bcf_hdr_idinfo_exists
238
+
239
+ # def bcf_hdr_id2hrec
240
+
241
+ alias bcf_itr_destroy hts_itr_destroy
242
+
243
+ def bcf_itr_queryi(idx, tid, beg, _end)
244
+ hts_itr_query(idx, tid, beg, _end, @@bcf_readrec)
245
+ end
246
+
247
+ @@bcf_hdr_name2id = proc do |hdr, id|
248
+ LibHTS.bcf_hdr_name2id(hdr, id)
249
+ end
250
+
251
+ def bcf_itr_querys(idx, hdr, s)
252
+ hts_itr_querys(idx, s, @@bcf_hdr_name2id, hdr, @@hts_itr_query, @@bcf_readrec)
253
+ end
254
+
255
+ def bcf_index_load(fn)
256
+ hts_idx_load(fn, HTS_FMT_CSI)
257
+ end
258
+
259
+ def bcf_index_seqnames(idx, hdr, nptr)
260
+ hts_idx_seqnames(idx, nptr, @@bcf_hdr_id2name, hdr)
261
+ end
236
262
  end
237
263
  end
238
264
  end
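Note on the genotype helpers above: in C, `(val >> 1 ? 0 : 1)` relies on `0` being false. In Ruby every Integer is truthy, including `0`, so a literal translation appears to return `0` for every input. The following is a minimal pure-Ruby sketch of the htslib genotype encoding, `(allele + 1) << 1` with the lowest bit as the phase flag, using an explicit zero check. All method names here are hypothetical and are not the ruby-htslib API.

```ruby
# Encode an allele index as an unphased or phased genotype value.
def gt_unphased(allele)
  (allele + 1) << 1
end

def gt_phased(allele)
  ((allele + 1) << 1) | 1
end

# Decode helpers.
def gt_allele(val)
  (val >> 1) - 1
end

def gt_phased?(val)
  (val & 1) == 1
end

# A value is missing when everything above the phase bit is zero.
def gt_missing?(val)
  (val >> 1).zero?
end

# Literal translation of the C macro, kept only to show the truthiness pitfall.
def literal_is_missing(val)
  (val >> 1 ? 0 : 1)
end

p gt_missing?(0)              # missing genotype
p gt_missing?(gt_unphased(0)) # reference allele, not missing
p gt_allele(gt_phased(1))     # allele index recovered from a phased value
p literal_is_missing(0)       # the literal form cannot report a missing value
```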
data/lib/hts/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module HTS
4
- VERSION = "0.2.0"
4
+ VERSION = "0.2.1"
5
5
  end
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: htslib
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - kojix2
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-05-29 00:00:00.000000000 Z
11
+ date: 2022-08-01 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ffi
@@ -122,7 +122,7 @@ dependencies:
122
122
  - - ">="
123
123
  - !ruby/object:Gem::Version
124
124
  version: '0'
125
- description:
125
+ description:
126
126
  email:
127
127
  - 2xijok@gmail.com
128
128
  executables: []
@@ -140,6 +140,7 @@ files:
140
140
  - lib/hts/bcf.rb
141
141
  - lib/hts/bcf/format.rb
142
142
  - lib/hts/bcf/header.rb
143
+ - lib/hts/bcf/header_record.rb
143
144
  - lib/hts/bcf/info.rb
144
145
  - lib/hts/bcf/record.rb
145
146
  - lib/hts/faidx.rb
@@ -164,11 +165,12 @@ files:
164
165
  - lib/hts/tbx.rb
165
166
  - lib/hts/version.rb
166
167
  - lib/htslib.rb
168
+ - vendor/libhts.dylib
167
169
  homepage: https://github.com/kojix2/ruby-htslib
168
170
  licenses:
169
171
  - MIT
170
172
  metadata: {}
171
- post_install_message:
173
+ post_install_message:
172
174
  rdoc_options: []
173
175
  require_paths:
174
176
  - lib
@@ -184,7 +186,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
184
186
  version: '0'
185
187
  requirements: []
186
188
  rubygems_version: 3.3.7
187
- signing_key:
189
+ signing_key:
188
190
  specification_version: 4
189
191
  summary: HTSlib bindings for Ruby
190
192
  test_files: []