htslib 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c90062954caa8dc2155f77ff3d03534d933ba81148f9bdb7c8dc173f73efca64
4
- data.tar.gz: fb24ced637a8b9b897ecd9c5afed1c6f75902995e48c25daa849baa06611517b
3
+ metadata.gz: 87b06f64c496150db76d689f2cd06ba61e8ff424511be781bf17abbdf5d92d6c
4
+ data.tar.gz: 10126e01757aafb8fbb18c4f5d2f58541511226ddf922ffb253d29ccaf6c2027
5
5
  SHA512:
6
- metadata.gz: fed4e6689d48493416ebd409401df284fedaddb567d3558196b3ba3c253376dd1c4bea8db536fdb6ff8bbebe85512850d7d5c17605544ce26a09e814e7172eb5
7
- data.tar.gz: b9d596e6445671254568b4bd3cc5b694a1011914ae67b1afc11aaa66a195f49b7371d001f9e10cc204a4c20e501cd8eb63092d63b9b392c701571a12843b5c91
6
+ metadata.gz: 2bb5806ae23d58192a74c567767de193ed21f2b8f8f87cf2bef4ef9beccc62da684ce5a0b8a6a2b96afbdbd84fb13c14b27ea2738a4d70829b7d3c1a57396430
7
+ data.tar.gz: 9e49efbb3bdce645013e0aaa33dec335204a0c62f48cd0ab4b5128314b97f6237b5faec6a0acf0059d8a276e26323d93ad53c4b5e7e2c36b273592b970b7dfe4
data/README.md CHANGED
@@ -14,7 +14,7 @@ Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https:
14
14
 
15
15
  ## Requirements
16
16
 
17
- * [Ruby](https://github.com/ruby/ruby) 2.7 or above.
17
+ * [Ruby](https://github.com/ruby/ruby) 3.1 or above.
18
18
  * [HTSlib](https://github.com/samtools/htslib)
19
19
  * Ubuntu : `apt install libhts-dev`
20
20
  * macOS : `brew install htslib`
@@ -27,7 +27,7 @@ gem install htslib
27
27
  ```
28
28
 
29
29
  If you have installed htslib with apt on Ubuntu or homebrew on Mac, [pkg-config](https://github.com/ruby-gnome/pkg-config)
30
- will automatically detect the location of the shared library.
30
+ will automatically detect the location of the shared library. If pkg-config does not work well, set `PKG_CONFIG_PATH`.
31
31
  Alternatively, you can specify the directory of the shared library by setting the environment variable `HTSLIBDIR`.
32
32
 
33
33
  ```sh
@@ -85,7 +85,7 @@ bcf.close
85
85
 
86
86
  ### Low level API
87
87
 
88
- `HTS::LibHTS` provides native C functions.
88
+ `HTS::LibHTS` provides native C functions.
89
89
 
90
90
  ```ruby
91
91
  require 'htslib'
@@ -100,7 +100,7 @@ Note: htslib makes extensive use of macro functions for speed. you cannot use C
100
100
 
101
101
  ### Need more speed?
102
102
 
103
- Try Crystal. [htslib.cr](https://github.com/bio-crystal/htslib.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib. Crsytal language is not as flexible as Ruby language. You can not use eval methods, and you must always be aware of the types. It is not very suitable for writing one-time scripts or experimenting with different code. However, If you have already written code in ruby-htslib, have a clear idea of the manipulations you want to do, and need to execute them many times, then by all means try to implement the command line tool using htslib.cr. The Crystal language is very fast and can perform almost as well as the Rust and C languages.
103
+ Try Crystal. [htslib.cr](https://github.com/bio-crystal/htslib.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib. Crsytal language is not as flexible as Ruby language. You can not use `eval` methods, and you must always be careful with the data types. Writing one-time scripts in Crystal or playing with REPL may not be as much fun. However, if you have a clear idea of what you want to do in your mind, have already written code in Ruby, and need to run them over and over, try creating a command line tool in Crystal. The Crystal language is fast, as fast as the Rust and C languages. It will give you great power to create tools.
104
104
 
105
105
  ## Documentation
106
106
 
@@ -123,9 +123,12 @@ bundle exec rake test
123
123
 
124
124
  Many macro functions are used in HTSlib. Since these macro functions cannot be called using FFI, they must be reimplemented in Ruby.
125
125
 
126
- * Actively use the advanced features of Ruby.
126
+ * Use the new version of Ruby to take full advantage of Ruby's potential.
127
+ * This is possible because we have a small number of users. What a deal!
127
128
  * Remain compatibile with [htslib.cr](https://github.com/bio-crystal/htslib.cr).
128
- * The most difficult part is the return value. In the Crystal language, it is convenient for a method to return only one type. In the Ruby language, on the other hand, it is more convenient to return multiple classes. For example, in the Crystal language, it is confusing that a return value can take four types: Int32, Float32, Nil, and String. In Ruby, on the other hand, it is very common and does not cause any problems.
129
+ * The most difficult part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact Crystal can do this. But the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.
130
+
131
+ In the script directory, there are several tools to help implement ruby-htslib. These tools may be forked into independent repository in the future.
129
132
 
130
133
  #### FFI Extensions
131
134
 
@@ -145,7 +148,10 @@ Ruby-htslib is a library under development, so even small improvements like typo
145
148
  * Suggest or add new features
146
149
  * [financial contributions](https://github.com/sponsors/kojix2)
147
150
 
151
+
148
152
  ```
153
+ # Ownership and Commitment Rights
154
+
149
155
  Do you need commit rights to ruby-htslib repository?
150
156
  Do you want to get admin rights and take over the project?
151
157
  If so, please feel free to contact us @kojix2.
@@ -153,7 +159,7 @@ If so, please feel free to contact us @kojix2.
153
159
 
154
160
  #### Why do you implement htslib in a language like Ruby, which is not widely used in the bioinformatics?
155
161
 
156
- One of the greatest joys of using a minor language like Ruby in bioinformatics is that there is nothing stopping you from reinventing the wheel. Reinventing the wheel can be fun. But with languages like Python and R, where many bioinformatics masters work, there is no chance left for beginners to create htslib bindings. Bioinformatics file formats, libraries and tools are very complex and I don't know how to understand them. So I wanted to implement the HTSLib binding myself to better understand how the pioneers of bioinformatics felt when establishing the file format and how they created their tools. And that effort is still going on today...
162
+ One of the greatest joys of using a minor language like Ruby in bioinformatics is that there is nothing stopping you from reinventing the wheel. Reinventing the wheel can be fun. But with languages like Python and R, where many bioinformatics masters work, there is no chance left for beginners to create htslib bindings. Bioinformatics file formats, libraries and tools are very complex and I don't know how to understand them. So I wanted to implement the HTSLib binding myself to better understand how the pioneers of bioinformatics felt when establishing the file format and how they created their tools. I hope one day we can work on bioinformatics using Ruby and Crystal languages, not to replace other languages such as Python and R, but to add new power and value to this advancing field.
157
163
 
158
164
  ## Links
159
165
 
data/lib/hts/bam/aux.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  module HTS
4
4
  class Bam < Hts
5
+ # Auxiliary record data
5
6
  class Aux
6
7
  def initialize(record)
7
8
  @record = record
data/lib/hts/bam/cigar.rb CHANGED
@@ -2,16 +2,15 @@
2
2
 
3
3
  module HTS
4
4
  class Bam < Hts
5
+ # CIGAR string
5
6
  class Cigar
6
7
  include Enumerable
7
8
 
8
9
  def initialize(pointer, n_cigar)
9
- @pointer = pointer
10
10
  @n_cigar = n_cigar
11
- end
12
-
13
- def to_ptr
14
- @pointer
11
+ # Read the pointer before the memory is changed.
12
+ # Especially when called from a block of `each` iterator.
13
+ @c = pointer.read_array_of_uint32(n_cigar)
15
14
  end
16
15
 
17
16
  def to_s
@@ -21,8 +20,7 @@ module HTS
21
20
  def each
22
21
  return to_enum(__method__) unless block_given?
23
22
 
24
- @n_cigar.times do |i|
25
- c = @pointer[i].read_uint32
23
+ @c.each do |c|
26
24
  op = LibHTS.bam_cigar_opchr(c)
27
25
  len = LibHTS.bam_cigar_oplen(c)
28
26
  yield [op, len]
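
Note on the `Cigar` change above: the constructor now copies the packed CIGAR values out of native memory immediately instead of keeping the pointer. The following is a minimal pure-Ruby sketch of why that matters. It assumes a packed `String` as a stand-in for the native buffer that is reused between records; `LazyCigar`, `EagerCigar`, and `decode` are hypothetical names for illustration and are not part of ruby-htslib. The decoding (low 4 bits are the operation, remaining bits are the length) follows the BAM format.

```ruby
# A packed String stands in for the native buffer (assumption for this sketch).
CIGAR_OPS = "MIDNSHP=XB" # BAM operation codes 0..9

# Decode one packed 32-bit CIGAR value into [operation, length].
def decode(c)
  [CIGAR_OPS[c & 0xf], c >> 4]
end

# Hypothetical: keeps a reference and reads it only when asked (old behaviour).
class LazyCigar
  def initialize(buffer, n_cigar)
    @buffer = buffer
    @n_cigar = n_cigar
  end

  def to_a
    @buffer.unpack("L<#{@n_cigar}").map { |c| decode(c) }
  end
end

# Hypothetical: copies the values at construction time (new behaviour).
class EagerCigar
  def initialize(buffer, n_cigar)
    @c = buffer.unpack("L<#{n_cigar}")
  end

  def to_a
    @c.map { |c| decode(c) }
  end
end

buffer = [(10 << 4) | 0, (2 << 4) | 1].pack("L<2") # 10M2I
lazy = LazyCigar.new(buffer, 2)
eager = EagerCigar.new(buffer, 2)

# The next record overwrites the same memory in place: 5S7M
buffer.replace([(5 << 4) | 4, (7 << 4) | 0].pack("L<2"))

lazy_result = lazy.to_a   # reflects the overwritten buffer
eager_result = eager.to_a # still the original record
```

With the snapshot taken in the constructor, a `Cigar` object collected inside an `each` block keeps describing the record it was created from.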
data/lib/hts/bam/flag.rb CHANGED
@@ -5,6 +5,7 @@
5
5
 
6
6
  module HTS
7
7
  class Bam < Hts
8
+ # SAM flags
8
9
  class Flag
9
10
  def initialize(flag_value)
10
11
  raise TypeError unless flag_value.is_a? Integer
@@ -52,6 +53,10 @@ module HTS
52
53
  (@value & f) != 0
53
54
  end
54
55
 
56
+ def to_i
57
+ @value
58
+ end
59
+
55
60
  def to_s
56
61
  LibHTS.bam_flag2str(@value)
57
62
  # "0x#{format('%x', @value)}\t#{@value}\t#{LibHTS.bam_flag2str(@value)}"
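
Note on `Flag#to_i`: exposing the raw integer lets callers do their own bit arithmetic or formatting. A minimal sketch, assuming a stripped-down stand-in class (`MiniFlag` is hypothetical; only the constructor check and `to_i` mirror the diff). The bit values are the standard SAM flag bits.

```ruby
# Hypothetical stand-in for HTS::Bam::Flag, reduced to the parts shown in the diff.
class MiniFlag
  def initialize(flag_value)
    raise TypeError unless flag_value.is_a? Integer

    @value = flag_value
  end

  def to_i
    @value
  end
end

# Standard SAM flag bits.
REVERSE = 0x10
READ1 = 0x40

flag = MiniFlag.new(99) # 99 = 0x63: paired, proper pair, mate reverse, read1
is_read1 = (flag.to_i & READ1) != 0
is_reverse = (flag.to_i & REVERSE) != 0
hex = format("0x%x", flag.to_i)
```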
@@ -2,6 +2,7 @@
2
2
 
3
3
  module HTS
4
4
  class Bam < Hts
5
+ # A class for working with alignment header.
5
6
  class Header
6
7
  def initialize(hts_file)
7
8
  @sam_hdr = LibHTS.sam_hdr_read(hts_file)
@@ -6,6 +6,7 @@ require_relative "aux"
6
6
 
7
7
  module HTS
8
8
  class Bam < Hts
9
+ # A class for working with alignment records.
9
10
  class Record
10
11
  SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"
11
12
 
@@ -62,14 +63,17 @@ module HTS
62
63
  end
63
64
 
64
65
  # returns 0-based mate position
65
- def mpos
66
+ def mate_pos
66
67
  @bam1[:core][:mpos]
67
68
  end
68
69
 
69
- def mpos=(mpos)
70
+ def mate_pos=(mpos)
70
71
  @bam1[:core][:mpos] = mpos
71
72
  end
72
73
 
74
+ alias mpos mate_pos
75
+ alias mpos= mate_pos=
76
+
73
77
  def bin
74
78
  @bam1[:core][:bin]
75
79
  end
@@ -111,12 +115,11 @@ module HTS
111
115
  @bam1[:core][:isize]
112
116
  end
113
117
 
114
- alias isize insert_size
115
-
116
118
  def insert_size=(isize)
117
119
  @bam1[:core][:isize] = isize
118
120
  end
119
121
 
122
+ alias isize insert_size
120
123
  alias isize= insert_size=
121
124
 
122
125
  # mapping quality
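
Note on the `mate_pos` rename: the old names survive as aliases, so existing callers of `mpos` and `mpos=` should keep working. A minimal sketch of the pattern, assuming a plain Hash in place of the FFI struct (`MiniRecord` is a hypothetical name):

```ruby
# Hypothetical stand-in for HTS::Bam::Record; a Hash replaces the bam1 struct.
class MiniRecord
  def initialize
    @core = { mpos: -1 }
  end

  # returns 0-based mate position
  def mate_pos
    @core[:mpos]
  end

  def mate_pos=(mpos)
    @core[:mpos] = mpos
  end

  # Keep the short htslib-style names available. The aliases must come
  # after the definitions they refer to.
  alias mpos mate_pos
  alias mpos= mate_pos=
end

record = MiniRecord.new
record.mpos = 1234          # old setter name
new_name = record.mate_pos  # new getter name
old_name = record.mpos      # old getter name
```

This ordering is also why the diff moves `alias isize insert_size` next to `alias isize= insert_size=`: both aliases now sit after the getter and setter they refer to.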
data/lib/hts/bam.rb CHANGED
@@ -9,6 +9,7 @@ require_relative "bam/flag"
9
9
  require_relative "bam/record"
10
10
 
11
11
  module HTS
12
+ # A class for working with SAM, BAM, CRAM files.
12
13
  class Bam
13
14
  include Enumerable
14
15
 
@@ -47,23 +48,20 @@ module HTS
47
48
  raise "Failed to load fasta index: #{fai}" if r < 0
48
49
  end
49
50
 
50
- if threads&.> 0
51
- r = LibHTS.hts_set_threads(@hts_file, threads)
52
- raise "Failed to set number of threads: #{threads}" if r < 0
53
- end
51
+ set_threads(threads) if threads
54
52
 
55
53
  return if @mode[0] == "w"
56
54
 
57
55
  @header = Bam::Header.new(@hts_file)
58
-
59
56
  create_index(index) if create_index
60
-
61
57
  @idx = load_index(index)
62
-
63
58
  @start_position = tell
59
+ super # do nothing
64
60
  end
65
61
 
66
62
  def create_index(index_name = nil)
63
+ check_closed
64
+
67
65
  warn "Create index for #{@file_name} to #{index_name}"
68
66
  if index
69
67
  LibHTS.sam_index_build2(@file_name, index_name, -1)
@@ -73,6 +71,8 @@ module HTS
73
71
  end
74
72
 
75
73
  def load_index(index_name = nil)
74
+ check_closed
75
+
76
76
  if index_name
77
77
  LibHTS.sam_index_load2(@hts_file, @file_name, index_name)
78
78
  else
@@ -81,6 +81,8 @@ module HTS
81
81
  end
82
82
 
83
83
  def index_loaded?
84
+ check_closed
85
+
84
86
  !@idx.null?
85
87
  end
86
88
 
@@ -92,7 +94,7 @@ module HTS
92
94
  end
93
95
 
94
96
  def write_header(header)
95
- raise IOError, "closed stream" if closed?
97
+ check_closed
96
98
 
97
99
  @header = header.dup
98
100
  LibHTS.hts_set_fai_filename(@hts_file, @file_name)
@@ -100,17 +102,22 @@ module HTS
100
102
  end
101
103
 
102
104
  def write(aln)
103
- raise IOError, "closed stream" if closed?
105
+ check_closed
104
106
 
105
107
  aln_dup = aln.dup
106
108
  LibHTS.sam_write1(@hts_file, header, aln_dup) > 0 || raise
107
109
  end
108
110
 
109
- # Iterate over each record.
110
- # Generate a new Record object each time.
111
- # Slower than each.
112
- def each_copy
113
- raise IOError, "closed stream" if closed?
111
+ def each(copy: false, &block)
112
+ if copy
113
+ each_record_copy(&block)
114
+ else
115
+ each_record_reuse(&block)
116
+ end
117
+ end
118
+
119
+ private def each_record_copy
120
+ check_closed
114
121
  return to_enum(__method__) unless block_given?
115
122
 
116
123
  while LibHTS.sam_read1(@hts_file, header, bam1 = LibHTS.bam_init1) != -1
@@ -120,14 +127,10 @@ module HTS
120
127
  self
121
128
  end
122
129
 
123
- # Iterate over each record.
124
- # Record object is reused.
125
- # Faster than each_copy.
126
- def each
127
- raise IOError, "closed stream" if closed?
130
+ private def each_record_reuse
131
+ check_closed
128
132
  # Each does not always start at the beginning of the file.
129
133
  # This is the common behavior of IO objects in Ruby.
130
- # This may change in the future.
131
134
  return to_enum(__method__) unless block_given?
132
135
 
133
136
  bam1 = LibHTS.bam_init1
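
Note on `each(copy: false)`: the two former public methods (`each` and `each_copy`) are merged behind one keyword argument. The sketch below illustrates the practical difference between the two modes with an in-memory reader. `CopyDemoReader` and its `Record` struct are hypothetical, and the enumerator fallback from the real methods is left out for brevity.

```ruby
# Hypothetical in-memory reader illustrating reuse vs. copy semantics.
class CopyDemoReader
  include Enumerable

  Record = Struct.new(:qname)

  def initialize(names)
    @names = names
  end

  def each(copy: false, &block)
    if copy
      each_record_copy(&block)
    else
      each_record_reuse(&block)
    end
  end

  # A fresh Record per iteration: safe to keep, but allocates more.
  private def each_record_copy
    @names.each { |name| yield Record.new(name) }
    self
  end

  # One Record overwritten on every iteration: fast, but not safe to keep.
  private def each_record_reuse
    record = Record.new
    @names.each do |name|
      record.qname = name
      yield record
    end
    self
  end
end

reader = CopyDemoReader.new(%w[r1 r2 r3])

reused = []
reader.each { |r| reused << r }
reused_names = reused.map(&:qname)

copied = []
reader.each(copy: true) { |r| copied << r }
copied_names = copied.map(&:qname)
```

In the default mode every collected element is the same object, so it only reflects the last record; with `copy: true` each element is independent.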
@@ -138,7 +141,7 @@ module HTS
138
141
 
139
142
  # query [WIP]
140
143
  def query(region)
141
- raise IOError, "closed stream" if closed?
144
+ check_closed
142
145
  raise "Index file is required to call the query method." unless index_loaded?
143
146
 
144
147
  qiter = LibHTS.sam_itr_querys(@idx, header, region)
@@ -154,5 +157,63 @@ module HTS
154
157
  LibHTS.hts_itr_destroy(qiter)
155
158
  end
156
159
  end
160
+
161
+ # @!macro [attach] define_getter
162
+ # @method $1
163
+ # Get $1 array
164
+ # @return [Array] the $1 array
165
+ define_getter :qname
166
+ define_getter :flag
167
+ define_getter :chrom
168
+ define_getter :pos
169
+ define_getter :mapq
170
+ define_getter :cigar
171
+ define_getter :mate_chrom
172
+ define_getter :mate_pos
173
+ define_getter :insert_size
174
+ define_getter :seq
175
+ define_getter :qual
176
+
177
+ alias isize insert_size
178
+ alias mpos mate_pos
179
+
180
+ def aux(tag)
181
+ warn "experimental"
182
+ check_closed
183
+ position = tell
184
+ ary = map { |r| r.aux(tag) }
185
+ seek(position)
186
+ ary
187
+ end
188
+
189
+ # @!macro [attach] define_iterator
190
+ # @method each_$1
191
+ # Get $1 iterator
192
+ define_iterator :qname
193
+ define_iterator :flag
194
+ define_iterator :chrom
195
+ define_iterator :pos
196
+ define_iterator :mapq
197
+ define_iterator :cigar
198
+ define_iterator :mate_chrom
199
+ define_iterator :mate_pos
200
+ define_iterator :insert_size
201
+ define_iterator :seq
202
+ define_iterator :qual
203
+
204
+ alias each_isize each_insert_size
205
+ alias each_mpos each_mate_pos
206
+
207
+ def each_aux(tag)
208
+ warn "experimental"
209
+ check_closed
210
+ return to_enum(__method__, tag) unless block_given?
211
+
212
+ each do |record|
213
+ yield record.aux(tag)
214
+ end
215
+
216
+ self
217
+ end
157
218
  end
158
219
  end
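
Note on `define_getter` and `define_iterator`: these class-level macros are called in `bam.rb`, but their definitions are not part of this diff (presumably they live in the `Hts` superclass). The following is only a guess at how such macros could work, modelled on the `aux` and `each_aux` methods shown above: the getter remembers the position, collects one field from every record, and seeks back. `MiniHts`, `ColumnReader`, and the array-backed `tell`/`seek` are assumptions made for this sketch.

```ruby
# Hypothetical base class providing the two macros.
class MiniHts
  include Enumerable

  # Defines `name`, returning an Array of that field for all remaining records.
  def self.define_getter(name)
    define_method(name) do
      position = tell
      ary = map(&name)
      seek(position) # restore the read position afterwards
      ary
    end
  end

  # Defines `each_<name>`, yielding that field for each remaining record.
  def self.define_iterator(name)
    method_name = :"each_#{name}"
    define_method(method_name) do |&block|
      return to_enum(method_name) unless block

      each { |record| block.call(record.public_send(name)) }
      self
    end
  end
end

# Hypothetical reader backed by an Array with a cursor.
class ColumnReader < MiniHts
  Record = Struct.new(:qname, :pos)

  define_getter :qname
  define_getter :pos
  define_iterator :qname

  def initialize(records)
    @records = records
    @cursor = 0
  end

  def tell
    @cursor
  end

  def seek(position)
    @cursor = position
  end

  def each
    while @cursor < @records.size
      record = @records[@cursor]
      @cursor += 1
      yield record
    end
    self
  end
end

reader = ColumnReader.new([ColumnReader::Record.new("r1", 10),
                           ColumnReader::Record.new("r2", 20)])
names = reader.qname
position_after_getter = reader.tell
positions = reader.pos
iterated = reader.each_qname.to_a
```

Under these assumptions the getters leave the read position untouched, while the iterators consume the stream like `each` does.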
@@ -38,8 +38,8 @@ module HTS
38
38
  h = @record.header.struct
39
39
  r = @record.struct
40
40
 
41
- format_values = proc do |type|
42
- ret = LibHTS.bcf_get_format_values(h, r, key, p1, n, type)
41
+ format_values = proc do |typ|
42
+ ret = LibHTS.bcf_get_format_values(h, r, key, p1, n, typ)
43
43
  return nil if ret < 0 # return from method.
44
44
 
45
45
  p1.read_pointer
@@ -79,10 +79,10 @@ module HTS
79
79
  num = LibHTS.bcf_hdr_id2number(@record.header.struct, LibHTS::BCF_HL_FMT, id)
80
80
  type = LibHTS.bcf_hdr_id2type(@record.header.struct, LibHTS::BCF_HL_FMT, id)
81
81
  {
82
- name: name,
82
+ name:,
83
83
  n: num,
84
84
  type: ht_type_to_sym(type),
85
- id: id
85
+ id:
86
86
  }
87
87
  end
88
88
  end
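
Note on `name:,` and `id:`: this is the hash literal value omission introduced in Ruby 3.1, where `{ name: }` means `{ name: name }`. It is one reason the README now requires Ruby 3.1 or above. A small sketch (`describe_field` and its fixed values are hypothetical):

```ruby
# Requires Ruby 3.1+. Each bare `key:` picks up the local variable of the same name.
def describe_field(name, id)
  num = 1      # placeholder values for this sketch
  type = :int
  { name:, n: num, type:, id: }
end

field = describe_field("DP", 3)
```

On Ruby 3.0 and earlier this literal is a syntax error, so the gem can no longer be loaded there.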
@@ -2,6 +2,7 @@
2
2
 
3
3
  module HTS
4
4
  class Bcf < Hts
5
+ # A class for working with VCF records.
5
6
  class Header
6
7
  def initialize(hts_file)
7
8
  @bcf_hdr = LibHTS.bcf_hdr_read(hts_file)
data/lib/hts/bcf/info.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  module HTS
4
4
  class Bcf < Hts
5
+ # Info field
5
6
  class Info
6
7
  def initialize(record)
7
8
  @record = record
@@ -76,10 +77,10 @@ module HTS
76
77
  num = LibHTS.bcf_hdr_id2number(@record.header.struct, LibHTS::BCF_HL_INFO, key)
77
78
  type = LibHTS.bcf_hdr_id2type(@record.header.struct, LibHTS::BCF_HL_INFO, key)
78
79
  {
79
- name: name,
80
+ name:,
80
81
  n: num,
81
82
  type: ht_type_to_sym(type),
82
- key: key
83
+ key:
83
84
  }
84
85
  end
85
86
  end
@@ -2,6 +2,7 @@
2
2
 
3
3
  module HTS
4
4
  class Bcf < Hts
5
+ # A class for working with VCF records.
5
6
  class Record
6
7
  def initialize(bcf_t, header)
7
8
  @bcf1 = bcf_t
@@ -60,35 +61,6 @@ module HTS
60
61
  LibHTS.bcf_update_id(@header, @bcf1, ".")
61
62
  end
62
63
 
63
- def filter
64
- LibHTS.bcf_unpack(@bcf1, LibHTS::BCF_UN_FLT)
65
- d = @bcf1[:d]
66
- n_flt = d[:n_flt]
67
-
68
- case n_flt
69
- when 0
70
- "PASS"
71
- when 1
72
- i = d[:flt].read_int
73
- LibHTS.bcf_hdr_int2id(@header.struct, LibHTS::BCF_DT_ID, i)
74
- when 2..nil
75
- d[:flt].get_array_of_int(0, n_flt).map do |i|
76
- LibHTS.bcf_hdr_int2id(@header.struct, LibHTS::BCF_DT_ID, i)
77
- end
78
- else
79
- raise "Unexpected number of filters. n_flt: #{n_flt}"
80
- end
81
- end
82
-
83
- # Get variant quality.
84
- def qual
85
- @bcf1[:qual]
86
- end
87
-
88
- def qual=(qual)
89
- @bcf1[:qual] = qual
90
- end
91
-
92
64
  def ref
93
65
  LibHTS.bcf_unpack(@bcf1, LibHTS::BCF_UN_STR)
94
66
  @bcf1[:d][:allele].get_pointer(0).read_string
@@ -108,6 +80,35 @@ module HTS
108
80
  ).map(&:read_string)
109
81
  end
110
82
 
83
+ # Get variant quality.
84
+ def qual
85
+ @bcf1[:qual]
86
+ end
87
+
88
+ def qual=(qual)
89
+ @bcf1[:qual] = qual
90
+ end
91
+
92
+ def filter
93
+ LibHTS.bcf_unpack(@bcf1, LibHTS::BCF_UN_FLT)
94
+ d = @bcf1[:d]
95
+ n_flt = d[:n_flt]
96
+
97
+ case n_flt
98
+ when 0
99
+ "PASS"
100
+ when 1
101
+ id = d[:flt].read_int
102
+ LibHTS.bcf_hdr_int2id(@header.struct, LibHTS::BCF_DT_ID, id)
103
+ when 2..nil
104
+ d[:flt].get_array_of_int(0, n_flt).map do |i|
105
+ LibHTS.bcf_hdr_int2id(@header.struct, LibHTS::BCF_DT_ID, i)
106
+ end
107
+ else
108
+ raise "Unexpected number of filters. n_flt: #{n_flt}"
109
+ end
110
+ end
111
+
111
112
  def info(key = nil)
112
113
  LibHTS.bcf_unpack(@bcf1, LibHTS::BCF_UN_SHR)
113
114
  info = Info.new(self)