htslib 0.0.4 → 0.0.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c8048806df4c335ea698c8d3ad9e51e12644f0d32667d3db5fac59a004db75bf
4
- data.tar.gz: 3d18183921f70b7b42dc5657195607c519c7d83c1e0e94b9b7a8f647d96f5504
3
+ metadata.gz: dd81ebe87741d1019acd341189ee0f8fb6ef09d3a10690686b3d40bdfea73076
4
+ data.tar.gz: 39a7fa296cc1797c7cd6317923b3336c67f0cf16e8c9f7eb8ae8dbef9e6c7f56
5
5
  SHA512:
6
- metadata.gz: f50e96a23cb75130c5aa463329cce3c2d5784ed75c608d4685d2662ee22fa93e55208c356bce0fee319028c6ae8e33b5b20d270a17fb63ca6e39a736f668a2e4
7
- data.tar.gz: c223f9b61979ace926fa53b408e85fa6f2e127cdde6ffa5bcfa1ac0ad7ce6c9c1ac5570522021c61e82d27498473299b8875f7d633b24367ae99dd2494758556
6
+ metadata.gz: c349f647e798efc920492877807db88d03dfe88edf1434e733d38221ab15f1bea179ac524b3e26feb83c7063c878138918be3dcb30bcbbc583e0cc5bbbe1b8df
7
+ data.tar.gz: 120052721db361d285968331d2091f4bfe17b45a4204443a591141bc12aef3b153866077dc85ddff63451873800f78db377a151ecbec7084b1f5f409fe5507b0
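The `checksums.yaml` entries above are SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members of the gem. A `.gem` file is a tar archive containing those members plus `checksums.yaml.gz`. Below is a minimal sketch of checking one digest with Ruby's standard `digest` library. It assumes you have already extracted the member, for example with `tar -xf`. The helper name `checksum_matches?` is hypothetical.

```ruby
require "digest"

# Hypothetical helper: compare a file's digest with an expected hex string.
# `algorithm` mirrors the SHA256 / SHA512 keys used in checksums.yaml.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  # Digest::Class.file feeds the file contents into a new digest object.
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Example (after extracting htslib-0.0.8.gem):
# checksum_matches?("data.tar.gz",
#   "39a7fa296cc1797c7cd6317923b3336c67f0cf16e8c9f7eb8ae8dbef9e6c7f56")
```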
data/README.md CHANGED
@@ -9,11 +9,12 @@
9
9
  :dna: [HTSlib](https://github.com/samtools/htslib) - for Ruby
10
10
 
11
11
  Ruby-htslib provides Ruby bindings to HTSlib, a C library for processing high-throughput sequencing (HTS) data.
12
- It will provide APIs to read and write file formats such as SAM, BAM, VCF, and BCF.
12
+ It will provide APIs to read and write file formats such as [SAM, BAM, VCF, and BCF](http://samtools.github.io/hts-specs/).
13
13
 
14
14
  :apple: Feel free to fork it out if you can develop it!
15
15
 
16
16
  :bowtie: alpha stage.
17
+
17
18
  ## Requirements
18
19
 
19
20
  * [Ruby](https://github.com/ruby/ruby) 2.7 or above.
@@ -38,26 +39,12 @@ export HTSLIBDIR="/your/path/to/htslib" # libhts.so
38
39
 
39
40
  ## Overview
40
41
 
41
- ### Low level API
42
-
43
- `HTS::LibHTS` provides native functions.
44
-
45
- ```ruby
46
- require 'htslib'
42
+ ### High level API
47
43
 
48
- a = HTS::LibHTS.hts_open("a.bam", "r")
49
- b = HTS::LibHTS.hts_get_format(a)
50
- p b[:category]
51
- p b[:format]
52
- ```
44
+ A high-level API is under development.
45
+ Classes such as `Cram`, `Bam`, `Bcf`, `Faidx`, and `Tabix` are partially implemented.
53
46
 
54
- Note: Managed struct is not used in ruby-htslib. You may need to free the memory by yourself.
55
-
56
- ### High level API (Plan)
57
-
58
- `Cram` `Bam` `Bcf` `Faidx` `Tabix`
59
-
60
- A high-level API is under development. We will change and improve the API to make it better.
47
+ Read SAM / BAM - Sequence Alignment Map file
61
48
 
62
49
  ```ruby
63
50
  require 'htslib'
@@ -67,15 +54,56 @@ bam = HTS::Bam.new("a.bam")
67
54
  bam.each do |r|
68
55
  p name: r.qname,
69
56
  flag: r.flag,
70
- start: r.start + 1,
71
- mpos: r.mate_pos + 1,
57
+ pos: r.start + 1,
58
+ mpos: r.mate_start + 1,
72
59
  mqual: r.mapping_quality,
73
60
  seq: r.sequence,
74
61
  cigar: r.cigar.to_s,
75
62
  qual: r.base_qualities.map { |i| (i + 33).chr }.join
76
63
  end
64
+
65
+ bam.close
66
+ ```
67
+
68
+ Read VCF / BCF - Variant Call Format
69
+
70
+ ```ruby
71
+ bcf = HTS::Bcf.new("b.bcf")
72
+
73
+ bcf.each do |r|
74
+ p chrom: r.chrom,
75
+ pos: r.pos,
76
+ id: r.id,
77
+ qual: r.qual.round(2),
78
+ ref: r.ref,
79
+ alt: r.alt,
80
+ filter: r.filter
81
+ end
82
+
83
+ bcf.close
84
+ ```
85
+
86
+ Methods for reading are implemented first. Methods for writing will follow in the coming days.
87
+
88
+ ### Low level API
89
+
90
+ `HTS::LibHTS` provides native functions.
91
+
92
+ ```ruby
93
+ require 'htslib'
94
+
95
+ a = HTS::LibHTS.hts_open("a.bam", "r")
96
+ b = HTS::LibHTS.hts_get_format(a)
97
+ p b[:category]
98
+ p b[:format]
77
99
  ```
78
100
 
101
+ Note: Only some C structs are wrapped with FFI's ManagedStruct, which frees the memory when Ruby's garbage collector reclaims the object. Other structs must be freed manually.
102
+
103
+ ### Need more speed?
104
+
105
+ Try [htslib.cr](https://github.com/bio-crystal/htslib.cr). htslib.cr is implemented in the Crystal language and provides an API compatible with ruby-htslib. Crystal is not as flexible as Ruby, but if you have a clear idea of the manipulation you want to do and need to perform it many times, consider implementing a command-line tool with htslib.cr. Crystal is very fast and can perform almost as well as Rust and C.
106
+
79
107
  ## Documentation
80
108
 
81
109
  * [RubyDoc.info - HTSlib](https://rdoc.info/gems/htslib)
@@ -92,8 +120,10 @@ bundle exec rake htslib:build
92
120
  bundle exec rake test
93
121
  ```
94
122
 
95
- We plan to actively use the new features of Ruby. Since the number of users is small, backward compatibility is not important.
96
- On the other hand, we will consider compatibility with [Crystal](https://github.com/bio-crystal/htslib.cr) to some extent.
123
+ HTSlib uses many macro functions. Because macros cannot be called through FFI, they must be reimplemented in Ruby.
124
+
125
+ * Actively use the advanced features of Ruby.
126
+ * Consider compatibility with [htslib.cr](https://github.com/bio-crystal/htslib.cr) to some extent.
97
127
 
98
128
  #### FFI Extensions
99
129
 
@@ -101,6 +131,7 @@ On the other hand, we will consider compatibility with [Crystal](https://github.
101
131
 
102
132
  #### Automatic generation or automatic validation (Future plan)
103
133
 
134
+
104
135
  + [c2ffi](https://github.com/rpav/c2ffi) is a tool to create JSON-format metadata from C header files. We plan to use c2ffi to generate bindings or tests automatically.
105
136
 
106
137
  ## Contributing
@@ -113,6 +144,16 @@ Ruby-htslib is a library under development, so even small improvements like typo
113
144
  * Suggest or add new features
114
145
  * [financial contributions](https://github.com/sponsors/kojix2)
115
146
 
147
+ ```
148
+ Do you need commit rights to my repository?
149
+ Do you want to get admin rights and take over the project?
150
+ If so, please feel free to contact @kojix2.
151
+ ```
152
+
153
+ #### Why do you implement htslib in a language like Ruby, which is not widely used in bioinformatics?
154
+
155
+ One of the greatest joys of using a minor language like Ruby in bioinformatics is that nothing stops you from reinventing the wheel, and reinventing the wheel can be fun. With languages like Python and R, where many bioinformatics experts work, there is no chance left for beginners to create htslib bindings. Bioinformatics file formats, libraries, and tools are very complex, and I found them hard to understand. So I wanted to implement HTSlib bindings to better understand how to use the file formats and tools, and that effort is still going on today.
156
+
116
157
  ## Links
117
158
 
118
159
  * [samtools/hts-spec](https://github.com/samtools/hts-specs)
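The SAM/BAM example in the README converts `r.start` to a 1-based position with `+ 1`, because BAM stores 0-based coordinates. It also turns the numeric base qualities into a Phred+33 string with `(i + 33).chr`. Below is a pure-Ruby sketch of that encoding and its inverse, independent of HTSlib. The helper names are hypothetical.

```ruby
# Offset used by the SAM QUAL field: ASCII code = Phred score + 33.
PHRED_OFFSET = 33

# Encode Phred scores as a SAM-style QUAL string (same idea as the README's map/chr/join).
def encode_qualities(scores)
  scores.map { |q| (q + PHRED_OFFSET).chr }.join
end

# Decode a QUAL string back into Phred scores.
def decode_qualities(qual)
  qual.bytes.map { |b| b - PHRED_OFFSET }
end

# Convert a 0-based start (as stored in BAM) to the 1-based POS printed in SAM.
def sam_pos(zero_based_start)
  zero_based_start + 1
end

p encode_qualities([0, 10, 30, 40]) # prints "!+?I"
```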
data/lib/hts/bam/cigar.rb CHANGED
@@ -22,6 +22,8 @@ module HTS
22
22
  end
23
23
 
24
24
  def each
25
+ return to_enum(__method__) unless block_given?
26
+
25
27
  @n_cigar.times do |i|
26
28
  c = @pointer[i].read_uint32
27
29
  yield [LibHTS.bam_cigar_oplen(c),
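The `return to_enum(__method__) unless block_given?` guard added to `Cigar#each` lets callers call `each` without a block and get an `Enumerator`. The sketch below applies the same pattern to a pure-Ruby stand-in that keeps packed CIGAR integers in an `Array` instead of an FFI pointer. It reimplements the HTSlib `bam_cigar_oplen` and `bam_cigar_opchr` macros by hand: the length sits above the low 4 bits, and the low 4 bits index `"MIDNSHP=XB"`. The class name `PackedCigar` and its `[length, op]` yield order are assumptions for illustration.

```ruby
# Hypothetical pure-Ruby stand-in for HTS::Bam::Cigar (no FFI involved).
class PackedCigar
  include Enumerable

  CIGAR_STR = "MIDNSHP=XB" # same order as HTSlib's BAM_CIGAR_STR
  CIGAR_SHIFT = 4          # BAM_CIGAR_SHIFT
  CIGAR_MASK = 0xf         # BAM_CIGAR_MASK

  # Pack [length, op_char] pairs the way HTSlib's bam_cigar_gen does.
  def self.from_pairs(pairs)
    packed = pairs.map do |len, op|
      code = CIGAR_STR.index(op) or raise ArgumentError, "unknown CIGAR op: #{op}"
      (len << CIGAR_SHIFT) | code
    end
    new(packed)
  end

  def initialize(packed)
    @packed = packed
  end

  # Without a block, return an Enumerator (same guard as the diff).
  def each
    return to_enum(__method__) unless block_given?

    @packed.each do |c|
      yield [c >> CIGAR_SHIFT, CIGAR_STR[c & CIGAR_MASK]]
    end
  end

  def to_s
    map { |len, op| "#{len}#{op}" }.join
  end
end

cigar = PackedCigar.from_pairs([[10, "M"], [2, "I"], [5, "M"]])
p cigar.to_s      # prints "10M2I5M"
p cigar.each.next # prints [10, "M"]
```

Because `each` returns an `Enumerator`, external iteration (`next`) and chaining (`each.with_index`) work without extra code.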
data/lib/hts/bam/flag.rb CHANGED
@@ -85,8 +85,12 @@ module HTS
85
85
  has_flag? LibHTS::BAM_FSUPPLEMENTARY
86
86
  end
87
87
 
88
- def has_flag?(o)
89
- @value[o] != 0
88
+ def has_flag?(m)
89
+ (@value & m) != 0
90
+ end
91
+
92
+ def to_s
93
+ "0x#{format('%x', @value)}\t#{@value}\t#{LibHTS.bam_flag2str(@value)}"
90
94
  end
91
95
  end
92
96
  end
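The `has_flag?` change replaces `@value[o]` with `(@value & m) != 0`. In Ruby, `Integer#[]` takes a *bit index*, not a mask. Passing a flag constant such as `BAM_FREVERSE` (16) to it therefore tests bit 16 rather than the 0x10 bit. The small pure-Ruby demonstration below writes the SAM FLAG values out as literals. The constant and method names are illustrative, not the gem's.

```ruby
# A few SAM FLAG bits, as defined in the SAM specification.
FLAG_PAIRED       = 0x1
FLAG_PROPER_PAIR  = 0x2
FLAG_REVERSE      = 0x10
FLAG_MATE_REVERSE = 0x20
FLAG_READ1        = 0x40

# Old approach: Integer#[] reads the bit at position `mask` (wrong for masks).
def has_flag_by_index?(value, mask)
  value[mask] != 0
end

# New approach: bitwise AND with the mask.
def has_flag?(value, mask)
  (value & mask) != 0
end

# Same "hex<TAB>decimal" prefix as the new Flag#to_s, without bam_flag2str names.
def flag_summary(value)
  "0x#{format('%x', value)}\t#{value}"
end

p has_flag_by_index?(16, FLAG_REVERSE) # prints false (tests bit 16)
p has_flag?(16, FLAG_REVERSE)          # prints true
```

Flag 99 (0x63) is a paired, properly paired, first-in-pair read whose mate is on the reverse strand. `has_flag?(99, FLAG_READ1)` is true and `has_flag?(99, FLAG_REVERSE)` is false.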
@@ -6,8 +6,8 @@
6
6
  module HTS
7
7
  class Bam
8
8
  class Header
9
- def initialize(pointer)
10
- @sam_hdr = pointer
9
+ def initialize(hts_file)
10
+ @sam_hdr = LibHTS.sam_hdr_read(hts_file)
11
11
  end
12
12
 
13
13
  def struct
@@ -18,16 +18,31 @@ module HTS
18
18
  @sam_hdr.to_ptr
19
19
  end
20
20
 
21
- # FIXME: better name?
22
- def seqs
23
- Array.new(@sam_hdr[:n_targets]) do |i|
21
+ def target_count
22
+ @sam_hdr[:n_targets]
23
+ end
24
+
25
+ def target_names
26
+ Array.new(target_count) do |i|
24
27
  LibHTS.sam_hdr_tid2name(@sam_hdr, i)
25
28
  end
26
29
  end
27
30
 
28
- def text
31
+ def target_lengths
32
+ Array.new(target_count) do |i|
33
+ LibHTS.sam_hdr_tid2len(@sam_hdr, i)
34
+ end
35
+ end
36
+
37
+ def to_s
29
38
  LibHTS.sam_hdr_str(@sam_hdr)
30
39
  end
40
+
41
+ private
42
+
43
+ def initialize_copy(orig)
44
+ @sam_hdr = LibHTS.sam_hdr_dup(orig.struct)
45
+ end
31
46
  end
32
47
  end
33
48
  end
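The new private `initialize_copy` (here calling `sam_hdr_dup`) makes `header.dup` produce an independent copy of the underlying C struct. Without it, `dup` would return a second Ruby object pointing at the same struct. Ruby calls `initialize_copy` from `dup` and `clone` after the instance variables have been copied, so it is the idiomatic place to replace a shared handle with a deep copy. The pure-Ruby sketch below uses a `Hash` as a hypothetical stand-in for `sam_hdr_t`.

```ruby
# Hypothetical header wrapper; @hdr plays the role of the sam_hdr_t pointer.
class HeaderSketch
  def initialize(targets)
    @hdr = { names: targets.keys, lengths: targets.values }
  end

  def struct
    @hdr
  end

  def target_count
    @hdr[:names].size
  end

  def target_names
    Array.new(target_count) { |i| @hdr[:names][i] }
  end

  def target_lengths
    Array.new(target_count) { |i| @hdr[:lengths][i] }
  end

  private

  # Invoked by #dup / #clone after ivars are copied; swap the shared
  # reference for a deep copy (the gem calls sam_hdr_dup here instead).
  def initialize_copy(orig)
    super
    @hdr = orig.struct.transform_values(&:dup)
  end
end

h = HeaderSketch.new({ "chr1" => 1000, "chr2" => 500 })
copy = h.dup
copy.struct[:names] << "chrM" # mutating the copy leaves the original intact
p h.target_names              # prints ["chr1", "chr2"]
```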
@@ -21,6 +21,8 @@ module HTS
21
21
  @bam1.to_ptr
22
22
  end
23
23
 
24
+ attr_reader :header
25
+
24
26
  # def initialize_copy
25
27
  # super
26
28
  # end
@@ -67,18 +69,22 @@ module HTS
67
69
 
68
70
  # returns the chromosome or '' if not mapped.
69
71
  def chrom
70
- tid = @bam1[:core][:tid]
71
72
  return "" if tid == -1
72
73
 
73
74
  LibHTS.sam_hdr_tid2name(@header, tid)
74
75
  end
75
76
 
77
+ # returns the chromosome or '' if not mapped.
78
+ def contig
79
+ chrom
80
+ end
81
+
76
82
  # returns the chromosome of the mate or '' if not mapped.
77
83
  def mate_chrom
78
- tid = @bam1[:core][:mtid]
79
- return "" if tid == -1
84
+ mtid = mate_tid
85
+ return "" if mtid == -1
80
86
 
81
- LibHTS.sam_hdr_tid2name(@header, tid)
87
+ LibHTS.sam_hdr_tid2name(@header, mtid)
82
88
  end
83
89
 
84
90
  def strand
@@ -90,7 +96,7 @@ module HTS
90
96
  # end
91
97
 
92
98
  # insert size
93
- def isize
99
+ def insert_size
94
100
  @bam1[:core][:isize]
95
101
  end
96
102
 
@@ -161,6 +167,28 @@ module HTS
161
167
  Flag.new(@bam1[:core][:flag])
162
168
  end
163
169
 
170
+ def tag(str)
171
+ aux = LibHTS.bam_aux_get(@bam1, str)
172
+ return nil if aux.null?
173
+
174
+ t = aux.read_string(1)
175
+
176
+ # A (character), B (general array),
177
+ # f (real number), H (hexadecimal array),
178
+ # i (integer), or Z (string).
179
+
180
+ case t
181
+ when "i", "I", "c", "C", "s", "S"
182
+ LibHTS.bam_aux2i(aux)
183
+ when "f", "d"
184
+ LibHTS.bam_aux2f(aux)
185
+ when "Z", "H"
186
+ LibHTS.bam_aux2Z(aux)
187
+ when "A" # char
188
+ LibHTS.bam_aux2A(aux).chr
189
+ end
190
+ end
191
+
164
192
  def to_s
165
193
  kstr = LibHTS::KString.new
166
194
  raise "Failed to format bam record" if LibHTS.sam_format1(@header.struct, @bam1, kstr) == -1
@@ -168,9 +196,12 @@ module HTS
168
196
  kstr[:s]
169
197
  end
170
198
 
171
- # TODO:
172
- # def eql?
173
- # def hash
199
+ private
200
+
201
+ def initialize_copy(orig)
202
+ @header = orig.header
203
+ @bam1 = LibHTS.bam_dup1(orig.struct)
204
+ end
174
205
  end
175
206
  end
176
207
  end
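`Record#tag` dispatches on the one-character type code stored before each auxiliary value. In binary BAM, integers may be stored as `c`, `C`, `s`, `S`, `i`, or `I`. SAM text writes every integer with type `i`. The pure-Ruby sketch below applies the same kind of dispatch to SAM text tags such as `NM:i:3`. The `parse_sam_tag` helper is hypothetical, and it leaves out the `B` array type, as `tag` does.

```ruby
# Parse one SAM optional field "TAG:TYPE:VALUE" into [tag, ruby_value].
# Mirrors the type dispatch in Record#tag; "B" arrays are not handled.
def parse_sam_tag(field)
  tag, type, value = field.split(":", 3) # limit 3 keeps ':' inside Z values
  raise ArgumentError, "malformed tag: #{field}" if value.nil?

  parsed =
    case type
    when "i" then Integer(value, 10)
    when "f" then Float(value)
    when "Z", "H" then value
    when "A" then value[0]
    else raise NotImplementedError, "type #{type} not handled"
    end
  [tag, parsed]
end

p parse_sam_tag("NM:i:3")      # prints ["NM", 3]
p parse_sam_tag("RG:Z:grp:1")  # prints ["RG", "grp:1"]
```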
data/lib/hts/bam.rb CHANGED
@@ -7,104 +7,139 @@ require_relative "bam/header"
7
7
  require_relative "bam/cigar"
8
8
  require_relative "bam/flag"
9
9
  require_relative "bam/record"
10
- require_relative "utils/open_method"
11
10
 
12
11
  module HTS
13
12
  class Bam
14
13
  include Enumerable
15
- extend Utils::OpenMethod
16
14
 
17
15
  attr_reader :file_path, :mode, :header
18
- # HtfFile is FFI::BitStruct
19
- attr_reader :htf_file
20
16
 
21
- class << self
22
- alias open new
17
+ def self.open(...)
18
+ file = new(...)
19
+ return file unless block_given?
20
+
21
+ begin
22
+ yield file
23
+ ensure
24
+ file.close
25
+ end
26
+ file
23
27
  end
24
28
 
25
- def initialize(file_path, mode = "r", create_index: nil)
26
- file_path = File.expand_path(file_path)
29
+ def initialize(filename, mode = "r", fai: nil, threads: nil, index: nil)
30
+ raise "HTS::Bam.new() does not take a block; please use HTS::Bam.open() instead" if block_given?
27
31
 
28
- unless File.exist?(file_path)
32
+ @file_path = filename == "-" ? "-" : File.expand_path(filename)
33
+
34
+ if mode[0] == "r" && !File.exist?(file_path)
29
35
  message = "No such SAM/BAM file - #{file_path}"
30
36
  raise message
31
37
  end
32
38
 
33
- @file_path = file_path
34
39
  @mode = mode
35
- @htf_file = LibHTS.hts_open(file_path, mode)
36
- @header = Bam::Header.new(LibHTS.sam_hdr_read(htf_file))
37
-
38
- # FIXME: should be defined here?
39
- @bam1 = LibHTS.bam_init1
40
-
41
- # read
42
- if mode[0] == "r"
43
- # load index
44
- @idx = LibHTS.sam_index_load(htf_file, file_path)
45
- # create index
46
- if create_index || (@idx.null? && create_index.nil?)
47
- warn "Create index for #{file_path}"
48
- LibHTS.sam_index_build(file_path, -1)
49
- @idx = LibHTS.sam_index_load(htf_file, file_path)
50
- end
51
- else
52
- # FIXME: implement
53
- raise "not implemented yet."
40
+ @hts_file = LibHTS.hts_open(file_path, mode)
41
+
42
+ if fai
43
+ fai_path = File.expand_path(fai)
44
+ r = LibHTS.hts_set_fai_filename(@hts_file, fai_path)
45
+ raise "Failed to load fasta index: #{fai}" if r < 0
54
46
  end
55
47
 
56
- # IO like API
57
- if block_given?
58
- begin
59
- yield self
60
- ensure
61
- close
62
- end
48
+ if threads&.> 0
49
+ r = LibHTS.hts_set_threads(@hts_file, threads)
50
+ raise "Failed to set number of threads: #{threads}" if r < 0
63
51
  end
52
+
53
+ return if mode[0] == "w"
54
+
55
+ @header = Bam::Header.new(@hts_file)
56
+
57
+ create_index if index
58
+
59
+ # load index
60
+ @idx = LibHTS.sam_index_load(@hts_file, file_path)
64
61
  end
65
62
 
66
- def struct
67
- htf_file
63
+ def create_index
64
+ warn "Create index for #{file_path}"
65
+ LibHTS.sam_index_build(file_path, -1)
66
+ idx = LibHTS.sam_index_load(@hts_file, file_path)
67
+ raise "Failed to load index: #{file_path}" if idx.null?
68
68
  end
69
69
 
70
- def to_ptr
71
- htf_file.to_ptr
70
+ def struct
71
+ @hts_file
72
72
  end
73
73
 
74
- def write(alns)
75
- alns.each do
76
- LibHTS.sam_write1(htf_file, header, alns.b) > 0 || raise
77
- end
74
+ def to_ptr
75
+ @hts_file.to_ptr
78
76
  end
79
77
 
80
78
  # Close the current file.
81
79
  def close
82
- LibHTS.hts_close(htf_file)
80
+ LibHTS.hts_idx_destroy(@idx) if @idx
81
+ @idx = nil
82
+ LibHTS.hts_close(@hts_file)
83
+ @hts_file = nil
84
+ end
85
+
86
+ def closed?
87
+ @hts_file.nil?
88
+ end
89
+
90
+ def write_header(header)
91
+ @header = header.dup
92
+ LibHTS.hts_set_fai_filename(@hts_file, @file_path)
93
+ LibHTS.sam_hdr_write(@hts_file, header)
94
+ end
95
+
96
+ def write(aln)
97
+ aln_dup = aln.dup
98
+ LibHTS.sam_write1(@hts_file, header, aln_dup) > 0 || raise
83
99
  end
84
100
 
85
101
  # Flush the current file.
86
102
  def flush
87
- # LibHTS.bgzf_flush(@htf_file.fp.bgzf)
103
+ # LibHTS.bgzf_flush(@hts_file.fp.bgzf)
88
104
  end
89
105
 
90
- def each(&block)
106
+ # Iterate over each record.
107
+ # Record object is reused.
108
+ # Faster than each_copy.
109
+ def each
91
110
  # Each does not always start at the beginning of the file.
92
111
  # This is the common behavior of IO objects in Ruby.
93
112
  # This may change in the future.
94
- while LibHTS.sam_read1(htf_file, header, @bam1) > 0
95
- record = Record.new(@bam1, header)
96
- block.call(record)
113
+ return to_enum(__method__) unless block_given?
114
+
115
+ bam1 = LibHTS.bam_init1
116
+ record = Record.new(bam1, header)
117
+ yield record while LibHTS.sam_read1(@hts_file, header, bam1) > 0
118
+ end
119
+
120
+ # Iterate over each record.
121
+ # Generate a new Record object each time.
122
+ # Slower than each.
123
+ def each_copy
124
+ return to_enum(__method__) unless block_given?
125
+
126
+ while LibHTS.sam_read1(@hts_file, header, bam1 = LibHTS.bam_init1) > 0
127
+ record = Record.new(bam1, header)
128
+ yield record
97
129
  end
98
130
  end
99
131
 
100
132
  # query [WIP]
101
133
  def query(region)
134
+ # FIXME: when @idx is nil
102
135
  qiter = LibHTS.sam_itr_querys(@idx, header, region)
103
136
  begin
104
- slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
137
+ bam1 = LibHTS.bam_init1
138
+ slen = LibHTS.sam_itr_next(@hts_file, qiter, bam1)
105
139
  while slen > 0
106
- yield Record.new(@bam1, header)
107
- slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
140
+ yield Record.new(bam1, header)
141
+ bam1 = LibHTS.bam_init1
142
+ slen = LibHTS.sam_itr_next(@hts_file, qiter, bam1)
108
143
  end
109
144
  ensure
110
145
  LibHTS.hts_itr_destroy(qiter)
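`Bam.open` now yields the file and closes it in an `ensure` clause. `each` reuses a single `Record` (and `bam1_t`) for every iteration, while `each_copy` allocates a new one per record. The sketch below reproduces both behaviours with a hypothetical in-memory `LineReader`, so the difference is visible without HTSlib. It forwards `*args, **kwargs` to `new` rather than `...`. Ruby's `...` also forwards the block, which would then reach `initialize`.

```ruby
# Hypothetical reader that only illustrates the open / each / each_copy pattern.
class LineReader
  Record = Struct.new(:text)

  def self.open(*args, **kwargs)
    reader = new(*args, **kwargs) # block deliberately not forwarded to new
    return reader unless block_given?

    begin
      yield reader
    ensure
      reader.close # runs even if the block raises
    end
  end

  def initialize(lines)
    @lines = lines
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end

  # Reuses one Record object (like Bam#each): fast, but don't keep references.
  def each
    return to_enum(__method__) unless block_given?

    record = Record.new
    @lines.each do |line|
      record.text = line
      yield record
    end
  end

  # Allocates a fresh Record per line (like Bam#each_copy).
  def each_copy
    return to_enum(__method__) unless block_given?

    @lines.each { |line| yield Record.new(line) }
  end
end

r = LineReader.new(%w[a b c])
p r.each.to_a.map(&:text)      # prints ["c", "c", "c"]
p r.each_copy.to_a.map(&:text) # prints ["a", "b", "c"]
```

In the sketch, collecting the results of `each` yields three references to the same object, all showing the last line. Records that must outlive the iteration need `each_copy`, which matches the "Record object is reused" comment on `Bam#each`.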
@@ -12,10 +12,30 @@ module HTS
12
12
  @p1 = FFI::MemoryPointer.new(:pointer) # FIXME: naming
13
13
  end
14
14
 
15
+ # For compatibility with htslib.cr.
16
+ def get_int(key)
17
+ get(key, :int)
18
+ end
19
+
20
+ # For compatibility with htslib.cr.
21
+ def get_float(key)
22
+ get(key, :float)
23
+ end
24
+
25
+ # For compatibility with htslib.cr.
26
+ def get_flag(key)
27
+ get(key, :flag)
28
+ end
29
+
30
+ # For compatibility with htslib.cr.
31
+ def get_string(key)
32
+ get(key, :string)
33
+ end
34
+
15
35
  def get(key, type = nil)
16
36
  n = FFI::MemoryPointer.new(:int)
17
37
  p1 = @p1
18
- h = @record.bcf.header.struct
38
+ h = @record.header.struct
19
39
  r = @record.struct
20
40
 
21
41
  format_values = proc do |type|
@@ -33,11 +53,13 @@ module HTS
33
53
  format_values.call(LibHTS::BCF_HT_REAL)
34
54
  .read_array_of_float(n.read_int)
35
55
  when :flag
36
- format_values.call(LibHTS::BCF_HT_FLAG)
37
- .read_int == 1
56
+ raise NotImplementedError, "Flag type not implemented yet."
57
+ # format_values.call(LibHTS::BCF_HT_FLAG)
58
+ # .read_int == 1
38
59
  when :string, :str
39
- format_values.call(LibHTS::BCF_HT_STR)
40
- .read_pointer.read_string
60
+ raise NotImplementedError, "String type not implemented yet."
61
+ # format_values.call(LibHTS::BCF_HT_STR)
62
+ # .read_string
41
63
  end
42
64
  end
43
65
 
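The `get_int`, `get_float`, `get_flag`, and `get_string` wrappers added for htslib.cr compatibility are thin delegators to `get(key, type)`. The flag and string branches of `get` now raise `NotImplementedError`. Below is a pure-Ruby sketch of the same shape over a plain `Hash` of already-decoded FORMAT values. The `FormatSketch` class and its storage are hypothetical; the gem reads the values through FFI instead.

```ruby
# Hypothetical stand-in: FORMAT values already decoded into a Hash of arrays.
class FormatSketch
  def initialize(values)
    @values = values # e.g. { "DP" => [10, 12], "GQ" => [99.0, 45.5] }
  end

  # Typed convenience wrappers, all delegating to #get.
  def get_int(key)
    get(key, :int)
  end

  def get_float(key)
    get(key, :float)
  end

  def get_flag(key)
    get(key, :flag)
  end

  def get_string(key)
    get(key, :string)
  end

  def get(key, type = nil)
    return nil unless @values.key?(key)

    raw = @values[key]
    case type
    when :int then raw.map { |v| Integer(v) }
    when :float then raw.map { |v| Float(v) }
    when :flag then raise NotImplementedError, "Flag type not implemented yet."
    when :string, :str then raise NotImplementedError, "String type not implemented yet."
    else raw
    end
  end
end

fmt = FormatSketch.new({ "DP" => [10, 12], "GQ" => [99.0, 45.5] })
p fmt.get_int("DP")   # prints [10, 12]
p fmt.get_float("DP") # prints [10.0, 12.0]
```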
@@ -3,16 +3,44 @@
3
3
  module HTS
4
4
  class Bcf
5
5
  class Header
6
- def initialize(h)
7
- @h = h
6
+ def initialize(hts_file)
7
+ @bcf_hdr = LibHTS.bcf_hdr_read(hts_file)
8
8
  end
9
9
 
10
10
  def struct
11
- @h
11
+ @bcf_hdr
12
12
  end
13
13
 
14
14
  def to_ptr
15
- @h.to_ptr
15
+ @bcf_hdr.to_ptr
16
+ end
17
+
18
+ def get_version
19
+ LibHTS.bcf_hdr_get_version(@bcf_hdr)
20
+ end
21
+
22
+ def sample_count
23
+ LibHTS.bcf_hdr_nsamples(@bcf_hdr)
24
+ end
25
+
26
+ def sample_names
27
+ # bcf_hdr_id2name is macro function
28
+ @bcf_hdr[:samples]
29
+ .read_array_of_pointer(sample_count)
30
+ .map(&:read_string)
31
+ end
32
+
33
+ def to_s
34
+ kstr = LibHTS::KString.new
35
+ raise "Failed to get header string" unless LibHTS.bcf_hdr_format(@bcf_hdr, 0, kstr)
36
+
37
+ kstr[:s]
38
+ end
39
+
40
+ private
41
+
42
+ def initialize_copy(orig)
43
+ @bcf_hdr = LibHTS.bcf_hdr_dup(orig.struct)
16
44
  end
17
45
  end
18
46
  end
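`Bcf::Header` now exposes `get_version`, `sample_count`, and `sample_names`. `sample_names` reads the `samples` array directly because `bcf_hdr_id2name` is a macro. The class also gains `to_s`, which returns the header text, and the same information can be recovered from that text. The `##fileformat=` line carries the version string (e.g. `VCFv4.2`). The `#CHROM` line lists the sample columns after the eight fixed columns and `FORMAT`. Below is a pure-Ruby sketch with hypothetical helper names, which may be handy for cross-checking the FFI results.

```ruby
# Extract the version string, e.g. "VCFv4.2", from VCF header text.
def vcf_version(header_text)
  line = header_text.each_line.find { |l| l.start_with?("##fileformat=") }
  line&.chomp&.split("=", 2)&.last
end

# Sample names follow CHROM POS ID REF ALT QUAL FILTER INFO FORMAT (9 columns).
def vcf_sample_names(header_text)
  line = header_text.each_line.find { |l| l.start_with?("#CHROM") }
  return [] unless line

  line.chomp.split("\t").drop(9)
end

text = "##fileformat=VCFv4.2\n" \
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
p vcf_version(text)      # prints "VCFv4.2"
p vcf_sample_names(text) # prints ["S1", "S2"]
```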