htslib 0.0.3 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 30f42b474bc317136d665b00781fbfcb11caaf588b091e76bc86bf9cdf8d5e3f
4
- data.tar.gz: d48e5f74fb0efed4de5af2955b0093d1eb7ef5ee7767bc7ee50bf3e296e7ce28
3
+ metadata.gz: c8048806df4c335ea698c8d3ad9e51e12644f0d32667d3db5fac59a004db75bf
4
+ data.tar.gz: 3d18183921f70b7b42dc5657195607c519c7d83c1e0e94b9b7a8f647d96f5504
5
5
  SHA512:
6
- metadata.gz: 6c1bf27a8fdc04a4a9ba678923df5bb579439c286802a5d1f2a4e6f11d7102217eafa0e4e42c2fa853e9ee82c706756315a0a1d6f97c5b5fab58ee909add4eb0
7
- data.tar.gz: 59219371057e45cf31951eda2dae250acaedd12c593c09fb08f19720818be514797c57e9b45be8e1f6ea60f2fbc79fe6038a1aced6ba66d8f39a544eb6516a0a
6
+ metadata.gz: f50e96a23cb75130c5aa463329cce3c2d5784ed75c608d4685d2662ee22fa93e55208c356bce0fee319028c6ae8e33b5b20d270a17fb63ca6e39a736f668a2e4
7
+ data.tar.gz: c223f9b61979ace926fa53b408e85fa6f2e127cdde6ffa5bcfa1ac0ad7ce6c9c1ac5570522021c61e82d27498473299b8875f7d633b24367ae99dd2494758556
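The digests listed in `checksums.yaml` can be compared against a local copy of the gem payload. The following is a minimal sketch, assuming `data.tar.gz` has already been extracted from the `.gem` archive (a `.gem` file is a tar archive that holds `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`). The helper name `digest_matches?` is hypothetical, and the demonstration uses a throwaway file rather than the real payload.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals expected_hex.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Demonstration with a temporary file standing in for data.tar.gz.
Tempfile.create("payload") do |f|
  f.write("example payload")
  f.flush
  expected = Digest::SHA256.hexdigest("example payload")
  puts digest_matches?(f.path, expected)
end
```

For the real gem, `expected_hex` would be the `data.tar.gz` value under `SHA256:` above, and `Digest::SHA512` can be substituted for the second set of digests.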
data/README.md CHANGED
@@ -6,17 +6,21 @@
6
6
  [![DOI](https://zenodo.org/badge/247078205.svg)](https://zenodo.org/badge/latestdoi/247078205)
7
7
  [![Docs Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://rubydoc.info/gems/htslib)
8
8
 
9
- :dna: [HTSlib](https://github.com/samtools/htslib) - high-throughput sequencing data manipulation - for Ruby
9
+ :dna: [HTSlib](https://github.com/samtools/htslib) - for Ruby
10
+
11
+ Ruby-htslib is the Ruby bindings to HTSlib, a C library for processing high throughput sequencing (HTS) data.
12
+ It will provide APIs to read and write file formats such as SAM, BAM, VCF, and BCF.
10
13
 
11
14
  :apple: Feel free to fork it out if you can develop it!
12
15
 
13
16
  :bowtie: alpha stage.
14
-
15
17
  ## Requirements
16
18
 
17
- * [htslib](https://github.com/samtools/htslib)
19
+ * [Ruby](https://github.com/ruby/ruby) 2.7 or above.
20
+ * [HTSlib](https://github.com/samtools/htslib)
18
21
  * Ubuntu : `apt install libhts-dev`
19
22
  * macOS : `brew install htslib`
23
+ * Build from source code (see Development section)
20
24
 
21
25
  ## Installation
22
26
 
@@ -24,19 +28,19 @@
24
28
  gem install htslib
25
29
  ```
26
30
 
27
- If you installed htslib with Ubuntu/apt or Mac/homebrew, [pkg-config](https://github.com/ruby-gnome/pkg-config) will automatically detect the location of the shared library.
28
-
29
- Or you can set the environment variable `HTSLIBDIR`.
31
+ If you have installed htslib with apt on Ubuntu or homebrew on Mac, [pkg-config](https://github.com/ruby-gnome/pkg-config)
32
+ will automatically detect the location of the shared library.
33
+ Alternatively, you can specify the directory of the shared library by setting the environment variable `HTSLIBDIR`.
30
34
 
31
35
  ```sh
32
36
  export HTSLIBDIR="/your/path/to/htslib" # libhts.so
33
37
  ```
34
38
 
35
- ## Usage
39
+ ## Overview
36
40
 
37
41
  ### Low level API
38
42
 
39
- HTS::LibHTS
43
+ `HTS::LibHTS` provides native functions.
40
44
 
41
45
  ```ruby
42
46
  require 'htslib'
@@ -49,9 +53,11 @@ p b[:format]
49
53
 
50
54
  Note: Managed struct is not used in ruby-htslib. You may need to free the memory by yourself.
51
55
 
52
- ### High level API
56
+ ### High level API (Plan)
53
57
 
54
- A high-level API based on [hts-python](https://github.com/quinlan-lab/hts-python) or [hts-nim](https://github.com/brentp/hts-nim) is under development. We will change and improve the API to make it better.
58
+ `Cram` `Bam` `Bcf` `Faidx` `Tabix`
59
+
60
+ A high-level API is under development. We will change and improve the API to make it better.
55
61
 
56
62
  ```ruby
57
63
  require 'htslib'
@@ -86,8 +92,16 @@ bundle exec rake htslib:build
86
92
  bundle exec rake test
87
93
  ```
88
94
 
89
- [c2ffi](https://github.com/rpav/c2ffi) :
90
- I am trying to find a way to automatically generate a low-level API using c2ffi.
95
+ We plan to actively use the new features of Ruby. Since the number of users is small, backward compatibility is not important.
96
+ On the other hand, we will consider compatibility with [Crystal](https://github.com/bio-crystal/htslib.cr) to some extent.
97
+
98
+ #### FFI Extensions
99
+
100
+ * [ffi-bitfield](https://github.com/kojix2/ffi-bitfield) : Extension of Ruby-FFI to support bitfields.
101
+
102
+ #### Automatic generation or automatic validation (Future plan)
103
+
104
+ + [c2ffi](https://github.com/rpav/c2ffi) is a tool to create JSON format metadata from C header files. It is planned to use c2ffi to automatically generate bindings or tests.
91
105
 
92
106
  ## Contributing
93
107
 
@@ -97,12 +111,16 @@ Ruby-htslib is a library under development, so even small improvements like typo
97
111
  * Fix bugs and [submit pull requests](https://github.com/kojix2/ruby-htslib/pulls)
98
112
  * Write, clarify, or fix documentation
99
113
  * Suggest or add new features
114
+ * [financial contributions](https://github.com/sponsors/kojix2)
100
115
 
101
116
  ## Links
102
117
 
103
118
  * [samtools/hts-spec](https://github.com/samtools/hts-specs)
104
- * [c2ffi](https://github.com/rpav/c2ffi)
119
+ * [bioruby](https://github.com/bioruby/bioruby)
120
+
121
+ ## Funding support
105
122
 
123
+ This work was supported partially by [Ruby Association Grant 2020](https://www.ruby.or.jp/en/news/20201022).
106
124
  ## License
107
125
 
108
126
  [MIT License](https://opensource.org/licenses/MIT).
data/lib/hts/bam/cigar.rb CHANGED
@@ -7,20 +7,23 @@ module HTS
7
7
  class Bam
8
8
  class Cigar
9
9
  include Enumerable
10
- OPS = "MIDNSHP=XB"
11
10
 
12
- def initialize(cigar, n_cigar)
13
- @c = cigar
11
+ def initialize(pointer, n_cigar)
12
+ @pointer = pointer
14
13
  @n_cigar = n_cigar
15
14
  end
16
15
 
16
+ def to_ptr
17
+ @pointer
18
+ end
19
+
17
20
  def to_s
18
21
  to_a.flatten.join
19
22
  end
20
23
 
21
24
  def each
22
25
  @n_cigar.times do |i|
23
- c = @c[i].read_uint32
26
+ c = @pointer[i].read_uint32
24
27
  yield [LibHTS.bam_cigar_oplen(c),
25
28
  LibHTS.bam_cigar_opchr(c)]
26
29
  end
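For readers unfamiliar with what `bam_cigar_oplen` and `bam_cigar_opchr` compute in `Cigar#each`, here is a pure-Ruby sketch. It assumes the standard BAM encoding, where each CIGAR element is a 32-bit integer with the operation length in the upper 28 bits and the operation code in the lower 4 bits. The names `CIGAR_OPS` and `decode_cigar` are illustrative and are not part of the gem.

```ruby
# Operation characters in BAM op-code order (same string as the OPS constant removed above).
CIGAR_OPS = "MIDNSHP=XB"

# Decode one packed CIGAR element into [length, operation character].
def decode_cigar(c)
  [c >> 4, CIGAR_OPS[c & 0xf]]
end

# Three packed elements: 10M, 2I, 5S (op codes 0, 1, 4).
packed = [(10 << 4) | 0, (2 << 4) | 1, (5 << 4) | 4]
puts packed.map { |c| decode_cigar(c) }.flatten.join
```

The last line mirrors `Cigar#to_s`, which flattens the `[length, op]` pairs and joins them into a CIGAR string.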
data/lib/hts/bam/flag.rb CHANGED
@@ -7,7 +7,9 @@ module HTS
7
7
  class Bam
8
8
  class Flag
9
9
  def initialize(flag_value)
10
- @value = flag_value # tytpe check?
10
+ raise TypeError unless flag_value.is_a? Integer
11
+
12
+ @value = flag_value
11
13
  end
12
14
 
13
15
  attr_accessor :value
@@ -84,7 +86,7 @@ module HTS
84
86
  end
85
87
 
86
88
  def has_flag?(o)
87
- (@value & o) != 0
89
+ @value[o] != 0
88
90
  end
89
91
  end
90
92
  end
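The change in `has_flag?` is not purely cosmetic: `Integer#[]` takes a bit position, whereas `&` takes a bitmask. Whether callers pass a mask or a position is not visible in this diff, so the sketch below only illustrates how the two forms differ for the same argument, using the SAM flag value 16 (0x10, read on the reverse strand).

```ruby
flag = 16 # 0x10: only bit 4 is set

mask_test  = (flag & 16) != 0 # old form: treats 16 as a bitmask
index_test = flag[16] != 0    # new form, same argument: tests bit position 16
bit4_test  = flag[4] != 0     # new form with the bit position of 0x10

puts mask_test
puts index_test
puts bit4_test
```

If the argument is a mask such as `0x10`, the old and new forms disagree; they agree only when the new form is given the bit position.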
data/lib/hts/bam/header.rb CHANGED
@@ -6,21 +6,27 @@
6
6
  module HTS
7
7
  class Bam
8
8
  class Header
9
- attr_reader :h
9
+ def initialize(pointer)
10
+ @sam_hdr = pointer
11
+ end
12
+
13
+ def struct
14
+ @sam_hdr
15
+ end
10
16
 
11
- def initialize(h)
12
- @h = h
17
+ def to_ptr
18
+ @sam_hdr.to_ptr
13
19
  end
14
20
 
15
21
  # FIXME: better name?
16
22
  def seqs
17
- Array.new(@h[:n_targets]) do |i|
18
- LibHTS.sam_hdr_tid2name(@h, i)
23
+ Array.new(@sam_hdr[:n_targets]) do |i|
24
+ LibHTS.sam_hdr_tid2name(@sam_hdr, i)
19
25
  end
20
26
  end
21
27
 
22
28
  def text
23
- LibHTS.sam_hdr_str(@h)
29
+ LibHTS.sam_hdr_str(@sam_hdr)
24
30
  end
25
31
  end
26
32
  end
data/lib/hts/bam/record.rb CHANGED
@@ -8,9 +8,17 @@ module HTS
8
8
  class Record
9
9
  SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"
10
10
 
11
- def initialize(bam1_t, bam_hdr_t)
12
- @b = bam1_t
13
- @h = bam_hdr_t
11
+ def initialize(bam1_t, header)
12
+ @bam1 = bam1_t
13
+ @header = header
14
+ end
15
+
16
+ def struct
17
+ @bam1
18
+ end
19
+
20
+ def to_ptr
21
+ @bam1.to_ptr
14
22
  end
15
23
 
16
24
  # def initialize_copy
@@ -23,7 +31,7 @@ module HTS
23
31
 
24
32
  # returns the query name.
25
33
  def qname
26
- LibHTS.bam_get_qname(@b).read_string
34
+ LibHTS.bam_get_qname(@bam1).read_string
27
35
  end
28
36
 
29
37
  # Set (query) name.
@@ -33,48 +41,48 @@ module HTS
33
41
 
34
42
  # returns the tid of the record or -1 if not mapped.
35
43
  def tid
36
- @b[:core][:tid]
44
+ @bam1[:core][:tid]
37
45
  end
38
46
 
39
47
  # returns the tid of the mate or -1 if not mapped.
40
48
  def mate_tid
41
- @b[:core][:mtid]
49
+ @bam1[:core][:mtid]
42
50
  end
43
51
 
44
52
  # returns 0-based start position.
45
53
  def start
46
- @b[:core][:pos]
54
+ @bam1[:core][:pos]
47
55
  end
48
56
 
49
57
  # returns end position of the read.
50
58
  def stop
51
- LibHTS.bam_endpos @b
59
+ LibHTS.bam_endpos @bam1
52
60
  end
53
61
 
54
62
  # returns 0-based mate position
55
63
  def mate_start
56
- @b[:core][:mpos]
64
+ @bam1[:core][:mpos]
57
65
  end
58
66
  alias mate_pos mate_start
59
67
 
60
68
  # returns the chromosome or '' if not mapped.
61
69
  def chrom
62
- tid = @b[:core][:tid]
70
+ tid = @bam1[:core][:tid]
63
71
  return "" if tid == -1
64
72
 
65
- LibHTS.sam_hdr_tid2name(@h, tid)
73
+ LibHTS.sam_hdr_tid2name(@header, tid)
66
74
  end
67
75
 
68
76
  # returns the chromosome of the mate or '' if not mapped.
69
77
  def mate_chrom
70
- tid = @b[:core][:mtid]
78
+ tid = @bam1[:core][:mtid]
71
79
  return "" if tid == -1
72
80
 
73
- LibHTS.sam_hdr_tid2name(@h, tid)
81
+ LibHTS.sam_hdr_tid2name(@header, tid)
74
82
  end
75
83
 
76
84
  def strand
77
- LibHTS.bam_is_rev(@b) ? "-" : "+"
85
+ LibHTS.bam_is_rev(@bam1) ? "-" : "+"
78
86
  end
79
87
 
80
88
  # def start=(v)
@@ -83,38 +91,38 @@ module HTS
83
91
 
84
92
  # insert size
85
93
  def isize
86
- @b[:core][:isize]
94
+ @bam1[:core][:isize]
87
95
  end
88
96
 
89
97
  # mapping quality
90
98
  def mapping_quality
91
- @b[:core][:qual]
99
+ @bam1[:core][:qual]
92
100
  end
93
101
 
94
102
  # returns a `Cigar` object.
95
103
  def cigar
96
- Cigar.new(LibHTS.bam_get_cigar(@b), @b[:core][:n_cigar])
104
+ Cigar.new(LibHTS.bam_get_cigar(@bam1), @bam1[:core][:n_cigar])
97
105
  end
98
106
 
99
107
  def qlen
100
108
  LibHTS.bam_cigar2qlen(
101
- @b[:core][:n_cigar],
102
- LibHTS.bam_get_cigar(@b)
109
+ @bam1[:core][:n_cigar],
110
+ LibHTS.bam_get_cigar(@bam1)
103
111
  )
104
112
  end
105
113
 
106
114
  def rlen
107
115
  LibHTS.bam_cigar2rlen(
108
- @b[:core][:n_cigar],
109
- LibHTS.bam_get_cigar(@b)
116
+ @bam1[:core][:n_cigar],
117
+ LibHTS.bam_get_cigar(@bam1)
110
118
  )
111
119
  end
112
120
 
113
121
  # return the read sequence
114
122
  def sequence
115
- r = LibHTS.bam_get_seq(@b)
123
+ r = LibHTS.bam_get_seq(@bam1)
116
124
  seq = String.new
117
- (@b[:core][:l_qseq]).times do |i|
125
+ (@bam1[:core][:l_qseq]).times do |i|
118
126
  seq << SEQ_NT16_STR[LibHTS.bam_seqi(r, i)]
119
127
  end
120
128
  seq
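The `sequence` method relies on `bam_seqi` to pull 4-bit base codes out of the packed sequence buffer. A pure-Ruby sketch of that lookup follows, assuming the standard BAM layout of two bases per byte with the first base in the high nibble. The method name `seqi` and the sample bytes are illustrative only.

```ruby
SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"

# Return the 4-bit code of base i from an array of packed bytes.
# Even indexes read the high nibble, odd indexes the low nibble.
def seqi(bytes, i)
  (bytes[i >> 1] >> ((~i & 1) << 2)) & 0xf
end

# Two bytes holding the nibbles 1, 2, 4, 8.
packed = [0x12, 0x48]
puts 4.times.map { |i| SEQ_NT16_STR[seqi(packed, i)] }.join
```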
@@ -122,35 +130,42 @@ module HTS
122
130
 
123
131
  # return only the base of the requested index "i" of the query sequence.
124
132
  def base_at(n)
125
- n += @b[:core][:l_qseq] if n < 0
126
- return "." if (n >= @b[:core][:l_qseq]) || (n < 0) # eg. base_at(-1000)
133
+ n += @bam1[:core][:l_qseq] if n < 0
134
+ return "." if (n >= @bam1[:core][:l_qseq]) || (n < 0) # eg. base_at(-1000)
127
135
 
128
- r = LibHTS.bam_get_seq(@b)
136
+ r = LibHTS.bam_get_seq(@bam1)
129
137
  SEQ_NT16_STR[LibHTS.bam_seqi(r, n)]
130
138
  end
131
139
 
132
140
  # return the base qualities
133
141
  def base_qualities
134
- q_ptr = LibHTS.bam_get_qual(@b)
135
- q_ptr.read_array_of_uint8(@b[:core][:l_qseq])
142
+ q_ptr = LibHTS.bam_get_qual(@bam1)
143
+ q_ptr.read_array_of_uint8(@bam1[:core][:l_qseq])
136
144
  end
137
145
 
138
146
  # return only the base quality of the requested index "i" of the query sequence.
139
147
  def base_quality_at(n)
140
- n += @b[:core][:l_qseq] if n < 0
141
- return 0 if (n >= @b[:core][:l_qseq]) || (n < 0) # eg. base_quality_at(-1000)
148
+ n += @bam1[:core][:l_qseq] if n < 0
149
+ return 0 if (n >= @bam1[:core][:l_qseq]) || (n < 0) # eg. base_quality_at(-1000)
142
150
 
143
- q_ptr = LibHTS.bam_get_qual(@b)
151
+ q_ptr = LibHTS.bam_get_qual(@bam1)
144
152
  q_ptr.get_uint8(n)
145
153
  end
146
154
 
147
155
  def flag_str
148
- LibHTS.bam_flag2str(@b[:core][:flag])
156
+ LibHTS.bam_flag2str(@bam1[:core][:flag])
149
157
  end
150
158
 
151
159
  # returns a `Flag` object.
152
160
  def flag
153
- Flag.new(@b[:core][:flag])
161
+ Flag.new(@bam1[:core][:flag])
162
+ end
163
+
164
+ def to_s
165
+ kstr = LibHTS::KString.new
166
+ raise "Failed to format bam record" if LibHTS.sam_format1(@header.struct, @bam1, kstr) == -1
167
+
168
+ kstr[:s]
154
169
  end
155
170
 
156
171
  # TODO:
data/lib/hts/bam.rb CHANGED
@@ -7,75 +7,104 @@ require_relative "bam/header"
7
7
  require_relative "bam/cigar"
8
8
  require_relative "bam/flag"
9
9
  require_relative "bam/record"
10
+ require_relative "utils/open_method"
10
11
 
11
12
  module HTS
12
13
  class Bam
13
14
  include Enumerable
14
- attr_reader :file_path, :mode, :htf, :header
15
+ extend Utils::OpenMethod
16
+
17
+ attr_reader :file_path, :mode, :header
18
+ # HtfFile is FFI::BitStruct
19
+ attr_reader :htf_file
20
+
21
+ class << self
22
+ alias open new
23
+ end
15
24
 
16
25
  def initialize(file_path, mode = "r", create_index: nil)
17
26
  file_path = File.expand_path(file_path)
18
27
 
19
- raise("No such SAM/BAM file - #{file_path}") unless File.exist?(file_path)
28
+ unless File.exist?(file_path)
29
+ message = "No such SAM/BAM file - #{file_path}"
30
+ raise message
31
+ end
20
32
 
21
33
  @file_path = file_path
22
34
  @mode = mode
23
- @htf = LibHTS.hts_open(@file_path, mode)
24
- @header = Bam::Header.new(LibHTS.sam_hdr_read(htf))
35
+ @htf_file = LibHTS.hts_open(file_path, mode)
36
+ @header = Bam::Header.new(LibHTS.sam_hdr_read(htf_file))
37
+
25
38
  # FIXME: should be defined here?
26
- @b = LibHTS.bam_init1
39
+ @bam1 = LibHTS.bam_init1
27
40
 
28
41
  # read
29
42
  if mode[0] == "r"
30
43
  # load index
31
- @idx = LibHTS.sam_index_load(htf, file_path)
44
+ @idx = LibHTS.sam_index_load(htf_file, file_path)
32
45
  # create index
33
46
  if create_index || (@idx.null? && create_index.nil?)
34
47
  warn "Create index for #{file_path}"
35
48
  LibHTS.sam_index_build(file_path, -1)
36
- @idx = LibHTS.sam_index_load(@htf, @file_path)
49
+ @idx = LibHTS.sam_index_load(htf_file, file_path)
37
50
  end
38
51
  else
39
52
  # FIXME: implement
40
53
  raise "not implemented yet."
41
54
  end
55
+
56
+ # IO like API
57
+ if block_given?
58
+ begin
59
+ yield self
60
+ ensure
61
+ close
62
+ end
63
+ end
64
+ end
65
+
66
+ def struct
67
+ htf_file
68
+ end
69
+
70
+ def to_ptr
71
+ htf_file.to_ptr
42
72
  end
43
73
 
44
74
  def write(alns)
45
75
  alns.each do
46
- LibHTS.sam_write1(htf, header, alns.b) > 0 || raise
76
+ LibHTS.sam_write1(htf_file, header, alns.b) > 0 || raise
47
77
  end
48
78
  end
49
79
 
50
80
  # Close the current file.
51
81
  def close
52
- LibHTS.hts_close(htf)
82
+ LibHTS.hts_close(htf_file)
53
83
  end
54
84
 
55
85
  # Flush the current file.
56
86
  def flush
57
- raise
58
- # LibHTS.bgzf_flush(@htf.fp.bgzf)
87
+ # LibHTS.bgzf_flush(@htf_file.fp.bgzf)
59
88
  end
60
89
 
61
90
  def each(&block)
62
91
  # Each does not always start at the beginning of the file.
63
92
  # This is the common behavior of IO objects in Ruby.
64
93
  # This may change in the future.
65
- while LibHTS.sam_read1(htf, header.h, @b) > 0
66
- record = Record.new(@b, header.h)
94
+ while LibHTS.sam_read1(htf_file, header, @bam1) > 0
95
+ record = Record.new(@bam1, header)
67
96
  block.call(record)
68
97
  end
69
98
  end
70
99
 
71
100
  # query [WIP]
72
101
  def query(region)
73
- qiter = LibHTS.sam_itr_querys(@idx, header.h, region)
102
+ qiter = LibHTS.sam_itr_querys(@idx, header, region)
74
103
  begin
75
- slen = LibHTS.sam_itr_next(htf, qiter, @b)
104
+ slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
76
105
  while slen > 0
77
- yield Record.new(@b, header.h)
78
- slen = LibHTS.sam_itr_next(htf, qiter, @b)
106
+ yield Record.new(@bam1, header)
107
+ slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
79
108
  end
80
109
  ensure
81
110
  LibHTS.hts_itr_destroy(qiter)
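The "IO like API" added to `Bam#initialize` combines `alias open new` with a `block_given?` check. The sketch below reproduces that pattern on a stand-in class (`FakeHandle` is hypothetical and touches no HTSlib functions) to show one consequence: because `open` is an alias of `new`, the call returns the already-closed object rather than the block's value, unlike `File.open`.

```ruby
# Stand-in for a resource wrapper; not part of the gem.
class FakeHandle
  attr_reader :closed

  class << self
    alias open new
  end

  def initialize(name)
    @name = name
    @closed = false
    return unless block_given?

    begin
      yield self
    ensure
      close # runs even if the block raises
    end
  end

  def close
    @closed = true
  end
end

seen_inside = nil
handle = FakeHandle.open("example") do |f|
  seen_inside = f.closed
  :block_value
end

puts seen_inside          # state observed inside the block
puts handle.closed        # state after the block returns
puts handle.class         # open returns the object, not :block_value
```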
@@ -0,0 +1,52 @@
1
+ # frozen_string_literal: true
2
+
3
+ # https://github.com/brentp/hts-nim/blob/master/src/hts/vcf.nim
4
+ # This is a port from Nim.
5
+ # TODO: Make it more like Ruby.
6
+
7
+ module HTS
8
+ class Bcf
9
+ class Format
10
+ def initialize(record)
11
+ @record = record
12
+ @p1 = FFI::MemoryPointer.new(:pointer) # FIXME: naming
13
+ end
14
+
15
+ def get(key, type = nil)
16
+ n = FFI::MemoryPointer.new(:int)
17
+ p1 = @p1
18
+ h = @record.bcf.header.struct
19
+ r = @record.struct
20
+
21
+ format_values = proc do |type|
22
+ ret = LibHTS.bcf_get_format_values(h, r, key, p1, n, type)
23
+ return nil if ret < 0 # return from method.
24
+
25
+ p1.read_pointer
26
+ end
27
+
28
+ case type.to_sym
29
+ when :int, :int32
30
+ format_values.call(LibHTS::BCF_HT_INT)
31
+ .read_array_of_int32(n.read_int)
32
+ when :float, :real
33
+ format_values.call(LibHTS::BCF_HT_REAL)
34
+ .read_array_of_float(n.read_int)
35
+ when :flag
36
+ format_values.call(LibHTS::BCF_HT_FLAG)
37
+ .read_int == 1
38
+ when :string, :str
39
+ format_values.call(LibHTS::BCF_HT_STR)
40
+ .read_pointer.read_string
41
+ end
42
+ end
43
+
44
+ def set; end
45
+
46
+ # def fields # iterator
47
+ # end
48
+
49
+ def genotypes; end
50
+ end
51
+ end
52
+ end
@@ -0,0 +1,19 @@
1
+ # frozen_string_literal: true
2
+
3
+ module HTS
4
+ class Bcf
5
+ class Header
6
+ def initialize(h)
7
+ @h = h
8
+ end
9
+
10
+ def struct
11
+ @h
12
+ end
13
+
14
+ def to_ptr
15
+ @h.to_ptr
16
+ end
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,40 @@
1
+ # frozen_string_literal: true
2
+
3
+ module HTS
4
+ class Bcf
5
+ class Info
6
+ def initialize(record)
7
+ @record = record
8
+ end
9
+
10
+ def get(key, type = nil)
11
+ n = FFI::MemoryPointer.new(:int)
12
+ p1 = @record.p1
13
+ h = @record.bcf.header.struct
14
+ r = @record.struct
15
+
16
+ info_values = proc do |type|
17
+ ret = LibHTS.bcf_get_info_values(h, r, key, p1, n, type)
18
+ return nil if ret < 0 # return from method.
19
+
20
+ p1.read_pointer
21
+ end
22
+
23
+ case type.to_sym
24
+ when :int, :int32
25
+ info_values.call(LibHTS::BCF_HT_INT)
26
+ .read_array_of_int32(n.read_int)
27
+ when :float, :real
28
+ info_values.call(LibHTS::BCF_HT_REAL)
29
+ .read_array_of_float(n.read_int)
30
+ when :flag
31
+ info_values.call(LibHTS::BCF_HT_FLAG)
32
+ .read_int == 1
33
+ when :string, :str
34
+ info_values.call(LibHTS::BCF_HT_STR)
35
+ .read_pointer.read_string
36
+ end
37
+ end
38
+ end
39
+ end
40
+ end
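Both `Format#get` and `Info#get` depend on a Ruby detail flagged by the `# return from method.` comment: a `return` inside a `proc` exits the enclosing method, not just the proc. A small sketch with hypothetical names (`lookup`, `fetch`) shows the effect, so the chained call after `fetch.call` is never reached on a miss.

```ruby
# Hypothetical method mirroring the shape of Info#get.
def lookup(table, key)
  fetch = proc do |k|
    return nil unless table.key?(k) # exits lookup itself

    table[k]
  end

  # Only reached when the proc did not return early.
  fetch.call(key).upcase
end

puts lookup({ "a" => "x" }, "a")
p lookup({ "a" => "x" }, "b")
```

A `lambda` would behave differently here: its `return` would hand `nil` back to the call site, and the chained `.upcase` would then raise. Separately, note that `nil` does not respond to `to_sym`, so calling `get` with the default `type = nil` would raise `NoMethodError` at `type.to_sym` as the code is written in this diff.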