htslib 0.0.0 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8b19cf8cd36bbb9ffeb34a9c92b98adba85716f57d24ec6990cc95c53c8f658d
4
- data.tar.gz: e678bdbe86be9c73c8fb3e378f86236a0a6eaf36ce047368af9263753bdca439
3
+ metadata.gz: c8048806df4c335ea698c8d3ad9e51e12644f0d32667d3db5fac59a004db75bf
4
+ data.tar.gz: 3d18183921f70b7b42dc5657195607c519c7d83c1e0e94b9b7a8f647d96f5504
5
5
  SHA512:
6
- metadata.gz: cf6a74ee14a7f0d3bbff603c1af232d7a106a0c7d70e4c8e4c7ac4270ab98866a971611a793a1b6d994fde2b5d0a400b0e73fe04a99c9b2abfa16d5e0d82a2c5
7
- data.tar.gz: d24f1fa238f3ad8d541f649a03cede9580bcdedad52e81a97f7d33a44a9c3d8fbfcf162b5912fcc9f7e5c832f09979d59cf7a4bc13fc6da7aa4facd0102fdfbb
6
+ metadata.gz: f50e96a23cb75130c5aa463329cce3c2d5784ed75c608d4685d2662ee22fa93e55208c356bce0fee319028c6ae8e33b5b20d270a17fb63ca6e39a736f668a2e4
7
+ data.tar.gz: c223f9b61979ace926fa53b408e85fa6f2e127cdde6ffa5bcfa1ac0ad7ce6c9c1ac5570522021c61e82d27498473299b8875f7d633b24367ae99dd2494758556
data/README.md CHANGED
@@ -1,12 +1,26 @@
1
- # HTSlib
1
+ # ruby-htslib
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/htslib.svg)](https://badge.fury.io/rb/htslib)
4
- ![CI](https://github.com/kojix2/ruby-htslib/workflows/CI/badge.svg?branch=master)
4
+ ![CI](https://github.com/kojix2/ruby-htslib/workflows/CI/badge.svg)
5
5
  [![The MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.txt)
6
+ [![DOI](https://zenodo.org/badge/247078205.svg)](https://zenodo.org/badge/latestdoi/247078205)
7
+ [![Docs Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://rubydoc.info/gems/htslib)
8
+
9
+ :dna: [HTSlib](https://github.com/samtools/htslib) - for Ruby
10
+
11
+ Ruby-htslib provides Ruby bindings to HTSlib, a C library for processing high-throughput sequencing (HTS) data.
12
+ It will provide APIs to read and write file formats such as SAM, BAM, VCF, and BCF.
6
13
 
7
14
  :apple: Feel free to fork it if you would like to develop it!
8
15
 
9
- :bowtie: Just a prototype.
16
+ :bowtie: Alpha stage.
17
+ ## Requirements
18
+
19
+ * [Ruby](https://github.com/ruby/ruby) 2.7 or above.
20
+ * [HTSlib](https://github.com/samtools/htslib)
21
+ * Ubuntu : `apt install libhts-dev`
22
+ * macOS : `brew install htslib`
23
+ * Build from source code (see Development section)
10
24
 
11
25
  ## Installation
12
26
 
@@ -14,34 +28,99 @@
14
28
  gem install htslib
15
29
  ```
16
30
 
17
- Set environment variable HTSLIBDIR.
31
+ If you have installed htslib with apt on Ubuntu or Homebrew on macOS, [pkg-config](https://github.com/ruby-gnome/pkg-config)
32
+ will automatically detect the location of the shared library.
33
+ Alternatively, you can specify the directory of the shared library by setting the environment variable `HTSLIBDIR`.
18
34
 
19
35
  ```sh
20
- export HTSLIBDIR="/your/path/to/htslib"
36
+ export HTSLIBDIR="/your/path/to/htslib" # libhts.so
21
37
  ```
22
38
 
23
- ## Requirements
39
+ ## Overview
24
40
 
25
- * [htslib](https://github.com/samtools/htslib)
41
+ ### Low level API
26
42
 
27
- ## Usage
43
+ `HTS::LibHTS` provides native functions.
28
44
 
29
45
  ```ruby
30
46
  require 'htslib'
31
47
 
32
- a = HTS::FFI.hts_open("a.bam", "r")
33
- b = HTS::FFI.hts_get_format(a)
48
+ a = HTS::LibHTS.hts_open("a.bam", "r")
49
+ b = HTS::LibHTS.hts_get_format(a)
34
50
  p b[:category]
35
51
  p b[:format]
36
52
  ```
37
53
 
54
+ Note: ruby-htslib does not use managed structs (`FFI::ManagedStruct`), so you may need to free the memory yourself.
55
+
56
+ ### High level API (Plan)
57
+
58
+ `Cram` `Bam` `Bcf` `Faidx` `Tabix`
59
+
60
+ A high-level API is under development and is subject to change as we improve it.
61
+
62
+ ```ruby
63
+ require 'htslib'
64
+
65
+ bam = HTS::Bam.new("a.bam")
66
+
67
+ bam.each do |r|
68
+ p name: r.qname,
69
+ flag: r.flag,
70
+ start: r.start + 1,
71
+ mpos: r.mate_pos + 1,
72
+ mqual: r.mapping_quality,
73
+ seq: r.sequence,
74
+ cigar: r.cigar.to_s,
75
+ qual: r.base_qualities.map { |i| (i + 33).chr }.join
76
+ end
77
+ ```
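
The README example above converts 0-based positions to 1-based ones and turns raw base qualities into a Phred+33 string. A minimal pure-Ruby sketch of those two conversions follows; it needs no htslib, and the helper names `phred33` and `one_based` are hypothetical, introduced only for illustration.

```ruby
# Hypothetical helper: encode raw Phred scores as a Phred+33 ASCII string,
# mirroring `r.base_qualities.map { |i| (i + 33).chr }.join` in the README.
def phred33(qualities)
  qualities.map { |q| (q + 33).chr }.join
end

# Hypothetical helper: convert a 0-based position (as returned by `start`)
# to the 1-based coordinate used in SAM text.
def one_based(pos)
  pos + 1
end

puts phred33([0, 30, 40]) # prints "!?I"
puts one_based(99)        # prints 100
```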
78
+
79
+ ## Documentation
80
+
81
+ * [RubyDoc.info - HTSlib](https://rdoc.info/gems/htslib)
82
+
38
83
  ## Development
39
84
 
85
+ To get started with development:
86
+
87
+ ```sh
88
+ git clone --recursive https://github.com/kojix2/ruby-htslib
89
+ cd ruby-htslib
90
+ bundle install
91
+ bundle exec rake htslib:build
92
+ bundle exec rake test
93
+ ```
94
+
95
+ We plan to actively use new Ruby features. Since the number of users is small, backward compatibility is not a priority.
96
+ On the other hand, we will consider compatibility with [Crystal](https://github.com/bio-crystal/htslib.cr) to some extent.
97
+
98
+ #### FFI Extensions
99
+
100
+ * [ffi-bitfield](https://github.com/kojix2/ffi-bitfield) : Extension of Ruby-FFI to support bitfields.
101
+
102
+ #### Automatic generation or automatic validation (Future plan)
103
+
104
+ * [c2ffi](https://github.com/rpav/c2ffi) is a tool that creates JSON metadata from C header files. We plan to use c2ffi to generate bindings or tests automatically.
105
+
40
106
  ## Contributing
41
107
 
42
- Bug reports and pull requests are welcome on GitHub at https://github.com/kojix2/ruby-htslib.
108
+ Ruby-htslib is a library under development, so even small improvements such as typo fixes are welcome! Please feel free to send us your pull requests.
109
+
110
+ * [Report bugs](https://github.com/kojix2/ruby-htslib/issues)
111
+ * Fix bugs and [submit pull requests](https://github.com/kojix2/ruby-htslib/pulls)
112
+ * Write, clarify, or fix documentation
113
+ * Suggest or add new features
114
+ * Make [financial contributions](https://github.com/sponsors/kojix2)
115
+
116
+ ## Links
117
+
118
+ * [samtools/hts-specs](https://github.com/samtools/hts-specs)
119
+ * [bioruby](https://github.com/bioruby/bioruby)
43
120
 
121
+ ## Funding support
44
122
 
123
+ This work was partially supported by [Ruby Association Grant 2020](https://www.ruby.or.jp/en/news/20201022).
45
124
  ## License
46
125
 
47
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
126
+ [MIT License](https://opensource.org/licenses/MIT).
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-python
4
+ # https://github.com/quinlan-lab/hts-python
5
+
6
+ module HTS
7
+ class Bam
8
+ class Cigar
9
+ include Enumerable
10
+
11
+ def initialize(pointer, n_cigar)
12
+ @pointer = pointer
13
+ @n_cigar = n_cigar
14
+ end
15
+
16
+ def to_ptr
17
+ @pointer
18
+ end
19
+
20
+ def to_s
21
+ to_a.flatten.join
22
+ end
23
+
24
+ def each
25
+ @n_cigar.times do |i|
26
+ c = @pointer[i].read_uint32
27
+ yield [LibHTS.bam_cigar_oplen(c),
28
+ LibHTS.bam_cigar_opchr(c)]
29
+ end
30
+ end
31
+ end
32
+ end
33
+ end
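
The `Cigar` class above delegates decoding to `LibHTS.bam_cigar_oplen` and `LibHTS.bam_cigar_opchr`. As a rough illustration of what those calls compute, here is a pure-Ruby sketch based on the BAM layout of a CIGAR element (operation length in the upper 28 bits, operation code in the lower 4 bits). The constant and method names are assumptions for this sketch, not part of ruby-htslib.

```ruby
# Pure-Ruby sketch of BAM CIGAR decoding (no htslib required).
BAM_CIGAR_STR   = "MIDNSHP=XB" # operation characters, indexed by op code
BAM_CIGAR_SHIFT = 4            # the length occupies the upper 28 bits
BAM_CIGAR_MASK  = 0xf          # the op code occupies the lower 4 bits

# Decode one packed CIGAR element into [length, operation character].
def decode_cigar_op(packed)
  [packed >> BAM_CIGAR_SHIFT, BAM_CIGAR_STR[packed & BAM_CIGAR_MASK]]
end

# Hypothetical helper: render packed elements as a CIGAR string,
# in the same way Cigar#to_s joins the [length, op] pairs.
def cigar_string(packed_ops)
  packed_ops.map { |c| decode_cigar_op(c) }.flatten.join
end

# 10M (op 0), 2I (op 1), 5S (op 4)
example = [(10 << 4) | 0, (2 << 4) | 1, (5 << 4) | 4]
puts cigar_string(example) # prints "10M2I5S"
```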
@@ -0,0 +1,93 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-nim
4
+ # https://github.com/brentp/hts-nim/blob/master/src/hts/bam/flag.nim
5
+
6
+ module HTS
7
+ class Bam
8
+ class Flag
9
+ def initialize(flag_value)
10
+ raise TypeError unless flag_value.is_a? Integer
11
+
12
+ @value = flag_value
13
+ end
14
+
15
+ attr_accessor :value
16
+
17
+ # BAM_FPAIRED = 1
18
+ # BAM_FPROPER_PAIR = 2
19
+ # BAM_FUNMAP = 4
20
+ # BAM_FMUNMAP = 8
21
+ # BAM_FREVERSE = 16
22
+ # BAM_FMREVERSE = 32
23
+ # BAM_FREAD1 = 64
24
+ # BAM_FREAD2 = 128
25
+ # BAM_FSECONDARY = 256
26
+ # BAM_FQCFAIL = 512
27
+ # BAM_FDUP = 1024
28
+ # BAM_FSUPPLEMENTARY = 2048
29
+
30
+ # TODO: Enabling bitwise operations
31
+ # hts-nim
32
+ # proc `and`*(f: Flag, o: uint16): uint16 {. borrow, inline .}
33
+ # proc `and`*(f: Flag, o: Flag): uint16 {. borrow, inline .}
34
+ # proc `or`*(f: Flag, o: uint16): uint16 {. borrow .}
35
+ # proc `or`*(o: uint16, f: Flag): uint16 {. borrow .}
36
+ # proc `==`*(f: Flag, o: Flag): bool {. borrow, inline .}
37
+ # proc `==`*(f: Flag, o: uint16): bool {. borrow, inline .}
38
+ # proc `==`*(o: uint16, f: Flag): bool {. borrow, inline .}
39
+
40
+ def paired?
41
+ has_flag? LibHTS::BAM_FPAIRED
42
+ end
43
+
44
+ def proper_pair?
45
+ has_flag? LibHTS::BAM_FPROPER_PAIR
46
+ end
47
+
48
+ def unmapped?
49
+ has_flag? LibHTS::BAM_FUNMAP
50
+ end
51
+
52
+ def mate_unmapped?
53
+ has_flag? LibHTS::BAM_FMUNMAP
54
+ end
55
+
56
+ def reverse?
57
+ has_flag? LibHTS::BAM_FREVERSE
58
+ end
59
+
60
+ def mate_reverse?
61
+ has_flag? LibHTS::BAM_FMREVERSE
62
+ end
63
+
64
+ def read1?
65
+ has_flag? LibHTS::BAM_FREAD1
66
+ end
67
+
68
+ def read2?
69
+ has_flag? LibHTS::BAM_FREAD2
70
+ end
71
+
72
+ def secondary?
73
+ has_flag? LibHTS::BAM_FSECONDARY
74
+ end
75
+
76
+ def qcfail?
77
+ has_flag? LibHTS::BAM_FQCFAIL
78
+ end
79
+
80
+ def dup?
81
+ has_flag? LibHTS::BAM_FDUP
82
+ end
83
+
84
+ def supplementary?
85
+ has_flag? LibHTS::BAM_FSUPPLEMENTARY
86
+ end
87
+
88
+ def has_flag?(o)
89
+ (@value & o) != 0
90
+ end
91
+ end
92
+ end
93
+ end
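
The flag constants listed in the comments above (`BAM_FPAIRED = 1`, `BAM_FREVERSE = 16`, and so on) are bit masks, not bit positions. The short sketch below, which uses local copies of the constants rather than `LibHTS`, shows how Ruby's `Integer#[]` (which takes a bit position) differs from a bitwise AND when testing such masks.

```ruby
# Local copies of two flag masks, taken from the comments in flag.rb.
BAM_FPAIRED  = 1
BAM_FREVERSE = 16

flag = 17 # paired (1) + reverse strand (16)

# Integer#[] takes a bit *position*: flag[16] reads bit 16, which is unset here.
by_index = flag[BAM_FREVERSE] != 0

# A bitwise AND tests the mask itself, which is what a flag check needs.
by_mask = (flag & BAM_FREVERSE) != 0

puts "by_index=#{by_index} by_mask=#{by_mask}" # prints "by_index=false by_mask=true"
```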
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-python
4
+ # https://github.com/quinlan-lab/hts-python
5
+
6
+ module HTS
7
+ class Bam
8
+ class Header
9
+ def initialize(pointer)
10
+ @sam_hdr = pointer
11
+ end
12
+
13
+ def struct
14
+ @sam_hdr
15
+ end
16
+
17
+ def to_ptr
18
+ @sam_hdr.to_ptr
19
+ end
20
+
21
+ # FIXME: better name?
22
+ def seqs
23
+ Array.new(@sam_hdr[:n_targets]) do |i|
24
+ LibHTS.sam_hdr_tid2name(@sam_hdr, i)
25
+ end
26
+ end
27
+
28
+ def text
29
+ LibHTS.sam_hdr_str(@sam_hdr)
30
+ end
31
+ end
32
+ end
33
+ end
@@ -0,0 +1,176 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-python
4
+ # https://github.com/quinlan-lab/hts-python
5
+
6
+ module HTS
7
+ class Bam
8
+ class Record
9
+ SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"
10
+
11
+ def initialize(bam1_t, header)
12
+ @bam1 = bam1_t
13
+ @header = header
14
+ end
15
+
16
+ def struct
17
+ @bam1
18
+ end
19
+
20
+ def to_ptr
21
+ @bam1.to_ptr
22
+ end
23
+
24
+ # def initialize_copy
25
+ # super
26
+ # end
27
+
28
+ def self.rom_sam_str; end
29
+
30
+ def tags; end
31
+
32
+ # returns the query name.
33
+ def qname
34
+ LibHTS.bam_get_qname(@bam1).read_string
35
+ end
36
+
37
+ # Set (query) name.
38
+ # def qname=(name)
39
+ # raise 'Not Implemented'
40
+ # end
41
+
42
+ # returns the tid of the record or -1 if not mapped.
43
+ def tid
44
+ @bam1[:core][:tid]
45
+ end
46
+
47
+ # returns the tid of the mate or -1 if not mapped.
48
+ def mate_tid
49
+ @bam1[:core][:mtid]
50
+ end
51
+
52
+ # returns 0-based start position.
53
+ def start
54
+ @bam1[:core][:pos]
55
+ end
56
+
57
+ # returns end position of the read.
58
+ def stop
59
+ LibHTS.bam_endpos @bam1
60
+ end
61
+
62
+ # returns 0-based mate position
63
+ def mate_start
64
+ @bam1[:core][:mpos]
65
+ end
66
+ alias mate_pos mate_start
67
+
68
+ # returns the chromosome or '' if not mapped.
69
+ def chrom
70
+ tid = @bam1[:core][:tid]
71
+ return "" if tid == -1
72
+
73
+ LibHTS.sam_hdr_tid2name(@header, tid)
74
+ end
75
+
76
+ # returns the chromosome of the mate or '' if not mapped.
77
+ def mate_chrom
78
+ tid = @bam1[:core][:mtid]
79
+ return "" if tid == -1
80
+
81
+ LibHTS.sam_hdr_tid2name(@header, tid)
82
+ end
83
+
84
+ def strand
85
+ LibHTS.bam_is_rev(@bam1) ? "-" : "+"
86
+ end
87
+
88
+ # def start=(v)
89
+ # raise 'Not Implemented'
90
+ # end
91
+
92
+ # insert size
93
+ def isize
94
+ @bam1[:core][:isize]
95
+ end
96
+
97
+ # mapping quality
98
+ def mapping_quality
99
+ @bam1[:core][:qual]
100
+ end
101
+
102
+ # returns a `Cigar` object.
103
+ def cigar
104
+ Cigar.new(LibHTS.bam_get_cigar(@bam1), @bam1[:core][:n_cigar])
105
+ end
106
+
107
+ def qlen
108
+ LibHTS.bam_cigar2qlen(
109
+ @bam1[:core][:n_cigar],
110
+ LibHTS.bam_get_cigar(@bam1)
111
+ )
112
+ end
113
+
114
+ def rlen
115
+ LibHTS.bam_cigar2rlen(
116
+ @bam1[:core][:n_cigar],
117
+ LibHTS.bam_get_cigar(@bam1)
118
+ )
119
+ end
120
+
121
+ # return the read sequence
122
+ def sequence
123
+ r = LibHTS.bam_get_seq(@bam1)
124
+ seq = String.new
125
+ (@bam1[:core][:l_qseq]).times do |i|
126
+ seq << SEQ_NT16_STR[LibHTS.bam_seqi(r, i)]
127
+ end
128
+ seq
129
+ end
130
+
131
+ # returns only the base at index n of the query sequence (negative n counts from the end; "." if out of range).
132
+ def base_at(n)
133
+ n += @bam1[:core][:l_qseq] if n < 0
134
+ return "." if (n >= @bam1[:core][:l_qseq]) || (n < 0) # eg. base_at(-1000)
135
+
136
+ r = LibHTS.bam_get_seq(@bam1)
137
+ SEQ_NT16_STR[LibHTS.bam_seqi(r, n)]
138
+ end
139
+
140
+ # return the base qualities
141
+ def base_qualities
142
+ q_ptr = LibHTS.bam_get_qual(@bam1)
143
+ q_ptr.read_array_of_uint8(@bam1[:core][:l_qseq])
144
+ end
145
+
146
+ # returns only the base quality at index n of the query sequence (negative n counts from the end; 0 if out of range).
147
+ def base_quality_at(n)
148
+ n += @bam1[:core][:l_qseq] if n < 0
149
+ return 0 if (n >= @bam1[:core][:l_qseq]) || (n < 0) # eg. base_quality_at(-1000)
150
+
151
+ q_ptr = LibHTS.bam_get_qual(@bam1)
152
+ q_ptr.get_uint8(n)
153
+ end
154
+
155
+ def flag_str
156
+ LibHTS.bam_flag2str(@bam1[:core][:flag])
157
+ end
158
+
159
+ # returns a `Flag` object.
160
+ def flag
161
+ Flag.new(@bam1[:core][:flag])
162
+ end
163
+
164
+ def to_s
165
+ kstr = LibHTS::KString.new
166
+ raise "Failed to format bam record" if LibHTS.sam_format1(@header.struct, @bam1, kstr) == -1
167
+
168
+ kstr[:s]
169
+ end
170
+
171
+ # TODO:
172
+ # def eql?
173
+ # def hash
174
+ end
175
+ end
176
+ end
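
`Record#sequence` above relies on `LibHTS.bam_seqi` to extract 4-bit base codes and maps them through `SEQ_NT16_STR`. The following pure-Ruby sketch shows the same unpacking on a plain array of bytes, assuming the BAM convention that each byte holds two bases with the earlier base in the high nibble. The method names `seqi` and `decode_sequence` are hypothetical.

```ruby
SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"

# Equivalent of the bam_seqi macro for an array of bytes:
# even indices read the high nibble, odd indices the low nibble.
def seqi(bytes, i)
  (bytes[i >> 1] >> ((~i & 1) << 2)) & 0xf
end

# Decode `length` bases from the packed byte array.
def decode_sequence(bytes, length)
  Array.new(length) { |i| SEQ_NT16_STR[seqi(bytes, i)] }.join
end

# A=1, C=2, G=4, T=8, packed two bases per byte as 0x12, 0x48
puts decode_sequence([0x12, 0x48], 4) # prints "ACGT"
```

An odd-length read simply leaves the low nibble of the last byte unused, so `decode_sequence([0x12, 0x40], 3)` yields `"ACG"`.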
data/lib/hts/bam.rb CHANGED
@@ -1,87 +1,114 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- # Create a skeleton using hts-python as a reference.
3
+ # Based on hts-python
4
4
  # https://github.com/quinlan-lab/hts-python
5
5
 
6
- class BamHeader
7
- def initialize; end
8
-
9
- def seqs; end
10
- end
11
-
12
- class Cigar
13
- def initialize; end
14
-
15
- def to_s; end
16
-
17
- def inspect; end
18
- end
19
-
20
- class Alignment
21
- def initialize; end
22
-
23
- def self.rom_sam_str; end
24
-
25
- def tags; end
26
-
27
- def qname; end
28
-
29
- def qname=; end
30
-
31
- def rnext; end
32
-
33
- def pnext; end
34
-
35
- def rname; end
36
-
37
- def strand; end
38
-
39
- def base_qualities; end
40
-
41
- def pos; end
42
-
43
- def pos=; end
44
-
45
- def isize; end
46
-
47
- def mapping_quality; end
48
-
49
- def cigar; end
50
-
51
- def qlen; end
52
-
53
- def rlen; end
54
-
55
- def seqs; end
56
-
57
- def flag_str; end
58
-
59
- def flag; end
60
-
61
- # def eql?
62
- # def hash
63
-
64
- def inspect; end
65
-
66
- def to_s; end
67
- end
68
-
69
- class Bam
70
- def initialize; end
71
-
72
- def self.header_from_fasta; end
73
-
74
- def inspect; end
75
-
76
- def write; end
77
-
78
- def close; end
79
-
80
- def flush; end
81
-
82
- def to_s; end
83
-
84
- def each; end
85
-
86
- # def call
6
+ require_relative "bam/header"
7
+ require_relative "bam/cigar"
8
+ require_relative "bam/flag"
9
+ require_relative "bam/record"
10
+ require_relative "utils/open_method"
11
+
12
+ module HTS
13
+ class Bam
14
+ include Enumerable
15
+ extend Utils::OpenMethod
16
+
17
+ attr_reader :file_path, :mode, :header
18
+ # HtfFile is FFI::BitStruct
19
+ attr_reader :htf_file
20
+
21
+ class << self
22
+ alias open new
23
+ end
24
+
25
+ def initialize(file_path, mode = "r", create_index: nil)
26
+ file_path = File.expand_path(file_path)
27
+
28
+ unless File.exist?(file_path)
29
+ message = "No such SAM/BAM file - #{file_path}"
30
+ raise message
31
+ end
32
+
33
+ @file_path = file_path
34
+ @mode = mode
35
+ @htf_file = LibHTS.hts_open(file_path, mode)
36
+ @header = Bam::Header.new(LibHTS.sam_hdr_read(htf_file))
37
+
38
+ # FIXME: should be defined here?
39
+ @bam1 = LibHTS.bam_init1
40
+
41
+ # read
42
+ if mode[0] == "r"
43
+ # load index
44
+ @idx = LibHTS.sam_index_load(htf_file, file_path)
45
+ # create index
46
+ if create_index || (@idx.null? && create_index.nil?)
47
+ warn "Create index for #{file_path}"
48
+ LibHTS.sam_index_build(file_path, -1)
49
+ @idx = LibHTS.sam_index_load(htf_file, file_path)
50
+ end
51
+ else
52
+ # FIXME: implement
53
+ raise "not implemented yet."
54
+ end
55
+
56
+ # IO like API
57
+ if block_given?
58
+ begin
59
+ yield self
60
+ ensure
61
+ close
62
+ end
63
+ end
64
+ end
65
+
66
+ def struct
67
+ htf_file
68
+ end
69
+
70
+ def to_ptr
71
+ htf_file.to_ptr
72
+ end
73
+
74
+ def write(alns)
75
+ alns.each do |aln|
76
+ LibHTS.sam_write1(htf_file, header, aln.struct) > 0 || raise
77
+ end
78
+ end
79
+
80
+ # Close the current file.
81
+ def close
82
+ LibHTS.hts_close(htf_file)
83
+ end
84
+
85
+ # Flush the current file.
86
+ def flush
87
+ # LibHTS.bgzf_flush(@htf_file.fp.bgzf)
88
+ end
89
+
90
+ def each(&block)
91
+ # Each does not always start at the beginning of the file.
92
+ # This is the common behavior of IO objects in Ruby.
93
+ # This may change in the future.
94
+ while LibHTS.sam_read1(htf_file, header, @bam1) > 0
95
+ record = Record.new(@bam1, header)
96
+ block.call(record)
97
+ end
98
+ end
99
+
100
+ # query [WIP]
101
+ def query(region)
102
+ qiter = LibHTS.sam_itr_querys(@idx, header, region)
103
+ begin
104
+ slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
105
+ while slen > 0
106
+ yield Record.new(@bam1, header)
107
+ slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
108
+ end
109
+ ensure
110
+ LibHTS.hts_itr_destroy(qiter)
111
+ end
112
+ end
113
+ end
87
114
  end
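
`Bam#initialize` above yields `self` to a block and closes the file in an `ensure` clause, so the handle is released even if the block raises. The sketch below reproduces that pattern with a hypothetical stand-in class (`FakeReader`) that tracks a `closed` state instead of calling htslib. One consequence worth noting: because the block is handled inside `initialize`, `new` still returns the (already closed) object rather than the block's value.

```ruby
# Hypothetical stand-in for a file-backed reader; no htslib involved.
class FakeReader
  attr_reader :closed

  def initialize
    @closed = false
    return unless block_given?

    begin
      yield self
    ensure
      close # runs whether the block finishes normally or raises
    end
  end

  def close
    @closed = true
  end
end

# Normal completion: the reader is closed once the block returns.
reader = FakeReader.new { |r| puts "inside block, closed=#{r.closed}" }
puts "after block, closed=#{reader.closed}"

# Exceptional completion: the reader is still closed.
captured = nil
begin
  FakeReader.new do |r|
    captured = r
    raise "boom"
  end
rescue RuntimeError
  puts "after error, closed=#{captured.closed}"
end
```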