htslib 0.0.0 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8b19cf8cd36bbb9ffeb34a9c92b98adba85716f57d24ec6990cc95c53c8f658d
4
- data.tar.gz: e678bdbe86be9c73c8fb3e378f86236a0a6eaf36ce047368af9263753bdca439
3
+ metadata.gz: c8048806df4c335ea698c8d3ad9e51e12644f0d32667d3db5fac59a004db75bf
4
+ data.tar.gz: 3d18183921f70b7b42dc5657195607c519c7d83c1e0e94b9b7a8f647d96f5504
5
5
  SHA512:
6
- metadata.gz: cf6a74ee14a7f0d3bbff603c1af232d7a106a0c7d70e4c8e4c7ac4270ab98866a971611a793a1b6d994fde2b5d0a400b0e73fe04a99c9b2abfa16d5e0d82a2c5
7
- data.tar.gz: d24f1fa238f3ad8d541f649a03cede9580bcdedad52e81a97f7d33a44a9c3d8fbfcf162b5912fcc9f7e5c832f09979d59cf7a4bc13fc6da7aa4facd0102fdfbb
6
+ metadata.gz: f50e96a23cb75130c5aa463329cce3c2d5784ed75c608d4685d2662ee22fa93e55208c356bce0fee319028c6ae8e33b5b20d270a17fb63ca6e39a736f668a2e4
7
+ data.tar.gz: c223f9b61979ace926fa53b408e85fa6f2e127cdde6ffa5bcfa1ac0ad7ce6c9c1ac5570522021c61e82d27498473299b8875f7d633b24367ae99dd2494758556
data/README.md CHANGED
@@ -1,12 +1,26 @@
1
- # HTSlib
1
+ # ruby-htslib
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/htslib.svg)](https://badge.fury.io/rb/htslib)
4
- ![CI](https://github.com/kojix2/ruby-htslib/workflows/CI/badge.svg?branch=master)
4
+ ![CI](https://github.com/kojix2/ruby-htslib/workflows/CI/badge.svg)
5
5
  [![The MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.txt)
6
+ [![DOI](https://zenodo.org/badge/247078205.svg)](https://zenodo.org/badge/latestdoi/247078205)
7
+ [![Docs Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://rubydoc.info/gems/htslib)
8
+
9
+ :dna: [HTSlib](https://github.com/samtools/htslib) - for Ruby
10
+
11
+ Ruby-htslib provides Ruby bindings to HTSlib, a C library for processing high-throughput sequencing (HTS) data.
12
+ It will provide APIs to read and write file formats such as SAM, BAM, VCF, and BCF.
6
13
 
7
14
  :apple: Feel free to fork it out if you can develop it!
8
15
 
9
- :bowtie: Just a prototype.
16
+ :bowtie: alpha stage.
17
+ ## Requirements
18
+
19
+ * [Ruby](https://github.com/ruby/ruby) 2.7 or above.
20
+ * [HTSlib](https://github.com/samtools/htslib)
21
+ * Ubuntu : `apt install libhts-dev`
22
+ * macOS : `brew install htslib`
23
+ * Build from source code (see Development section)
10
24
 
11
25
  ## Installation
12
26
 
@@ -14,34 +28,99 @@
14
28
  gem install htslib
15
29
  ```
16
30
 
17
- Set environment variable HTSLIBDIR.
31
+ If you have installed htslib with apt on Ubuntu or Homebrew on macOS, [pkg-config](https://github.com/ruby-gnome/pkg-config)
32
+ will automatically detect the location of the shared library.
33
+ Alternatively, you can specify the directory of the shared library by setting the environment variable `HTSLIBDIR`.
18
34
 
19
35
  ```sh
20
- export HTSLIBDIR="/your/path/to/htslib"
36
+ export HTSLIBDIR="/your/path/to/htslib" # libhts.so
21
37
  ```
22
38
 
23
- ## Requirements
39
+ ## Overview
24
40
 
25
- * [htslib](https://github.com/samtools/htslib)
41
+ ### Low level API
26
42
 
27
- ## Usage
43
+ `HTS::LibHTS` provides native functions.
28
44
 
29
45
  ```ruby
30
46
  require 'htslib'
31
47
 
32
- a = HTS::FFI.hts_open("a.bam", "r")
33
- b = HTS::FFI.hts_get_format(a)
48
+ a = HTS::LibHTS.hts_open("a.bam", "r")
49
+ b = HTS::LibHTS.hts_get_format(a)
34
50
  p b[:category]
35
51
  p b[:format]
36
52
  ```
37
53
 
54
+ Note: ruby-htslib does not use `FFI::ManagedStruct`, so memory allocated by HTSlib is not freed automatically. You may need to free it yourself.
55
+
56
+ ### High level API (Plan)
57
+
58
+ `Cram` `Bam` `Bcf` `Faidx` `Tabix`
59
+
60
+ A high-level API is under development. Expect it to change as it is improved.
61
+
62
+ ```ruby
63
+ require 'htslib'
64
+
65
+ bam = HTS::Bam.new("a.bam")
66
+
67
+ bam.each do |r|
68
+ p name: r.qname,
69
+ flag: r.flag,
70
+ start: r.start + 1,
71
+ mpos: r.mate_pos + 1,
72
+ mqual: r.mapping_quality,
73
+ seq: r.sequence,
74
+ cigar: r.cigar.to_s,
75
+ qual: r.base_qualities.map { |i| (i + 33).chr }.join
76
+ end
77
+ ```
78
+
79
+ ## Documentation
80
+
81
+ * [RubyDoc.info - HTSlib](https://rdoc.info/gems/htslib)
82
+
38
83
  ## Development
39
84
 
85
+ To get started with development:
86
+
87
+ ```sh
88
+ git clone --recursive https://github.com/kojix2/ruby-htslib
89
+ cd ruby-htslib
90
+ bundle install
91
+ bundle exec rake htslib:build
92
+ bundle exec rake test
93
+ ```
94
+
95
+ We plan to actively use the new features of Ruby. Since the number of users is small, backward compatibility is not important.
96
+ On the other hand, we will consider compatibility with [Crystal](https://github.com/bio-crystal/htslib.cr) to some extent.
97
+
98
+ #### FFI Extensions
99
+
100
+ * [ffi-bitfield](https://github.com/kojix2/ffi-bitfield) : Extension of Ruby-FFI to support bitfields.
101
+
102
+ #### Automatic generation or automatic validation (Future plan)
103
+
104
+ * [c2ffi](https://github.com/rpav/c2ffi) is a tool that creates JSON metadata from C header files. We plan to use c2ffi to generate bindings or tests automatically.
105
+
40
106
  ## Contributing
41
107
 
42
- Bug reports and pull requests are welcome on GitHub at https://github.com/kojix2/ruby-htslib.
108
+ Ruby-htslib is a library under development, so even small improvements such as typo fixes are welcome! Please feel free to send us your pull requests.
109
+
110
+ * [Report bugs](https://github.com/kojix2/ruby-htslib/issues)
111
+ * Fix bugs and [submit pull requests](https://github.com/kojix2/ruby-htslib/pulls)
112
+ * Write, clarify, or fix documentation
113
+ * Suggest or add new features
114
+ * Make [financial contributions](https://github.com/sponsors/kojix2)
115
+
116
+ ## Links
117
+
118
+ * [samtools/hts-specs](https://github.com/samtools/hts-specs)
119
+ * [bioruby](https://github.com/bioruby/bioruby)
43
120
 
121
+ ## Funding support
44
122
 
123
+ This work was supported partially by [Ruby Association Grant 2020](https://www.ruby.or.jp/en/news/20201022).
45
124
  ## License
46
125
 
47
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
126
+ [MIT License](https://opensource.org/licenses/MIT).
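The high-level README example above prints 1-based positions and a quality string. A minimal standalone sketch of those two conversions follows, assuming only what the README shows (a 0-based `start` and integer Phred scores from `base_qualities`). The helper names `one_based` and `phred33` are hypothetical and not part of ruby-htslib.

```ruby
# Hypothetical helpers, not part of ruby-htslib.

# Convert a 0-based coordinate (as returned by Record#start in the diff)
# to the 1-based coordinate used in SAM text.
def one_based(pos)
  pos + 1
end

# Encode integer Phred scores as a Phred+33 ASCII string,
# mirroring `(i + 33).chr` in the README example.
def phred33(quals)
  quals.map { |q| (q + 33).chr }.join
end

puts one_based(99)         # 100
puts phred33([0, 30, 40])  # !?I
```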
data/lib/hts/bam/cigar.rb ADDED
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-python
4
+ # https://github.com/quinlan-lab/hts-python
5
+
6
+ module HTS
7
+ class Bam
8
+ class Cigar
9
+ include Enumerable
10
+
11
+ def initialize(pointer, n_cigar)
12
+ @pointer = pointer
13
+ @n_cigar = n_cigar
14
+ end
15
+
16
+ def to_ptr
17
+ @pointer
18
+ end
19
+
20
+ def to_s
21
+ to_a.flatten.join
22
+ end
23
+
24
+ def each
25
+ @n_cigar.times do |i|
26
+ c = @pointer[i].read_uint32
27
+ yield [LibHTS.bam_cigar_oplen(c),
28
+ LibHTS.bam_cigar_opchr(c)]
29
+ end
30
+ end
31
+ end
32
+ end
33
+ end
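`Cigar#each` above reads one packed `uint32` per operation and asks HTSlib for its length and operation character. As a rough illustration of what that decoding involves, the sketch below does the same in pure Ruby, assuming the BAM encoding from the SAM specification (length in the upper 28 bits, operation code in the lower 4 bits, operation characters `MIDNSHP=XB`). `decode_cigar` is a hypothetical helper and not part of ruby-htslib.

```ruby
# Operation characters indexed by the 4-bit operation code (SAM specification).
BAM_CIGAR_STR = "MIDNSHP=XB"

# Hypothetical helper: turn packed uint32 values into [length, op_char] pairs.
def decode_cigar(packed)
  packed.map do |c|
    [c >> 4, BAM_CIGAR_STR[c & 0xf]]
  end
end

# 76M2S: 76 matches (op code 0) followed by 2 soft-clipped bases (op code 4).
packed = [(76 << 4) | 0, (2 << 4) | 4]

pairs = decode_cigar(packed)
puts pairs.flatten.join  # 76M2S
```

Joining the flattened pairs is the same approach `Cigar#to_s` takes in the diff.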
data/lib/hts/bam/flag.rb ADDED
@@ -0,0 +1,93 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-nim
4
+ # https://github.com/brentp/hts-nim/blob/master/src/hts/bam/flag.nim
5
+
6
+ module HTS
7
+ class Bam
8
+ class Flag
9
+ def initialize(flag_value)
10
+ raise TypeError unless flag_value.is_a? Integer
11
+
12
+ @value = flag_value
13
+ end
14
+
15
+ attr_accessor :value
16
+
17
+ # BAM_FPAIRED = 1
18
+ # BAM_FPROPER_PAIR = 2
19
+ # BAM_FUNMAP = 4
20
+ # BAM_FMUNMAP = 8
21
+ # BAM_FREVERSE = 16
22
+ # BAM_FMREVERSE = 32
23
+ # BAM_FREAD1 = 64
24
+ # BAM_FREAD2 = 128
25
+ # BAM_FSECONDARY = 256
26
+ # BAM_FQCFAIL = 512
27
+ # BAM_FDUP = 1024
28
+ # BAM_FSUPPLEMENTARY = 2048
29
+
30
+ # TODO: Enabling bitwise operations
31
+ # hts-nim
32
+ # proc `and`*(f: Flag, o: uint16): uint16 {. borrow, inline .}
33
+ # proc `and`*(f: Flag, o: Flag): uint16 {. borrow, inline .}
34
+ # proc `or`*(f: Flag, o: uint16): uint16 {. borrow .}
35
+ # proc `or`*(o: uint16, f: Flag): uint16 {. borrow .}
36
+ # proc `==`*(f: Flag, o: Flag): bool {. borrow, inline .}
37
+ # proc `==`*(f: Flag, o: uint16): bool {. borrow, inline .}
38
+ # proc `==`*(o: uint16, f: Flag): bool {. borrow, inline .}
39
+
40
+ def paired?
41
+ has_flag? LibHTS::BAM_FPAIRED
42
+ end
43
+
44
+ def proper_pair?
45
+ has_flag? LibHTS::BAM_FPROPER_PAIR
46
+ end
47
+
48
+ def unmapped?
49
+ has_flag? LibHTS::BAM_FUNMAP
50
+ end
51
+
52
+ def mate_unmapped?
53
+ has_flag? LibHTS::BAM_FMUNMAP
54
+ end
55
+
56
+ def reverse?
57
+ has_flag? LibHTS::BAM_FREVERSE
58
+ end
59
+
60
+ def mate_reverse?
61
+ has_flag? LibHTS::BAM_FMREVERSE
62
+ end
63
+
64
+ def read1?
65
+ has_flag? LibHTS::BAM_FREAD1
66
+ end
67
+
68
+ def read2?
69
+ has_flag? LibHTS::BAM_FREAD2
70
+ end
71
+
72
+ def secondary?
73
+ has_flag? LibHTS::BAM_FSECONDARY
74
+ end
75
+
76
+ def qcfail?
77
+ has_flag? LibHTS::BAM_FQCFAIL
78
+ end
79
+
80
+ def dup?
81
+ has_flag? LibHTS::BAM_FDUP
82
+ end
83
+
84
+ def supplementary?
85
+ has_flag? LibHTS::BAM_FSUPPLEMENTARY
86
+ end
87
+
88
+ def has_flag?(o)
89
+ (@value & o) != 0
90
+ end
91
+ end
92
+ end
93
+ end
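Each predicate in `Flag` checks one SAM flag bit. A minimal pure-Ruby sketch of a mask test follows, using two of the mask values listed in the comments above. `flag_set?` is a hypothetical helper, not part of ruby-htslib. The sketch also shows that `Integer#[]` takes a bit position rather than a mask, so the two are not interchangeable.

```ruby
# Two flag masks from the SAM specification.
BAM_FPAIRED  = 1
BAM_FREVERSE = 16

# Hypothetical helper: true when any bit of `mask` is set in `value`.
def flag_set?(value, mask)
  (value & mask) != 0
end

flag = 17 # paired (1) + reverse strand (16)

puts flag_set?(flag, BAM_FPAIRED)   # true
puts flag_set?(flag, BAM_FREVERSE)  # true

# Integer#[] takes a bit position, not a mask:
# 17 is 0b10001, so bit 1 and bit 16 are both 0.
puts flag[BAM_FPAIRED]   # 0
puts flag[BAM_FREVERSE]  # 0
```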
data/lib/hts/bam/header.rb ADDED
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-python
4
+ # https://github.com/quinlan-lab/hts-python
5
+
6
+ module HTS
7
+ class Bam
8
+ class Header
9
+ def initialize(pointer)
10
+ @sam_hdr = pointer
11
+ end
12
+
13
+ def struct
14
+ @sam_hdr
15
+ end
16
+
17
+ def to_ptr
18
+ @sam_hdr.to_ptr
19
+ end
20
+
21
+ # FIXME: better name?
22
+ def seqs
23
+ Array.new(@sam_hdr[:n_targets]) do |i|
24
+ LibHTS.sam_hdr_tid2name(@sam_hdr, i)
25
+ end
26
+ end
27
+
28
+ def text
29
+ LibHTS.sam_hdr_str(@sam_hdr)
30
+ end
31
+ end
32
+ end
33
+ end
data/lib/hts/bam/record.rb ADDED
@@ -0,0 +1,176 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Based on hts-python
4
+ # https://github.com/quinlan-lab/hts-python
5
+
6
+ module HTS
7
+ class Bam
8
+ class Record
9
+ SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"
10
+
11
+ def initialize(bam1_t, header)
12
+ @bam1 = bam1_t
13
+ @header = header
14
+ end
15
+
16
+ def struct
17
+ @bam1
18
+ end
19
+
20
+ def to_ptr
21
+ @bam1.to_ptr
22
+ end
23
+
24
+ # def initialize_copy
25
+ # super
26
+ # end
27
+
28
+ def self.rom_sam_str; end
29
+
30
+ def tags; end
31
+
32
+ # returns the query name.
33
+ def qname
34
+ LibHTS.bam_get_qname(@bam1).read_string
35
+ end
36
+
37
+ # Set (query) name.
38
+ # def qname=(name)
39
+ # raise 'Not Implemented'
40
+ # end
41
+
42
+ # returns the tid of the record or -1 if not mapped.
43
+ def tid
44
+ @bam1[:core][:tid]
45
+ end
46
+
47
+ # returns the tid of the mate or -1 if not mapped.
48
+ def mate_tid
49
+ @bam1[:core][:mtid]
50
+ end
51
+
52
+ # returns 0-based start position.
53
+ def start
54
+ @bam1[:core][:pos]
55
+ end
56
+
57
+ # returns end position of the read.
58
+ def stop
59
+ LibHTS.bam_endpos @bam1
60
+ end
61
+
62
+ # returns 0-based mate position
63
+ def mate_start
64
+ @bam1[:core][:mpos]
65
+ end
66
+ alias mate_pos mate_start
67
+
68
+ # returns the chromosome or '' if not mapped.
69
+ def chrom
70
+ tid = @bam1[:core][:tid]
71
+ return "" if tid == -1
72
+
73
+ LibHTS.sam_hdr_tid2name(@header, tid)
74
+ end
75
+
76
+ # returns the chromosome of the mate or '' if not mapped.
77
+ def mate_chrom
78
+ tid = @bam1[:core][:mtid]
79
+ return "" if tid == -1
80
+
81
+ LibHTS.sam_hdr_tid2name(@header, tid)
82
+ end
83
+
84
+ def strand
85
+ LibHTS.bam_is_rev(@bam1) ? "-" : "+"
86
+ end
87
+
88
+ # def start=(v)
89
+ # raise 'Not Implemented'
90
+ # end
91
+
92
+ # insert size
93
+ def isize
94
+ @bam1[:core][:isize]
95
+ end
96
+
97
+ # mapping quality
98
+ def mapping_quality
99
+ @bam1[:core][:qual]
100
+ end
101
+
102
+ # returns a `Cigar` object.
103
+ def cigar
104
+ Cigar.new(LibHTS.bam_get_cigar(@bam1), @bam1[:core][:n_cigar])
105
+ end
106
+
107
+ def qlen
108
+ LibHTS.bam_cigar2qlen(
109
+ @bam1[:core][:n_cigar],
110
+ LibHTS.bam_get_cigar(@bam1)
111
+ )
112
+ end
113
+
114
+ def rlen
115
+ LibHTS.bam_cigar2rlen(
116
+ @bam1[:core][:n_cigar],
117
+ LibHTS.bam_get_cigar(@bam1)
118
+ )
119
+ end
120
+
121
+ # return the read sequence
122
+ def sequence
123
+ r = LibHTS.bam_get_seq(@bam1)
124
+ seq = String.new
125
+ (@bam1[:core][:l_qseq]).times do |i|
126
+ seq << SEQ_NT16_STR[LibHTS.bam_seqi(r, i)]
127
+ end
128
+ seq
129
+ end
130
+
131
+ # return only the base at the requested index "n" of the query sequence.
132
+ def base_at(n)
133
+ n += @bam1[:core][:l_qseq] if n < 0
134
+ return "." if (n >= @bam1[:core][:l_qseq]) || (n < 0) # eg. base_at(-1000)
135
+
136
+ r = LibHTS.bam_get_seq(@bam1)
137
+ SEQ_NT16_STR[LibHTS.bam_seqi(r, n)]
138
+ end
139
+
140
+ # return the base qualities
141
+ def base_qualities
142
+ q_ptr = LibHTS.bam_get_qual(@bam1)
143
+ q_ptr.read_array_of_uint8(@bam1[:core][:l_qseq])
144
+ end
145
+
146
+ # return only the base quality at the requested index "n" of the query sequence.
147
+ def base_quality_at(n)
148
+ n += @bam1[:core][:l_qseq] if n < 0
149
+ return 0 if (n >= @bam1[:core][:l_qseq]) || (n < 0) # eg. base_quality_at(-1000)
150
+
151
+ q_ptr = LibHTS.bam_get_qual(@bam1)
152
+ q_ptr.get_uint8(n)
153
+ end
154
+
155
+ def flag_str
156
+ LibHTS.bam_flag2str(@bam1[:core][:flag])
157
+ end
158
+
159
+ # returns a `Flag` object.
160
+ def flag
161
+ Flag.new(@bam1[:core][:flag])
162
+ end
163
+
164
+ def to_s
165
+ kstr = LibHTS::KString.new
166
+ raise "Failed to format bam record" if LibHTS.sam_format1(@header.struct, @bam1, kstr) == -1
167
+
168
+ kstr[:s]
169
+ end
170
+
171
+ # TODO:
172
+ # def eql?
173
+ # def hash
174
+ end
175
+ end
176
+ end
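`Record#sequence` above looks up each base in `SEQ_NT16_STR` using the 4-bit code returned by `bam_seqi`. The sketch below reproduces that decoding in pure Ruby, assuming the BAM layout from the SAM specification: two bases per byte, with the first base in the high nibble. `seqi` and `decode_seq` are hypothetical helpers, not part of ruby-htslib.

```ruby
# 4-bit base codes, same table as Record::SEQ_NT16_STR in the diff.
SEQ_NT16_STR = "=ACMGRSVTWYHKDBN"

# Hypothetical equivalent of HTSlib's bam_seqi macro:
# even indexes read the high nibble, odd indexes the low nibble.
def seqi(bytes, i)
  (bytes[i >> 1] >> ((~i & 1) << 2)) & 0xf
end

# Hypothetical helper: decode `len` bases from an array of packed bytes.
def decode_seq(bytes, len)
  Array.new(len) { |i| SEQ_NT16_STR[seqi(bytes, i)] }.join
end

# A = 1, C = 2, G = 4, T = 8, so "ACGT" packs into 0x12 0x48.
bytes = [0x12, 0x48]

puts decode_seq(bytes, 4)  # ACGT
puts decode_seq(bytes, 3)  # ACG
```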
data/lib/hts/bam.rb CHANGED
@@ -1,87 +1,114 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- # Create a skeleton using hts-python as a reference.
3
+ # Based on hts-python
4
4
  # https://github.com/quinlan-lab/hts-python
5
5
 
6
- class BamHeader
7
- def initialize; end
8
-
9
- def seqs; end
10
- end
11
-
12
- class Cigar
13
- def initialize; end
14
-
15
- def to_s; end
16
-
17
- def inspect; end
18
- end
19
-
20
- class Alignment
21
- def initialize; end
22
-
23
- def self.rom_sam_str; end
24
-
25
- def tags; end
26
-
27
- def qname; end
28
-
29
- def qname=; end
30
-
31
- def rnext; end
32
-
33
- def pnext; end
34
-
35
- def rname; end
36
-
37
- def strand; end
38
-
39
- def base_qualities; end
40
-
41
- def pos; end
42
-
43
- def pos=; end
44
-
45
- def isize; end
46
-
47
- def mapping_quality; end
48
-
49
- def cigar; end
50
-
51
- def qlen; end
52
-
53
- def rlen; end
54
-
55
- def seqs; end
56
-
57
- def flag_str; end
58
-
59
- def flag; end
60
-
61
- # def eql?
62
- # def hash
63
-
64
- def inspect; end
65
-
66
- def to_s; end
67
- end
68
-
69
- class Bam
70
- def initialize; end
71
-
72
- def self.header_from_fasta; end
73
-
74
- def inspect; end
75
-
76
- def write; end
77
-
78
- def close; end
79
-
80
- def flush; end
81
-
82
- def to_s; end
83
-
84
- def each; end
85
-
86
- # def call
6
+ require_relative "bam/header"
7
+ require_relative "bam/cigar"
8
+ require_relative "bam/flag"
9
+ require_relative "bam/record"
10
+ require_relative "utils/open_method"
11
+
12
+ module HTS
13
+ class Bam
14
+ include Enumerable
15
+ extend Utils::OpenMethod
16
+
17
+ attr_reader :file_path, :mode, :header
18
+ # HtsFile is an FFI::BitStruct
19
+ attr_reader :htf_file
20
+
21
+ class << self
22
+ alias open new
23
+ end
24
+
25
+ def initialize(file_path, mode = "r", create_index: nil)
26
+ file_path = File.expand_path(file_path)
27
+
28
+ unless File.exist?(file_path)
29
+ message = "No such SAM/BAM file - #{file_path}"
30
+ raise message
31
+ end
32
+
33
+ @file_path = file_path
34
+ @mode = mode
35
+ @htf_file = LibHTS.hts_open(file_path, mode)
36
+ @header = Bam::Header.new(LibHTS.sam_hdr_read(htf_file))
37
+
38
+ # FIXME: should be defined here?
39
+ @bam1 = LibHTS.bam_init1
40
+
41
+ # read
42
+ if mode[0] == "r"
43
+ # load index
44
+ @idx = LibHTS.sam_index_load(htf_file, file_path)
45
+ # create index
46
+ if create_index || (@idx.null? && create_index.nil?)
47
+ warn "Create index for #{file_path}"
48
+ LibHTS.sam_index_build(file_path, -1)
49
+ @idx = LibHTS.sam_index_load(htf_file, file_path)
50
+ end
51
+ else
52
+ # FIXME: implement
53
+ raise "not implemented yet."
54
+ end
55
+
56
+ # IO like API
57
+ if block_given?
58
+ begin
59
+ yield self
60
+ ensure
61
+ close
62
+ end
63
+ end
64
+ end
65
+
66
+ def struct
67
+ htf_file
68
+ end
69
+
70
+ def to_ptr
71
+ htf_file.to_ptr
72
+ end
73
+
74
+ def write(alns)
75
+ alns.each do |aln|
76
+ LibHTS.sam_write1(htf_file, header, aln.struct) > 0 || raise
77
+ end
78
+ end
79
+
80
+ # Close the current file.
81
+ def close
82
+ LibHTS.hts_close(htf_file)
83
+ end
84
+
85
+ # Flush the current file.
86
+ def flush
87
+ # LibHTS.bgzf_flush(@htf_file.fp.bgzf)
88
+ end
89
+
90
+ def each(&block)
91
+ # Each does not always start at the beginning of the file.
92
+ # This is the common behavior of IO objects in Ruby.
93
+ # This may change in the future.
94
+ while LibHTS.sam_read1(htf_file, header, @bam1) > 0
95
+ record = Record.new(@bam1, header)
96
+ block.call(record)
97
+ end
98
+ end
99
+
100
+ # query [WIP]
101
+ def query(region)
102
+ qiter = LibHTS.sam_itr_querys(@idx, header, region)
103
+ begin
104
+ slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
105
+ while slen > 0
106
+ yield Record.new(@bam1, header)
107
+ slen = LibHTS.sam_itr_next(htf_file, qiter, @bam1)
108
+ end
109
+ ensure
110
+ LibHTS.hts_itr_destroy(qiter)
111
+ end
112
+ end
113
+ end
87
114
  end
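`Bam.new` above accepts a block and closes the file in an `ensure` clause. Because `open` is an alias of `new`, the call returns the (already closed) object rather than the block's value, unlike `File.open`. One possible alternative, sketched here with a hypothetical `Resource` class standing in for a file-backed object, defines `open` as a class method. This is an illustration of the pattern, not the gem's implementation.

```ruby
# Hypothetical stand-in for a file-backed object such as HTS::Bam.
class Resource
  attr_reader :name

  def initialize(name)
    @name = name
    @closed = false
  end

  def closed?
    @closed
  end

  def close
    @closed = true
  end

  # Without a block, return the open object.
  # With a block, yield it, always close it, and return the block's value.
  def self.open(*args)
    resource = new(*args)
    return resource unless block_given?

    begin
      yield resource
    ensure
      resource.close
    end
  end
end

handle = nil
result = Resource.open("a.bam") do |r|
  handle = r
  r.name.upcase
end

puts result          # A.BAM
puts handle.closed?  # true
```

Keeping block handling out of `initialize` also means `new` never closes the object behind the caller's back.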