htslib 0.2.5 → 0.2.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2a49c4758453c5c39915d9d17ccc33ce104699fca39c4083d4b2801767b655f1
- data.tar.gz: 46ee49f6b452471e26afedfce70489cd24eebf86b1a18240bab125d465921fb8
+ metadata.gz: caaf2dd527c9570e2e4c72b22e5252a1ef69adbf983fbfe3e815c3a9bc1b91c0
+ data.tar.gz: b7e0b6ecf736142ea9b43e4ed7f80513b1c0d002ea40b2565e218be3a40d2fac
  SHA512:
- metadata.gz: 918c0919750eb3d436cf19b1011ffed0a8d0760b29e0a3c2e675a9efa563a3f5c3b9a9ae38232e67192df4d066c446bbc0443f00d07cab9ac0c3401b38bd3fb7
- data.tar.gz: 343fbaec89a2e60a54feadd29856b26b772baa65ef52d181cd5845ed53a8e549504ddcdcc739dff716f9069a5b0b3e6f22f5136c2a612cc6014eb79ee225ab2b
+ metadata.gz: 6fe725489c93915fc5afa5d58dd2a5a0030b82546f473b6cae4245898b01f60919494d64b31f3442c3ea316a25fb31c0b46cfdd447230c253c21818ca3da9fb8
+ data.tar.gz: f5bed585f7bded73022ea0ad9a1dc2eb6a10c13e63772cac267a3c23497b521d82d5ab82763e512b452d47f59be8c47dcf5ff2046c96dcc492604931799cf2b2
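
A .gem file is a plain tar archive whose members include `metadata.gz` and `data.tar.gz`, and `checksums.yaml` records the SHA256 and SHA512 digests of those two members. A minimal sketch for checking one of the digests above with Ruby's standard `digest` library, assuming the archive has already been unpacked (for example with `tar -xf htslib-0.2.8.gem`) into the current directory; the helper name `checksum_matches?` is hypothetical:

```ruby
require "digest"

# Digest classes from Ruby's standard library, keyed by the names used in checksums.yaml.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Returns true if the file at +path+ hashes to +expected_hex+ under +algorithm+.
def checksum_matches?(path, expected_hex, algorithm: "SHA256")
  klass = DIGESTS.fetch(algorithm) # raises KeyError for unsupported names
  klass.file(path).hexdigest == expected_hex.downcase
end

# Hypothetical usage, assuming metadata.gz was extracted from htslib-0.2.8.gem:
# checksum_matches?("metadata.gz",
#                   "caaf2dd527c9570e2e4c72b22e5252a1ef69adbf983fbfe3e815c3a9bc1b91c0")
```

`Digest::SHA256.file` reads the file in chunks, so the same helper works for the larger `data.tar.gz` without loading it into memory at once.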
data/README.md CHANGED
@@ -8,16 +8,16 @@

  Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https://github.com/samtools/htslib), a C library for high-throughput sequencing data formats. It allows you to read and write file formats commonly used in genomics, such as [SAM, BAM, VCF, and BCF](http://samtools.github.io/hts-specs/), in the Ruby language.

- :apple: Feel free to fork it!
+ :apple: Feel free to fork it!

  ## Requirements

- * [Ruby](https://github.com/ruby/ruby) 3.1 or above.
- * [HTSlib](https://github.com/samtools/htslib)
-   * Ubuntu : `apt install libhts-dev`
-   * macOS : `brew install htslib`
-   * Windows : [mingw-w64-htslib](https://packages.msys2.org/base/mingw-w64-htslib) is automatically fetched when installing the gem ([RubyInstaller](https://rubyinstaller.org) only).
-   * Build from source code (see Development section)
+ - [Ruby](https://github.com/ruby/ruby) 3.1 or above.
+ - [HTSlib](https://github.com/samtools/htslib)
+   - Ubuntu : `apt install libhts-dev`
+   - macOS : `brew install htslib`
+   - Windows : [mingw-w64-htslib](https://packages.msys2.org/base/mingw-w64-htslib) is automatically fetched when installing the gem ([RubyInstaller](https://rubyinstaller.org) only).
+   - Build from source code (see the Development section)

  ## Installation

@@ -25,21 +25,21 @@ Ruby-htslib is the [Ruby](https://www.ruby-lang.org) bindings to [HTSlib](https:
  gem install htslib
  ```

- If you have installed htslib with apt on Ubuntu or homebrew on Mac, [pkg-config](https://github.com/ruby-gnome/pkg-config)
+ If you have installed htslib with apt on Ubuntu or homebrew on Mac, [pkg-config](https://github.com/ruby-gnome/pkg-config)
  will automatically detect the location of the shared library. If pkg-config does not work well, set `PKG_CONFIG_PATH`.
  Alternatively, you can specify the directory of the shared library by setting the environment variable `HTSLIBDIR`.

  ```sh
- export HTSLIBDIR="/your/path/to/htslib" # libhts.so
+ export HTSLIBDIR="/your/path/to/htslib" # Directory where libhts.so is located
  ```

- ruby-htslib also works on Windows; if you use RubyInstaller, htslib will be prepared automatically.
+ ruby-htslib also works on Windows. If you use RubyInstaller, htslib will be prepared automatically.

- ## Overview
+ ## Usage

- ### High-level API
+ ### HTS::Bam - SAM / BAM / CRAM - Sequence Alignment Map file

- HTS::Bam - SAM / BAM / CRAM - Sequence Alignment Map file
+ Reading fields

  ```ruby
  require 'htslib'
@@ -64,10 +64,24 @@ end
  bam.close
  ```

- HTS::Bcf - VCF / BCF - Variant Call Format file
+ With a block

  ```ruby
- bcf = HTS::Bcf.open("b.bcf")
+ HTS::Bam.open("test/fixtures/moo.bam") do |bam|
+   bam.each do |r|
+     puts r.to_s
+   end
+ end
+ ```
+
+ ### HTS::Bcf - VCF / BCF - Variant Call Format file
+
+ Reading fields
+
+ ```ruby
+ require 'htslib'
+
+ bcf = HTS::Bcf.open("test/fixtures/test.bcf")

  bcf.each do |r|
    p chrom: r.chrom,
@@ -84,31 +98,38 @@ end
  bcf.close
  ```

- <details>
- <summary><b>Faidx</b></summary>
+ With a block

  ```ruby
- fa = HTS::Faidx.open("c.fa")
+ HTS::Bcf.open("test/fixtures/test.bcf") do |bcf|
+   bcf.each do |r|
+     puts r.to_s
+   end
+ end
+ ```

- fa.fetch("chr1:1-10")
+ ### HTS::Faidx - FASTA / FASTQ - Nucleic acid sequence

+ ```ruby
+ fa = HTS::Faidx.open("test/fixtures/moo.fa")
+ fa.seq("chr1:1-10") # => CGCAACCCGA # 1-based
  fa.close
  ```

- </details>
-
- <details>
- <summary><b>Tbx</b></summary>
+ ### HTS::Tabix - GFF / BED - TAB-delimited genome position file

  ```ruby
-
+ tb = HTS::Tabix.open("test/fixtures/test.vcf.gz")
+ tb.query("poo", 2000, 3000) do |line|
+   puts line.join("\t")
+ end
+ tb.close
  ```

- </details>
-
  ### Low-level API

- `HTS::LibHTS` provides native C functions.
+ Middle architectural layer between high-level Ruby code and low-level C code.
+ `HTS::LibHTS` provides native C functions using [Ruby-FFI](https://github.com/ffi/ffi).

  ```ruby
  require 'htslib'
@@ -119,25 +140,31 @@ p b[:category]
  p b[:format]
  ```

- Macro functions
+ The low-level API makes it possible to perform detailed operations, such as calling CRAM-specific functions.

- htslib has a lot of macro functions for speed. Ruby-FFI cannot call C macro functions. However, essential functions are reimplemented in Ruby, and you can call them.
+ #### Macro functions

- Structs
+ HTSlib is designed to improve performance with many macro functions. However, it is not possible to call C macro functions directly from Ruby-FFI. To overcome this, important macro functions have been re-implemented in Ruby, allowing them to be called in the same way as native functions.

- Only a small number of C structs are implemented with FFI's ManagedStruct, which frees memory when Ruby's garbage collection fires. Other structs will need to be freed manually.
+ #### Garbage Collection and Memory Freeing
+
+ A small number of commonly used structs, such as `Bam1` and `Bcf1`, are implemented using FFI's `ManagedStruct`. This allows for automatic memory release when Ruby's garbage collection is triggered. On the other hand, other structs are implemented using `FFI::Struct`, and they will require manual memory release.

  ### Need more speed?

- Try Crystal. [HTS.cr](https://github.com/bio-cr/hts.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib. Crystal language is not as flexible as Ruby language. You can not use `eval` methods and must always be careful with the types. Writing one-time scripts in Crystal may be less fun. However, if you have a clear idea of what you want to do in your mind, have already written code in Ruby, and need to run them over and over, try creating a command line tool in Crystal. The Crystal language is as fast as the Rust and C languages. It will give you incredible power to make tools.
+ Try Crystal. [HTS.cr](https://github.com/bio-cr/hts.cr) is implemented in Crystal language and provides an API compatible with ruby-htslib.

  ## Documentation

- * [API Documentation (develop branch)](https://kojix2.github.io/ruby-htslib/)
- * [RubyDoc.info - HTSlib](https://rdoc.info/gems/htslib)
+ - [TUTORIAL.md](TUTORIAL.md)
+ - [API Documentation (develop branch)](https://kojix2.github.io/ruby-htslib/)
+ - [API Documentation (released gem)](https://rubydoc.info/gems/htslib)

  ## Development

+ #### Compile from source code
+
+ [GNU Autotools](https://en.wikipedia.org/wiki/GNU_Autotools) is required to compile htslib.
  To get started with development:

  ```sh
@@ -148,41 +175,60 @@ bundle exec rake htslib:build
  bundle exec rake test
  ```

- [GNU Autotools](https://en.wikipedia.org/wiki/GNU_Autotools) is required to compile htslib.
+ #### Macro functions are reimplemented

  HTSlib has many macro functions. These macro functions cannot be called from FFI and must be reimplemented in Ruby.

- * Use the new version of Ruby to take full advantage of Ruby's potential.
- * This is possible because we have a small number of users.
- * Remain compatible with [HTS.cr](https://github.com/bio-cr/hts.cr).
- * The most challenging part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact Crystal allows you to do that. But the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.
- * Ruby and Crystal are languages that use garbage collection. However, the memory release policy for allocated C structures is slightly different: in Ruby-FFI, you can define a `self.release` method in `FFI::Struct`. This method is called when GC. So you don't have to worry about memory in high-level APIs like Bam::Record or Bcf::Record, etc. Crystal requires you to define a finalize method on each class. So you need to define it in Bam::Record or Bcf::Record.
+ #### Use the latest Ruby
+
+ Use Ruby 3 or newer to take advantage of new features. This is possible because we have a small number of users.
+
+ #### Keep compatibility with Crystal language
+
+ Compatibility with Crystal language is important for Ruby-htslib development.

- Method naming generally follows the Rust-htslib API.
+ - [HTS.cr](https://github.com/bio-cr/hts.cr) - HTSlib bindings for Crystal

- #### FFI Extensions
+ Return value

- * [ffi-bitfield](https://github.com/kojix2/ffi-bitfield) : Extension of Ruby-FFI to support bitfields.
+ The most challenging part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact, Crystal does allow this, but the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.
+
+ Memory management
+
+ Ruby and Crystal are languages that use garbage collection. However, the memory release policy for allocated C structures is slightly different. In Ruby-FFI, you can define a `self.release` method in a subclass of `FFI::ManagedStruct`, and it is called when the object is garbage-collected. So you don't have to worry about memory in high-level APIs like Bam::Record or Bcf::Record, etc. Crystal instead requires you to define a `finalize` method on each class, so it must be defined in Bam::Record, Bcf::Record, and so on.
+
+ Macro functions
+
+ In ruby-htslib, C macro functions are added to `LibHTS`. In Crystal, however, `LibHTS` is declared as a `lib`, which cannot contain ordinary methods, so these methods are added to `LibHTS2` instead.
+
+ #### Naming convention
+
+ If you are not sure about the naming of a method, follow the Rust-htslib API. This is a very weak rule: if a more appropriate name is found later in Ruby, it will replace the Rust-htslib name.
+
+ #### Support for bitfields of structures
+
+ Since Ruby-FFI does not support structure bit fields, the following extension is used.
+
+ - [ffi-bitfield](https://github.com/kojix2/ffi-bitfield) - Extension of Ruby-FFI to support bitfields.

  #### Automatic validation

  In the `script` directory, there are several tools to help implement ruby-htslib. Scripts using c2ffi can check the coverage of htslib functions in Ruby-htslib. They are useful when new versions of htslib are released.

- * [c2ffi](https://github.com/rpav/c2ffi) is a tool to create JSON format metadata from C header files.
+ - [c2ffi](https://github.com/rpav/c2ffi) is a tool to create JSON format metadata from C header files.

  ## Contributing

  Ruby-htslib is a library under development, so even minor improvements like typo fixes are welcome! Please feel free to send us your pull requests.

- * [Report bugs](https://github.com/kojix2/ruby-htslib/issues)
- * Fix bugs and [submit pull requests](https://github.com/kojix2/ruby-htslib/pulls)
- * Write, clarify, or fix documentation
- * Suggest or add new features
- * [financial contributions](https://github.com/sponsors/kojix2)
+ - [Report bugs](https://github.com/kojix2/ruby-htslib/issues)
+ - Fix bugs and [submit pull requests](https://github.com/kojix2/ruby-htslib/pulls)
+ - Write, clarify, or fix documentation
+ - Suggest or add new features
+ - [financial contributions](https://github.com/sponsors/kojix2)

-
- ```
- # Ownership and Commitment Rights
+ ```markdown
+ # Ownership and Commit Rights

  Do you need commit rights to the ruby-htslib repository?
  Do you want to get admin rights and take over the project?
@@ -195,8 +241,8 @@ One of the greatest joys of using a minor language like Ruby in bioinformatics i

  ## Links

- * [samtools/hts-spec](https://github.com/samtools/hts-specs)
- * [bioruby](https://github.com/bioruby/bioruby)
+ - [samtools/hts-specs](https://github.com/samtools/hts-specs)
+ - [bioruby](https://github.com/bioruby/bioruby)

  ## Funding support

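The README changes above explain that HTSlib's C macro functions cannot be called through Ruby-FFI, so the important ones are reimplemented in Ruby (in ruby-htslib they are added to `LibHTS`). To illustrate what such a reimplementation involves, here is a minimal pure-Ruby sketch of a few `sam.h` macros that only do bit arithmetic: `bam_is_rev`, `bam_is_mrev`, `bam_cigar_op`, `bam_cigar_oplen`, and `bam_cigar_opchr`. It is not ruby-htslib's code: the `MacroSketch` module is hypothetical, and its methods take plain Integers (a FLAG value or one packed CIGAR element) rather than a `bam1_t` pointer.

```ruby
# Hypothetical pure-Ruby equivalents of a few HTSlib sam.h macros.
module MacroSketch
  BAM_FREVERSE    = 0x10         # read is on the reverse strand
  BAM_FMREVERSE   = 0x20         # mate is on the reverse strand
  BAM_CIGAR_SHIFT = 4            # op length lives above the low 4 bits
  BAM_CIGAR_MASK  = 0xf          # op code lives in the low 4 bits
  BAM_CIGAR_STR   = "MIDNSHP=XB" # op code -> op character

  module_function

  # bam_is_rev(b): true if the FLAG has the reverse-strand bit set
  def bam_is_rev(flag)
    (flag & BAM_FREVERSE) != 0
  end

  # bam_is_mrev(b): true if the FLAG has the mate-reverse bit set
  def bam_is_mrev(flag)
    (flag & BAM_FMREVERSE) != 0
  end

  # bam_cigar_op(c): operation code of one packed CIGAR element
  def bam_cigar_op(c)
    c & BAM_CIGAR_MASK
  end

  # bam_cigar_oplen(c): operation length of one packed CIGAR element
  def bam_cigar_oplen(c)
    c >> BAM_CIGAR_SHIFT
  end

  # bam_cigar_opchr(c): operation character, e.g. "M" for code 0
  def bam_cigar_opchr(c)
    BAM_CIGAR_STR[bam_cigar_op(c)]
  end
end

# 10M2I5M packed as (length << 4) | op, with M = 0 and I = 1
cigar = [(10 << 4) | 0, (2 << 4) | 1, (5 << 4) | 0]
puts cigar.map { |c| "#{MacroSketch.bam_cigar_oplen(c)}#{MacroSketch.bam_cigar_opchr(c)}" }.join
# prints 10M2I5M
```

Because BAM stores each CIGAR operation as `length << 4 | op` in a 32-bit integer, decoding is only a mask and a shift, which is why macros like these are cheap to port to Ruby.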
data/TUTORIAL.md ADDED
@@ -0,0 +1,316 @@
+ # Tutorial
+
+ ```mermaid
+ %%{init:{'theme':'base'}}%%
+ classDiagram
+ Bam~Hts~ o-- `Bam::Header`
+ Bam o-- `Bam::Record`
+ `Bam::Record` o-- `Bam::Header`
+ Bcf~Hts~ o-- `Bcf::Header`
+ Bcf o-- `Bcf::Record`
+ `Bcf::Record` o-- `Bcf::Header`
+ `Bam::Header` o-- `Bam::HeaderRecord`
+ `Bcf::Header` o-- `Bcf::HeaderRecord`
+ `Bam::Record` o-- Flag
+ `Bam::Record` o-- Cigar
+ `Bam::Record` o-- Aux
+ `Bcf::Record` o-- Info
+ `Bcf::Record` o-- Format
+ class Bam{
+ +@hts_file : FFI::Struct
+ +@header : Bam::Header
+ +@file_name
+ +@index_name
+ +@mode
+ +struct()
+ +build_index()
+ +each() Enumerable
+ +query()
+ }
+ class Bcf{
+ +@hts_file : FFI::Struct
+ +@header : Bcf::Header
+ +@file_name
+ +@index_name
+ +@mode
+ +struct()
+ +build_index()
+ +each() Enumerable
+ +query()
+ }
+ class Tabix~Hts~{
+ +@hts_file : FFI::Struct
+ }
+ class `Bam::Header`{
+ +@sam_hdr : FFI::Struct
+ +struct()
+ +target_count()
+ +target_names()
+ +name2tid()
+ +tid2name()
+ +to_s()
+ }
+ class `Bam::Record` {
+ +@bam1 : FFI::Struct
+ +@header : Bam::Header
+ +struct()
+ +qname()
+ +qname=()
+ +tid()
+ +tid=()
+ +mtid()
+ +mtid=()
+ +pos()
+ +pos=()
+ +mpos() +mate_pos()
+ +mpos=() +mate_pos=()
+ +bin()
+ +bin=()
+ +endpos()
+ +chrom() +contig()
+ +mate_chrom() +mate_contig()
+ +strand()
+ +mate_strand()
+ +isize() +insert_size()
+ +isize=() +insert_size=()
+ +mapq()
+ +mapq=()
+ +cigar()
+ +qlen()
+ +rlen()
+ +seq() +sequence()
+ +len()
+ +base(n)
+ +qual()
+ +qual_string()
+ +base_qual(n)
+ +flag()
+ +flag=()
+ +aux()
+ +to_s()
+ }
+ class `Aux` {
+ -@record : Bam::Record
+ +[]()
+ +get_int()
+ +get_float()
+ +get_string()
+ }
+ class `Bcf::Header`{
+ +@bcf_hdr : FFI::Struct
+ +struct()
+ +to_s()
+ }
+ class `Bcf::Record`{
+ +@bcf1 : FFI::Struct
+ +@header : Bcf::Header
+ +struct()
+ +rid()
+ +rid=()
+ +chrom()
+ +pos()
+ +pos=()
+ +id()
+ +id=()
+ +clear_id()
+ +ref()
+ +alt()
+ +alleles()
+ +qual()
+ +qual=()
+ +filter()
+ +info()
+ +format()
+ +to_s()
+ }
+ class Flag {
+ +@value : Integer
+ +paired?()
+ +proper_pair?()
+ +unmapped?()
+ +mate_unmapped?()
+ +reverse?()
+ +mate_reverse?()
+ +read1?()
+ +read2?()
+ +secondary?()
+ +qcfail?()
+ +duplicate?()
+ +supplementary?()
+ +&()
+ +|()
+ +^()
+ +~()
+ +<<()
+ +>>()
+ +to_i()
+ +to_s()
+ }
+ class Info {
+ -@record : Bcf::Record
+ +[]()
+ +get_int()
+ +get_float()
+ +get_string()
+ +get_flag()
+ +fields()
+ +length() +size()
+ +to_h()
+ -info_ptr()
+ }
+ class Format {
+ -@record : Bcf::Record
+ +[]()
+ +get_int()
+ +get_float()
+ +get_string()
+ +get_flag()
+ +fields()
+ +length() +size()
+ +to_h()
+ -format_ptr()
+ }
+ class Cigar {
+ -@array : Array
+ +each() Enumerable
+ +qlen()
+ +rlen()
+ +to_s()
+ +==()
+ +eql?()
+ }
+ class Faidx{
+ +@fai
+ }
+
+ ```
+
+ ## Installation
+
+ ```sh
+ gem install htslib
+ ```
+
+ You can check which shared library is used by ruby-htslib as follows:
+
+ ```ruby
+ require "htslib"
+ puts HTS.lib_path
+ # => "/home/kojix2/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/htslib-0.2.6/vendor/libhts.so"
+ ```
+
+ ## HTS::Bam - SAM / BAM / CRAM - Sequence Alignment Map file
+
+ Reading fields
+
+ ```ruby
+ require 'htslib'
+
+ bam = HTS::Bam.open("test/fixtures/moo.bam")
+
+ bam.each do |r|
+   pp name: r.qname,
+      flag: r.flag,
+      chrm: r.chrom,
+      strt: r.pos + 1,
+      mapq: r.mapq,
+      cigr: r.cigar.to_s,
+      mchr: r.mate_chrom,
+      mpos: r.mpos + 1,
+      isiz: r.isize,
+      seqs: r.seq,
+      qual: r.qual_string,
+      MC: r.aux("MC")
+ end
+
+ bam.close
+ ```
+
+ Open with block
+
+ ```ruby
+ HTS::Bam.open("test/fixtures/moo.bam") do |bam|
+   bam.each do |record|
+     # ...
+   end
+ end
+ ```
+
+ Writing
+
+ ```ruby
+ # `in` is a reserved word in Ruby, so it cannot be used as a variable name.
+ input = HTS::Bam.open("foo.bam")
+ output = HTS::Bam.open("bar.bam", "wb")
+
+ output.header = input.header
+ # output.write_header(input.header)
+
+ input.each do |r|
+   output << r
+   # output.write(r)
+ end
+
+ input.close
+ output.close
+ ```
+
+ Create index
+
+ ```ruby
+ HTS::Bam.open("foo.bam", build_index: true)
+ ```
+
+ ```ruby
+ b = HTS::Bam.open("foo.bam")
+       .build_index
+       .load_index
+ ```
+
+ ## HTS::Bcf - VCF / BCF - Variant Call Format file
+
+ Reading fields
+
+ ```ruby
+ bcf = HTS::Bcf.open("b.bcf")
+
+ bcf.each do |r|
+   p chrom: r.chrom,
+     pos: r.pos,
+     id: r.id,
+     qual: r.qual.round(2),
+     ref: r.ref,
+     alt: r.alt,
+     filter: r.filter,
+     info: r.info.to_h,
+     format: r.format.to_h
+ end
+
+ bcf.close
+ ```
+
+ Open with block
+
+ ```ruby
+ HTS::Bcf.open("b.bcf") do |bcf|
+   bcf.each do |record|
+     # ...
+   end
+ end
+ ```
+
+ Writing
+
+ ```ruby
+ # `in` is a reserved word in Ruby, so it cannot be used as a variable name.
+ input = HTS::Bcf.open("foo.bcf")
+ output = HTS::Bcf.open("bar.bcf", "wb")
+
+ output.header = input.header
+ # output.write_header(input.header)
+ input.each do |r|
+   output << r
+   # output.write(r)
+ end
+
+ input.close
+ output.close
+ ```
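
The class diagram in the newly added TUTORIAL.md gives `Flag` one predicate per SAM FLAG bit (`paired?`, `reverse?`, `supplementary?`, and so on). Those bit values are fixed by the SAM specification, from 0x1 (paired) to 0x800 (supplementary). A minimal sketch of such a wrapper over a plain Integer follows; the class name `FlagSketch` is hypothetical, it is not ruby-htslib's implementation, and it omits the bitwise operators listed in the diagram.

```ruby
# Hypothetical wrapper around an integer SAM FLAG value.
class FlagSketch
  # Bit values from the SAM specification, in ascending order.
  BITS = {
    :paired?        => 0x1,
    :proper_pair?   => 0x2,
    :unmapped?      => 0x4,
    :mate_unmapped? => 0x8,
    :reverse?       => 0x10,
    :mate_reverse?  => 0x20,
    :read1?         => 0x40,
    :read2?         => 0x80,
    :secondary?     => 0x100,
    :qcfail?        => 0x200,
    :duplicate?     => 0x400,
    :supplementary? => 0x800
  }.freeze

  def initialize(value)
    @value = Integer(value)
  end

  # Define one predicate method per bit, e.g. FlagSketch.new(99).paired?
  BITS.each do |name, bit|
    define_method(name) { (@value & bit) != 0 }
  end

  def to_i
    @value
  end

  # Comma-separated names of the set bits, without the trailing "?".
  def to_s
    BITS.select { |_, bit| (@value & bit) != 0 }.keys.map { |k| k.to_s.chomp("?") }.join(",")
  end
end

flag = FlagSketch.new(99) # 0x1 + 0x2 + 0x20 + 0x40
puts flag.to_s
# prints paired,proper_pair,mate_reverse,read1
```

Generating the predicates from a single table keeps the bit values in one place, so adding or auditing a flag means touching only `BITS`.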