parse_fasta 1.9.2 → 2.0.0

Files changed (48)
  1. checksums.yaml +8 -8
  2. data/.gitignore +1 -0
  3. data/.rspec +2 -0
  4. data/CHANGELOG.md +178 -0
  5. data/README.md +42 -215
  6. data/Rakefile +2 -4
  7. data/bin/console +14 -0
  8. data/bin/setup +8 -0
  9. data/lib/parse_fasta/error.rb +39 -0
  10. data/lib/parse_fasta/record.rb +88 -0
  11. data/lib/parse_fasta/seq_file.rb +221 -114
  12. data/lib/parse_fasta/version.rb +2 -2
  13. data/lib/parse_fasta.rb +5 -20
  14. data/spec/parse_fasta/record_spec.rb +115 -0
  15. data/spec/parse_fasta/seq_file_spec.rb +238 -0
  16. data/spec/parse_fasta_spec.rb +25 -0
  17. data/spec/spec_helper.rb +2 -44
  18. data/spec/test_files/cr.fa +1 -0
  19. data/spec/test_files/cr.fa.gz +0 -0
  20. data/spec/test_files/cr.fq +3 -0
  21. data/spec/test_files/cr.fq.gz +0 -0
  22. data/spec/test_files/cr_nl.fa +4 -0
  23. data/spec/test_files/cr_nl.fa.gz +0 -0
  24. data/spec/test_files/cr_nl.fq +8 -0
  25. data/spec/test_files/cr_nl.fq.gz +0 -0
  26. data/spec/test_files/multi_blob.fa.gz +0 -0
  27. data/spec/test_files/multi_blob.fq.gz +0 -0
  28. data/spec/test_files/not_a_seq_file.txt +1 -0
  29. data/{test_files/bad.fa → spec/test_files/poorly_catted.fa} +0 -0
  30. data/{test_files/test.fa → spec/test_files/seqs.fa} +0 -0
  31. data/spec/test_files/seqs.fa.gz +0 -0
  32. data/spec/test_files/seqs.fq +8 -0
  33. data/spec/test_files/seqs.fq.gz +0 -0
  34. metadata +49 -24
  35. data/lib/parse_fasta/fasta_file.rb +0 -232
  36. data/lib/parse_fasta/fastq_file.rb +0 -160
  37. data/lib/parse_fasta/quality.rb +0 -54
  38. data/lib/parse_fasta/sequence.rb +0 -174
  39. data/spec/lib/fasta_file_spec.rb +0 -212
  40. data/spec/lib/fastq_file_spec.rb +0 -143
  41. data/spec/lib/quality_spec.rb +0 -51
  42. data/spec/lib/seq_file_spec.rb +0 -357
  43. data/spec/lib/sequence_spec.rb +0 -188
  44. data/test_files/benchmark.rb +0 -99
  45. data/test_files/bogus.txt +0 -2
  46. data/test_files/test.fa.gz +0 -0
  47. data/test_files/test.fq +0 -8
  48. data/test_files/test.fq.gz +0 -0
checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- NmM5ZWYwOGM5YWIxMzU2YjBmZTk4Y2I5YzI0NjY0MzUwM2YwMjgyOA==
4
+ YzliYjhmZmMzNGRlYmFmNDQwOGE2NGFmNzgyZTliZDdhMDdkMTc0Zg==
5
5
  data.tar.gz: !binary |-
6
- NzI2NDY1MWZmYmUwNDUxMTk2MmI4YjgwYWVlYjcyZDI4MDUzMzk4NA==
6
+ OTgxOWFjYTEyMWI0MjNlNjBhZjJkNGZkMjFkZGFkZDNjNGJkNTk2NA==
7
7
  SHA512:
8
8
  metadata.gz: !binary |-
9
- ODY1ZTQ1MzU4MTc2MDhhMjA0OThiYzM4Yzk4YjJiZjU4ZGY4MGM5NTRjYTE5
10
- OWZkODk0M2ZmODE5ODY1MjE3NTQ5MzgyNTFjMTk2NzU2NGVjN2NkNGUzYzA3
11
- ODliNjRlOGJjOGJhNjhlMWZmMmU1NjkyMjgwNzAyODQ1MDExOTI=
9
+ OGQxNTg4YzYyYzQyZGM2YjM0NzYyMjFiYzUwMTllYjM3NzZiZjViNTQwMWFi
10
+ NTI0NDk5NDY0NTc4YThhZTg4ODczYjAxZTA3MGNmZDdmMWYzNmMwMGFlMzhl
11
+ ODFhM2Q1NzIxZDVlYjE0MjEwYTg0OTlkMzlmZDQyYjIzYjhjNGQ=
12
12
  data.tar.gz: !binary |-
13
- YWY4NWU3NDFiYTVmMmE1Y2MxMDI3ZjE3NTIyY2Q1N2Q2ZDQxM2ZlZjI4NjUy
14
- MWM5OTZhNzEzZWNmMGVlYTQ1MDc1MzViMDBkOTQ0YzQyY2IxYjlmOGQwNzRh
15
- YmIyOTg2Yjk0OTFlNWVhOGU3MTMzM2I1ZGY0ZjlkMzExZGNkZDk=
13
+ MTBlN2NmNmJkOGUwM2Q1MDZhZTkzM2NmMzNmOTY1YWUzMzVjNjdkN2NiMDM2
14
+ NTJlYmU5Yzk1ODExNzczMGNkNTFkNzEwOWZkZGIwMjRiMjNiNGY5ZGM0MDJk
15
+ ZmY3YWI0OGQwOTNiMzY2ODAzMzkxZjFkZmNiNTExMGE3NWFlZjk=
data/.gitignore CHANGED
@@ -21,3 +21,4 @@ tmp
21
21
  *.a
22
22
  mkmf.log
23
23
  .ruby-*
24
+ .idea
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/CHANGELOG.md ADDED
@@ -0,0 +1,178 @@
1
+ ## Versions ##
2
+
3
+ ### 2.0.0 ###
4
+
5
+ A quirk of `Zlib::GzipReader` meant that if a gzipped file was created like this:
6
+
7
+ ```bash
8
+ gzip -c a.fa > z.fa.gz
9
+ gzip -c b.fa >> z.fa.gz
10
+ ```
11
+
12
+ then the gzip reader would read only the lines from `a.fa` unless you did some extra fiddling. Since this was a pretty low-level thing, I decided to make a bunch of under-the-hood changes that I've been meaning to get to.
13
+
14
+ #### Other things
15
+
16
+ - Everything is namespaced under the `ParseFasta` module
16
+ - Removed `FastaFile` and `FastqFile` classes; only `SeqFile` remains
18
+ - Removed `Sequence` and `Quality` classes. These might get put back in at some point, but I almost never used them anyway
19
+ - `SeqFile#each_record` yields a `Record` object so you can use the same code to parse fastA and fastQ files
20
+ - Other stuff that I'm forgetting!
21
+
22
+
23
+ ### 1.9.2 ###
24
+
25
+ Speed up fastA `each_record` and `each_record_fast`.
26
+
27
+ ### 1.9.1 ###
28
+
29
+ Speed up fastQ `each_record` and `each_record_fast`. Courtesy of
30
+ [Matthew Ralston](https://github.com/MatthewRalston).
31
+
32
+ ### 1.9.0 ###
33
+
34
+ Added "fast" versions of `each_record` methods
35
+ (`each_record_fast`). Basically, they return sequences and quality
36
+ strings as Ruby `String` objects instead of a `Sequence` or `Quality`
37
+ objects. Also, if the sequence or quality string has spaces, they will
38
+ be retained. If this is a problem, use the original `each_record`
39
+ methods.
40
+
41
+ ### 1.8.2 ###
42
+
43
+ Speed up `FastqFile#each_record`.
44
+
45
+ ### 1.8.1 ###
46
+
47
+ An error will be raised if a fasta file has a `>` in the
48
+ sequence. Sometimes files are not terminated with a newline
49
+ character. If this is the case, then catting two fasta files will
50
+ smush the first header of the second file right in with the last
51
+ sequence of the first file. This is bad, raise an error! ;)
52
+
53
+ Example
54
+
55
+ >seq1
56
+ ACTG>seq2
57
+ ACTG
58
+ >seq3
59
+ ACTG
60
+
61
+ This will raise `ParseFasta::SequenceFormatError`.
62
+
63
+ Also, headers with lots of `>` within are fine now.
64
+
65
+ ### 1.8 ###
66
+
67
+ Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
68
+ `parse_fasta` doesn't check whether the seq is AA or NA, if called on
69
+ an amino acid string, things will get weird as it will complement the
70
+ IUPAC characters in the AA string and leave others.
71
+
72
+ ### 1.7.2 ###
73
+
74
+ Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
75
+
76
+ Some alignment fastas have spaces for easier reading. Strip these
77
+ out. For consistency, also strips spaces from `Quality` strings. If
78
+ there are spaces that don't match in the quality and sequence in a
79
+ fastQ file, then things will get messed up in the FastQ file. FastQ
80
+ shouldn't have spaces though.
81
+
82
+ ### 1.7 ###
83
+
84
+ Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
85
+
86
+ ### 1.6.2 ###
87
+
88
+ `FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
89
+ that don't begin with a `>`.
90
+
91
+ ### 1.6.1 ###
92
+
93
+ Better internal handling of empty sequences -- instead of raising
94
+ errors, pass empty sequences.
95
+
96
+ ### 1.6 ###
97
+
98
+ Added `SeqFile` class, which accepts either fastA or fastQ files. It
99
+ uses FastaFile and FastqFile internally. You can use this class if you
100
+ want your scripts to accept either fastA or fastQ files.
101
+
102
+ If you need the description and quality string, you should use
103
+ FastqFile instead.
104
+
105
+ ### 1.5 ###
106
+
107
+ Now accepts gzipped files. Huzzah!
108
+
109
+ ### 1.4 ###
110
+
111
+ Added methods:
112
+
113
+ Sequence.base_counts
114
+ Sequence.base_frequencies
115
+
116
+ ### 1.3 ###
117
+
118
+ Add additional functionality to `each_record` method.
119
+
120
+ #### Info ####
121
+
122
+ I often like to use the fasta format for other things like so
123
+
124
+ >fruits
125
+ pineapple
126
+ pear
127
+ peach
128
+ >veggies
129
+ peppers
130
+ parsnip
131
+ peas
132
+
133
+ rather than having this in a two column file like this
134
+
135
+ fruit,pineapple
136
+ fruit,pear
137
+ fruit,peach
138
+ veggie,peppers
139
+ veggie,parsnip
140
+ veggie,peas
141
+
142
+ So I added functionality to `each_record` to keep each line of a record
143
+ separate in an array. Here's an example using the above file.
144
+
145
+ info = []
146
+ FastaFile.open(f, 'r').each_record(1) do |header, lines|
147
+ info << [header, lines]
148
+ end
149
+
150
+ Then info will contain the following arrays
151
+
152
+ ['fruits', ['pineapple', 'pear', 'peach']],
153
+ ['veggies', ['peppers', 'parsnip', 'peas']]
154
+
155
+ ### 1.2 ###
156
+
157
+ Added `mean_qual` method to the `Quality` class.
158
+
159
+ ### 1.1.2 ###
160
+
161
+ Dropped Ruby requirement to 1.9.3
162
+
163
+ (Note, if you want to build the docs with yard and you're using
164
+ Ruby 1.9.3, you may have to install the redcarpet gem.)
165
+
166
+ ### 1.1 ###
167
+
168
+ Added: Fastq and Quality classes
169
+
170
+ ### 1.0 ###
171
+
172
+ Added: Fasta and Sequence classes
173
+
174
+ Removed: File monkey patch
175
+
176
+ ### 0.0.5 ###
177
+
178
+ Last version with File monkey patch.
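
Note on the 2.0.0 changelog entry: the multi-member gzip behavior it describes can be reproduced with the standard library alone. The sketch below is not taken from the gem. `read_all_gzip_members` is a hypothetical helper name, and the loop is one common workaround (rewind the underlying IO by the length of `GzipReader#unused`, then open a fresh reader). It assumes the IO is seekable.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of parse_fasta): decompress every gzip
# member in a seekable IO, not just the first one.
def read_all_gzip_members(io)
  out = +""
  loop do
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    # Bytes the reader pulled from io past the end of this member, or nil.
    unused = gz.unused
    gz.finish # closes the reader but leaves io open
    io.pos -= unused.length if unused # rewind to the start of the next member
    break if io.eof?
  end
  out
end

# Same shape as `gzip -c a.fa > z.fa.gz; gzip -c b.fa >> z.fa.gz`.
blob = Zlib.gzip(">a\nACTG\n") + Zlib.gzip(">b\nGGCC\n")

first_only = Zlib::GzipReader.new(StringIO.new(blob)).read
everything = read_all_gzip_members(StringIO.new(blob))
```

Here `first_only` holds just the first record, while `everything` holds both, which is the difference the changelog entry is getting at.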
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- # parse_fasta #
1
+ # ParseFasta #
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/parse_fasta.svg)](http://badge.fury.io/rb/parse_fasta) [![Build Status](https://travis-ci.org/mooreryan/parse_fasta.svg?branch=master)](https://travis-ci.org/mooreryan/parse_fasta) [![Coverage Status](https://coveralls.io/repos/mooreryan/parse_fasta/badge.svg)](https://coveralls.io/r/mooreryan/parse_fasta)
4
4
 
@@ -8,7 +8,9 @@ So you want to parse a fasta file...
8
8
 
9
9
  Add this line to your application's Gemfile:
10
10
 
11
- gem 'parse_fasta'
11
+ ```ruby
12
+ gem 'parse_fasta'
13
+ ```
12
14
 
13
15
  And then execute:
14
16
 
@@ -20,9 +22,7 @@ Or install it yourself as:
20
22
 
21
23
  ## Overview ##
22
24
 
23
- Provides nice, programmatic access to fasta and fastq files, as well
24
- as providing Sequence and Quality helper classes. It's more
25
- lightweight than BioRuby. And more fun! ;)
25
+ Provides nice, programmatic access to fasta and fastq files. It's faster and more lightweight than BioRuby. And more fun!
26
26
 
27
27
  ## Documentation ##
28
28
 
@@ -32,213 +32,40 @@ for the full api documentation.
32
32
 
33
33
  ## Usage ##
34
34
 
35
- Some examples...
36
-
37
- A little script to print header and length of each record.
38
-
39
- require 'parse_fasta'
40
-
41
- FastaFile.open(ARGV[0]).each_record do |header, sequence|
42
- puts [header, sequence.length].join("\t")
43
- end
44
-
45
- And here, a script to calculate GC content:
46
-
47
- FastaFile.open(ARGV[0]).each_record do |header, sequence|
48
- puts [header, sequence.gc].join("\t")
49
- end
50
-
51
- Now we can parse fastq files as well!
52
-
53
- FastqFile.open(ARGV[0]).each_record do |head, seq, desc, qual|
54
- puts [header, qual.qual_scores.join(',')].join("\t")
55
- end
56
-
57
- What if you don't care if the input is a fastA or a fastQ? No problem!
58
-
59
- SeqFile.open(ARGV[0]).each_record do |head, seq|
60
- puts [header, seq].join "\t"
61
- end
62
-
63
- Read fasta file into a hash.
64
-
65
- seqs = FastaFile.open(ARGV[0]).to_hash
66
-
67
- ## Versions ##
68
-
69
- ### 1.9.2 ###
70
-
71
- Speed up fastA `each_record` and `each_record_fast`.
72
-
73
- ### 1.9.1 ###
74
-
75
- Speed up fastQ `each_record` and `each_record_fast`. Courtesy of
76
- [Matthew Ralston](https://github.com/MatthewRalston).
77
-
78
- ### 1.9.0 ###
79
-
80
- Added "fast" versions of `each_record` methods
81
- (`each_record_fast`). Basically, they return sequences and quality
82
- strings as Ruby `Sring` objects instead of aa `Sequence` or `Quality`
83
- objects. Also, if the sequence or quality string has spaces, they will
84
- be retained. If this is a problem, use the original `each_record`
85
- methods.
86
-
87
- ### 1.8.2 ###
88
-
89
- Speed up `FastqFile#each_record`.
90
-
91
- ### 1.8.1 ###
92
-
93
- An error will be raised if a fasta file has a `>` in the
94
- sequence. Sometimes files are not terminated with a newline
95
- character. If this is the case, then catting two fasta files will
96
- smush the first header of the second file right in with the last
97
- sequence of the first file. This is bad, raise an error! ;)
98
-
99
- Example
100
-
101
- >seq1
102
- ACTG>seq2
103
- ACTG
104
- >seq3
105
- ACTG
106
-
107
- This will raise `ParseFasta::SequenceFormatError`.
108
-
109
- Also, headers with lots of `>` within are fine now.
110
-
111
- ### 1.8 ###
112
-
113
- Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
114
- `parse_fasta` doesn't check whether the seq is AA or NA, if called on
115
- an amino acid string, things will get weird as it will complement the
116
- IUPAC characters in the AA string and leave others.
117
-
118
- ### 1.7.2 ###
119
-
120
- Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
121
-
122
- Some alignment fastas have spaces for easier reading. Strip these
123
- out. For consistency, also strips spaces from `Quality` strings. If
124
- there are spaces that don't match in the quality and sequence in a
125
- fastQ file, then things will get messed up in the FastQ file. FastQ
126
- shouldn't have spaces though.
127
-
128
- ### 1.7 ###
129
-
130
- Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
131
-
132
- ### 1.6.2 ###
133
-
134
- `FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
135
- that don't begin with a `>`.
136
-
137
- ### 1.6.1 ###
138
-
139
- Better internal handling of empty sequences -- instead of raising
140
- errors, pass empty sequences.
141
-
142
- ### 1.6 ###
143
-
144
- Added `SeqFile` class, which accepts either fastA or fastQ files. It
145
- uses FastaFile and FastqFile internally. You can use this class if you
146
- want your scripts to accept either fastA or fastQ files.
147
-
148
- If you need the description and quality string, you should use
149
- FastqFile instead.
150
-
151
- ### 1.5 ###
152
-
153
- Now accepts gzipped files. Huzzah!
154
-
155
- ### 1.4 ###
156
-
157
- Added methods:
158
-
159
- Sequence.base_counts
160
- Sequence.base_frequencies
161
-
162
- ### 1.3 ###
163
-
164
- Add additional functionality to `each_record` method.
165
-
166
- #### Info ####
167
-
168
- I often like to use the fasta format for other things like so
169
-
170
- >fruits
171
- pineapple
172
- pear
173
- peach
174
- >veggies
175
- peppers
176
- parsnip
177
- peas
178
-
179
- rather than having this in a two column file like this
180
-
181
- fruit,pineapple
182
- fruit,pear
183
- fruit,peach
184
- veggie,peppers
185
- veggie,parsnip
186
- veggie,peas
187
-
188
- So I added functionality to `each_record` to keep each line a record
189
- separate in an array. Here's an example using the above file.
190
-
191
- info = []
192
- FastaFile.open(f, 'r').each_record(1) do |header, lines|
193
- info << [header, lines]
194
- end
195
-
196
- Then info will contain the following arrays
197
-
198
- ['fruits', ['pineapple', 'pear', 'peach']],
199
- ['veggies', ['peppers', 'parsnip', 'peas']]
200
-
201
- ### 1.2 ###
202
-
203
- Added `mean_qual` method to the `Quality` class.
204
-
205
- ### 1.1.2 ###
206
-
207
- Dropped Ruby requirement to 1.9.3
208
-
209
- (Note, if you want to build the docs with yard and you're using
210
- Ruby 1.9.3, you may have to install the redcarpet gem.)
211
-
212
- ### 1.1 ###
213
-
214
- Added: Fastq and Quality classes
215
-
216
- ### 1.0 ###
217
-
218
- Added: Fasta and Sequence classes
219
-
220
- Removed: File monkey patch
221
-
222
- ### 0.0.5 ###
223
-
224
- Last version with File monkey patch.
225
-
226
- ## Benchmark ##
227
-
228
- Some quick and dirty benchmarks against `BioRuby`.
229
-
230
- ### FastaFile#each_record ###
231
-
232
- You can see the test script in `benchmark.rb`.
233
-
234
- user system total real
235
- parse_fasta 1.920000 0.160000 2.080000 ( 2.145932)
236
- parse_fasta fast 1.210000 0.160000 1.370000 ( 1.377770)
237
- bioruby 4.330000 0.290000 4.620000 ( 4.655567)
238
-
239
- Hot dog! It's faster :)
240
-
241
- ## Notes ##
242
-
243
- Only the `SeqFile` class actually checks to make sure that you passed
244
- in a "proper" fastA or fastQ file, so watch out.
35
+ Here are some examples of using ParseFasta. Don't forget to `require "parse_fasta"` at the top of your program!
36
+
37
+ Print header and length of each record.
38
+
39
+ ```ruby
40
+ ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
41
+ puts [rec.header, rec.seq.length].join "\t"
42
+ end
43
+ ```
44
+
45
+ You can parse fastQ files in exactly the same way.
46
+
47
+ ```ruby
48
+ ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
49
+ printf "Header: %s, Sequence: %s, Description: %s, Quality: %s\n",
50
+ rec.header,
51
+ rec.seq,
52
+ rec.desc,
53
+ rec.qual
54
+ end
55
+ ```
56
+
57
+ `Record#desc` and `Record#qual` will be `nil` if the file you are parsing is a fastA file.
58
+
59
+ ```ruby
60
+ ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
61
+ if rec.qual
62
+ puts "@#{rec.header}"
63
+ puts rec.seq
64
+ puts "+#{rec.desc}"
65
+ puts rec.qual
66
+ else
67
+ puts ">#{rec.header}"
68
+ puts rec.seq
69
+ end
70
+ end
71
+ ```
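
Note on the last README example: it writes each record back out in its original format by branching on `qual`. A self-contained sketch of that branch follows, using a plain `Struct` as a stand-in for `ParseFasta::Record`. Both the stand-in `Rec` and the helper `format_record` are assumptions for illustration and are not part of the gem.

```ruby
# Stand-in for ParseFasta::Record with the four attributes named in the diff.
Rec = Struct.new(:header, :seq, :desc, :qual)

# Hypothetical helper: render a record as fastQ when it has a quality
# string, otherwise as fastA.
def format_record(rec)
  if rec.qual
    "@#{rec.header}\n#{rec.seq}\n+#{rec.desc}\n#{rec.qual}\n"
  else
    ">#{rec.header}\n#{rec.seq}\n"
  end
end

fasta_text = format_record(Rec.new("apple", "ACTG", nil, nil))
fastq_text = format_record(Rec.new("apple", "ACTG", "", "IIII"))
```

Returning a string rather than calling `puts` directly keeps the formatting easy to check in a spec.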
data/Rakefile CHANGED
@@ -1,8 +1,6 @@
1
1
  require "bundler/gem_tasks"
2
2
  require "rspec/core/rake_task"
3
3
 
4
- RSpec::Core::RakeTask.new
5
-
6
- task default: :spec
7
- task test: :spec
4
+ RSpec::Core::RakeTask.new(:spec)
8
5
 
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "parse_fasta"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/parse_fasta/error.rb ADDED
@@ -0,0 +1,39 @@
1
+ # Copyright 2014 - 2016 Ryan Moore
2
+ # Contact: moorer@udel.edu
3
+ #
4
+ # This file is part of parse_fasta.
5
+ #
6
+ # parse_fasta is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU General Public License as published by
8
+ # the Free Software Foundation, either version 3 of the License, or
9
+ # (at your option) any later version.
10
+ #
11
+ # parse_fasta is distributed in the hope that it will be useful,
12
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
13
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14
+ # GNU General Public License for more details.
15
+ #
16
+ # You should have received a copy of the GNU General Public License
17
+ # along with parse_fasta. If not, see <http://www.gnu.org/licenses/>.
18
+
19
+ module ParseFasta
20
+ # Contains the Error classes that ParseFasta API will raise
21
+ module Error
22
+
23
+ # All ParseFasta errors inherit from ParseFastaError
24
+ class ParseFastaError < StandardError
25
+ end
26
+
27
+ # Raised when the input file doesn't look like fastA or fastQ
28
+ class DataFormatError < ParseFastaError
29
+ end
30
+
31
+ # Raised when the file is not found
32
+ class FileNotFoundError < ParseFastaError
33
+ end
34
+
35
+ # Raised when fastA sequences have a '>' in them
36
+ class SequenceFormatError < ParseFastaError
37
+ end
38
+ end
39
+ end
data/lib/parse_fasta/record.rb ADDED
@@ -0,0 +1,88 @@
1
+ # Copyright 2014 - 2016 Ryan Moore
2
+ # Contact: moorer@udel.edu
3
+ #
4
+ # This file is part of parse_fasta.
5
+ #
6
+ # parse_fasta is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU General Public License as published by
8
+ # the Free Software Foundation, either version 3 of the License, or
9
+ # (at your option) any later version.
10
+ #
11
+ # parse_fasta is distributed in the hope that it will be useful,
12
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
13
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14
+ # GNU General Public License for more details.
15
+ #
16
+ # You should have received a copy of the GNU General Public License
17
+ # along with parse_fasta. If not, see <http://www.gnu.org/licenses/>.
18
+
19
+ module ParseFasta
20
+ class Record
21
+
22
+ # @!attribute header
23
+ # @return [String] the full header of the record without the '>'
24
+ # or '@'
25
+ # @!attribute seq
26
+ # @return [String] the sequence of the record
27
+ # @!attribute desc
28
+ # @return [String or Nil] if the record is from a fastA file, it
29
+ # is nil; else, the description line of the fastQ record
30
+ # @!attribute qual
31
+ # @return [String or Nil] if the record is from a fastA file, it
32
+ # is nil; else, the quality string of the fastQ record
33
+ attr_accessor :header, :seq, :desc, :qual
34
+
35
+ # The constructor takes keyword args.
36
+ #
37
+ # @example Init a new Record object for a fastA record
38
+ # Record.new header: "apple", seq: "actg"
39
+ # @example Init a new Record object for a fastQ record
40
+ # Record.new header: "apple", seq: "actd", desc: "", qual: "IIII"
41
+ #
42
+ # @param header [String] the header of the record
43
+ # @param seq [String] the sequence of the record
44
+ # @param desc [String] the description line of a fastQ record
45
+ # @param qual [String] the quality string of a fastQ record
46
+ #
47
+ # @raise [SequenceFormatError] if a fastA sequence has a '>'
48
+ # character in it
49
+ def initialize args = {}
50
+ @header = args.fetch :header
51
+
52
+ @desc = args.fetch :desc, nil
53
+ @qual = args.fetch :qual, nil
54
+
55
+ @qual.gsub!(/\s+/, "") if @qual
56
+
57
+ seq = args.fetch(:seq).gsub(/\s+/, "")
58
+
59
+ if @qual # is fastQ
60
+ @seq = seq
61
+ else # is fastA
62
+ @seq = check_fasta_seq(seq)
63
+ end
64
+ end
65
+
66
+ # Compare attrs of this rec with another
67
+ #
68
+ # @param rec [Record] a Record object to compare with
69
+ #
70
+ # @return [Bool] true or false
71
+ def == rec
72
+ self.header == rec.header && self.seq == rec.seq &&
73
+ self.desc == rec.desc && self.qual == rec.qual
74
+ end
75
+
76
+ private
77
+
78
+ def check_fasta_seq seq
79
+ if seq.match ">"
80
+ raise ParseFasta::Error::SequenceFormatError,
81
+ "A sequence contained a '>' character " +
82
+ "(the fastA file record separator)"
83
+ else
84
+ seq
85
+ end
86
+ end
87
+ end
88
+ end
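
Note on the `>` check and the error hierarchy: the sketch below shows how the two fit together for the poorly catted file from the 1.8.1 changelog entry. It uses stand-in classes under a hypothetical `Sketch` module and a deliberately naive splitter (`fasta_records`, also hypothetical). It is not how `SeqFile` reads files, only an illustration of why a `>` inside a sequence signals a missing newline, and of how rescuing the base error class also catches its subclasses.

```ruby
# Stand-ins mirroring names in the diff; the real classes live in the gem.
module Sketch
  class ParseFastaError < StandardError; end
  class SequenceFormatError < ParseFastaError; end

  # Naive fastA splitter: a record starts at a line that begins with '>'.
  def self.fasta_records(text)
    text.split(/^>/).reject(&:empty?).map do |chunk|
      header, *lines = chunk.split("\n")
      seq = lines.join.gsub(/\s+/, "") # strip all whitespace, as Record does
      if seq.include?(">")
        raise SequenceFormatError, "sequence for '#{header}' contains a '>'"
      end
      [header, seq]
    end
  end
end

good = ">seq1\nAC TG\n>seq2\nGG\n"
bad  = ">seq1\nACTG>seq2\nACTG\n>seq3\nACTG\n" # the 1.8.1 changelog example

records = Sketch.fasta_records(good)

message =
  begin
    Sketch.fasta_records(bad)
    nil
  rescue Sketch::ParseFastaError => e # base class also catches subclasses
    e.message
  end
```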