parse_fasta 1.8.1 → 1.8.2

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
  ---
  !binary "U0hBMQ==":
  metadata.gz: !binary |-
- YzlkNTQ5NGQ5YTFlNzVkOTJjOTJkMTM2YmUwN2FlMjhmOTg2ZDZlMQ==
+ MDE1YThkNzUyNzI0MTMwZDMyYzBiNzFiODMzZGQzNzQ5ODU3ZTk1MA==
  data.tar.gz: !binary |-
- OThhYTU5NTAzYzlkMTg2N2IxOWNjYTExMWEyODRiY2Q2OGFhMzQ4MQ==
+ NjBmZGUxZTdkM2UyZTQ4YWY1MDliMTI0OTJlYjA5ZDFmMzg4OWRlZQ==
  SHA512:
  metadata.gz: !binary |-
- NzMyYTNmZmQ0YThlMThkZmE3ZjZhZjAzNDM2MGQ4ZTcwODhkODY3NzI2NzU1
- NDQzYmU1ZDBiZjljYzVhZmNlMDIzZDMxMDc4Zjk3N2E1YTAxOTUzZTIyOGNj
- NzdjOWJiODA2ZDA0NGNmMjFkOGI1ZjgxZWY3NTRmMTQ1MDc5MTU=
+ MjIxNGNlODdkNTk3ZWE1ZDk1Zjg4ZDY0ZWE3NzE0ZWI0ODQ4MDZjZTk1MDY1
+ OGNiOGViOWYwZDU5ODY0YjNmZWY1ODYwOGVlN2E5MTVmYzZlZmIwNzE4MjNi
+ YWFjMjgzNGQ4YmMzODdjYjZjNTBmYTM4MWFiYTcyYjlmZWFhYWM=
  data.tar.gz: !binary |-
- ODU2NzMwZTk3ZmE0ZTIxYzMwOWVkMWUyY2U4MTE3YzAzMzI5MzU1ZDAzNWE3
- OGQ3ODk2ZjQwYTNjNTJlZTVjYzg3MGU5YzliZjAyYjQ4ZDNmNjRlNzE2YmJk
- MmY2OTRhYjI3NTM3ODFmYWYwNDk2ZjQ0YzI3YjIxMzI3MGU3MmE=
+ YTNiMTYzNmJhODkzMjEyMjBlOTgxOGIyMjFmMTFlOTE0NTEyOWZjNTgxMTRj
+ N2Y4YjE3NWUxMjYyNTRjNTYzZGE3MjBhNjJjZTNmNjRkYzY5ZGI2MGY0MjQz
+ N2RhMDUxN2E1MjY0NDZkOWQyMjEzYTU2ZDE4M2FlZDg3YzA0N2M=
data/README.md CHANGED
@@ -66,14 +66,11 @@ Read fasta file into a hash.
 
  ## Versions ##
 
- ### 1.8 ###
+ ### 1.8.2 ###
 
- Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
- `parse_fasta` doesn't check whether the seq is AA or NA, if called on
- an amino acid string, things will get weird as it will complement the
- IUPAC characters in the AA string and leave others.
+ Speed up `FastqFile#each_record`.
 
- #### 1.8.1 ####
+ ### 1.8.1 ###
 
  An error will be raised if a fasta file has a `>` in the
  sequence. Sometimes files are not terminated with a newline
@@ -93,12 +90,14 @@ This will raise `ParseFasta::SequenceFormatError`.
 
  Also, headers with lots of `>` within are fine now.
 
+ ### 1.8 ###
 
- ### 1.7 ###
-
- Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
+ Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
+ `parse_fasta` doesn't check whether the seq is AA or NA, if called on
+ an amino acid string, things will get weird as it will complement the
+ IUPAC characters in the AA string and leave others.
 
- #### 1.7.2 ####
+ ### 1.7.2 ###
 
  Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
 
@@ -108,24 +107,28 @@ there are spaces that don't match in the quality and sequence in a
  fastQ file, then things will get messed up in the FastQ file. FastQ
  shouldn't have spaces though.
 
- ### 1.6 ###
+ ### 1.7 ###
 
- Added `SeqFile` class, which accepts either fastA or fastQ files. It
- uses FastaFile and FastqFile internally. You can use this class if you
- want your scripts to accept either fastA or fastQ files.
+ Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
 
- If you need the description and quality string, you should use
- FastqFile instead.
+ ### 1.6.2 ###
+
+ `FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
+ that don't begin with a `>`.
 
- #### 1.6.1 ####
+ ### 1.6.1 ###
 
  Better internal handling of empty sequences -- instead of raising
  errors, pass empty sequences.
 
- #### 1.6.2 ####
+ ### 1.6 ###
 
- `FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
- that don't begin with a `>`.
+ Added `SeqFile` class, which accepts either fastA or fastQ files. It
+ uses FastaFile and FastqFile internally. You can use this class if you
+ want your scripts to accept either fastA or fastQ files.
+
+ If you need the description and quality string, you should use
+ FastqFile instead.
 
  ### 1.5 ###
 
@@ -204,17 +207,16 @@ Last version with File monkey patch.
 
  ## Benchmark ##
 
- Perhaps this isn't exactly fair since `BioRuby` is a big module with
- lots of features and error checking, whereas `parse_fasta` is meant to
- be lightweight and easy to use for my own research. Oh well ;)
+ **NOTE**: These benchmarks are against an older version of
+ `parse_fasta`.
+
+ Some quick and dirty benchmarks against `BioRuby`.
 
  ### FastaFile#each_record ###
 
- You're probably wondering...How does it compare to BioRuby in some
- super accurate benchmarking tests? Lucky for you, I calculated
- sequence length for each fasta record with both the `each_record`
- method from this gem and using the `FastaFormat` class from
- BioRuby. You can see the test script in `benchmark.rb`.
+ Calculating sequence length length for each fasta record with both the
+ `each_record` method from this gem and using the `FastaFormat` class
+ from BioRuby. You can see the test script in `benchmark.rb`.
 
  The test file contained 2,009,897 illumina reads and the file size
  was 1.1 gigabytes. Here are the results from Ruby's `Benchmark` class:
@@ -255,20 +257,10 @@ test 2 was 4,000,000 and test 3 was 8,000,000 bases.
 
  Nice!
 
- Troll: "But Ryan, when will you find the GC of an 8,000,000 base
- sequence?"
+ Troll: "When will you find the GC of an 8,000,000 base sequence?"
 
  Me: "Step off, troll!"
 
- ## Test suite & docs ##
-
- For a good time, you could clone this repo and run the test suite with
- rspec! Or if you just don't trust that it works like it should. The
- specs probably need a little clean up...so fork it and clean it up ;)
-
- Same with the docs. Clone the repo and build them yourself with `yard`
- if you are in need of some excitement.
-
  ## Notes ##
 
  Only the `SeqFile` class actually checks to make sure that you passed
@@ -80,11 +80,11 @@ class FastqFile < File
 
  case count % 4
  when 0
-   header = line.sub(/^@/, '')
+   header = line[1..-1]
  when 1
    sequence = Sequence.new(line)
  when 2
-   description = line.sub(/^\+/, '')
+   description = line[1..-1]
  when 3
    quality = Quality.new(line)
    yield(header, sequence, description, quality)
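
Review note on the hunk above: the change swaps an anchored regex substitution for a plain string slice. The sketch below is not code from the gem. It uses made-up sample lines (`line`, `bad`) and an arbitrary iteration count `n` to compare the two approaches and to show the one case where they behave differently. It prints timings from Ruby's `Benchmark`, which will vary by machine, so no expected numbers are given.

```ruby
require "benchmark"

# Hypothetical FASTQ header line; not taken from the gem's test data.
line = "@read_1 some description"

by_regex = line.sub(/^@/, '')  # old approach: strip a leading "@" if present
by_slice = line[1..-1]         # new approach: drop the first character

# On a well-formed header both approaches give the same string.
same_on_valid = (by_regex == by_slice)

# On a line that lacks the marker they differ: the regex leaves the
# line untouched, while the slice drops the first character regardless.
bad = "read_1"
regex_on_bad = bad.sub(/^@/, '')
slice_on_bad = bad[1..-1]

# Arbitrary iteration count, chosen only to make the timings visible.
n = 100_000
Benchmark.bm(6) do |x|
  x.report("sub")   { n.times { line.sub(/^@/, '') } }
  x.report("slice") { n.times { line[1..-1] } }
end
```

The slice trusts that line 0 of each four-line record starts with `@` and line 2 starts with `+`. That holds for well-formed FASTQ input, but a malformed record would silently lose a character instead of passing through unchanged.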
@@ -17,5 +17,5 @@
  # along with parse_fasta. If not, see <http://www.gnu.org/licenses/>.
 
  module ParseFasta
-   VERSION = "1.8.1"
+   VERSION = "1.8.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: parse_fasta
  version: !ruby/object:Gem::Version
- version: 1.8.1
+ version: 1.8.2
  platform: ruby
  authors:
  - Ryan Moore
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-03-11 00:00:00.000000000 Z
+ date: 2016-04-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler