parse_fasta 1.8.1 → 1.8.2

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
  ---
  !binary "U0hBMQ==":
  metadata.gz: !binary |-
- YzlkNTQ5NGQ5YTFlNzVkOTJjOTJkMTM2YmUwN2FlMjhmOTg2ZDZlMQ==
+ MDE1YThkNzUyNzI0MTMwZDMyYzBiNzFiODMzZGQzNzQ5ODU3ZTk1MA==
  data.tar.gz: !binary |-
- OThhYTU5NTAzYzlkMTg2N2IxOWNjYTExMWEyODRiY2Q2OGFhMzQ4MQ==
+ NjBmZGUxZTdkM2UyZTQ4YWY1MDliMTI0OTJlYjA5ZDFmMzg4OWRlZQ==
  SHA512:
  metadata.gz: !binary |-
- NzMyYTNmZmQ0YThlMThkZmE3ZjZhZjAzNDM2MGQ4ZTcwODhkODY3NzI2NzU1
- NDQzYmU1ZDBiZjljYzVhZmNlMDIzZDMxMDc4Zjk3N2E1YTAxOTUzZTIyOGNj
- NzdjOWJiODA2ZDA0NGNmMjFkOGI1ZjgxZWY3NTRmMTQ1MDc5MTU=
+ MjIxNGNlODdkNTk3ZWE1ZDk1Zjg4ZDY0ZWE3NzE0ZWI0ODQ4MDZjZTk1MDY1
+ OGNiOGViOWYwZDU5ODY0YjNmZWY1ODYwOGVlN2E5MTVmYzZlZmIwNzE4MjNi
+ YWFjMjgzNGQ4YmMzODdjYjZjNTBmYTM4MWFiYTcyYjlmZWFhYWM=
  data.tar.gz: !binary |-
- ODU2NzMwZTk3ZmE0ZTIxYzMwOWVkMWUyY2U4MTE3YzAzMzI5MzU1ZDAzNWE3
- OGQ3ODk2ZjQwYTNjNTJlZTVjYzg3MGU5YzliZjAyYjQ4ZDNmNjRlNzE2YmJk
- MmY2OTRhYjI3NTM3ODFmYWYwNDk2ZjQ0YzI3YjIxMzI3MGU3MmE=
+ YTNiMTYzNmJhODkzMjEyMjBlOTgxOGIyMjFmMTFlOTE0NTEyOWZjNTgxMTRj
+ N2Y4YjE3NWUxMjYyNTRjNTYzZGE3MjBhNjJjZTNmNjRkYzY5ZGI2MGY0MjQz
+ N2RhMDUxN2E1MjY0NDZkOWQyMjEzYTU2ZDE4M2FlZDg3YzA0N2M=
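The keys and values in checksums.yaml are Base64-wrapped text: the key `U0hBMQ==` decodes to `SHA1`, and each value decodes to a hex digest. Below is a minimal Ruby sketch of reading them, using only core Ruby. The helper name `decode_checksum_field` is hypothetical and is not part of RubyGems.

```ruby
# Decode one Base64-wrapped field from checksums.yaml back to plain text.
# `decode_checksum_field` is a hypothetical helper for illustration only.
def decode_checksum_field(encoded)
  # SHA512 values are wrapped over several lines, so remove all
  # whitespace before decoding. The "m" directive decodes Base64.
  encoded.split.join.unpack1("m")
end

algorithm  = decode_checksum_field("U0hBMQ==")
old_digest = decode_checksum_field(
  "YzlkNTQ5NGQ5YTFlNzVkOTJjOTJkMTM2YmUwN2FlMjhmOTg2ZDZlMQ=="
)

puts algorithm          # name of the digest algorithm
puts old_digest.length  # a SHA1 hex digest is 40 characters long
```

Both digests change between 1.8.1 and 1.8.2 simply because the packaged metadata and data archives differ; the decoded values are what a checksum tool would compare against.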
data/README.md CHANGED
@@ -66,14 +66,11 @@ Read fasta file into a hash.
 
  ## Versions ##
 
- ### 1.8 ###
+ ### 1.8.2 ###
 
- Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
- `parse_fasta` doesn't check whether the seq is AA or NA, if called on
- an amino acid string, things will get weird as it will complement the
- IUPAC characters in the AA string and leave others.
+ Speed up `FastqFile#each_record`.
 
- #### 1.8.1 ####
+ ### 1.8.1 ###
 
  An error will be raised if a fasta file has a `>` in the
  sequence. Sometimes files are not terminated with a newline
@@ -93,12 +90,14 @@ This will raise `ParseFasta::SequenceFormatError`.
 
  Also, headers with lots of `>` within are fine now.
 
+ ### 1.8 ###
 
- ### 1.7 ###
-
- Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
+ Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
+ `parse_fasta` doesn't check whether the seq is AA or NA, if called on
+ an amino acid string, things will get weird as it will complement the
+ IUPAC characters in the AA string and leave others.
 
- #### 1.7.2 ####
+ ### 1.7.2 ###
 
  Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
 
@@ -108,24 +107,28 @@ there are spaces that don't match in the quality and sequence in a
  fastQ file, then things will get messed up in the FastQ file. FastQ
  shouldn't have spaces though.
 
- ### 1.6 ###
+ ### 1.7 ###
 
- Added `SeqFile` class, which accepts either fastA or fastQ files. It
- uses FastaFile and FastqFile internally. You can use this class if you
- want your scripts to accept either fastA or fastQ files.
+ Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
 
- If you need the description and quality string, you should use
- FastqFile instead.
+ ### 1.6.2 ###
+
+ `FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
+ that don't begin with a `>`.
 
- #### 1.6.1 ####
+ ### 1.6.1 ###
 
  Better internal handling of empty sequences -- instead of raising
  errors, pass empty sequences.
 
- #### 1.6.2 ####
+ ### 1.6 ###
 
- `FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
- that don't begin with a `>`.
+ Added `SeqFile` class, which accepts either fastA or fastQ files. It
+ uses FastaFile and FastqFile internally. You can use this class if you
+ want your scripts to accept either fastA or fastQ files.
+
+ If you need the description and quality string, you should use
+ FastqFile instead.
 
  ### 1.5 ###
 
@@ -204,17 +207,16 @@ Last version with File monkey patch.
 
  ## Benchmark ##
 
- Perhaps this isn't exactly fair since `BioRuby` is a big module with
- lots of features and error checking, whereas `parse_fasta` is meant to
- be lightweight and easy to use for my own research. Oh well ;)
+ **NOTE**: These benchmarks are against an older version of
+ `parse_fasta`.
+
+ Some quick and dirty benchmarks against `BioRuby`.
 
  ### FastaFile#each_record ###
 
- You're probably wondering...How does it compare to BioRuby in some
- super accurate benchmarking tests? Lucky for you, I calculated
- sequence length for each fasta record with both the `each_record`
- method from this gem and using the `FastaFormat` class from
- BioRuby. You can see the test script in `benchmark.rb`.
+ Calculating sequence length length for each fasta record with both the
+ `each_record` method from this gem and using the `FastaFormat` class
+ from BioRuby. You can see the test script in `benchmark.rb`.
 
  The test file contained 2,009,897 illumina reads and the file size
  was 1.1 gigabytes. Here are the results from Ruby's `Benchmark` class:
@@ -255,20 +257,10 @@ test 2 was 4,000,000 and test 3 was 8,000,000 bases.
 
  Nice!
 
- Troll: "But Ryan, when will you find the GC of an 8,000,000 base
- sequence?"
+ Troll: "When will you find the GC of an 8,000,000 base sequence?"
 
  Me: "Step off, troll!"
 
- ## Test suite & docs ##
-
- For a good time, you could clone this repo and run the test suite with
- rspec! Or if you just don't trust that it works like it should. The
- specs probably need a little clean up...so fork it and clean it up ;)
-
- Same with the docs. Clone the repo and build them yourself with `yard`
- if you are in need of some excitement.
-
  ## Notes ##
 
  Only the `SeqFile` class actually checks to make sure that you passed
@@ -80,11 +80,11 @@ class FastqFile < File
 
  case count % 4
  when 0
- header = line.sub(/^@/, '')
+ header = line[1..-1]
  when 1
  sequence = Sequence.new(line)
  when 2
- description = line.sub(/^\+/, '')
+ description = line[1..-1]
  when 3
  quality = Quality.new(line)
  yield(header, sequence, description, quality)
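The hunk above swaps a regex substitution for a plain slice when stripping the leading `@` and `+` markers, which is presumably where the `FastqFile#each_record` speed-up comes from. Below is a rough sketch comparing the two approaches on a made-up header line. The helper names are hypothetical and are not part of `parse_fasta`, and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# 1.8.1 style: remove a leading "@" only if one is present.
def header_via_sub(line)
  line.sub(/^@/, '')
end

# 1.8.2 style: drop the first character unconditionally.
def header_via_slice(line)
  line[1..-1]
end

line = "@read_1 sample"  # hypothetical FASTQ header line
iterations = 100_000

sub_time   = Benchmark.realtime { iterations.times { header_via_sub(line) } }
slice_time = Benchmark.realtime { iterations.times { header_via_slice(line) } }

puts format("sub: %.4fs  slice: %.4fs", sub_time, slice_time)
```

The two are only equivalent when the line really starts with the expected marker. On a line without it, `sub` leaves the text untouched while the slice still removes the first character, so the faster form relies on the input being well-formed FASTQ.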
@@ -17,5 +17,5 @@
  # along with parse_fasta. If not, see <http://www.gnu.org/licenses/>.
 
  module ParseFasta
- VERSION = "1.8.1"
+ VERSION = "1.8.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: parse_fasta
  version: !ruby/object:Gem::Version
- version: 1.8.1
+ version: 1.8.2
  platform: ruby
  authors:
  - Ryan Moore
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-03-11 00:00:00.000000000 Z
+ date: 2016-04-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler