parse_fasta 1.8.1 → 1.8.2
- checksums.yaml +8 -8
- data/README.md +31 -39
- data/lib/parse_fasta/fastq_file.rb +2 -2
- data/lib/parse_fasta/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
|
|
1
1
|
---
|
2
2
|
!binary "U0hBMQ==":
|
3
3
|
metadata.gz: !binary |-
|
4
|
-
|
4
|
+
MDE1YThkNzUyNzI0MTMwZDMyYzBiNzFiODMzZGQzNzQ5ODU3ZTk1MA==
|
5
5
|
data.tar.gz: !binary |-
|
6
|
-
|
6
|
+
NjBmZGUxZTdkM2UyZTQ4YWY1MDliMTI0OTJlYjA5ZDFmMzg4OWRlZQ==
|
7
7
|
SHA512:
|
8
8
|
metadata.gz: !binary |-
|
9
|
-
|
10
|
-
|
11
|
-
|
9
|
+
MjIxNGNlODdkNTk3ZWE1ZDk1Zjg4ZDY0ZWE3NzE0ZWI0ODQ4MDZjZTk1MDY1
|
10
|
+
OGNiOGViOWYwZDU5ODY0YjNmZWY1ODYwOGVlN2E5MTVmYzZlZmIwNzE4MjNi
|
11
|
+
YWFjMjgzNGQ4YmMzODdjYjZjNTBmYTM4MWFiYTcyYjlmZWFhYWM=
|
12
12
|
data.tar.gz: !binary |-
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
YTNiMTYzNmJhODkzMjEyMjBlOTgxOGIyMjFmMTFlOTE0NTEyOWZjNTgxMTRj
|
14
|
+
N2Y4YjE3NWUxMjYyNTRjNTYzZGE3MjBhNjJjZTNmNjRkYzY5ZGI2MGY0MjQz
|
15
|
+
N2RhMDUxN2E1MjY0NDZkOWQyMjEzYTU2ZDE4M2FlZDg3YzA0N2M=
|
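The `!binary` scalars in this hunk are Base64 text, so they can be decoded to see what actually changed. The snippet below is a minimal sketch using only core Ruby; the variable names are illustrative, and the two strings are copied from the added lines above.

```ruby
# Values copied from the checksums.yaml hunk above.
algorithm_key = "U0hBMQ=="
metadata_sum  = "MDE1YThkNzUyNzI0MTMwZDMyYzBiNzFiODMzZGQzNzQ5ODU3ZTk1MA=="

# String#unpack1("m") decodes Base64 with no extra library required.
algorithm = algorithm_key.unpack1("m")
digest    = metadata_sum.unpack1("m")

puts algorithm      # name of the digest algorithm used as the YAML key
puts digest         # hex digest recorded for metadata.gz
puts digest.length  # number of hex characters in the digest
```

The decoded key names the digest algorithm, and the decoded value is a hex string whose length matches that algorithm's digest size.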
data/README.md
CHANGED
@@ -66,14 +66,11 @@ Read fasta file into a hash.
|
|
66
66
|
|
67
67
|
## Versions ##
|
68
68
|
|
69
|
-
### 1.8 ###
|
69
|
+
### 1.8.2 ###
|
70
70
|
|
71
|
-
|
72
|
-
`parse_fasta` doesn't check whether the seq is AA or NA, if called on
|
73
|
-
an amino acid string, things will get weird as it will complement the
|
74
|
-
IUPAC characters in the AA string and leave others.
|
71
|
+
Speed up `FastqFile#each_record`.
|
75
72
|
|
76
|
-
|
73
|
+
### 1.8.1 ###
|
77
74
|
|
78
75
|
An error will be raised if a fasta file has a `>` in the
|
79
76
|
sequence. Sometimes files are not terminated with a newline
|
@@ -93,12 +90,14 @@ This will raise `ParseFasta::SequenceFormatError`.
|
|
93
90
|
|
94
91
|
Also, headers with lots of `>` within are fine now.
|
95
92
|
|
93
|
+
### 1.8 ###
|
96
94
|
|
97
|
-
|
98
|
-
|
99
|
-
|
95
|
+
Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
|
96
|
+
`parse_fasta` doesn't check whether the seq is AA or NA, if called on
|
97
|
+
an amino acid string, things will get weird as it will complement the
|
98
|
+
IUPAC characters in the AA string and leave others.
|
100
99
|
|
101
|
-
|
100
|
+
### 1.7.2 ###
|
102
101
|
|
103
102
|
Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
|
104
103
|
|
@@ -108,24 +107,28 @@ there are spaces that don't match in the quality and sequence in a
|
|
108
107
|
fastQ file, then things will get messed up in the FastQ file. FastQ
|
109
108
|
shouldn't have spaces though.
|
110
109
|
|
111
|
-
### 1.
|
110
|
+
### 1.7 ###
|
112
111
|
|
113
|
-
|
114
|
-
uses FastaFile and FastqFile internally. You can use this class if you
|
115
|
-
want your scripts to accept either fastA or fastQ files.
|
112
|
+
Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
|
116
113
|
|
117
|
-
|
118
|
-
|
114
|
+
### 1.6.2 ###
|
115
|
+
|
116
|
+
`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
|
117
|
+
that don't begin with a `>`.
|
119
118
|
|
120
|
-
|
119
|
+
### 1.6.1 ###
|
121
120
|
|
122
121
|
Better internal handling of empty sequences -- instead of raising
|
123
122
|
errors, pass empty sequences.
|
124
123
|
|
125
|
-
|
124
|
+
### 1.6 ###
|
126
125
|
|
127
|
-
`
|
128
|
-
|
126
|
+
Added `SeqFile` class, which accepts either fastA or fastQ files. It
|
127
|
+
uses FastaFile and FastqFile internally. You can use this class if you
|
128
|
+
want your scripts to accept either fastA or fastQ files.
|
129
|
+
|
130
|
+
If you need the description and quality string, you should use
|
131
|
+
FastqFile instead.
|
129
132
|
|
130
133
|
### 1.5 ###
|
131
134
|
|
@@ -204,17 +207,16 @@ Last version with File monkey patch.
|
|
204
207
|
|
205
208
|
## Benchmark ##
|
206
209
|
|
207
|
-
|
208
|
-
|
209
|
-
|
210
|
+
**NOTE**: These benchmarks are against an older version of
|
211
|
+
`parse_fasta`.
|
212
|
+
|
213
|
+
Some quick and dirty benchmarks against `BioRuby`.
|
210
214
|
|
211
215
|
### FastaFile#each_record ###
|
212
216
|
|
213
|
-
|
214
|
-
|
215
|
-
|
216
|
-
method from this gem and using the `FastaFormat` class from
|
217
|
-
BioRuby. You can see the test script in `benchmark.rb`.
|
217
|
+
Calculating sequence length length for each fasta record with both the
|
218
|
+
`each_record` method from this gem and using the `FastaFormat` class
|
219
|
+
from BioRuby. You can see the test script in `benchmark.rb`.
|
218
220
|
|
219
221
|
The test file contained 2,009,897 illumina reads and the file size
|
220
222
|
was 1.1 gigabytes. Here are the results from Ruby's `Benchmark` class:
|
@@ -255,20 +257,10 @@ test 2 was 4,000,000 and test 3 was 8,000,000 bases.
|
|
255
257
|
|
256
258
|
Nice!
|
257
259
|
|
258
|
-
Troll: "
|
259
|
-
sequence?"
|
260
|
+
Troll: "When will you find the GC of an 8,000,000 base sequence?"
|
260
261
|
|
261
262
|
Me: "Step off, troll!"
|
262
263
|
|
263
|
-
## Test suite & docs ##
|
264
|
-
|
265
|
-
For a good time, you could clone this repo and run the test suite with
|
266
|
-
rspec! Or if you just don't trust that it works like it should. The
|
267
|
-
specs probably need a little clean up...so fork it and clean it up ;)
|
268
|
-
|
269
|
-
Same with the docs. Clone the repo and build them yourself with `yard`
|
270
|
-
if you are in need of some excitement.
|
271
|
-
|
272
264
|
## Notes ##
|
273
265
|
|
274
266
|
Only the `SeqFile` class actually checks to make sure that you passed
|
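The 1.8 changelog entry in the README hunk above says `Sequence#rev_comp` handles IUPAC characters and does not check whether the input is nucleotide or amino acid. The following is a rough sketch of that behaviour, not the gem's implementation: `rev_comp_sketch` and the two constants are hypothetical names, and the mapping is the standard IUPAC nucleotide complement table.

```ruby
# Hypothetical helper illustrating the idea; not the gem's Sequence#rev_comp.
IUPAC_FROM = "ACGTURYSWKMBDHVNacgturyswkmbdhvn"
IUPAC_TO   = "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn"

def rev_comp_sketch(seq)
  # Reverse the string, then swap each IUPAC code for its complement.
  # Characters outside the table are left untouched.
  seq.reverse.tr(IUPAC_FROM, IUPAC_TO)
end

puts rev_comp_sketch("ATGCN")  # nucleotide input
puts rev_comp_sketch("MEL")    # amino acid input: only "M" is an IUPAC code
```

With the amino acid input, only the letter that happens to be an IUPAC nucleotide code is complemented, which is the "things will get weird" case the changelog warns about.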
data/lib/parse_fasta/fastq_file.rb
CHANGED
@@ -80,11 +80,11 @@ class FastqFile < File
|
|
80
80
|
|
81
81
|
case count % 4
|
82
82
|
when 0
|
83
|
-
header = line
|
83
|
+
header = line[1..-1]
|
84
84
|
when 1
|
85
85
|
sequence = Sequence.new(line)
|
86
86
|
when 2
|
87
|
-
description = line
|
87
|
+
description = line[1..-1]
|
88
88
|
when 3
|
89
89
|
quality = Quality.new(line)
|
90
90
|
yield(header, sequence, description, quality)
|
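The hunk above assigns `line[1..-1]` for the header and description lines, which drops the first character (`@` and `+` respectively in a FASTQ record). A self-contained sketch of the same four-line loop follows. It assumes well-formed four-line records, uses plain strings instead of the gem's `Sequence` and `Quality` classes, and `each_fastq_record` is a hypothetical name.

```ruby
require "stringio"

# Hypothetical stand-in for the loop in the hunk above; not the gem's code.
def each_fastq_record(io)
  return enum_for(:each_fastq_record, io) unless block_given?

  header = sequence = description = nil
  io.each_line.with_index do |raw, count|
    line = raw.chomp
    case count % 4
    when 0 then header = line[1..-1]       # drop the leading "@"
    when 1 then sequence = line
    when 2 then description = line[1..-1]  # drop the leading "+"
    when 3 then yield(header, sequence, description, line)
    end
  end
end

fastq = StringIO.new("@read1 sample\nACTG\n+read1 sample\nIIII\n")
records = each_fastq_record(fastq).to_a
records.each { |rec| p rec }
```

Note that slicing with `[1..-1]` on a bare `+` line gives an empty string, while an empty line would give `nil`, so malformed input needs separate handling.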
data/lib/parse_fasta/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: parse_fasta
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.8.
|
4
|
+
version: 1.8.2
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ryan Moore
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2016-
|
11
|
+
date: 2016-04-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|