parse_fasta 1.8.1 → 1.8.2
- checksums.yaml +8 -8
- data/README.md +31 -39
- data/lib/parse_fasta/fastq_file.rb +2 -2
- data/lib/parse_fasta/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
   metadata.gz: !binary |-
-
+    MDE1YThkNzUyNzI0MTMwZDMyYzBiNzFiODMzZGQzNzQ5ODU3ZTk1MA==
   data.tar.gz: !binary |-
-
+    NjBmZGUxZTdkM2UyZTQ4YWY1MDliMTI0OTJlYjA5ZDFmMzg4OWRlZQ==
 SHA512:
   metadata.gz: !binary |-
-
-
-
+    MjIxNGNlODdkNTk3ZWE1ZDk1Zjg4ZDY0ZWE3NzE0ZWI0ODQ4MDZjZTk1MDY1
+    OGNiOGViOWYwZDU5ODY0YjNmZWY1ODYwOGVlN2E5MTVmYzZlZmIwNzE4MjNi
+    YWFjMjgzNGQ4YmMzODdjYjZjNTBmYTM4MWFiYTcyYjlmZWFhYWM=
   data.tar.gz: !binary |-
-
-
-
+    YTNiMTYzNmJhODkzMjEyMjBlOTgxOGIyMjFmMTFlOTE0NTEyOWZjNTgxMTRj
+    N2Y4YjE3NWUxMjYyNTRjNTYzZGE3MjBhNjJjZTNmNjRkYzY5ZGI2MGY0MjQz
+    N2RhMDUxN2E1MjY0NDZkOWQyMjEzYTU2ZDE4M2FlZDg3YzA0N2M=
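The `!binary` values in `checksums.yaml` are base64 text. As a rough sketch of how to read them (the variable names here are illustrative, not part of RubyGems), the key `U0hBMQ==` and the new `metadata.gz` value can be decoded with Ruby's core `String#unpack1`:

```ruby
# Decode the base64 (!binary) values shown in the checksums.yaml diff.
# "m" is the base64 directive for String#unpack1 (Ruby core, no require needed).
algorithm = "U0hBMQ==".unpack1("m")

# New metadata.gz value from the diff; it decodes to a hex digest string.
metadata_digest =
  "MDE1YThkNzUyNzI0MTMwZDMyYzBiNzFiODMzZGQzNzQ5ODU3ZTk1MA==".unpack1("m")

puts algorithm               # name of the digest algorithm
puts metadata_digest.length  # number of hex characters in the digest
```

The key decodes to the algorithm name, and the value decodes to a 40-character hex string, which is the length of a SHA-1 digest.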
data/README.md
CHANGED
@@ -66,14 +66,11 @@ Read fasta file into a hash.
 
 ## Versions ##
 
-### 1.8 ###
+### 1.8.2 ###
 
-
-`parse_fasta` doesn't check whether the seq is AA or NA, if called on
-an amino acid string, things will get weird as it will complement the
-IUPAC characters in the AA string and leave others.
+Speed up `FastqFile#each_record`.
 
-
+### 1.8.1 ###
 
 An error will be raised if a fasta file has a `>` in the
 sequence. Sometimes files are not terminated with a newline
@@ -93,12 +90,14 @@ This will raise `ParseFasta::SequenceFormatError`.
 
 Also, headers with lots of `>` within are fine now.
 
+### 1.8 ###
 
-
-
-
+Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
+`parse_fasta` doesn't check whether the seq is AA or NA, if called on
+an amino acid string, things will get weird as it will complement the
+IUPAC characters in the AA string and leave others.
 
-
+### 1.7.2 ###
 
 Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
 
@@ -108,24 +107,28 @@ there are spaces that don't match in the quality and sequence in a
 fastQ file, then things will get messed up in the FastQ file. FastQ
 shouldn't have spaces though.
 
-### 1.
+### 1.7 ###
 
-
-uses FastaFile and FastqFile internally. You can use this class if you
-want your scripts to accept either fastA or fastQ files.
+Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
 
-
-
+### 1.6.2 ###
+
+`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
+that don't begin with a `>`.
 
-
+### 1.6.1 ###
 
 Better internal handling of empty sequences -- instead of raising
 errors, pass empty sequences.
 
-
+### 1.6 ###
 
-`
-
+Added `SeqFile` class, which accepts either fastA or fastQ files. It
+uses FastaFile and FastqFile internally. You can use this class if you
+want your scripts to accept either fastA or fastQ files.
+
+If you need the description and quality string, you should use
+FastqFile instead.
 
 ### 1.5 ###
 
@@ -204,17 +207,16 @@ Last version with File monkey patch.
 
 ## Benchmark ##
 
-
-
-
+**NOTE**: These benchmarks are against an older version of
+`parse_fasta`.
+
+Some quick and dirty benchmarks against `BioRuby`.
 
 ### FastaFile#each_record ###
 
-
-
-
-method from this gem and using the `FastaFormat` class from
-BioRuby. You can see the test script in `benchmark.rb`.
+Calculating sequence length length for each fasta record with both the
+`each_record` method from this gem and using the `FastaFormat` class
+from BioRuby. You can see the test script in `benchmark.rb`.
 
 The test file contained 2,009,897 illumina reads and the file size
 was 1.1 gigabytes. Here are the results from Ruby's `Benchmark` class:
@@ -255,20 +257,10 @@ test 2 was 4,000,000 and test 3 was 8,000,000 bases.
 
 Nice!
 
-Troll: "
-sequence?"
+Troll: "When will you find the GC of an 8,000,000 base sequence?"
 
 Me: "Step off, troll!"
 
-## Test suite & docs ##
-
-For a good time, you could clone this repo and run the test suite with
-rspec! Or if you just don't trust that it works like it should. The
-specs probably need a little clean up...so fork it and clean it up ;)
-
-Same with the docs. Clone the repo and build them yourself with `yard`
-if you are in need of some excitement.
-
 ## Notes ##
 
 Only the `SeqFile` class actually checks to make sure that you passed
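The 1.8 entry in the README diff above warns that `Sequence#rev_comp` behaves oddly on amino acid strings. A minimal sketch of why, assuming a reverse complement built on `String#tr` with an IUPAC mapping (the helper `rev_comp_sketch` and its constants are hypothetical; the gem's own implementation may differ):

```ruby
# Hypothetical stand-alone helper, NOT parse_fasta's Sequence#rev_comp.
# Maps each IUPAC nucleotide code to its complement; other characters
# pass through String#tr unchanged.
IUPAC_FROM = "ACGTUWSMKRYBDHVNacgtuwsmkrybdhvn"
IUPAC_TO   = "TGCAAWSKMYRVHDBNtgcaawskmyrvhdbn"

def rev_comp_sketch(seq)
  seq.reverse.tr(IUPAC_FROM, IUPAC_TO)
end

dna     = rev_comp_sketch("ATGCN")  # nucleotide input  => "NGCAT"
protein = rev_comp_sketch("MKVLE")  # amino acid input  => "ELBMK"
```

In the amino acid case, `M`, `K` and `V` are also IUPAC nucleotide codes and get swapped, while `L` and `E` are left alone, which matches the "complement the IUPAC characters and leave others" behaviour the README describes.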
data/lib/parse_fasta/fastq_file.rb
CHANGED
@@ -80,11 +80,11 @@ class FastqFile < File
 
 case count % 4
 when 0
-  header = line
+  header = line[1..-1]
 when 1
   sequence = Sequence.new(line)
 when 2
-  description = line
+  description = line[1..-1]
 when 3
   quality = Quality.new(line)
   yield(header, sequence, description, quality)
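To make the hunk above easier to follow, here is a minimal stand-alone sketch of the same four-line state machine. It assumes plain strings in place of the gem's `Sequence` and `Quality` classes, and `each_fastq_record` is a hypothetical helper name, not part of `parse_fasta`:

```ruby
require "stringio"

# Hypothetical helper: walks a FASTQ stream four lines at a time and
# yields header, sequence, description and quality as plain strings.
def each_fastq_record(io)
  header = sequence = description = nil
  io.each_line.with_index do |raw, count|
    line = raw.chomp
    case count % 4
    when 0
      header = line[1..-1]       # slice off the leading "@"
    when 1
      sequence = line
    when 2
      description = line[1..-1]  # slice off the leading "+"
    when 3
      yield(header, sequence, description, line)
    end
  end
end

fastq = StringIO.new("@read1 sample\nACGT\n+\nIIII\n")
records = []
each_fastq_record(fastq) { |h, s, d, q| records << [h, s, d, q] }
```

Slicing with `line[1..-1]` simply drops the first character, so a bare `+` description line becomes an empty string.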
data/lib/parse_fasta/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: parse_fasta
 version: !ruby/object:Gem::Version
-  version: 1.8.
+  version: 1.8.2
 platform: ruby
 authors:
 - Ryan Moore
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-04-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler