bio-gff3-pltools 0.1.0 → 0.2.0
- data/README.md +20 -188
- data/VERSION +1 -1
- data/lib/bio-gff3-pltools.rb +1 -0
- data/lib/bio-gff3-pltools/filtering.rb +24 -4
- data/lib/bio-gff3-pltools/validation.rb +17 -0
- metadata +6 -3
data/README.md
CHANGED
@@ -1,75 +1,52 @@
-# gff3-pltools
+# bioruby-gff3-pltools
 
-[![Build Status](https://secure.travis-ci.org/mamarjan/gff3-pltools.png)](http://travis-ci.org/mamarjan/gff3-pltools)
+[![Build Status](https://secure.travis-ci.org/mamarjan/bioruby-gff3-pltools.png)](http://travis-ci.org/mamarjan/bioruby-gff3-pltools)
 
 Note: this software is under active development!
 
-This is currently an early work in progress to create
-
-programmers use those tools from Ruby.
+This is currently an early work in progress to create a wrapper
+library for gff3-pltools.
 
 ## Installation
 
 ### Requirements
 
-
-
-To build the tools from source, you'll need the DMDv2 compiler in
-your path. You can check here if there is a build of DMD available
-for your platform:
-
-http://dlang.org/download.html
-
-Also, the rake utility is necessary to run the automated build
-scripts.
+gff3-pltools have to be installed and in the PATH. Ruby is a requirement
+too, and the rest can be installed using bundler.
 
 ### Build and install instructions
 
-
-
+Given the gff3-tools are installed, the gem command can be used to
+install the latest gem from rubygems.org:
 
 ```sh
-gem install bio-gff3-pltools
+gem install bio-gff3-pltools
 ```
 
-
-it themselves given the DMD compiler is available for their platform.
-
-To build and install a gem for your platform, use the following steps:
+To build and install the library from source, use the following steps:
 
 ```sh
-tar -zxvf
-cd
+tar -zxvf bioruby-gff3-pltools-X.Y.Z.tar.gz
+cd bioruby-gff3-pltools-X.Y.Z
+bundle install
 rake install
 ```
 
 To build a gem without installing, use the rake task "build" instead
-of install in the previous example.
-
-To build the binary tools without building a gem or a Ruby library,
-invoke the "utilities" rake task instead and copy the binaries from
-the "bin/" directory to your PATH.
+of "install" in the previous example.
 
 ### Run tests
 
-
-
-
-rake unittests
-```
-
-To run tests for the Ruby library, first build the D utilities and
-then start the "features" rake task, like this:
+To run cucumber tests, first make sure the D utilities
+are available in the path and then start the "features" rake task,
+like this:
 
 ```sh
-rake utilities
 rake features
 ```
 
 ## Usage
 
-### Ruby library
-
 To use the library in your code, after installing the gem, simply
 require the library:
 
@@ -79,165 +56,20 @@ require the library:
 
 The API docs are online:
 
-http://mamarjan.github.com/gff3-pltools/docs/0.
+http://mamarjan.github.com/bioruby-gff3-pltools/docs/0.2.0/ruby-api/
 
 For more code examples see the test files in the source tree.
 
-### gff3-ffetch utility
-
-Currently this utility supports only filtering a file, based on a
-filtering expression. For example, you can use the following command
-to filter out records with a CDS feature from a GFF3 file:
-
-```sh
-gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
-```
-
-The utility will use the fast (and soon parallel) D library to do the
-parsing and filtering. You can then parse the result using your
-programming language and library of choice.
-
-Currently supported predicates are "field", "attribute", "equals",
-"contains", "starts_with" and "not". You can combine them in a way
-that makes sense. First, the utility needs to know what field or
-attribute should be used for filtering. In the previous example,
-that's the "field:feature" part. Next, the utility needs to know
-what you want to do with it. In the example, that's the "equals"
-part. And then the last part in the example is a parameter to the
-"equals", which tells the utility what the attribute or field
-should be compared to.
-
-Parts of the expression are separated by a colon, ':', and if colon
-is suposed to be part of a field name or value, it can be escaped
-like this: "\\:".
-
-Valid field names are: seqname, source, feature, start, end, score,
-strand and phase.
-
-A few more examples...
-
-```sh
-gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
-```
-
-The previous example chooses records which have the ID attribute
-with the value gene1.
-
-To see which records have no ID value, or ID which is an empty
-string, use the following command:
-
-```sh
-gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
-```
-
-And to get records which have the ID attribute defined, you can use
-this command:
-
-```sh
-gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
-```
-
-or
-
-```sh
-gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
-```
-
-However, the last two commands are not completely the same. In cases
-where an attribute has multiple values, the Parent attribute for
-example, the "attribute" predicate first runs the contained predicate
-on all attribute's values and returns true when an operation
-returns true for a parent value. That is, it has an implicit "and"
-operation built-in.
-
-There are a few more options available. In the examples above, the
-data was comming from a GFF3 file which was specified on the command
-line and the output was the screen. To use the standard input as the
-source of the data, use "-" instead of a filename.
-
-The default for output is the screen, or stdout. To redirect the
-output to a file, you can use the "--output" option. Here is an
-example:
-
-```sh
-gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
-```
-
-To limit the number of records in the results, you can use the
-"--at-most" option. For example:
-
-```sh
-gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
-```
-
-If there are more then a 1000 records in the results, after the
-1000th record printed, a line is appended with the following content:
-"# ..." and the utility terminates.
-
-### GFF3 File validation
-
-The validation utility can be used like this:
-
-```sh
-./gff3-validate path/to/file.gff3
-```
-
-It will output any errors it finds to standard output. However, the
-validation utility is currently very basic, and checks only for a few
-cases: the number of columns, characters that should have been
-escaped, are the start and stop coordinates integers and if the end
-is greater then start, whether score is a float, valid values for
-strand and phase, and the format of attributes.
-
-### Benchmarking utility
-
-There is a D application for performance benchmarking.
-You can run it like this:
-
-```sh
-./gff3-benchmark path/to/file.gff3
-```
-
-The most basic case for the banchmarking utility is to parse the
-file into records. More functionality is available using command
-line options:
-
-```
--v turn on validation
--r turn on replacement of escaped characters
--f merge records into features
--c N feature cache size (how many features to keep in memory), default=1000
--l link feature into parent-child relationships
-```
-
-Before exiting the utility prints the number of records or features
-it parsed.
-
-### Counting features
-
-The gff3-ffetch utility keeps only a small part of records in memory
-while combining them into features. To check if the cache size is
-correct, the "gff3-count-features" utility can be used to get the
-correct number of features in a file. It gets all the IDs into
-memory first, and then devises the correct number of features.
-
-To get the correct number of features in a file, use the following
-command:
-
-```sh
-./gff3-count-features path/to/file.gff3
-```
-
 ## Project home page
 
 Project home page can be found at the following location:
 
-http://mamarjan.github.com/gff3-pltools/
+http://mamarjan.github.com/bioruby-gff3-pltools/
 
 For information on the source tree, issues and
 how to contribute, see
 
-http://github.com/mamarjan/gff3-pltools
+http://github.com/mamarjan/bioruby-gff3-pltools
 
 The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
 
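The README text removed in this hunk describes gff3-ffetch filter expressions as colon-separated parts, with a literal colon escaped by a backslash. As a rough illustration only, the Ruby sketch below assembles such an expression. `filter_expression` is a hypothetical helper, not part of bio-gff3-pltools, and the exact escape form (`\:`) is an assumption read from the removed text.

```ruby
# Hypothetical helper, not part of bio-gff3-pltools.
# Joins expression parts with ':' and escapes any ':' inside a part
# with a backslash (assumed escape form, taken from the removed README text).
def filter_expression(*parts)
  # The block form of gsub inserts the replacement literally.
  parts.map { |part| part.to_s.gsub(":") { "\\:" } }.join(":")
end

puts filter_expression("field", "feature", "equals", "CDS")
puts filter_expression("attribute", "ID", "equals", "")
puts filter_expression("attribute", "ID", "equals", "gene:1")
```

A string built this way is the kind of value passed as `filter_string` to the methods in filtering.rb. Because those methods assemble a shell command line, shell handling of backslashes would need separate care.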
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.2.0
data/lib/bio-gff3-pltools.rb
CHANGED
data/lib/bio-gff3-pltools/filtering.rb
CHANGED
@@ -2,7 +2,8 @@ module Bio
   module PL
     module GFF3
       # Runs the gff3-ffetch utility with the specified parameters on
-      # an external file. Options include :output
+      # an external file. Options include :output, :at_most,
+      # :pass_fasta_through, :keep_comments, :keep_pragmas
       def self.filter_file filename, filter_string, options = {}
         if !File.exists?(filename)
           raise Exception.new("No such file - #{filename}")
@@ -16,7 +17,16 @@ module Bio
         if !options[:at_most].nil?
           at_most_option = "--at-most #{options[:at_most]}"
         end
-
+        if options[:pass_fasta_through]
+          fasta_option = "--pass-fasta-through"
+        end
+        if options[:keep_comments]
+          comments_option = "--keep-comments"
+        end
+        if options[:keep_pragmas]
+          pragmas_option = "--keep-pragmas"
+        end
+        gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}")
         if output_option.nil?
           output = gff3_ffetch.read
         end
@@ -25,7 +35,8 @@ module Bio
       end
 
       # Runs the gff3-ffetch utility with the specified parameters while
-      # passing data to its stdin. Options include :output and :at_most
+      # passing data to its stdin. Options include :output and :at_most,
+      # :pass_fasta_through, :keep_comments, :keep_pragmas
       def self.filter_data data, filter_string, options = {}
         output_option = nil
         output = nil
@@ -35,7 +46,16 @@ module Bio
         if !options[:at_most].nil?
           at_most_option = "--at-most #{options[:at_most]}"
         end
-
+        if options[:pass_fasta_through]
+          fasta_option = "--pass-fasta-through"
+        end
+        if options[:keep_comments]
+          comments_option = "--keep-comments"
+        end
+        if options[:keep_pragmas]
+          pragmas_option = "--keep-pragmas"
+        end
+        gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}", "r+")
         gff3_ffetch.write data
         gff3_ffetch.close_write
         if output_option.nil?
data/lib/bio-gff3-pltools/validation.rb
ADDED
@@ -0,0 +1,17 @@
+module Bio
+  module PL
+    module GFF3
+      def self.validate_file filename
+        if !File.exists?(filename)
+          raise Exception.new("No such file - #{filename}")
+        end
+
+        gff3_validate = IO.popen(["gff3-validate", "#{filename}", :err=>[:child, :out]])
+        output = gff3_validate.read
+        gff3_validate.close
+        output
+      end
+    end
+  end
+end
+
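`validate_file` uses the array form of `IO.popen` with `:err=>[:child, :out]`, which sends the child's stderr to the same pipe as its stdout. The sketch below demonstrates that one mechanism without needing gff3-validate installed. `capture_merged` is a hypothetical helper, and the child process is just the running Ruby interpreter.

```ruby
require "rbconfig"

# Hypothetical helper for illustration only: runs a command given as an
# argv array (no shell involved) and returns stdout and stderr merged
# into one string.
def capture_merged(*argv)
  # A trailing hash inside the command array is treated as spawn options.
  IO.popen([*argv, { :err => [:child, :out] }]) do |io|
    io.read
  end
end

script = "STDOUT.sync = true; puts 'to stdout'; warn 'to stderr'"
merged = capture_merged(RbConfig.ruby, "-e", script)
puts merged
```

Both lines arrive in `merged`, which is why `validate_file` can return the validator's messages whichever stream they are written to. Separately, `File.exists?` as used in the diff was removed in Ruby 3.2; `File.exist?` is the current spelling.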
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-gff3-pltools
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.2.0
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-07-
+date: 2012-07-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rspec
@@ -150,6 +150,7 @@ files:
 - VERSION
 - lib/bio-gff3-pltools.rb
 - lib/bio-gff3-pltools/filtering.rb
+- lib/bio-gff3-pltools/validation.rb
 - LICENSE.txt
 - README.md
 homepage: http://mamarjan.github.com/gff3-pltools/
@@ -165,6 +166,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
 - - ! '>='
 - !ruby/object:Gem::Version
 version: '0'
+segments:
+- 0
+hash: -1031554001
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements:
@@ -178,4 +182,3 @@ signing_key:
 specification_version: 3
 summary: Ruby wrapper for the gff3-pltools.
 test_files: []
-has_rdoc: