bio-gff3-pltools 0.1.0 → 0.2.0
- data/README.md +20 -188
- data/VERSION +1 -1
- data/lib/bio-gff3-pltools.rb +1 -0
- data/lib/bio-gff3-pltools/filtering.rb +24 -4
- data/lib/bio-gff3-pltools/validation.rb +17 -0
- metadata +6 -3
data/README.md
CHANGED
@@ -1,75 +1,52 @@
-# gff3-pltools
+# bioruby-gff3-pltools
 
-[](http://travis-ci.org/mamarjan/gff3-pltools)
+[](http://travis-ci.org/mamarjan/bioruby-gff3-pltools)
 
 Note: this software is under active development!
 
-This is currently an early work in progress to create
-
-programmers use those tools from Ruby.
+This is currently an early work in progress to create a wrapper
+library for gff3-pltools.
 
 ## Installation
 
 ### Requirements
 
-
-
-To build the tools from source, you'll need the DMDv2 compiler in
-your path. You can check here if there is a build of DMD available
-for your platform:
-
-http://dlang.org/download.html
-
-Also, the rake utility is necessary to run the automated build
-scripts.
+gff3-pltools have to be installed and in the PATH. Ruby is a requirement
+too, and the rest can be installed using bundler.
 
 ### Build and install instructions
 
-
-
+Given the gff3-tools are installed, the gem command can be used to
+install the latest gem from rubygems.org:
 
 ```sh
-gem install bio-gff3-pltools
+gem install bio-gff3-pltools
 ```
 
-
-it themselves given the DMD compiler is available for their platform.
-
-To build and install a gem for your platform, use the following steps:
+To build and install the library from source, use the following steps:
 
 ```sh
-tar -zxvf
-cd
+tar -zxvf bioruby-gff3-pltools-X.Y.Z.tar.gz
+cd bioruby-gff3-pltools-X.Y.Z
+bundle install
 rake install
 ```
 
 To build a gem without installing, use the rake task "build" instead
-of install in the previous example.
-
-To build the binary tools without building a gem or a Ruby library,
-invoke the "utilities" rake task instead and copy the binaries from
-the "bin/" directory to your PATH.
+of "install" in the previous example.
 
 ### Run tests
 
-
-
-
-rake unittests
-```
-
-To run tests for the Ruby library, first build the D utilities and
-then start the "features" rake task, like this:
+To run cucumber tests, first make sure the D utilities
+are available in the path and then start the "features" rake task,
+like this:
 
 ```sh
-rake utilities
 rake features
 ```
 
 ## Usage
 
-### Ruby library
-
 To use the library in your code, after installing the gem, simply
 require the library:
 
@@ -79,165 +56,20 @@ require the library:
 
 The API docs are online:
 
-http://mamarjan.github.com/gff3-pltools/docs/0.
+http://mamarjan.github.com/bioruby-gff3-pltools/docs/0.2.0/ruby-api/
 
 For more code examples see the test files in the source tree.
 
-### gff3-ffetch utility
-
-Currently this utility supports only filtering a file, based on a
-filtering expression. For example, you can use the following command
-to filter out records with a CDS feature from a GFF3 file:
-
-```sh
-gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
-```
-
-The utility will use the fast (and soon parallel) D library to do the
-parsing and filtering. You can then parse the result using your
-programming language and library of choice.
-
-Currently supported predicates are "field", "attribute", "equals",
-"contains", "starts_with" and "not". You can combine them in a way
-that makes sense. First, the utility needs to know what field or
-attribute should be used for filtering. In the previous example,
-that's the "field:feature" part. Next, the utility needs to know
-what you want to do with it. In the example, that's the "equals"
-part. And then the last part in the example is a parameter to the
-"equals", which tells the utility what the attribute or field
-should be compared to.
-
-Parts of the expression are separated by a colon, ':', and if colon
-is suposed to be part of a field name or value, it can be escaped
-like this: "\\:".
-
-Valid field names are: seqname, source, feature, start, end, score,
-strand and phase.
-
-A few more examples...
-
-```sh
-gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
-```
-
-The previous example chooses records which have the ID attribute
-with the value gene1.
-
-To see which records have no ID value, or ID which is an empty
-string, use the following command:
-
-```sh
-gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
-```
-
-And to get records which have the ID attribute defined, you can use
-this command:
-
-```sh
-gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
-```
-
-or
-
-```sh
-gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
-```
-
-However, the last two commands are not completely the same. In cases
-where an attribute has multiple values, the Parent attribute for
-example, the "attribute" predicate first runs the contained predicate
-on all attribute's values and returns true when an operation
-returns true for a parent value. That is, it has an implicit "and"
-operation built-in.
-
-There are a few more options available. In the examples above, the
-data was comming from a GFF3 file which was specified on the command
-line and the output was the screen. To use the standard input as the
-source of the data, use "-" instead of a filename.
-
-The default for output is the screen, or stdout. To redirect the
-output to a file, you can use the "--output" option. Here is an
-example:
-
-```sh
-gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
-```
-
-To limit the number of records in the results, you can use the
-"--at-most" option. For example:
-
-```sh
-gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
-```
-
-If there are more then a 1000 records in the results, after the
-1000th record printed, a line is appended with the following content:
-"# ..." and the utility terminates.
-
-### GFF3 File validation
-
-The validation utility can be used like this:
-
-```sh
-./gff3-validate path/to/file.gff3
-```
-
-It will output any errors it finds to standard output. However, the
-validation utility is currently very basic, and checks only for a few
-cases: the number of columns, characters that should have been
-escaped, are the start and stop coordinates integers and if the end
-is greater then start, whether score is a float, valid values for
-strand and phase, and the format of attributes.
-
-### Benchmarking utility
-
-There is a D application for performance benchmarking.
-You can run it like this:
-
-```sh
-./gff3-benchmark path/to/file.gff3
-```
-
-The most basic case for the banchmarking utility is to parse the
-file into records. More functionality is available using command
-line options:
-
-```
--v turn on validation
--r turn on replacement of escaped characters
--f merge records into features
--c N feature cache size (how many features to keep in memory), default=1000
--l link feature into parent-child relationships
-```
-
-Before exiting the utility prints the number of records or features
-it parsed.
-
-### Counting features
-
-The gff3-ffetch utility keeps only a small part of records in memory
-while combining them into features. To check if the cache size is
-correct, the "gff3-count-features" utility can be used to get the
-correct number of features in a file. It gets all the IDs into
-memory first, and then devises the correct number of features.
-
-To get the correct number of features in a file, use the following
-command:
-
-```sh
-./gff3-count-features path/to/file.gff3
-```
-
 ## Project home page
 
 Project home page can be found at the following location:
 
-http://mamarjan.github.com/gff3-pltools/
+http://mamarjan.github.com/bioruby-gff3-pltools/
 
 For information on the source tree, issues and
 how to contribute, see
 
-http://github.com/mamarjan/gff3-pltools
+http://github.com/mamarjan/bioruby-gff3-pltools
 
 The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
 
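The README section removed in this hunk describes the filter expression syntax: parts joined by ':', with a literal colon inside a part escaped. The library code in this diff still interpolates the caller's `filter_string` into `--filter` unchanged, so callers assemble that string themselves. The sketch below shows one way to do that. `escape_filter_part` and `build_filter` are hypothetical helpers, not part of the gem, and the escape sequence is an assumption: the README's `"\\:"` is read here as a single backslash followed by a colon.

```ruby
# Hypothetical helpers, not part of bio-gff3-pltools.
# Assumption: a literal colon inside a part is escaped as backslash + colon.
def escape_filter_part(part)
  # Block form of gsub avoids backslash handling in replacement strings.
  part.to_s.gsub(":") { "\\:" }
end

# Joins already-ordered parts (predicate names, field names, values)
# into one expression for the --filter option.
def build_filter(*parts)
  parts.map { |part| escape_filter_part(part) }.join(":")
end

puts build_filter("field", "feature", "equals", "CDS")
puts build_filter("attribute", "ID", "equals", "")
puts build_filter("attribute", "Dbxref", "contains", "GO:0001")
```

The second call passes an empty value, which yields the trailing-colon form used in the removed examples (`attribute:ID:equals:`).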
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.2.0
data/lib/bio-gff3-pltools.rb
CHANGED
data/lib/bio-gff3-pltools/filtering.rb
CHANGED
@@ -2,7 +2,8 @@ module Bio
   module PL
     module GFF3
       # Runs the gff3-ffetch utility with the specified parameters on
-      # an external file. Options include :output
+      # an external file. Options include :output, :at_most,
+      # :pass_fasta_through, :keep_comments, :keep_pragmas
       def self.filter_file filename, filter_string, options = {}
         if !File.exists?(filename)
           raise Exception.new("No such file - #{filename}")
@@ -16,7 +17,16 @@ module Bio
         if !options[:at_most].nil?
           at_most_option = "--at-most #{options[:at_most]}"
         end
-
+        if options[:pass_fasta_through]
+          fasta_option = "--pass-fasta-through"
+        end
+        if options[:keep_comments]
+          comments_option = "--keep-comments"
+        end
+        if options[:keep_pragmas]
+          pragmas_option = "--keep-pragmas"
+        end
+        gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}")
         if output_option.nil?
           output = gff3_ffetch.read
         end
@@ -25,7 +35,8 @@ module Bio
       end
 
       # Runs the gff3-ffetch utility with the specified parameters while
-      # passing data to its stdin. Options include :output and :at_most
+      # passing data to its stdin. Options include :output and :at_most,
+      # :pass_fasta_through, :keep_comments, :keep_pragmas
       def self.filter_data data, filter_string, options = {}
         output_option = nil
         output = nil
@@ -35,7 +46,16 @@ module Bio
         if !options[:at_most].nil?
           at_most_option = "--at-most #{options[:at_most]}"
         end
-
+        if options[:pass_fasta_through]
+          fasta_option = "--pass-fasta-through"
+        end
+        if options[:keep_comments]
+          comments_option = "--keep-comments"
+        end
+        if options[:keep_pragmas]
+          pragmas_option = "--keep-pragmas"
+        end
+        gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}", "r+")
         gff3_ffetch.write data
         gff3_ffetch.close_write
         if output_option.nil?
data/lib/bio-gff3-pltools/validation.rb
ADDED
@@ -0,0 +1,17 @@
+module Bio
+  module PL
+    module GFF3
+      def self.validate_file filename
+        if !File.exists?(filename)
+          raise Exception.new("No such file - #{filename}")
+        end
+
+        gff3_validate = IO.popen(["gff3-validate", "#{filename}", :err=>[:child, :out]])
+        output = gff3_validate.read
+        gff3_validate.close
+        output
+      end
+    end
+  end
+end
+
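`validate_file` uses the array form of `IO.popen` with `:err=>[:child, :out]`, which sends the child's stderr into the same pipe as its stdout so that one `read` returns everything the validator prints. The sketch below shows that behaviour in isolation. Because `gff3-validate` may not be installed, it spawns the current Ruby interpreter as a stand-in child; the script and its messages are illustrative only.

```ruby
require "rbconfig"

# Stand-in child: writes one line to each stream.
script = '$stdout.puts "from stdout"; $stderr.puts "from stderr"'

# Array form: the command runs without a shell. The trailing hash element
# redirects the child's stderr to the child's stdout.
pipe = IO.popen([RbConfig.ruby, "-e", script, :err => [:child, :out]])
output = pipe.read
pipe.close

# Both lines arrive on the single pipe; their relative order depends on
# how the child buffers each stream.
puts output
```

On Ruby 3.2 and later, `File.exists?` is no longer available; `File.exist?` is the replacement for the existence check used in `validate_file`.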
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-gff3-pltools
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-07-
+date: 2012-07-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -150,6 +150,7 @@ files:
 - VERSION
 - lib/bio-gff3-pltools.rb
 - lib/bio-gff3-pltools/filtering.rb
+- lib/bio-gff3-pltools/validation.rb
 - LICENSE.txt
 - README.md
 homepage: http://mamarjan.github.com/gff3-pltools/
@@ -165,6 +166,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: -1031554001
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -178,4 +182,3 @@ signing_key:
 specification_version: 3
 summary: Ruby wrapper for the gff3-pltools.
 test_files: []
-has_rdoc: