bio-gff3-pltools 0.1.0
- data/LICENSE.txt +20 -0
- data/README.md +258 -0
- data/VERSION +1 -0
- data/lib/bio-gff3-pltools.rb +12 -0
- data/lib/bio-gff3-pltools/filtering.rb +50 -0
- metadata +181 -0
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
Copyright (c) 2012 Marjan Povolni

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,258 @@
# gff3-pltools

[![Build Status](https://secure.travis-ci.org/mamarjan/gff3-pltools.png)](http://travis-ci.org/mamarjan/gff3-pltools)

Note: this software is under active development!

This is an early work in progress to create parallel GFF3 and GTF
tools for D, and a Ruby gem that lets Ruby programmers use those
tools from Ruby.

## Installation

### Requirements

The binary builds are self-contained.

To build the tools from source, you'll need the DMDv2 compiler in
your path. You can check here if there is a build of DMD available
for your platform:

    http://dlang.org/download.html

Also, the rake utility is necessary to run the automated build
scripts.

### Build and install instructions

Users of 32-bit and 64-bit Linux can download pre-built binary gems
and install them using the gem command:

```sh
    gem install bio-gff3-pltools-linux32-X.Y.Z.gem
```

Users of other platforms can download the source package and build
it themselves, provided the DMD compiler is available for their platform.

To build and install a gem for your platform, use the following steps:

```sh
    tar -zxvf bio-gff3-pltools-X.Y.Z.tar.gz
    cd bio-gff3-pltools-X.Y.Z
    rake install
```

To build a gem without installing, use the rake task "build" instead
of install in the previous example.

To build the binary tools without building a gem or a Ruby library,
invoke the "utilities" rake task instead and copy the binaries from
the "bin/" directory to your PATH.

### Run tests

You can use the "unittests" rake task to run D unittests, like this:

```sh
    rake unittests
```

To run tests for the Ruby library, first build the D utilities and
then start the "features" rake task, like this:

```sh
    rake utilities
    rake features
```

## Usage

### Ruby library

To use the library in your code, after installing the gem, simply
require the library:

```ruby
require 'bio-gff3-pltools'
```

The API docs are online:

    http://mamarjan.github.com/gff3-pltools/docs/0.1.0/ruby-api/

For more code examples see the test files in the source tree.

### gff3-ffetch utility

Currently this utility supports only filtering a file, based on a
filtering expression. For example, you can use the following command
to keep only the records with a CDS feature from a GFF3 file:

```sh
    gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
```

The utility will use the fast (and soon parallel) D library to do the
parsing and filtering. You can then parse the result using your
programming language and library of choice.
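
As an illustration of that last step, here is a minimal Ruby sketch that turns filtered output into one hash per record. It assumes the utility writes ordinary tab-separated, nine-column GFF3 records; the method name `parse_gff3_records` and the column name `attributes` are hypothetical and not part of the gem.

```ruby
# Column names as listed for the "field" predicate, plus a name chosen
# here ("attributes") for the ninth column.
GFF3_COLUMNS = %w[seqname source feature start end score strand phase attributes]

# Hypothetical helper: turns GFF3 text into an array of hashes, one per
# record. Blank lines and lines starting with "#" are skipped.
def parse_gff3_records(text)
  lines = text.each_line.map(&:chomp)
  lines = lines.reject { |line| line.empty? || line.start_with?("#") }
  lines.map { |line| Hash[GFF3_COLUMNS.zip(line.split("\t", -1))] }
end

# Made-up sample data standing in for the utility's output.
sample = "##gff-version 3\nchr1\t.\tCDS\t10\t90\t.\t+\t0\tID=cds1;Parent=mRNA1\n"
records = parse_gff3_records(sample)
puts records.first["feature"]
```

The `text` argument could be the string returned by `Bio::PL::GFF3.filter_file` or `Bio::PL::GFF3.filter_data` when no `:output` option is given.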

Currently supported predicates are "field", "attribute", "equals",
"contains", "starts_with" and "not". You can combine them in a way
that makes sense. First, the utility needs to know what field or
attribute should be used for filtering. In the previous example,
that's the "field:feature" part. Next, the utility needs to know
what you want to do with it. In the example, that's the "equals"
part. And then the last part in the example is a parameter to the
"equals", which tells the utility what the attribute or field
should be compared to.

Parts of the expression are separated by a colon, ':', and if a colon
is supposed to be part of a field name or value, it can be escaped
like this: "\\:".
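
When expressions are assembled in Ruby, the escaping can be done by a small helper. The sketch below is hypothetical (`build_filter` is not part of the gem) and assumes that a backslash followed by a colon is what the utility should receive; on a shell command line the backslash itself may need to be doubled or quoted.

```ruby
# Hypothetical helper, not part of the gem: joins the parts of a filter
# expression with ":" and escapes any colon inside a part.
# ASSUMPTION: the utility expects an escaped colon as backslash + colon.
def build_filter(*parts)
  # The block form of gsub inserts the replacement literally.
  parts.map { |part| part.to_s.gsub(":") { "\\:" } }.join(":")
end

puts build_filter("field", "feature", "equals", "CDS")
puts build_filter("attribute", "ID", "equals", "gene:1")
```

An empty last part, as in `build_filter("attribute", "ID", "equals", "")`, yields an expression that ends in a colon.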

Valid field names are: seqname, source, feature, start, end, score,
strand and phase.

A few more examples...

```sh
    gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
```

The previous example chooses records which have the ID attribute
with the value gene1.

To see which records have no ID value, or an ID that is an empty
string, use the following command:

```sh
    gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
```

And to get records which have the ID attribute defined, you can use
this command:

```sh
    gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
```

or

```sh
    gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
```

However, the last two commands are not completely equivalent. When
an attribute has multiple values, as the Parent attribute can, the
"attribute" predicate first runs the contained predicate on each of
the attribute's values and returns true when the operation
returns true for a parent value. That is, it has an implicit "and"
operation built-in.
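
The difference between the two positions of "not" can be modelled in a few lines of Ruby. This is only a sketch under an assumption: it takes one reading of the paragraph above, in which the "attribute" predicate is true as soon as the inner predicate holds for at least one value. If the utility instead requires every value to match, replace `any?` with `all?`; the two expressions still give different answers. The method `attribute_matches?` is hypothetical and not the utility's code.

```ruby
# Hypothetical model of the "attribute" predicate; not the utility's code.
# ASSUMPTION: true if the inner test holds for at least one value.
# Replace any? with all? to model the other reading.
def attribute_matches?(values, &inner)
  values.any?(&inner)
end

parents = ["mRNA1", "mRNA2"] # a record with Parent=mRNA1,mRNA2

# attribute:Parent:not:equals:mRNA1 -> "not" is applied to each value
inner_not = attribute_matches?(parents) { |value| !(value == "mRNA1") }

# not:attribute:Parent:equals:mRNA1 -> "not" is applied to the overall result
outer_not = !attribute_matches?(parents) { |value| value == "mRNA1" }

puts "attribute:Parent:not:equals:mRNA1 => #{inner_not}"
puts "not:attribute:Parent:equals:mRNA1 => #{outer_not}"
```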

There are a few more options available. In the examples above, the
data was coming from a GFF3 file which was specified on the command
line and the output was the screen. To use the standard input as the
source of the data, use "-" instead of a filename.

The default for output is the screen, or stdout. To redirect the
output to a file, you can use the "--output" option. Here is an
example:

```sh
    gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
```

To limit the number of records in the results, you can use the
"--at-most" option. For example:

```sh
    gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
```

If there are more than 1000 records in the results, then after the
1000th record is printed, a line is appended with the following content:
"# ..." and the utility terminates.

### GFF3 File validation

The validation utility can be used like this:

```sh
    ./gff3-validate path/to/file.gff3
```

It will output any errors it finds to standard output. However, the
validation utility is currently very basic, and checks only for a few
cases: the number of columns, characters that should have been
escaped, whether the start and end coordinates are integers and
whether the end is greater than the start, whether the score is a
float, valid values for
strand and phase, and the format of attributes.
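
To make the list of checks more concrete, here is a simplified Ruby restatement of some of them for a single record line. It is a hypothetical sketch, not the code of the validation utility: the method name `gff3_line_errors` and the error messages are invented, the accepted strand and phase values follow the GFF3 specification, and the checks for unescaped characters and attribute format are left out.

```ruby
# Hypothetical, simplified restatement of some of the checks listed above;
# it is not the code of the validation utility.
def gff3_line_errors(line)
  columns = line.chomp.split("\t", -1)
  return ["expected 9 columns, got #{columns.length}"] unless columns.length == 9

  _seqname, _source, _feature, start, stop, score, strand, phase, _attrs = columns
  errors = []
  integer = /\A\d+\z/
  if start =~ integer && stop =~ integer
    # ASSUMPTION: start == end is accepted, as in the GFF3 specification.
    errors << "end is smaller than start" if stop.to_i < start.to_i
  else
    errors << "start and end must be integers"
  end
  errors << "score is not a float" unless score == "." || (Float(score) rescue false)
  errors << "invalid strand" unless %w[+ - . ?].include?(strand)
  errors << "invalid phase" unless %w[0 1 2 .].include?(phase)
  errors
end

# Made-up record with several problems at once.
puts gff3_line_errors("chr1\t.\tCDS\t90\t10\tx\t*\t5\tID=cds1")
```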

### Benchmarking utility

There is a D application for performance benchmarking.
You can run it like this:

```sh
    ./gff3-benchmark path/to/file.gff3
```

The most basic case for the benchmarking utility is to parse the
file into records. More functionality is available using command
line options:

```
  -v     turn on validation
  -r     turn on replacement of escaped characters
  -f     merge records into features
  -c N   feature cache size (how many features to keep in memory), default=1000
  -l     link features into parent-child relationships
```

Before exiting, the utility prints the number of records or features
it parsed.

### Counting features

The gff3-ffetch utility keeps only a small part of the records in memory
while combining them into features. To check if the cache size is
correct, the "gff3-count-features" utility can be used to get the
correct number of features in a file. It gets all the IDs into
memory first, and then derives the correct number of features.

To get the correct number of features in a file, use the following
command:

```sh
    ./gff3-count-features path/to/file.gff3
```
```
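
The idea of collecting all IDs first can be sketched in Ruby as follows. This is a hypothetical illustration, not the implementation of gff3-count-features: it assumes that records sharing an ID belong to one feature and that every record without an ID counts as a feature of its own.

```ruby
require "set"

# Hypothetical sketch, not the gff3-count-features implementation.
# ASSUMPTIONS: records sharing an ID form one feature, and every record
# without an ID counts as a feature of its own.
def count_features(text)
  ids = Set.new
  without_id = 0
  text.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?("#")
    attributes = line.split("\t", -1)[8].to_s
    pairs = attributes.split(";").map { |pair| pair.split("=", 2) }
    id = pairs.find { |key, _value| key == "ID" }
    if id && !id[1].to_s.empty?
      ids << id[1]
    else
      without_id += 1
    end
  end
  ids.size + without_id
end

# Made-up sample: one CDS feature split over two records, one gene,
# and one record without an ID.
sample = [
  "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1",
  "chr1\t.\tCDS\t10\t40\t.\t+\t0\tID=cds1;Parent=gene1",
  "chr1\t.\tCDS\t60\t90\t.\t+\t2\tID=cds1;Parent=gene1",
  "chr1\t.\texon\t10\t90\t.\t+\t.\tParent=gene1"
].join("\n")
puts count_features(sample)
```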

## Project home page

The project home page can be found at the following location:

    http://mamarjan.github.com/gff3-pltools/

For information on the source tree, issues and
how to contribute, see

    http://github.com/mamarjan/gff3-pltools

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

## Cite

If you use this software, please cite one of

* [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)

## Biogems.info

This Biogem is published at [#bio-gff3-pltools](http://biogems.info/index.html)

## Copyright

Copyright (c) 2012 Marjan Povolni. See LICENSE.txt for further details.

data/VERSION
ADDED
@@ -0,0 +1 @@
0.1.0
data/lib/bio-gff3-pltools.rb
ADDED
@@ -0,0 +1,12 @@
# Please require your code below, respecting the naming conventions in the
# bioruby directory tree.
#
# For example, say you have a plugin named bio-plugin, the only uncommented
# line in this file would be
#
#   require 'bio/bio-plugin/plugin'
#
# In this file only require other files. Avoid other source code.

require 'bio-gff3-pltools/filtering.rb'

data/lib/bio-gff3-pltools/filtering.rb
ADDED
@@ -0,0 +1,50 @@
module Bio
  module PL
    module GFF3
      # Runs the gff3-ffetch utility with the specified parameters on
      # an external file. Options include :output and :at_most.
      def self.filter_file filename, filter_string, options = {}
        if !File.exist?(filename)
          raise Exception.new("No such file - #{filename}")
        end

        # Build the command as an argument list, so that no shell is
        # involved and the filter string and paths are passed unchanged.
        command = ["gff3-ffetch", "--filter", filter_string, filename]
        if !options[:output].nil?
          command += ["--output", options[:output].to_s]
        end
        if !options[:at_most].nil?
          command += ["--at-most", options[:at_most].to_s]
        end
        gff3_ffetch = IO.popen(command)
        # With :output the results go to that file and nil is returned;
        # otherwise the utility's standard output is returned.
        output = nil
        if options[:output].nil?
          output = gff3_ffetch.read
        end
        gff3_ffetch.close
        output
      end

      # Runs the gff3-ffetch utility with the specified parameters while
      # passing data to its stdin. Options include :output and :at_most.
      def self.filter_data data, filter_string, options = {}
        # "-" tells the utility to read the records from standard input.
        command = ["gff3-ffetch", "--filter", filter_string, "-"]
        if !options[:output].nil?
          command += ["--output", options[:output].to_s]
        end
        if !options[:at_most].nil?
          command += ["--at-most", options[:at_most].to_s]
        end
        gff3_ffetch = IO.popen(command, "r+")
        gff3_ffetch.write data
        gff3_ffetch.close_write
        output = nil
        if options[:output].nil?
          output = gff3_ffetch.read
        end
        gff3_ffetch.close
        output
      end
    end
  end
end

metadata
ADDED
@@ -0,0 +1,181 @@
--- !ruby/object:Gem::Specification
name: bio-gff3-pltools
version: !ruby/object:Gem::Version
  version: 0.1.0
  prerelease:
platform: ruby
authors:
- Marjan Povolni
autorequire:
bindir: bin
cert_chain: []
date: 2012-07-05 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 2.8.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 2.8.0
- !ruby/object:Gem::Dependency
  name: yard
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '0.7'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '0.7'
- !ruby/object:Gem::Dependency
  name: rdoc
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '3.12'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '3.12'
- !ruby/object:Gem::Dependency
  name: cucumber
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
- !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 1.1.3
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 1.1.3
- !ruby/object:Gem::Dependency
  name: jeweler
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 1.8.3
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 1.8.3
- !ruby/object:Gem::Dependency
  name: rdoc
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '3.12'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '3.12'
- !ruby/object:Gem::Dependency
  name: redcarpet
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
description: Ruby wrapper for the gff3-pltools.
email: marian.povolny@gmail.com
executables: []
extensions: []
extra_rdoc_files:
- LICENSE.txt
- README.md
files:
- VERSION
- lib/bio-gff3-pltools.rb
- lib/bio-gff3-pltools/filtering.rb
- LICENSE.txt
- README.md
homepage: http://mamarjan.github.com/gff3-pltools/
licenses:
- MIT
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 1.8.24
signing_key:
specification_version: 3
summary: Ruby wrapper for the gff3-pltools.
test_files: []
has_rdoc: