bio-gff3-pltools 0.1.0
- data/LICENSE.txt +20 -0
- data/README.md +258 -0
- data/VERSION +1 -0
- data/lib/bio-gff3-pltools.rb +12 -0
- data/lib/bio-gff3-pltools/filtering.rb +50 -0
- metadata +181 -0
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2012 Marjan Povolni
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,258 @@
|
|
1
|
+
# gff3-pltools
|
2
|
+
|
3
|
+
[Build Status](http://travis-ci.org/mamarjan/gff3-pltools)
|
4
|
+
|
5
|
+
Note: this software is under active development!
|
6
|
+
|
7
|
+
This is currently an early work in progress to create parallel GFF3
|
8
|
+
and GTF tools for D and a Ruby gem which would let Ruby
|
9
|
+
programmers use those tools from Ruby.
|
10
|
+
|
11
|
+
## Installation
|
12
|
+
|
13
|
+
### Requirements
|
14
|
+
|
15
|
+
The binary builds are self-contained.
|
16
|
+
|
17
|
+
To build the tools from source, you'll need the DMDv2 compiler in
|
18
|
+
your path. You can check here if there is a build of DMD available
|
19
|
+
for your platform:
|
20
|
+
|
21
|
+
http://dlang.org/download.html
|
22
|
+
|
23
|
+
Also, the rake utility is necessary to run the automated build
|
24
|
+
scripts.
|
25
|
+
|
26
|
+
### Build and install instructions
|
27
|
+
|
28
|
+
Users of 32-bit and 64-bit Linux can download pre-built binary gems
|
29
|
+
and install them using the gem command:
|
30
|
+
|
31
|
+
```sh
|
32
|
+
gem install bio-gff3-pltools-linux32-X.Y.Z.gem
|
33
|
+
```
|
34
|
+
|
35
|
+
Users of other platforms can download the source package and build
|
36
|
+
it themselves, provided the DMD compiler is available for their platform.
|
37
|
+
|
38
|
+
To build and install a gem for your platform, use the following steps:
|
39
|
+
|
40
|
+
```sh
|
41
|
+
tar -zxvf bio-gff3-pltools-X.Y.Z.tar.gz
|
42
|
+
cd bio-gff3-pltools-X.Y.Z
|
43
|
+
rake install
|
44
|
+
```
|
45
|
+
|
46
|
+
To build a gem without installing, use the rake task "build" instead
|
47
|
+
of install in the previous example.
|
48
|
+
|
49
|
+
To build the binary tools without building a gem or a Ruby library,
|
50
|
+
invoke the "utilities" rake task instead and copy the binaries from
|
51
|
+
the "bin/" directory to your PATH.
|
52
|
+
|
53
|
+
### Run tests
|
54
|
+
|
55
|
+
You can use the "unittests" rake task to run D unittests, like this:
|
56
|
+
|
57
|
+
```sh
|
58
|
+
rake unittests
|
59
|
+
```
|
60
|
+
|
61
|
+
To run tests for the Ruby library, first build the D utilities and
|
62
|
+
then start the "features" rake task, like this:
|
63
|
+
|
64
|
+
```sh
|
65
|
+
rake utilities
|
66
|
+
rake features
|
67
|
+
```
|
68
|
+
|
69
|
+
## Usage
|
70
|
+
|
71
|
+
### Ruby library
|
72
|
+
|
73
|
+
To use the library in your code, after installing the gem, simply
|
74
|
+
require the library:
|
75
|
+
|
76
|
+
```ruby
|
77
|
+
require 'bio-gff3-pltools'
|
78
|
+
```
|
79
|
+
|
80
|
+
The API docs are online:
|
81
|
+
|
82
|
+
http://mamarjan.github.com/gff3-pltools/docs/0.1.0/ruby-api/
|
83
|
+
|
84
|
+
For more code examples see the test files in the source tree.
|
85
|
+
|
86
|
+
### gff3-ffetch utility
|
87
|
+
|
88
|
+
Currently this utility supports only filtering a file, based on a
|
89
|
+
filtering expression. For example, you can use the following command
|
90
|
+
to keep only the records with a CDS feature from a GFF3 file:
|
91
|
+
|
92
|
+
```sh
|
93
|
+
gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
|
94
|
+
```
|
95
|
+
|
96
|
+
The utility will use the fast (and soon parallel) D library to do the
|
97
|
+
parsing and filtering. You can then parse the result using your
|
98
|
+
programming language and library of choice.
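
As one way to parse the result, here is a minimal Ruby sketch that splits the filtered output into one hash per record. The helper name `parse_gff3_records` and the sample line are illustrative assumptions, not part of this gem; the column names reuse the valid field names listed further down, plus `attributes` for the ninth column.

```ruby
# Hypothetical helper, not part of the bio-gff3-pltools API.
GFF3_COLUMNS = %w[seqname source feature start end score strand phase attributes].freeze

# Splits tab-separated GFF3 text into one hash per record.
# Blank lines and comment lines (such as a trailing "# ..." line) are skipped.
def parse_gff3_records(text)
  text.each_line
      .map(&:chomp)
      .reject { |line| line.empty? || line.start_with?("#") }
      .map { |line| GFF3_COLUMNS.zip(line.split("\t", -1)).to_h }
end

# Made-up sample standing in for the output of gff3-ffetch.
sample = "chr1\t.\tCDS\t10\t20\t.\t+\t0\tID=cds1;Parent=mrna1\n# ...\n"
records = parse_gff3_records(sample)
puts records.first["feature"]
```

All values stay strings in this sketch; convert `start` and `end` with `Integer(...)` if you need numeric coordinates.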
|
99
|
+
|
100
|
+
Currently supported predicates are "field", "attribute", "equals",
|
101
|
+
"contains", "starts_with" and "not". You can combine them in a way
|
102
|
+
that makes sense. First, the utility needs to know what field or
|
103
|
+
attribute should be used for filtering. In the previous example,
|
104
|
+
that's the "field:feature" part. Next, the utility needs to know
|
105
|
+
what you want to do with it. In the example, that's the "equals"
|
106
|
+
part. And then the last part in the example is a parameter to the
|
107
|
+
"equals", which tells the utility what the attribute or field
|
108
|
+
should be compared to.
|
109
|
+
|
110
|
+
Parts of the expression are separated by a colon, ':', and if a colon
|
111
|
+
is supposed to be part of a field name or value, it can be escaped
|
112
|
+
like this: "\\:".
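
When a filter expression is assembled from Ruby, the escaping can be done by a small helper. The sketch below is hypothetical (neither `escape_filter_part` nor `build_filter` exists in this gem) and assumes the utility expects a single backslash in front of each literal colon.

```ruby
# Hypothetical helpers, not part of the bio-gff3-pltools API.

# Puts a backslash in front of every colon in one part of the expression.
# The block form of gsub avoids backslash handling in replacement strings.
def escape_filter_part(part)
  part.to_s.gsub(":") { "\\:" }
end

# Escapes each part and joins the parts with the ':' separator.
def build_filter(*parts)
  parts.map { |part| escape_filter_part(part) }.join(":")
end

puts build_filter("attribute", "Note", "equals", "chr1:100")
```

If the expression is later passed through a shell, the backslash needs shell quoting as well, for example single quotes around the whole expression.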
|
113
|
+
|
114
|
+
Valid field names are: seqname, source, feature, start, end, score,
|
115
|
+
strand and phase.
|
116
|
+
|
117
|
+
A few more examples...
|
118
|
+
|
119
|
+
```sh
|
120
|
+
gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
|
121
|
+
```
|
122
|
+
|
123
|
+
The previous example chooses records which have the ID attribute
|
124
|
+
with the value gene1.
|
125
|
+
|
126
|
+
To see which records have no ID value, or ID which is an empty
|
127
|
+
string, use the following command:
|
128
|
+
|
129
|
+
```sh
|
130
|
+
gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
|
131
|
+
```
|
132
|
+
|
133
|
+
And to get records which have the ID attribute defined, you can use
|
134
|
+
this command:
|
135
|
+
|
136
|
+
```sh
|
137
|
+
gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
|
138
|
+
```
|
139
|
+
|
140
|
+
or
|
141
|
+
|
142
|
+
```sh
|
143
|
+
gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
|
144
|
+
```
|
145
|
+
|
146
|
+
However, the last two commands are not completely the same. In cases
|
147
|
+
where an attribute has multiple values, the Parent attribute for
|
148
|
+
example, the "attribute" predicate first runs the contained predicate
|
149
|
+
on all of the attribute's values and returns true when an operation
|
150
|
+
returns true for any one of those values. That is, it has an implicit "or"
|
151
|
+
operation built-in.
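
The difference between the two placements of "not" can be modelled in plain Ruby. This is only a sketch of the behaviour as described here, not the D implementation: it assumes the "attribute" predicate succeeds as soon as the contained predicate is true for one value, and that a missing attribute behaves like a single empty string. The helper name and the sample values are made up.

```ruby
# Hypothetical model of the "attribute" predicate, not the D implementation.
# values: the attribute's values as an array of strings (nil when missing).
def attribute_matches?(values)
  values = [""] if values.nil? || values.empty?
  values.any? { |value| yield value }
end

# Made-up record whose Parent attribute has one real and one empty value.
parents = ["gene1", ""]

# attribute:Parent:not:equals:  -> true, because "gene1" is not empty
inner_not = attribute_matches?(parents) { |value| value != "" }

# not:attribute:Parent:equals:  -> false, because one value is empty
outer_not = !attribute_matches?(parents) { |value| value == "" }

puts "inner not: #{inner_not}, outer not: #{outer_not}"
```

Under these assumptions the two expressions agree for single-valued attributes and differ only when some values match and others do not.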
|
152
|
+
|
153
|
+
There are a few more options available. In the examples above, the
|
154
|
+
data was coming from a GFF3 file which was specified on the command
|
155
|
+
line and the output was the screen. To use the standard input as the
|
156
|
+
source of the data, use "-" instead of a filename.
|
157
|
+
|
158
|
+
The default for output is the screen, or stdout. To redirect the
|
159
|
+
output to a file, you can use the "--output" option. Here is an
|
160
|
+
example:
|
161
|
+
|
162
|
+
```sh
|
163
|
+
gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
|
164
|
+
```
|
165
|
+
|
166
|
+
To limit the number of records in the results, you can use the
|
167
|
+
"--at-most" option. For example:
|
168
|
+
|
169
|
+
```sh
|
170
|
+
gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
|
171
|
+
```
|
172
|
+
|
173
|
+
If there are more than 1000 records in the results, after the
|
174
|
+
1000th record printed, a line is appended with the following content:
|
175
|
+
"# ..." and the utility terminates.
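
The options above can also be combined from Ruby. The following sketch builds the argument list as an array, using only the flags shown in this section; the helper `ffetch_argv` and the file name `in.gff3` are hypothetical.

```ruby
# Hypothetical helper, not part of the bio-gff3-pltools API.
# Builds the gff3-ffetch argument list; input defaults to "-" (standard input).
def ffetch_argv(filter, input: "-", output: nil, at_most: nil)
  argv = ["gff3-ffetch", "--filter", filter, input]
  argv += ["--output", output.to_s] if output
  argv += ["--at-most", at_most.to_s] if at_most
  argv
end

argv = ffetch_argv("not:attribute:ID:equals:", input: "in.gff3", at_most: 1000)
p argv
```

Passing such an array to `IO.popen(argv)` starts the program without going through a shell, so colons, backslashes and spaces in the filter need no extra quoting. That call only works when `gff3-ffetch` is on the PATH.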
|
176
|
+
|
177
|
+
### GFF3 File validation
|
178
|
+
|
179
|
+
The validation utility can be used like this:
|
180
|
+
|
181
|
+
```sh
|
182
|
+
./gff3-validate path/to/file.gff3
|
183
|
+
```
|
184
|
+
|
185
|
+
It will output any errors it finds to standard output. However, the
|
186
|
+
validation utility is currently very basic, and checks only for a few
|
187
|
+
cases: the number of columns, characters that should have been
|
188
|
+
escaped, whether the start and end coordinates are integers and whether the end
|
189
|
+
is greater than start, whether score is a float, valid values for
|
190
|
+
strand and phase, and the format of attributes.
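
To make some of these checks concrete, here is a rough Ruby sketch of a per-line validator. It is an illustration only and not how gff3-validate is implemented: the function name and messages are made up, the escaping and attribute-format checks are left out, and equal start and end coordinates are accepted, as the GFF3 specification allows.

```ruby
# Hypothetical validator, not the gff3-validate implementation.
# Returns a list of error messages for one GFF3 record line.
def gff3_line_errors(line)
  fields = line.chomp.split("\t", -1)
  return ["expected 9 columns, got #{fields.length}"] unless fields.length == 9

  _seqname, _source, _feature, start, stop, score, strand, phase, _attrs = fields
  errors = []
  integer = /\A\d+\z/

  if start =~ integer && stop =~ integer
    errors << "end is smaller than start" if stop.to_i < start.to_i
  else
    errors << "start and end must be integers"
  end

  # "." marks an undefined score; anything else must parse as a float.
  unless score == "." || (Float(score) rescue false)
    errors << "score is not a float"
  end

  errors << "invalid strand" unless %w[+ - . ?].include?(strand)
  errors << "invalid phase" unless %w[0 1 2 .].include?(phase)
  errors
end

p gff3_line_errors("chr1\t.\tCDS\t20\t10\t0.5\t+\t0\tID=cds1")
```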
|
191
|
+
|
192
|
+
### Benchmarking utility
|
193
|
+
|
194
|
+
There is a D application for performance benchmarking.
|
195
|
+
You can run it like this:
|
196
|
+
|
197
|
+
```sh
|
198
|
+
./gff3-benchmark path/to/file.gff3
|
199
|
+
```
|
200
|
+
|
201
|
+
The most basic case for the benchmarking utility is to parse the
|
202
|
+
file into records. More functionality is available using command
|
203
|
+
line options:
|
204
|
+
|
205
|
+
```
|
206
|
+
-v turn on validation
|
207
|
+
-r turn on replacement of escaped characters
|
208
|
+
-f merge records into features
|
209
|
+
-c N feature cache size (how many features to keep in memory), default=1000
|
210
|
+
-l link feature into parent-child relationships
|
211
|
+
```
|
212
|
+
|
213
|
+
Before exiting, the utility prints the number of records or features
|
214
|
+
it parsed.
|
215
|
+
|
216
|
+
### Counting features
|
217
|
+
|
218
|
+
The gff3-ffetch utility keeps only a small part of the records in memory
|
219
|
+
while combining them into features. To check if the cache size is
|
220
|
+
correct, the "gff3-count-features" utility can be used to get the
|
221
|
+
correct number of features in a file. It gets all the IDs into
|
222
|
+
memory first, and then derives the correct number of features.
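
The idea of collecting all IDs first can be sketched in Ruby. This is an assumption about how features are counted, not a description of the D utility: records that share an ID are taken to be one feature, and every record without an ID counts as a feature of its own. The function name and sample data are made up.

```ruby
require "set"

# Hypothetical counter, not the gff3-count-features implementation.
def count_features(gff3_text)
  ids = Set.new
  without_id = 0
  gff3_text.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?("#")

    # The ninth column holds "key=value" pairs separated by ';'.
    attributes = line.split("\t", -1)[8].to_s
    pairs = attributes.split(";").map { |pair| pair.split("=", 2) }
    id = pairs.find { |key, _value| key == "ID" }

    if id && !id[1].to_s.empty?
      ids << id[1]
    else
      without_id += 1
    end
  end
  ids.size + without_id
end

# Made-up sample: one gene, one CDS split over two lines, one exon without ID.
sample = [
  "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1",
  "chr1\t.\tCDS\t1\t40\t.\t+\t0\tID=cds1;Parent=gene1",
  "chr1\t.\tCDS\t60\t100\t.\t+\t2\tID=cds1;Parent=gene1",
  "chr1\t.\texon\t1\t40\t.\t+\t.\tParent=gene1"
].join("\n")

puts count_features(sample)
```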
|
223
|
+
|
224
|
+
To get the correct number of features in a file, use the following
|
225
|
+
command:
|
226
|
+
|
227
|
+
```sh
|
228
|
+
./gff3-count-features path/to/file.gff3
|
229
|
+
```
|
230
|
+
|
231
|
+
## Project home page
|
232
|
+
|
233
|
+
The project home page can be found at the following location:
|
234
|
+
|
235
|
+
http://mamarjan.github.com/gff3-pltools/
|
236
|
+
|
237
|
+
For information on the source tree, issues and
|
238
|
+
how to contribute, see
|
239
|
+
|
240
|
+
http://github.com/mamarjan/gff3-pltools
|
241
|
+
|
242
|
+
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
|
243
|
+
|
244
|
+
## Cite
|
245
|
+
|
246
|
+
If you use this software, please cite one of
|
247
|
+
|
248
|
+
* [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
|
249
|
+
* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
|
250
|
+
|
251
|
+
## Biogems.info
|
252
|
+
|
253
|
+
This Biogem is published at [#bio-gff3-pltools](http://biogems.info/index.html)
|
254
|
+
|
255
|
+
## Copyright
|
256
|
+
|
257
|
+
Copyright (c) 2012 Marjan Povolni. See LICENSE.txt for further details.
|
258
|
+
|
data/VERSION
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
0.1.0
|
data/lib/bio-gff3-pltools.rb
ADDED
@@ -0,0 +1,12 @@
|
|
1
|
+
# Please require your code below, respecting the naming conventions in the
|
2
|
+
# bioruby directory tree.
|
3
|
+
#
|
4
|
+
# For example, say you have a plugin named bio-plugin, the only uncommented
|
5
|
+
# line in this file would be
|
6
|
+
#
|
7
|
+
# require 'bio/bio-plugin/plugin'
|
8
|
+
#
|
9
|
+
# In this file only require other files. Avoid other source code.
|
10
|
+
|
11
|
+
require 'bio-gff3-pltools/filtering.rb'
|
12
|
+
|
data/lib/bio-gff3-pltools/filtering.rb
ADDED
@@ -0,0 +1,50 @@
|
|
1
|
+
module Bio
|
2
|
+
module PL
|
3
|
+
module GFF3
|
4
|
+
# Runs the gff3-ffetch utility with the specified parameters on
|
5
|
+
# an external file. Options include :output and :at_most.
|
6
|
+
def self.filter_file filename, filter_string, options = {}
|
7
|
+
if !File.exist?(filename)
|
8
|
+
raise Exception.new("No such file - #{filename}")
|
9
|
+
end
|
10
|
+
|
11
|
+
output_option = nil
|
12
|
+
output = nil
|
13
|
+
if !options[:output].nil?
|
14
|
+
output_option = "--output #{options[:output]}"
|
15
|
+
end
|
16
|
+
if !options[:at_most].nil?
|
17
|
+
at_most_option = "--at-most #{options[:at_most]}"
|
18
|
+
end
|
19
|
+
gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option}")
|
20
|
+
if output_option.nil?
|
21
|
+
output = gff3_ffetch.read
|
22
|
+
end
|
23
|
+
gff3_ffetch.close
|
24
|
+
output
|
25
|
+
end
|
26
|
+
|
27
|
+
# Runs the gff3-ffetch utility with the specified parameters while
|
28
|
+
# passing data to its stdin. Options include :output and :at_most.
|
29
|
+
def self.filter_data data, filter_string, options = {}
|
30
|
+
output_option = nil
|
31
|
+
output = nil
|
32
|
+
if !options[:output].nil?
|
33
|
+
output_option = "--output #{options[:output]}"
|
34
|
+
end
|
35
|
+
if !options[:at_most].nil?
|
36
|
+
at_most_option = "--at-most #{options[:at_most]}"
|
37
|
+
end
|
38
|
+
gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option}", "r+")
|
39
|
+
gff3_ffetch.write data
|
40
|
+
gff3_ffetch.close_write
|
41
|
+
if output_option.nil?
|
42
|
+
output = gff3_ffetch.read
|
43
|
+
end
|
44
|
+
gff3_ffetch.close
|
45
|
+
output
|
46
|
+
end
|
47
|
+
end
|
48
|
+
end
|
49
|
+
end
|
50
|
+
|
metadata
ADDED
@@ -0,0 +1,181 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: bio-gff3-pltools
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.1.0
|
5
|
+
prerelease:
|
6
|
+
platform: ruby
|
7
|
+
authors:
|
8
|
+
- Marjan Povolni
|
9
|
+
autorequire:
|
10
|
+
bindir: bin
|
11
|
+
cert_chain: []
|
12
|
+
date: 2012-07-05 00:00:00.000000000 Z
|
13
|
+
dependencies:
|
14
|
+
- !ruby/object:Gem::Dependency
|
15
|
+
name: rspec
|
16
|
+
requirement: !ruby/object:Gem::Requirement
|
17
|
+
none: false
|
18
|
+
requirements:
|
19
|
+
- - ~>
|
20
|
+
- !ruby/object:Gem::Version
|
21
|
+
version: 2.8.0
|
22
|
+
type: :development
|
23
|
+
prerelease: false
|
24
|
+
version_requirements: !ruby/object:Gem::Requirement
|
25
|
+
none: false
|
26
|
+
requirements:
|
27
|
+
- - ~>
|
28
|
+
- !ruby/object:Gem::Version
|
29
|
+
version: 2.8.0
|
30
|
+
- !ruby/object:Gem::Dependency
|
31
|
+
name: yard
|
32
|
+
requirement: !ruby/object:Gem::Requirement
|
33
|
+
none: false
|
34
|
+
requirements:
|
35
|
+
- - ~>
|
36
|
+
- !ruby/object:Gem::Version
|
37
|
+
version: '0.7'
|
38
|
+
type: :development
|
39
|
+
prerelease: false
|
40
|
+
version_requirements: !ruby/object:Gem::Requirement
|
41
|
+
none: false
|
42
|
+
requirements:
|
43
|
+
- - ~>
|
44
|
+
- !ruby/object:Gem::Version
|
45
|
+
version: '0.7'
|
46
|
+
- !ruby/object:Gem::Dependency
|
47
|
+
name: rdoc
|
48
|
+
requirement: !ruby/object:Gem::Requirement
|
49
|
+
none: false
|
50
|
+
requirements:
|
51
|
+
- - ~>
|
52
|
+
- !ruby/object:Gem::Version
|
53
|
+
version: '3.12'
|
54
|
+
type: :development
|
55
|
+
prerelease: false
|
56
|
+
version_requirements: !ruby/object:Gem::Requirement
|
57
|
+
none: false
|
58
|
+
requirements:
|
59
|
+
- - ~>
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: '3.12'
|
62
|
+
- !ruby/object:Gem::Dependency
|
63
|
+
name: cucumber
|
64
|
+
requirement: !ruby/object:Gem::Requirement
|
65
|
+
none: false
|
66
|
+
requirements:
|
67
|
+
- - ! '>='
|
68
|
+
- !ruby/object:Gem::Version
|
69
|
+
version: '0'
|
70
|
+
type: :development
|
71
|
+
prerelease: false
|
72
|
+
version_requirements: !ruby/object:Gem::Requirement
|
73
|
+
none: false
|
74
|
+
requirements:
|
75
|
+
- - ! '>='
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
version: '0'
|
78
|
+
- !ruby/object:Gem::Dependency
|
79
|
+
name: bundler
|
80
|
+
requirement: !ruby/object:Gem::Requirement
|
81
|
+
none: false
|
82
|
+
requirements:
|
83
|
+
- - ~>
|
84
|
+
- !ruby/object:Gem::Version
|
85
|
+
version: 1.1.3
|
86
|
+
type: :development
|
87
|
+
prerelease: false
|
88
|
+
version_requirements: !ruby/object:Gem::Requirement
|
89
|
+
none: false
|
90
|
+
requirements:
|
91
|
+
- - ~>
|
92
|
+
- !ruby/object:Gem::Version
|
93
|
+
version: 1.1.3
|
94
|
+
- !ruby/object:Gem::Dependency
|
95
|
+
name: jeweler
|
96
|
+
requirement: !ruby/object:Gem::Requirement
|
97
|
+
none: false
|
98
|
+
requirements:
|
99
|
+
- - ~>
|
100
|
+
- !ruby/object:Gem::Version
|
101
|
+
version: 1.8.3
|
102
|
+
type: :development
|
103
|
+
prerelease: false
|
104
|
+
version_requirements: !ruby/object:Gem::Requirement
|
105
|
+
none: false
|
106
|
+
requirements:
|
107
|
+
- - ~>
|
108
|
+
- !ruby/object:Gem::Version
|
109
|
+
version: 1.8.3
|
110
|
+
- !ruby/object:Gem::Dependency
|
111
|
+
name: rdoc
|
112
|
+
requirement: !ruby/object:Gem::Requirement
|
113
|
+
none: false
|
114
|
+
requirements:
|
115
|
+
- - ~>
|
116
|
+
- !ruby/object:Gem::Version
|
117
|
+
version: '3.12'
|
118
|
+
type: :development
|
119
|
+
prerelease: false
|
120
|
+
version_requirements: !ruby/object:Gem::Requirement
|
121
|
+
none: false
|
122
|
+
requirements:
|
123
|
+
- - ~>
|
124
|
+
- !ruby/object:Gem::Version
|
125
|
+
version: '3.12'
|
126
|
+
- !ruby/object:Gem::Dependency
|
127
|
+
name: redcarpet
|
128
|
+
requirement: !ruby/object:Gem::Requirement
|
129
|
+
none: false
|
130
|
+
requirements:
|
131
|
+
- - ! '>='
|
132
|
+
- !ruby/object:Gem::Version
|
133
|
+
version: '0'
|
134
|
+
type: :development
|
135
|
+
prerelease: false
|
136
|
+
version_requirements: !ruby/object:Gem::Requirement
|
137
|
+
none: false
|
138
|
+
requirements:
|
139
|
+
- - ! '>='
|
140
|
+
- !ruby/object:Gem::Version
|
141
|
+
version: '0'
|
142
|
+
description: Ruby wrapper for the gff3-pltools.
|
143
|
+
email: marian.povolny@gmail.com
|
144
|
+
executables: []
|
145
|
+
extensions: []
|
146
|
+
extra_rdoc_files:
|
147
|
+
- LICENSE.txt
|
148
|
+
- README.md
|
149
|
+
files:
|
150
|
+
- VERSION
|
151
|
+
- lib/bio-gff3-pltools.rb
|
152
|
+
- lib/bio-gff3-pltools/filtering.rb
|
153
|
+
- LICENSE.txt
|
154
|
+
- README.md
|
155
|
+
homepage: http://mamarjan.github.com/gff3-pltools/
|
156
|
+
licenses:
|
157
|
+
- MIT
|
158
|
+
post_install_message:
|
159
|
+
rdoc_options: []
|
160
|
+
require_paths:
|
161
|
+
- lib
|
162
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
163
|
+
none: false
|
164
|
+
requirements:
|
165
|
+
- - ! '>='
|
166
|
+
- !ruby/object:Gem::Version
|
167
|
+
version: '0'
|
168
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
169
|
+
none: false
|
170
|
+
requirements:
|
171
|
+
- - ! '>='
|
172
|
+
- !ruby/object:Gem::Version
|
173
|
+
version: '0'
|
174
|
+
requirements: []
|
175
|
+
rubyforge_project:
|
176
|
+
rubygems_version: 1.8.24
|
177
|
+
signing_key:
|
178
|
+
specification_version: 3
|
179
|
+
summary: Ruby wrapper for the gff3-pltools.
|
180
|
+
test_files: []
|
181
|
+
has_rdoc:
|