bio-gff3-pltools 0.1.0

data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2012 Marjan Povolni
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,258 @@
1
+ # gff3-pltools
2
+
3
+ [![Build Status](https://secure.travis-ci.org/mamarjan/gff3-pltools.png)](http://travis-ci.org/mamarjan/gff3-pltools)
4
+
5
+ Note: this software is under active development!
6
+
7
+ This is currently an early work in progress to create parallel GFF3
8
+ and GTF tools for D and a Ruby gem which would let Ruby
9
+ programmers use those tools from Ruby.
10
+
11
+ ## Installation
12
+
13
+ ### Requirements
14
+
15
+ The binary builds are self-contained.
16
+
17
+ To build the tools from source, you'll need the DMDv2 compiler in
18
+ your path. You can check here if there is a build of DMD available
19
+ for your platform:
20
+
21
+ http://dlang.org/download.html
22
+
23
+ Also, the rake utility is necessary to run the automated build
24
+ scripts.
25
+
26
+ ### Build and install instructions
27
+
28
+ Users of 32-bit and 64-bit Linux can download pre-built binary gems
29
+ and install them using the gem command:
30
+
31
+ ```sh
32
+ gem install bio-gff3-pltools-linux32-X.Y.Z.gem
33
+ ```
34
+
35
+ Users of other platforms can download the source package and build
36
+ it themselves, provided the DMD compiler is available for their platform.
37
+
38
+ To build and install a gem for your platform, use the following steps:
39
+
40
+ ```sh
41
+ tar -zxvf bio-gff3-pltools-X.Y.Z.tar.gz
42
+ cd bio-gff3-pltools-X.Y.Z
43
+ rake install
44
+ ```
45
+
46
+ To build a gem without installing, use the rake task "build" instead
47
+ of install in the previous example.
48
+
49
+ To build the binary tools without building a gem or a Ruby library,
50
+ invoke the "utilities" rake task instead and copy the binaries from
51
+ the "bin/" directory to your PATH.
52
+
53
+ ### Run tests
54
+
55
+ You can use the "unittests" rake task to run D unittests, like this:
56
+
57
+ ```sh
58
+ rake unittests
59
+ ```
60
+
61
+ To run tests for the Ruby library, first build the D utilities and
62
+ then start the "features" rake task, like this:
63
+
64
+ ```sh
65
+ rake utilities
66
+ rake features
67
+ ```
68
+
69
+ ## Usage
70
+
71
+ ### Ruby library
72
+
73
+ To use the library in your code, after installing the gem, simply
74
+ require the library:
75
+
76
+ ```ruby
77
+ require 'bio-gff3-pltools'
78
+ ```
79
+
80
+ The API docs are online:
81
+
82
+ http://mamarjan.github.com/gff3-pltools/docs/0.1.0/ruby-api/
83
+
84
+ For more code examples see the test files in the source tree.
85
+
86
+ ### gff3-ffetch utility
87
+
88
+ Currently this utility supports only filtering a file, based on a
89
+ filtering expression. For example, you can use the following command
90
+ to select only the records with a CDS feature from a GFF3 file:
91
+
92
+ ```sh
93
+ gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
94
+ ```
95
+
96
+ The utility will use the fast (and soon parallel) D library to do the
97
+ parsing and filtering. You can then parse the result using your
98
+ programming language and library of choice.
99
+
100
+ Currently supported predicates are "field", "attribute", "equals",
101
+ "contains", "starts_with" and "not". You can combine them in a way
102
+ that makes sense. First, the utility needs to know what field or
103
+ attribute should be used for filtering. In the previous example,
104
+ that's the "field:feature" part. Next, the utility needs to know
105
+ what you want to do with it. In the example, that's the "equals"
106
+ part. And then the last part in the example is a parameter to the
107
+ "equals", which tells the utility what the attribute or field
108
+ should be compared to.
109
+
110
+ Parts of the expression are separated by a colon, ':', and if a colon
111
+ is supposed to be part of a field name or value, it can be escaped
112
+ like this: "\\:".
113
+
114
+ Valid field names are: seqname, source, feature, start, end, score,
115
+ strand and phase.
116
+
117
+ A few more examples...
118
+
119
+ ```sh
120
+ gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
121
+ ```
122
+
123
+ The previous example chooses records which have the ID attribute
124
+ with the value gene1.
125
+
126
+ To see which records have no ID value, or ID which is an empty
127
+ string, use the following command:
128
+
129
+ ```sh
130
+ gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
131
+ ```
132
+
133
+ And to get records which have the ID attribute defined, you can use
134
+ this command:
135
+
136
+ ```sh
137
+ gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
138
+ ```
139
+
140
+ or
141
+
142
+ ```sh
143
+ gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
144
+ ```
145
+
146
+ However, the last two commands are not completely the same. In cases
147
+ where an attribute has multiple values, the Parent attribute for
148
+ example, the "attribute" predicate first runs the contained predicate
149
+ on all of the attribute's values and returns true when an operation
150
+ returns true for a parent value. That is, it has an implicit "and"
151
+ operation built-in.
152
+
153
+ There are a few more options available. In the examples above, the
154
+ data was coming from a GFF3 file which was specified on the command
155
+ line and the output was the screen. To use the standard input as the
156
+ source of the data, use "-" instead of a filename.
157
+
158
+ The default for output is the screen, or stdout. To redirect the
159
+ output to a file, you can use the "--output" option. Here is an
160
+ example:
161
+
162
+ ```sh
163
+ gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
164
+ ```
165
+
166
+ To limit the number of records in the results, you can use the
167
+ "--at-most" option. For example:
168
+
169
+ ```sh
170
+ gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
171
+ ```
172
+
173
+ If there are more than 1000 records in the results, after the
174
+ 1000th record printed, a line is appended with the following content:
175
+ "# ..." and the utility terminates.
176
+
177
+ ### GFF3 file validation
178
+
179
+ The validation utility can be used like this:
180
+
181
+ ```sh
182
+ ./gff3-validate path/to/file.gff3
183
+ ```
184
+
185
+ It will output any errors it finds to standard output. However, the
186
+ validation utility is currently very basic, and checks only for a few
187
+ cases: the number of columns, characters that should have been
188
+ escaped, whether the start and end coordinates are integers and the end
189
+ is greater than the start, whether the score is a float, valid values for
190
+ strand and phase, and the format of attributes.
191
+
192
+ ### Benchmarking utility
193
+
194
+ There is a D application for performance benchmarking.
195
+ You can run it like this:
196
+
197
+ ```sh
198
+ ./gff3-benchmark path/to/file.gff3
199
+ ```
200
+
201
+ The most basic case for the benchmarking utility is to parse the
202
+ file into records. More functionality is available using command
203
+ line options:
204
+
205
+ ```
206
+ -v turn on validation
207
+ -r turn on replacement of escaped characters
208
+ -f merge records into features
209
+ -c N feature cache size (how many features to keep in memory), default=1000
210
+ -l link features into parent-child relationships
211
+ ```
212
+
213
+ Before exiting, the utility prints the number of records or features
214
+ it parsed.
215
+
216
+ ### Counting features
217
+
218
+ The gff3-ffetch utility keeps only a small part of records in memory
219
+ while combining them into features. To check if the cache size is
220
+ correct, the "gff3-count-features" utility can be used to get the
221
+ correct number of features in a file. It gets all the IDs into
222
+ memory first, and then determines the correct number of features.
223
+
224
+ To get the correct number of features in a file, use the following
225
+ command:
226
+
227
+ ```sh
228
+ ./gff3-count-features path/to/file.gff3
229
+ ```
230
+
231
+ ## Project home page
232
+
233
+ The project home page can be found at the following location:
234
+
235
+ http://mamarjan.github.com/gff3-pltools/
236
+
237
+ For information on the source tree, issues and
238
+ how to contribute, see
239
+
240
+ http://github.com/mamarjan/gff3-pltools
241
+
242
+ The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
243
+
244
+ ## Cite
245
+
246
+ If you use this software, please cite one of
247
+
248
+ * [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
249
+ * [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
250
+
251
+ ## Biogems.info
252
+
253
+ This Biogem is published at [#bio-gff3-pltools](http://biogems.info/index.html)
254
+
255
+ ## Copyright
256
+
257
+ Copyright (c) 2012 Marjan Povolni. See LICENSE.txt for further details.
258
+
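Note on the README's filter-expression syntax: the rule that parts are separated by colons, with a backslash escaping a literal colon, can be sketched in Ruby as below. This is a minimal illustration, not code from the gem. The helper names `escape_filter_part`, `build_filter` and `split_filter` are hypothetical, and the sketch assumes the escape sequence is a single backslash followed by a colon (the README source writes it as "\\:", which Markdown renders as `\:`) and that only the colon needs escaping.

```ruby
# Hypothetical helpers; none of these names exist in bio-gff3-pltools.

# Escape literal colons so they are not read as part separators.
# Assumption: the escape sequence is a backslash followed by a colon.
def escape_filter_part(part)
  part.to_s.gsub(':') { '\\:' }
end

# Join predicate names and values into one filter expression.
def build_filter(*parts)
  parts.map { |p| escape_filter_part(p) }.join(':')
end

# Split an expression on unescaped colons, keeping a trailing empty part
# (as in "attribute:ID:equals:"), then unescape each part.
def split_filter(expression)
  expression.split(/(?<!\\):/, -1).map { |p| p.gsub('\\:') { ':' } }
end

puts build_filter('field', 'feature', 'equals', 'CDS')
puts build_filter('attribute', 'Note', 'contains', 'a:b')
p split_filter('attribute:ID:equals:')
```

The resulting string could then be passed as the `--filter` argument, or as the `filter_string` parameter of the Ruby wrapper. The sketch does not handle a backslash that is itself meant literally before a colon.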
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
data/lib/bio-gff3-pltools.rb ADDED
@@ -0,0 +1,12 @@
1
+ # Please require your code below, respecting the naming conventions in the
2
+ # bioruby directory tree.
3
+ #
4
+ # For example, say you have a plugin named bio-plugin, the only uncommented
5
+ # line in this file would be
6
+ #
7
+ # require 'bio/bio-plugin/plugin'
8
+ #
9
+ # In this file only require other files. Avoid other source code.
10
+
11
+ require 'bio-gff3-pltools/filtering.rb'
12
+
data/lib/bio-gff3-pltools/filtering.rb ADDED
@@ -0,0 +1,50 @@
1
+ module Bio
2
+ module PL
3
+ module GFF3
4
+ # Runs the gff3-ffetch utility with the specified parameters on
5
+ # an external file. Options include :output and :at_most.
6
+ def self.filter_file filename, filter_string, options = {}
7
+ if !File.exist?(filename)
8
+ raise Exception.new("No such file - #{filename}")
9
+ end
10
+
11
+ output_option = nil
12
+ output = nil
13
+ if !options[:output].nil?
14
+ output_option = "--output #{options[:output]}"
15
+ end
16
+ if !options[:at_most].nil?
17
+ at_most_option = "--at-most #{options[:at_most]}"
18
+ end
19
+ gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option}")
20
+ if output_option.nil?
21
+ output = gff3_ffetch.read
22
+ end
23
+ gff3_ffetch.close
24
+ output
25
+ end
26
+
27
+ # Runs the gff3-ffetch utility with the specified parameters while
28
+ # passing data to its stdin. Options include :output and :at_most.
29
+ def self.filter_data data, filter_string, options = {}
30
+ output_option = nil
31
+ output = nil
32
+ if !options[:output].nil?
33
+ output_option = "--output #{options[:output]}"
34
+ end
35
+ if !options[:at_most].nil?
36
+ at_most_option = "--at-most #{options[:at_most]}"
37
+ end
38
+ gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option}", "r+")
39
+ gff3_ffetch.write data
40
+ gff3_ffetch.close_write
41
+ if output_option.nil?
42
+ output = gff3_ffetch.read
43
+ end
44
+ gff3_ffetch.close
45
+ output
46
+ end
47
+ end
48
+ end
49
+ end
50
+
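Note on the wrapper above: `filter_file` and `filter_data` interpolate the filter string, the filename and the options into a single shell command string, so a value containing spaces or shell metacharacters would be split or interpreted by the shell. A minimal sketch of one alternative is to build the command as an argument array, which `IO.popen` accepts directly and runs without a shell. `ffetch_command` is a hypothetical helper, not part of the gem, and it uses only the `--filter`, `--output` and `--at-most` flags documented in the README.

```ruby
# Hypothetical helper, not part of bio-gff3-pltools: builds the argument
# list for gff3-ffetch as an array so that no shell parsing is involved.
def ffetch_command(filter_string, input, options = {})
  cmd = ['gff3-ffetch', '--filter', filter_string, input]
  cmd += ['--output', options[:output].to_s] unless options[:output].nil?
  cmd += ['--at-most', options[:at_most].to_s] unless options[:at_most].nil?
  cmd
end

# A filename with a space stays a single argument.
cmd = ffetch_command('field:feature:equals:CDS', 'my file.gff3', :at_most => 1000)
p cmd

# Running it requires gff3-ffetch on the PATH, so it is left commented out:
# output = IO.popen(cmd) { |io| io.read }
```

Passing `'-'` as `input` would correspond to the stdin case handled by `filter_data`; in that case the pipe would need to be opened in `"r+"` mode, as the original code does.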
metadata ADDED
@@ -0,0 +1,181 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: bio-gff3-pltools
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Marjan Povolni
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-07-05 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: rspec
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ~>
20
+ - !ruby/object:Gem::Version
21
+ version: 2.8.0
22
+ type: :development
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ~>
28
+ - !ruby/object:Gem::Version
29
+ version: 2.8.0
30
+ - !ruby/object:Gem::Dependency
31
+ name: yard
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ~>
36
+ - !ruby/object:Gem::Version
37
+ version: '0.7'
38
+ type: :development
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ version: '0.7'
46
+ - !ruby/object:Gem::Dependency
47
+ name: rdoc
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ~>
52
+ - !ruby/object:Gem::Version
53
+ version: '3.12'
54
+ type: :development
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ~>
60
+ - !ruby/object:Gem::Version
61
+ version: '3.12'
62
+ - !ruby/object:Gem::Dependency
63
+ name: cucumber
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>='
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ type: :development
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: bundler
80
+ requirement: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ~>
84
+ - !ruby/object:Gem::Version
85
+ version: 1.1.3
86
+ type: :development
87
+ prerelease: false
88
+ version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
90
+ requirements:
91
+ - - ~>
92
+ - !ruby/object:Gem::Version
93
+ version: 1.1.3
94
+ - !ruby/object:Gem::Dependency
95
+ name: jeweler
96
+ requirement: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ~>
100
+ - !ruby/object:Gem::Version
101
+ version: 1.8.3
102
+ type: :development
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ none: false
106
+ requirements:
107
+ - - ~>
108
+ - !ruby/object:Gem::Version
109
+ version: 1.8.3
110
+ - !ruby/object:Gem::Dependency
111
+ name: rdoc
112
+ requirement: !ruby/object:Gem::Requirement
113
+ none: false
114
+ requirements:
115
+ - - ~>
116
+ - !ruby/object:Gem::Version
117
+ version: '3.12'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ none: false
122
+ requirements:
123
+ - - ~>
124
+ - !ruby/object:Gem::Version
125
+ version: '3.12'
126
+ - !ruby/object:Gem::Dependency
127
+ name: redcarpet
128
+ requirement: !ruby/object:Gem::Requirement
129
+ none: false
130
+ requirements:
131
+ - - ! '>='
132
+ - !ruby/object:Gem::Version
133
+ version: '0'
134
+ type: :development
135
+ prerelease: false
136
+ version_requirements: !ruby/object:Gem::Requirement
137
+ none: false
138
+ requirements:
139
+ - - ! '>='
140
+ - !ruby/object:Gem::Version
141
+ version: '0'
142
+ description: Ruby wrapper for the gff3-pltools.
143
+ email: marian.povolny@gmail.com
144
+ executables: []
145
+ extensions: []
146
+ extra_rdoc_files:
147
+ - LICENSE.txt
148
+ - README.md
149
+ files:
150
+ - VERSION
151
+ - lib/bio-gff3-pltools.rb
152
+ - lib/bio-gff3-pltools/filtering.rb
153
+ - LICENSE.txt
154
+ - README.md
155
+ homepage: http://mamarjan.github.com/gff3-pltools/
156
+ licenses:
157
+ - MIT
158
+ post_install_message:
159
+ rdoc_options: []
160
+ require_paths:
161
+ - lib
162
+ required_ruby_version: !ruby/object:Gem::Requirement
163
+ none: false
164
+ requirements:
165
+ - - ! '>='
166
+ - !ruby/object:Gem::Version
167
+ version: '0'
168
+ required_rubygems_version: !ruby/object:Gem::Requirement
169
+ none: false
170
+ requirements:
171
+ - - ! '>='
172
+ - !ruby/object:Gem::Version
173
+ version: '0'
174
+ requirements: []
175
+ rubyforge_project:
176
+ rubygems_version: 1.8.24
177
+ signing_key:
178
+ specification_version: 3
179
+ summary: Ruby wrapper for the gff3-pltools.
180
+ test_files: []
181
+ has_rdoc: