bio-gff3-pltools 0.1.0

data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2012 Marjan Povolni
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,258 @@
+ # gff3-pltools
+
+ [![Build Status](https://secure.travis-ci.org/mamarjan/gff3-pltools.png)](http://travis-ci.org/mamarjan/gff3-pltools)
+
+ Note: this software is under active development!
+
+ This is currently an early work in progress to create parallel GFF3
+ and GTF tools in D, and a Ruby gem that lets Ruby programmers use
+ those tools from Ruby.
+
+ ## Installation
+
+ ### Requirements
+
+ The binary builds are self-contained.
+
+ To build the tools from source, you'll need the DMDv2 compiler in
+ your PATH. Check the following page to see whether a build of DMD is
+ available for your platform:
+
+ http://dlang.org/download.html
+
+ The rake utility is also required to run the automated build
+ scripts.
+
+ ### Build and install instructions
+
+ Users of 32-bit and 64-bit Linux can download pre-built binary gems
+ and install them using the gem command:
+
+ ```sh
+ gem install bio-gff3-pltools-linux32-X.Y.Z.gem
+ ```
+
+ Users of other platforms can download the source package and build
+ it themselves, provided the DMD compiler is available for their platform.
+
+ To build and install a gem for your platform, use the following steps:
+
+ ```sh
+ tar -zxvf bio-gff3-pltools-X.Y.Z.tar.gz
+ cd bio-gff3-pltools-X.Y.Z
+ rake install
+ ```
+
+ To build a gem without installing it, use the rake task "build" instead
+ of "install" in the previous example.
+
+ To build only the binary tools, without a gem or the Ruby library,
+ invoke the "utilities" rake task instead and copy the binaries from
+ the "bin/" directory to a directory on your PATH.
+
+ ### Run tests
+
+ To run the D unit tests, use the "unittests" rake task:
+
+ ```sh
+ rake unittests
+ ```
+
+ To run the tests for the Ruby library, first build the D utilities
+ and then run the "features" rake task:
+
+ ```sh
+ rake utilities
+ rake features
+ ```
+
+ ## Usage
+
+ ### Ruby library
+
+ To use the library in your code, install the gem and then require
+ the library:
+
+ ```ruby
+ require 'bio-gff3-pltools'
+ ```
+
+ The API docs are available online:
+
+ http://mamarjan.github.com/gff3-pltools/docs/0.1.0/ruby-api/
+
+ For more code examples, see the test files in the source tree.
+
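The Ruby wrapper in lib/bio-gff3-pltools/filtering.rb starts the `gff3-ffetch` binary by name, so that binary has to be on your PATH. Below is a minimal sketch of a check a script could run before calling the library. `executable_on_path?` is a hypothetical helper, not part of the gem, and it assumes POSIX-style executables (Windows PATHEXT handling is left out).

```ruby
# Hypothetical helper, not part of bio-gff3-pltools: returns true if an
# executable file called `name` exists in one of the PATH directories.
# Assumes POSIX-style executables (no Windows PATHEXT handling).
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

unless executable_on_path?("gff3-ffetch")
  warn "gff3-ffetch is not on PATH: build it with `rake utilities` " \
       "and copy it from bin/ to a directory on your PATH"
end
```

Passing the PATH string as a second argument keeps the helper easy to test against a temporary directory.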
+ ### gff3-ffetch utility
+
+ Currently this utility only supports filtering a file based on a
+ filtering expression. For example, you can use the following command
+ to select the records whose feature is CDS from a GFF3 file:
+
+ ```sh
+ gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
+ ```
+
+ The utility uses the fast (and soon parallel) D library to do the
+ parsing and filtering. You can then parse the result using the
+ programming language and library of your choice.
+
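As an illustration of that last step, here is a sketch that parses gff3-ffetch output (plain GFF3 lines) into Ruby structs. It assumes well-formed, tab-separated nine-column records and ASCII-only percent-escapes; `GFF3Record`, `gff3_unescape` and `parse_gff3` are hypothetical names, not part of the gem.

```ruby
# Hypothetical record type for the nine GFF3 columns. `attributes` maps
# each tag to an Array of values, since tags such as Parent can hold
# several comma-separated values.
GFF3Record = Struct.new(:seqname, :source, :feature, :start, :end,
                        :score, :strand, :phase, :attributes)

# Decodes GFF3 percent-escapes, e.g. "%3B" -> ";" and "%2C" -> ",".
def gff3_unescape(text)
  text.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

# Parses GFF3 text into GFF3Record objects. Blank lines, comments and
# directives (anything starting with "#") are skipped.
def parse_gff3(text)
  lines = text.each_line.map(&:chomp)
  lines.reject { |l| l.empty? || l.start_with?("#") }.map do |line|
    cols = line.split("\t", -1)
    raise ArgumentError, "expected 9 columns: #{line}" unless cols.size == 9

    attributes = {}
    unless cols[8] == "."
      cols[8].split(";").each do |pair|
        next if pair.empty?
        tag, value = pair.split("=", 2)
        # Split on unescaped commas first, then decode each value.
        attributes[gff3_unescape(tag)] =
          value.to_s.split(",").map { |v| gff3_unescape(v) }
      end
    end

    seqname, source, feature, start, stop, score, strand, phase =
      cols[0, 8].map { |c| gff3_unescape(c) }
    GFF3Record.new(seqname, source, feature, Integer(start, 10),
                   Integer(stop, 10), score, strand, phase, attributes)
  end
end
```

Splitting multi-valued attributes before decoding matters: an escaped comma (`%2C`) stays inside its value instead of creating a new one.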
+ Currently supported predicates are "field", "attribute", "equals",
+ "contains", "starts_with" and "not". You can combine them in any way
+ that makes sense. First, the utility needs to know which field or
+ attribute to filter on. In the previous example, that's the
+ "field:feature" part. Next, the utility needs to know what to do
+ with that value. In the example, that's the "equals" part. The last
+ part of the example is the parameter to "equals", which tells the
+ utility what the field or attribute value should be compared to.
+
+ Parts of the expression are separated by colons, ':'. If a colon is
+ supposed to be part of a name or a value, it can be escaped with a
+ backslash, like this: "\\:".
+
+ Valid field names are: seqname, source, feature, start, end, score,
+ strand and phase.
+
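To make the expression syntax concrete, here is a sketch of how an expression could be split into its colon-separated parts while honouring the backslash escape. It follows the rules described above and is not the parser gff3-ffetch actually uses; `split_filter_expression` is a hypothetical name.

```ruby
# Hypothetical sketch: splits a gff3-ffetch style filter expression on
# unescaped colons; a backslash makes the following character literal.
def split_filter_expression(expression)
  parts = [String.new]
  escaped = false
  expression.each_char do |ch|
    if escaped
      parts.last << ch      # keep the escaped character (e.g. ":") as-is
      escaped = false
    elsif ch == "\\"
      escaped = true        # drop the backslash, escape the next character
    elsif ch == ":"
      parts << String.new   # an unescaped colon starts a new part
    else
      parts.last << ch
    end
  end
  parts
end

split_filter_expression("field:feature:equals:CDS")
# => ["field", "feature", "equals", "CDS"]
split_filter_expression("attribute:Note:contains:a\\:b")
# => ["attribute", "Note", "contains", "a:b"]
split_filter_expression("attribute:ID:equals:")
# => ["attribute", "ID", "equals", ""]
```

The trailing empty part in the last example is what lets "attribute:ID:equals:" compare against an empty string.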
+ A few more examples:
+
+ ```sh
+ gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
+ ```
+
+ The previous example selects records whose ID attribute has the
+ value gene1.
+
+ To see which records have no ID value, or an ID that is an empty
+ string, use the following command:
+
+ ```sh
+ gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
+ ```
+
+ To get the records that have the ID attribute defined, use this
+ command:
+
+ ```sh
+ gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
+ ```
+
+ or
+
+ ```sh
+ gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
+ ```
+
+ However, the last two commands are not equivalent. When an attribute
+ has multiple values (the Parent attribute, for example), the
+ "attribute" predicate runs the contained predicate on each of the
+ attribute's values and returns true if it returns true for any one
+ of them. That is, it has an implicit "or" operation built in.
+
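The difference between the last two commands can be sketched with a tiny evaluator. It assumes the behaviour described above: an "attribute" predicate applies the contained predicate to each of the attribute's values and matches if it holds for at least one. Only "attribute", "not" and "equals" are covered, the function name is hypothetical, and treating a missing attribute as a single empty value is an assumption chosen to match the "attribute:ID:equals:" example.

```ruby
# Hypothetical evaluator for a subset of the filter language, applied to
# a record's attributes (a Hash of tag => Array of values).
def evaluate(parts, attributes, value = nil)
  case parts.first
  when "not"
    !evaluate(parts.drop(1), attributes, value)
  when "attribute"
    tag = parts[1]
    # Assumption: a missing attribute behaves like one empty value.
    values = attributes.fetch(tag, [""])
    values = [""] if values.empty?
    # "Any" semantics: true if the inner predicate holds for some value.
    values.any? { |v| evaluate(parts.drop(2), attributes, v) }
  when "equals"
    value == parts[1].to_s
  else
    raise ArgumentError, "unsupported predicate: #{parts.first}"
  end
end

record = { "Parent" => ["mRNA1", "mRNA2"] }
inside  = %w[attribute Parent not equals mRNA1]  # some Parent value is not mRNA1
outside = %w[not attribute Parent equals mRNA1]  # no Parent value is mRNA1
evaluate(inside, record)   # => true  (mRNA2 is not mRNA1)
evaluate(outside, record)  # => false (mRNA1 is one of the values)
```

For single-valued attributes such as ID both forms give the same answer; they only diverge once a record carries several values.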
+ A few more options are available. In the examples above, the data
+ came from a GFF3 file specified on the command line, and the output
+ went to the screen. To use the standard input as the source of the
+ data, use "-" instead of a filename.
+
+ By default, the output goes to the screen (stdout). To redirect the
+ output to a file, use the "--output" option. Here is an example:
+
+ ```sh
+ gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
+ ```
+
+ To limit the number of records in the results, use the "--at-most"
+ option. For example:
+
+ ```sh
+ gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
+ ```
+
+ If the results contain more than 1000 records, the utility prints
+ the first 1000, appends a line with the content "# ..." and
+ terminates.
+
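When results are limited with "--at-most" (or the :at_most option of the Ruby wrapper), a caller may want to know whether they were cut off. A minimal sketch, assuming the "# ..." marker is the last non-empty line of the output exactly as described above; both helper names are hypothetical.

```ruby
# Hypothetical helpers for post-processing gff3-ffetch output.

# True if the output ends with the "# ..." line that --at-most appends.
def ffetch_truncated?(output)
  last = output.each_line.map(&:chomp).reject(&:empty?).last
  last == "# ..."
end

# Counts record lines, ignoring comments such as the "# ..." marker.
def ffetch_record_count(output)
  output.each_line.count { |line| !line.strip.empty? && !line.start_with?("#") }
end

# Example (needs gff3-ffetch on PATH):
#   output = Bio::PL::GFF3.filter_file("path-to-file.gff3",
#                                      "field:feature:equals:CDS",
#                                      :at_most => 1000)
#   warn "results truncated" if ffetch_truncated?(output)
```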
+ ### GFF3 File validation
+
+ The validation utility can be used like this:
+
+ ```sh
+ ./gff3-validate path/to/file.gff3
+ ```
+
+ It prints any errors it finds to standard output. However, the
+ validation utility is currently very basic and checks only a few
+ things: the number of columns, characters that should have been
+ escaped, whether the start and end coordinates are integers and the
+ end is greater than the start, whether the score is a float, valid
+ values for strand and phase, and the format of attributes.
+
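For similar checks inside a Ruby script, here is a sketch of a few of them for a single data line. It is not gff3-validate's implementation. The rules assumed here follow the GFF3 specification: nine tab-separated columns, integer start and end with start no greater than end (so start == end is accepted), a score that is a number or ".", strand one of + - . ?, and phase 0, 1, 2 or ".". The escaping check is left out, and `validate_gff3_line` is a hypothetical name.

```ruby
# Hypothetical sketch of a few basic GFF3 data-line checks; returns a list
# of error messages (empty when no problem was found).
def validate_gff3_line(line)
  cols = line.chomp.split("\t", -1)
  return ["expected 9 columns, found #{cols.size}"] unless cols.size == 9

  seqname, _source, _feature, start, stop, score, strand, phase, attributes = cols
  errors = []
  errors << "empty seqname" if seqname.empty?
  if start =~ /\A\d+\z/ && stop =~ /\A\d+\z/
    errors << "end is before start" if stop.to_i < start.to_i
  else
    errors << "start and end must be integers"
  end
  unless score == "." || score =~ /\A[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?\z/
    errors << "score must be a number or '.'"
  end
  errors << "invalid strand: #{strand}" unless %w[+ - . ?].include?(strand)
  errors << "invalid phase: #{phase}" unless %w[0 1 2 .].include?(phase)
  # Each attribute must look like tag=value (value may be empty).
  unless attributes == "." ||
         attributes.split(";").all? { |pair| pair =~ /\A[^=;]+=[^=;]*\z/ }
    errors << "malformed attributes column"
  end
  errors
end
```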
+ ### Benchmarking utility
+
+ There is a D application for performance benchmarking.
+ You can run it like this:
+
+ ```sh
+ ./gff3-benchmark path/to/file.gff3
+ ```
+
+ By default, the benchmarking utility only parses the file into
+ records. More functionality is available through command-line
+ options:
+
+ ```
+ -v    turn on validation
+ -r    turn on replacement of escaped characters
+ -f    merge records into features
+ -c N  feature cache size (how many features to keep in memory), default=1000
+ -l    link features into parent-child relationships
+ ```
+
+ Before exiting, the utility prints the number of records or features
+ it parsed.
+
+ ### Counting features
+
+ The gff3-ffetch utility keeps only a limited number of records in
+ memory while combining them into features. To check whether the
+ cache size is large enough, use the "gff3-count-features" utility to
+ get the exact number of features in a file. It loads all the IDs
+ into memory first and then derives the correct number of features.
+
+ To get the correct number of features in a file, use the following
+ command:
+
+ ```sh
+ ./gff3-count-features path/to/file.gff3
+ ```
+
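Why a too-small cache can give a wrong count is easiest to see in a simplified model. The sketch below assumes that records sharing an ID form one feature, that records without an ID are features on their own, and that the cache evicts the oldest feature once it is full. These are illustrative assumptions, not a description of the D implementation, and both function names are hypothetical.

```ruby
# Hypothetical model: ids is the sequence of ID attributes of the records
# in file order (nil for records without an ID).

# Exact count: all IDs in memory, one feature per distinct ID, plus one
# feature per record without an ID.
def exact_feature_count(ids)
  ids.compact.uniq.size + ids.count(&:nil?)
end

# Cached count: only the `capacity` most recently started features are
# kept; a record whose ID was already evicted starts a new feature.
def count_with_cache(ids, capacity)
  cache = []   # IDs of features currently in memory, oldest first
  features = 0
  ids.each do |id|
    if id.nil?
      features += 1
    elsif !cache.include?(id)
      features += 1
      cache << id
      cache.shift if cache.size > capacity   # evict the oldest feature
    end
  end
  features
end

ids = ["cds1", "gene2", "gene3", "cds1", nil]
exact_feature_count(ids)   # => 4 (3 distinct IDs + 1 record without an ID)
count_with_cache(ids, 10)  # => 4
count_with_cache(ids, 1)   # => 5 (cds1 was evicted, then counted again)
```

When the cached count is higher than the exact count, records of one feature were too far apart in the file for the chosen cache size.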
+ ## Project home page
+
+ The project home page can be found at the following location:
+
+ http://mamarjan.github.com/gff3-pltools/
+
+ For information on the source tree, issues and
+ how to contribute, see:
+
+ http://github.com/mamarjan/gff3-pltools
+
+ The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
+
+ ## Cite
+
+ If you use this software, please cite one of:
+
+ * [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
+ * [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
+
+ ## Biogems.info
+
+ This Biogem is published at [#bio-gff3-pltools](http://biogems.info/index.html).
+
+ ## Copyright
+
+ Copyright (c) 2012 Marjan Povolni. See LICENSE.txt for further details.
+
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.1.0
data/lib/bio-gff3-pltools.rb ADDED
@@ -0,0 +1,12 @@
+ # Please require your code below, respecting the naming conventions in the
+ # bioruby directory tree.
+ #
+ # For example, say you have a plugin named bio-plugin, the only uncommented
+ # line in this file would be
+ #
+ # require 'bio/bio-plugin/plugin'
+ #
+ # In this file only require other files. Avoid other source code.
+
+ require 'bio-gff3-pltools/filtering.rb'
+
data/lib/bio-gff3-pltools/filtering.rb ADDED
@@ -0,0 +1,45 @@
+ require 'open3'
+
+ module Bio
+   module PL
+     module GFF3
+       # Runs the gff3-ffetch utility with the specified parameters on
+       # an external file. Options include :output and :at_most.
+       # Returns the filtered records as a String, or nil when :output
+       # is given (gff3-ffetch then writes the results to that file).
+       def self.filter_file filename, filter_string, options = {}
+         if !File.exist?(filename)
+           raise Errno::ENOENT.new(filename)
+         end
+
+         run_ffetch(ffetch_args(filter_string, filename, options), nil, options)
+       end
+
+       # Runs the gff3-ffetch utility with the specified parameters while
+       # passing data to its stdin. Options include :output and :at_most.
+       def self.filter_data data, filter_string, options = {}
+         run_ffetch(ffetch_args(filter_string, "-", options), data, options)
+       end
+
+       # Builds the gff3-ffetch command line as an argument array. No shell
+       # is involved, so filenames and filters (including "\:") need no quoting.
+       def self.ffetch_args filter_string, input, options
+         args = ["gff3-ffetch", "--filter", filter_string, input]
+         args += ["--output", options[:output].to_s] unless options[:output].nil?
+         args += ["--at-most", options[:at_most].to_s] unless options[:at_most].nil?
+         args
+       end
+
+       # Runs gff3-ffetch and returns its stdout, or nil when :output is set.
+       # Open3.capture2 reads stdout while writing stdin, so large inputs
+       # cannot deadlock on a full pipe.
+       def self.run_ffetch args, stdin_data, options
+         output, _status = Open3.capture2(*args, :stdin_data => stdin_data)
+         options[:output].nil? ? output : nil
+       end
+
+       private_class_method :ffetch_args, :run_ffetch
+     end
+   end
+ end
+
metadata ADDED
@@ -0,0 +1,181 @@
+ --- !ruby/object:Gem::Specification
+ name: bio-gff3-pltools
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+   prerelease:
+ platform: ruby
+ authors:
+ - Marjan Povolni
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2012-07-05 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 2.8.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 2.8.0
+ - !ruby/object:Gem::Dependency
+   name: yard
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '0.7'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '0.7'
+ - !ruby/object:Gem::Dependency
+   name: rdoc
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '3.12'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '3.12'
+ - !ruby/object:Gem::Dependency
+   name: cucumber
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.1.3
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.1.3
+ - !ruby/object:Gem::Dependency
+   name: jeweler
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.8.3
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.8.3
+ - !ruby/object:Gem::Dependency
+   name: rdoc
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '3.12'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '3.12'
+ - !ruby/object:Gem::Dependency
+   name: redcarpet
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ description: Ruby wrapper for the gff3-pltools.
+ email: marian.povolny@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files:
+ - LICENSE.txt
+ - README.md
+ files:
+ - VERSION
+ - lib/bio-gff3-pltools.rb
+ - lib/bio-gff3-pltools/filtering.rb
+ - LICENSE.txt
+ - README.md
+ homepage: http://mamarjan.github.com/gff3-pltools/
+ licenses:
+ - MIT
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.24
+ signing_key:
+ specification_version: 3
+ summary: Ruby wrapper for the gff3-pltools.
+ test_files: []
+ has_rdoc: