bio-gff3-pltools 0.1.0 → 0.2.0

data/README.md CHANGED
@@ -1,75 +1,52 @@
- # gff3-pltools
+ # bioruby-gff3-pltools
 
- [![Build Status](https://secure.travis-ci.org/mamarjan/gff3-pltools.png)](http://travis-ci.org/mamarjan/gff3-pltools)
+ [![Build Status](https://secure.travis-ci.org/mamarjan/bioruby-gff3-pltools.png)](http://travis-ci.org/mamarjan/bioruby-gff3-pltools)
 
  Note: this software is under active development!
 
- This is currently an early work in progress to create parallel GFF3
- and GTF parallel tools for D and a Ruby gem which would let Ruby
- programmers use those tools from Ruby.
+ This is currently an early work in progress to create a wrapper
+ library for gff3-pltools.
 
  ## Installation
 
  ### Requirements
 
- The binary builds are self-contained.
-
- To build the tools from source, you'll need the DMDv2 compiler in
- your path. You can check here if there is a build of DMD available
- for your platform:
-
- http://dlang.org/download.html
-
- Also, the rake utility is necessary to run the automated build
- scripts.
+ gff3-pltools have to be installed and in the PATH. Ruby is a requirement
+ too, and the rest can be installed using bundler.
 
  ### Build and install instructions
 
- Users of 32-bit and 64-bit Linux can download pre-build binary gems
- and install them using the gem command:
+ Given the gff3-tools are installed, the gem command can be used to
+ install the latest gem from rubygems.org:
 
  ```sh
- gem install bio-gff3-pltools-linux32-X.Y.Z.gem
+ gem install bio-gff3-pltools
  ```
 
- Users of other plaforms can download the source package, and build
- it themselves given the DMD compiler is available for their platform.
-
- To build and install a gem for your platform, use the following steps:
+ To build and install the library from source, use the following steps:
 
  ```sh
- tar -zxvf bio-gff3-pltools-X.Y.Z.tar.gz
- cd bio-gff3-pltools-X.Y.Z
+ tar -zxvf bioruby-gff3-pltools-X.Y.Z.tar.gz
+ cd bioruby-gff3-pltools-X.Y.Z
+ bundle install
  rake install
  ```
 
  To build a gem without installing, use the rake task "build" instead
- of install in the previous example.
-
- To build the binary tools without building a gem or a Ruby library,
- invoke the "utilities" rake task instead and copy the binaries from
- the "bin/" directory to your PATH.
+ of "install" in the previous example.
 
  ### Run tests
 
- You can use the "unittests" rake task to run D unittests, like this:
-
- ```sh
- rake unittests
- ```
-
- To run tests for the Ruby library, first build the D utilities and
- then start the "features" rake task, like this:
+ To run cucumber tests, first make sure the D utilities
+ are available in the path and then start the "features" rake task,
+ like this:
 
  ```sh
- rake utilities
  rake features
  ```
 
  ## Usage
 
- ### Ruby library
-
  To use the library in your code, after installing the gem, simply
  require the library:
 
@@ -79,165 +56,20 @@ require the library:
 
  The API docs are online:
 
- http://mamarjan.github.com/gff3-pltools/docs/0.1.0/ruby-api/
+ http://mamarjan.github.com/bioruby-gff3-pltools/docs/0.2.0/ruby-api/
 
  For more code examples see the test files in the source tree.
 
- ### gff3-ffetch utility
-
- Currently this utility supports only filtering a file, based on a
- filtering expression. For example, you can use the following command
- to filter out records with a CDS feature from a GFF3 file:
-
- ```sh
- gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
- ```
-
- The utility will use the fast (and soon parallel) D library to do the
- parsing and filtering. You can then parse the result using your
- programming language and library of choice.
-
- Currently supported predicates are "field", "attribute", "equals",
- "contains", "starts_with" and "not". You can combine them in a way
- that makes sense. First, the utility needs to know what field or
- attribute should be used for filtering. In the previous example,
- that's the "field:feature" part. Next, the utility needs to know
- what you want to do with it. In the example, that's the "equals"
- part. And then the last part in the example is a parameter to the
- "equals", which tells the utility what the attribute or field
- should be compared to.
-
- Parts of the expression are separated by a colon, ':', and if colon
- is suposed to be part of a field name or value, it can be escaped
- like this: "\\:".
-
- Valid field names are: seqname, source, feature, start, end, score,
- strand and phase.
-
- A few more examples...
-
- ```sh
- gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
- ```
-
- The previous example chooses records which have the ID attribute
- with the value gene1.
-
- To see which records have no ID value, or ID which is an empty
- string, use the following command:
-
- ```sh
- gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
- ```
-
- And to get records which have the ID attribute defined, you can use
- this command:
-
- ```sh
- gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
- ```
-
- or
-
- ```sh
- gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
- ```
-
- However, the last two commands are not completely the same. In cases
- where an attribute has multiple values, the Parent attribute for
- example, the "attribute" predicate first runs the contained predicate
- on all attribute's values and returns true when an operation
- returns true for a parent value. That is, it has an implicit "and"
- operation built-in.
-
- There are a few more options available. In the examples above, the
- data was comming from a GFF3 file which was specified on the command
- line and the output was the screen. To use the standard input as the
- source of the data, use "-" instead of a filename.
-
- The default for output is the screen, or stdout. To redirect the
- output to a file, you can use the "--output" option. Here is an
- example:
-
- ```sh
- gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
- ```
-
- To limit the number of records in the results, you can use the
- "--at-most" option. For example:
-
- ```sh
- gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
- ```
-
- If there are more then a 1000 records in the results, after the
- 1000th record printed, a line is appended with the following content:
- "# ..." and the utility terminates.
-
- ### GFF3 File validation
-
- The validation utility can be used like this:
-
- ```sh
- ./gff3-validate path/to/file.gff3
- ```
-
- It will output any errors it finds to standard output. However, the
- validation utility is currently very basic, and checks only for a few
- cases: the number of columns, characters that should have been
- escaped, are the start and stop coordinates integers and if the end
- is greater then start, whether score is a float, valid values for
- strand and phase, and the format of attributes.
-
- ### Benchmarking utility
-
- There is a D application for performance benchmarking.
- You can run it like this:
-
- ```sh
- ./gff3-benchmark path/to/file.gff3
- ```
-
- The most basic case for the banchmarking utility is to parse the
- file into records. More functionality is available using command
- line options:
-
- ```
- -v turn on validation
- -r turn on replacement of escaped characters
- -f merge records into features
- -c N feature cache size (how many features to keep in memory), default=1000
- -l link feature into parent-child relationships
- ```
-
- Before exiting the utility prints the number of records or features
- it parsed.
-
- ### Counting features
-
- The gff3-ffetch utility keeps only a small part of records in memory
- while combining them into features. To check if the cache size is
- correct, the "gff3-count-features" utility can be used to get the
- correct number of features in a file. It gets all the IDs into
- memory first, and then devises the correct number of features.
-
- To get the correct number of features in a file, use the following
- command:
-
- ```sh
- ./gff3-count-features path/to/file.gff3
- ```
-
  ## Project home page
 
  Project home page can be found at the following location:
 
- http://mamarjan.github.com/gff3-pltools/
+ http://mamarjan.github.com/bioruby-gff3-pltools/
 
  For information on the source tree, issues and
  how to contribute, see
 
- http://github.com/mamarjan/gff3-pltools
+ http://github.com/mamarjan/bioruby-gff3-pltools
 
  The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
 
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.0
+ 0.2.0
data/lib/bio-gff3-pltools.rb CHANGED
@@ -9,4 +9,5 @@
  # In this file only require other files. Avoid other source code.
 
  require 'bio-gff3-pltools/filtering.rb'
+ require 'bio-gff3-pltools/validation.rb'
 
data/lib/bio-gff3-pltools/filtering.rb CHANGED
@@ -2,7 +2,8 @@ module Bio
    module PL
      module GFF3
        # Runs the gff3-ffetch utility with the specified parameters on
-       # an external file. Options include :output and :at_most.
+       # an external file. Options include :output, :at_most,
+       # :pass_fasta_through, :keep_comments, :keep_pragmas
        def self.filter_file filename, filter_string, options = {}
          if !File.exists?(filename)
            raise Exception.new("No such file - #{filename}")
@@ -16,7 +17,16 @@ module Bio
          if !options[:at_most].nil?
            at_most_option = "--at-most #{options[:at_most]}"
          end
-         gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option}")
+         if options[:pass_fasta_through]
+           fasta_option = "--pass-fasta-through"
+         end
+         if options[:keep_comments]
+           comments_option = "--keep-comments"
+         end
+         if options[:keep_pragmas]
+           pragmas_option = "--keep-pragmas"
+         end
+         gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}")
          if output_option.nil?
            output = gff3_ffetch.read
          end
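Review note: the option handling in this hunk interpolates every flag into a single shell command string. As a minimal sketch, and not the gem's own code, the same flags could be collected into an argument array, which `IO.popen` also accepts and which sidesteps shell quoting of the filter string and filename. The helper name `ffetch_args` is hypothetical; the flags are only the ones that appear in this diff and in the 0.1.0 README text.

```ruby
# Hypothetical helper, not part of bio-gff3-pltools: builds the argument
# array for gff3-ffetch from the same options hash filter_file accepts.
def ffetch_args(filter_string, source, options = {})
  args = ["gff3-ffetch", "--filter", filter_string, source]
  # Options that carry a value.
  args += ["--output", options[:output].to_s] if options[:output]
  args += ["--at-most", options[:at_most].to_s] unless options[:at_most].nil?
  # Boolean switches introduced in 0.2.0.
  args << "--pass-fasta-through" if options[:pass_fasta_through]
  args << "--keep-comments" if options[:keep_comments]
  args << "--keep-pragmas" if options[:keep_pragmas]
  args
end

args = ffetch_args("field:feature:equals:CDS", "path-to-file.gff3",
                   :at_most => 1000, :keep_comments => true)
puts args.join(" ")
```

An array built this way could be handed to `IO.popen(args)` directly, so a filter expression containing spaces or shell metacharacters would reach the utility unchanged.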
@@ -25,7 +35,8 @@ module Bio
        end
 
        # Runs the gff3-ffetch utility with the specified parameters while
-       # passing data to its stdin. Options include :output and :at_most.
+       # passing data to its stdin. Options include :output and :at_most,
+       # :pass_fasta_through, :keep_comments, :keep_pragmas
        def self.filter_data data, filter_string, options = {}
          output_option = nil
          output = nil
@@ -35,7 +46,16 @@ module Bio
          if !options[:at_most].nil?
            at_most_option = "--at-most #{options[:at_most]}"
          end
-         gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option}", "r+")
+         if options[:pass_fasta_through]
+           fasta_option = "--pass-fasta-through"
+         end
+         if options[:keep_comments]
+           comments_option = "--keep-comments"
+         end
+         if options[:keep_pragmas]
+           pragmas_option = "--keep-pragmas"
+         end
+         gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}", "r+")
          gff3_ffetch.write data
          gff3_ffetch.close_write
          if output_option.nil?
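Review note: `filter_data` opens the pipe in "r+" mode, writes the input, closes the write side, and then reads the result. The sequence can be sketched in isolation as below. This is an illustration under assumptions, not the gem's code: `pipe_through` and `stand_in_child` are hypothetical names, and because gff3-ffetch may not be installed, the Ruby interpreter stands in for it as a child that upper-cases its stdin.

```ruby
require "rbconfig"

# Stand-in child process (assumption: gff3-ffetch is not available here).
# It reads all of stdin and prints it upper-cased.
def stand_in_child
  [RbConfig.ruby, "-e", "print STDIN.read.upcase"]
end

# Hypothetical helper mirroring the write / close_write / read sequence.
def pipe_through(cmd, data)
  IO.popen(cmd, "r+") do |io|
    io.write data
    io.close_write # signals end of input so the child can finish
    io.read        # the block form closes the pipe when the block returns
  end
end

puts pipe_through(stand_in_child, "ctg123\t.\tgene\n")
```

One general caveat with this pattern: writing a very large input in full before reading anything can stall if the child emits output as it goes and the pipe buffer fills. Whether that matters for typical GFF3 inputs is not something this diff shows.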
data/lib/bio-gff3-pltools/validation.rb ADDED
@@ -0,0 +1,17 @@
+ module Bio
+   module PL
+     module GFF3
+       def self.validate_file filename
+         if !File.exists?(filename)
+           raise Exception.new("No such file - #{filename}")
+         end
+
+         gff3_validate = IO.popen(["gff3-validate", "#{filename}", :err=>[:child, :out]])
+         output = gff3_validate.read
+         gff3_validate.close
+         output
+       end
+     end
+   end
+ end
+
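Review note: `validate_file` passes the command as an array with `:err=>[:child, :out]`, so whatever the validator writes to stderr is returned in the same string as its stdout. A small self-contained sketch of that behaviour follows. The names `capture_all` and `stand_in_validator` are hypothetical, and a Ruby one-liner stands in for gff3-validate, which is assumed not to be installed.

```ruby
require "rbconfig"

# Hypothetical helper: runs a command given as an argument array and returns
# stdout and stderr merged into one string, using the same :err redirection
# as validate_file.
def capture_all(cmd)
  IO.popen(cmd + [{ :err => [:child, :out] }]) { |io| io.read }
end

# Stand-in for gff3-validate: a child that writes one line to each stream.
def stand_in_validator
  [RbConfig.ruby, "-e", '$stderr.puts "to stderr"; $stdout.puts "to stdout"']
end

report = capture_all(stand_in_validator)
puts report
```

With the gem and the D utilities installed, a call such as `Bio::PL::GFF3.validate_file("path/to/file.gff3")` returns the validator's report as one string in the same way. The relative order of the two streams within that string depends on buffering in the child.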
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: bio-gff3-pltools
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.0
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-07-05 00:00:00.000000000 Z
+ date: 2012-07-14 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rspec
@@ -150,6 +150,7 @@ files:
  - VERSION
  - lib/bio-gff3-pltools.rb
  - lib/bio-gff3-pltools/filtering.rb
+ - lib/bio-gff3-pltools/validation.rb
  - LICENSE.txt
  - README.md
  homepage: http://mamarjan.github.com/gff3-pltools/
@@ -165,6 +166,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
+       segments:
+       - 0
+       hash: -1031554001
  required_rubygems_version: !ruby/object:Gem::Requirement
    none: false
    requirements:
@@ -178,4 +182,3 @@ signing_key:
  specification_version: 3
  summary: Ruby wrapper for the gff3-pltools.
  test_files: []
- has_rdoc: