bio-gff3-pltools 0.1.0 → 0.2.0

data/README.md CHANGED
@@ -1,75 +1,52 @@
1
- # gff3-pltools
1
+ # bioruby-gff3-pltools
2
2
 
3
- [![Build Status](https://secure.travis-ci.org/mamarjan/gff3-pltools.png)](http://travis-ci.org/mamarjan/gff3-pltools)
3
+ [![Build Status](https://secure.travis-ci.org/mamarjan/bioruby-gff3-pltools.png)](http://travis-ci.org/mamarjan/bioruby-gff3-pltools)
4
4
 
5
5
  Note: this software is under active development!
6
6
 
7
- This is currently an early work in progress to create parallel GFF3
8
- and GTF tools for D and a Ruby gem which would let Ruby
9
- programmers use those tools from Ruby.
7
+ This is currently an early work in progress to create a wrapper
8
+ library for gff3-pltools.
10
9
 
11
10
  ## Installation
12
11
 
13
12
  ### Requirements
14
13
 
15
- The binary builds are self-contained.
16
-
17
- To build the tools from source, you'll need the DMDv2 compiler in
18
- your path. You can check here if there is a build of DMD available
19
- for your platform:
20
-
21
- http://dlang.org/download.html
22
-
23
- Also, the rake utility is necessary to run the automated build
24
- scripts.
14
+ gff3-pltools have to be installed and in the PATH. Ruby is a requirement
15
+ too, and the rest can be installed using bundler.
25
16
 
26
17
  ### Build and install instructions
27
18
 
28
- Users of 32-bit and 64-bit Linux can download pre-built binary gems
29
- and install them using the gem command:
19
+ Given the gff3-pltools are installed, the gem command can be used to
20
+ install the latest gem from rubygems.org:
30
21
 
31
22
  ```sh
32
- gem install bio-gff3-pltools-linux32-X.Y.Z.gem
23
+ gem install bio-gff3-pltools
33
24
  ```
34
25
 
35
- Users of other platforms can download the source package, and build
36
- it themselves given the DMD compiler is available for their platform.
37
-
38
- To build and install a gem for your platform, use the following steps:
26
+ To build and install the library from source, use the following steps:
39
27
 
40
28
  ```sh
41
- tar -zxvf bio-gff3-pltools-X.Y.Z.tar.gz
42
- cd bio-gff3-pltools-X.Y.Z
29
+ tar -zxvf bioruby-gff3-pltools-X.Y.Z.tar.gz
30
+ cd bioruby-gff3-pltools-X.Y.Z
31
+ bundle install
43
32
  rake install
44
33
  ```
45
34
 
46
35
  To build a gem without installing, use the rake task "build" instead
47
- of install in the previous example.
48
-
49
- To build the binary tools without building a gem or a Ruby library,
50
- invoke the "utilities" rake task instead and copy the binaries from
51
- the "bin/" directory to your PATH.
36
+ of "install" in the previous example.
52
37
 
53
38
  ### Run tests
54
39
 
55
- You can use the "unittests" rake task to run D unittests, like this:
56
-
57
- ```sh
58
- rake unittests
59
- ```
60
-
61
- To run tests for the Ruby library, first build the D utilities and
62
- then start the "features" rake task, like this:
40
+ To run cucumber tests, first make sure the D utilities
41
+ are available in the path and then start the "features" rake task,
42
+ like this:
63
43
 
64
44
  ```sh
65
- rake utilities
66
45
  rake features
67
46
  ```
68
47
 
69
48
  ## Usage
70
49
 
71
- ### Ruby library
72
-
73
50
  To use the library in your code, after installing the gem, simply
74
51
  require the library:
75
52
 
@@ -79,165 +56,20 @@ require the library:
79
56
 
80
57
  The API docs are online:
81
58
 
82
- http://mamarjan.github.com/gff3-pltools/docs/0.1.0/ruby-api/
59
+ http://mamarjan.github.com/bioruby-gff3-pltools/docs/0.2.0/ruby-api/
83
60
 
84
61
  For more code examples see the test files in the source tree.
85
62
 
86
- ### gff3-ffetch utility
87
-
88
- Currently this utility supports only filtering a file, based on a
89
- filtering expression. For example, you can use the following command
90
- to filter out records with a CDS feature from a GFF3 file:
91
-
92
- ```sh
93
- gff3-ffetch --filter field:feature:equals:CDS path-to-file.gff3
94
- ```
95
-
96
- The utility will use the fast (and soon parallel) D library to do the
97
- parsing and filtering. You can then parse the result using your
98
- programming language and library of choice.
99
-
100
- Currently supported predicates are "field", "attribute", "equals",
101
- "contains", "starts_with" and "not". You can combine them in a way
102
- that makes sense. First, the utility needs to know what field or
103
- attribute should be used for filtering. In the previous example,
104
- that's the "field:feature" part. Next, the utility needs to know
105
- what you want to do with it. In the example, that's the "equals"
106
- part. And then the last part in the example is a parameter to the
107
- "equals", which tells the utility what the attribute or field
108
- should be compared to.
109
-
110
- Parts of the expression are separated by a colon, ':', and if colon
111
- is supposed to be part of a field name or value, it can be escaped
112
- like this: "\\:".
113
-
114
- Valid field names are: seqname, source, feature, start, end, score,
115
- strand and phase.
116
-
117
- A few more examples...
118
-
119
- ```sh
120
- gff3-ffetch --filter attribute:ID:equals:gene1 path-to-file.gff3
121
- ```
122
-
123
- The previous example chooses records which have the ID attribute
124
- with the value gene1.
125
-
126
- To see which records have no ID value, or ID which is an empty
127
- string, use the following command:
128
-
129
- ```sh
130
- gff3-ffetch --filter attribute:ID:equals: path-to-file.gff3
131
- ```
132
-
133
- And to get records which have the ID attribute defined, you can use
134
- this command:
135
-
136
- ```sh
137
- gff3-ffetch --filter attribute:ID:not:equals: path-to-file.gff3
138
- ```
139
-
140
- or
141
-
142
- ```sh
143
- gff3-ffetch --filter not:attribute:ID:equals: path-to-file.gff3
144
- ```
145
-
146
- However, the last two commands are not completely the same. In cases
147
- where an attribute has multiple values, the Parent attribute for
148
- example, the "attribute" predicate first runs the contained predicate
149
- on all attribute's values and returns true when an operation
150
- returns true for a parent value. That is, it has an implicit "and"
151
- operation built-in.
152
-
153
- There are a few more options available. In the examples above, the
154
- data was coming from a GFF3 file which was specified on the command
155
- line and the output was the screen. To use the standard input as the
156
- source of the data, use "-" instead of a filename.
157
-
158
- The default for output is the screen, or stdout. To redirect the
159
- output to a file, you can use the "--output" option. Here is an
160
- example:
161
-
162
- ```sh
163
- gff3-ffetch --filter not:attribute:ID:equals: - --output tmp.gff3
164
- ```
165
-
166
- To limit the number of records in the results, you can use the
167
- "--at-most" option. For example:
168
-
169
- ```sh
170
- gff3-ffetch --filter not:attribute:ID:equals: - --at-most 1000
171
- ```
172
-
173
- If there are more than 1000 records in the results, after the
174
- 1000th record printed, a line is appended with the following content:
175
- "# ..." and the utility terminates.
176
-
177
- ### GFF3 File validation
178
-
179
- The validation utility can be used like this:
180
-
181
- ```sh
182
- ./gff3-validate path/to/file.gff3
183
- ```
184
-
185
- It will output any errors it finds to standard output. However, the
186
- validation utility is currently very basic, and checks only for a few
187
- cases: the number of columns, characters that should have been
188
- escaped, are the start and stop coordinates integers and if the end
189
- is greater than start, whether score is a float, valid values for
190
- strand and phase, and the format of attributes.
191
-
192
- ### Benchmarking utility
193
-
194
- There is a D application for performance benchmarking.
195
- You can run it like this:
196
-
197
- ```sh
198
- ./gff3-benchmark path/to/file.gff3
199
- ```
200
-
201
- The most basic case for the benchmarking utility is to parse the
202
- file into records. More functionality is available using command
203
- line options:
204
-
205
- ```
206
- -v turn on validation
207
- -r turn on replacement of escaped characters
208
- -f merge records into features
209
- -c N feature cache size (how many features to keep in memory), default=1000
210
- -l link feature into parent-child relationships
211
- ```
212
-
213
- Before exiting the utility prints the number of records or features
214
- it parsed.
215
-
216
- ### Counting features
217
-
218
- The gff3-ffetch utility keeps only a small part of records in memory
219
- while combining them into features. To check if the cache size is
220
- correct, the "gff3-count-features" utility can be used to get the
221
- correct number of features in a file. It gets all the IDs into
222
- memory first, and then devises the correct number of features.
223
-
224
- To get the correct number of features in a file, use the following
225
- command:
226
-
227
- ```sh
228
- ./gff3-count-features path/to/file.gff3
229
- ```
230
-
231
63
  ## Project home page
232
64
 
233
65
  Project home page can be found at the following location:
234
66
 
235
- http://mamarjan.github.com/gff3-pltools/
67
+ http://mamarjan.github.com/bioruby-gff3-pltools/
236
68
 
237
69
  For information on the source tree, issues and
238
70
  how to contribute, see
239
71
 
240
- http://github.com/mamarjan/gff3-pltools
72
+ http://github.com/mamarjan/bioruby-gff3-pltools
241
73
 
242
74
  The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
243
75
 
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.0
1
+ 0.2.0
@@ -9,4 +9,5 @@
9
9
  # In this file only require other files. Avoid other source code.
10
10
 
11
11
  require 'bio-gff3-pltools/filtering.rb'
12
+ require 'bio-gff3-pltools/validation.rb'
12
13
 
@@ -2,7 +2,8 @@ module Bio
2
2
  module PL
3
3
  module GFF3
4
4
  # Runs the gff3-ffetch utility with the specified parameters on
5
- # an external file. Options include :output and :at_most.
5
+ # an external file. Options include :output, :at_most,
6
+ # :pass_fasta_through, :keep_comments, :keep_pragmas
6
7
  def self.filter_file filename, filter_string, options = {}
7
8
  if !File.exists?(filename)
8
9
  raise Exception.new("No such file - #{filename}")
@@ -16,7 +17,16 @@ module Bio
16
17
  if !options[:at_most].nil?
17
18
  at_most_option = "--at-most #{options[:at_most]}"
18
19
  end
19
- gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option}")
20
+ if options[:pass_fasta_through]
21
+ fasta_option = "--pass-fasta-through"
22
+ end
23
+ if options[:keep_comments]
24
+ comments_option = "--keep-comments"
25
+ end
26
+ if options[:keep_pragmas]
27
+ pragmas_option = "--keep-pragmas"
28
+ end
29
+ gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} #{filename} #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}")
20
30
  if output_option.nil?
21
31
  output = gff3_ffetch.read
22
32
  end
@@ -25,7 +35,8 @@ module Bio
25
35
  end
26
36
 
27
37
  # Runs the gff3-ffetch utility with the specified parameters while
28
- # passing data to its stdin. Options include :output and :at_most.
38
+ # passing data to its stdin. Options include :output, :at_most,
39
+ # :pass_fasta_through, :keep_comments, :keep_pragmas
29
40
  def self.filter_data data, filter_string, options = {}
30
41
  output_option = nil
31
42
  output = nil
@@ -35,7 +46,16 @@ module Bio
35
46
  if !options[:at_most].nil?
36
47
  at_most_option = "--at-most #{options[:at_most]}"
37
48
  end
38
- gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option}", "r+")
49
+ if options[:pass_fasta_through]
50
+ fasta_option = "--pass-fasta-through"
51
+ end
52
+ if options[:keep_comments]
53
+ comments_option = "--keep-comments"
54
+ end
55
+ if options[:keep_pragmas]
56
+ pragmas_option = "--keep-pragmas"
57
+ end
58
+ gff3_ffetch = IO.popen("gff3-ffetch --filter #{filter_string} - #{output_option} #{at_most_option} #{fasta_option} #{comments_option} #{pragmas_option}", "r+")
39
59
  gff3_ffetch.write data
40
60
  gff3_ffetch.close_write
41
61
  if output_option.nil?
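
Note on the two filtering hunks above: both `IO.popen` calls interpolate `filter_string`, the file name and the option strings into one command string, so a value containing spaces or shell metacharacters would be split or interpreted by the shell. A minimal sketch of one alternative is to build the arguments as an array. It assumes only the flag spellings that appear in this diff (`--filter`, `--output`, `--at-most`, `--pass-fasta-through`, `--keep-comments`, `--keep-pragmas`). The helper name `ffetch_args` and the sample file name are hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of bio-gff3-pltools: builds the
# gff3-ffetch argument list as an array, so no shell parsing is involved.
# `input` is either a file name or "-" for stdin.
def ffetch_args(filter_string, input, options = {})
  args = ["gff3-ffetch", "--filter", filter_string, input]
  # Options that take a value are added as two separate elements.
  args += ["--output", options[:output].to_s] unless options[:output].nil?
  args += ["--at-most", options[:at_most].to_s] unless options[:at_most].nil?
  # Boolean switches are added only when the option is truthy.
  args << "--pass-fasta-through" if options[:pass_fasta_through]
  args << "--keep-comments" if options[:keep_comments]
  args << "--keep-pragmas" if options[:keep_pragmas]
  args
end

# Sample call; "genes.gff3" is a placeholder file name.
args = ffetch_args("field:feature:equals:CDS", "genes.gff3",
                   :at_most => 10, :keep_comments => true)
puts args.inspect
```

The resulting array can be handed to `IO.popen(args)` for the file case, or to `IO.popen(args, "r+")` when the input argument is `-` and the data is written to the child's stdin.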
@@ -0,0 +1,17 @@
1
+ module Bio
2
+ module PL
3
+ module GFF3
4
+ def self.validate_file filename
5
+ if !File.exists?(filename)
6
+ raise Exception.new("No such file - #{filename}")
7
+ end
8
+
9
+ gff3_validate = IO.popen(["gff3-validate", "#{filename}", :err=>[:child, :out]])
10
+ output = gff3_validate.read
11
+ gff3_validate.close
12
+ output
13
+ end
14
+ end
15
+ end
16
+ end
17
+
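
Note on `validate_file` above: the `:err=>[:child, :out]` option redirects the child's stderr into its stdout, so a single `read` returns both streams. The sketch below shows the same technique in isolation. It uses the running Ruby interpreter as a stand-in for `gff3-validate`, so it runs without the D tools installed. The helper name `run_merged` is hypothetical.

```ruby
require 'rbconfig'

# Hypothetical helper for illustration: runs a command given as separate
# arguments (no shell involved) and returns stdout and stderr as one string.
def run_merged(*command)
  # The trailing hash is read by IO.popen as spawn options;
  # :err => [:child, :out] points the child's stderr at its stdout.
  IO.popen([*command, :err => [:child, :out]]) do |io|
    io.read
  end
end

# Stand-in for gff3-validate: a child Ruby process writing to both streams.
output = run_merged(RbConfig.ruby, "-e",
                    '$stderr.puts "to stderr"; $stdout.puts "to stdout"')
puts output
```

The block form of `IO.popen` closes the pipe when the block returns, which covers the explicit `close` call in the hunk above.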
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gff3-pltools
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-07-05 00:00:00.000000000 Z
12
+ date: 2012-07-14 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
@@ -150,6 +150,7 @@ files:
150
150
  - VERSION
151
151
  - lib/bio-gff3-pltools.rb
152
152
  - lib/bio-gff3-pltools/filtering.rb
153
+ - lib/bio-gff3-pltools/validation.rb
153
154
  - LICENSE.txt
154
155
  - README.md
155
156
  homepage: http://mamarjan.github.com/gff3-pltools/
@@ -165,6 +166,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
165
166
  - - ! '>='
166
167
  - !ruby/object:Gem::Version
167
168
  version: '0'
169
+ segments:
170
+ - 0
171
+ hash: -1031554001
168
172
  required_rubygems_version: !ruby/object:Gem::Requirement
169
173
  none: false
170
174
  requirements:
@@ -178,4 +182,3 @@ signing_key:
178
182
  specification_version: 3
179
183
  summary: Ruby wrapper for the gff3-pltools.
180
184
  test_files: []
181
- has_rdoc: