bio-maf 0.3.1 → 0.3.2
- data/DEVELOPMENT.md +4 -0
- data/README.md +25 -1
- data/bin/maf_extract +5 -2
- data/bio-maf.gemspec +1 -1
- data/man/maf_extract.1 +113 -4
- data/man/maf_extract.1.ronn +42 -4
- metadata +2 -2
data/DEVELOPMENT.md
CHANGED
data/README.md
CHANGED
@@ -129,6 +129,10 @@ end
|
|
129
129
|
# => Matched block at 80082713, 54 bases
|
130
130
|
```
|
131
131
|
|
132
|
+
This can be done with [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html) as well:
|
133
|
+
|
134
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766
|
135
|
+
|
132
136
|
### Extract alignment blocks truncated to a given interval
|
133
137
|
|
134
138
|
Given a genomic interval of interest, one can also extract only the
|
@@ -144,6 +148,10 @@ puts "Got #{blocks.size} blocks, first #{blocks.first.ref_seq.size} base pairs."
|
|
144
148
|
# => Got 2 blocks, first 18 base pairs.
|
145
149
|
```
|
146
150
|
|
151
|
+
Or, with [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
|
152
|
+
|
153
|
+
$ maf_extract -d test/data --mode slice --interval mm8.chr7:80082592-80082766
|
154
|
+
|
147
155
|
### Filter species returned in alignment blocks
|
148
156
|
|
149
157
|
```ruby
|
@@ -159,6 +167,10 @@ puts "Block has #{block.sequences.size} sequences."
|
|
159
167
|
# => Block has 3 sequences.
|
160
168
|
```
|
161
169
|
|
170
|
+
With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
|
171
|
+
|
172
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766 --only-species hg18,mm8,rheMac2
|
173
|
+
|
162
174
|
### Extract blocks matching certain conditions
|
163
175
|
|
164
176
|
See also the [Cucumber feature][] and [step definitions][] for this.
|
@@ -176,6 +188,10 @@ n_blocks = access.find(q).count
|
|
176
188
|
# => 1
|
177
189
|
```
|
178
190
|
|
191
|
+
With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
|
192
|
+
|
193
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082471-80082730 --with-all-species panTro2,loxAfr1
|
194
|
+
|
179
195
|
#### Match only blocks with a certain number of sequences
|
180
196
|
|
181
197
|
```ruby
|
@@ -186,6 +202,10 @@ n_blocks = access.find(q).count
|
|
186
202
|
# => 1
|
187
203
|
```
|
188
204
|
|
205
|
+
With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
|
206
|
+
|
207
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082767-80083008 --min-sequences 6
|
208
|
+
|
189
209
|
#### Match only blocks within a text size range
|
190
210
|
|
191
211
|
```ruby
|
@@ -196,6 +216,10 @@ n_blocks = access.find(q).count
|
|
196
216
|
# => 3
|
197
217
|
```
|
198
218
|
|
219
|
+
With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
|
220
|
+
|
221
|
+
$ maf_extract -d test/data --interval mm8.chr7:0-80100000 --min-text-size 72 --max-text-size 160
|
222
|
+
|
199
223
|
### Process each block in a MAF file
|
200
224
|
|
201
225
|
```ruby
|
@@ -333,6 +357,7 @@ end
|
|
333
357
|
Man pages for command line tools:
|
334
358
|
|
335
359
|
* [`maf_index(1)`](http://csw.github.com/bioruby-maf/man/maf_index.1.html)
|
360
|
+
* [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html)
|
336
361
|
* [`maf_to_fasta(1)`](http://csw.github.com/bioruby-maf/man/maf_to_fasta.1.html)
|
337
362
|
* [`maf_tile(1)`](http://csw.github.com/bioruby-maf/man/maf_tile.1.html)
|
338
363
|
|
@@ -377,4 +402,3 @@ This Biogem is published at [biogems.info](http://biogems.info/index.html#bio-ma
|
|
377
402
|
## Copyright
|
378
403
|
|
379
404
|
Copyright (c) 2012 Clayton Wheeler. See LICENSE.txt for further details.
|
380
|
-
|
data/bin/maf_extract
CHANGED
@@ -22,8 +22,11 @@ def handle_list_spec(spec)
|
|
22
22
|
end
|
23
23
|
|
24
24
|
def handle_interval_spec(int)
|
25
|
-
|
26
|
-
|
25
|
+
if int =~ /(.+):(\d+)-(\d+)/
|
26
|
+
Bio::GenomicInterval.zero_based($1, $2.to_i, $3.to_i)
|
27
|
+
else
|
28
|
+
raise "Invalid interval specification: #{int}"
|
29
|
+
end
|
27
30
|
end
|
28
31
|
|
29
32
|
$op = OptionParser.new do |opts|
|
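The new `handle_interval_spec` body turns a `SEQ:START-END` string into a zero-based interval and raises on anything else. A minimal standalone sketch of that parsing step follows. It is not the gem's code: `Interval` is a hypothetical `Struct` standing in for `Bio::GenomicInterval`, `parse_interval_spec` is an illustrative name, and the `\A`/`\z` anchors are an added assumption that makes the match stricter than the unanchored pattern in the diff.

```ruby
# Hypothetical stand-in for Bio::GenomicInterval; it only holds the parsed fields.
Interval = Struct.new(:chrom, :chr_start, :chr_end)

# Parse "SEQ:START-END" into a zero-based, half-open interval.
# The \A and \z anchors are an assumption, not part of the pattern in the diff.
def parse_interval_spec(spec)
  m = /\A(.+):(\d+)-(\d+)\z/.match(spec)
  raise ArgumentError, "Invalid interval specification: #{spec}" unless m
  Interval.new(m[1], m[2].to_i, m[3].to_i)
end

iv = parse_interval_spec("mm8.chr7:80082592-80082766")
puts "#{iv.chrom}: #{iv.chr_start} to #{iv.chr_end}"

begin
  parse_interval_spec("mm8.chr7:80082592") # missing "-END"
rescue ArgumentError => e
  puts e.message
end
```

Raising on a malformed spec, as the diff does, surfaces a mistyped `--interval` argument immediately instead of passing a `nil` interval further down.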
data/bio-maf.gemspec
CHANGED
data/man/maf_extract.1
CHANGED
@@ -7,13 +7,13 @@
|
|
7
7
|
\fBmaf_extract\fR \- extract blocks from MAF files
|
8
8
|
.
|
9
9
|
.SH "SYNOPSIS"
|
10
|
-
\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-interval SEQ:START
|
10
|
+
\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-interval SEQ:START\-END \fIOPTIONS\fR
|
11
11
|
.
|
12
12
|
.P
|
13
13
|
\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-bed BED \fIOPTIONS\fR
|
14
14
|
.
|
15
15
|
.P
|
16
|
-
\fBmaf_extract\fR \-d MAFDIR \-\-interval SEQ:START
|
16
|
+
\fBmaf_extract\fR \-d MAFDIR \-\-interval SEQ:START\-END \fIOPTIONS\fR
|
17
17
|
.
|
18
18
|
.P
|
19
19
|
\fBmaf_extract\fR \-d MAFDIR \-\-bed BED \fIOPTIONS\fR
|
@@ -69,7 +69,7 @@ The extraction mode to use\. With \fB\-\-mode intersect\fR, any alignment block
|
|
69
69
|
The specified file will be parsed as a BED file, and each interval it contains will be matched in turn\.
|
70
70
|
.
|
71
71
|
.TP
|
72
|
-
\fB\-\-interval SEQ:START
|
72
|
+
\fB\-\-interval SEQ:START\-END\fR
|
73
73
|
A single zero\-based half\-open genomic interval will be matched, with sequence identifier \fIseq\fR, (inclusive) start position \fIstart\fR, and (exclusive) end position \fIend\fR\.
|
74
74
|
.
|
75
75
|
.P
|
@@ -141,7 +141,116 @@ Run verbosely, with additional informational messages\.
|
|
141
141
|
Log debugging information\.
|
142
142
|
.
|
143
143
|
.SH "EXAMPLES"
|
144
|
-
|
144
|
+
Extract MAF blocks intersecting with a given interval:
|
145
|
+
.
|
146
|
+
.IP "" 4
|
147
|
+
.
|
148
|
+
.nf
|
149
|
+
|
150
|
+
$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766
|
151
|
+
.
|
152
|
+
.fi
|
153
|
+
.
|
154
|
+
.IP "" 0
|
155
|
+
.
|
156
|
+
.P
|
157
|
+
As above, but operating on a single file:
|
158
|
+
.
|
159
|
+
.IP "" 4
|
160
|
+
.
|
161
|
+
.nf
|
162
|
+
|
163
|
+
$ maf_extract \-m test/data/mm8_chr7_tiny\.maf \e
|
164
|
+
\-i test/data/mm8_chr7_tiny\.kct \e
|
165
|
+
\-\-interval mm8\.chr7:80082592\-80082766
|
166
|
+
.
|
167
|
+
.fi
|
168
|
+
.
|
169
|
+
.IP "" 0
|
170
|
+
.
|
171
|
+
.P
|
172
|
+
Like the first case, but writing output to a file:
|
173
|
+
.
|
174
|
+
.IP "" 4
|
175
|
+
.
|
176
|
+
.nf
|
177
|
+
|
178
|
+
$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766 \e
|
179
|
+
\-\-output out\.maf
|
180
|
+
.
|
181
|
+
.fi
|
182
|
+
.
|
183
|
+
.IP "" 0
|
184
|
+
.
|
185
|
+
.P
|
186
|
+
Extract a slice of MAF blocks over a given interval:
|
187
|
+
.
|
188
|
+
.IP "" 4
|
189
|
+
.
|
190
|
+
.nf
|
191
|
+
|
192
|
+
$ maf_extract \-d test/data \-\-mode slice \e
|
193
|
+
\-\-interval mm8\.chr7:80082592\-80082766
|
194
|
+
.
|
195
|
+
.fi
|
196
|
+
.
|
197
|
+
.IP "" 0
|
198
|
+
.
|
199
|
+
.P
|
200
|
+
Filter for sequences from only certain species:
|
201
|
+
.
|
202
|
+
.IP "" 4
|
203
|
+
.
|
204
|
+
.nf
|
205
|
+
|
206
|
+
$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766 \e
|
207
|
+
\-\-only\-species hg18,mm8,rheMac2
|
208
|
+
.
|
209
|
+
.fi
|
210
|
+
.
|
211
|
+
.IP "" 0
|
212
|
+
.
|
213
|
+
.P
|
214
|
+
Extract only blocks with all specified species:
|
215
|
+
.
|
216
|
+
.IP "" 4
|
217
|
+
.
|
218
|
+
.nf
|
219
|
+
|
220
|
+
$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082471\-80082730 \e
|
221
|
+
\-\-with\-all\-species panTro2,loxAfr1
|
222
|
+
.
|
223
|
+
.fi
|
224
|
+
.
|
225
|
+
.IP "" 0
|
226
|
+
.
|
227
|
+
.P
|
228
|
+
Extract blocks with at least a certain number of sequences:
|
229
|
+
.
|
230
|
+
.IP "" 4
|
231
|
+
.
|
232
|
+
.nf
|
233
|
+
|
234
|
+
$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082767\-80083008 \e
|
235
|
+
\-\-min\-sequences 6
|
236
|
+
.
|
237
|
+
.fi
|
238
|
+
.
|
239
|
+
.IP "" 0
|
240
|
+
.
|
241
|
+
.P
|
242
|
+
Extract blocks with text sizes in a certain range:
|
243
|
+
.
|
244
|
+
.IP "" 4
|
245
|
+
.
|
246
|
+
.nf
|
247
|
+
|
248
|
+
$ maf_extract \-d test/data \-\-interval mm8\.chr7:0\-80100000 \e
|
249
|
+
\-\-min\-text\-size 72 \-\-max\-text\-size 160
|
250
|
+
.
|
251
|
+
.fi
|
252
|
+
.
|
253
|
+
.IP "" 0
|
145
254
|
.
|
146
255
|
.SH "ENVIRONMENT"
|
147
256
|
\fBmaf_index\fR is a Ruby program and relies on ordinary Ruby environment variables\.
|
data/man/maf_extract.1.ronn
CHANGED
@@ -3,11 +3,11 @@ maf_extract(1) -- extract blocks from MAF files
|
|
3
3
|
|
4
4
|
## SYNOPSIS
|
5
5
|
|
6
|
-
`maf_extract` -m MAF [-i INDEX] --interval SEQ:START
|
6
|
+
`maf_extract` -m MAF [-i INDEX] --interval SEQ:START-END [OPTIONS]
|
7
7
|
|
8
8
|
`maf_extract` -m MAF [-i INDEX] --bed BED [OPTIONS]
|
9
9
|
|
10
|
-
`maf_extract` -d MAFDIR --interval SEQ:START
|
10
|
+
`maf_extract` -d MAFDIR --interval SEQ:START-END [OPTIONS]
|
11
11
|
|
12
12
|
`maf_extract` -d MAFDIR --bed BED [OPTIONS]
|
13
13
|
|
@@ -79,7 +79,7 @@ Extraction options:
|
|
79
79
|
The specified file will be parsed as a BED file, and each interval
|
80
80
|
it contains will be matched in turn.
|
81
81
|
|
82
|
-
* `--interval SEQ:START
|
82
|
+
* `--interval SEQ:START-END`:
|
83
83
|
A single zero-based half-open genomic interval will be matched,
|
84
84
|
with sequence identifier <seq>, (inclusive) start position <start>,
|
85
85
|
and (exclusive) end position <end>.
|
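The zero-based half-open convention described for `--interval` can be illustrated with plain integer arithmetic. The helper names below (`interval_size`, `overlap?`) are hypothetical and not part of bio-maf. The block coordinates come from the README output shown earlier in this diff ("Matched block at 80082713, 54 bases"), read here as the half-open range 80082713 to 80082767.

```ruby
# Zero-based, half-open intervals: the start is included, the end is excluded.
# Hypothetical helpers for illustration only; not part of the bio-maf API.
def interval_size(start_pos, end_pos)
  end_pos - start_pos
end

# Two half-open intervals overlap when each one starts before the other ends.
def overlap?(a_start, a_end, b_start, b_end)
  a_start < b_end && b_start < a_end
end

block_start = 80082713
block_end   = block_start + 54 # exclusive end of the 54-base block

puts interval_size(80082592, 80082766)                        # 174
puts overlap?(80082592, 80082766, block_start, block_end)     # true
# An interval starting exactly at the block's exclusive end does not overlap it.
puts overlap?(80082767, 80083008, block_start, block_end)     # false
```

Because the end position is exclusive, adjacent intervals such as `80082592-80082766` followed by `80082766-...` never share a base.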
@@ -153,7 +153,45 @@ Logging options:
|
|
153
153
|
|
154
154
|
## EXAMPLES
|
155
155
|
|
156
|
-
|
156
|
+
Extract MAF blocks intersecting with a given interval:
|
157
|
+
|
158
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766
|
159
|
+
|
160
|
+
As above, but operating on a single file:
|
161
|
+
|
162
|
+
$ maf_extract -m test/data/mm8_chr7_tiny.maf \
|
163
|
+
-i test/data/mm8_chr7_tiny.kct \
|
164
|
+
--interval mm8.chr7:80082592-80082766
|
165
|
+
|
166
|
+
Like the first case, but writing output to a file:
|
167
|
+
|
168
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766 \
|
169
|
+
--output out.maf
|
170
|
+
|
171
|
+
Extract a slice of MAF blocks over a given interval:
|
172
|
+
|
173
|
+
$ maf_extract -d test/data --mode slice \
|
174
|
+
--interval mm8.chr7:80082592-80082766
|
175
|
+
|
176
|
+
Filter for sequences from only certain species:
|
177
|
+
|
178
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766 \
|
179
|
+
--only-species hg18,mm8,rheMac2
|
180
|
+
|
181
|
+
Extract only blocks with all specified species:
|
182
|
+
|
183
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082471-80082730 \
|
184
|
+
--with-all-species panTro2,loxAfr1
|
185
|
+
|
186
|
+
Extract blocks with at least a certain number of sequences:
|
187
|
+
|
188
|
+
$ maf_extract -d test/data --interval mm8.chr7:80082767-80083008 \
|
189
|
+
--min-sequences 6
|
190
|
+
|
191
|
+
Extract blocks with text sizes in a certain range:
|
192
|
+
|
193
|
+
$ maf_extract -d test/data --interval mm8.chr7:0-80100000 \
|
194
|
+
--min-text-size 72 --max-text-size 160
|
157
195
|
|
158
196
|
## ENVIRONMENT
|
159
197
|
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: bio-maf
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.3.
|
4
|
+
version: 0.3.2
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -219,7 +219,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
219
219
|
version: '0'
|
220
220
|
segments:
|
221
221
|
- 0
|
222
|
-
hash:
|
222
|
+
hash: 2092820657742105268
|
223
223
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
224
224
|
none: false
|
225
225
|
requirements:
|