bio-maf 0.3.1 → 0.3.2
- data/DEVELOPMENT.md +4 -0
- data/README.md +25 -1
- data/bin/maf_extract +5 -2
- data/bio-maf.gemspec +1 -1
- data/man/maf_extract.1 +113 -4
- data/man/maf_extract.1.ronn +42 -4
- metadata +2 -2
data/DEVELOPMENT.md
CHANGED
data/README.md
CHANGED
@@ -129,6 +129,10 @@ end
 # => Matched block at 80082713, 54 bases
 ```
 
+This can be done with [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html) as well:
+
+$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766
+
 ### Extract alignment blocks truncated to a given interval
 
 Given a genomic interval of interest, one can also extract only the
@@ -144,6 +148,10 @@ puts "Got #{blocks.size} blocks, first #{blocks.first.ref_seq.size} base pairs."
 # => Got 2 blocks, first 18 base pairs.
 ```
 
+Or, with [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
+
+$ maf_extract -d test/data --mode slice --interval mm8.chr7:80082592-80082766
+
 ### Filter species returned in alignment blocks
 
 ```ruby
@@ -159,6 +167,10 @@ puts "Block has #{block.sequences.size} sequences."
 # => Block has 3 sequences.
 ```
 
+With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
+
+$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766 --only-species hg18,mm8,rheMac2
+
 ### Extract blocks matching certain conditions
 
 See also the [Cucumber feature][] and [step definitions][] for this.
@@ -176,6 +188,10 @@ n_blocks = access.find(q).count
 # => 1
 ```
 
+With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
+
+$ maf_extract -d test/data --interval mm8.chr7:80082471-80082730 --with-all-species panTro2,loxAfr1
+
 #### Match only blocks with a certain number of sequences
 
 ```ruby
@@ -186,6 +202,10 @@ n_blocks = access.find(q).count
 # => 1
 ```
 
+With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
+
+$ maf_extract -d test/data --interval mm8.chr7:80082767-80083008 --min-sequences 6
+
 #### Match only blocks within a text size range
 
 ```ruby
@@ -196,6 +216,10 @@ n_blocks = access.find(q).count
 # => 3
 ```
 
+With [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html):
+
+$ maf_extract -d test/data --interval mm8.chr7:0-80100000 --min-text-size 72 --max-text-size 160
+
 ### Process each block in a MAF file
 
 ```ruby
@@ -333,6 +357,7 @@ end
 Man pages for command line tools:
 
 * [`maf_index(1)`](http://csw.github.com/bioruby-maf/man/maf_index.1.html)
+* [`maf_extract(1)`](http://csw.github.com/bioruby-maf/man/maf_extract.1.html)
 * [`maf_to_fasta(1)`](http://csw.github.com/bioruby-maf/man/maf_to_fasta.1.html)
 * [`maf_tile(1)`](http://csw.github.com/bioruby-maf/man/maf_tile.1.html)
 
@@ -377,4 +402,3 @@ This Biogem is published at [biogems.info](http://biogems.info/index.html#bio-ma
 ## Copyright
 
 Copyright (c) 2012 Clayton Wheeler. See LICENSE.txt for further details.
-
data/bin/maf_extract
CHANGED
@@ -22,8 +22,11 @@ def handle_list_spec(spec)
 end
 
 def handle_interval_spec(int)
-
-
+  if int =~ /(.+):(\d+)-(\d+)/
+    Bio::GenomicInterval.zero_based($1, $2.to_i, $3.to_i)
+  else
+    raise "Invalid interval specification: #{int}"
+  end
 end
 
 $op = OptionParser.new do |opts|
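Review note: the new `handle_interval_spec` body parses `SEQ:START-END` with an unanchored regex, so a string with leading or trailing junk around a valid spec would still match. The sketch below shows the same parsing with an anchored pattern. It is only an illustration: `Interval` is a hypothetical stand-in for `Bio::GenomicInterval`, `parse_interval_spec` is not a name from the gem, and the stricter anchoring is an assumption about the desired behavior rather than what 0.3.2 does.

```ruby
# Hypothetical stand-in for Bio::GenomicInterval, so the sketch runs
# without the gem installed.
Interval = Struct.new(:seq, :start, :stop)

# Parse "SEQ:START-END" into a zero-based half-open interval.
# Anchored with \A and \z (an assumption), unlike the regex in the diff.
def parse_interval_spec(spec)
  m = /\A(.+):(\d+)-(\d+)\z/.match(spec)
  raise ArgumentError, "Invalid interval specification: #{spec}" unless m
  Interval.new(m[1], m[2].to_i, m[3].to_i)
end

iv = parse_interval_spec("mm8.chr7:80082592-80082766")
puts "#{iv.seq} #{iv.start} #{iv.stop}"
# prints "mm8.chr7 80082592 80082766"
```

Raising `ArgumentError` instead of a bare `RuntimeError` is a design choice of this sketch; it lets callers rescue malformed specs separately from other failures.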
data/bio-maf.gemspec
CHANGED
data/man/maf_extract.1
CHANGED
@@ -7,13 +7,13 @@
 \fBmaf_extract\fR \- extract blocks from MAF files
 .
 .SH "SYNOPSIS"
-\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-interval SEQ:START
+\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-interval SEQ:START\-END \fIOPTIONS\fR
 .
 .P
 \fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-bed BED \fIOPTIONS\fR
 .
 .P
-\fBmaf_extract\fR \-d MAFDIR \-\-interval SEQ:START
+\fBmaf_extract\fR \-d MAFDIR \-\-interval SEQ:START\-END \fIOPTIONS\fR
 .
 .P
 \fBmaf_extract\fR \-d MAFDIR \-\-bed BED \fIOPTIONS\fR
@@ -69,7 +69,7 @@ The extraction mode to use\. With \fB\-\-mode intersect\fR, any alignment block
 The specified file will be parsed as a BED file, and each interval it contains will be matched in turn\.
 .
 .TP
-\fB\-\-interval SEQ:START
+\fB\-\-interval SEQ:START\-END\fR
 A single zero\-based half\-open genomic interval will be matched, with sequence identifier \fIseq\fR, (inclusive) start position \fIstart\fR, and (exclusive) end position \fIend\fR\.
 .
 .P
@@ -141,7 +141,116 @@ Run verbosely, with additional informational messages\.
 Log debugging information\.
 .
 .SH "EXAMPLES"
-
+Extract MAF blocks intersecting with a given interval:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766
+.
+.fi
+.
+.IP "" 0
+.
+.P
+As above, but operating on a single file:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-m test/data/mm8_chr7_tiny\.maf \e
+\-i test/data/mm8_chr7_tiny\.kct \e
+\-\-interval mm8\.chr7:80082592\-80082766
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Like the first case, but writing output to a file:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766 \e
+\-\-output out\.maf
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Extract a slice of MAF blocks over a given interval:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-mode slice \e
+\-\-interval mm8\.chr7:80082592\-80082766
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Filter for sequences from only certain species:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766 \e
+\-\-only\-species hg18,mm8,rheMac2
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Extract only blocks with all specified species:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082471\-80082730 \e
+\-\-with\-all\-species panTro2,loxAfr1
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Extract blocks with at least a certain number of sequences:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082767\-80083008 \e
+\-\-min\-sequences 6
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Extract blocks with text sizes in a certain range:
+.
+.IP "" 4
+.
+.nf
+
+$ maf_extract \-d test/data \-\-interval mm8\.chr7:0\-80100000 \e
+\-\-min\-text\-size 72 \-\-max\-text\-size 160
+.
+.fi
+.
+.IP "" 0
 .
 .SH "ENVIRONMENT"
 \fBmaf_index\fR is a Ruby program and relies on ordinary Ruby environment variables\.
data/man/maf_extract.1.ronn
CHANGED
@@ -3,11 +3,11 @@ maf_extract(1) -- extract blocks from MAF files
 
 ## SYNOPSIS
 
-`maf_extract` -m MAF [-i INDEX] --interval SEQ:START
+`maf_extract` -m MAF [-i INDEX] --interval SEQ:START-END [OPTIONS]
 
 `maf_extract` -m MAF [-i INDEX] --bed BED [OPTIONS]
 
-`maf_extract` -d MAFDIR --interval SEQ:START
+`maf_extract` -d MAFDIR --interval SEQ:START-END [OPTIONS]
 
 `maf_extract` -d MAFDIR --bed BED [OPTIONS]
 
@@ -79,7 +79,7 @@ Extraction options:
 The specified file will be parsed as a BED file, and each interval
 it contains will be matched in turn.
 
-* `--interval SEQ:START
+* `--interval SEQ:START-END`:
 A single zero-based half-open genomic interval will be matched,
 with sequence identifier <seq>, (inclusive) start position <start>,
 and (exclusive) end position <end>.
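Review note: the `--interval` description above relies on zero-based half-open coordinates, which are easy to get off by one. The sketch below shows what that convention implies for interval length, overlap, and conversion to one-based closed coordinates. `HalfOpen` is a hypothetical struct written for this note, not a class from bio-maf or bio-genomic-interval, and the second interval is made-up data.

```ruby
# Zero-based half-open interval [start, stop): start is included, stop is not.
# Hypothetical helper type, for illustration only.
HalfOpen = Struct.new(:seq, :start, :stop) do
  # Number of positions covered.
  def length
    stop - start
  end

  # True when both intervals are on the same sequence and share
  # at least one position.
  def overlaps?(other)
    seq == other.seq && start < other.stop && other.start < stop
  end

  # The same span in one-based, fully closed coordinates.
  def to_one_based
    [start + 1, stop]
  end
end

query = HalfOpen.new("mm8.chr7", 80082592, 80082766)
# Made-up neighbor that begins exactly where the query ends.
adjacent = HalfOpen.new("mm8.chr7", 80082766, 80082800)

puts query.length                # 174
puts query.overlaps?(adjacent)   # false: the end position is exclusive
puts query.to_one_based.inspect  # [80082593, 80082766]
```

Because the end is exclusive, two intervals that merely touch do not overlap, and the length is a plain subtraction with no `+ 1` correction.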
@@ -153,7 +153,45 @@ Logging options:
 
 ## EXAMPLES
 
-
+Extract MAF blocks intersecting with a given interval:
+
+$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766
+
+As above, but operating on a single file:
+
+$ maf_extract -m test/data/mm8_chr7_tiny.maf \
+-i test/data/mm8_chr7_tiny.kct \
+--interval mm8.chr7:80082592-80082766
+
+Like the first case, but writing output to a file:
+
+$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766 \
+--output out.maf
+
+Extract a slice of MAF blocks over a given interval:
+
+$ maf_extract -d test/data --mode slice \
+--interval mm8.chr7:80082592-80082766
+
+Filter for sequences from only certain species:
+
+$ maf_extract -d test/data --interval mm8.chr7:80082592-80082766 \
+--only-species hg18,mm8,rheMac2
+
+Extract only blocks with all specified species:
+
+$ maf_extract -d test/data --interval mm8.chr7:80082471-80082730 \
+--with-all-species panTro2,loxAfr1
+
+Extract blocks with at least a certain number of sequences:
+
+$ maf_extract -d test/data --interval mm8.chr7:80082767-80083008 \
+--min-sequences 6
+
+Extract blocks with text sizes in a certain range:
+
+$ maf_extract -d test/data --interval mm8.chr7:0-80100000 \
+--min-text-size 72 --max-text-size 160
 
 ## ENVIRONMENT
 
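Review note: the new EXAMPLES section invokes `maf_extract` from a shell. When calling it from a Ruby script instead, building an argument array avoids shell quoting problems. The sketch below is one way to do that, using only flags that appear in the examples above (`-d`, `--interval`, `--output`). The helper names `interval_arg` and `maf_extract_argv` are hypothetical, and the `start <= end` check is an assumption of this sketch, not documented tool behavior.

```ruby
# Format a zero-based half-open interval as the SEQ:START-END string
# that --interval accepts. Hypothetical helper.
def interval_arg(seq, start, stop)
  raise ArgumentError, "start must not exceed end" if start > stop
  "#{seq}:#{start}-#{stop}"
end

# Build an argument vector for maf_extract over a MAF directory.
# Hypothetical helper; only flags shown in the man page examples are used.
def maf_extract_argv(maf_dir, seq, start, stop, output: nil)
  argv = ["maf_extract", "-d", maf_dir,
          "--interval", interval_arg(seq, start, stop)]
  argv += ["--output", output] if output
  argv
end

argv = maf_extract_argv("test/data", "mm8.chr7", 80082592, 80082766,
                        output: "out.maf")
puts argv.join(" ")
# prints "maf_extract -d test/data --interval mm8.chr7:80082592-80082766 --output out.maf"

# To run it (needs the bio-maf executables on PATH):
#   require "open3"
#   stdout, status = Open3.capture2(*argv)
```

Passing the array to `Open3.capture2` with a splat runs the program directly, without a shell, so sequence names or paths containing spaces need no escaping.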
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-maf
 version: !ruby/object:Gem::Version
-version: 0.3.
+version: 0.3.2
 prerelease:
 platform: ruby
 authors:
@@ -219,7 +219,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
 version: '0'
 segments:
 - 0
-hash:
+hash: 2092820657742105268
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements: