bio-sra 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +36 -15
- data/VERSION +1 -1
- data/bin/sra_download +18 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 7ffa909e8f796caf46d17335cd68a8075cc5da00
|
4
|
+
data.tar.gz: fdf1328d17f032197485359f73c313533649ca23
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 5b457ed258b8fa212996f142b69049c9e1b86abd36b3f316522cecb8bfd009349a8920e112ee34e977c1300075a5a1240c93c6707d9f3d82c4e8f9eb4b1fd6b3
|
7
|
+
data.tar.gz: 980aa9f9c2b0266b558ae16f7d0b643adf9f78837c67e21638a4be33b5fed224b1e49ca6aa39cef91bbdf17e102630ceb4c6d6eb2b6d38afb95d0f27011af2cb
|
data/README.md
CHANGED
@@ -1,20 +1,18 @@
|
|
1
1
|
# bio-sra
|
2
2
|
|
3
|
-
[![Build Status](https://secure.travis-ci.org/wwood/bioruby-sra.png)](http://travis-ci.org/wwood/bioruby-sra)
|
4
|
-
|
5
3
|
A Sequence Read Archive (SRA) download script and Ruby interface to the [SRAdb](ncbi.nlm.nih.gov/pmc/articles/PMC3560148/) (SRA metadata) SQLite database.
|
6
4
|
|
7
5
|
## Installation
|
8
6
|
|
9
7
|
```sh
|
10
|
-
gem install bio-sra
|
8
|
+
$ gem install bio-sra
|
11
9
|
```
|
12
10
|
|
13
11
|
## Download script usage
|
14
12
|
|
15
13
|
Download a single run file to the current directory:
|
16
14
|
```sh
|
17
|
-
sra_download
|
15
|
+
$ sra_download ERR229501
|
18
16
|
```
|
19
17
|
|
20
18
|
Download a list of runs
|
@@ -22,19 +20,45 @@ Download a list of runs
|
|
22
20
|
$ cat srr_list.txt
|
23
21
|
ERR229501
|
24
22
|
ERR229498
|
25
|
-
$ sra_download
|
23
|
+
$ sra_download -f srr_list.txt
|
26
24
|
```
|
27
25
|
|
28
|
-
Download all runs that are a part of the experiment ERP001779
|
26
|
+
Download all runs that are part of the study ERP001779 "Microbial biogeography of public restroom surfaces". This requires an [SRAdb](http://www.bioconductor.org/packages/release/bioc/html/SRAdb.html) database (i.e. a database of the SRA metadata).
|
29
27
|
```sh
|
30
|
-
$ sra_download ERP001779
|
28
|
+
$ sra_download -d '/path/to/SRAmetadb.sqlite' ERP001779
|
31
29
|
```
|
32
|
-
|
30
|
+
The SRAdb SQLite file can be downloaded from these mirrors:
|
31
|
+
* http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz
|
32
|
+
* http://watson.nci.nih.gov/~zhujack/SRAmetadb.sqlite.gz
|
33
|
+
* http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz
|
33
34
|
|
34
35
|
## Ruby interface script
|
35
36
|
|
36
37
|
```ruby
|
37
38
|
require 'bio-sra'
|
39
|
+
|
40
|
+
# Connect to the database
|
41
|
+
Bio::SRA::Connection.connect '/path/to/SRAmetadb.sqlite'
|
42
|
+
```
|
43
|
+
Once connected, each row of the Bio::SRA::Tables::SRA table represents an SRA run:
|
44
|
+
```
|
45
|
+
Bio::SRA::Tables::SRA.first.run_accession
|
46
|
+
# => "DRR000001"
|
47
|
+
|
48
|
+
Bio::SRA::Tables::SRA.first.submission_accession
|
49
|
+
# => "DRA000001"
|
50
|
+
|
51
|
+
Bio::SRA::Tables::SRA.first.submission_date
|
52
|
+
# => "2009-06-20"
|
53
|
+
|
54
|
+
Bio::SRA::Tables::SRA.first.submission_comment
|
55
|
+
# => "Bacillus subtilis subsp. natto BEST195 draft sequence, the chromosome and plasmid pBEST195S"
|
56
|
+
```
|
57
|
+
There is a description of each available table on the [wiki](https://github.com/wwood/bioruby-sra/wiki).
|
58
|
+
|
59
|
+
There are also methods for working with accession numbers, e.g.
|
60
|
+
```ruby
|
61
|
+
Bio::SRA::Accession.classify_accession_type('ERP001779') #=> :study_accession
|
38
62
|
```
|
39
63
|
|
40
64
|
The API doc is online. For more code examples see the test files in
|
@@ -47,20 +71,17 @@ how to contribute, see
|
|
47
71
|
|
48
72
|
http://github.com/wwood/bioruby-sra
|
49
73
|
|
50
|
-
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
|
51
|
-
|
52
74
|
## Cite
|
53
75
|
|
54
|
-
This Ruby code is unpublished, but
|
76
|
+
This Ruby code is unpublished, but citing the SRAdb paper is probably good practice:
|
55
77
|
|
56
|
-
* [
|
57
|
-
* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
|
78
|
+
* [SRAdb: query and use public next-generation sequencing data from within R](dx.doi.org/10.1186/1471-2105-14-19)
|
58
79
|
|
59
80
|
## Biogems.info
|
60
81
|
|
61
|
-
This Biogem is published at [
|
82
|
+
This Biogem is published at [biogems.info](http://biogems.info/index.html)
|
62
83
|
|
63
84
|
## Copyright
|
64
85
|
|
65
|
-
Copyright (c) 2012 Ben J. Woodcroft. See LICENSE.txt for further details.
|
86
|
+
Copyright (c) 2012-2014 Ben J. Woodcroft. See LICENSE.txt for further details.
|
66
87
|
|
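The README shows `Bio::SRA::Accession.classify_accession_type('ERP001779')` returning `:study_accession`. The snippet below is a minimal standalone sketch of that kind of classification, based only on the public SRA accession prefix convention (an archive letter `S`, `E` or `D`, then `R`, then `A` for submission, `P` for study, `S` for sample, `X` for experiment or `R` for run). The method name `classify_by_prefix` and every symbol other than `:study_accession` are assumptions made for illustration. This is not the gem's implementation.

```ruby
# Hypothetical helper, not part of bio-sra: classify an SRA-style accession
# by the third letter of its prefix.
ACCESSION_TYPES = {
  'A' => :submission_accession,
  'P' => :study_accession,
  'S' => :sample_accession,
  'X' => :experiment_accession,
  'R' => :run_accession
}.freeze

# Returns a symbol from ACCESSION_TYPES, or nil if the string does not
# look like an SRA/ENA/DDBJ accession.
def classify_by_prefix(accession)
  match = /\A[SED]R([APSXR])\d+\z/.match(accession)
  match && ACCESSION_TYPES[match[1]]
end

puts classify_by_prefix('ERP001779').inspect
puts classify_by_prefix('ERR229501').inspect
```

A lookup table keyed on one letter keeps the mapping easy to audit. Unrecognised input yields `nil` rather than raising, so callers can decide how to report it.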
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.
|
1
|
+
0.2.0
|
data/bin/sra_download
CHANGED
@@ -27,6 +27,9 @@ Download data from SRA \n"
|
|
27
27
|
opts.on('-f', "--file FILENAME", "Provide a file of accession numbers, separated by whitespace or commas [default: not used, use the first argument <SRA_ACCESSION>]") do |f|
|
28
28
|
options[:accessions_file] = f
|
29
29
|
end
|
30
|
+
opts.on('-d', '--db SRAmetaDB_PATH', "Path to the SRAmetadb downloaded from NCBI e.g. from the URL [required unless all accessions are runs (rather than e.g. studies or submissions)]") do |arg|
|
31
|
+
options[:sradb] = arg
|
32
|
+
end
|
30
33
|
opts.on("--format FORMAT", "format for download [default: 'sra']") do |f|
|
31
34
|
format_string_to_sym = {
|
32
35
|
'sralite' => :sralite, # no longer supported by NCBI?
|
@@ -89,7 +92,21 @@ end
|
|
89
92
|
|
90
93
|
# Connect to the database if required
|
91
94
|
log.info "Connecting to database.."
|
92
|
-
|
95
|
+
unless options[:treat_input_as_runs]
|
96
|
+
if options[:sradb]
|
97
|
+
Bio::SRA::Connection.connect options[:sradb]
|
98
|
+
else
|
99
|
+
Bio::SRA::Connection.connect
|
100
|
+
end
|
101
|
+
|
102
|
+
# Check for connection
|
103
|
+
begin
|
104
|
+
s = Bio::SRA::Tables::SRA.first
|
105
|
+
rescue
|
106
|
+
log.error "There was a problem connecting to the database at `#{options[:sradb] }', was it specified correctly?"
|
107
|
+
exit 2
|
108
|
+
end
|
109
|
+
end
|
93
110
|
|
94
111
|
log.info "Collecting a list of runs to download.."
|
95
112
|
runs = []
|
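The option help above says that `-f` takes a file of accession numbers separated by whitespace or commas, and that `-d` is required unless all accessions are runs. The following is a rough standalone sketch of those two rules, not code taken from `bin/sra_download`. The helper names `parse_accessions` and `database_needed?` are hypothetical, and the run pattern (`SRR`, `ERR` or `DRR` followed by digits) is an assumption drawn from the usual accession naming convention.

```ruby
# Hypothetical helpers for illustration only.

# Split text such as the contents of srr_list.txt into accession strings,
# accepting whitespace and/or commas as separators.
def parse_accessions(text)
  text.split(/[\s,]+/).reject(&:empty?)
end

# Assumption: anything that is not a run accession (e.g. a study or a
# submission) has to be resolved to runs through the SRAdb database.
def database_needed?(accessions)
  !accessions.all? { |acc| acc.match?(/\A[SED]RR\d+\z/) }
end

runs_only = parse_accessions("ERR229501\nERR229498\n")
mixed     = parse_accessions("ERR229501, ERP001779")

puts database_needed?(runs_only)
puts database_needed?(mixed)
```

Under these assumptions, a list made up only of run accessions needs no database connection, while a list that includes a study accession such as `ERP001779` does.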