bio-sra 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +36 -15
- data/VERSION +1 -1
- data/bin/sra_download +18 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 7ffa909e8f796caf46d17335cd68a8075cc5da00
+ data.tar.gz: fdf1328d17f032197485359f73c313533649ca23
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 5b457ed258b8fa212996f142b69049c9e1b86abd36b3f316522cecb8bfd009349a8920e112ee34e977c1300075a5a1240c93c6707d9f3d82c4e8f9eb4b1fd6b3
+ data.tar.gz: 980aa9f9c2b0266b558ae16f7d0b643adf9f78837c67e21638a4be33b5fed224b1e49ca6aa39cef91bbdf17e102630ceb4c6d6eb2b6d38afb95d0f27011af2cb
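The digests recorded in checksums.yaml can be compared against the files unpacked from the `.gem` archive. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted; the helper name `checksum_matches?` is hypothetical, while `Digest::SHA1` and `Digest::SHA512` come from Ruby's standard library.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compare a file's digest with the hex string
# recorded in checksums.yaml. `algorithm` is :sha1 or :sha512.
def checksum_matches?(path, expected_hex, algorithm)
  digest_class = { sha1: Digest::SHA1, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `checksum_matches?('data.tar.gz', 'fdf1328d17f032197485359f73c313533649ca23', :sha1)` would be expected to return `true` for the 0.2.0 archive if the file is intact.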
data/README.md
CHANGED
@@ -1,20 +1,18 @@
  # bio-sra

- [](http://travis-ci.org/wwood/bioruby-sra)
-
  A Sequence Read Archive (SRA) download script and Ruby interface to the [SRAdb](ncbi.nlm.nih.gov/pmc/articles/PMC3560148/) (SRA metadata) SQLite database.

  ## Installation

  ```sh
- gem install bio-sra
+ $ gem install bio-sra
  ```

  ## Download script usage

  Download a single run file to the current directory:
  ```sh
- sra_download
+ $ sra_download ERR229501
  ```

  Download a list of runs
@@ -22,19 +20,45 @@ Download a list of runs
  $ cat srr_list.txt
  ERR229501
  ERR229498
- $ sra_download
+ $ sra_download -f srr_list.txt
  ```

- Download all runs that are a part of the experiment ERP001779
+ Download all runs that are a part of the experiment ERP001779 "Microbial biogeography of public restroom surfaces". This requires an [SRAdb](http://www.bioconductor.org/packages/release/bioc/html/SRAdb.html) database (i.e. a database of the SRA metadata), which can be downloaded from
  ```sh
- $ sra_download ERP001779
+ $ sra_download -d '/path/to/SRAmetadb.sqlite' ERP001779
  ```
-
+ The SRAdb SQLite file can be downloaded from these mirrors:
+ * http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz
+ * http://watson.nci.nih.gov/~zhujack/SRAmetadb.sqlite.gz
+ * http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz

  ## Ruby interface script

  ```ruby
  require 'bio-sra'
+
+ # Connect to the database
+ Bio::SRA::Connection.connect '/path/to/SRAmetadb.sqlite'
+ ```
+ Once connected, the each row of the Bio::SRA::Tables::SRA table represents an SRA run:
+ ```
+ Bio::SRA::Tables::SRA.first.run_accession
+ # => "DRR000001"
+
+ Bio::SRA::Tables::SRA.first.submission_accession
+ # => "DRA000001"
+
+ Bio::SRA::Tables::SRA.first.submission_date
+ # => "2009-06-20"
+
+ Bio::SRA::Tables::SRA.first.submission_comment
+ # => "Bacillus subtilis subsp. natto BEST195 draft sequence, the chromosome and plasmid pBEST195S"
+ ```
+ There is a description of each available table on the [wiki](https://github.com/wwood/bioruby-sra/wiki).
+
+ There are also methods for working with accession numbers, e.g.
+ ```ruby
+ Bio::SRA::Accession.classify_accession_type('ERP001779') #=> :study_accession
  ```

  The API doc is online. For more code examples see the test files in
@@ -47,20 +71,17 @@ how to contribute, see

  http://github.com/wwood/bioruby-sra

- The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
-
  ## Cite

- This Ruby code is unpublished, but
+ This Ruby code is unpublished, but citing the SRAdb paper is probably good practice:

- * [
- * [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
+ * [SRAdb: query and use public next-generation sequencing data from within R](dx.doi.org/10.1186/1471-2105-14-19)

  ## Biogems.info

- This Biogem is published at [
+ This Biogem is published at [biogems.info](http://biogems.info/index.html)

  ## Copyright

- Copyright (c) 2012 Ben J. Woodcroft. See LICENSE.txt for further details.
+ Copyright (c) 2012-2014 Ben J. Woodcroft. See LICENSE.txt for further details.

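The README's `classify_accession_type` example maps `ERP001779` to `:study_accession`. How the gem implements this is not shown in the diff; the sketch below is a hypothetical stand-in based only on the public INSDC prefix convention (`SR`/`ER`/`DR` followed by a type letter). The method name `classify_accession` and every symbol other than `:study_accession` are assumptions.

```ruby
# Hypothetical stand-in for Bio::SRA::Accession.classify_accession_type,
# based only on the INSDC accession prefix convention, not on the gem's code.
ACCESSION_TYPES = {
  'A' => :submission_accession,
  'P' => :study_accession,
  'X' => :experiment_accession,
  'S' => :sample_accession,
  'R' => :run_accession
}.freeze

def classify_accession(accession)
  # First letter is the archive (S = NCBI, E = EBI, D = DDBJ), then "R",
  # then the type letter, then digits.
  match = /\A[SED]R([APXSR])\d+\z/.match(accession)
  raise ArgumentError, "Unrecognised accession: #{accession.inspect}" unless match

  ACCESSION_TYPES.fetch(match[1])
end

classify_accession('ERP001779') # => :study_accession
classify_accession('ERR229501') # => :run_accession
```

A classifier of this kind is enough to decide whether an accession is already a run or must first be resolved to runs through the SRAdb database.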
data/VERSION
CHANGED
@@ -1 +1 @@
- 0.
+ 0.2.0
data/bin/sra_download
CHANGED
@@ -27,6 +27,9 @@ Download data from SRA \n"
  opts.on('-f', "--file FILENAME", "Provide a file of accession numbers, separated by whitespace or commas [default: not used, use the first argument <SRA_ACCESSION>]") do |f|
    options[:accessions_file] = f
  end
+ opts.on('-d', '--db SRAmetaDB_PATH', "Path to the SRAmetadb downloaded from NCBI e.g. from the URL [required unless all accessions are runs (rather than e.g. studies or submissions)]") do |arg|
+   options[:sradb] = arg
+ end
  opts.on("--format FORMAT", "format for download [default: 'sra']") do |f|
    format_string_to_sym = {
      'sralite' => :sralite, # no longer supported by NCBI?
@@ -89,7 +92,21 @@ end

  # Connect to the database if required
  log.info "Connecting to database.."
-
+ unless options[:treat_input_as_runs]
+   if options[:sradb]
+     Bio::SRA::Connection.connect options[:sradb]
+   else
+     Bio::SRA::Connection.connect
+   end
+
+   # Check for connection
+   begin
+     s = Bio::SRA::Tables::SRA.first
+   rescue
+     log.error "There was a problem connecting to the database at `#{options[:sradb] }', was it specified correctly?"
+     exit 2
+   end
+ end

  log.info "Collecting a list of runs to download.."
  runs = []