bio-sra 0.1.0 → 0.2.0

Files changed (5)
  1. checksums.yaml +4 -4
  2. data/README.md +36 -15
  3. data/VERSION +1 -1
  4. data/bin/sra_download +18 -1
  5. metadata +1 -1
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 48531f3fc7b4facb8b4ed2b4523f6ab5eabb6897
-   data.tar.gz: 1ce0ebaccc4601144aa0150caf2913e2c3d96de1
+   metadata.gz: 7ffa909e8f796caf46d17335cd68a8075cc5da00
+   data.tar.gz: fdf1328d17f032197485359f73c313533649ca23
  SHA512:
-   metadata.gz: 3a81f703d495b138cb4a2d53ec2cced77fe888cd6f505ca28c9bec2bb20cddff4ca3b57c2b7cfcf31f8eda5fe00679d1bab503c91423f1f66ab141028ddf10c0
-   data.tar.gz: 7f4564e2b124d49b6313548f88020202942092ac41788f6d3228ce9608f8c92c955eadc0e5e789cc05744eb312341511a162848b783fdb3d1550bb317834a2c3
+   metadata.gz: 5b457ed258b8fa212996f142b69049c9e1b86abd36b3f316522cecb8bfd009349a8920e112ee34e977c1300075a5a1240c93c6707d9f3d82c4e8f9eb4b1fd6b3
+   data.tar.gz: 980aa9f9c2b0266b558ae16f7d0b643adf9f78837c67e21638a4be33b5fed224b1e49ca6aa39cef91bbdf17e102630ceb4c6d6eb2b6d38afb95d0f27011af2cb
data/README.md CHANGED
@@ -1,20 +1,18 @@
  # bio-sra
  
- [![Build Status](https://secure.travis-ci.org/wwood/bioruby-sra.png)](http://travis-ci.org/wwood/bioruby-sra)
-
  A Sequence Read Archive (SRA) download script and Ruby interface to the [SRAdb](ncbi.nlm.nih.gov/pmc/articles/PMC3560148/) (SRA metadata) SQLite database.
  
  ## Installation
  
  ```sh
- gem install bio-sra
+ $ gem install bio-sra
  ```
  
  ## Download script usage
  
  Download a single run file to the current directory:
  ```sh
- sra_download --runs ERR229501.sra
+ $ sra_download ERR229501
  ```
  
  Download a list of runs
@@ -22,19 +20,45 @@ Download a list of runs
  $ cat srr_list.txt
  ERR229501
  ERR229498
- $ sra_download --runs -f srr_list.txt
+ $ sra_download -f srr_list.txt
  ```
  
- Download all runs that are a part of the experiment ERP001779 (Microbial biogeography of public restroom surfaces)
+ Download all runs that are a part of the experiment ERP001779 "Microbial biogeography of public restroom surfaces". This requires an [SRAdb](http://www.bioconductor.org/packages/release/bioc/html/SRAdb.html) database (i.e. a database of the SRA metadata), available from the mirrors listed below:
  ```sh
- $ sra_download ERP001779
+ $ sra_download -d '/path/to/SRAmetadb.sqlite' ERP001779
  ```
- This finds ERP001779 and links it to runs through the SRAdb
+ The SRAdb SQLite file can be downloaded from these mirrors:
+ * http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz
+ * http://watson.nci.nih.gov/~zhujack/SRAmetadb.sqlite.gz
+ * http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz
  
  ## Ruby interface script
  
  ```ruby
  require 'bio-sra'
+
+ # Connect to the database
+ Bio::SRA::Connection.connect '/path/to/SRAmetadb.sqlite'
+ ```
+ Once connected, each row of the Bio::SRA::Tables::SRA table represents an SRA run:
+ ```
+ Bio::SRA::Tables::SRA.first.run_accession
+ # => "DRR000001"
+
+ Bio::SRA::Tables::SRA.first.submission_accession
+ # => "DRA000001"
+
+ Bio::SRA::Tables::SRA.first.submission_date
+ # => "2009-06-20"
+
+ Bio::SRA::Tables::SRA.first.submission_comment
+ # => "Bacillus subtilis subsp. natto BEST195 draft sequence, the chromosome and plasmid pBEST195S"
+ ```
+ There is a description of each available table on the [wiki](https://github.com/wwood/bioruby-sra/wiki).
+
+ There are also methods for working with accession numbers, e.g.
+ ```ruby
+ Bio::SRA::Accession.classify_accession_type('ERP001779') #=> :study_accession
  ```
  
  The API doc is online. For more code examples see the test files in
@@ -47,20 +71,17 @@ how to contribute, see
  
  http://github.com/wwood/bioruby-sra
  
- The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
-
  ## Cite
  
- This Ruby code is unpublished, but there's a problem with
+ This Ruby code is unpublished, but citing the SRAdb paper is probably good practice:
  
- * [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
- * [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
+ * [SRAdb: query and use public next-generation sequencing data from within R](dx.doi.org/10.1186/1471-2105-14-19)
  
  ## Biogems.info
  
- This Biogem is published at [#bio-sra](http://biogems.info/index.html)
+ This Biogem is published at [biogems.info](http://biogems.info/index.html)
  
  ## Copyright
  
- Copyright (c) 2012 Ben J. Woodcroft. See LICENSE.txt for further details.
+ Copyright (c) 2012-2014 Ben J. Woodcroft. See LICENSE.txt for further details.
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.0
+ 0.2.0
data/bin/sra_download CHANGED
@@ -27,6 +27,9 @@ Download data from SRA \n"
  opts.on('-f', "--file FILENAME", "Provide a file of accession numbers, separated by whitespace or commas [default: not used, use the first argument <SRA_ACCESSION>]") do |f|
    options[:accessions_file] = f
  end
+ opts.on('-d', '--db SRAmetaDB_PATH', "Path to the SRAmetadb downloaded from NCBI e.g. from the URL [required unless all accessions are runs (rather than e.g. studies or submissions)]") do |arg|
+   options[:sradb] = arg
+ end
  opts.on("--format FORMAT", "format for download [default: 'sra']") do |f|
    format_string_to_sym = {
      'sralite' => :sralite, # no longer supported by NCBI?
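
The `-f` and `-d` options in this hunk can be tried out in isolation with Ruby's standard `OptionParser`. The following is a minimal sketch, not the script itself. The helper names `parse_sketch_options` and `split_accessions` are hypothetical, the help strings are shortened, and the splitting rule is only an interpretation of "separated by whitespace or commas".

```ruby
require 'optparse'

# Hypothetical helper: parses a trimmed-down version of the options in the diff.
# Returns [options_hash, remaining_positional_arguments].
def parse_sketch_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on('-f', '--file FILENAME', 'File of accession numbers') do |f|
      options[:accessions_file] = f
    end
    opts.on('-d', '--db SRAmetaDB_PATH', 'Path to the SRAmetadb SQLite file') do |path|
      options[:sradb] = path
    end
  end
  # OptionParser#parse leaves argv untouched and returns the non-option arguments.
  remaining = parser.parse(argv)
  [options, remaining]
end

# Hypothetical helper: split text on runs of whitespace and/or commas.
def split_accessions(text)
  text.split(/[\s,]+/).reject(&:empty?)
end

options, accessions = parse_sketch_options(['-d', '/path/to/SRAmetadb.sqlite', 'ERP001779'])
p options    # => {:sradb=>"/path/to/SRAmetadb.sqlite"} (Ruby 3.4+ prints {sradb: "..."})
p accessions # => ["ERP001779"]
p split_accessions("ERR229501\nERR229498, ERR229500") # => ["ERR229501", "ERR229498", "ERR229500"]
```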
@@ -89,7 +92,21 @@ end
  
  # Connect to the database if required
  log.info "Connecting to database.."
- Bio::SRA::Connection.connect unless options[:treat_input_as_runs]
+ unless options[:treat_input_as_runs]
+   if options[:sradb]
+     Bio::SRA::Connection.connect options[:sradb]
+   else
+     Bio::SRA::Connection.connect
+   end
+
+   # Check for connection
+   begin
+     s = Bio::SRA::Tables::SRA.first
+   rescue
+     log.error "There was a problem connecting to the database at `#{options[:sradb] }', was it specified correctly?"
+     exit 2
+   end
+ end
  
  log.info "Collecting a list of runs to download.."
  runs = []
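
The connection check added in this hunk follows a "probe, then bail out" pattern: run one cheap query and treat any error as a bad database path. The sketch below isolates that pattern so it can be run without the gem. The method name `database_reachable?` is hypothetical, and the block stands in for a probe such as `Bio::SRA::Tables::SRA.first`. As with the bare `rescue` in the diff, only `StandardError` and its subclasses are caught.

```ruby
# Hypothetical helper: runs the given probe block and reports success.
# The block stands in for a cheap query against the database.
def database_reachable?
  yield
  true
rescue StandardError => e
  # Mirrors the diff's behaviour of logging before giving up;
  # `warn` writes to standard error.
  warn "Database check failed: #{e.class}: #{e.message}"
  false
end

ok = database_reachable? { :first_row }                  # probe succeeds
bad = database_reachable? { raise IOError, 'no such file' } # probe fails

p ok  # => true
p bad # => false
# A script would typically follow a failed check with `exit 2`, as the diff does.
```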
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: bio-sra
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.0
  platform: ruby
  authors:
  - Ben J. Woodcroft