miga-base 0.2.0.6

Files changed (52)
  1. checksums.yaml +7 -0
  2. data/README.md +351 -0
  3. data/actions/add_result +61 -0
  4. data/actions/add_taxonomy +86 -0
  5. data/actions/create_dataset +62 -0
  6. data/actions/create_project +70 -0
  7. data/actions/daemon +69 -0
  8. data/actions/download_dataset +77 -0
  9. data/actions/find_datasets +63 -0
  10. data/actions/import_datasets +86 -0
  11. data/actions/index_taxonomy +71 -0
  12. data/actions/list_datasets +83 -0
  13. data/actions/list_files +67 -0
  14. data/actions/unlink_dataset +52 -0
  15. data/bin/miga +48 -0
  16. data/lib/miga/daemon.rb +178 -0
  17. data/lib/miga/dataset.rb +286 -0
  18. data/lib/miga/gui.rb +289 -0
  19. data/lib/miga/metadata.rb +74 -0
  20. data/lib/miga/project.rb +268 -0
  21. data/lib/miga/remote_dataset.rb +154 -0
  22. data/lib/miga/result.rb +102 -0
  23. data/lib/miga/tax_index.rb +70 -0
  24. data/lib/miga/taxonomy.rb +107 -0
  25. data/lib/miga.rb +83 -0
  26. data/scripts/_distances_noref_nomulti.bash +86 -0
  27. data/scripts/_distances_ref_nomulti.bash +105 -0
  28. data/scripts/aai_distances.bash +40 -0
  29. data/scripts/ani_distances.bash +39 -0
  30. data/scripts/assembly.bash +38 -0
  31. data/scripts/cds.bash +45 -0
  32. data/scripts/clade_finding.bash +27 -0
  33. data/scripts/distances.bash +30 -0
  34. data/scripts/essential_genes.bash +29 -0
  35. data/scripts/haai_distances.bash +39 -0
  36. data/scripts/init.bash +211 -0
  37. data/scripts/miga.bash +12 -0
  38. data/scripts/mytaxa.bash +93 -0
  39. data/scripts/mytaxa_scan.bash +85 -0
  40. data/scripts/ogs.bash +36 -0
  41. data/scripts/read_quality.bash +37 -0
  42. data/scripts/ssu.bash +35 -0
  43. data/scripts/subclades.bash +26 -0
  44. data/scripts/trimmed_fasta.bash +47 -0
  45. data/scripts/trimmed_reads.bash +57 -0
  46. data/utils/adapters.fa +302 -0
  47. data/utils/mytaxa_scan.R +89 -0
  48. data/utils/mytaxa_scan.rb +58 -0
  49. data/utils/requirements.txt +19 -0
  50. data/utils/subclades-compile.rb +48 -0
  51. data/utils/subclades.R +171 -0
  52. metadata +185 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: f80152072105bd365145133c00ddfcd432a008c0
4
+ data.tar.gz: 7444a990c359e6c9f2a6a595e688e6319df50ebb
5
+ SHA512:
6
+ metadata.gz: e4bb05e73def629ea39d72fac9d6e702b247051fb3b21b8db84195127e6d135d9b94bbf5037cde9c8f21e611cf580d53078c215d9c9187bdf861908ad42efe0c
7
+ data.tar.gz: ee27ea7cf9b98a3de760e18249e89f6410181d963017e86f5878710bf80a6d3ed5c1715d42ff7b394371add8709816d7c54e9fda5556ae6e4e96a3c4b384ca82
data/README.md ADDED
@@ -0,0 +1,351 @@
1
+ [![Code Climate](https://codeclimate.com/github/bio-miga/miga/badges/gpa.svg)](https://codeclimate.com/github/bio-miga/miga)
2
+ [![Test Coverage](https://codeclimate.com/github/bio-miga/miga/badges/coverage.svg)](https://codeclimate.com/github/bio-miga/miga/coverage)
3
+ [![Build Status](https://travis-ci.org/lmrodriguezr/gfa.svg?branch=master)](https://travis-ci.org/lmrodriguezr/gfa)
4
+
5
+ MiGA: Microbial Genomes Atlas
6
+ =============================
7
+
8
+
9
+ Installation
10
+ ------------
11
+
12
+ Please see [INSTALLATION.md](./INSTALLATION.md) for instructions.
13
+
14
+
15
+ Getting started with MiGA
16
+ -------------------------
17
+
18
+ ### MiGA Interfaces
19
+
20
+ You can interact with MiGA through different interfaces. These interfaces have
21
+ different purposes, but they also have some degree of overlap, because different
22
+ users with different aims sometimes want to do the same thing. Throughout this
23
+ manual I'll be telling you how to do things using mostly the CLI, but I'll also
24
+ try to mention the GUI and the Web Interface. The CLI is the most comprehensive
25
+ and flexible interface, but the other two are friendlier to humans. There is a
26
+ fourth interface that I won't be mentioning at all, but I'll try to document:
27
+ the Ruby API. MiGA is mostly written in Ruby, with an object-oriented approach,
28
+ and all the interfaces are just thin layers atop the Ruby core. That means that
29
+ you can write your own interfaces (or pieces) if you know how to talk to these
30
+ Ruby objects. Sometimes I even use `irb`, which is an interactive shell for
31
+ Ruby, but that's mostly for debugging.
32
+
33
+ #### MiGA CLI
34
+
35
+ CLI stands for Command Line Interface. This is a set of little scripts that let
36
+ you talk with MiGA through the terminal shell. If MiGA is in your PATH (see
37
+ [installation details](./INSTALLATION.md#miga-in-your-path)), you can simply run
38
+ `miga` in your terminal, and the help messages will take it from there. All the
39
+ MiGA CLI calls look like:
40
+
41
+ ```bash
42
+ miga task [options]
43
+ ```
44
+
45
+ Where `task` is one of the supported tasks and `[options]` is a set of dash-flag
46
+ options supported by each task. `-h` is always there to provide help. If you're
47
+ a MiGA administrator, this is probably the most convenient option for you (but
48
+ hey, give the GUI a chance).
49
+
50
+ #### MiGA GUI
51
+
52
+ The Graphical User Interface is the friendlier option for setting up a MiGA
53
+ project. It doesn't have as many options as the CLI, but it's pretty easy to
54
+ use, so it's a good option if you have a typical project in your hands.
55
+
56
+ #### MiGA Web
57
+
58
+ The Web interface for MiGA is the way MiGA reports results from a project. It's
59
+ not designed to set up new projects, but to explore existing ones, and to submit
60
+ non-reference datasets for analyses.
61
+
62
+ ### Creating your first project
63
+
64
+ You can do this in the GUI, but I like the CLI better, so I'll be telling you
65
+ how to tell MiGA what to do from the CLI. First, think where you'll place your
66
+ project. Normally this means a location...
67
+
68
+ 1. ... with enough space. That is, plan for at least 4 or 5 times the size of
69
+ the input files.
70
+
71
+ 2. ... accessible by worker nodes. If you're using a single server, this is not
72
+ really an issue. However, if you plan on deploying MiGA in a cluster
73
+ infrastructure, make sure your project is reachable by worker nodes.
74
+
75
+ 3. ... with fast access. It's not a great idea to set up projects in remote
76
+ drives with large latency. In some cases there is no way around this, for example
77
+ when that's the only available option in your cluster infrastructure, but try
78
+ to avoid this as much as possible.
79
+
80
+ Now that you know where to create your project, go ahead and run:
81
+
82
+ ```bash
83
+ miga create_project -P /path/to/project1 -t type-of-project
84
+ ```
85
+
86
+ Where `/path/to/project1` is the path to where the project should be created.
87
+ You don't need to create the folder in advance; MiGA will take care of it. See the
88
+ next section to help you decide what `type-of-project` to use. There are some
89
+ other options that are not mandatory, but will make your project richer. Take a
90
+ look at `miga create_project -h`.
91
+
92
+ #### Project types
93
+
94
+ Projects can be set for different purposes, so we've divided them into "types".
95
+ There are four of them, depending on the types of datasets to be processed (see
96
+ [Dataset types](#dataset-types)):
97
+
98
+ 1. **mixed**: A generic project with any supported type of datasets.
99
+
100
+ 2. **metagenomes**: A project containing only metagenomic datasets. This
101
+ includes either (or both) metagenomes and viromes.
102
+
103
+ 3. **genomes**: A project containing only single-organism datasets. This
104
+ includes any of the single-organism types: genome, scgenome, and/or popgenome.
105
+
106
+ 4. **clade**: Same as "genomes", but all the datasets are expected to be from
107
+ the same species. This type of project performs additional analyses that expect
108
+ a very dense ANI matrix, so all genomes in it are expected to have AAI > 90%.
109
+
110
+ ### Creating datasets
111
+
112
+ Once your project is ready, you can start populating it with datasets and data.
113
+ While it's possible to create empty datasets using `miga create_dataset`, the
114
+ preferred method is to first add data and then use the data to create the
115
+ datasets in batch. For example, let's assume you have a collection of paired-end
116
+ raw reads from several datasets. The first step is to format the filenames
117
+ properly. For each one of your datasets, pick a name that conforms to the
118
+ [MiGA names](#miga-names) restrictions (we'll call it "ds1") and rename your
119
+ reads to `/path/to/project1/data/01.raw_reads/ds1.1.fastq` for the first
120
+ sister and `/path/to/project1/data/01.raw_reads/ds1.2.fastq` for the second
121
+ sister. Also, add the date into `/path/to/project1/data/01.raw_reads/ds1.done`.
122
+ Check the [expected result files](#expected-result-files) below if you
123
+ want to start at any other point in the pipeline. Once you have renamed (or
124
+ copied) the files inside the project folder, run:
125
+
126
+ ```bash
127
+ miga find_datasets -P /path/to/project1 -a -r -t type-of-dataset
128
+ ```
129
+
130
+ The `-a` flag tells MiGA that you want to add the datasets (not just find them);
131
+ the `-r` flag tells MiGA that your datasets are to be treated as "reference"
132
+ datasets (see [Non-reference datasets](#non-reference-datasets) below); and the
133
+ `-t` option tells MiGA what type of datasets you're adding (see
134
+ [Dataset types](#dataset-types) below). If you have a mixture of dataset types,
135
+ process one at a time. That is, perform this step for each dataset type. Don't
136
+ worry about the datasets that are already registered; those will be ignored by
137
+ the `find_datasets` task and will remain unchanged.
138
+
139
+ #### Expected result files
140
+
141
+ For brevity, we'll assume that you're inside `/path/to/project1/data`; *i.e.*,
142
+ in the `data` directory of your project. We'll also assume that you're naming
143
+ your dataset **ds1**, but you can change this to anything following the
144
+ [MiGA names](#miga-names) restrictions. Now, these are the "input" points that
145
+ you can use in MiGA:
146
+
147
+ 1. **Paired-end raw reads**: The expected files are `01.raw_reads/ds1.1.fastq`
148
+ and `01.raw_reads/ds1.2.fastq`, each including a sister end. The reads must be
149
+ in the same order in both files (MiGA won't check). You can also use gzipped
150
+ files instead.
151
+
152
+ 2. **Single-end raw reads**: The expected file is `01.raw_reads/ds1.1.fastq`.
153
+ You can also use a gzipped file instead.
154
+
155
+ 3. **Paired-end trimmed reads**: These are assumed to be quality-controlled
156
+ reads in FastA format, with both ends passing the quality filters. The minimum
157
+ expected file is `04.trimmed_fasta/ds1.CoupledReads.fa`, which contains the
158
+ reads interposed. You can also pass (in addition) the reads that passed the
159
+ quality check without the sister as a gzipped FastA at
160
+ `04.trimmed_fasta/ds1.SingleReads.fa.gz`.
161
+
162
+ 4. **Single-end trimmed reads**: Similar to the option above, only
163
+ quality-checked reads are expected here. The expected file is
164
+ `04.trimmed_fasta/ds1.SingleReads.fa`.
165
+
166
+ 5. **Assembled fragments**: This can be any assembly result, including complete
167
+ genomes. The expected file is `05.assembly/ds1.LargeContigs.fna`, containing
168
+ only contigs longer than 500bp. You can also provide the complete assembly
169
+ (without length-filtering) at `05.assembly/ds1.AllContigs.fna`.
170
+
171
+ 6. **Predicted genes/proteins**: This is the total collection of predicted genes
172
+ and proteins. The expected files are `06.cds/ds1.fna`, containing genes, and
173
+ `06.cds/ds1.faa`, containing proteins. You can also provide the locations of
174
+ said genes in the genome in gzipped GFF v2 (`06.cds/ds1.gff2.gz`), gzipped
175
+ GFF v3 (`06.cds/ds1.gff3.gz`), or gzipped tabular (`06.cds/ds1.tab.gz`).
176
+
177
+ **IMPORTANT**: In all cases, an additional `ds1.done` file MUST be created in
178
+ the same folder. This is meant to prevent MiGA from mistakenly adding files as
179
+ results before they're done being processed or transferred. This file must
180
+ contain the current [date in MiGA format](#date-in-miga-format). Here's a quick
181
+ code snippet to add the `.done` file for all the input files in `01.raw_reads`
182
+ (you can adapt this accordingly to any of the other options):
183
+
184
+ ```bash
185
+ cd /path/to/project1/data/01.raw_reads
186
+ for i in *.1.fastq ; do
187
+   date "+%Y-%m-%d %H:%M:%S %z" > "$(basename "$i" .1.fastq).done"
188
+ done
189
+ ```
190
+
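The input points and the `.done` marker described above can be summarized in a short Ruby sketch. This is only an illustration: `EXPECTED_INPUTS`, `available_inputs`, and `mark_done` are hypothetical names that are not part of MiGA, the table lists only the minimum files for each input point, and gzipped alternatives are ignored for brevity.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical table (not part of MiGA): minimum expected files for each
# input point, relative to the project's `data` directory.
EXPECTED_INPUTS = {
  paired_raw_reads:     ["01.raw_reads/%s.1.fastq", "01.raw_reads/%s.2.fastq"],
  single_raw_reads:     ["01.raw_reads/%s.1.fastq"],
  paired_trimmed_reads: ["04.trimmed_fasta/%s.CoupledReads.fa"],
  single_trimmed_reads: ["04.trimmed_fasta/%s.SingleReads.fa"],
  assembly:             ["05.assembly/%s.LargeContigs.fna"],
  cds:                  ["06.cds/%s.fna", "06.cds/%s.faa"]
}

# Returns the input points whose minimum files all exist for `dataset`.
def available_inputs(data_dir, dataset)
  EXPECTED_INPUTS.select do |_, files|
    files.all? { |f| File.exist?(File.join(data_dir, format(f, dataset))) }
  end.keys
end

# Writes the `.done` marker (date in MiGA format) next to the files of one
# input point and returns the path of the marker.
def mark_done(data_dir, dataset, input)
  dir = File.dirname(format(EXPECTED_INPUTS.fetch(input).first, dataset))
  done = File.join(data_dir, dir, "#{dataset}.done")
  File.write(done, Time.now.strftime("%Y-%m-%d %H:%M:%S %z") + "\n")
  done
end

# Example run in a throwaway directory standing in for /path/to/project1/data
Dir.mktmpdir do |data|
  FileUtils.mkdir_p(File.join(data, "01.raw_reads"))
  %w[ds1.1.fastq ds1.2.fastq].each do |f|
    FileUtils.touch(File.join(data, "01.raw_reads", f))
  end
  p available_inputs(data, "ds1")
  puts File.read(mark_done(data, "ds1", :paired_raw_reads))
end
```

Note that a dataset with paired-end raw reads also satisfies the single-end entry in this sketch, because `ds1.1.fastq` is present in both cases.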
191
+ #### Dataset types
192
+
193
+ This is how you tell MiGA what kind of data you have in your datasets. Let's see
194
+ the definitions:
195
+
196
+ 1. **genome**: The genome from an isolate.
197
+ 2. **metagenome**: A metagenome (excluding viromes).
198
+ 3. **virome**: A viral metagenome.
199
+ 4. **scgenome**: A genome from a single cell.
200
+ 5. **popgenome**: The genome of a population (including microdiversity).
201
+
202
+ #### Non-reference datasets
203
+
204
+
205
+ #### Creating a RefSeq project
206
+
207
+ If you've reached this point, you are now ready to create a large functional
208
+ project. If you want to continue using this documentation on real data but
209
+ don't have any of your own handy (or if you want to use RefSeq data), this
210
+ is a quick tutorial on how to create a functional MiGA project using ALL of
211
+ NCBI's Prokaryotic RefSeq data.
212
+
213
+ **Step 1: Create the project**. That's simple, just `cd` to the directory you
214
+ want to use, and execute `miga create_project -P MiGA_RefSeq -t genomes`.
215
+
216
+ **Step 2: Download the data**. Just `cd MiGA_RefSeq`, and execute this code:
217
+
218
+ ```bash
219
+ wget -O reference_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on&type=reference'
220
+ grep -v '^#' reference_genomes.txt \
221
+ | awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} {print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
222
+ | while read ln ; do
223
+ sp=$(echo $ln | perl -pe 's/.*# //')
224
+ if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
225
+ echo $ln
226
+ $ln
227
+ fi
228
+ done
229
+ ```
230
+
231
+ And that's it. The first line will download the most current list of genomes
232
+ included in NCBI's Prokaryotic RefSeq, and the rest will repeatedly execute the
233
+ `download_dataset` task, which automatically fetches the data (even the genome's
234
+ taxonomy!). Note that the code above checks first if a dataset already exists,
235
+ so if you want to update an existing MiGA_RefSeq project, simply repeat step 2
236
+ and only missing genomes will be fetched.
237
+
238
+ Note that running time for the above code may vary depending on the network and
239
+ the size of RefSeq, but I was able to create a complete project with 122 genomes
240
+ in under 10 minutes.
241
+
242
+ **Alternative step 2: downloading all representatives**. If you want a larger
243
+ and more comprehensive collection, and not just the reference genomes, you can
244
+ download all of the representative genomes in the prokaryotic RefSeq with this
245
+ alternative code:
246
+
247
+ ```bash
248
+ wget -O representative_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on'
249
+ grep -v '^#' representative_genomes.txt \
250
+ | awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} $4{print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
251
+ | while read ln ; do
252
+ sp=$(echo $ln | perl -pe 's/.*# //')
253
+ if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
254
+ echo $ln
255
+ $ln
256
+ fi
257
+ done
258
+ ```
259
+
260
+ This is a much larger set (1,246), hence it'll take much more time. I finished
261
+ downloading the whole thing in about one and a half hours.
262
+
263
+
264
+ Launching daemons
265
+ -----------------
266
+
267
+ ### Configuring daemons
268
+
269
+
270
+ ### Understanding the MiGA configuration file
271
+
272
+
273
+ ### Arbitrary configuration scripts
274
+
275
+
276
+ ### Fixing system calls with aliases
277
+
278
+ In some cases, we might not have the same executable names as MiGA expects, or
279
+ we might have broken modules in our cluster that can be easily fixed with an
280
+ `alias`. In these cases, you can use
281
+ [arbitrary configuration scripts](#arbitrary-configuration-scripts) to generate
282
+ one or more `alias`. Importantly, MiGA daemons work with non-interactive shells,
283
+ which means you likely need to explicitly allow alias expansions, for
284
+ example:
285
+
286
+ ```bash
287
+ # Allow alias expansions in non-interactive shells
288
+ shopt -s expand_aliases
289
+
290
+ # Call FastQC with the environmental Perl,
291
+ # not the built-in /usr/bin/perl:
292
+ alias fastqc="perl $(which fastqc)"
293
+
294
+ # Use the standard name for RAxML (pthreads)
295
+ # instead of the one my sys-admin decided to use:
296
+ alias raxmlHPC-PTHREADS=RAxML_pthreads
297
+ ```
298
+
299
+ The examples above illustrate how to use `alias` to fix broken packages or to
300
+ make software with non-standard names reachable.
301
+
302
+ **Known caveats to this solution:** This solution CANNOT BE USED in the few
303
+ cases in which a whole package is expected based on a single executable. For
304
+ example, adding the enveomics scripts to your `PATH` is far easier than creating
305
+ an `alias` for each script. Also, MiGA expects to find the model, the activation
306
+ key, and the scripts of MetaGeneMark in the same folder of the `gmhmmp` binary,
307
+ so setting an `alias` may prevent MiGA from finding these ancillary files.
308
+
309
+
310
+ Cluster infrastructure
311
+ ----------------------
312
+
313
+
314
+ ### Loading optional modules
315
+
316
+
317
+ See also [Fixing system calls with aliases](#fixing-system-calls-with-aliases).
318
+
319
+
320
+ Miscellaneous
321
+ -------------
322
+
323
+ Below are reference snippets for which I couldn't find a more
324
+ suitable home, but are important documentation.
325
+
326
+ ### MiGA Names
327
+
328
+ MiGA names are non-empty strings composed exclusively of alphanumerics and
329
+ underscores. All the dataset names in MiGA must conform to this restriction, but
330
+ not all the projects do. Other objects must also conform to the MiGA name restrictions,
331
+ such as taxonomic entries.
332
+
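As a minimal sketch of the rule above, the following Ruby helpers check and build candidate names. Both `miga_name?` and `to_miga_name` are hypothetical helpers rather than MiGA's own API, and "alphanumerics" is assumed here to mean ASCII letters and digits.

```ruby
# Hypothetical helper (not MiGA's API): true if `name` is a non-empty string
# made only of ASCII letters, digits, and underscores.
def miga_name?(name)
  name.is_a?(String) && name.match?(/\A[A-Za-z0-9_]+\z/)
end

# Replaces every non-alphanumeric character with an underscore, mirroring the
# gsub(/[^A-Za-z0-9]/, "_", ...) step of the awk code in the RefSeq tutorial.
# An empty input stays empty, so the result still needs to be checked.
def to_miga_name(text)
  text.gsub(/[^A-Za-z0-9]/, "_")
end

candidate = to_miga_name("Escherichia coli K-12")
puts candidate
puts miga_name?(candidate)
```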
333
+ ### Date in MiGA format
334
+
335
+ The official format in which MiGA represents date/times is the default of Ruby's
336
+ `Time.now.to_s`. In the *nix `date` utility this corresponds to the format:
337
+ `+%Y-%m-%d %H:%M:%S %z`.
338
+
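The format above can be written and read back from Ruby with the standard library alone. This is a small sketch: `MIGA_DATE_FORMAT`, `miga_date`, and `parse_miga_date` are illustrative names, not MiGA's API, and the example uses a fixed UTC offset so that it does not depend on the local time zone.

```ruby
require "time"

# The `date` format string given above, without the leading "+".
MIGA_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Formats a Time the same way Time#to_s does for a non-UTC Time.
def miga_date(time)
  time.strftime(MIGA_DATE_FORMAT)
end

# Parses a string in that format back into a Time.
def parse_miga_date(str)
  Time.strptime(str, MIGA_DATE_FORMAT)
end

t = Time.new(2015, 10, 1, 12, 30, 0, "-04:00")
stamp = miga_date(t)
puts stamp                        # 2015-10-01 12:30:00 -0400
puts parse_miga_date(stamp) == t  # true
```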
339
+
340
+ Authors
341
+ -------
342
+
343
+ Developed and maintained by [Luis M. Rodriguez-R][lrr].
344
+
345
+
346
+ License
347
+ -------
348
+
349
+ See [LICENSE](LICENSE).
350
+
351
+ [lrr]: http://lmrodriguezr.github.io/
data/actions/add_result ADDED
@@ -0,0 +1,61 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Oct-01-2015
7
+ #
8
+
9
+ o = {q:true}
10
+ opts = OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Registers a result.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
19
+ opt.on("-D", "--dataset PATH",
20
+ "(Mandatory if the result is dataset-specific) ID of the dataset to use."
21
+ ){ |v| o[:dataset]=v }
22
+ opt.on("-r", "--result STRING",
23
+ "(Mandatory) Name of the result to add.",
24
+ "Recognized names for dataset-specific results include:",
25
+ *MiGA::Dataset.RESULT_DIRS.keys.map{|n| " ~ #{n}"},
26
+ "Recognized names for project-wide results include:",
27
+ *MiGA::Project.RESULT_DIRS.keys.map{|n| " ~ #{n}"}){ |v| o[:name]=v }
28
+ opt.on("-v", "--verbose",
29
+ "Print additional information to STDERR."){ o[:q]=false }
30
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
31
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
32
+ end
33
+ opt.on("-h", "--help", "Display this screen.") do
34
+ puts opt
35
+ exit
36
+ end
37
+ opt.separator ""
38
+ end.parse!
39
+
40
+
41
+ ### MAIN
42
+ # Options were already parsed by `end.parse!` above; `opts` holds its return value.
43
+ raise "-P is mandatory." if o[:project].nil?
44
+ raise "-r is mandatory." if o[:name].nil?
45
+
46
+ $stderr.puts "Loading project." unless o[:q]
47
+ p = MiGA::Project.load(o[:project])
48
+ raise "Impossible to load project: #{o[:project]}" if p.nil?
49
+
50
+ $stderr.puts "Registering result." unless o[:q]
51
+ if o[:dataset].nil?
52
+ r = p.add_result o[:name].to_sym
53
+ else
54
+ d = p.dataset(o[:dataset])
55
+ r = d.add_result o[:name].to_sym
56
+ end
57
+
58
+ raise "Cannot add result, incomplete expected files." if r.nil?
59
+
60
+ $stderr.puts "Done." unless o[:q]
61
+
data/actions/add_taxonomy ADDED
@@ -0,0 +1,86 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Oct-01-2015
7
+ #
8
+
9
+ o = {q:true}
10
+ OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Registers taxonomic information for datasets.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
19
+ opt.on("-D", "--dataset PATH",
20
+ "(Mandatory unless -t is provided) ID of the dataset to use."
21
+ ){ |v| o[:dataset]=v }
22
+ opt.on("-s", "--tax-string STRING",
23
+ "(Mandatory unless -t is provided) String corresponding to the taxonomy",
24
+ "of the dataset. The MiGA format of string taxonomy is a space-delimited",
25
+ "set of 'rank:name' pairs."){ |v| o[:taxstring]=v }
26
+ opt.on("-t", "--tax-file PATH",
27
+ "(Mandatory unless -D and -s are provided) Tab-delimited file containing",
28
+ "datasets taxonomy. Each row corresponds to a dataset and each column",
29
+ "corresponds to a rank. The first row must be a header with the rank ",
30
+ "names, and the first column must contain dataset names."
31
+ ){ |v| o[:taxfile]=v }
32
+ opt.on("-v", "--verbose",
33
+ "Print additional information to STDERR."){ o[:q]=false }
34
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
35
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
36
+ end
37
+ opt.on("-h", "--help", "Display this screen.") do
38
+ puts opt
39
+ exit
40
+ end
41
+ opt.separator ""
42
+ end.parse!
43
+
44
+
45
+ ### MAIN
46
+ raise "-P is mandatory." if o[:project].nil?
47
+ raise "-D is mandatory unless -t is provided." if
48
+ o[:dataset].nil? and o[:taxfile].nil?
49
+ raise "-s is mandatory unless -t is provided." if
50
+ o[:taxstring].nil? and o[:taxfile].nil?
51
+
52
+ $stderr.puts "Loading project." unless o[:q]
53
+ p = MiGA::Project.load(o[:project])
54
+ raise "Impossible to load project: #{o[:project]}" if p.nil?
55
+
56
+ if not o[:taxfile].nil?
57
+ $stderr.puts "Reading tax-file and registering taxonomy." unless o[:q]
58
+ tfh = File.open(o[:taxfile], "r")
59
+ header = nil
60
+ while ln = tfh.gets
61
+ next if ln =~ /^\s*?$/
62
+ r = ln.chomp.split /\t/, -1
63
+ dn = r.shift
64
+ if header.nil?
65
+ header = r
66
+ next
67
+ end
68
+ d = p.dataset dn
69
+ if d.nil?
70
+ warn "Impossible to find dataset at line #{$.}: #{dn}. Ignoring..."
71
+ next
72
+ end
73
+ d.metadata[:tax] = MiGA::Taxonomy.new(r, header)
74
+ d.save
75
+ $stderr.puts " #{d.name} registered." unless o[:q]
76
+ end
77
+ tfh.close
78
+ else
79
+ $stderr.puts "Registering taxonomy." unless o[:q]
80
+ d = p.dataset o[:dataset]
81
+ d.metadata[:tax] = MiGA::Taxonomy.new(o[:taxstring])
82
+ d.save
83
+ end
84
+
85
+ $stderr.puts "Done." unless o[:q]
86
+
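The `-s` and `-t` options above describe two taxonomy inputs: a space-delimited string of `rank:name` pairs, and a tab-delimited table with a header row of rank names. The Ruby sketch below shows one way to turn each into a rank-to-name hash. It is not `MiGA::Taxonomy` itself: `parse_tax_string` and `tax_from_row` are hypothetical helpers, and the rank names in the example are placeholders.

```ruby
# Hypothetical helper: parses a space-delimited set of 'rank:name' pairs.
def parse_tax_string(str)
  pairs = str.split(/\s+/).reject(&:empty?).map do |pair|
    rank, name = pair.split(":", 2)
    if rank.to_s.empty? || name.to_s.empty?
      raise ArgumentError, "Malformed pair: #{pair.inspect}"
    end
    [rank.to_sym, name]
  end
  pairs.to_h
end

# Hypothetical helper: pairs one table row with the header of rank names
# (both without the first column, which holds the dataset name), skipping
# ranks left empty in the row.
def tax_from_row(header, row)
  header.zip(row)
        .reject { |_, name| name.nil? || name.empty? }
        .map { |rank, name| [rank.to_sym, name] }
        .to_h
end

p parse_tax_string("domain:Bacteria genus:Escherichia")
p tax_from_row(%w[domain phylum genus], ["Bacteria", "", "Escherichia"])
```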
data/actions/create_dataset ADDED
@@ -0,0 +1,62 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Nov-29-2015
7
+ #
8
+
9
+ o = {q:true, ref:true}
10
+ OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Creates an empty dataset in a pre-existing MiGA project.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
19
+ opt.on("-D", "--dataset STRING",
20
+ "(Mandatory) ID of the dataset to create."){ |v| o[:dataset]=v }
21
+ opt.on("-t", "--type STRING",
22
+ "Type of dataset. Recognized types include:",
23
+ *MiGA::Dataset.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
24
+ ){ |v| o[:type]=v.to_sym }
25
+ opt.on("-q", "--query",
26
+ "If set, the dataset is registered as a query, not a reference dataset."
27
+ ){ |v| o[:ref]=!v }
28
+ opt.on("-d", "--description STRING",
29
+ "Description of the dataset."){ |v| o[:description]=v }
30
+ opt.on("-u", "--user STRING",
31
+ "Owner of the dataset."){ |v| o[:user]=v }
32
+ opt.on("-c", "--comments STRING",
33
+ "Comments on the dataset."){ |v| o[:comments]=v }
34
+ opt.on("-v", "--verbose",
35
+ "Print additional information to STDERR."){ o[:q]=false }
36
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
37
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
38
+ end
39
+ opt.on("-h", "--help", "Display this screen.") do
40
+ puts opt
41
+ exit
42
+ end
43
+ opt.separator ""
44
+ end.parse!
45
+
46
+
47
+ ### MAIN
48
+ raise "-P is mandatory." if o[:project].nil?
49
+ raise "-D is mandatory." if o[:dataset].nil?
50
+
51
+ $stderr.puts "Loading project." unless o[:q]
52
+ p = MiGA::Project.load(o[:project])
53
+ raise "Impossible to load project: #{o[:project]}" if p.nil?
54
+
55
+ $stderr.puts "Creating dataset." unless o[:q]
56
+ md = {}
57
+ [:type, :description, :user, :comments].each{ |k| md[k]=o[k] unless o[k].nil? }
58
+ d = MiGA::Dataset.new(p, o[:dataset], o[:ref], md)
59
+ p.add_dataset(o[:dataset])
60
+
61
+ $stderr.puts "Done." unless o[:q]
62
+
data/actions/create_project ADDED
@@ -0,0 +1,70 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Oct-01-2015
7
+ #
8
+
9
+ o = {q:true, update:false}
10
+ OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Creates an empty MiGA project.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to create."){ |v| o[:project]=v }
19
+ opt.on("-t", "--type STRING",
20
+ "Type of project. Recognized types include:",
21
+ *MiGA::Project.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
22
+ ){ |v| o[:type]=v.to_sym }
23
+ opt.on("-n", "--name STRING",
24
+ "Name of the project."){ |v| o[:name]=v }
25
+ opt.on("-d", "--description STRING",
26
+ "Description of the project."){ |v| o[:description]=v }
27
+ opt.on("-u", "--user STRING", "Owner of the project."){ |v| o[:user]=v }
28
+ opt.on("-c", "--comments STRING",
29
+ "Comments on the project."){ |v| o[:comments]=v }
30
+ opt.on("--update",
31
+ "Updates the project if it already exists."){ o[:update]=true }
32
+ opt.on("-v", "--verbose",
33
+ "Print additional information to STDERR."){ o[:q]=false }
34
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
35
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
36
+ end
37
+ opt.on("-h", "--help", "Display this screen.") do
38
+ puts opt
39
+ exit
40
+ end
41
+ opt.separator ""
42
+ end.parse!
43
+
44
+
45
+ ### MAIN
46
+ raise "-P is mandatory." if o[:project].nil?
47
+
48
+ unless File.exist? "#{ENV["HOME"]}/.miga_rc" and
49
+ File.exist? "#{ENV["HOME"]}/.miga_daemon.json"
50
+ puts "You must initialize MiGA before creating the first project.\n" +
51
+ "Do you want to initialize MiGA now? (yes / no)"
52
+ `'#{File.dirname(__FILE__)}/../scripts/init.bash'` if
53
+ $stdin.gets.chomp == 'yes'
54
+ end
55
+
56
+ $stderr.puts "Creating project." unless o[:q]
57
+ raise "Project already exists, aborting." unless
58
+ o[:update] or not MiGA::Project.exist? o[:project]
59
+ p = MiGA::Project.new(o[:project], o[:update])
60
+ # The following check is redundant with MiGA::Project#create,
61
+ # but allows upgrading projects from (very) early code versions
62
+ o[:name] = File.basename(p.path) if
63
+ o[:update] and o[:name].nil?
64
+ [:name, :description, :user, :comments, :type].each do |k|
65
+ p.metadata[k] = o[k] unless o[k].nil?
66
+ end
67
+ p.save
68
+
69
+ $stderr.puts "Done." unless o[:q]
70
+