miga-base 0.2.0.6

Files changed (52)
  1. checksums.yaml +7 -0
  2. data/README.md +351 -0
  3. data/actions/add_result +61 -0
  4. data/actions/add_taxonomy +86 -0
  5. data/actions/create_dataset +62 -0
  6. data/actions/create_project +70 -0
  7. data/actions/daemon +69 -0
  8. data/actions/download_dataset +77 -0
  9. data/actions/find_datasets +63 -0
  10. data/actions/import_datasets +86 -0
  11. data/actions/index_taxonomy +71 -0
  12. data/actions/list_datasets +83 -0
  13. data/actions/list_files +67 -0
  14. data/actions/unlink_dataset +52 -0
  15. data/bin/miga +48 -0
  16. data/lib/miga/daemon.rb +178 -0
  17. data/lib/miga/dataset.rb +286 -0
  18. data/lib/miga/gui.rb +289 -0
  19. data/lib/miga/metadata.rb +74 -0
  20. data/lib/miga/project.rb +268 -0
  21. data/lib/miga/remote_dataset.rb +154 -0
  22. data/lib/miga/result.rb +102 -0
  23. data/lib/miga/tax_index.rb +70 -0
  24. data/lib/miga/taxonomy.rb +107 -0
  25. data/lib/miga.rb +83 -0
  26. data/scripts/_distances_noref_nomulti.bash +86 -0
  27. data/scripts/_distances_ref_nomulti.bash +105 -0
  28. data/scripts/aai_distances.bash +40 -0
  29. data/scripts/ani_distances.bash +39 -0
  30. data/scripts/assembly.bash +38 -0
  31. data/scripts/cds.bash +45 -0
  32. data/scripts/clade_finding.bash +27 -0
  33. data/scripts/distances.bash +30 -0
  34. data/scripts/essential_genes.bash +29 -0
  35. data/scripts/haai_distances.bash +39 -0
  36. data/scripts/init.bash +211 -0
  37. data/scripts/miga.bash +12 -0
  38. data/scripts/mytaxa.bash +93 -0
  39. data/scripts/mytaxa_scan.bash +85 -0
  40. data/scripts/ogs.bash +36 -0
  41. data/scripts/read_quality.bash +37 -0
  42. data/scripts/ssu.bash +35 -0
  43. data/scripts/subclades.bash +26 -0
  44. data/scripts/trimmed_fasta.bash +47 -0
  45. data/scripts/trimmed_reads.bash +57 -0
  46. data/utils/adapters.fa +302 -0
  47. data/utils/mytaxa_scan.R +89 -0
  48. data/utils/mytaxa_scan.rb +58 -0
  49. data/utils/requirements.txt +19 -0
  50. data/utils/subclades-compile.rb +48 -0
  51. data/utils/subclades.R +171 -0
  52. metadata +185 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: f80152072105bd365145133c00ddfcd432a008c0
4
+ data.tar.gz: 7444a990c359e6c9f2a6a595e688e6319df50ebb
5
+ SHA512:
6
+ metadata.gz: e4bb05e73def629ea39d72fac9d6e702b247051fb3b21b8db84195127e6d135d9b94bbf5037cde9c8f21e611cf580d53078c215d9c9187bdf861908ad42efe0c
7
+ data.tar.gz: ee27ea7cf9b98a3de760e18249e89f6410181d963017e86f5878710bf80a6d3ed5c1715d42ff7b394371add8709816d7c54e9fda5556ae6e4e96a3c4b384ca82
data/README.md ADDED
@@ -0,0 +1,351 @@
1
+ [![Code Climate](https://codeclimate.com/github/bio-miga/miga/badges/gpa.svg)](https://codeclimate.com/github/bio-miga/miga)
2
+ [![Test Coverage](https://codeclimate.com/github/bio-miga/miga/badges/coverage.svg)](https://codeclimate.com/github/bio-miga/miga/coverage)
3
+ [![Build Status](https://travis-ci.org/lmrodriguezr/gfa.svg?branch=master)](https://travis-ci.org/lmrodriguezr/gfa)
4
+
5
+ MiGA: Microbial Genomes Atlas
6
+ =============================
7
+
8
+
9
+ Installation
10
+ ------------
11
+
12
+ Please see [INSTALLATION.md](./INSTALLATION.md) for instructions.
13
+
14
+
15
+ Getting started with MiGA
16
+ -------------------------
17
+
18
+ ### MiGA Interfaces
19
+
20
+ You can interact with MiGA through different interfaces. These interfaces have
21
+ different purposes, but they also have some degree of overlap, because different
22
+ users with different aims sometimes want to do the same thing. Throughout this
23
+ manual I'll be telling you how to do things using mostly the CLI, but I'll also
24
+ try to mention the GUI and the Web Interface. The CLI is the most comprehensive
25
+ and flexible interface, but the other two are friendlier to humans. There is a
26
+ fourth interface that I won't be mentioning at all, but I'll try to document:
27
+ the Ruby API. MiGA is mostly written in Ruby, with an object-oriented approach,
28
+ and all the interfaces are just thin layers atop the Ruby core. That means that
29
+ you can write your own interfaces (or pieces) if you know how to talk to these
30
+ Ruby objects. Sometimes I even use `irb`, which is an interactive shell for
31
+ Ruby, but that's mostly for debugging.
32
+
33
+ #### MiGA CLI
34
+
35
+ CLI stands for Command Line Interface. This is a set of little scripts that let
36
+ you talk with MiGA through the terminal shell. If MiGA is in your PATH (see
37
+ [installation details](./INSTALLATION.md#miga-in-your-path)), you can simply run
38
+ `miga` in your terminal, and the help messages will take it from there. All the
39
+ MiGA CLI calls look like:
40
+
41
+ ```bash
42
+ miga task [options]
43
+ ```
44
+
45
+ Where `task` is one of the supported tasks and `[options]` is a set of dash-flag
46
+ options supported by each task. `-h` is always there to provide help. If you're
47
+ a MiGA administrator, this is probably the most convenient option for you (but
48
+ hey, give the GUI a chance).
49
+
50
+ #### MiGA GUI
51
+
52
+ The Graphical User Interface is the friendlier option for setting up a MiGA
53
+ project. It doesn't have as many options as the CLI, but it's pretty easy to
54
+ use, so it's a good option if you have a typical project in your hands.
55
+
56
+ #### MiGA Web
57
+
58
+ The Web interface for MiGA is the way MiGA reports results from a project. It's
59
+ not designed to set up new projects, but to explore existing ones, and to submit
60
+ non-reference datasets for analyses.
61
+
62
+ ### Creating your first project
63
+
64
+ You can do this in the GUI, but I like the CLI better, so I'll be telling you
65
+ how to tell MiGA what to do from the CLI. First, think where you'll place your
66
+ project. Normally this means a location...
67
+
68
+ 1. ... with enough space. That is, plan for at least 4 or 5 times the size of
69
+ the input files.
70
+
71
+ 2. ... accessible by worker nodes. If you're using a single server, this is not
72
+ really an issue. However, if you plan on deploying MiGA in a cluster
73
+ infrastructure, make sure your project is reachable by worker nodes.
74
+
75
+ 3. ... with fast access. It's not a great idea to set up projects in remote
76
+ drives with large latency. In some cases there no way around this, for example
77
+ when that's the only available option in your cluster infrastructure, but try
78
+ to avoid this as much as possible.
79
+
80
+ Now that you know where to create your project, go ahead and run:
81
+
82
+ ```bash
83
+ miga create_project -P /path/to/project1 -t type-of-project
84
+ ```
85
+
86
+ Where `/path/to/project1` is the path to where the project should be created.
87
+ You don't need to create the folder in advance; MiGA will take care of it. See the
88
+ next section to help you decide what `type-of-project` to use. There are some
89
+ other options that are not mandatory, but will make your project richer. Take a
90
+ look at `miga create_project -h`.
91
+
92
+ #### Project types
93
+
94
+ Projects can be set for different purposes, so we've divided them into "types".
95
+ There are four of them, depending on the types of datasets to be processed (see
96
+ [Dataset types](#dataset-types)):
97
+
98
+ 1. **mixed**: A generic project with any supported type of datasets.
99
+
100
+ 2. **metagenomes**: A project containing only metagenomic datasets. This
101
+ includes either (or both) metagenomes and viromes.
102
+
103
+ 3. **genomes**: A project containing only single-organism datasets. This
104
+ includes any of the single-organism types: genome, scgenome, and/or popgenome.
105
+
106
+ 4. **clade**: Same as "genomes", but all the datasets are expected to be from
107
+ the same species. This type of project performs additional analyses that expect
108
+ a very dense ANI matrix, so all genomes in it are expected to have AAI > 90%.
109
+
110
+ ### Creating datasets
111
+
112
+ Once your project is ready, you can start populating it with datasets and data.
113
+ While it's possible to create empty datasets using `miga create_dataset`, the
114
+ preferred method is to first add data and then use the data to create the
115
+ datasets in batch. For example, let's assume you have a collection of paired-end
116
+ raw reads from several datasets. The first step is to format the filenames
117
+ properly. For each one of your datasets, pick a name that conforms to the
118
+ [MiGA names](#miga-names) restrictions (we'll call it "ds1") and rename your
119
+ reads to `/path/to/project1/data/01.raw_reads/ds1.1.fastq` for the first
120
+ sister and `/path/to/project1/data/01.raw_reads/ds1.2.fastq` for the second
121
+ sister. Also, add the date into `/path/to/project1/data/01.raw_reads/ds1.done`.
122
+ Check the [expected result files](#expected-result-files) below if you
123
+ want to start at any other point in the pipeline. Once you have renamed (or
124
+ copied) the files inside the project folder, run:
125
+
126
+ ```bash
127
+ miga find_datasets -P /path/to/project1 -a -r -t type-of-dataset
128
+ ```
129
+
130
+ The `-a` flag tells MiGA that you want to add the datasets (not just find them);
131
+ the `-r` flag tells MiGA that your datasets are to be treated as "reference"
132
+ datasets (see [Non-reference datasets](#non-reference-datasets) below); and the
133
+ `-t` option tells MiGA what type of datasets you're adding (see
134
+ [Dataset types](#dataset-types) below). If you have a mixture of dataset types,
135
+ process one at a time. That is, perform this step for each dataset type. Don't
136
+ worry about the datasets that are already registered, those will be ignored by
137
+ the `find_datasets` task and will remain unchanged.
138
+
139
+ #### Expected result files
140
+
141
+ For brevity, we'll assume that you're inside `/path/to/project1/data`; *i.e.*,
142
+ in the `data` directory of your project. We'll also assume that you're naming
143
+ your dataset **ds1**, but you can change this to anything following the
144
+ [MiGA names](#miga-names) restrictions. Now, these are the "input" points that
145
+ you can use in MiGA:
146
+
147
+ 1. **Paired-end raw reads**: The expected files are `01.raw_reads/ds1.1.fastq`
148
+ and `01.raw_reads/ds1.2.fastq`, each including a sister end. The reads must be
149
+ in the same order in both files (MiGA won't check). You can also use gzipped
150
+ files instead.
151
+
152
+ 2. **Single-end raw reads**: The expected file is `01.raw_reads/ds1.1.fastq`.
153
+ You can also use a gzipped file instead.
154
+
155
+ 3. **Paired-end trimmed reads**: These are assumed to be quality-controlled
156
+ reads in FastA format, with both ends passing the quality filters. The minimum
157
+ expected file is `04.trimmed_fasta/ds1.CoupledReads.fa`, which contains the
158
+ reads interposed. You can also pass (in addition) the reads that passed the
159
+ quality check without the sister as a gzipped FastA at
160
+ `04.trimmed_fasta/ds1.SingleReads.fa.gz`.
161
+
162
+ 4. **Single-end trimmed reads**: Similar to the option above, only
163
+ quality-checked reads are expected here. The expected file is
164
+ `04.trimmed_fasta/ds1.SingleReads.fa`.
165
+
166
+ 5. **Assembled fragments**: This can be any assembly result, including complete
167
+ genomes. The expected file is `05.assembly/ds1.LargeContigs.fna`, containing
168
+ only contigs longer than 500bp. You can also provide the complete assembly
169
+ (without length-filtering) at `05.assembly/ds1.AllContigs.fna`.
170
+
171
+ 6. **Predicted genes/proteins**: This is the total collection of predicted genes
172
+ and proteins. The expected files are `06.cds/ds1.fna`, containing genes, and
173
+ `06.cds/ds1.faa`, containing proteins. You can also provide the locations of
174
+ said genes in the genome in gzipped GFF v2 (`06.cds/ds1.gff2.gz`), gzipped
175
+ GFF v3 (`06.cds/ds1.gff3.gz`), or gzipped tabular (`06.cds/ds1.tab.gz`).
176
+
177
+ **IMPORTANT**: In all cases, an additional `ds1.done` file MUST be created in
178
+ the same folder. This is meant to prevent MiGA from mistakenly adding files as
179
+ results before they're done being processed or transferred. This file must
180
+ contain the current [date in MiGA format](#date-in-miga-format). Here's a quick
181
+ code snippet to add the `.done` file for all the input files in `01.raw_reads`
182
+ (you can adapt this accordingly to any of the other options):
183
+
184
+ ```bash
185
+ cd /path/to/project1/data/01.raw_reads
186
+ for i in *.1.fastq ; do
187
+ date "+%Y-%m-%d %H:%M:%S %z" > "$(basename "$i" .1.fastq).done"
188
+ done
189
+ ```
190
+
191
+ #### Dataset types
192
+
193
+ This is how you tell MiGA what kind of data you have in your datasets. Let's see
194
+ the definitions:
195
+
196
+ 1. **genome**: The genome from an isolate.
197
+ 2. **metagenome**: A metagenome (excluding viromes).
198
+ 3. **virome**: A viral metagenome.
199
+ 4. **scgenome**: A genome from a single cell.
200
+ 5. **popgenome**: The genome of a population (including microdiversity).
201
+
202
+ #### Non-reference datasets
203
+
204
+
205
+ #### Creating a RefSeq project
206
+
207
+ If you've reached this point, you are now ready to create a large functional
208
+ project. If you want to continue using this documentation on real data but
209
+ don't have any of your own handy (or if you want to use RefSeq data), this
210
+ is a quick tutorial on how to create a functional MiGA project using ALL of
211
+ NCBI's Prokaryotic RefSeq data.
212
+
213
+ **Step 1: Create the project**. That's simple, just `cd` to the directory you
214
+ want to use, and execute `miga create_project -P MiGA_RefSeq -t genomes`.
215
+
216
+ **Step 2: Download the data**. Just `cd MiGA_RefSeq`, and execute this code:
217
+
218
+ ```bash
219
+ wget -O reference_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on&type=reference'
220
+ grep -v '^#' reference_genomes.txt \
221
+ | awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} {print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
222
+ | while read ln ; do
223
+ sp=$(echo $ln | perl -pe 's/.*# //')
224
+ if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
225
+ echo $ln
226
+ $ln
227
+ fi
228
+ done
229
+ ```
230
+
231
+ And that's it. The first line will download the most current list of genomes
232
+ included in NCBI's Prokaryotic RefSeq, and the rest will repeatedly execute the
233
+ `download_dataset` task, which automatically fetches the data (even the genome's
234
+ taxonomy!). Note that the code above checks first if a dataset already exists,
235
+ so if you want to update an existing MiGA_RefSeq project, simply repeat step 2
236
+ and only missing genomes will be fetched.
237
+
238
+ Note that running time for the above code may vary depending on the network and
239
+ the size of RefSeq, but I was able to create a complete project with 122 genomes
240
+ in under 10 minutes.
241
+
242
+ **Alternative step 2: downloading all representatives**. If you want a larger
243
+ and more comprehensive collection, and not just the reference genomes, you can
244
+ download all of the representative genomes in the prokaryotic RefSeq with this
245
+ alternative code:
246
+
247
+ ```bash
248
+ wget -O representative_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on'
249
+ grep -v '^#' representative_genomes.txt \
250
+ | awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} $4{print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
251
+ | while read ln ; do
252
+ sp=$(echo $ln | perl -pe 's/.*# //')
253
+ if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
254
+ echo $ln
255
+ $ln
256
+ fi
257
+ done
258
+ ```
259
+
260
+ This is a much larger set (1,246), hence it'll take much more time. I finished
261
+ downloading the whole thing in about one and a half hours.
262
+
263
+
264
+ Launching daemons
265
+ -----------------
266
+
267
+ ### Configuring daemons
268
+
269
+
270
+ ### Understanding the MiGA configuration file
271
+
272
+
273
+ ### Arbitrary configuration scripts
274
+
275
+
276
+ ### Fixing system calls with aliases
277
+
278
+ In some cases, we might not have the same executable names as MiGA expects, or
279
+ we might have broken modules in our cluster that can be easily fixed with an
280
+ `alias`. In these cases, you can use
281
+ [arbitrary configuration scripts](#arbitrary-configuration-scripts) to generate
282
+ one or more `alias`. Importantly, MiGA daemons work with non-interactive shells,
283
+ which means you likely need to explicitly allow for alias expansions, for
284
+ example:
285
+
286
+ ```bash
287
+ # Allow alias expansions in non-interactive shells
288
+ shopt -s expand_aliases
289
+
290
+ # Call FastQC with the environmental Perl,
291
+ # not the built-in /usr/bin/perl:
292
+ alias fastqc="perl $(which fastqc)"
293
+
294
+ # Use the standard name for RAxML (pthreads)
295
+ # instead of the one my sys-admin decided to use:
296
+ alias raxmlHPC-PTHREADS=RAxML_pthreads
297
+ ```
298
+
299
+ The examples above illustrate how to use `alias` to fix broken packages or to
300
+ make software with non-standard names reachable.
301
+
302
+ **Known caveats to this solution:** This solution CANNOT BE USED in the few
303
+ cases in which a whole package is expected based on a single executable. For
304
+ example, adding the enveomics scripts to your `PATH` is far easier than creating
305
+ an `alias` for each script. Also, MiGA expects to find the model, the activation
306
+ key, and the scripts of MetaGeneMark in the same folder of the `gmhmmp` binary,
307
+ so setting an `alias` may prevent MiGA from finding these ancillary files.
308
+
309
+
310
+ Cluster infrastructure
311
+ ----------------------
312
+
313
+
314
+ ### Loading optional modules
315
+
316
+
317
+ See also [Fixing system calls with aliases](#fixing-system-calls-with-aliases).
318
+
319
+
320
+ Miscellaneous
321
+ -------------
322
+
323
+ Below are reference snippets for which I couldn't find a more
324
+ suitable home, but are important documentation.
325
+
326
+ ### MiGA Names
327
+
328
+ MiGA names are non-empty strings composed exclusively of alphanumerics and
329
+ underscores. All the dataset names in MiGA must conform to this restriction, but
330
+ not all the projects do. Other objects must also conform to the MiGA name restrictions,
331
+ such as taxonomic entries.
332
+
333
+ ### Date in MiGA format
334
+
335
+ The official format in which MiGA represents date/times is the default of Ruby's
336
+ `Time.now.to_s`. In the *nix `date` utility this corresponds to the format:
337
+ `+%Y-%m-%d %H:%M:%S %z`.
338
+
339
+
340
+ Authors
341
+ -------
342
+
343
+ Developed and maintained by [Luis M. Rodriguez-R][lrr].
344
+
345
+
346
+ License
347
+ -------
348
+
349
+ See [LICENSE](LICENSE).
350
+
351
+ [lrr]: http://lmrodriguezr.github.io/
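The README's "MiGA Names" and "Date in MiGA format" sections can be condensed into a small Ruby sketch. It is only an illustration under stated assumptions: `MIGA_NAME`, `miga_name?`, `miga_date`, and `write_done` are hypothetical helpers and not part of the MiGA API, and "alphanumerics" is assumed to mean ASCII letters and digits.

```ruby
require "tmpdir"

# Assumption: a MiGA name is a non-empty string of ASCII alphanumerics
# and underscores, as described in the README.
MIGA_NAME = /\A[A-Za-z0-9_]+\z/

# Hypothetical helper: true if +name+ satisfies the assumed restriction.
def miga_name?(name)
  !!(name =~ MIGA_NAME)
end

# Same layout as `date "+%Y-%m-%d %H:%M:%S %z"` in the README.
def miga_date(time = Time.now)
  time.strftime("%Y-%m-%d %H:%M:%S %z")
end

# Hypothetical helper: writes <dir>/<name>.done containing the current date.
def write_done(dir, name)
  raise ArgumentError, "not a MiGA name: #{name.inspect}" unless miga_name?(name)
  path = File.join(dir, "#{name}.done")
  File.write(path, miga_date + "\n")
  path
end

# Usage: mark a dataset named "ds1" as done in a scratch directory.
Dir.mktmpdir do |dir|
  puts write_done(dir, "ds1")
end
```

For a time with a fixed numeric offset, `miga_date` produces the same string as `Time#to_s`, which is the equivalence the README relies on.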
data/actions/add_result ADDED
@@ -0,0 +1,61 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Oct-01-2015
7
+ #
8
+
9
+ o = {q:true}
10
+ opts = OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Registers a result.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
19
+ opt.on("-D", "--dataset PATH",
20
+ "(Mandatory if the result is dataset-specific) ID of the dataset to use."
21
+ ){ |v| o[:dataset]=v }
22
+ opt.on("-r", "--result STRING",
23
+ "(Mandatory) Name of the result to add.",
24
+ "Recognized names for dataset-specific results include:",
25
+ *MiGA::Dataset.RESULT_DIRS.keys.map{|n| " ~ #{n}"},
26
+ "Recognized names for project-wide results include:",
27
+ *MiGA::Project.RESULT_DIRS.keys.map{|n| " ~ #{n}"}){ |v| o[:name]=v }
28
+ opt.on("-v", "--verbose",
29
+ "Print additional information to STDERR."){ o[:q]=false }
30
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
31
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
32
+ end
33
+ opt.on("-h", "--help", "Display this screen.") do
34
+ puts opt
35
+ exit
36
+ end
37
+ opt.separator ""
38
+ end.parse!
39
+
40
+
41
+ ### MAIN
42
+ # Options were already parsed by the chained parse! call above.
43
+ raise "-P is mandatory." if o[:project].nil?
44
+ raise "-r is mandatory." if o[:name].nil?
45
+
46
+ $stderr.puts "Loading project." unless o[:q]
47
+ p = MiGA::Project.load(o[:project])
48
+ raise "Impossible to load project: #{o[:project]}" if p.nil?
49
+
50
+ $stderr.puts "Registering result." unless o[:q]
51
+ if o[:dataset].nil?
52
+ r = p.add_result o[:name].to_sym
53
+ else
54
+ d = p.dataset(o[:dataset])
55
+ r = d.add_result o[:name].to_sym
56
+ end
57
+
58
+ raise "Cannot add result, incomplete expected files." if r.nil?
59
+
60
+ $stderr.puts "Done." unless o[:q]
61
+
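A note on the option-parsing pattern used by these action scripts: `OptionParser#parse!` returns the leftover non-option arguments, not the parser, so the value of `OptionParser.new do ... end.parse!` is an array. The sketch below illustrates this with a reduced set of flags; `parse_cli` is a hypothetical wrapper, not something shipped with MiGA.

```ruby
require "optparse"

# Hypothetical wrapper around the pattern used in the action scripts.
# Returns the options hash and whatever arguments were not consumed.
def parse_cli(argv)
  o = { q: true }
  rest = OptionParser.new do |opt|
    opt.on("-P", "--project PATH") { |v| o[:project] = v }
    opt.on("-r", "--result STRING") { |v| o[:name] = v }
    opt.on("-v", "--verbose") { o[:q] = false }
  end.parse!(argv) # parse! strips recognized options and returns the rest
  raise ArgumentError, "-P is mandatory." if o[:project].nil?
  [o, rest]
end

o, rest = parse_cli(%w[-P /path/to/project1 -r assembly extra])
puts o.inspect
puts rest.inspect
```

Because the return value is a plain array, the parser itself has to be kept in its own variable if it is needed again after parsing.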
data/actions/add_taxonomy ADDED
@@ -0,0 +1,86 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Oct-01-2015
7
+ #
8
+
9
+ o = {q:true}
10
+ OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Registers taxonomic information for datasets.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
19
+ opt.on("-D", "--dataset PATH",
20
+ "(Mandatory unless -t is provided) ID of the dataset to use."
21
+ ){ |v| o[:dataset]=v }
22
+ opt.on("-s", "--tax-string STRING",
23
+ "(Mandatory unless -t is provided) String corresponding to the taxonomy",
24
+ "of the dataset. The MiGA format of string taxonomy is a space-delimited",
25
+ "set of 'rank:name' pairs."){ |v| o[:taxstring]=v }
26
+ opt.on("-t", "--tax-file PATH",
27
+ "(Mandatory unless -D and -s are provided) Tab-delimited file containing",
28
+ "dataset taxonomy. Each row corresponds to a dataset and each column",
29
+ "corresponds to a rank. The first row must be a header with the rank ",
30
+ "names, and the first column must contain dataset names."
31
+ ){ |v| o[:taxfile]=v }
32
+ opt.on("-v", "--verbose",
33
+ "Print additional information to STDERR."){ o[:q]=false }
34
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
35
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
36
+ end
37
+ opt.on("-h", "--help", "Display this screen.") do
38
+ puts opt
39
+ exit
40
+ end
41
+ opt.separator ""
42
+ end.parse!
43
+
44
+
45
+ ### MAIN
46
+ raise "-P is mandatory." if o[:project].nil?
47
+ raise "-D is mandatory unless -t is provided." if
48
+ o[:dataset].nil? and o[:taxfile].nil?
49
+ raise "-s is mandatory unless -t is provided." if
50
+ o[:taxstring].nil? and o[:taxfile].nil?
51
+
52
+ $stderr.puts "Loading project." unless o[:q]
53
+ p = MiGA::Project.load(o[:project])
54
+ raise "Impossible to load project: #{o[:project]}" if p.nil?
55
+
56
+ if not o[:taxfile].nil?
57
+ $stderr.puts "Reading tax-file and registering taxonomy." unless o[:q]
58
+ tfh = File.open(o[:taxfile], "r")
59
+ header = nil
60
+ while ln = tfh.gets
61
+ next if ln =~ /^\s*?$/
62
+ r = ln.chomp.split /\t/, -1
63
+ dn = r.shift
64
+ if header.nil?
65
+ header = r
66
+ next
67
+ end
68
+ d = p.dataset dn
69
+ if d.nil?
70
+ warn "Impossible to find dataset at line #{$.}: #{dn}. Ignoring..."
71
+ next
72
+ end
73
+ d.metadata[:tax] = MiGA::Taxonomy.new(r, header)
74
+ d.save
75
+ $stderr.puts " #{d.name} registered." unless o[:q]
76
+ end
77
+ tfh.close
78
+ else
79
+ $stderr.puts "Registering taxonomy." unless o[:q]
80
+ d = p.dataset o[:dataset]
81
+ d.metadata[:tax] = MiGA::Taxonomy.new(o[:taxstring])
82
+ d.save
83
+ end
84
+
85
+ $stderr.puts "Done." unless o[:q]
86
+
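The two input forms accepted by `add_taxonomy` (a space-delimited string of `rank:name` pairs, or a tab-delimited row matched against a header of rank names) can be illustrated with a minimal sketch. This is not the `MiGA::Taxonomy` implementation: `parse_tax_string` and `tax_from_row` are hypothetical helpers, and the rank labels in the usage lines are placeholders chosen for the example.

```ruby
# Hypothetical parser for a space-delimited set of 'rank:name' pairs.
def parse_tax_string(str)
  str.split(/\s+/).reject(&:empty?).map do |pair|
    rank, name = pair.split(":", 2)
    if rank.nil? || rank.empty? || name.nil? || name.empty?
      raise ArgumentError, "malformed pair: #{pair.inspect}"
    end
    [rank.to_sym, name]
  end.to_h
end

# Hypothetical equivalent for one tax-file row: +header+ holds the rank
# names and +row+ the values, both without the first (dataset) column.
def tax_from_row(header, row)
  header.zip(row)
        .reject { |_, v| v.nil? || v.empty? } # skip ranks left blank
        .map { |k, v| [k.to_sym, v] }
        .to_h
end

# Usage with placeholder rank labels:
puts parse_tax_string("d:Bacteria g:Escherichia").inspect
puts tax_from_row(%w[d p g], ["Bacteria", "", "Escherichia"]).inspect
```

Both helpers return the same kind of hash, which mirrors how the script builds a taxonomy either from `-s` or from each row of the `-t` file.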
data/actions/create_dataset ADDED
@@ -0,0 +1,62 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Nov-29-2015
7
+ #
8
+
9
+ o = {q:true, ref:true}
10
+ OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Creates an empty dataset in a pre-existing MiGA project.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
19
+ opt.on("-D", "--dataset STRING",
20
+ "(Mandatory) ID of the dataset to create."){ |v| o[:dataset]=v }
21
+ opt.on("-t", "--type STRING",
22
+ "Type of dataset. Recognized types include:",
23
+ *MiGA::Dataset.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
24
+ ){ |v| o[:type]=v.to_sym }
25
+ opt.on("-q", "--query",
26
+ "If set, the dataset is registered as a query, not a reference dataset."
27
+ ){ |v| o[:ref]=!v }
28
+ opt.on("-d", "--description STRING",
29
+ "Description of the dataset."){ |v| o[:description]=v }
30
+ opt.on("-u", "--user STRING",
31
+ "Owner of the dataset."){ |v| o[:user]=v }
32
+ opt.on("-c", "--comments STRING",
33
+ "Comments on the dataset."){ |v| o[:comments]=v }
34
+ opt.on("-v", "--verbose",
35
+ "Print additional information to STDERR."){ o[:q]=false }
36
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
37
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
38
+ end
39
+ opt.on("-h", "--help", "Display this screen.") do
40
+ puts opt
41
+ exit
42
+ end
43
+ opt.separator ""
44
+ end.parse!
45
+
46
+
47
+ ### MAIN
48
+ raise "-P is mandatory." if o[:project].nil?
49
+ raise "-D is mandatory." if o[:dataset].nil?
50
+
51
+ $stderr.puts "Loading project." unless o[:q]
52
+ p = MiGA::Project.load(o[:project])
53
+ raise "Impossible to load project: #{o[:project]}" if p.nil?
54
+
55
+ $stderr.puts "Creating dataset." unless o[:q]
56
+ md = {}
57
+ [:type, :description, :user, :comments].each{ |k| md[k]=o[k] unless o[k].nil? }
58
+ d = MiGA::Dataset.new(p, o[:dataset], o[:ref], md)
59
+ p.add_dataset(o[:dataset])
60
+
61
+ $stderr.puts "Done." unless o[:q]
62
+
data/actions/create_project ADDED
@@ -0,0 +1,70 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # @package MiGA
4
+ # @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
5
+ # @license artistic license 2.0
6
+ # @update Oct-01-2015
7
+ #
8
+
9
+ o = {q:true, update:false}
10
+ OptionParser.new do |opt|
11
+ opt.banner = <<BAN
12
+ Creates an empty MiGA project.
13
+
14
+ Usage: #{$0} #{File.basename(__FILE__)} [options]
15
+ BAN
16
+ opt.separator ""
17
+ opt.on("-P", "--project PATH",
18
+ "(Mandatory) Path to the project to create."){ |v| o[:project]=v }
19
+ opt.on("-t", "--type STRING",
20
+ "Type of project. Recognized types include:",
21
+ *MiGA::Project.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
22
+ ){ |v| o[:type]=v.to_sym }
23
+ opt.on("-n", "--name STRING",
24
+ "Name of the project."){ |v| o[:name]=v }
25
+ opt.on("-d", "--description STRING",
26
+ "Description of the project."){ |v| o[:description]=v }
27
+ opt.on("-u", "--user STRING", "Owner of the project."){ |v| o[:user]=v }
28
+ opt.on("-c", "--comments STRING",
29
+ "Comments on the project."){ |v| o[:comments]=v }
30
+ opt.on("--update",
31
+ "Updates the project if it already exists."){ o[:update]=true }
32
+ opt.on("-v", "--verbose",
33
+ "Print additional information to STDERR."){ o[:q]=false }
34
+ opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
35
+ v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
36
+ end
37
+ opt.on("-h", "--help", "Display this screen.") do
38
+ puts opt
39
+ exit
40
+ end
41
+ opt.separator ""
42
+ end.parse!
43
+
44
+
45
+ ### MAIN
46
+ raise "-P is mandatory." if o[:project].nil?
47
+
48
+ unless File.exist? "#{ENV["HOME"]}/.miga_rc" and
49
+ File.exist? "#{ENV["HOME"]}/.miga_daemon.json"
50
+ puts "You must initialize MiGA before creating the first project.\n" +
51
+ "Do you want to initialize MiGA now? (yes / no)"
52
+ `'#{File.dirname(__FILE__)}/../scripts/init.bash'` if
53
+ $stdin.gets.chomp == 'yes'
54
+ end
55
+
56
+ $stderr.puts "Creating project." unless o[:q]
57
+ raise "Project already exists, aborting." unless
58
+ o[:update] or not MiGA::Project.exist? o[:project]
59
+ p = MiGA::Project.new(o[:project], o[:update])
60
+ # The following check is redundant with MiGA::Project#create,
61
+ # but allows upgrading projects from (very) early code versions
62
+ o[:name] = File.basename(p.path) if
63
+ o[:update] and o[:name].nil?
64
+ [:name, :description, :user, :comments, :type].each do |k|
65
+ p.metadata[k] = o[k] unless o[k].nil?
66
+ end
67
+ p.save
68
+
69
+ $stderr.puts "Done." unless o[:q]
70
+