miga-base 0.2.0.6
- checksums.yaml +7 -0
- data/README.md +351 -0
- data/actions/add_result +61 -0
- data/actions/add_taxonomy +86 -0
- data/actions/create_dataset +62 -0
- data/actions/create_project +70 -0
- data/actions/daemon +69 -0
- data/actions/download_dataset +77 -0
- data/actions/find_datasets +63 -0
- data/actions/import_datasets +86 -0
- data/actions/index_taxonomy +71 -0
- data/actions/list_datasets +83 -0
- data/actions/list_files +67 -0
- data/actions/unlink_dataset +52 -0
- data/bin/miga +48 -0
- data/lib/miga/daemon.rb +178 -0
- data/lib/miga/dataset.rb +286 -0
- data/lib/miga/gui.rb +289 -0
- data/lib/miga/metadata.rb +74 -0
- data/lib/miga/project.rb +268 -0
- data/lib/miga/remote_dataset.rb +154 -0
- data/lib/miga/result.rb +102 -0
- data/lib/miga/tax_index.rb +70 -0
- data/lib/miga/taxonomy.rb +107 -0
- data/lib/miga.rb +83 -0
- data/scripts/_distances_noref_nomulti.bash +86 -0
- data/scripts/_distances_ref_nomulti.bash +105 -0
- data/scripts/aai_distances.bash +40 -0
- data/scripts/ani_distances.bash +39 -0
- data/scripts/assembly.bash +38 -0
- data/scripts/cds.bash +45 -0
- data/scripts/clade_finding.bash +27 -0
- data/scripts/distances.bash +30 -0
- data/scripts/essential_genes.bash +29 -0
- data/scripts/haai_distances.bash +39 -0
- data/scripts/init.bash +211 -0
- data/scripts/miga.bash +12 -0
- data/scripts/mytaxa.bash +93 -0
- data/scripts/mytaxa_scan.bash +85 -0
- data/scripts/ogs.bash +36 -0
- data/scripts/read_quality.bash +37 -0
- data/scripts/ssu.bash +35 -0
- data/scripts/subclades.bash +26 -0
- data/scripts/trimmed_fasta.bash +47 -0
- data/scripts/trimmed_reads.bash +57 -0
- data/utils/adapters.fa +302 -0
- data/utils/mytaxa_scan.R +89 -0
- data/utils/mytaxa_scan.rb +58 -0
- data/utils/requirements.txt +19 -0
- data/utils/subclades-compile.rb +48 -0
- data/utils/subclades.R +171 -0
- metadata +185 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: f80152072105bd365145133c00ddfcd432a008c0
|
4
|
+
data.tar.gz: 7444a990c359e6c9f2a6a595e688e6319df50ebb
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: e4bb05e73def629ea39d72fac9d6e702b247051fb3b21b8db84195127e6d135d9b94bbf5037cde9c8f21e611cf580d53078c215d9c9187bdf861908ad42efe0c
|
7
|
+
data.tar.gz: ee27ea7cf9b98a3de760e18249e89f6410181d963017e86f5878710bf80a6d3ed5c1715d42ff7b394371add8709816d7c54e9fda5556ae6e4e96a3c4b384ca82
|
data/README.md
ADDED
@@ -0,0 +1,351 @@
|
|
1
|
+
[![Code Climate](https://codeclimate.com/github/bio-miga/miga/badges/gpa.svg)](https://codeclimate.com/github/bio-miga/miga)
|
2
|
+
[![Test Coverage](https://codeclimate.com/github/bio-miga/miga/badges/coverage.svg)](https://codeclimate.com/github/bio-miga/miga/coverage)
|
3
|
+
[![Build Status](https://travis-ci.org/lmrodriguezr/gfa.svg?branch=master)](https://travis-ci.org/lmrodriguezr/gfa)
|
4
|
+
|
5
|
+
MiGA: Microbial Genomes Atlas
|
6
|
+
=============================
|
7
|
+
|
8
|
+
|
9
|
+
Installation
|
10
|
+
------------
|
11
|
+
|
12
|
+
Please see [INSTALLATION.md](./INSTALLATION.md) for instructions.
|
13
|
+
|
14
|
+
|
15
|
+
Getting started with MiGA
|
16
|
+
-------------------------
|
17
|
+
|
18
|
+
### MiGA Interfaces
|
19
|
+
|
20
|
+
You can interact with MiGA through different interfaces. These interfaces have
|
21
|
+
different purposes, but they also have some degree of overlap, because different
|
22
|
+
users with different aims sometimes want to do the same thing. Throughout this
|
23
|
+
manual I'll be telling you how to do things using mostly the CLI, but I'll also
|
24
|
+
try to mention the GUI and the Web Interface. The CLI is the most comprehensive
|
25
|
+
and flexible interface, but the other two are friendlier to humans. There is a
|
26
|
+
fourth interface that I won't be mentioning at all, but I'll try to document:
|
27
|
+
the Ruby API. MiGA is mostly written in Ruby, with an object-oriented approach,
|
28
|
+
and all the interfaces are just thin layers atop the Ruby core. That means that
|
29
|
+
you can write your own interfaces (or pieces) if you know how to talk to these
|
30
|
+
Ruby objects. Sometimes I even use `irb`, which is an interactive shell for
|
31
|
+
Ruby, but that's mostly for debugging.
|
32
|
+
|
33
|
+
#### MiGA CLI
|
34
|
+
|
35
|
+
CLI stands for Command Line Interface. This is a set of little scripts that let
|
36
|
+
you talk with MiGA through the terminal shell. If MiGA is in your PATH (see
|
37
|
+
[installation details](./INSTALLATION.md#miga-in-your-path)), you can simply run
|
38
|
+
`miga` in your terminal, and the help messages will take it from there. All the
|
39
|
+
MiGA CLI calls look like:
|
40
|
+
|
41
|
+
```bash
|
42
|
+
miga task [options]
|
43
|
+
```
|
44
|
+
|
45
|
+
Where `task` is one of the supported tasks and `[options]` is a set of dash-flag
|
46
|
+
options supported by each task. `-h` is always there to provide help. If you're
|
47
|
+
a MiGA administrator, this is probably the most convenient option for you (but
|
48
|
+
hey, give the GUI a chance).
|
49
|
+
|
50
|
+
#### MiGA GUI
|
51
|
+
|
52
|
+
The Graphical User Interface is the friendlier option for setting up a MiGA
|
53
|
+
project. It doesn't have as many options as the CLI, but it's pretty easy to
|
54
|
+
use, so it's a good option if you have a typical project in your hands.
|
55
|
+
|
56
|
+
#### MiGA Web
|
57
|
+
|
58
|
+
The Web interface for MiGA is the way MiGA reports results from a project. It's
|
59
|
+
not designed to set up new projects, but to explore existing ones, and to submit
|
60
|
+
non-reference datasets for analyses.
|
61
|
+
|
62
|
+
### Creating your first project
|
63
|
+
|
64
|
+
You can do this in the GUI, but I like the CLI better, so I'll be telling you
|
65
|
+
how to tell MiGA what to do from the CLI. First, think where you'll place your
|
66
|
+
project. Normally this means a location...
|
67
|
+
|
68
|
+
1. ... with enough space. That is, plan for at least 4 or 5 times the size of
|
69
|
+
the input files.
|
70
|
+
|
71
|
+
2. ... accessible by worker nodes. If you're using a single server, this is not
|
72
|
+
really an issue. However, if you plan on deploying MiGA in a cluster
|
73
|
+
infrastructure, make sure your project is reachable by worker nodes.
|
74
|
+
|
75
|
+
3. ... with fast access. It's not a great idea to set up projects in remote
|
76
|
+
drives with large latency. In some cases there is no way around this, for example
|
77
|
+
when that's the only available option in your cluster infrastructure, but try
|
78
|
+
to avoid this as much as possible.
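
As a rough illustration of the first point, the space estimate can be scripted.
This is a minimal sketch in Ruby and is not part of MiGA: the helper name
`estimated_project_bytes` and its default factor of 5 are assumptions taken
from the rule of thumb above.

```ruby
# Hypothetical helper (not a MiGA API): estimate the disk space a project may
# need as `factor` times the total size of the input files.
def estimated_project_bytes(input_files, factor = 5)
  total = input_files.map { |f| File.size(f) }.inject(0, :+)
  total * factor
end
```

For example, `estimated_project_bytes(Dir["/path/to/reads/*.fastq"])` returns
the number of bytes to plan for, where `/path/to/reads` is a placeholder.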
|
79
|
+
|
80
|
+
Now that you know where to create your project, go ahead and run:
|
81
|
+
|
82
|
+
```bash
|
83
|
+
miga create_project -P /path/to/project1 -t type-of-project
|
84
|
+
```
|
85
|
+
|
86
|
+
Where `/path/to/project1` is the path to where the project should be created.
|
87
|
+
You don't need to create the folder in advance; MiGA will take care of it. See the
|
88
|
+
next section to help you decide what `type-of-project` to use. There are some
|
89
|
+
other options that are not mandatory, but will make your project richer. Take a
|
90
|
+
look at `miga create_project -h`.
|
91
|
+
|
92
|
+
#### Project types
|
93
|
+
|
94
|
+
Projects can be set for different purposes, so we've divided them into "types".
|
95
|
+
There are four of them, depending on the types of datasets to be processed (see
|
96
|
+
[Dataset types](#dataset-types)):
|
97
|
+
|
98
|
+
1. **mixed**: A generic project with any supported type of datasets.
|
99
|
+
|
100
|
+
2. **metagenomes**: A project containing only metagenomic datasets. This
|
101
|
+
includes either (or both) metagenomes and viromes.
|
102
|
+
|
103
|
+
3. **genomes**: A project containing only single-organism datasets. This
|
104
|
+
includes any of the single-organism types: genome, scgenome, and/or popgenome.
|
105
|
+
|
106
|
+
4. **clade**: Same as "genomes", but all the datasets are expected to be from
|
107
|
+
the same species. This type of project performs additional analyses that expect
|
108
|
+
a very dense ANI matrix, so all genomes in it are expected to have AAI > 90%.
|
109
|
+
|
110
|
+
### Creating datasets
|
111
|
+
|
112
|
+
Once your project is ready, you can start populating it with datasets and data.
|
113
|
+
While it's possible to create empty datasets using `miga create_dataset`, the
|
114
|
+
preferred method is to first add data and then use the data to create the
|
115
|
+
datasets in batch. For example, let's assume you have a collection of paired-end
|
116
|
+
raw reads from several datasets. The first step is to format the filenames
|
117
|
+
properly. For each one of your datasets, pick a name that conforms to the
|
118
|
+
[MiGA names](#miga-names) restrictions (we'll call it "ds1") and rename your
|
119
|
+
reads to `/path/to/project1/data/01.raw_reads/ds1.1.fastq` for the first
|
120
|
+
sister and `/path/to/project1/data/01.raw_reads/ds1.2.fastq` for the second
|
121
|
+
sister. Also, add the date into `/path/to/project1/data/01.raw_reads/ds1.done`.
|
122
|
+
Check the [expected result files](#expected-result-files) below if you
|
123
|
+
want to start at any other point in the pipeline. Once you have renamed (or
|
124
|
+
copied) the files inside the project folder, run:
|
125
|
+
|
126
|
+
```bash
|
127
|
+
miga find_datasets -P /path/to/project1 -a -r -t type-of-dataset
|
128
|
+
```
|
129
|
+
|
130
|
+
The `-a` flag tells MiGA that you want to add the datasets (not just find them);
|
131
|
+
the `-r` flag tells MiGA that your datasets are to be treated as "reference"
|
132
|
+
datasets (see [Non-reference datasets](#non-reference-datasets) below); and the
|
133
|
+
`-t` option tells MiGA what type of datasets you're adding (see
|
134
|
+
[Dataset types](#dataset-types) below). If you have a mixture of dataset types,
|
135
|
+
process one at a time. That is, perform this step for each dataset type. Don't
|
136
|
+
worry about the datasets that are already registered; those will be ignored by
|
137
|
+
the `find_datasets` task and will remain unchanged.
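
The renaming (or copying) step described above can also be scripted. The
following is a minimal sketch in Ruby, assuming paired-end raw reads; the helper
`stage_paired_reads` is hypothetical and not part of MiGA. It only follows the
file layout described in this section, and it creates the `01.raw_reads` folder
if it is missing.

```ruby
require "fileutils"

# Hypothetical helper (not a MiGA API): copy a pair of FastQ files into the
# raw-reads folder of a project using the names MiGA expects, then write the
# `.done` file containing the current date. Returns the path to that file.
def stage_paired_reads(project_path, name, reads1, reads2)
  dir = File.join(project_path, "data", "01.raw_reads")
  FileUtils.mkdir_p(dir)
  FileUtils.cp(reads1, File.join(dir, "#{name}.1.fastq"))
  FileUtils.cp(reads2, File.join(dir, "#{name}.2.fastq"))
  done = File.join(dir, "#{name}.done")
  File.write(done, Time.now.strftime("%Y-%m-%d %H:%M:%S %z") + "\n")
  done
end
```

After staging all your datasets this way, you would still run
`miga find_datasets` as shown above to register them.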
|
138
|
+
|
139
|
+
#### Expected result files
|
140
|
+
|
141
|
+
For brevity, we'll assume that you're inside `/path/to/project1/data`; *i.e.*,
|
142
|
+
in the `data` directory of your project. We'll also assume that you're naming
|
143
|
+
your dataset **ds1**, but you can replace this with any name following the
|
144
|
+
[MiGA names](#miga-names) restrictions. Now, these are the "input" points that
|
145
|
+
you can use in MiGA:
|
146
|
+
|
147
|
+
1. **Paired-end raw reads**: The expected files are `01.raw_reads/ds1.1.fastq`
|
148
|
+
and `01.raw_reads/ds1.2.fastq`, each including a sister end. The reads must be
|
149
|
+
in the same order in both files (MiGA won't check). You can also use gzipped
|
150
|
+
files instead.
|
151
|
+
|
152
|
+
2. **Single-end raw reads**: The expected file is `01.raw_reads/ds1.1.fastq`.
|
153
|
+
You can also use a gzipped file instead.
|
154
|
+
|
155
|
+
3. **Paired-end trimmed reads**: These are assumed to be quality-controlled
|
156
|
+
reads in FastA format, with both ends passing the quality filters. The minimum
|
157
|
+
expected file is `04.trimmed_fasta/ds1.CoupledReads.fa`, which contains the
|
158
|
+
reads interleaved. You can also pass (in addition) the reads that passed the
|
159
|
+
quality check without the sister as a gzipped FastA at
|
160
|
+
`04.trimmed_fasta/ds1.SingleReads.fa.gz`.
|
161
|
+
|
162
|
+
4. **Single-end trimmed reads**: Similar to the option above, only
|
163
|
+
quality-checked reads are expected here. The expected file is
|
164
|
+
`04.trimmed_fasta/ds1.SingleReads.fa`.
|
165
|
+
|
166
|
+
5. **Assembled fragments**: This can be any assembly result, including complete
|
167
|
+
genomes. The expected file is `05.assembly/ds1.LargeContigs.fna`, containing
|
168
|
+
only contigs longer than 500bp. You can also provide the complete assembly
|
169
|
+
(without length-filtering) at `05.assembly/ds1.AllContigs.fna`.
|
170
|
+
|
171
|
+
6. **Predicted genes/proteins**: This is the total collection of predicted genes
|
172
|
+
and proteins. The expected files are `06.cds/ds1.fna`, containing genes, and
|
173
|
+
`06.cds/ds1.faa`, containing proteins. You can also provide the locations of
|
174
|
+
said genes in the genome in gzipped GFF v2 (`06.cds/ds1.gff2.gz`), gzipped
|
175
|
+
GFF v3 (`06.cds/ds1.gff3.gz`), or gzipped tabular (`06.cds/ds1.tab.gz`).
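
To make the list above concrete, here is a small Ruby sketch that reports which
input points have their minimum expected files in place for a dataset. It is
not part of MiGA: the constant `INPUT_POINTS`, its symbol keys, and the helper
`available_input_points` are names made up for this example, and the gzipped
alternatives and optional files are deliberately ignored.

```ruby
# Minimum expected files per input point, relative to the `data` folder of the
# project, as listed above. The keys are labels invented for this sketch.
INPUT_POINTS = {
  paired_raw_reads:     ["01.raw_reads/%s.1.fastq", "01.raw_reads/%s.2.fastq"],
  single_raw_reads:     ["01.raw_reads/%s.1.fastq"],
  paired_trimmed_reads: ["04.trimmed_fasta/%s.CoupledReads.fa"],
  single_trimmed_reads: ["04.trimmed_fasta/%s.SingleReads.fa"],
  assembly:             ["05.assembly/%s.LargeContigs.fna"],
  cds:                  ["06.cds/%s.fna", "06.cds/%s.faa"]
}

# Hypothetical helper: list the input points whose minimum files all exist
# for the dataset `name` inside `data_dir`.
def available_input_points(data_dir, name)
  INPUT_POINTS.select do |_point, files|
    files.all? { |f| File.exist?(File.join(data_dir, format(f, name))) }
  end.keys
end
```

Note that a dataset with both raw-read files matches the paired-end and the
single-end entries, since the first sister file is shared by both.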
|
176
|
+
|
177
|
+
**IMPORTANT**: In all cases, an additional `ds1.done` file MUST be created in
|
178
|
+
the same folder. This is meant to prevent MiGA from mistakenly adding files as
|
179
|
+
results before they're done being processed or transferred. This file must
|
180
|
+
contain the current [date in MiGA format](#date-in-miga-format). Here's a quick
|
181
|
+
code snippet to add the `.done` file for all the input files in `01.raw_reads`
|
182
|
+
(you can adapt this to any of the other options):
|
183
|
+
|
184
|
+
```bash
|
185
|
+
cd /path/to/project1/data/01.raw_reads
|
186
|
+
for i in *.1.fastq ; do
|
187
|
+
  date "+%Y-%m-%d %H:%M:%S %z" > "$(basename "$i" .1.fastq).done"
|
188
|
+
done
|
189
|
+
```
|
190
|
+
|
191
|
+
#### Dataset types
|
192
|
+
|
193
|
+
This is how you tell MiGA what kind of data you have in your datasets. Let's see
|
194
|
+
the definitions:
|
195
|
+
|
196
|
+
1. **genome**: The genome from an isolate.
|
197
|
+
2. **metagenome**: A metagenome (excluding viromes).
|
198
|
+
3. **virome**: A viral metagenome.
|
199
|
+
4. **scgenome**: A genome from a single cell.
|
200
|
+
5. **popgenome**: The genome of a population (including microdiversity).
|
201
|
+
|
202
|
+
#### Non-reference datasets
|
203
|
+
|
204
|
+
|
205
|
+
#### Creating a RefSeq project
|
206
|
+
|
207
|
+
If you've reached this point, you are now ready to create a large functional
|
208
|
+
project. If you want to continue using this documentation on real data but
|
209
|
+
don't have any of your own handy (or if you want to use RefSeq data), this
|
210
|
+
is a quick tutorial on how to create a functional MiGA project using ALL of
|
211
|
+
NCBI's Prokaryotic RefSeq data.
|
212
|
+
|
213
|
+
**Step 1: Create the project**. That's simple, just `cd` to the directory you
|
214
|
+
want to use, and execute `miga create_project -P MiGA_RefSeq -t genomes`.
|
215
|
+
|
216
|
+
**Step 2: Download the data**. Just `cd MiGA_RefSeq`, and execute this code:
|
217
|
+
|
218
|
+
```bash
|
219
|
+
wget -O reference_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on&type=reference'
|
220
|
+
grep -v '^#' reference_genomes.txt \
|
221
|
+
| awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} {print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
|
222
|
+
| while read ln ; do
|
223
|
+
sp=$(echo $ln | perl -pe 's/.*# //')
|
224
|
+
if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
|
225
|
+
echo $ln
|
226
|
+
$ln
|
227
|
+
fi
|
228
|
+
done
|
229
|
+
```
|
230
|
+
|
231
|
+
And that's it. The first line will download the most current list of genomes
|
232
|
+
included in NCBI's Prokaryotic RefSeq, and the rest will repeatedly execute the
|
233
|
+
`download_dataset` task, which automatically fetches the data (even the genome's
|
234
|
+
taxonomy!). Note that the code above checks first if a dataset already exists,
|
235
|
+
so if you want to update an existing MiGA_RefSeq project, simply repeat step 2
|
236
|
+
and only missing genomes will be fetched.
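
For readers more comfortable with Ruby than with `awk`, the command-building
step of the pipeline above can be sketched as follows. This is an illustration
only: the helper `download_command` is hypothetical, and it assumes (as the
`awk` code does) that the third tab-delimited column holds the name and the
fourth holds the IDs to fetch. Unlike the first `awk` version, it skips rows
with an empty fourth column.

```ruby
# Hypothetical helper (not a MiGA API): turn one line of the downloaded
# tab-delimited list into the arguments of a `miga download_dataset` call.
# Returns nil for comment lines and for rows without a name or IDs.
def download_command(tsv_line)
  return nil if tsv_line.start_with?("#")
  fields = tsv_line.chomp.split("\t", -1)
  # Same substitution as the awk gsub: anything non-alphanumeric becomes "_"
  name = fields[2].to_s.gsub(/[^A-Za-z0-9]/, "_")
  ids  = fields[3].to_s
  return nil if name.empty? || ids.empty?
  ["miga", "download_dataset", "-P", ".", "-D", name, "-I", ids,
   "-U", "ncbi", "--db", "nuccore", "-t", "genome", "-v"]
end
```

The returned array can be passed to `system(*cmd)`, which avoids re-splitting
the command line in the shell.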
|
237
|
+
|
238
|
+
Note that running time for the above code may vary depending on the network and
|
239
|
+
the size of RefSeq, but I was able to create a complete project with 122 genomes
|
240
|
+
in under 10 minutes.
|
241
|
+
|
242
|
+
**Alternative step 2: downloading all representatives**. If you want a larger
|
243
|
+
and more comprehensive collection, and not just the reference genomes, you can
|
244
|
+
download all of the representative genomes in the prokaryotic RefSeq with this
|
245
|
+
alternative code:
|
246
|
+
|
247
|
+
```bash
|
248
|
+
wget -O representative_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on'
|
249
|
+
grep -v '^#' representative_genomes.txt \
|
250
|
+
| awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} $4{print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
|
251
|
+
| while read ln ; do
|
252
|
+
sp=$(echo $ln | perl -pe 's/.*# //')
|
253
|
+
if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
|
254
|
+
echo $ln
|
255
|
+
$ln
|
256
|
+
fi
|
257
|
+
done
|
258
|
+
```
|
259
|
+
|
260
|
+
This is a much larger set (1,246), hence it'll take much more time. I finished
|
261
|
+
downloading the whole thing in about one and a half hours.
|
262
|
+
|
263
|
+
|
264
|
+
Launching daemons
|
265
|
+
-----------------
|
266
|
+
|
267
|
+
### Configuring daemons
|
268
|
+
|
269
|
+
|
270
|
+
### Understanding the MiGA configuration file
|
271
|
+
|
272
|
+
|
273
|
+
### Arbitrary configuration scripts
|
274
|
+
|
275
|
+
|
276
|
+
### Fixing system calls with aliases
|
277
|
+
|
278
|
+
In some cases, we might not have the same executable names as MiGA expects, or
|
279
|
+
we might have broken modules in our cluster that can be easily fixed with an
|
280
|
+
`alias`. In these cases, you can use
|
281
|
+
[arbitrary configuration scripts](#arbitrary-configuration-scripts) to generate
|
282
|
+
one or more `alias`. Importantly, MiGA daemons work with non-interactive shells,
|
283
|
+
which means you likely need to explicitly allow for alias expansions, for
|
284
|
+
example:
|
285
|
+
|
286
|
+
```bash
|
287
|
+
# Allow alias expansions in non-interactive shells
|
288
|
+
shopt -s expand_aliases
|
289
|
+
|
290
|
+
# Call FastQC with the environmental Perl,
|
291
|
+
# not the built-in /usr/bin/perl:
|
292
|
+
alias fastqc="perl $(which fastqc)"
|
293
|
+
|
294
|
+
# Use the standard name for RAxML (pthreads)
|
295
|
+
# instead of the one my sys-admin decided to use:
|
296
|
+
alias raxmlHPC-PTHREADS=RAxML_pthreads
|
297
|
+
```
|
298
|
+
|
299
|
+
The examples above illustrate how to use `alias` to fix broken packages or to
|
300
|
+
make software with non-standard names reachable.
|
301
|
+
|
302
|
+
**Known caveats to this solution:** This solution CANNOT BE USED in the few
|
303
|
+
cases in which a whole package is expected based on a single executable. For
|
304
|
+
example, adding the enveomics scripts to your `PATH` is far easier than creating
|
305
|
+
an `alias` for each script. Also, MiGA expects to find the model, the activation
|
306
|
+
key, and the scripts of MetaGeneMark in the same folder as the `gmhmmp` binary,
|
307
|
+
so setting an `alias` may prevent MiGA from finding these ancillary files.
|
308
|
+
|
309
|
+
|
310
|
+
Cluster infrastructure
|
311
|
+
----------------------
|
312
|
+
|
313
|
+
|
314
|
+
### Loading optional modules
|
315
|
+
|
316
|
+
|
317
|
+
See also [Fixing system calls with aliases](#fixing-system-calls-with-aliases).
|
318
|
+
|
319
|
+
|
320
|
+
Miscellaneous
|
321
|
+
-------------
|
322
|
+
|
323
|
+
Below are reference snippets for which I couldn't find a more
|
324
|
+
suitable home, but are important documentation.
|
325
|
+
|
326
|
+
### MiGA Names
|
327
|
+
|
328
|
+
MiGA names are non-empty strings composed exclusively of alphanumerics and
|
329
|
+
underscores. All the dataset names in MiGA must conform to this restriction, but
|
330
|
+
not all the projects do. Other objects must conform to the MiGA name restrictions,
|
331
|
+
such as taxonomic entries.
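
The rule above translates directly into a regular expression. The following
Ruby sketch is not taken from the MiGA code base; `miga_name?` and
`to_miga_name` are hypothetical helpers written for this example.

```ruby
# Hypothetical check: true only for non-empty strings made exclusively of
# alphanumerics and underscores.
def miga_name?(str)
  !!(str =~ /\A[A-Za-z0-9_]+\z/)
end

# Hypothetical clean-up: replace every disallowed character with "_", in the
# same spirit as the awk gsub used in the RefSeq tutorial above.
def to_miga_name(str)
  str.gsub(/[^A-Za-z0-9_]/, "_")
end
```

Note that `to_miga_name` cannot fix an empty string, so it is worth checking
the result with `miga_name?` before using it.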
|
332
|
+
|
333
|
+
### Date in MiGA format
|
334
|
+
|
335
|
+
The official format in which MiGA represents date/times is the default of Ruby's
|
336
|
+
`Time.now.to_s`. In the *nix `date` utility this corresponds to the format:
|
337
|
+
`+%Y-%m-%d %H:%M:%S %z`.
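
In Ruby, the same string can be produced explicitly with `strftime`. This is a
minimal sketch; the constant `MIGA_DATE_FORMAT` and the helper `miga_date` are
names chosen for this example and are not part of MiGA.

```ruby
# The format string given above, without the leading "+" used by `date`.
MIGA_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Hypothetical helper: format a Time object as a date in MiGA format.
def miga_date(time = Time.now)
  time.strftime(MIGA_DATE_FORMAT)
end
```

For a local (non-UTC) time this should match `Time#to_s`. For a time object in
UTC, `to_s` prints `UTC` instead of a numeric offset, so the explicit format is
the safer choice if you generate these dates yourself.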
|
338
|
+
|
339
|
+
|
340
|
+
Authors
|
341
|
+
-------
|
342
|
+
|
343
|
+
Developed and maintained by [Luis M. Rodriguez-R][lrr].
|
344
|
+
|
345
|
+
|
346
|
+
License
|
347
|
+
-------
|
348
|
+
|
349
|
+
See [LICENSE](LICENSE).
|
350
|
+
|
351
|
+
[lrr]: http://lmrodriguezr.github.io/
|
data/actions/add_result
ADDED
@@ -0,0 +1,61 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
#
|
3
|
+
# @package MiGA
|
4
|
+
# @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
|
5
|
+
# @license artistic license 2.0
|
6
|
+
# @update Oct-01-2015
|
7
|
+
#
|
8
|
+
|
9
|
+
o = {q:true}
|
10
|
+
opts = OptionParser.new do |opt|
|
11
|
+
opt.banner = <<BAN
|
12
|
+
Registers a result.
|
13
|
+
|
14
|
+
Usage: #{$0} #{File.basename(__FILE__)} [options]
|
15
|
+
BAN
|
16
|
+
opt.separator ""
|
17
|
+
opt.on("-P", "--project PATH",
|
18
|
+
"(Mandatory) Path to the project to use."){ |v| o[:project]=v }
|
19
|
+
opt.on("-D", "--dataset PATH",
|
20
|
+
"(Mandatory if the result is dataset-specific) ID of the dataset to use."
|
21
|
+
){ |v| o[:dataset]=v }
|
22
|
+
opt.on("-r", "--result STRING",
|
23
|
+
"(Mandatory) Name of the result to add.",
|
24
|
+
"Recognized names for dataset-specific results include:",
|
25
|
+
*MiGA::Dataset.RESULT_DIRS.keys.map{|n| " ~ #{n}"},
|
26
|
+
"Recognized names for project-wide results include:",
|
27
|
+
*MiGA::Project.RESULT_DIRS.keys.map{|n| " ~ #{n}"}){ |v| o[:name]=v }
|
28
|
+
opt.on("-v", "--verbose",
|
29
|
+
"Print additional information to STDERR."){ o[:q]=false }
|
30
|
+
opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
|
31
|
+
v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
|
32
|
+
end
|
33
|
+
opt.on("-h", "--help", "Display this screen.") do
|
34
|
+
puts opt
|
35
|
+
exit
|
36
|
+
end
|
37
|
+
opt.separator ""
|
38
|
+
end.parse!
|
39
|
+
|
40
|
+
|
41
|
+
### MAIN
|
42
|
+
opts.parse!
|
43
|
+
raise "-P is mandatory." if o[:project].nil?
|
44
|
+
raise "-r is mandatory." if o[:name].nil?
|
45
|
+
|
46
|
+
$stderr.puts "Loading project." unless o[:q]
|
47
|
+
p = MiGA::Project.load(o[:project])
|
48
|
+
raise "Impossible to load project: #{o[:project]}" if p.nil?
|
49
|
+
|
50
|
+
$stderr.puts "Registering result." unless o[:q]
|
51
|
+
if o[:dataset].nil?
|
52
|
+
r = p.add_result o[:name].to_sym
|
53
|
+
else
|
54
|
+
d = p.dataset(o[:dataset])
|
55
|
+
r = d.add_result o[:name].to_sym
|
56
|
+
end
|
57
|
+
|
58
|
+
raise "Cannot add result, incomplete expected files." if r.nil?
|
59
|
+
|
60
|
+
$stderr.puts "Done." unless o[:q]
|
61
|
+
|
data/actions/add_taxonomy
ADDED
@@ -0,0 +1,86 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
#
|
3
|
+
# @package MiGA
|
4
|
+
# @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
|
5
|
+
# @license artistic license 2.0
|
6
|
+
# @update Oct-01-2015
|
7
|
+
#
|
8
|
+
|
9
|
+
o = {q:true}
|
10
|
+
OptionParser.new do |opt|
|
11
|
+
opt.banner = <<BAN
|
12
|
+
Registers taxonomic information for datasets.
|
13
|
+
|
14
|
+
Usage: #{$0} #{File.basename(__FILE__)} [options]
|
15
|
+
BAN
|
16
|
+
opt.separator ""
|
17
|
+
opt.on("-P", "--project PATH",
|
18
|
+
"(Mandatory) Path to the project to use."){ |v| o[:project]=v }
|
19
|
+
opt.on("-D", "--dataset PATH",
|
20
|
+
"(Mandatory unless -t is provided) ID of the dataset to use."
|
21
|
+
){ |v| o[:dataset]=v }
|
22
|
+
opt.on("-s", "--tax-string STRING",
|
23
|
+
"(Mandatory unless -t is provided) String corresponding to the taxonomy",
|
24
|
+
"of the dataset. The MiGA format of string taxonomy is a space-delimited",
|
25
|
+
"set of 'rank:name' pairs."){ |v| o[:taxstring]=v }
|
26
|
+
opt.on("-t", "--tax-file PATH",
|
27
|
+
"(Mandatory unless -D and -s are provided) Tab-delimited file containing",
|
28
|
+
"dataset taxonomy. Each row corresponds to a dataset and each column",
|
29
|
+
"corresponds to a rank. The first row must be a header with the rank ",
|
30
|
+
"names, and the first column must contain dataset names."
|
31
|
+
){ |v| o[:taxfile]=v }
|
32
|
+
opt.on("-v", "--verbose",
|
33
|
+
"Print additional information to STDERR."){ o[:q]=false }
|
34
|
+
opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
|
35
|
+
v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
|
36
|
+
end
|
37
|
+
opt.on("-h", "--help", "Display this screen.") do
|
38
|
+
puts opt
|
39
|
+
exit
|
40
|
+
end
|
41
|
+
opt.separator ""
|
42
|
+
end.parse!
|
43
|
+
|
44
|
+
|
45
|
+
### MAIN
|
46
|
+
raise "-P is mandatory." if o[:project].nil?
|
47
|
+
raise "-D is mandatory unless -t is provided." if
|
48
|
+
o[:dataset].nil? and o[:taxfile].nil?
|
49
|
+
raise "-s is mandatory unless -t is provided." if
|
50
|
+
o[:taxstring].nil? and o[:taxfile].nil?
|
51
|
+
|
52
|
+
$stderr.puts "Loading project." unless o[:q]
|
53
|
+
p = MiGA::Project.load(o[:project])
|
54
|
+
raise "Impossible to load project: #{o[:project]}" if p.nil?
|
55
|
+
|
56
|
+
if not o[:taxfile].nil?
|
57
|
+
$stderr.puts "Reading tax-file and registering taxonomy." unless o[:q]
|
58
|
+
tfh = File.open(o[:taxfile], "r")
|
59
|
+
header = nil
|
60
|
+
while ln = tfh.gets
|
61
|
+
next if ln =~ /^\s*?$/
|
62
|
+
r = ln.chomp.split /\t/, -1
|
63
|
+
dn = r.shift
|
64
|
+
if header.nil?
|
65
|
+
header = r
|
66
|
+
next
|
67
|
+
end
|
68
|
+
d = p.dataset dn
|
69
|
+
if d.nil?
|
70
|
+
warn "Impossible to find dataset at line #{$.}: #{dn}. Ignoring..."
|
71
|
+
next
|
72
|
+
end
|
73
|
+
d.metadata[:tax] = MiGA::Taxonomy.new(r, header)
|
74
|
+
d.save
|
75
|
+
$stderr.puts " #{d.name} registered." unless o[:q]
|
76
|
+
end
|
77
|
+
tfh.close
|
78
|
+
else
|
79
|
+
$stderr.puts "Registering taxonomy." unless o[:q]
|
80
|
+
d = p.dataset o[:dataset]
|
81
|
+
d.metadata[:tax] = MiGA::Taxonomy.new(o[:taxstring])
|
82
|
+
d.save
|
83
|
+
end
|
84
|
+
|
85
|
+
$stderr.puts "Done." unless o[:q]
|
86
|
+
|
data/actions/create_dataset
ADDED
@@ -0,0 +1,62 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
#
|
3
|
+
# @package MiGA
|
4
|
+
# @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
|
5
|
+
# @license artistic license 2.0
|
6
|
+
# @update Nov-29-2015
|
7
|
+
#
|
8
|
+
|
9
|
+
o = {q:true, ref:true}
|
10
|
+
OptionParser.new do |opt|
|
11
|
+
opt.banner = <<BAN
|
12
|
+
Creates an empty dataset in a pre-existing MiGA project.
|
13
|
+
|
14
|
+
Usage: #{$0} #{File.basename(__FILE__)} [options]
|
15
|
+
BAN
|
16
|
+
opt.separator ""
|
17
|
+
opt.on("-P", "--project PATH",
|
18
|
+
"(Mandatory) Path to the project to use."){ |v| o[:project]=v }
|
19
|
+
opt.on("-D", "--dataset STRING",
|
20
|
+
"(Mandatory) ID of the dataset to create."){ |v| o[:dataset]=v }
|
21
|
+
opt.on("-t", "--type STRING",
|
22
|
+
"Type of dataset. Recognized types include:",
|
23
|
+
*MiGA::Dataset.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
|
24
|
+
){ |v| o[:type]=v.to_sym }
|
25
|
+
opt.on("-q", "--query",
|
26
|
+
"If set, the dataset is registered as a query, not a reference dataset."
|
27
|
+
){ |v| o[:ref]=!v }
|
28
|
+
opt.on("-d", "--description STRING",
|
29
|
+
"Description of the dataset."){ |v| o[:description]=v }
|
30
|
+
opt.on("-u", "--user STRING",
|
31
|
+
"Owner of the dataset."){ |v| o[:user]=v }
|
32
|
+
opt.on("-c", "--comments STRING",
|
33
|
+
"Comments on the dataset."){ |v| o[:comments]=v }
|
34
|
+
opt.on("-v", "--verbose",
|
35
|
+
"Print additional information to STDERR."){ o[:q]=false }
|
36
|
+
opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
|
37
|
+
v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
|
38
|
+
end
|
39
|
+
opt.on("-h", "--help", "Display this screen.") do
|
40
|
+
puts opt
|
41
|
+
exit
|
42
|
+
end
|
43
|
+
opt.separator ""
|
44
|
+
end.parse!
|
45
|
+
|
46
|
+
|
47
|
+
### MAIN
|
48
|
+
raise "-P is mandatory." if o[:project].nil?
|
49
|
+
raise "-D is mandatory." if o[:dataset].nil?
|
50
|
+
|
51
|
+
$stderr.puts "Loading project." unless o[:q]
|
52
|
+
p = MiGA::Project.load(o[:project])
|
53
|
+
raise "Impossible to load project: #{o[:project]}" if p.nil?
|
54
|
+
|
55
|
+
$stderr.puts "Creating dataset." unless o[:q]
|
56
|
+
md = {}
|
57
|
+
[:type, :description, :user, :comments].each{ |k| md[k]=o[k] unless o[k].nil? }
|
58
|
+
d = MiGA::Dataset.new(p, o[:dataset], o[:ref], md)
|
59
|
+
p.add_dataset(o[:dataset])
|
60
|
+
|
61
|
+
$stderr.puts "Done." unless o[:q]
|
62
|
+
|
data/actions/create_project
ADDED
@@ -0,0 +1,70 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
#
|
3
|
+
# @package MiGA
|
4
|
+
# @author Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
|
5
|
+
# @license artistic license 2.0
|
6
|
+
# @update Oct-01-2015
|
7
|
+
#
|
8
|
+
|
9
|
+
o = {q:true, update:false}
|
10
|
+
OptionParser.new do |opt|
|
11
|
+
opt.banner = <<BAN
|
12
|
+
Creates an empty MiGA project.
|
13
|
+
|
14
|
+
Usage: #{$0} #{File.basename(__FILE__)} [options]
|
15
|
+
BAN
|
16
|
+
opt.separator ""
|
17
|
+
opt.on("-P", "--project PATH",
|
18
|
+
"(Mandatory) Path to the project to create."){ |v| o[:project]=v }
|
19
|
+
opt.on("-t", "--type STRING",
|
20
|
+
"Type of project. Recognized types include:",
|
21
|
+
*MiGA::Project.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
|
22
|
+
){ |v| o[:type]=v.to_sym }
|
23
|
+
opt.on("-n", "--name STRING",
|
24
|
+
"Name of the project."){ |v| o[:name]=v }
|
25
|
+
opt.on("-d", "--description STRING",
|
26
|
+
"Description of the project."){ |v| o[:description]=v }
|
27
|
+
opt.on("-u", "--user STRING", "Owner of the project."){ |v| o[:user]=v }
|
28
|
+
opt.on("-c", "--comments STRING",
|
29
|
+
"Comments on the project."){ |v| o[:comments]=v }
|
30
|
+
opt.on("--update",
|
31
|
+
"Updates the project if it already exists."){ o[:update]=true }
|
32
|
+
opt.on("-v", "--verbose",
|
33
|
+
"Print additional information to STDERR."){ o[:q]=false }
|
34
|
+
opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
|
35
|
+
v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
|
36
|
+
end
|
37
|
+
opt.on("-h", "--help", "Display this screen.") do
|
38
|
+
puts opt
|
39
|
+
exit
|
40
|
+
end
|
41
|
+
opt.separator ""
|
42
|
+
end.parse!
|
43
|
+
|
44
|
+
|
45
|
+
### MAIN
|
46
|
+
raise "-P is mandatory." if o[:project].nil?
|
47
|
+
|
48
|
+
unless File.exist? "#{ENV["HOME"]}/.miga_rc" and
|
49
|
+
File.exist? "#{ENV["HOME"]}/.miga_daemon.json"
|
50
|
+
puts "You must initialize MiGA before creating the first project.\n" +
|
51
|
+
"Do you want to initialize MiGA now? (yes / no)"
|
52
|
+
`'#{File.dirname(__FILE__)}/../scripts/init.bash'` if
|
53
|
+
$stdin.gets.chomp == 'yes'
|
54
|
+
end
|
55
|
+
|
56
|
+
$stderr.puts "Creating project." unless o[:q]
|
57
|
+
raise "Project already exists, aborting." unless
|
58
|
+
o[:update] or not MiGA::Project.exist? o[:project]
|
59
|
+
p = MiGA::Project.new(o[:project], o[:update])
|
60
|
+
# The following check is redundant with MiGA::Project#create,
|
61
|
+
# but allows upgrading projects from (very) early code versions
|
62
|
+
o[:name] = File.basename(p.path) if
|
63
|
+
o[:update] and o[:name].nil?
|
64
|
+
[:name, :description, :user, :comments, :type].each do |k|
|
65
|
+
p.metadata[k] = o[k] unless o[k].nil?
|
66
|
+
end
|
67
|
+
p.save
|
68
|
+
|
69
|
+
$stderr.puts "Done." unless o[:q]
|
70
|
+
|