RubyGems - miga-base - Versions diffs - 0.2.0.6 - Mend

miga-base 0.2.0.6

Files changed (52)

checksums.yaml +7 -0
data/README.md +351 -0
data/actions/add_result +61 -0
data/actions/add_taxonomy +86 -0
data/actions/create_dataset +62 -0
data/actions/create_project +70 -0
data/actions/daemon +69 -0
data/actions/download_dataset +77 -0
data/actions/find_datasets +63 -0
data/actions/import_datasets +86 -0
data/actions/index_taxonomy +71 -0
data/actions/list_datasets +83 -0
data/actions/list_files +67 -0
data/actions/unlink_dataset +52 -0
data/bin/miga +48 -0
data/lib/miga/daemon.rb +178 -0
data/lib/miga/dataset.rb +286 -0
data/lib/miga/gui.rb +289 -0
data/lib/miga/metadata.rb +74 -0
data/lib/miga/project.rb +268 -0
data/lib/miga/remote_dataset.rb +154 -0
data/lib/miga/result.rb +102 -0
data/lib/miga/tax_index.rb +70 -0
data/lib/miga/taxonomy.rb +107 -0
data/lib/miga.rb +83 -0
data/scripts/_distances_noref_nomulti.bash +86 -0
data/scripts/_distances_ref_nomulti.bash +105 -0
data/scripts/aai_distances.bash +40 -0
data/scripts/ani_distances.bash +39 -0
data/scripts/assembly.bash +38 -0
data/scripts/cds.bash +45 -0
data/scripts/clade_finding.bash +27 -0
data/scripts/distances.bash +30 -0
data/scripts/essential_genes.bash +29 -0
data/scripts/haai_distances.bash +39 -0
data/scripts/init.bash +211 -0
data/scripts/miga.bash +12 -0
data/scripts/mytaxa.bash +93 -0
data/scripts/mytaxa_scan.bash +85 -0
data/scripts/ogs.bash +36 -0
data/scripts/read_quality.bash +37 -0
data/scripts/ssu.bash +35 -0
data/scripts/subclades.bash +26 -0
data/scripts/trimmed_fasta.bash +47 -0
data/scripts/trimmed_reads.bash +57 -0
data/utils/adapters.fa +302 -0
data/utils/mytaxa_scan.R +89 -0
data/utils/mytaxa_scan.rb +58 -0
data/utils/requirements.txt +19 -0
data/utils/subclades-compile.rb +48 -0
data/utils/subclades.R +171 -0
metadata +185 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: f80152072105bd365145133c00ddfcd432a008c0
+  data.tar.gz: 7444a990c359e6c9f2a6a595e688e6319df50ebb
+SHA512:
+  metadata.gz: e4bb05e73def629ea39d72fac9d6e702b247051fb3b21b8db84195127e6d135d9b94bbf5037cde9c8f21e611cf580d53078c215d9c9187bdf861908ad42efe0c
+  data.tar.gz: ee27ea7cf9b98a3de760e18249e89f6410181d963017e86f5878710bf80a6d3ed5c1715d42ff7b394371add8709816d7c54e9fda5556ae6e4e96a3c4b384ca82

data/README.md ADDED

@@ -0,0 +1,351 @@
+[![Code Climate](https://codeclimate.com/github/bio-miga/miga/badges/gpa.svg)](https://codeclimate.com/github/bio-miga/miga)
+[![Test Coverage](https://codeclimate.com/github/bio-miga/miga/badges/coverage.svg)](https://codeclimate.com/github/bio-miga/miga/coverage)
+[![Build Status](https://travis-ci.org/lmrodriguezr/gfa.svg?branch=master)](https://travis-ci.org/lmrodriguezr/gfa)
+MiGA: Microbial Genomes Atlas
+=============================
+Installation
+------------
+Please see [INSTALLATION.md](./INSTALLATION.md) for instructions.
+Getting started with MiGA
+-------------------------
+### MiGA Interfaces
+You can interact with MiGA through different interfaces. These interfaces have
+different purposes, but they also have some degree of overlap, because different
+users with different aims sometimes want to do the same thing. Throughout this
+manual I'll be telling you how to do things using mostly the CLI, but I'll also
+try to mention the GUI and the Web Interface. The CLI is the most comprehensive
+and flexible interface, but the other two are friendlier to humans. There is a
+fourth interface that I won't be mentioning at all, but I'll try to document:
+the Ruby API. MiGA is mostly written in Ruby, with an object-oriented approach,
+and all the interfaces are just thin layers atop the Ruby core. That means that
+you can write your own interfaces (or pieces) if you know how to talk to these
+Ruby objects. Sometimes I even use `irb`, which is an interactive shell for
+Ruby, but that's mostly for debugging.
+#### MiGA CLI
+CLI stands for Command Line Interface. This is a set of little scripts that let
+you talk with MiGA through the terminal shell. If MiGA is in your PATH (see
+[installation details](./INSTALLATION.md#miga-in-your-path)), you can simply run
+`miga` in your terminal, and the help messages will take it from there. All the
+MiGA CLI calls look like:
+```bash
+miga task [options]
+```
+Where `task` is one of the supported tasks and `[options]` is a set of dash-flag
+options supported by each task. `-h` is always there to provide help. If you're
+a MiGA administrator, this is probably the most convenient option for you (but
+hey, give the GUI a chance).
+#### MiGA GUI
+The Graphical User Interface is the friendlier option for setting up a MiGA
+project. It doesn't have as many options as the CLI, but it's pretty easy to
+use, so it's a good option if you have a typical project in your hands.
+#### MiGA Web
+The Web interface for MiGA is the way MiGA reports results from a project. It's
+not designed to set up new projects, but to explore existing ones, and to submit
+non-reference datasets for analyses.
+### Creating your first project
+You can do this in the GUI, but I like the CLI better, so I'll be telling you
+how to tell MiGA what to do from the CLI. First, think where you'll place your
+project. Normally this means a location...
+1. ... with enough space. That is, plan for at least 4 or 5 times the size of
+the input files.
+2. ... accessible by worker nodes. If you're using a single server, this is not
+really an issue. However, if you plan on deploying MiGA in a cluster
+infrastructure, make sure your project is reachable by worker nodes.
+3. ... with fast access. It's not a great idea to set up projects in remote
+drives with large latency. In some cases there is no way around this, for example
+when that's the only available option in your cluster infrastructure, but try
+to avoid this as much as possible.
+Now that you know where to create your project, go ahead and run:
+```bash
+miga create_project -P /path/to/project1 -t type-of-project
+```
+Where `/path/to/project1` is the path to where the project should be created.
+You don't need to create the folder in advance; MiGA will take care of it. See
+the next section to help you decide what `type-of-project` to use. There are some
+other options that are not mandatory, but will make your project richer. Take a
+look at `miga create_project -h`.
+#### Project types
+Projects can be set for different purposes, so we've divided them into "types".
+There are four of them, depending on the types of datasets to be processed (see
+[Dataset types](#dataset-types)):
+1. **mixed**: A generic project with any supported type of datasets.
+2. **metagenomes**: A project containing only metagenomic datasets. This
+includes either (or both) metagenomes and viromes.
+3. **genomes**: A project containing only single-organism datasets. This
+includes any of the single-organism types: genome, scgenome, and/or popgenome.
+4. **clade**: Same as "genomes", but all the datasets are expected to be from
+the same species. This type of project performs additional analyses that expect
+a very dense ANI matrix, so all genomes in it are expected to have AAI > 90%.
+### Creating datasets
+Once your project is ready, you can start populating it with datasets and data.
+While it's possible to create empty datasets using `miga create_dataset`, the
+preferred method is to first add data and then use the data to create the
+datasets in batch. For example, let's assume you have a collection of paired-end
+raw reads from several datasets. The first step is to format the filenames
+properly. For each one of your datasets, pick a name that conforms to the
+[MiGA names](#miga-names) restrictions (we'll call it "ds1") and rename your
+reads to `/path/to/project1/data/01.raw_reads/ds1.1.fastq` for the first
+sister and `/path/to/project1/data/01.raw_reads/ds1.2.fastq` for the second
+sister. Also, add the date into `/path/to/project1/data/01.raw_reads/ds1.done`.
+Check the [expected result files](#expected-result-files) below if you
+want to start at any other point in the pipeline. Once you have renamed (or
+copied) the files inside the project folder, run:
+```bash
+miga find_datasets -P /path/to/project1 -a -r -t type-of-dataset
+```
+The `-a` flag tells MiGA that you want to add the datasets (not just find them);
+the `-r` flag tells MiGA that your datasets are to be treated as "reference"
+datasets (see [Non-reference datasets](#non-reference-datasets) below); and the
+`-t` option tells MiGA what type of datasets you're adding (see
+[Dataset types](#dataset-types) below). If you have a mixture of dataset types,
+process one at a time. That is, perform this step for each dataset type. Don't
+worry about the datasets that are already registered; those will be ignored by
+the `find_datasets` task and will remain unchanged.
+#### Expected result files
+For brevity, we'll assume that you're inside `/path/to/project1/data`; *i.e.*,
+in the `data` directory of your project. We'll also assume that you're naming
+your dataset **ds1**, but you can replace this with anything following the
+[MiGA names](#miga-names) restrictions. Now, these are the "input" points that
+you can use in MiGA:
+1. **Paired-end raw reads**: The expected files are `01.raw_reads/ds1.1.fastq`
+and `01.raw_reads/ds1.2.fastq`, each including a sister end. The reads must be
+in the same order in both files (MiGA won't check). You can also use gzipped
+files instead.
+2. **Single-end raw reads**: The expected file is `01.raw_reads/ds1.1.fastq`.
+You can also use a gzipped file instead.
+3. **Paired-end trimmed reads**: These are assumed to be quality-controlled
+reads in FastA format, with both ends passing the quality filters. The minimum
+expected file is `04.trimmed_fasta/ds1.CoupledReads.fa`, which contains the
+reads interleaved. You can also pass (in addition) the reads that passed the
+quality check without the sister as a gzipped FastA at
+`04.trimmed_fasta/ds1.SingleReads.fa.gz`.
+4. **Single-end trimmed reads**: Similar to the option above, only
+quality-checked reads are expected here. The expected file is
+`04.trimmed_fasta/ds1.SingleReads.fa`.
+5. **Assembled fragments**: This can be any assembly result, including complete
+genomes. The expected file is `05.assembly/ds1.LargeContigs.fna`, containing
+only contigs longer than 500bp. You can also provide the complete assembly
+(without length-filtering) at `05.assembly/ds1.AllContigs.fna`.
+6. **Predicted genes/proteins**: This is the total collection of predicted genes
+and proteins. The expected files are `06.cds/ds1.fna`, containing genes, and
+`06.cds/ds1.faa`, containing proteins. You can also provide the locations of
+said genes in the genome in gzipped GFF v2 (`06.cds/ds1.gff2.gz`), gzipped
+GFF v3 (`06.cds/ds1.gff3.gz`), or gzipped tabular (`06.cds/ds1.tab.gz`).
+**IMPORTANT**: In all cases, an additional `ds1.done` file MUST be created in
+the same folder. This is meant to prevent MiGA from mistakenly adding files as
+results before they're done being processed or transferred. This file must
+contain the current [date in MiGA format](#date-in-miga-format). Here's a quick
+code snippet to add the `.done` file for all the input files in `01.raw_reads`
+(you can adapt this accordingly to any of the other options):
+```bash
+cd /path/to/project1/data/01.raw_reads
+for i in *.1.fastq ; do
+   date "+%Y-%m-%d %H:%M:%S %z" > $(basename $i .1.fastq).done
+done
+```
+#### Dataset types
+This is how you tell MiGA what kind of data you have in your datasets. Let's see
+the definitions:
+1. **genome**: The genome from an isolate.
+2. **metagenome**: A metagenome (excluding viromes).
+3. **virome**: A viral metagenome.
+4. **scgenome**: A genome from a single cell.
+5. **popgenome**: The genome of a population (including microdiversity).
+#### Non-reference datasets
+#### Creating a RefSeq project
+If you've reached this point, you are now ready to create a large functional
+project. If you want to continue using this documentation on real data but
+don't have any of your own handy (or if you want to use RefSeq data), this
+is a quick tutorial on how to create a functional MiGA project using ALL of
+NCBI's Prokaryotic RefSeq data.
+**Step 1: Create the project**. That's simple, just `cd` to the directory you
+want to use, and execute `miga create_project -P MiGA_RefSeq -t genomes`.
+**Step 2: Download the data**. Just `cd MiGA_RefSeq`, and execute this code:
+```bash
+wget -O reference_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on&type=reference'
+grep -v '^#' reference_genomes.txt \
+   | awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} {print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
+   | while read ln ; do
+      sp=$(echo $ln | perl -pe 's/.*# //')
+      if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
+	 echo $ln
+	 $ln
+      fi
+   done
+```
+And that's it. The first line will download the most current list of genomes
+included in NCBI's Prokaryotic RefSeq, and the rest will repeatedly execute the
+`download_dataset` task, which automatically fetches the data (even the genome's
+taxonomy!). Note that the code above checks first if a dataset already exists,
+so if you want to update an existing MiGA_RefSeq project, simply repeat step 2
+and only missing genomes will be fetched.
+Note that running time for the above code may vary depending on the network and
+the size of RefSeq, but I was able to create a complete project with 122 genomes
+in under 10 minutes.
+**Alternative step 2: downloading all representatives**. If you want a larger
+and more comprehensive collection, and not just the reference genomes, you can
+download all of the representative genomes in the prokaryotic RefSeq with this
+alternative code:
+```bash
+wget -O representative_genomes.txt 'http://www.ncbi.nlm.nih.gov/genomes/Genome2BE/genome2srv.cgi?action=refgenomes&download=on'
+grep -v '^#' representative_genomes.txt \
+   | awk -F'\t' '{gsub(/[^A-Za-z0-9]/,"_",$3)} $4{print "miga download_dataset -P . -D "$3" -I "$4" -U ncbi --db nuccore -t genome -v # "$3""}' \
+   | while read ln ; do
+      sp=$(echo $ln | perl -pe 's/.*# //')
+      if [[ ! -n $(miga list_datasets -P . -D $sp) ]] ; then
+	 echo $ln
+	 $ln
+      fi
+   done
+```
+This is a much larger set (1,246), hence it'll take much more time. I finished
+downloading the whole thing in about one and a half hours.
+Launching daemons
+-----------------
+### Configuring daemons
+### Understanding the MiGA configuration file
+### Arbitrary configuration scripts
+### Fixing system calls with aliases
+In some cases, we might not have the same executable names as MiGA expects, or
+we might have broken modules in our cluster that can be easily fixed with an
+`alias`. In these cases, you can use
+[arbitrary configuration scripts](#arbitrary-configuration-scripts) to generate
+one or more `alias`. Importantly, MiGA daemons work with non-interactive shells,
+which means you likely need to explicitly allow for alias expansion, for
+example:
+```bash
+# Allow alias expansions in non-interactive shells
+shopt -s expand_aliases
+# Call FastQC with the environmental Perl,
+# not the built-in /usr/bin/perl:
+alias fastqc="perl $(which fastqc)"
+# Use the standard name for RAxML (pthreads)
+# instead of the one my sys-admin decided to use:
+alias raxmlHPC-PTHREADS=RAxML_pthreads
+```
+The examples above illustrate how to use `alias` to fix broken packages or to
+make software with non-standard names reachable.
+**Known caveats to this solution:** This solution CANNOT BE USED in the few
+cases in which a whole package is expected based on a single executable. For
+example, adding the enveomics scripts to your `PATH` is far easier than creating
+an `alias` for each script. Also, MiGA expects to find the model, the activation
+key, and the scripts of MetaGeneMark in the same folder as the `gmhmmp` binary,
+so setting an `alias` may prevent MiGA from finding these ancillary files.
+Cluster infrastructure
+----------------------
+### Loading optional modules
+See also [Fixing system calls with aliases](#fixing-system-calls-with-aliases).
+Miscellaneous
+-------------
+Below are reference snippets for which I couldn't find a more
+suitable home, but are important documentation.
+### MiGA Names
+MiGA names are non-empty strings composed exclusively of alphanumerics and
+underscores. All dataset names in MiGA must conform to this restriction, but
+project names need not. Other objects, such as taxonomic entries, must also
+conform to the MiGA name restrictions.
+### Date in MiGA format
+The official format in which MiGA represents date/times is the default of Ruby's
+`Time.now.to_s`. In the *nix `date` utility this corresponds to the format:
+`+%Y-%m-%d %H:%M:%S %z`.
+Authors
+-------
+Developed and maintained by [Luis M. Rodriguez-R][lrr].
+License
+-------
+See [LICENSE](LICENSE).
+[lrr]: http://lmrodriguezr.github.io/
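
The README above relies on three small conventions: MiGA names (alphanumerics and underscores only), the MiGA date format (`+%Y-%m-%d %H:%M:%S %z`), and the `.done` marker file holding that date. The following is a minimal Ruby sketch of those conventions, not code from the gem: `miga_name?`, `miga_date`, and `write_done_file` are hypothetical helper names, and the usage example writes into a temporary directory rather than a real project.

```ruby
require "tmpdir"

# Hypothetical helper (not part of MiGA): true if +name+ is a non-empty
# string made only of ASCII alphanumerics and underscores, which is how
# the README describes MiGA names.
def miga_name?(name)
  name.is_a?(String) && name.match?(/\A[A-Za-z0-9_]+\z/)
end

# Hypothetical helper: formats a Time using the pattern the README gives
# for the `date` utility.
def miga_date(time = Time.now)
  time.strftime("%Y-%m-%d %H:%M:%S %z")
end

# Hypothetical helper: writes <dir>/<name>.done containing the date in
# MiGA format, mirroring the bash loop shown in the README.
def write_done_file(dir, name, time = Time.now)
  unless miga_name?(name)
    raise ArgumentError, "Not a valid MiGA name: #{name.inspect}"
  end
  path = File.join(dir, "#{name}.done")
  File.write(path, miga_date(time) + "\n")
  path
end

# Usage example in a throwaway directory (assumption: a real project
# would use a folder such as data/01.raw_reads instead).
Dir.mktmpdir do |dir|
  path = write_done_file(dir, "ds1")
  puts "#{File.basename(path)}: #{File.read(path)}"
end
```

For a local (non-UTC) time, the string produced by `miga_date` has the same shape as Ruby's `Time#to_s`, which is the equivalence the README states.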

data/actions/add_result ADDED

@@ -0,0 +1,61 @@
+#!/usr/bin/env ruby
+#
+# @package MiGA
+# @author  Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
+# @license artistic license 2.0
+# @update  Oct-01-2015
+#
+o = {q:true}
+opts = OptionParser.new do |opt|
+   opt.banner = <<BAN
+Registers a result.
+Usage: #{$0} #{File.basename(__FILE__)} [options]
+BAN
+   opt.separator ""
+   opt.on("-P", "--project PATH",
+      "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
+   opt.on("-D", "--dataset PATH",
+      "(Mandatory if the result is dataset-specific) ID of the dataset to use."
+      ){ |v| o[:dataset]=v }
+   opt.on("-r", "--result STRING",
+      "(Mandatory) Name of the result to add.",
+      "Recognized names for dataset-specific results include:",
+      *MiGA::Dataset.RESULT_DIRS.keys.map{|n| " ~ #{n}"},
+      "Recognized names for project-wide results include:",
+      *MiGA::Project.RESULT_DIRS.keys.map{|n| " ~ #{n}"}){ |v| o[:name]=v }
+   opt.on("-v", "--verbose",
+      "Print additional information to STDERR."){ o[:q]=false }
+   opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
+      v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
+   end
+   opt.on("-h", "--help", "Display this screen.") do
+      puts opt
+      exit
+   end
+   opt.separator ""
+end.parse!
+### MAIN
+# Options were already parsed by the chained `parse!` call above.
+raise "-P is mandatory." if o[:project].nil?
+raise "-r is mandatory." if o[:name].nil?
+$stderr.puts "Loading project." unless o[:q]
+p = MiGA::Project.load(o[:project])
+raise "Impossible to load project: #{o[:project]}" if p.nil?
+$stderr.puts "Registering result." unless o[:q]
+if o[:dataset].nil?
+   r = p.add_result o[:name].to_sym
+else
+   d = p.dataset(o[:dataset])
+   r = d.add_result o[:name].to_sym
+end
+raise "Cannot add result, incomplete expected files." if r.nil?
+$stderr.puts "Done." unless o[:q]
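
The action scripts in this diff share one option-handling pattern: build an `OptionParser`, chain `parse!`, then check the mandatory options. Below is a minimal, self-contained sketch of that pattern for the `-P`, `-D`, `-r`, and `-v` flags of `add_result`. The wrapper `parse_add_result_args` is a hypothetical name, the result name `raw_reads` is only an illustrative value, and nothing here loads MiGA. It also shows that `OptionParser#parse!` returns the leftover arguments, not the parser.

```ruby
require "optparse"

# Hypothetical wrapper (not part of MiGA): parses a subset of the
# add_result flags and returns [options_hash, leftover_arguments].
def parse_add_result_args(argv)
  o = { q: true }
  # parse! consumes the recognized flags from argv and returns what is
  # left over (an Array), not the OptionParser object.
  rest = OptionParser.new do |opt|
    opt.on("-P", "--project PATH") { |v| o[:project] = v }
    opt.on("-D", "--dataset STRING") { |v| o[:dataset] = v }
    opt.on("-r", "--result STRING") { |v| o[:name] = v }
    opt.on("-v", "--verbose") { o[:q] = false }
  end.parse!(argv)
  raise ArgumentError, "-P is mandatory." if o[:project].nil?
  raise ArgumentError, "-r is mandatory." if o[:name].nil?
  [o, rest]
end

# Usage example with made-up values.
options, leftover = parse_add_result_args(
  %w[-P /path/to/project1 -r raw_reads -v leftover]
)
p options
p leftover
```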

data/actions/add_taxonomy ADDED

@@ -0,0 +1,86 @@
+#!/usr/bin/env ruby
+#
+# @package MiGA
+# @author  Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
+# @license artistic license 2.0
+# @update  Oct-01-2015
+#
+o = {q:true}
+OptionParser.new do |opt|
+   opt.banner = <<BAN
+Registers taxonomic information for datasets.
+Usage: #{$0} #{File.basename(__FILE__)} [options]
+BAN
+   opt.separator ""
+   opt.on("-P", "--project PATH",
+      "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
+   opt.on("-D", "--dataset PATH",
+      "(Mandatory unless -t is provided) ID of the dataset to use."
+      ){ |v| o[:dataset]=v }
+   opt.on("-s", "--tax-string STRING",
+      "(Mandatory unless -t is provided) String corresponding to the taxonomy",
+      "of the dataset. The MiGA format of string taxonomy is a space-delimited",
+      "set of 'rank:name' pairs."){ |v| o[:taxstring]=v }
+   opt.on("-t", "--tax-file PATH",
+      "(Mandatory unless -D and -s are provided) Tab-delimited file containing",
+      "dataset taxonomy.  Each row corresponds to a dataset and each column",
+      "corresponds to a rank.  The first row must be a header with the rank ",
+      "names, and the first column must contain dataset names."
+      ){ |v| o[:taxfile]=v }
+   opt.on("-v", "--verbose",
+      "Print additional information to STDERR."){ o[:q]=false }
+   opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
+      v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
+   end
+   opt.on("-h", "--help", "Display this screen.") do
+      puts opt
+      exit
+   end
+   opt.separator ""
+end.parse!
+### MAIN
+raise "-P is mandatory." if o[:project].nil?
+raise "-D is mandatory unless -t is provided." if
+   o[:dataset].nil? and o[:taxfile].nil?
+raise "-s is mandatory unless -t is provided." if
+   o[:taxstring].nil? and o[:taxfile].nil?
+$stderr.puts "Loading project." unless o[:q]
+p = MiGA::Project.load(o[:project])
+raise "Impossible to load project: #{o[:project]}" if p.nil?
+if not o[:taxfile].nil?
+   $stderr.puts "Reading tax-file and registering taxonomy." unless o[:q]
+   tfh = File.open(o[:taxfile], "r")
+   header = nil
+   while ln = tfh.gets
+      next if ln =~ /^\s*?$/
+      r = ln.chomp.split /\t/, -1
+      dn = r.shift
+      if header.nil?
+	 header = r
+	 next
+      end
+      d = p.dataset dn
+      if d.nil?
+	 warn "Impossible to find dataset at line #{$.}: #{dn}. Ignoring..."
+	 next
+      end
+      d.metadata[:tax] = MiGA::Taxonomy.new(r, header)
+      d.save
+      $stderr.puts " #{d.name} registered." unless o[:q]
+   end
+   tfh.close
+else
+   $stderr.puts "Registering taxonomy." unless o[:q]
+   d = p.dataset o[:dataset]
+   d.metadata[:tax] = MiGA::Taxonomy.new(o[:taxstring])
+   d.save
+end
+$stderr.puts "Done." unless o[:q]
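
The help text of `add_taxonomy` describes two inputs: a space-delimited string of `rank:name` pairs, and a tab-delimited file whose header row holds rank names and whose first column holds dataset names. The sketch below parses both into plain hashes to make the formats concrete. It is not `MiGA::Taxonomy`: the function names are hypothetical, the rank labels in the example (`d`, `g`, `s`) are illustrative, and skipping empty cells is an assumption of this sketch.

```ruby
# Hypothetical parser for the string format: "rank:name rank:name ...".
def parse_tax_string(str)
  str.split.each_with_object({}) do |pair, tax|
    rank, name = pair.split(":", 2)
    if rank.nil? || rank.empty? || name.nil? || name.empty?
      raise ArgumentError, "Malformed rank:name pair: #{pair.inspect}"
    end
    tax[rank.to_sym] = name
  end
end

# Hypothetical parser for the tab-delimited format. Returns a hash of
# dataset name => { rank => name }. Empty cells are skipped (assumption).
def parse_tax_table(text)
  header = nil
  table = {}
  text.each_line do |ln|
    next if ln =~ /\A\s*\z/          # ignore blank lines
    row = ln.chomp.split("\t", -1)   # keep trailing empty cells
    dataset = row.shift              # first column: dataset name
    if header.nil?
      header = row                   # first row: rank names
      next
    end
    pairs = header.zip(row).reject { |_, v| v.nil? || v.empty? }
    table[dataset] = pairs.map { |rank, name| [rank.to_sym, name] }.to_h
  end
  table
end

# Usage example with made-up taxonomy values.
p parse_tax_string("d:Bacteria g:Escherichia s:Escherichia_coli")
p parse_tax_table("dataset\td\tg\nds1\tBacteria\tEscherichia\nds2\tArchaea\t\n")
```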

data/actions/create_dataset ADDED

@@ -0,0 +1,62 @@
+#!/usr/bin/env ruby
+#
+# @package MiGA
+# @author  Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
+# @license artistic license 2.0
+# @update  Nov-29-2015
+#
+o = {q:true, ref:true}
+OptionParser.new do |opt|
+   opt.banner = <<BAN
+Creates an empty dataset in a pre-existing MiGA project.
+Usage: #{$0} #{File.basename(__FILE__)} [options]
+BAN
+   opt.separator ""
+   opt.on("-P", "--project PATH",
+      "(Mandatory) Path to the project to use."){ |v| o[:project]=v }
+   opt.on("-D", "--dataset STRING",
+      "(Mandatory) ID of the dataset to create."){ |v| o[:dataset]=v }
+   opt.on("-t", "--type STRING",
+      "Type of dataset. Recognized types include:",
+      *MiGA::Dataset.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
+      ){ |v| o[:type]=v.to_sym }
+   opt.on("-q", "--query",
+      "If set, the dataset is registered as a query, not a reference dataset."
+      ){ |v| o[:ref]=!v }
+   opt.on("-d", "--description STRING",
+      "Description of the dataset."){ |v| o[:description]=v }
+   opt.on("-u", "--user STRING",
+      "Owner of the dataset."){ |v| o[:user]=v }
+   opt.on("-c", "--comments STRING",
+      "Comments on the dataset."){ |v| o[:comments]=v }
+   opt.on("-v", "--verbose",
+      "Print additional information to STDERR."){ o[:q]=false }
+   opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
+      v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
+   end
+   opt.on("-h", "--help", "Display this screen.") do
+      puts opt
+      exit
+   end
+   opt.separator ""
+end.parse!
+### MAIN
+raise "-P is mandatory." if o[:project].nil?
+raise "-D is mandatory." if o[:dataset].nil?
+$stderr.puts "Loading project." unless o[:q]
+p = MiGA::Project.load(o[:project])
+raise "Impossible to load project: #{o[:project]}" if p.nil?
+$stderr.puts "Creating dataset." unless o[:q]
+md = {}
+[:type, :description, :user, :comments].each{ |k| md[k]=o[k] unless o[k].nil? }
+d = MiGA::Dataset.new(p, o[:dataset], o[:ref], md)
+p.add_dataset(o[:dataset])
+$stderr.puts "Done." unless o[:q]

data/actions/create_project ADDED

@@ -0,0 +1,70 @@
+#!/usr/bin/env ruby
+#
+# @package MiGA
+# @author  Luis M. Rodriguez-R <lmrodriguezr at gmail dot com>
+# @license artistic license 2.0
+# @update  Oct-01-2015
+#
+o = {q:true, update:false}
+OptionParser.new do |opt|
+   opt.banner = <<BAN
+Creates an empty MiGA project.
+Usage: #{$0} #{File.basename(__FILE__)} [options]
+BAN
+   opt.separator ""
+   opt.on("-P", "--project PATH",
+      "(Mandatory) Path to the project to create."){ |v| o[:project]=v }
+   opt.on("-t", "--type STRING",
+      "Type of project. Recognized types include:",
+      *MiGA::Project.KNOWN_TYPES.map{ |k,v| "~ #{k}: #{v[:description]}"}
+      ){ |v| o[:type]=v.to_sym }
+   opt.on("-n", "--name STRING",
+      "Name of the project."){ |v| o[:name]=v }
+   opt.on("-d", "--description STRING",
+      "Description of the project."){ |v| o[:description]=v }
+   opt.on("-u", "--user STRING", "Owner of the project."){ |v| o[:user]=v }
+   opt.on("-c", "--comments STRING",
+      "Comments on the project."){ |v| o[:comments]=v }
+   opt.on("--update",
+      "Updates the project if it already exists."){ o[:update]=true }
+   opt.on("-v", "--verbose",
+      "Print additional information to STDERR."){ o[:q]=false }
+   opt.on("-d", "--debug INT", "Print debugging information to STDERR.") do |v|
+      v.to_i>1 ? MiGA::MiGA.DEBUG_TRACE_ON : MiGA::MiGA.DEBUG_ON
+   end
+   opt.on("-h", "--help", "Display this screen.") do
+      puts opt
+      exit
+   end
+   opt.separator ""
+end.parse!
+### MAIN
+raise "-P is mandatory." if o[:project].nil?
+unless File.exist? "#{ENV["HOME"]}/.miga_rc" and
+      File.exist? "#{ENV["HOME"]}/.miga_daemon.json"
+   puts "You must initialize MiGA before creating the first project.\n" +
+      "Do you want to initialize MiGA now? (yes / no)"
+   `'#{File.dirname(__FILE__)}/../scripts/init.bash'` if
+      $stdin.gets.chomp == 'yes'
+end
+$stderr.puts "Creating project." unless o[:q]
+raise "Project already exists, aborting." unless
+   o[:update] or not MiGA::Project.exist? o[:project]
+p = MiGA::Project.new(o[:project], o[:update])
+# The following check is redundant with MiGA::Project#create,
+# but allows upgrading projects from (very) early code versions
+o[:name] = File.basename(p.path) if
+   o[:update] and o[:name].nil?
+[:name, :description, :user, :comments, :type].each do |k|
+   p.metadata[k] = o[k] unless o[k].nil?
+end
+p.save
+$stderr.puts "Done." unless o[:q]
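
The `-t` option of `create_project` takes one of the four project types described in the README earlier in this diff, and each type restricts which dataset types belong in the project. The sketch below encodes that mapping exactly as the README words it. The constant and method names are hypothetical, and whether MiGA itself enforces the restriction this way is an assumption, since `MiGA::Project.KNOWN_TYPES` is not shown here.

```ruby
# Dataset types listed in the README's "Dataset types" section.
DATASET_TYPES = %i[genome metagenome virome scgenome popgenome].freeze

# Project type => dataset types it is described as containing.
# Hypothetical table derived from the README text, not from MiGA's code.
PROJECT_DATASET_TYPES = {
  mixed: DATASET_TYPES,                    # any supported type
  metagenomes: %i[metagenome virome],      # metagenomic datasets only
  genomes: %i[genome scgenome popgenome],  # single-organism datasets only
  clade: %i[genome scgenome popgenome],    # same as genomes, one species
}.freeze

# Hypothetical check: does a dataset type fit a project type?
def dataset_allowed?(project_type, dataset_type)
  allowed = PROJECT_DATASET_TYPES.fetch(project_type.to_sym) do
    raise ArgumentError, "Unknown project type: #{project_type}"
  end
  allowed.include?(dataset_type.to_sym)
end

# Usage example.
puts dataset_allowed?(:genomes, :scgenome)
puts dataset_allowed?(:metagenomes, :genome)
```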