miga-base 0.7.26.0 → 0.7.26.1
- checksums.yaml +4 -4
- data/lib/miga/version.rb +1 -1
- data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Archaea_SCG.hmm +41964 -0
- data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Bacteria_SCG.hmm +32439 -0
- data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Complete_SCG_DB.hmm +62056 -0
- data/utils/FastAAI/FastAAI/FastAAI +1336 -0
- data/utils/FastAAI/README.md +84 -0
- data/utils/FastAAI/kAAI_v1.0_virus.py +1296 -0
- data/utils/enveomics/Docs/recplot2.md +244 -0
- data/utils/enveomics/Examples/aai-matrix.bash +66 -0
- data/utils/enveomics/Examples/ani-matrix.bash +66 -0
- data/utils/enveomics/Examples/essential-phylogeny.bash +105 -0
- data/utils/enveomics/Examples/unus-genome-phylogeny.bash +100 -0
- data/utils/enveomics/LICENSE.txt +73 -0
- data/utils/enveomics/Makefile +52 -0
- data/utils/enveomics/Manifest/Tasks/aasubs.json +103 -0
- data/utils/enveomics/Manifest/Tasks/blasttab.json +786 -0
- data/utils/enveomics/Manifest/Tasks/distances.json +161 -0
- data/utils/enveomics/Manifest/Tasks/fasta.json +766 -0
- data/utils/enveomics/Manifest/Tasks/fastq.json +243 -0
- data/utils/enveomics/Manifest/Tasks/graphics.json +126 -0
- data/utils/enveomics/Manifest/Tasks/mapping.json +67 -0
- data/utils/enveomics/Manifest/Tasks/ogs.json +382 -0
- data/utils/enveomics/Manifest/Tasks/other.json +829 -0
- data/utils/enveomics/Manifest/Tasks/remote.json +355 -0
- data/utils/enveomics/Manifest/Tasks/sequence-identity.json +501 -0
- data/utils/enveomics/Manifest/Tasks/tables.json +308 -0
- data/utils/enveomics/Manifest/Tasks/trees.json +68 -0
- data/utils/enveomics/Manifest/Tasks/variants.json +111 -0
- data/utils/enveomics/Manifest/categories.json +156 -0
- data/utils/enveomics/Manifest/examples.json +154 -0
- data/utils/enveomics/Manifest/tasks.json +4 -0
- data/utils/enveomics/Pipelines/assembly.pbs/CONFIG.mock.bash +69 -0
- data/utils/enveomics/Pipelines/assembly.pbs/FastA.N50.pl +1 -0
- data/utils/enveomics/Pipelines/assembly.pbs/FastA.filterN.pl +1 -0
- data/utils/enveomics/Pipelines/assembly.pbs/FastA.length.pl +1 -0
- data/utils/enveomics/Pipelines/assembly.pbs/README.md +189 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME-2.bash +112 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME-3.bash +23 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME-4.bash +44 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME.bash +50 -0
- data/utils/enveomics/Pipelines/assembly.pbs/kSelector.R +37 -0
- data/utils/enveomics/Pipelines/assembly.pbs/newbler.pbs +68 -0
- data/utils/enveomics/Pipelines/assembly.pbs/newbler_preparator.pl +49 -0
- data/utils/enveomics/Pipelines/assembly.pbs/soap.pbs +80 -0
- data/utils/enveomics/Pipelines/assembly.pbs/stats.pbs +57 -0
- data/utils/enveomics/Pipelines/assembly.pbs/velvet.pbs +63 -0
- data/utils/enveomics/Pipelines/blast.pbs/01.pbs.bash +38 -0
- data/utils/enveomics/Pipelines/blast.pbs/02.pbs.bash +73 -0
- data/utils/enveomics/Pipelines/blast.pbs/03.pbs.bash +21 -0
- data/utils/enveomics/Pipelines/blast.pbs/BlastTab.recover_job.pl +72 -0
- data/utils/enveomics/Pipelines/blast.pbs/CONFIG.mock.bash +98 -0
- data/utils/enveomics/Pipelines/blast.pbs/FastA.split.pl +1 -0
- data/utils/enveomics/Pipelines/blast.pbs/README.md +127 -0
- data/utils/enveomics/Pipelines/blast.pbs/RUNME.bash +109 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.check.bash +128 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.dry.bash +16 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.eo.bash +22 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.pause.bash +26 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.run.bash +89 -0
- data/utils/enveomics/Pipelines/blast.pbs/sentinel.pbs.bash +29 -0
- data/utils/enveomics/Pipelines/idba.pbs/README.md +49 -0
- data/utils/enveomics/Pipelines/idba.pbs/RUNME.bash +95 -0
- data/utils/enveomics/Pipelines/idba.pbs/run.pbs +56 -0
- data/utils/enveomics/Pipelines/trim.pbs/README.md +54 -0
- data/utils/enveomics/Pipelines/trim.pbs/RUNME.bash +70 -0
- data/utils/enveomics/Pipelines/trim.pbs/run.pbs +130 -0
- data/utils/enveomics/README.md +42 -0
- data/utils/enveomics/Scripts/AAsubs.log2ratio.rb +171 -0
- data/utils/enveomics/Scripts/Aln.cat.rb +163 -0
- data/utils/enveomics/Scripts/Aln.convert.pl +35 -0
- data/utils/enveomics/Scripts/AlphaDiversity.pl +152 -0
- data/utils/enveomics/Scripts/BedGraph.tad.rb +93 -0
- data/utils/enveomics/Scripts/BedGraph.window.rb +71 -0
- data/utils/enveomics/Scripts/BlastPairwise.AAsubs.pl +102 -0
- data/utils/enveomics/Scripts/BlastTab.addlen.rb +63 -0
- data/utils/enveomics/Scripts/BlastTab.advance.bash +48 -0
- data/utils/enveomics/Scripts/BlastTab.best_hit_sorted.pl +55 -0
- data/utils/enveomics/Scripts/BlastTab.catsbj.pl +104 -0
- data/utils/enveomics/Scripts/BlastTab.cogCat.rb +76 -0
- data/utils/enveomics/Scripts/BlastTab.filter.pl +47 -0
- data/utils/enveomics/Scripts/BlastTab.kegg_pep2path_rest.pl +194 -0
- data/utils/enveomics/Scripts/BlastTab.metaxaPrep.pl +104 -0
- data/utils/enveomics/Scripts/BlastTab.pairedHits.rb +157 -0
- data/utils/enveomics/Scripts/BlastTab.recplot2.R +48 -0
- data/utils/enveomics/Scripts/BlastTab.seqdepth.pl +86 -0
- data/utils/enveomics/Scripts/BlastTab.seqdepth_ZIP.pl +119 -0
- data/utils/enveomics/Scripts/BlastTab.seqdepth_nomedian.pl +86 -0
- data/utils/enveomics/Scripts/BlastTab.subsample.pl +47 -0
- data/utils/enveomics/Scripts/BlastTab.sumPerHit.pl +114 -0
- data/utils/enveomics/Scripts/BlastTab.taxid2taxrank.pl +90 -0
- data/utils/enveomics/Scripts/BlastTab.topHits_sorted.rb +101 -0
- data/utils/enveomics/Scripts/Chao1.pl +97 -0
- data/utils/enveomics/Scripts/CharTable.classify.rb +234 -0
- data/utils/enveomics/Scripts/EBIseq2tax.rb +83 -0
- data/utils/enveomics/Scripts/FastA.N50.pl +56 -0
- data/utils/enveomics/Scripts/FastA.extract.rb +152 -0
- data/utils/enveomics/Scripts/FastA.filter.pl +52 -0
- data/utils/enveomics/Scripts/FastA.filterLen.pl +28 -0
- data/utils/enveomics/Scripts/FastA.filterN.pl +60 -0
- data/utils/enveomics/Scripts/FastA.fragment.rb +92 -0
- data/utils/enveomics/Scripts/FastA.gc.pl +42 -0
- data/utils/enveomics/Scripts/FastA.interpose.pl +93 -0
- data/utils/enveomics/Scripts/FastA.length.pl +38 -0
- data/utils/enveomics/Scripts/FastA.mask.rb +89 -0
- data/utils/enveomics/Scripts/FastA.per_file.pl +36 -0
- data/utils/enveomics/Scripts/FastA.qlen.pl +57 -0
- data/utils/enveomics/Scripts/FastA.rename.pl +65 -0
- data/utils/enveomics/Scripts/FastA.revcom.pl +23 -0
- data/utils/enveomics/Scripts/FastA.sample.rb +83 -0
- data/utils/enveomics/Scripts/FastA.slider.pl +85 -0
- data/utils/enveomics/Scripts/FastA.split.pl +55 -0
- data/utils/enveomics/Scripts/FastA.split.rb +79 -0
- data/utils/enveomics/Scripts/FastA.subsample.pl +131 -0
- data/utils/enveomics/Scripts/FastA.tag.rb +65 -0
- data/utils/enveomics/Scripts/FastA.wrap.rb +48 -0
- data/utils/enveomics/Scripts/FastQ.filter.pl +54 -0
- data/utils/enveomics/Scripts/FastQ.interpose.pl +90 -0
- data/utils/enveomics/Scripts/FastQ.offset.pl +90 -0
- data/utils/enveomics/Scripts/FastQ.split.pl +53 -0
- data/utils/enveomics/Scripts/FastQ.tag.rb +63 -0
- data/utils/enveomics/Scripts/FastQ.test-error.rb +81 -0
- data/utils/enveomics/Scripts/FastQ.toFastA.awk +24 -0
- data/utils/enveomics/Scripts/GFF.catsbj.pl +127 -0
- data/utils/enveomics/Scripts/GenBank.add_fields.rb +84 -0
- data/utils/enveomics/Scripts/HMM.essential.rb +351 -0
- data/utils/enveomics/Scripts/HMM.haai.rb +168 -0
- data/utils/enveomics/Scripts/HMMsearch.extractIds.rb +83 -0
- data/utils/enveomics/Scripts/JPlace.distances.rb +88 -0
- data/utils/enveomics/Scripts/JPlace.to_iToL.rb +320 -0
- data/utils/enveomics/Scripts/M5nr.getSequences.rb +81 -0
- data/utils/enveomics/Scripts/MeTaxa.distribution.pl +198 -0
- data/utils/enveomics/Scripts/MyTaxa.fragsByTax.pl +35 -0
- data/utils/enveomics/Scripts/MyTaxa.seq-taxrank.rb +49 -0
- data/utils/enveomics/Scripts/NCBIacc2tax.rb +92 -0
- data/utils/enveomics/Scripts/Newick.autoprune.R +27 -0
- data/utils/enveomics/Scripts/RAxML-EPA.to_iToL.pl +228 -0
- data/utils/enveomics/Scripts/RecPlot2.compareIdentities.R +32 -0
- data/utils/enveomics/Scripts/RefSeq.download.bash +48 -0
- data/utils/enveomics/Scripts/SRA.download.bash +57 -0
- data/utils/enveomics/Scripts/TRIBS.plot-test.R +36 -0
- data/utils/enveomics/Scripts/TRIBS.test.R +39 -0
- data/utils/enveomics/Scripts/Table.barplot.R +31 -0
- data/utils/enveomics/Scripts/Table.df2dist.R +30 -0
- data/utils/enveomics/Scripts/Table.filter.pl +61 -0
- data/utils/enveomics/Scripts/Table.merge.pl +77 -0
- data/utils/enveomics/Scripts/Table.replace.rb +69 -0
- data/utils/enveomics/Scripts/Table.round.rb +63 -0
- data/utils/enveomics/Scripts/Table.split.pl +57 -0
- data/utils/enveomics/Scripts/Taxonomy.silva2ncbi.rb +227 -0
- data/utils/enveomics/Scripts/VCF.KaKs.rb +147 -0
- data/utils/enveomics/Scripts/VCF.SNPs.rb +88 -0
- data/utils/enveomics/Scripts/aai.rb +418 -0
- data/utils/enveomics/Scripts/ani.rb +362 -0
- data/utils/enveomics/Scripts/clust.rand.rb +102 -0
- data/utils/enveomics/Scripts/gi2tax.rb +103 -0
- data/utils/enveomics/Scripts/in_silico_GA_GI.pl +96 -0
- data/utils/enveomics/Scripts/lib/data/dupont_2012_essential.hmm.gz +0 -0
- data/utils/enveomics/Scripts/lib/data/lee_2019_essential.hmm.gz +0 -0
- data/utils/enveomics/Scripts/lib/enveomics.R +1 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/enveomics.rb +24 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/jplace.rb +253 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/og.rb +182 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/remote_data.rb +74 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/seq_range.rb +237 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/stat.rb +30 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/vcf.rb +135 -0
- data/utils/enveomics/Scripts/ogs.annotate.rb +88 -0
- data/utils/enveomics/Scripts/ogs.core-pan.rb +160 -0
- data/utils/enveomics/Scripts/ogs.extract.rb +125 -0
- data/utils/enveomics/Scripts/ogs.mcl.rb +186 -0
- data/utils/enveomics/Scripts/ogs.rb +104 -0
- data/utils/enveomics/Scripts/ogs.stats.rb +131 -0
- data/utils/enveomics/Scripts/rbm.rb +146 -0
- data/utils/enveomics/Tests/Makefile +10 -0
- data/utils/enveomics/Tests/Mgen_M2288.faa +3189 -0
- data/utils/enveomics/Tests/Mgen_M2288.fna +8282 -0
- data/utils/enveomics/Tests/Mgen_M2321.fna +8288 -0
- data/utils/enveomics/Tests/Nequ_Kin4M.faa +2970 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae-PilA.tribs.Rdata +0 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae-PilA.txt +7 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae.aai-mat.tsv +17 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae.aai.tsv +137 -0
- data/utils/enveomics/Tests/a_mg.cds-go.blast.tsv +123 -0
- data/utils/enveomics/Tests/a_mg.reads-cds.blast.tsv +200 -0
- data/utils/enveomics/Tests/a_mg.reads-cds.counts.tsv +55 -0
- data/utils/enveomics/Tests/alkB.nwk +1 -0
- data/utils/enveomics/Tests/anthrax-cansnp-data.tsv +13 -0
- data/utils/enveomics/Tests/anthrax-cansnp-key.tsv +17 -0
- data/utils/enveomics/Tests/hiv1.faa +59 -0
- data/utils/enveomics/Tests/hiv1.fna +134 -0
- data/utils/enveomics/Tests/hiv2.faa +70 -0
- data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv +233 -0
- data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv.lim +1 -0
- data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv.rec +233 -0
- data/utils/enveomics/Tests/phyla_counts.tsv +10 -0
- data/utils/enveomics/Tests/primate_lentivirus.ogs +11 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-hiv1.rbm +9 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-hiv2.rbm +8 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-siv.rbm +6 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv2-hiv2.rbm +9 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv2-siv.rbm +6 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/siv-siv.rbm +6 -0
- data/utils/enveomics/build_enveomics_r.bash +45 -0
- data/utils/enveomics/enveomics.R/DESCRIPTION +31 -0
- data/utils/enveomics/enveomics.R/NAMESPACE +39 -0
- data/utils/enveomics/enveomics.R/R/autoprune.R +155 -0
- data/utils/enveomics/enveomics.R/R/barplot.R +184 -0
- data/utils/enveomics/enveomics.R/R/cliopts.R +135 -0
- data/utils/enveomics/enveomics.R/R/df2dist.R +154 -0
- data/utils/enveomics/enveomics.R/R/growthcurve.R +331 -0
- data/utils/enveomics/enveomics.R/R/recplot.R +354 -0
- data/utils/enveomics/enveomics.R/R/recplot2.R +1631 -0
- data/utils/enveomics/enveomics.R/R/tribs.R +583 -0
- data/utils/enveomics/enveomics.R/R/utils.R +50 -0
- data/utils/enveomics/enveomics.R/README.md +80 -0
- data/utils/enveomics/enveomics.R/data/growth.curves.rda +0 -0
- data/utils/enveomics/enveomics.R/data/phyla.counts.rda +0 -0
- data/utils/enveomics/enveomics.R/man/cash-enve.GrowthCurve-method.Rd +17 -0
- data/utils/enveomics/enveomics.R/man/cash-enve.RecPlot2-method.Rd +17 -0
- data/utils/enveomics/enveomics.R/man/cash-enve.RecPlot2.Peak-method.Rd +17 -0
- data/utils/enveomics/enveomics.R/man/enve.GrowthCurve-class.Rd +25 -0
- data/utils/enveomics/enveomics.R/man/enve.TRIBS-class.Rd +46 -0
- data/utils/enveomics/enveomics.R/man/enve.TRIBS.merge.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.TRIBStest-class.Rd +47 -0
- data/utils/enveomics/enveomics.R/man/enve.__prune.iter.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.__prune.reduce.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.__tribs.Rd +32 -0
- data/utils/enveomics/enveomics.R/man/enve.barplot.Rd +91 -0
- data/utils/enveomics/enveomics.R/man/enve.cliopts.Rd +57 -0
- data/utils/enveomics/enveomics.R/man/enve.col.alpha.Rd +24 -0
- data/utils/enveomics/enveomics.R/man/enve.col2alpha.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.df2dist.Rd +39 -0
- data/utils/enveomics/enveomics.R/man/enve.df2dist.group.Rd +38 -0
- data/utils/enveomics/enveomics.R/man/enve.df2dist.list.Rd +40 -0
- data/utils/enveomics/enveomics.R/man/enve.growthcurve.Rd +67 -0
- data/utils/enveomics/enveomics.R/man/enve.prune.dist.Rd +37 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot.Rd +122 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2-class.Rd +45 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.ANIr.Rd +24 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.Rd +68 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.__counts.Rd +25 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.__peakHist.Rd +21 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.__whichClosestPeak.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.changeCutoff.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.compareIdentities.Rd +41 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.coordinates.Rd +29 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.corePeak.Rd +18 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.extractWindows.Rd +40 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.Rd +36 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__em_e.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__em_m.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__emauto_one.Rd +27 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__mow_one.Rd +41 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__mower.Rd +17 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.em.Rd +43 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.emauto.Rd +37 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.mower.Rd +74 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.peak-class.Rd +59 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.seqdepth.Rd +27 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.windowDepthThreshold.Rd +32 -0
- data/utils/enveomics/enveomics.R/man/enve.tribs.Rd +59 -0
- data/utils/enveomics/enveomics.R/man/enve.tribs.test.Rd +28 -0
- data/utils/enveomics/enveomics.R/man/enve.truncate.Rd +27 -0
- data/utils/enveomics/enveomics.R/man/growth.curves.Rd +14 -0
- data/utils/enveomics/enveomics.R/man/phyla.counts.Rd +13 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.GrowthCurve.Rd +63 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.TRIBS.Rd +38 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.TRIBStest.Rd +38 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.recplot2.Rd +111 -0
- data/utils/enveomics/enveomics.R/man/summary.enve.GrowthCurve.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/summary.enve.TRIBS.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/summary.enve.TRIBStest.Rd +19 -0
- data/utils/enveomics/globals.mk +8 -0
- data/utils/enveomics/manifest.json +9 -0
- metadata +277 -4
data/utils/enveomics/Pipelines/assembly.pbs/stats.pbs
@@ -0,0 +1,57 @@
+#!/bin/bash
+#PBS -q iw-shared-6
+#PBS -l nodes=1:ppn=1
+#PBS -l mem=1gb
+#PBS -l walltime=3:00:00
+#PBS -k oe
+
+# Check mandatory variables
+if [[ "$LIB" == "" ]]; then
+  echo "Error: LIB is mandatory" >&2
+  exit 1;
+fi
+if [[ "$PDIR" == "" ]]; then
+  echo "Error: PDIR is mandatory" >&2
+  exit 1;
+fi
+
+# Run
+module load perl/5.14.4
+echo "K N50 used reads " > $LIB.velvet.n50
+echo "K N50 used reads " > $LIB.soap.n50
+for ID in $(seq 10 31); do
+  let KMER=$ID*2+1
+  DIRV="$LIB.velvet_$KMER"
+  DIRS="$LIB.soap_$KMER"
+  echo $KMER > $LIB.velvet.n50.$KMER
+  echo $KMER > $LIB.soap.n50.$KMER
+  # N50 (>=500)
+  perl "$PDIR/FastA.N50.pl" "$DIRV/contigs.fa" 500 | grep '^N50' | sed -e 's/.*: //' >> $LIB.velvet.n50.$KMER
+  perl "$PDIR/FastA.N50.pl" "$DIRS/O.contig" 500 | grep '^N50' | sed -e 's/.*: //' >> $LIB.soap.n50.$KMER
+  # Used and Total reads
+  tail -n 1 $DIRV/Log | sed -e 's/.* using \([0-9]*\)\/\([0-9]*\) reads.*/\1\n\2/' >> $LIB.velvet.n50.$KMER
+  if [ -e "$DIRS/O.readOnContig" ] ; then
+    cat "$DIRS/O.readOnContig" | grep -vc '^read' >> $LIB.soap.n50.$KMER
+  elif [ -e "$DIRS/O.readOnContig.gz" ] ; then
+    zcat "$DIRS/O.readOnContig.gz" | grep -vc '^read' >> $LIB.soap.n50.$KMER
+  else
+    echo 0 >> $LIB.soap.n50.$KMER
+  fi
+  head -n 1 $DIRS/O.peGrads | awk '{print $3}' >> $LIB.soap.n50.$KMER
+  # Join
+  (cat $LIB.velvet.n50.$KMER | tr "\n" " "; echo) >> $LIB.velvet.n50
+  rm $LIB.velvet.n50.$KMER
+  (cat $LIB.soap.n50.$KMER | tr "\n" " "; echo) >> $LIB.soap.n50
+  rm $LIB.soap.n50.$KMER
+done
+
+# Create plot
+module load R/3.1.2
+echo "
+source('$PDIR/kSelector.R');
+pdf('$LIB.n50.pdf', 13, 7);
+kSelector('$LIB.velvet.n50', '$LIB (Velvet)');
+kSelector('$LIB.soap.n50', '$LIB (SOAP)');
+dev.off();
+" | R --vanilla -q
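The `let KMER=$ID*2+1` step above sweeps only odd k-mer sizes (Velvet accepts odd hash lengths): IDs 10 through 31 map to k = 21, 23, …, 63. A quick stand-alone check of that mapping:

```shell
# Reproduce the k-mer sweep from the loop above: ID 10..31 -> odd k 21..63.
kmers=""
for ID in $(seq 10 31); do
  let KMER=$ID*2+1
  kmers="$kmers $KMER"
done
echo "$kmers"
```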
data/utils/enveomics/Pipelines/assembly.pbs/velvet.pbs
@@ -0,0 +1,63 @@
+#!/bin/bash
+#PBS -l nodes=1:ppn=1
+#PBS -k oe
+
+# Some defaults for the parameters
+FORMAT=${FORMAT:-fasta};
+INSLEN=${INSLEN:-300};
+USECOUPLED=${USECOUPLED:-yes};
+USESINGLE=${USESINGLE:-no};
+CLEANUP=${CLEANUP:-yes}
+
+# Check mandatory variables
+if [[ "$LIB" == "" ]]; then
+  echo "Error: LIB is mandatory" >&2
+  exit 1;
+fi
+if [[ "$PDIR" == "" ]]; then
+  echo "Error: PDIR is mandatory" >&2
+  exit 1;
+fi
+if [[ "$DATA" == "" ]]; then
+  echo "Error: DATA is mandatory" >&2
+  exit 1;
+fi
+
+# Prepare input
+KMER=$PBS_ARRAYID
+CWD=$(pwd)
+DIR="$CWD/$LIB.velvet_$KMER"
+
+# Run
+module load velvet/1.2.10
+echo velveth > $DIR.proc
+CMD="velveth_101_omp $DIR $KMER -$FORMAT"
+if [[ "$USECOUPLED" == "yes" ]]; then
+  CMD="$CMD -shortPaired $DATA/$LIB.CoupledReads.fa"
+fi
+if [[ "$USESINGLE" == "yes" ]]; then
+  CMD="$CMD -short $DATA/$LIB.SingleReads.fa"
+fi
+if [[ "$VELVETH_EXTRA" != "" ]]; then
+  CMD="$CMD $VELVETH_EXTRA"
+fi
+$CMD &> $DIR.hlog
+echo velvetg > $DIR.proc
+velvetg_101_omp "$DIR" -exp_cov auto -cov_cutoff auto -ins_length "$INSLEN" $VELVETG_EXTRA &> $DIR.glog
+if [[ -d $DIR ]] ; then
+  if [[ -s $DIR/contigs.fa ]] ; then
+    if [[ "$CLEANUP" != "no" ]] ; then
+      echo cleanup > $DIR.proc
+      rm $DIR/Sequences
+      rm $DIR/Roadmaps
+      rm $DIR/*Graph*
+    fi
+    echo done > $DIR.proc
+  else
+    echo "$0: Error: File $DIR/contigs.fa doesn't exist, something went wrong" >&2
+    exit 1
+  fi
+else
+  echo "$0: Error: Directory $DIR doesn't exist, something went wrong" >&2
+  exit 1
+fi
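The `${VAR:-default}` expansions at the top of this script let the caller override parameters from the environment (e.g. `USESINGLE=yes qsub …`) while supplying defaults otherwise. In brief:

```shell
# ${VAR:-default} yields VAR's value when set and non-empty, else the default.
unset FORMAT
echo "FORMAT=${FORMAT:-fasta}"   # unset: the default is used
FORMAT=fastq
echo "FORMAT=${FORMAT:-fasta}"   # set: the caller-provided value wins
```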
data/utils/enveomics/Pipelines/blast.pbs/01.pbs.bash
@@ -0,0 +1,38 @@
+# blast.pbs pipeline
+# Step 01 : Initialize input files
+
+# 00. Read configuration
+cd $SCRATCH ;
+TASK="dry" ;
+source "$PDIR/RUNME.bash" ;
+echo "$PBS_JOBID" > "$SCRATCH/success/01.00" ;
+
+if [[ ! -e "$SCRATCH/success/01.01" ]] ; then
+  # 01. BEGIN
+  REGISTER_JOB "01" "01" "Custom BEGIN function" \
+    && BEGIN \
+    || exit 1 ;
+  touch "$SCRATCH/success/01.01" ;
+fi
+
+if [[ ! -e "$SCRATCH/success/01.02" ]] ; then
+  # 02. Split
+  [[ -d "$SCRATCH/tmp/split" ]] && rm -R "$SCRATCH/tmp/split" ;
+  REGISTER_JOB "01" "02" "Splitting query files" \
+    && mkdir "$SCRATCH/tmp/split" \
+    && perl "$PDIR/FastA.split.pl" "$INPUT" "$SCRATCH/tmp/split/$PROJ" "$MAX_JOBS" \
+    || exit 1 ;
+  touch "$SCRATCH/success/01.02" ;
+fi ;
+
+if [[ ! -e "$SCRATCH/success/01.03" ]] ; then
+  # 03. Finalize
+  REGISTER_JOB "01" "03" "Finalizing input preparation" \
+    && mv "$SCRATCH/tmp/split" "$SCRATCH/tmp/in" \
+    || exit 1 ;
+  touch "$SCRATCH/success/01.03" ;
+fi ;
+
+[[ -d "$SCRATCH/tmp/out" ]] || ( mkdir "$SCRATCH/tmp/out" || exit 1 ) ;
+JOB_DONE "01" ;
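The `success/`-marker pattern used in each step above is what makes the pipeline resumable: a block runs only if its marker file is absent, and the marker is created only on success, so re-submitting the job picks up after the last completed step. A minimal generic sketch of the same idea (the `run_step` helper is hypothetical, not part of the pipeline):

```shell
# Hypothetical sketch of the success-marker checkpointing used above.
SCRATCH=$(mktemp -d)
mkdir -p "$SCRATCH/success"

run_step () {  # run_step <id> <command...>
  local id=$1; shift
  [[ -e "$SCRATCH/success/$id" ]] && return 0  # already done: skip
  "$@" && touch "$SCRATCH/success/$id"         # mark done only on success
}

run_step 01.01 echo "step one runs"
run_step 01.01 echo "skipped: marker 01.01 already exists"
run_step 01.02 false || true                   # fails, so no marker is written
```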
data/utils/enveomics/Pipelines/blast.pbs/02.pbs.bash
@@ -0,0 +1,73 @@
+# blast.pbs pipeline
+# Step 02 : Run BLAST
+
+# Read configuration
+cd $SCRATCH ;
+TASK="dry" ;
+source "$PDIR/RUNME.bash" ;
+
+# 00. Initial vars
+ID_N=$PBS_ARRAYID
+[[ "$ID_N" == "" ]] && exit 1 ;
+[[ -e "$SCRATCH/success/02.$ID_N" ]] && exit 0 ;
+IN="$SCRATCH/tmp/in/$PROJ.$ID_N.fa" ;
+OUT="$SCRATCH/tmp/out/$PROJ.blast.$ID_N" ;
+FINAL_OUT="$SCRATCH/results/$PROJ.$ID_N.blast" ;
+if [[ -e "$SCRATCH/success/02.$ID_N.00" ]] ; then
+  pre_job=$(cat "$SCRATCH/success/02.$ID_N.00") ;
+  state=$(qstat -f "$pre_job" 2>/dev/null | grep job_state | sed -e 's/.*= //')
+  if [[ "$state" == "R" ]] ; then
+    echo "Warning: This task is already being executed by $pre_job. Aborting." >&2 ;
+    exit 0 ;
+  elif [[ "$state" == "" ]] ; then
+    echo "Warning: This task was initialized by $pre_job, but it's currently not running. Superseding." >&2 ;
+  fi ;
+fi
+echo "$PBS_JOBID" > "$SCRATCH/success/02.$ID_N.00" ;
+
+# 01. Before BLAST
+if [[ ! -e "$SCRATCH/success/02.$ID_N.01" ]] ; then
+  BEFORE_BLAST "$IN" "$OUT" || exit 1 ;
+  touch "$SCRATCH/success/02.$ID_N.01" ;
+fi ;
+
+# 02. Run BLAST
+if [[ ! -e "$SCRATCH/success/02.$ID_N.02" ]] ; then
+  # Recover previous runs, if any
+  if [[ -s "$OUT" ]] ; then
+    perl "$PDIR/BlastTab.recover_job.pl" "$IN" "$OUT" \
+      || exit 1 ;
+  fi ;
+  # Run BLAST
+  RUN_BLAST "$IN" "$OUT" \
+    && mv "$OUT" "$OUT-z" \
+    || exit 1 ;
+  touch "$SCRATCH/success/02.$ID_N.02" ;
+fi ;
+
+# 03. Collect BLAST parts
+if [[ ! -e "$SCRATCH/success/02.$ID_N.03" ]] ; then
+  if [[ -e "$OUT" ]] ; then
+    echo "Warning: The file $OUT pre-exists, but the BLAST collection was incomplete." >&2 ;
+    echo "  I'm assuming that it corresponds to the first part of the result, but you should check manually." >&2 ;
+    echo "  The last lines are:" >&2 ;
+    tail -n 3 "$OUT" >&2 ;
+  else
+    touch "$OUT" || exit 1 ;
+  fi ;
+  for i in $(ls $OUT-*) ; do
+    cat "$i" >> "$OUT" ;
+    rm "$i" || exit 1 ;
+  done ;
+  mv "$OUT" "$FINAL_OUT"
+  touch "$SCRATCH/success/02.$ID_N.03" ;
+fi ;
+
+# 04. After BLAST
+if [[ ! -e "$SCRATCH/success/02.$ID_N.04" ]] ; then
+  AFTER_BLAST "$IN" "$FINAL_OUT" || exit 1 ;
+  touch "$SCRATCH/success/02.$ID_N.04" ;
+fi ;
+
+touch "$SCRATCH/success/02.$ID_N" ;
data/utils/enveomics/Pipelines/blast.pbs/03.pbs.bash
@@ -0,0 +1,21 @@
+# blast.pbs pipeline
+# Step 03 : Finalize
+
+# Read configuration
+cd $SCRATCH ;
+TASK="dry" ;
+source "$PDIR/RUNME.bash" ;
+PREFIX="$SCRATCH/results/$PROJ" ;
+OUT="$SCRATCH/$PROJ.blast" ;
+echo "$PBS_JOBID" > "$SCRATCH/success/02.00" ;
+
+# 01. END
+if [[ ! -e "$SCRATCH/success/03.01" ]] ; then
+  REGISTER_JOB "03" "01" "Custom END function" \
+    && END "$PREFIX" "$OUT" \
+    || exit 1 ;
+  touch "$SCRATCH/success/03.01" ;
+fi ;
+
+JOB_DONE "03" ;
data/utils/enveomics/Pipelines/blast.pbs/BlastTab.recover_job.pl
@@ -0,0 +1,72 @@
+#!/usr/bin/perl
+
+use warnings;
+use strict;
+use File::Copy;
+
+my($fasta, $blast) = @ARGV;
+
+($fasta and $blast) or die "
+.USAGE:
+   $0 query.fa blast.txt
+
+   query.fa    Query sequences in FastA format.
+   blast.txt   Incomplete BLAST output in tabular format.
+
+";
+
+print "Fixing $blast:\n";
+my $blast_res;
+for(my $i=0; 1; $i++){
+   $blast_res = "$blast-$i";
+   last unless -e $blast_res;
+}
+open BLAST, "<", $blast or die "Cannot read the file: $blast: $!\n";
+open TMP, ">", "$blast-tmp" or die "Cannot create the file: $blast-tmp: $!\n";
+my $last="";
+my $last_id="";
+my $before = "";
+while(my $ln=<BLAST>){
+   chomp $ln;
+   last unless $ln =~ m/(.+?)\t/;
+   my $id = $1;
+   if($id eq $last_id){
+      $last.= $ln."\n";
+   }else{
+      print TMP $last if $last;
+      $before = $last_id;
+      $last = $ln."\n";
+      $last_id = $id;
+   }
+}
+close BLAST;
+close TMP;
+
+move "$blast-tmp", $blast_res or die "Cannot move file $blast-tmp into $blast_res: $!\n";
+unlink $blast or die "Cannot delete file: $blast: $!\n";
+
+unless($before eq ""){
+   print "[$before] ";
+   $before = ">$before";
+
+   open FASTA, "<", $fasta or die "Cannot read file: $fasta: $!\n";
+   open TMP, ">", "$fasta-tmp" or die "Cannot create file: $fasta-tmp: $!\n";
+   my $print = 0;
+   my $at = 0;
+   my $i = 0;
+   while(my $ln=<FASTA>){
+      $i++;
+      $print = 1 if $at and $ln =~ /^>/;
+      print TMP $ln if $print;
+      $ln =~ s/\s+.*//;
+      chomp $ln;
+      $at = $i if $ln eq $before;
+   }
+   close TMP;
+   close FASTA;
+   printf 'recovered at %.2f%% (%d/%d).'."\n", 100*$at/$i, $at, $i if $i;
+
+   move $fasta, "$fasta.old" or die "Cannot move file $fasta into $fasta.old: $!\n";
+   move "$fasta-tmp", $fasta or die "Cannot move file $fasta-tmp into $fasta: $!\n";
+}
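BlastTab.recover_job.pl truncates a partial tabular result at the last complete query (the final query's hits may have been cut off mid-write) and rewrites the query FastA so the job resumes from there. The core truncation idea can be sketched in awk (a simplified stand-in for illustration, not the Perl implementation above):

```shell
# Two passes over the same file: pass one records the last query ID seen
# (q2, possibly incomplete); pass two keeps every other query's rows.
f=$(mktemp)
printf 'q1\ts1\nq1\ts2\nq2\ts1\n' > "$f"
awk -F'\t' 'NR==FNR {last=$1; next} $1 != last' "$f" "$f"
```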
@@ -0,0 +1,98 @@
|
|
1
|
+
#!/bin/bash
|
2
|
+
|
3
|
+
##################### VARIABLES
|
4
|
+
# Queue and resources.
|
5
|
+
QUEUE="iw-shared-6" ;
|
6
|
+
MAX_JOBS=500 ; # Maximum number of concurrent jobs. Never exceed 1990.
|
7
|
+
PPN=2 ;
|
8
|
+
RAM="9gb" ;
|
9
|
+
|
10
|
+
# Paths
|
11
|
+
SCRATCH_DIR="$HOME/scratch/pipelines/blast" ; # Where the outputs and temporals will be created
|
12
|
+
INPUT="$HOME/data/my-large-file.fasta" ; # Input query file
|
13
|
+
DB="$HOME/data/db/nr" ; # Input database
|
14
|
+
PROGRAM="blastp" ;
|
15
|
+
|
16
|
+
# Pipeline
|
17
|
+
MAX_TRIALS=5 ; # Maximum number of automated attempts to re-start a job
|
18
|
+
|
19
|
+
##################### FUNCTIONS
|
20
|
+
## All the functions below can be edited to suit your particular job.
|
21
|
+
## No function can be empty, but you can use a "dummy" function (like true).
|
22
|
+
## All functions have access to any of the variables defined above.
|
23
|
+
##
|
24
|
+
## The functions are executed in the following order (from left to right):
|
25
|
+
##
|
26
|
+
## / -----> BEFORE_BLAST --> RUN_BLAST --> AFTER_BLAST ---\
|
27
|
+
## / ··· ··· ··· \
|
28
|
+
## BEGIN --#--------> BEFORE_BLAST --> RUN_BLAST --> AFTER_BLAST -----#---> END
|
29
|
+
## \ ··· ··· ··· /
|
30
|
+
## \ -----> BEFORE_BLAST --> RUN_BLAST --> AFTER_BLAST ---/
|
31
|
+
##
|
32
|
+
|
33
|
+
# Function to execute ONLY ONCE at the begining
|
34
|
+
function BEGIN {
|
35
|
+
### Format the database (assuming proteins, check commands):
|
36
|
+
# module load ncbi_blast/2.2.25 || exit 1 ;
|
37
|
+
# makeblastdb -in $HOME/data/some-database.faa -title $DB -dbtype prot || exit 1 ;
|
38
|
+
# module unload ncbi_blast/2.2.25 || exit 1 ;
|
39
|
+
### Don't do anything:
|
40
|
+
true ;
|
41
|
+
}
|
42
|
+
|
43
|
+
# Function to execute BEFORE running the BLAST, for each sub-task.
|
44
|
+
function BEFORE_BLAST {
|
45
|
+
local IN=$1 # Query file
|
46
|
+
local OUT=$2 # Blast file (to be created)
|
47
|
+
### Don't do anything:
|
48
|
+
true ;
|
49
|
+
}
|
50
|
+
|
51
|
+
# Function that executes BLAST, for each sub-task
|
52
|
+
function RUN_BLAST {
|
  local IN=$1   # Query file
  local OUT=$2  # Blast file (to be created)
  ### Run BLAST+ with 13th and 14th columns (query length and subject length):
  module load ncbi_blast/2.2.28_binary || exit 1 ;
  $PROGRAM -query $IN -db $DB -out $OUT -num_threads $PPN \
    -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen" \
    || exit 1 ;
  module unload ncbi_blast/2.2.28_binary || exit 1 ;
  ### Run BLAT (nucleotides):
  # module load blat/rhel6 || exit 1 ;
  # blat $DB $IN -out=blast8 $OUT || exit 1 ;
  # module unload blat/rhel6 || exit 1 ;
  ### Run BLAT (proteins):
  # module load blat/rhel6 || exit 1 ;
  # blat $DB $IN -out=blast8 -prot $OUT || exit 1 ;
  # module unload blat/rhel6 || exit 1 ;
}

# Function to execute AFTER running the BLAST, for each sub-task
function AFTER_BLAST {
  local IN=$1   # Query file
  local OUT=$2  # Blast file
  ### Filter by best-match:
  # sort $OUT | perl $PDIR/../../Scripts/BlastTab.best_hit_sorted.pl > $OUT.bm
  ### Filter by Bit-score 60:
  # awk '$12>=60' $OUT > $OUT.bs60
  ### Filter by corrected identity 95 (only if it has the additional 13th column):
  # awk '$3*$4/$13 >= 95' $OUT > $OUT.ci95
  ### Don't do anything:
  true ;
}

# Function to execute ONLY ONCE at the end, to concatenate the results
function END {
  local PREFIX=$1  # Prefix of all Blast files
  local OUT=$2     # Single Blast output (to be created)
  ### Simply concatenate files:
  # cat $PREFIX.*.blast > $OUT
  ### Concatenate only the filtered files (if filtering in AFTER_BLAST):
  # cat $PREFIX.*.blast.bs60 > $OUT
  ### Sort the BLAST output by query (might require considerable RAM):
  # sort -k 1 $PREFIX.*.blast > $OUT
  ### Don't do anything:
  true ;
}
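The commented-out corrected-identity filter in `AFTER_BLAST` can be sanity-checked locally before a full run. The sketch below is illustrative only (the input line is made up, not part of the package): a hit with 90% identity over 500 of 600 query bases has a corrected identity of 90*500/600 = 75, below the 95 cutoff, so the line is dropped.

```shell
# Mock 14-column tabular hit (qlen is column 13, here 600).
# Corrected identity = pident * length / qlen = 90 * 500 / 600 = 75.
printf 'q1\ts1\t90.0\t500\t10\t2\t1\t500\t1\t500\t1e-50\t800\t600\t650\n' \
  | awk '$3*$4/$13 >= 95'
# No output: 75 < 95, so the hit is filtered out.
```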
@@ -0,0 +1 @@
../../Scripts/FastA.split.pl
@@ -0,0 +1,127 @@
@author: Luis Miguel Rodriguez-R <lmrodriguezr at gmail dot com>

@update: Feb-20-2014

@license: artistic 2.0

@status: auto

@pbs: yes

# IMPORTANT

This pipeline was developed for the [PACE cluster](http://pace.gatech.edu/). You
are free to use it on other platforms with adequate adjustments.

# PURPOSE

Simplifies submitting and tracking large BLAST jobs in a cluster.

# HELP

1. Files preparation:

   1.1. Obtain the enveomics package in the cluster. You can use:
   `git clone https://github.com/lmrodriguezr/enveomics.git`

   1.2. Prepare the query sequences and the database.

   1.3. Copy the file `CONFIG.mock.bash` to `CONFIG.<name>.bash`, where `<name>` is a
   short name for your project (avoid characters other than alphanumerics).

   1.4. Change the variables in `CONFIG.<name>.bash`. The **Queue and resources** and
   **Pipeline** variables are fairly standard and can usually be kept unchanged. The
   **Paths** variables indicate where your input files are and where the output files
   are to be created, so check them carefully. Finally, the **FUNCTIONS** define the
   core functionality of the pipeline and should also be reviewed. By default, the
   pipeline simply runs BLAST+ with default parameters and tabular output with two
   extra columns (qlen and slen). However, additional functionality can easily be
   incorporated via these functions, such as BLAST filtering, concatenation, sorting,
   or even the execution of other programs instead of BLAST, such as BLAT. Note that
   the output MUST be BLAST-like tabular, because this is the only format supported
   to check completeness and recover incomplete runs.

2. Pipeline execution:

   2.1. To initialize a run, execute: `./RUNME.bash <name> run`.

   2.2. To check the status of a job, execute: `./RUNME.bash <name> check`.

   2.3. To pause a run, execute: `./RUNME.bash <name> pause` (see 2.1 to resume).

   2.4. To check if your CONFIG defines all required parameters, execute: `./RUNME.bash <name> dry`.

   2.5. To review all the e/o files in the run, execute: `./RUNME.bash <name> eo`.

3. Finalizing:

   3.1. `./RUNME.bash <name> check` will inform you if a project finished. If it finished
   successfully, you can review your (split) results in $SCRATCH/results. If you
   concatenated the results in the `END` function, you should have a file with all the
   results in $SCRATCH/<name>.blast.

   3.2. Checking the e/o files at the end is usually a good idea (`./RUNME.bash <name> eo`).
   However, bear in mind that this pipeline can overcome several errors and is robust to
   most failures, so don't be alarmed at the first sight of errors.
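As a sketch of what step 1.4 involves, here is a hypothetical excerpt of a filled-in `CONFIG.myproj.bash`. Only the variable names also used elsewhere in this diff (`PROGRAM`, `DB`, `PPN`, `SCRATCH`, `MAX_JOBS`) are taken from the package; all values are illustrative, and your copy of `CONFIG.mock.bash` is the authoritative list of variables.

```shell
# Illustrative values only; adjust to your own project and cluster.
PROGRAM="blastp"                       # Program invoked on each query chunk
DB="$HOME/refdb/proteins"              # Formatted BLAST database
PPN=4                                  # Threads per sub-job (-num_threads)
SCRATCH="$HOME/scratch/blast-myproj"   # Where results and success/ files go
MAX_JOBS=500                           # Sub-jobs per array
```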
# Comments

* Some scripts contained in this package are actually symlinks to files in the
  _Scripts_ folder. Check the existence of these files when copied to the cluster.
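One way to act on the note above is to list symlinks whose targets did not survive the copy. A minimal sketch, assuming the package was copied to `~/enveomics` (adjust the path):

```shell
# Print every symlink under the copied package whose target is missing
# (test -e dereferences the link; ! inverts it, so only broken links print).
find ~/enveomics -type l ! -exec test -e {} \; -print
```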
# Troubleshooting

1. Do I really have to change directory (`cd`) to the pipeline's folder every time I want
to execute something?

   No, not really. For simplicity, this file tells you to execute `./RUNME.bash`; however,
   you don't really have to be there, you can execute it from any location. For example,
   if you saved enveomics in your home directory, you can simply execute
   `~/enveomics/blast.pbs/RUNME.bash` instead, from any location in the head node.

2. When I check a project, a few sub-jobs remain Active for much longer than the others.
How do I know if those sub-jobs are really active?

   Let's review an example of a problematic run. When you run `./RUNME.bash <name> check`,
   you see the following in the "Active jobs" section:
   ````
   Idle: 155829.shared-sched.pace.gatech.edu: 02: 00: Mon Mar 17 14:10:28 EDT 2014
   Sub-jobs:500  Active:4 ( 0.8% )  Eligible:0 ( 0.0% )  Blocked:0 ( 0.0% )  Completed:496 ( 99.2% )
   Idle: 155830.shared-sched.pace.gatech.edu: 02: 00: Mon Mar 17 14:10:28 EDT 2014

   Running jobs: 0.
   Idle jobs: 2.
   ````
   This means that job "155829.shared-sched.pace.gatech.edu" has four Active sub-jobs,
   while all the others are Completed. This is a sign of something problematic. You can
   see the complete status of each array using `checkjob -v <JOB_NAME>`; in the example
   above, you would run `checkjob -v 155829`. In the output of checkjob, most jobs should
   report "Completed". In this example, four jobs are not complete:
   ````
   ...
   387 : 155829[387] : Completed
   388 : 155829[388] : Running
   389 : 155829[389] : Running
   390 : 155829[390] : Running
   391 : 155829[391] : Running
   392 : 155829[392] : Completed
   ...
   ````
   You can then check these sub-jobs in more detail. For example, if I run
   `checkjob -v 155829[388]`, I see that the job is running on the machine
   `iw-k30-12.pace.gatech.edu` (Task Distribution), so I can try to log into that machine
   and check if the job is actually running, using `top -u $(whoami)`. However, when I run
   `ssh iw-k30-12`, I get a "Connection closed" error, which means that the machine hung
   up. At this point, you might want to try one of the following solutions:

   2.1. Pause the project using `./RUNME.bash <name> pause`, wait a few minutes, and
   resume using `./RUNME.bash <name> run`. If you have tried this a couple of times and
   you still have sub-jobs hanging, try:

   2.2. Check if your sub-jobs actually finished. Sometimes sub-jobs die too soon to
   return a success code, even though they actually finished. Run the following command:
   `ls <SCRATCH>/<name>/success/02.* | wc -l`, where `<SCRATCH>` is the value you set for
   the `SCRATCH` variable in the CONFIG file and `<name>` is the name of your project. If
   the output of that command is exactly six times the number of jobs (`MAX_JOBS` in the
   CONFIG file, typically 500), then your step 2 actually finished. In my case, I have 500
   jobs and the output was 3000 (6*500), so my jobs finished successfully but the pipeline
   didn't notice. You can manually tell the pipeline to go on by running
   `touch <SCRATCH>/<name>/success/02` and pausing/resuming the project (see 2.1 above).
   If the output is not the expected number, DON'T RUN `touch`; just try solution 2.1
   once again.
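The counting logic in 2.2 can be sketched with a mock success/ directory. Everything here is illustrative (a temporary directory standing in for `<SCRATCH>/<name>/success`, made-up marker file names, and `MAX_JOBS=3` instead of the typical 500); only the six-files-per-sub-job check itself comes from the text above.

```shell
dir=$(mktemp -d)                 # stands in for <SCRATCH>/<name>/success
MAX_JOBS=3                       # CONFIG value; typically 500
for i in 1 2 3; do               # one batch of mock marker files per sub-job
  for j in a b c d e f; do touch "$dir/02.$i.$j"; done
done
n=$(ls "$dir"/02.* | wc -l | tr -d ' ')   # the count from 2.2
if [ "$n" -eq $((6 * MAX_JOBS)) ]; then
  echo "step 2 complete"         # only now is touch .../success/02 safe
fi
# Prints "step 2 complete": 18 files = 6 * 3 sub-jobs.
```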