miga-base 0.7.26.0 → 1.0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/lib/miga/_data/aai-intax.blast.tsv.gz +0 -0
- data/lib/miga/_data/aai-intax.diamond.tsv.gz +0 -0
- data/lib/miga/_data/aai-novel.blast.tsv.gz +0 -0
- data/lib/miga/_data/aai-novel.diamond.tsv.gz +0 -0
- data/lib/miga/cli/action/classify_wf.rb +2 -2
- data/lib/miga/cli/action/derep_wf.rb +1 -1
- data/lib/miga/cli/action/doctor.rb +57 -14
- data/lib/miga/cli/action/doctor/base.rb +47 -23
- data/lib/miga/cli/action/init.rb +11 -7
- data/lib/miga/cli/action/init/files_helper.rb +1 -0
- data/lib/miga/cli/action/ncbi_get.rb +3 -3
- data/lib/miga/cli/action/tax_dist.rb +2 -2
- data/lib/miga/cli/action/wf.rb +5 -4
- data/lib/miga/common.rb +1 -0
- data/lib/miga/daemon.rb +11 -4
- data/lib/miga/dataset/result.rb +10 -6
- data/lib/miga/json.rb +5 -4
- data/lib/miga/metadata.rb +5 -1
- data/lib/miga/parallel.rb +36 -0
- data/lib/miga/project.rb +8 -8
- data/lib/miga/project/base.rb +4 -4
- data/lib/miga/project/result.rb +2 -2
- data/lib/miga/sqlite.rb +10 -2
- data/lib/miga/version.rb +23 -9
- data/scripts/aai_distances.bash +16 -18
- data/scripts/ani_distances.bash +16 -17
- data/scripts/assembly.bash +31 -16
- data/scripts/haai_distances.bash +3 -27
- data/scripts/miga.bash +6 -4
- data/scripts/p.bash +1 -1
- data/scripts/read_quality.bash +9 -18
- data/scripts/trimmed_fasta.bash +14 -30
- data/scripts/trimmed_reads.bash +36 -36
- data/test/parallel_test.rb +31 -0
- data/test/project_test.rb +2 -1
- data/test/remote_dataset_test.rb +1 -1
- data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Archaea_SCG.hmm +41964 -0
- data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Bacteria_SCG.hmm +32439 -0
- data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Complete_SCG_DB.hmm +62056 -0
- data/utils/FastAAI/FastAAI/FastAAI +1336 -0
- data/utils/FastAAI/README.md +84 -0
- data/utils/FastAAI/kAAI_v1.0_virus.py +1296 -0
- data/utils/distance/commands.rb +1 -0
- data/utils/distance/database.rb +0 -1
- data/utils/distance/runner.rb +2 -4
- data/utils/enveomics/Docs/recplot2.md +244 -0
- data/utils/enveomics/Examples/aai-matrix.bash +66 -0
- data/utils/enveomics/Examples/ani-matrix.bash +66 -0
- data/utils/enveomics/Examples/essential-phylogeny.bash +105 -0
- data/utils/enveomics/Examples/unus-genome-phylogeny.bash +100 -0
- data/utils/enveomics/LICENSE.txt +73 -0
- data/utils/enveomics/Makefile +52 -0
- data/utils/enveomics/Manifest/Tasks/aasubs.json +103 -0
- data/utils/enveomics/Manifest/Tasks/blasttab.json +786 -0
- data/utils/enveomics/Manifest/Tasks/distances.json +161 -0
- data/utils/enveomics/Manifest/Tasks/fasta.json +802 -0
- data/utils/enveomics/Manifest/Tasks/fastq.json +291 -0
- data/utils/enveomics/Manifest/Tasks/graphics.json +126 -0
- data/utils/enveomics/Manifest/Tasks/mapping.json +137 -0
- data/utils/enveomics/Manifest/Tasks/ogs.json +382 -0
- data/utils/enveomics/Manifest/Tasks/other.json +906 -0
- data/utils/enveomics/Manifest/Tasks/remote.json +355 -0
- data/utils/enveomics/Manifest/Tasks/sequence-identity.json +638 -0
- data/utils/enveomics/Manifest/Tasks/tables.json +308 -0
- data/utils/enveomics/Manifest/Tasks/trees.json +68 -0
- data/utils/enveomics/Manifest/Tasks/variants.json +111 -0
- data/utils/enveomics/Manifest/categories.json +165 -0
- data/utils/enveomics/Manifest/examples.json +154 -0
- data/utils/enveomics/Manifest/tasks.json +4 -0
- data/utils/enveomics/Pipelines/assembly.pbs/CONFIG.mock.bash +69 -0
- data/utils/enveomics/Pipelines/assembly.pbs/FastA.N50.pl +1 -0
- data/utils/enveomics/Pipelines/assembly.pbs/FastA.filterN.pl +1 -0
- data/utils/enveomics/Pipelines/assembly.pbs/FastA.length.pl +1 -0
- data/utils/enveomics/Pipelines/assembly.pbs/README.md +189 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME-2.bash +112 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME-3.bash +23 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME-4.bash +44 -0
- data/utils/enveomics/Pipelines/assembly.pbs/RUNME.bash +50 -0
- data/utils/enveomics/Pipelines/assembly.pbs/kSelector.R +37 -0
- data/utils/enveomics/Pipelines/assembly.pbs/newbler.pbs +68 -0
- data/utils/enveomics/Pipelines/assembly.pbs/newbler_preparator.pl +49 -0
- data/utils/enveomics/Pipelines/assembly.pbs/soap.pbs +80 -0
- data/utils/enveomics/Pipelines/assembly.pbs/stats.pbs +57 -0
- data/utils/enveomics/Pipelines/assembly.pbs/velvet.pbs +63 -0
- data/utils/enveomics/Pipelines/blast.pbs/01.pbs.bash +38 -0
- data/utils/enveomics/Pipelines/blast.pbs/02.pbs.bash +73 -0
- data/utils/enveomics/Pipelines/blast.pbs/03.pbs.bash +21 -0
- data/utils/enveomics/Pipelines/blast.pbs/BlastTab.recover_job.pl +72 -0
- data/utils/enveomics/Pipelines/blast.pbs/CONFIG.mock.bash +98 -0
- data/utils/enveomics/Pipelines/blast.pbs/FastA.split.pl +1 -0
- data/utils/enveomics/Pipelines/blast.pbs/README.md +127 -0
- data/utils/enveomics/Pipelines/blast.pbs/RUNME.bash +109 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.check.bash +128 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.dry.bash +16 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.eo.bash +22 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.pause.bash +26 -0
- data/utils/enveomics/Pipelines/blast.pbs/TASK.run.bash +89 -0
- data/utils/enveomics/Pipelines/blast.pbs/sentinel.pbs.bash +29 -0
- data/utils/enveomics/Pipelines/idba.pbs/README.md +49 -0
- data/utils/enveomics/Pipelines/idba.pbs/RUNME.bash +95 -0
- data/utils/enveomics/Pipelines/idba.pbs/run.pbs +56 -0
- data/utils/enveomics/Pipelines/trim.pbs/README.md +54 -0
- data/utils/enveomics/Pipelines/trim.pbs/RUNME.bash +70 -0
- data/utils/enveomics/Pipelines/trim.pbs/run.pbs +130 -0
- data/utils/enveomics/README.md +42 -0
- data/utils/enveomics/Scripts/AAsubs.log2ratio.rb +171 -0
- data/utils/enveomics/Scripts/Aln.cat.rb +221 -0
- data/utils/enveomics/Scripts/Aln.convert.pl +35 -0
- data/utils/enveomics/Scripts/AlphaDiversity.pl +152 -0
- data/utils/enveomics/Scripts/BedGraph.tad.rb +93 -0
- data/utils/enveomics/Scripts/BedGraph.window.rb +71 -0
- data/utils/enveomics/Scripts/BlastPairwise.AAsubs.pl +102 -0
- data/utils/enveomics/Scripts/BlastTab.addlen.rb +63 -0
- data/utils/enveomics/Scripts/BlastTab.advance.bash +48 -0
- data/utils/enveomics/Scripts/BlastTab.best_hit_sorted.pl +55 -0
- data/utils/enveomics/Scripts/BlastTab.catsbj.pl +104 -0
- data/utils/enveomics/Scripts/BlastTab.cogCat.rb +76 -0
- data/utils/enveomics/Scripts/BlastTab.filter.pl +47 -0
- data/utils/enveomics/Scripts/BlastTab.kegg_pep2path_rest.pl +194 -0
- data/utils/enveomics/Scripts/BlastTab.metaxaPrep.pl +104 -0
- data/utils/enveomics/Scripts/BlastTab.pairedHits.rb +157 -0
- data/utils/enveomics/Scripts/BlastTab.recplot2.R +48 -0
- data/utils/enveomics/Scripts/BlastTab.seqdepth.pl +86 -0
- data/utils/enveomics/Scripts/BlastTab.seqdepth_ZIP.pl +119 -0
- data/utils/enveomics/Scripts/BlastTab.seqdepth_nomedian.pl +86 -0
- data/utils/enveomics/Scripts/BlastTab.subsample.pl +47 -0
- data/utils/enveomics/Scripts/BlastTab.sumPerHit.pl +114 -0
- data/utils/enveomics/Scripts/BlastTab.taxid2taxrank.pl +90 -0
- data/utils/enveomics/Scripts/BlastTab.topHits_sorted.rb +101 -0
- data/utils/enveomics/Scripts/Chao1.pl +97 -0
- data/utils/enveomics/Scripts/CharTable.classify.rb +234 -0
- data/utils/enveomics/Scripts/EBIseq2tax.rb +83 -0
- data/utils/enveomics/Scripts/FastA.N50.pl +60 -0
- data/utils/enveomics/Scripts/FastA.extract.rb +152 -0
- data/utils/enveomics/Scripts/FastA.filter.pl +52 -0
- data/utils/enveomics/Scripts/FastA.filterLen.pl +28 -0
- data/utils/enveomics/Scripts/FastA.filterN.pl +60 -0
- data/utils/enveomics/Scripts/FastA.fragment.rb +100 -0
- data/utils/enveomics/Scripts/FastA.gc.pl +42 -0
- data/utils/enveomics/Scripts/FastA.interpose.pl +93 -0
- data/utils/enveomics/Scripts/FastA.length.pl +38 -0
- data/utils/enveomics/Scripts/FastA.mask.rb +89 -0
- data/utils/enveomics/Scripts/FastA.per_file.pl +36 -0
- data/utils/enveomics/Scripts/FastA.qlen.pl +57 -0
- data/utils/enveomics/Scripts/FastA.rename.pl +65 -0
- data/utils/enveomics/Scripts/FastA.revcom.pl +23 -0
- data/utils/enveomics/Scripts/FastA.sample.rb +98 -0
- data/utils/enveomics/Scripts/FastA.slider.pl +85 -0
- data/utils/enveomics/Scripts/FastA.split.pl +55 -0
- data/utils/enveomics/Scripts/FastA.split.rb +79 -0
- data/utils/enveomics/Scripts/FastA.subsample.pl +131 -0
- data/utils/enveomics/Scripts/FastA.tag.rb +65 -0
- data/utils/enveomics/Scripts/FastA.toFastQ.rb +69 -0
- data/utils/enveomics/Scripts/FastA.wrap.rb +48 -0
- data/utils/enveomics/Scripts/FastQ.filter.pl +54 -0
- data/utils/enveomics/Scripts/FastQ.interpose.pl +90 -0
- data/utils/enveomics/Scripts/FastQ.maskQual.rb +89 -0
- data/utils/enveomics/Scripts/FastQ.offset.pl +90 -0
- data/utils/enveomics/Scripts/FastQ.split.pl +53 -0
- data/utils/enveomics/Scripts/FastQ.tag.rb +70 -0
- data/utils/enveomics/Scripts/FastQ.test-error.rb +81 -0
- data/utils/enveomics/Scripts/FastQ.toFastA.awk +24 -0
- data/utils/enveomics/Scripts/GFF.catsbj.pl +127 -0
- data/utils/enveomics/Scripts/GenBank.add_fields.rb +84 -0
- data/utils/enveomics/Scripts/HMM.essential.rb +351 -0
- data/utils/enveomics/Scripts/HMM.haai.rb +168 -0
- data/utils/enveomics/Scripts/HMMsearch.extractIds.rb +83 -0
- data/utils/enveomics/Scripts/JPlace.distances.rb +88 -0
- data/utils/enveomics/Scripts/JPlace.to_iToL.rb +320 -0
- data/utils/enveomics/Scripts/M5nr.getSequences.rb +81 -0
- data/utils/enveomics/Scripts/MeTaxa.distribution.pl +198 -0
- data/utils/enveomics/Scripts/MyTaxa.fragsByTax.pl +35 -0
- data/utils/enveomics/Scripts/MyTaxa.seq-taxrank.rb +49 -0
- data/utils/enveomics/Scripts/NCBIacc2tax.rb +92 -0
- data/utils/enveomics/Scripts/Newick.autoprune.R +27 -0
- data/utils/enveomics/Scripts/RAxML-EPA.to_iToL.pl +228 -0
- data/utils/enveomics/Scripts/RecPlot2.compareIdentities.R +32 -0
- data/utils/enveomics/Scripts/RefSeq.download.bash +48 -0
- data/utils/enveomics/Scripts/SRA.download.bash +55 -0
- data/utils/enveomics/Scripts/TRIBS.plot-test.R +36 -0
- data/utils/enveomics/Scripts/TRIBS.test.R +39 -0
- data/utils/enveomics/Scripts/Table.barplot.R +31 -0
- data/utils/enveomics/Scripts/Table.df2dist.R +30 -0
- data/utils/enveomics/Scripts/Table.filter.pl +61 -0
- data/utils/enveomics/Scripts/Table.merge.pl +77 -0
- data/utils/enveomics/Scripts/Table.prefScore.R +60 -0
- data/utils/enveomics/Scripts/Table.replace.rb +69 -0
- data/utils/enveomics/Scripts/Table.round.rb +63 -0
- data/utils/enveomics/Scripts/Table.split.pl +57 -0
- data/utils/enveomics/Scripts/Taxonomy.silva2ncbi.rb +227 -0
- data/utils/enveomics/Scripts/VCF.KaKs.rb +147 -0
- data/utils/enveomics/Scripts/VCF.SNPs.rb +88 -0
- data/utils/enveomics/Scripts/aai.rb +419 -0
- data/utils/enveomics/Scripts/ani.rb +362 -0
- data/utils/enveomics/Scripts/anir.rb +137 -0
- data/utils/enveomics/Scripts/clust.rand.rb +102 -0
- data/utils/enveomics/Scripts/gi2tax.rb +103 -0
- data/utils/enveomics/Scripts/in_silico_GA_GI.pl +96 -0
- data/utils/enveomics/Scripts/lib/data/dupont_2012_essential.hmm.gz +0 -0
- data/utils/enveomics/Scripts/lib/data/lee_2019_essential.hmm.gz +0 -0
- data/utils/enveomics/Scripts/lib/enveomics.R +1 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/anir.rb +293 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/bm_set.rb +175 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/enveomics.rb +24 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/errors.rb +17 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/gmm_em.rb +30 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/jplace.rb +253 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/match.rb +63 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/og.rb +182 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/rbm.rb +49 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/remote_data.rb +74 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/seq_range.rb +237 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/stats.rb +3 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/stats/rand.rb +31 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/stats/sample.rb +152 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/utils.rb +73 -0
- data/utils/enveomics/Scripts/lib/enveomics_rb/vcf.rb +135 -0
- data/utils/enveomics/Scripts/ogs.annotate.rb +88 -0
- data/utils/enveomics/Scripts/ogs.core-pan.rb +160 -0
- data/utils/enveomics/Scripts/ogs.extract.rb +125 -0
- data/utils/enveomics/Scripts/ogs.mcl.rb +186 -0
- data/utils/enveomics/Scripts/ogs.rb +104 -0
- data/utils/enveomics/Scripts/ogs.stats.rb +131 -0
- data/utils/enveomics/Scripts/rbm-legacy.rb +172 -0
- data/utils/enveomics/Scripts/rbm.rb +100 -0
- data/utils/enveomics/Scripts/sam.filter.rb +148 -0
- data/utils/enveomics/Tests/Makefile +10 -0
- data/utils/enveomics/Tests/Mgen_M2288.faa +3189 -0
- data/utils/enveomics/Tests/Mgen_M2288.fna +8282 -0
- data/utils/enveomics/Tests/Mgen_M2321.fna +8288 -0
- data/utils/enveomics/Tests/Nequ_Kin4M.faa +2970 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae-PilA.tribs.Rdata +0 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae-PilA.txt +7 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae.aai-mat.tsv +17 -0
- data/utils/enveomics/Tests/Xanthomonas_oryzae.aai.tsv +137 -0
- data/utils/enveomics/Tests/a_mg.cds-go.blast.tsv +123 -0
- data/utils/enveomics/Tests/a_mg.reads-cds.blast.tsv +200 -0
- data/utils/enveomics/Tests/a_mg.reads-cds.counts.tsv +55 -0
- data/utils/enveomics/Tests/alkB.nwk +1 -0
- data/utils/enveomics/Tests/anthrax-cansnp-data.tsv +13 -0
- data/utils/enveomics/Tests/anthrax-cansnp-key.tsv +17 -0
- data/utils/enveomics/Tests/hiv1.faa +59 -0
- data/utils/enveomics/Tests/hiv1.fna +134 -0
- data/utils/enveomics/Tests/hiv2.faa +70 -0
- data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv +233 -0
- data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv.lim +1 -0
- data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv.rec +233 -0
- data/utils/enveomics/Tests/phyla_counts.tsv +10 -0
- data/utils/enveomics/Tests/primate_lentivirus.ogs +11 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-hiv1.rbm +9 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-hiv2.rbm +8 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-siv.rbm +6 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv2-hiv2.rbm +9 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv2-siv.rbm +6 -0
- data/utils/enveomics/Tests/primate_lentivirus.rbm/siv-siv.rbm +6 -0
- data/utils/enveomics/build_enveomics_r.bash +45 -0
- data/utils/enveomics/enveomics.R/DESCRIPTION +31 -0
- data/utils/enveomics/enveomics.R/NAMESPACE +39 -0
- data/utils/enveomics/enveomics.R/R/autoprune.R +155 -0
- data/utils/enveomics/enveomics.R/R/barplot.R +184 -0
- data/utils/enveomics/enveomics.R/R/cliopts.R +135 -0
- data/utils/enveomics/enveomics.R/R/df2dist.R +154 -0
- data/utils/enveomics/enveomics.R/R/growthcurve.R +331 -0
- data/utils/enveomics/enveomics.R/R/prefscore.R +79 -0
- data/utils/enveomics/enveomics.R/R/recplot.R +354 -0
- data/utils/enveomics/enveomics.R/R/recplot2.R +1631 -0
- data/utils/enveomics/enveomics.R/R/tribs.R +583 -0
- data/utils/enveomics/enveomics.R/R/utils.R +80 -0
- data/utils/enveomics/enveomics.R/README.md +81 -0
- data/utils/enveomics/enveomics.R/data/growth.curves.rda +0 -0
- data/utils/enveomics/enveomics.R/data/phyla.counts.rda +0 -0
- data/utils/enveomics/enveomics.R/man/cash-enve.GrowthCurve-method.Rd +16 -0
- data/utils/enveomics/enveomics.R/man/cash-enve.RecPlot2-method.Rd +16 -0
- data/utils/enveomics/enveomics.R/man/cash-enve.RecPlot2.Peak-method.Rd +16 -0
- data/utils/enveomics/enveomics.R/man/enve.GrowthCurve-class.Rd +25 -0
- data/utils/enveomics/enveomics.R/man/enve.TRIBS-class.Rd +46 -0
- data/utils/enveomics/enveomics.R/man/enve.TRIBS.merge.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.TRIBStest-class.Rd +47 -0
- data/utils/enveomics/enveomics.R/man/enve.__prune.iter.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.__prune.reduce.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.__tribs.Rd +40 -0
- data/utils/enveomics/enveomics.R/man/enve.barplot.Rd +103 -0
- data/utils/enveomics/enveomics.R/man/enve.cliopts.Rd +67 -0
- data/utils/enveomics/enveomics.R/man/enve.col.alpha.Rd +24 -0
- data/utils/enveomics/enveomics.R/man/enve.col2alpha.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.df2dist.Rd +45 -0
- data/utils/enveomics/enveomics.R/man/enve.df2dist.group.Rd +44 -0
- data/utils/enveomics/enveomics.R/man/enve.df2dist.list.Rd +47 -0
- data/utils/enveomics/enveomics.R/man/enve.growthcurve.Rd +75 -0
- data/utils/enveomics/enveomics.R/man/enve.prefscore.Rd +50 -0
- data/utils/enveomics/enveomics.R/man/enve.prune.dist.Rd +44 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot.Rd +139 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2-class.Rd +45 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.ANIr.Rd +24 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.Rd +77 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.__counts.Rd +25 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.__peakHist.Rd +21 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.__whichClosestPeak.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.changeCutoff.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.compareIdentities.Rd +47 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.coordinates.Rd +29 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.corePeak.Rd +18 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.extractWindows.Rd +45 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.Rd +36 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__em_e.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__em_m.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__emauto_one.Rd +27 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__mow_one.Rd +52 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__mower.Rd +17 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.em.Rd +51 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.emauto.Rd +43 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.mower.Rd +82 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.peak-class.Rd +59 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.seqdepth.Rd +27 -0
- data/utils/enveomics/enveomics.R/man/enve.recplot2.windowDepthThreshold.Rd +36 -0
- data/utils/enveomics/enveomics.R/man/enve.selvector.Rd +23 -0
- data/utils/enveomics/enveomics.R/man/enve.tribs.Rd +68 -0
- data/utils/enveomics/enveomics.R/man/enve.tribs.test.Rd +28 -0
- data/utils/enveomics/enveomics.R/man/enve.truncate.Rd +27 -0
- data/utils/enveomics/enveomics.R/man/growth.curves.Rd +14 -0
- data/utils/enveomics/enveomics.R/man/phyla.counts.Rd +13 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.GrowthCurve.Rd +78 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.TRIBS.Rd +46 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.TRIBStest.Rd +45 -0
- data/utils/enveomics/enveomics.R/man/plot.enve.recplot2.Rd +125 -0
- data/utils/enveomics/enveomics.R/man/summary.enve.GrowthCurve.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/summary.enve.TRIBS.Rd +19 -0
- data/utils/enveomics/enveomics.R/man/summary.enve.TRIBStest.Rd +19 -0
- data/utils/enveomics/globals.mk +8 -0
- data/utils/enveomics/manifest.json +9 -0
- data/utils/multitrim/Multitrim How-To.pdf +0 -0
- data/utils/multitrim/README.md +67 -0
- data/utils/multitrim/multitrim.py +1555 -0
- data/utils/multitrim/multitrim.yml +13 -0
- data/utils/requirements.txt +4 -3
- metadata +304 -3
@@ -0,0 +1,165 @@
+{
+  "categories": {
+    "Sequence similarity search": {
+      "Statistics": [
+        "BedGraph.tad.rb",
+        "BedGraph.window.rb",
+        "BlastPairwise.AAsubs.pl",
+        "BlastTab.advance.bash",
+        "BlastTab.recplot2.R",
+        "BlastTab.seqdepth.pl",
+        "BlastTab.seqdepth_nomedian.pl",
+        "BlastTab.seqdepth_ZIP.pl",
+        "BlastTab.sumPerHit.pl",
+        "FastQ.test-error.rb",
+        "RecPlot2.compareIdentities.R"
+      ],
+      "Manipulation": [
+        "BlastTab.addlen.rb",
+        "BlastTab.best_hit_sorted.pl",
+        "BlastTab.catsbj.pl",
+        "BlastTab.cogCat.rb",
+        "BlastTab.filter.pl",
+        "BlastTab.kegg_pep2path_rest.pl",
+        "BlastTab.pairedHits.rb",
+        "BlastTab.subsample.pl",
+        "BlastTab.taxid2taxrank.pl",
+        "BlastTab.topHits_sorted.rb",
+        "sam.filter.rb"
+      ],
+      "Execution": [
+        "aai.rb",
+        "ani.rb",
+        "anir.rb",
+        "HMM.haai.rb",
+        "rbm.rb"
+      ]
+    },
+    "Sequence analyses": {
+      "Statistics": [
+        "FastA.gc.pl",
+        "FastA.length.pl",
+        "FastA.N50.pl",
+        "FastA.qlen.pl",
+        "FastQ.test-error.rb"
+      ],
+      "Manipulation": [
+        "FastA.extract.rb",
+        "FastA.filter.pl",
+        "FastA.filterLen.pl",
+        "FastA.filterN.pl",
+        "FastA.fragment.rb",
+        "FastA.interpose.pl",
+        "FastA.mask.rb",
+        "FastA.per_file.pl",
+        "FastA.rename.pl",
+        "FastA.revcom.pl",
+        "FastA.sample.rb",
+        "FastA.slider.pl",
+        "FastA.split.pl",
+        "FastA.split.rb",
+        "FastA.subsample.pl",
+        "FastA.tag.rb",
+        "FastA.toFastQ.rb",
+        "FastA.wrap.rb",
+        "FastQ.filter.pl",
+        "FastQ.interpose.pl",
+        "FastQ.maskQual.rb",
+        "FastQ.offset.pl",
+        "FastQ.split.pl",
+        "FastQ.tag.rb",
+        "FastQ.toFastA.awk"
+      ]
+    },
+    "Diversity": {
+      "Community": [
+        "AlphaDiversity.pl",
+        "Chao1.pl",
+        "Table.barplot.R",
+        "Table.prefScore.R"
+      ],
+      "Population": [
+        "VCF.SNPs.rb",
+        "VCF.KaKs.rb",
+        "Table.prefScore.R"
+      ]
+    },
+    "Annotation": {
+      "Database mapping": [
+        "BlastTab.kegg_pep2path_rest.pl",
+        "BlastTab.taxid2taxrank.pl",
+        "EBIseq2tax.rb",
+        "NCBIacc2tax.rb",
+        "gi2tax.rb",
+        "M5nr.getSequences.rb",
+        "RefSeq.download.bash",
+        "SRA.download.bash"
+      ],
+      "Tables": [
+        "Table.barplot.R",
+        "GenBank.add_fields.rb",
+        "MyTaxa.fragsByTax.pl",
+        "Table.df2dist.R",
+        "Table.filter.pl",
+        "Table.merge.pl",
+        "Table.replace.rb",
+        "Table.round.rb",
+        "Table.split.pl"
+      ],
+      "Search": [
+        "HMM.essential.rb",
+        "HMM.haai.rb",
+        "HMMsearch.extractIds.rb",
+        "ogs.annotate.rb",
+        "ogs.core-pan.rb",
+        "ogs.extract.rb",
+        "ogs.mcl.rb",
+        "ogs.stats.rb",
+        "ogs.rb"
+      ]
+    },
+    "Other data": {
+      "Phylogenetic and other distances": [
+        "CharTable.classify.rb",
+        "JPlace.distances.rb",
+        "JPlace.to_iToL.rb",
+        "Newick.autoprune.R",
+        "TRIBS.test.R",
+        "TRIBS.plot-test.R",
+        "Table.df2dist.R"
+      ],
+      "Taxonomic": [
+        "CharTable.classify.rb",
+        "EBIseq2tax.rb",
+        "NCBIacc2tax.rb",
+        "Table.barplot.R",
+        "gi2tax.rb",
+        "MyTaxa.fragsByTax.pl",
+        "MyTaxa.seq-taxrank.rb",
+        "Taxonomy.silva2ncbi.rb"
+      ],
+      "Alignments": [
+        "AAsubs.log2ratio.rb",
+        "Aln.cat.rb",
+        "Aln.convert.pl",
+        "BlastPairwise.AAsubs.pl"
+      ],
+      "Clustering": [
+        "ogs.mcl.rb",
+        "clust.rand.rb"
+      ],
+      "Read recruitments": [
+        "anir.rb",
+        "BedGraph.tad.rb",
+        "BedGraph.window.rb",
+        "BlastTab.catsbj.pl",
+        "BlastTab.pairedHits.rb",
+        "BlastTab.recplot2.R",
+        "FastQ.test-error.rb",
+        "GFF.catsbj.pl",
+        "RecPlot2.compareIdentities.R",
+        "sam.filter.rb"
+      ]
+    }
+  }
+}
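Note on the hunk above: the added file nests script names under a category and a subcategory, and some scripts (for example `HMM.haai.rb`) are listed more than once. The sketch below is not part of the package. It assumes only the shape visible in the hunk, uses a shortened hypothetical excerpt as input, and introduces a hypothetical helper named `index_by_script` to invert the structure into a per-script lookup.

```ruby
require 'json'

# Hypothetical excerpt with the same shape as the hunk above:
# { "categories": { category: { subcategory: [script, ...] } } }
MANIFEST_EXCERPT = <<~JSON_TEXT
  {
    "categories": {
      "Sequence similarity search": {
        "Execution": ["aai.rb", "ani.rb", "anir.rb", "HMM.haai.rb", "rbm.rb"]
      },
      "Annotation": {
        "Search": ["HMM.essential.rb", "HMM.haai.rb"]
      }
    }
  }
JSON_TEXT

# Invert the manifest into script => ["Category / Subcategory", ...].
# The helper name is an assumption made for this illustration.
def index_by_script(manifest)
  index = Hash.new { |hash, key| hash[key] = [] }
  manifest.fetch('categories').each do |category, subcategories|
    subcategories.each do |subcategory, scripts|
      scripts.each { |script| index[script] << "#{category} / #{subcategory}" }
    end
  end
  index
end

script_index = index_by_script(JSON.parse(MANIFEST_EXCERPT))
script_index.sort.each do |script, places|
  puts "#{script}: #{places.join('; ')}"
end
```

A lookup of this kind makes it easy to see which scripts are filed under several categories, which the nested form hides.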
@@ -0,0 +1,154 @@
+{
+  "_": "Input files and directories are included in the 'Tests' folder.",
+  "examples": [
+    {
+      "_": "== Examples of genome comparisons ==",
+      "task": "ogs.stats.rb",
+      "description": ["Statistics on the groups of orthology in the Primate",
+        "Lentivirus Group, including HIV-1, HIV-2, and SIV."],
+      "values": ["primate_lentivirus.ogs",null,null,null,null,null]
+    },
+    {
+      "task": "ani.rb",
+      "description": ["Average Nucleotide Identity (ANI) between two strains",
+        "of Mycoplasma genitalium (M2288 and M2321)."],
+      "values": ["Mgen_M2288.fna","Mgen_M2321.fna",null,null,null,null,null,
+        null,null,null,null,null,null,null,null,null,null,null,null,null,null,
+        null,null,null]
+    },
+    {
+      "task": "aai.rb",
+      "description": ["Average Amino acid Identity (AAI) between Mycoplasma",
+        "genitalium (Bacteria) and Nanoarchaeum equitans (Archaea)."],
+      "values": ["Mgen_M2288.faa","Nequ_Kin4M.faa",null,null,null,null,null,
+        null,null,null,null,null,null,null,null,null,null,null,null,null,null,
+        null,null,null]
+    },
+    {
+      "task": "rbm.rb",
+      "description": ["Reciprocal Best Matches between the proteomes of the",
+        "two major HIV types (HIV-1 and HIV-2)."],
+      "values": ["hiv1.faa","hiv2.faa",null,null,null,null,null,null,null,null,
+        null,null,"hiv1-hiv2.rbm"]
+    },
+    {
+      "task": "ogs.mcl.rb",
+      "description": ["Groups of orthology in the Primate Letivirus Group,",
+        "including HIV-1, HIV-2, and SIV."],
+      "values": ["primate_lentivirus.ogs","primate_lentivirus.rbm",null,null,
+        null,null,null,null,null,null,null,null]
+    },
+    {
+      "task": "Table.df2dist.R",
+      "description": ["Transforms a list of AAI values between Xanthomonas",
+        "oryzae genomes into a distance matrix."],
+      "values": ["Xanthomonas_oryzae.aai.tsv",null,null,null,null,100.0,
+        "Xanthomonas_oryzae.aai-mat.tsv"]
+    },
+    {
+      "_": "== Recruitment plots",
+      "task": "BlastTab.catsbj.pl",
+      "description": ["Prepares recruitment plot files for a comparison",
+        "between a virome containing HIV and the HIV-1 genome."],
+      "values": [null,null,null,null,"hiv1.fna","hiv_mix-hiv1.blast.tsv"]
+    },
+    {
+      "task": "BlastTab.recplot2.R",
+      "description": ["Generates recruitment plots for a comparison",
+        "between a virome containing HIV and the HIV-1 genome."],
+      "values": ["hiv_mix-hiv1.blast.tsv",50,100,null,null,null,null,null,null,
+        null,null,null,"hiv_mix-hiv1.Rdata","hiv_mix-hiv1.pdf",null,null]
+    },
+    {
+      "_": "== Examples of functional annotations ==",
+      "task": "HMM.essential.rb",
+      "description": ["Typical single-copy bacterial genes present in",
+        "Mycoplasma genitalium."],
+      "values": ["Mgen_M2288.faa",null,null,null,null,null,null,true,null,null,
+        null,null,null,null,null,null,null,null,null]
+    },
+    {
+      "task": "HMM.essential.rb",
+      "description": ["Typical single-copy archaeal genes present in",
+        "Nanoarchaeum equitans."],
+      "values": ["Mgen_M2288.faa",null,null,null,null,null,null,null,true,null,
+        null,null,null,null,null,null,null,null,null]
+    },
+    {
+      "task": "Newick.autoprune.R",
+      "description": ["Prune an AlkB tree with 110 tips to get only distant",
+        "representatives (41)."],
+      "values": ["alkB.nwk",0.9,null,null,null,null,null,"alkB-pruned.nwk"]
+    },
+    {
+      "_": "== Examples of BLAST statistics and manipulation",
+      "task": "BlastTab.topHits_sorted.rb",
+      "description": ["Extract the best match of metagenome-derived proteins",
+        "(from the 'A metagenome') against a Gene Ontology collection."],
+      "values": ["sort","a_mg.cds-go.blast.tsv",null,null,null,null,1,null,null,
+        null,"a_mg.cds-go.blast-bm.tsv"]
+    },
+    {
+      "task": "BlastTab.sumPerHit.pl",
+      "description": ["Count the number of reads per gene in a mapping of a",
+        "metagenome to a metagenome-derived genes (from the 'A metagenome')."],
+      "values": [null,null,null,null,null,null,null,"a_mg.reads-cds.blast.tsv",
+        null,"a_mg.reads-cds.counts.tsv"]
+    },
+    {
+      "task": "BlastTab.sumPerHit.pl",
+      "description": ["Estimate the total abundance of Gene Ontology",
+        "annotations in the A metagenome, using metagenome-derived proteins,",
+        "and normalizing by the read counts of each protein."],
+      "values": ["a_mg.reads-cds.counts.tsv",null,null,null,null,true,null,
+        "a_mg.cds-go.blast.tsv",null,"a_mg.go.read-counts.tsv"]
+    },
+    {
+      "_": "== Examples of diversity ==",
+      "task": "Table.barplot.R",
+      "description": ["Barplot with the distribution of bacterial phyla in",
+        "four different sites, with taxa sorted by variance."],
+      "values": ["phyla_counts.tsv","250,100,75,200",null,null,null,null,null,
+        null,true,"var",2,null,null,"phyla_counts.pdf",10,null]
+    },
+    {
+      "task": "Chao1.pl",
+      "description": ["Phylum-richness estimated by the Chao1 index with 95%",
+        "confidence, using the distributions of bacterial phyla in four",
+        "different sites."],
+      "values": ["phyla_counts.tsv",null,1,null,null,true,null,
+        "phyla_chao1.tsv"]
+    },
+    {
+      "task": "AlphaDiversity.pl",
+      "description": ["Phylum-diversity estimated by the indices of Shannon",
+        "(H'), Inverse Simpson (1/Lambda), and true diversity of order 1 (1D),",
+        "using the distributions of bacterial phyla in four different sites."],
+      "values": ["phyla_counts.tsv",null,1,null,null,true,null,true,1,null,
+        "phyla_diversity.tsv"]
+    },
+    {
+      "_": "== Other miscelaneous examples ==",
+      "task": "CharTable.classify.rb",
+      "description": ["Classification of anthrax genomes based on can-SNPs, as",
+        "described in Van Ert 2007 (PLoS ONE 2(5):e461)."],
+      "values": ["anthrax-cansnp-data.tsv","anthrax-cansnp-key.tsv",
+        "anthrax-cansnp-classif.tsv","anthrax-cansnp-classif.nwk",null]
+    },
+    {
+      "task": "TRIBS.test.R",
+      "description": ["Test overclustering of Xanthomonas oryzae genomes",
+        "encoding for PilA using Transformed-space Resampling In Biased Sets",
+        "(TRIBS)."],
+      "values": ["Xanthomonas_oryzae.aai-mat.tsv","Xanthomonas_oryzae-PilA.txt",
+        5000,null,null,null,null,0,"Xanthomonas_oryzae-PilA.tribs.Rdata",100]
+    },
+    {
+      "task": "TRIBS.plot-test.R",
+      "description": ["Show the TRIBS-normalized distances between Xanthomonas",
+        "oryzae genomes (grey) and X. oryzae encoding for PilA (red)."],
+      "values": ["Xanthomonas_oryzae-PilA.tribs.Rdata",null,null,null,null,null,
+        null,null,"Xanthomonas_oryzae-PilA.tribs.pdf",null,null]
+    }
+  ]
+}
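Note on the hunk above: each entry pairs a `task` with a `description` split across several strings and a `values` array that is mostly `null`. The sketch below is not part of the package. It assumes that `values` is positional (presumably matching the option order of the task definition, which this page does not display) and introduces a hypothetical helper, `summarize_examples`, that joins each description and keeps only the values that are set, together with their positions. The input is a two-entry excerpt copied from the hunk.

```ruby
require 'json'

# Two entries copied from the hunk above; the rest are omitted for brevity.
EXAMPLES_EXCERPT = <<~JSON_TEXT
  {
    "examples": [
      {
        "task": "Newick.autoprune.R",
        "description": ["Prune an AlkB tree with 110 tips to get only distant",
          "representatives (41)."],
        "values": ["alkB.nwk",0.9,null,null,null,null,null,"alkB-pruned.nwk"]
      },
      {
        "task": "HMM.essential.rb",
        "description": ["Typical single-copy bacterial genes present in",
          "Mycoplasma genitalium."],
        "values": ["Mgen_M2288.faa",null,null,null,null,null,null,true,null,null,
          null,null,null,null,null,null,null,null,null]
      }
    ]
  }
JSON_TEXT

# Summarize each example: task name, description joined into one string,
# and [position, value] pairs for every value that is not null.
# The helper name and the positional reading of "values" are assumptions.
def summarize_examples(manifest)
  manifest.fetch('examples').map do |example|
    set_values = example.fetch('values', [])
                        .each_with_index
                        .reject { |value, _position| value.nil? }
                        .map { |value, position| [position, value] }
    {
      task: example.fetch('task'),
      description: Array(example['description']).join(' '),
      set_values: set_values
    }
  end
end

summarize_examples(JSON.parse(EXAMPLES_EXCERPT)).each do |summary|
  puts "#{summary[:task]}: #{summary[:description]}"
  summary[:set_values].each { |position, value| puts "  [#{position}] #{value.inspect}" }
end
```

Only `nil` entries are dropped, so boolean flags such as `true` are kept with their position.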
@@ -0,0 +1,69 @@
|
|
1
|
+
#!/bin/bash
|
2
|
+
|
3
|
+
##################### VARIABLES
|
4
|
+
# Queue: Preferred queue. Delete (or comment) this line to allow
|
5
|
+
# automatic detection:
|
6
|
+
#QUEUE="biocluster-6"
|
7
|
+
# If you set the QUEUE variable, you MUST set the WTIME variable
|
8
|
+
# as well, containing the walltime to be asked for. The WTIME
|
9
|
+
# variable is ignored otherwise.
|
10
|
+
WTIME="120:00:00"
|
11
|
+
|
12
|
+
# Scratch: This is where the output will be created.
|
13
|
+
SCRATCH="$HOME/scratch/pipelines/assembly"
|
14
|
+
|
15
|
+
# Data folder: This is the folder that contains the input files.
|
16
|
+
DATA="$HOME/data/trim"
|
17
|
+
|
18
|
+
# Location of Newbler's binaries
|
19
|
+
BIN454="$HOME/454/bin"
|
20
|
+
|
21
|
+
# Name(s) of the library(ies) to use, separated by spaces:
|
22
|
+
# This is determined by the name of your input files. For example,
|
23
|
+
# if your input files are: LLSEP.CoupledReads.fa and LWP.CoupledReads.fa,
|
24
|
+
# use:
|
25
|
+
# LIBRARIES="LLSEP LWP"
|
26
|
+
# It's strongly encouraged to use only one per CONFIG file.
|
27
|
+
LIBRARIES="A";
|
28
|
+
|
29
|
+
# Use .CoupledReads.fa and/or .SingleReads.fa (yes or no):
|
30
|
+
USECOUPLED=yes
|
31
|
+
USESINGLE=no
|
32
|
+
|
33
|
+
# Insert length (in bp): This is the average length of the entire insert,
|
34
|
+
# not just the gap length.
|
35
|
+
INSLEN=300
|
36
|
+
|
37
|
+
# Number of CPUs to use (for SOAP and Newbler):
|
38
|
+
PPN=16
|
39
|
+
|
40
|
+
# RAM multiplier: Multiply the estimated required RAM by this number:
|
41
|
+
RAMMULT=1
|
42
|
+
|
43
|
+
# Maximum number of simultaneous jobs: Uncomment and increase these values if
|
44
|
+
# you have increased resources (e.g., a dedicated queue); uncomment and decrease
|
45
|
+
# if the resources are scarce (e.g., a very busy queue or other simultaneous jobs).
|
46
|
+
#VELVETSIM=22
|
47
|
+
#SOAPSIM=8
|
48
|
+
|
49
|
+
# Extra parameters for Velvet: Any additional parameters to be passed to
|
50
|
+
# velvetg or velveth. If you have MP data, consider adding the option
|
51
|
+
# -shortMatePaired yes to VELVETG_EXTRA. If you have Nextera, consider
|
52
|
+
# adding the option above, plus the option -ins_length_sd <integer>, to
|
53
|
+
# indicate the standard deviation of the insert size. By default, the
|
54
|
+
# SD is assumed to be 10% of the average, but Nextera produces much
|
55
|
+
# wider distribution of sizes (i.e., larger SD). Typically you shouldn't
|
56
|
+
# need to add anything in VELVETH_EXTRA.
|
57
|
+
VELVETH_EXTRA=""
|
58
|
+
VELVETG_EXTRA=""
|
59
|
+
|
60
|
+
# Clean non-essential files (yes or no):
|
61
|
+
CLEANUP=yes
|
62
|
+
|
63
|
+
# Best k-mers: Space-delimited list of kmers selected from Velvet and SOAP.
|
64
|
+
# This is to be modified at the beginning of step 4, and it's ignored in all
|
65
|
+
# the other steps.
|
66
|
+
K_VELVET="21 23 35"
|
67
|
+
K_SOAP="21 23 35"
|
68
|
+
|
69
|
+
|
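The comments in this config file state two constraints that are easy to miss: `WTIME` must be set whenever `QUEUE` is set, and `USECOUPLED`/`USESINGLE` take only `yes` or `no`. A minimal sketch of a pre-flight check follows. The function name `check_config` is hypothetical and not part of the package, and the requirement that at least one of the two switches is `yes` is an assumption drawn from the "and/or" wording above.

```shell
# Hypothetical helper, not part of the pipeline: validates values that
# would come from a sourced CONFIG.<name>.bash file.
# Usage: check_config QUEUE WTIME USECOUPLED USESINGLE
check_config() {
  queue=$1; wtime=$2; coupled=$3; single=$4
  # WTIME is mandatory only when QUEUE is set.
  if [ -n "$queue" ] && [ -z "$wtime" ]; then
    echo "QUEUE is set but WTIME is empty" >&2
    return 1
  fi
  # Both switches must be literally "yes" or "no".
  for v in "$coupled" "$single"; do
    case "$v" in
      yes|no) ;;
      *) echo "expected yes or no, got '$v'" >&2; return 1 ;;
    esac
  done
  # Assumption: at least one kind of read file has to be enabled.
  if [ "$coupled" = no ] && [ "$single" = no ]; then
    echo "USECOUPLED and USESINGLE are both 'no'" >&2
    return 1
  fi
  return 0
}
```

After sourcing a config file, it could be called as `check_config "$QUEUE" "$WTIME" "$USECOUPLED" "$USESINGLE"`.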
@@ -0,0 +1 @@
|
|
1
|
+
../../Scripts/FastA.N50.pl
|
@@ -0,0 +1 @@
|
|
1
|
+
../../Scripts/FastA.filterN.pl
|
@@ -0,0 +1 @@
|
|
1
|
+
../../Scripts/FastA.length.pl
|
@@ -0,0 +1,189 @@
|
|
1
|
+
@author: Luis Miguel Rodriguez-R <lmrodriguezr at gmail dot com>
|
2
|
+
|
3
|
+
@update: Mar-17-2013
|
4
|
+
|
5
|
+
@license: artistic 2.0
|
6
|
+
|
7
|
+
@status: semi
|
8
|
+
|
9
|
+
@pbs: yes
|
10
|
+
|
11
|
+
# IMPORTANT
|
12
|
+
|
13
|
+
This pipeline was developed for the [PACE cluster](http://pace.gatech.edu/). You
|
14
|
+
are free to use it in other platforms with adequate adjustments. It is largely
|
15
|
+
based on Luo _et al._ 2012, ISME J.
|
16
|
+
|
17
|
+
# PURPOSE
|
18
|
+
|
19
|
+
This pipeline assembles coupled and/or single reads from one or more libraries.
|
20
|
+
It assumes that the reads have been quality-checked and trimmed.
|
21
|
+
|
22
|
+
# HELP
|
23
|
+
|
24
|
+
1. File preparation:
|
25
|
+
|
26
|
+
1.1. Copy this folder to the cluster.
|
27
|
+
|
28
|
+
1.2. Copy the sequences to the cluster. Only trimmed/filtered reads are used.
|
29
|
+
All the files are expected to be in the same folder, and the filenames must
|
30
|
+
end in `.CoupledReads.fa` or `.SingleReads.fa`.
|
31
|
+
|
32
|
+
1.3. Copy the file `CONFIG.mock.bash` to `CONFIG.<name>.bash`, where `<name>` is a
|
33
|
+
short name for your run (avoid characters other than alphanumeric).
|
34
|
+
|
35
|
+
1.4. Change the variables in `CONFIG.<name>.bash`. Notice that this pipeline
|
36
|
+
supports running several libraries at the same time, but it's strongly
|
37
|
+
recommended to run only one per config file, because the insert length
|
38
|
+
(in step 2) and the selected k-mers (in step 3) are fixed for all the
|
39
|
+
included libraries. Also, there is a technical consideration: The first
|
40
|
+
step will execute parallel jobs for each odd number between 21 and 63, and
|
41
|
+
SOAP will use 16 CPUs by default, which means 357 CPUs will be requested
|
42
|
+
per library in step 2. It's a bad idea to run many libraries at the same
|
43
|
+
time.
|
44
|
+
|
45
|
+
1.5. If you have Mate-paired datasets (for example, prepared with Nextera), first
|
46
|
+
reverse-complement all the reads. See also the `VELVETG_EXTRA` variable in
|
47
|
+
the `CONFIG.<name>.bash` file.
|
48
|
+
|
49
|
+
2. Velvet and SOAP assembly:
|
50
|
+
|
51
|
+
2.1. Execute `./RUNME-2.bash <name>` in the head node (see [troubleshooting](#troubleshooting) #1).
|
52
|
+
|
53
|
+
2.2. Monitor the tasks named velvet_* and soap_*.
|
54
|
+
|
55
|
+
2.3. Once completed, make sure the `.proc` files contain only the
|
56
|
+
word "done". To do this, you may execute:
|
57
|
+
```
|
58
|
+
grep -v '^done$' *.proc
|
59
|
+
```
|
60
|
+
|
61
|
+
If successful, the output of the above command should be empty. See
|
62
|
+
[Troubleshooting](#troubleshooting) #2 and #3 below if one or more of your jobs failed.
|
63
|
+
|
64
|
+
3. K-mer selection:
|
65
|
+
|
66
|
+
3.1. If you completed step 2, execute `./RUNME-3.bash <name>` in the head
|
67
|
+
node.
|
68
|
+
|
69
|
+
3.2. Once completed, download and open the files `*.n50.pdf`.
|
70
|
+
|
71
|
+
3.3. Select the three "best" k-mers for Velvet and for SOAP (they don't
|
72
|
+
have to be the same). There is no well-tested method to select the
|
73
|
+
"best", and this is why this protocol is not automated, but semi-
|
74
|
+
automated. A generally good rule-of-thumb is: pick one that optimizes
|
75
|
+
the amount of sequences used (these are the grey bars in the plot;
|
76
|
+
usually this is the smallest k-mer), pick one that optimizes the N50
|
77
|
+
(this is the dashed red line; usually this is a large k-mer), and pick
|
78
|
+
one that optimizes both (something in the middle). You can select
|
79
|
+
more or fewer than three k-mers; this is just a suggestion.
|
80
|
+
|
81
|
+
4. Newbler assembly:
|
82
|
+
|
83
|
+
4.1. Edit the file `CONFIG.<name>.bash`: set the variables `K_VELVET` and
|
84
|
+
`K_SOAP` to contain the lists of "best" selected k-mers for Velvet and
|
85
|
+
SOAP, respectively.
|
86
|
+
|
87
|
+
4.2. Execute `./RUNME-4.bash <name>` in the head node.
|
88
|
+
|
89
|
+
4.3. Monitor the task newbler_*. Once finished, your assembly is ready.
|
90
|
+
Once completed, make sure the file `.newbler.proc` contains only the
|
91
|
+
word "done". To do this, you may execute:
|
92
|
+
```
|
93
|
+
grep -v '^done$' *.proc
|
94
|
+
```
|
95
|
+
If successful, the output should be empty.
|
96
|
+
|
97
|
+
4.4. The final assembly should be located in the `SCRATCH` path, in a folder
|
98
|
+
named `<lib>.newbler/assembly/`. The file `454AllContigs.fna` contains
|
99
|
+
all the assembled contigs, `454LargeContigs.fna` contains the contigs
|
100
|
+
with 500bp or more in length, and `454NewblerMetrics.txt` contains some
|
101
|
+
relevant statistics.
|
102
|
+
|
103
|
+
|
104
|
+
# Comments
|
105
|
+
|
106
|
+
* Some scripts contained in this package are actually symlinks to files in the
|
107
|
+
_Scripts_ folder. Check the existence of these files when copied to
|
108
|
+
the cluster.
|
109
|
+
|
110
|
+
# Troubleshooting
|
111
|
+
|
112
|
+
1. Do I really have to change directory (`cd`) to the pipeline's folder every time
|
113
|
+
I want to execute something?
|
114
|
+
|
115
|
+
No. Not really. For simplicity, this file tells you to execute, for example,
|
116
|
+
`./RUNME-2.bash`. However, you don't really have to be there; you can execute it
|
117
|
+
from any location. For example, if you saved this pipeline in your home
|
118
|
+
directory, you can just execute `~/assembly.pbs/RUNME-2.bash` instead from any
|
119
|
+
location in the head node.
|
120
|
+
|
121
|
+
2. I executed step 2, and Velvet worked but SOAP failed (or vice versa). Can I
|
122
|
+
submit only one of them?
|
123
|
+
|
124
|
+
Yes. To execute only Velvet, run:
|
125
|
+
```
|
126
|
+
./RUNME-2.bash <name> velvet
|
127
|
+
```
|
128
|
+
|
129
|
+
To execute only SOAP, run:
|
130
|
+
```
|
131
|
+
./RUNME-2.bash <name> soap
|
132
|
+
```
|
133
|
+
|
134
|
+
3. I ran step 2, and most of the jobs finished, but a few of them failed. Can I
|
135
|
+
submit only a few k-mers?
|
136
|
+
|
137
|
+
Yes. To execute one k-mer (say, k-mer 33 of SOAP), run:
|
138
|
+
```
|
139
|
+
./RUNME-2.bash <name> soap 33
|
140
|
+
```
|
141
|
+
|
142
|
+
You can also execute more than one k-mer, using a comma-separated list. For
|
143
|
+
example, to re-submit the k-mers 37, 39, and 41 of Velvet, run:
|
144
|
+
```
|
145
|
+
./RUNME-2.bash <name> velvet 37,39,41
|
146
|
+
```
|
147
|
+
|
148
|
+
4. What are the numbers on the job names of step 2?
|
149
|
+
|
150
|
+
The k-mer. Each k-mer has its own job, but they are "arrayed", to simplify
|
151
|
+
administration: notice that all the jobs of Velvet and all the jobs of SOAP
|
152
|
+
share the same job ID.
|
153
|
+
|
154
|
+
5. Some jobs are being killed, why?
|
155
|
+
|
156
|
+
5.1. First, check the log file created by the pipeline. The name is typically
|
157
|
+
the output prefix and the .log extension. For velvet, there are two log files,
|
158
|
+
the `.glog` and the `.hlog`. You may find the problem there.
|
159
|
+
|
160
|
+
5.2. Now, check the error file in your HOME directory. The name depends on the
|
161
|
+
job, the library and the task. For example: `~/soap_Mg_2-37.e1999838` is the
|
162
|
+
error file for step 2, task soap, library Mg_2, k-mer 37. The trailing
|
163
|
+
number after the 'e' is the job ID. If this file contains errors probably
|
164
|
+
related to the pipeline, please let me know.
|
165
|
+
|
166
|
+
5.3. If you still have no clues, check the output file in your `HOME` directory. The
|
167
|
+
name is just like the name of the error file (see #5.2 above), but with 'o'
|
168
|
+
instead of 'e'. Compare the lines 'Resources' (what we asked the scheduler for)
|
169
|
+
and 'Rsrc Used' (what the job actually used). A typical problem is that your
|
170
|
+
job may need more RAM than we asked for (the value of 'mem' in both lines). If
|
171
|
+
the RAM used is larger than the RAM requested, the scheduler probably killed
|
172
|
+
your job. To solve this, just go to your config file, and set the variable
|
173
|
+
RAMMULT to a number larger than 1. For example, if you want to ask for double the
|
174
|
+
RAM, set `RAMMULT=2`. You can also include simple arithmetic operations, like
|
175
|
+
`RAMMULT=3/2`. If you want to add a fixed amount of RAM, in GiB, use addition.
|
176
|
+
For example, to add 10G, set `RAMMULT=1+10`.
|
177
|
+
|
178
|
+
5.4. Still no idea? Try running the job again; sometimes jobs fail with no
|
179
|
+
apparent reason, but they succeed when resubmitted. If your job keeps failing,
|
180
|
+
please gather as much information (the log, error and output files should be
|
181
|
+
enough) and let me take a look.
|
182
|
+
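The `RAMMULT` examples in 5.3 are easier to follow with numbers. The sketch below assumes the pipeline multiplies its RAM estimate by the literal text of `RAMMULT`, which would explain why `1+10` adds 10 GiB rather than multiplying by 11. The function `requested_ram` and the 8 GiB estimate are hypothetical, and the actual scripts may evaluate the expression differently (for instance, with non-integer arithmetic).

```shell
# Hypothetical illustration: RAMMULT is substituted as text into
# "estimate * RAMMULT" and evaluated with integer shell arithmetic.
requested_ram() {
  estimate=$1   # estimated RAM in GiB (made-up value in the examples)
  rammult=$2    # contents of the RAMMULT variable
  echo $(( $estimate * $rammult ))
}

requested_ram 8 2      # 8 * 2    -> prints 16 (double the estimate)
requested_ram 8 3/2    # 8 * 3/2  -> prints 12 (one and a half times)
requested_ram 8 1+10   # 8 * 1+10 -> prints 18 (estimate plus 10 GiB)
```

Under this assumption, multiplication binds tighter than addition, so anything after a `+` is added to the scaled estimate instead of scaling it.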
|
183
|
+
6. In step 2, some k-mers keep failing, and I just want to give up on them. Can I?
|
184
|
+
|
185
|
+
Yes. Step 3 will analyze only completed jobs, so you can just ignore these faulty
|
186
|
+
k-mers. Very small k-mers, for example, sometimes need too much memory, and very
|
187
|
+
large k-mers in Velvet sometimes need too much time. If you don't think you're
|
188
|
+
missing too much, just ignore them.
|
189
|
+
|
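Before deciding to ignore failing k-mers, it helps to know exactly which jobs did not finish. The following is a minimal sketch building on the `grep -v '^done$' *.proc` check from steps 2.3 and 4.3. The function name `unfinished_procs` is hypothetical, and it assumes it is given the folder that holds the `.proc` files. Unlike the plain `grep` check, it also reports empty `.proc` files.

```shell
# Hypothetical helper: print the .proc files in a folder whose content
# is anything other than lines reading exactly "done".
unfinished_procs() {
  dir=${1:-.}
  for f in "$dir"/*.proc; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # grep -qv succeeds if any line differs from "done";
    # [ ! -s ] additionally catches empty files.
    if grep -qv '^done$' "$f" || [ ! -s "$f" ]; then
      echo "$f"
    fi
  done
}
```

An empty output means every job recorded in that folder finished; any file listed corresponds to a job that can be resubmitted or ignored.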