miga-base 1.2.17.0 → 1.2.17.1

Files changed (299)
  1. checksums.yaml +4 -4
  2. data/lib/miga/version.rb +1 -1
  3. data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Archaea_SCG.hmm +41964 -0
  4. data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Bacteria_SCG.hmm +32439 -0
  5. data/utils/FastAAI/00.Libraries/01.SCG_HMMs/Complete_SCG_DB.hmm +62056 -0
  6. data/utils/FastAAI/FastAAI +3659 -0
  7. data/utils/FastAAI/FastAAI-legacy/FastAAI +1336 -0
  8. data/utils/FastAAI/FastAAI-legacy/kAAI_v1.0_virus.py +1296 -0
  9. data/utils/FastAAI/README.md +84 -0
  10. data/utils/enveomics/Docs/recplot2.md +244 -0
  11. data/utils/enveomics/Examples/aai-matrix.bash +66 -0
  12. data/utils/enveomics/Examples/ani-matrix.bash +66 -0
  13. data/utils/enveomics/Examples/essential-phylogeny.bash +105 -0
  14. data/utils/enveomics/Examples/unus-genome-phylogeny.bash +100 -0
  15. data/utils/enveomics/LICENSE.txt +73 -0
  16. data/utils/enveomics/Makefile +52 -0
  17. data/utils/enveomics/Manifest/Tasks/aasubs.json +103 -0
  18. data/utils/enveomics/Manifest/Tasks/blasttab.json +790 -0
  19. data/utils/enveomics/Manifest/Tasks/distances.json +161 -0
  20. data/utils/enveomics/Manifest/Tasks/fasta.json +802 -0
  21. data/utils/enveomics/Manifest/Tasks/fastq.json +291 -0
  22. data/utils/enveomics/Manifest/Tasks/graphics.json +126 -0
  23. data/utils/enveomics/Manifest/Tasks/mapping.json +137 -0
  24. data/utils/enveomics/Manifest/Tasks/ogs.json +382 -0
  25. data/utils/enveomics/Manifest/Tasks/other.json +906 -0
  26. data/utils/enveomics/Manifest/Tasks/remote.json +355 -0
  27. data/utils/enveomics/Manifest/Tasks/sequence-identity.json +650 -0
  28. data/utils/enveomics/Manifest/Tasks/tables.json +308 -0
  29. data/utils/enveomics/Manifest/Tasks/trees.json +68 -0
  30. data/utils/enveomics/Manifest/Tasks/variants.json +111 -0
  31. data/utils/enveomics/Manifest/categories.json +165 -0
  32. data/utils/enveomics/Manifest/examples.json +162 -0
  33. data/utils/enveomics/Manifest/tasks.json +4 -0
  34. data/utils/enveomics/Pipelines/assembly.pbs/CONFIG.mock.bash +69 -0
  35. data/utils/enveomics/Pipelines/assembly.pbs/FastA.N50.pl +1 -0
  36. data/utils/enveomics/Pipelines/assembly.pbs/FastA.filterN.pl +1 -0
  37. data/utils/enveomics/Pipelines/assembly.pbs/FastA.length.pl +1 -0
  38. data/utils/enveomics/Pipelines/assembly.pbs/README.md +189 -0
  39. data/utils/enveomics/Pipelines/assembly.pbs/RUNME-2.bash +112 -0
  40. data/utils/enveomics/Pipelines/assembly.pbs/RUNME-3.bash +23 -0
  41. data/utils/enveomics/Pipelines/assembly.pbs/RUNME-4.bash +44 -0
  42. data/utils/enveomics/Pipelines/assembly.pbs/RUNME.bash +50 -0
  43. data/utils/enveomics/Pipelines/assembly.pbs/kSelector.R +37 -0
  44. data/utils/enveomics/Pipelines/assembly.pbs/newbler.pbs +68 -0
  45. data/utils/enveomics/Pipelines/assembly.pbs/newbler_preparator.pl +49 -0
  46. data/utils/enveomics/Pipelines/assembly.pbs/soap.pbs +80 -0
  47. data/utils/enveomics/Pipelines/assembly.pbs/stats.pbs +57 -0
  48. data/utils/enveomics/Pipelines/assembly.pbs/velvet.pbs +63 -0
  49. data/utils/enveomics/Pipelines/blast.pbs/01.pbs.bash +38 -0
  50. data/utils/enveomics/Pipelines/blast.pbs/02.pbs.bash +73 -0
  51. data/utils/enveomics/Pipelines/blast.pbs/03.pbs.bash +21 -0
  52. data/utils/enveomics/Pipelines/blast.pbs/BlastTab.recover_job.pl +72 -0
  53. data/utils/enveomics/Pipelines/blast.pbs/CONFIG.mock.bash +98 -0
  54. data/utils/enveomics/Pipelines/blast.pbs/FastA.split.pl +1 -0
  55. data/utils/enveomics/Pipelines/blast.pbs/README.md +127 -0
  56. data/utils/enveomics/Pipelines/blast.pbs/RUNME.bash +109 -0
  57. data/utils/enveomics/Pipelines/blast.pbs/TASK.check.bash +128 -0
  58. data/utils/enveomics/Pipelines/blast.pbs/TASK.dry.bash +16 -0
  59. data/utils/enveomics/Pipelines/blast.pbs/TASK.eo.bash +22 -0
  60. data/utils/enveomics/Pipelines/blast.pbs/TASK.pause.bash +26 -0
  61. data/utils/enveomics/Pipelines/blast.pbs/TASK.run.bash +89 -0
  62. data/utils/enveomics/Pipelines/blast.pbs/sentinel.pbs.bash +29 -0
  63. data/utils/enveomics/Pipelines/idba.pbs/README.md +49 -0
  64. data/utils/enveomics/Pipelines/idba.pbs/RUNME.bash +95 -0
  65. data/utils/enveomics/Pipelines/idba.pbs/run.pbs +56 -0
  66. data/utils/enveomics/Pipelines/trim.pbs/README.md +54 -0
  67. data/utils/enveomics/Pipelines/trim.pbs/RUNME.bash +70 -0
  68. data/utils/enveomics/Pipelines/trim.pbs/run.pbs +130 -0
  69. data/utils/enveomics/README.md +42 -0
  70. data/utils/enveomics/Scripts/AAsubs.log2ratio.rb +171 -0
  71. data/utils/enveomics/Scripts/Aln.cat.rb +221 -0
  72. data/utils/enveomics/Scripts/Aln.convert.pl +35 -0
  73. data/utils/enveomics/Scripts/AlphaDiversity.pl +152 -0
  74. data/utils/enveomics/Scripts/BedGraph.tad.rb +93 -0
  75. data/utils/enveomics/Scripts/BedGraph.window.rb +71 -0
  76. data/utils/enveomics/Scripts/BlastPairwise.AAsubs.pl +102 -0
  77. data/utils/enveomics/Scripts/BlastTab.addlen.rb +63 -0
  78. data/utils/enveomics/Scripts/BlastTab.advance.bash +48 -0
  79. data/utils/enveomics/Scripts/BlastTab.best_hit_sorted.pl +55 -0
  80. data/utils/enveomics/Scripts/BlastTab.catsbj.pl +104 -0
  81. data/utils/enveomics/Scripts/BlastTab.cogCat.rb +76 -0
  82. data/utils/enveomics/Scripts/BlastTab.filter.pl +47 -0
  83. data/utils/enveomics/Scripts/BlastTab.kegg_pep2path_rest.pl +194 -0
  84. data/utils/enveomics/Scripts/BlastTab.metaxaPrep.pl +104 -0
  85. data/utils/enveomics/Scripts/BlastTab.pairedHits.rb +157 -0
  86. data/utils/enveomics/Scripts/BlastTab.recplot2.R +48 -0
  87. data/utils/enveomics/Scripts/BlastTab.seqdepth.pl +86 -0
  88. data/utils/enveomics/Scripts/BlastTab.seqdepth_ZIP.pl +119 -0
  89. data/utils/enveomics/Scripts/BlastTab.seqdepth_nomedian.pl +86 -0
  90. data/utils/enveomics/Scripts/BlastTab.subsample.pl +47 -0
  91. data/utils/enveomics/Scripts/BlastTab.sumPerHit.pl +114 -0
  92. data/utils/enveomics/Scripts/BlastTab.taxid2taxrank.pl +90 -0
  93. data/utils/enveomics/Scripts/BlastTab.topHits_sorted.rb +123 -0
  94. data/utils/enveomics/Scripts/Chao1.pl +97 -0
  95. data/utils/enveomics/Scripts/CharTable.classify.rb +234 -0
  96. data/utils/enveomics/Scripts/EBIseq2tax.rb +83 -0
  97. data/utils/enveomics/Scripts/FastA.N50.pl +60 -0
  98. data/utils/enveomics/Scripts/FastA.extract.rb +152 -0
  99. data/utils/enveomics/Scripts/FastA.filter.pl +52 -0
  100. data/utils/enveomics/Scripts/FastA.filterLen.pl +28 -0
  101. data/utils/enveomics/Scripts/FastA.filterN.pl +60 -0
  102. data/utils/enveomics/Scripts/FastA.fragment.rb +100 -0
  103. data/utils/enveomics/Scripts/FastA.gc.pl +42 -0
  104. data/utils/enveomics/Scripts/FastA.interpose.pl +93 -0
  105. data/utils/enveomics/Scripts/FastA.length.pl +38 -0
  106. data/utils/enveomics/Scripts/FastA.mask.rb +89 -0
  107. data/utils/enveomics/Scripts/FastA.per_file.pl +36 -0
  108. data/utils/enveomics/Scripts/FastA.qlen.pl +57 -0
  109. data/utils/enveomics/Scripts/FastA.rename.pl +65 -0
  110. data/utils/enveomics/Scripts/FastA.revcom.pl +23 -0
  111. data/utils/enveomics/Scripts/FastA.sample.rb +98 -0
  112. data/utils/enveomics/Scripts/FastA.slider.pl +85 -0
  113. data/utils/enveomics/Scripts/FastA.split.pl +55 -0
  114. data/utils/enveomics/Scripts/FastA.split.rb +79 -0
  115. data/utils/enveomics/Scripts/FastA.subsample.pl +131 -0
  116. data/utils/enveomics/Scripts/FastA.tag.rb +65 -0
  117. data/utils/enveomics/Scripts/FastA.toFastQ.rb +69 -0
  118. data/utils/enveomics/Scripts/FastA.wrap.rb +48 -0
  119. data/utils/enveomics/Scripts/FastQ.filter.pl +54 -0
  120. data/utils/enveomics/Scripts/FastQ.interpose.pl +90 -0
  121. data/utils/enveomics/Scripts/FastQ.maskQual.rb +89 -0
  122. data/utils/enveomics/Scripts/FastQ.offset.pl +90 -0
  123. data/utils/enveomics/Scripts/FastQ.split.pl +53 -0
  124. data/utils/enveomics/Scripts/FastQ.tag.rb +70 -0
  125. data/utils/enveomics/Scripts/FastQ.test-error.rb +81 -0
  126. data/utils/enveomics/Scripts/FastQ.toFastA.awk +24 -0
  127. data/utils/enveomics/Scripts/GFF.catsbj.pl +127 -0
  128. data/utils/enveomics/Scripts/GenBank.add_fields.rb +84 -0
  129. data/utils/enveomics/Scripts/HMM.essential.rb +351 -0
  130. data/utils/enveomics/Scripts/HMM.haai.rb +168 -0
  131. data/utils/enveomics/Scripts/HMMsearch.extractIds.rb +83 -0
  132. data/utils/enveomics/Scripts/JPlace.distances.rb +88 -0
  133. data/utils/enveomics/Scripts/JPlace.to_iToL.rb +320 -0
  134. data/utils/enveomics/Scripts/M5nr.getSequences.rb +81 -0
  135. data/utils/enveomics/Scripts/MeTaxa.distribution.pl +198 -0
  136. data/utils/enveomics/Scripts/MyTaxa.fragsByTax.pl +35 -0
  137. data/utils/enveomics/Scripts/MyTaxa.seq-taxrank.rb +49 -0
  138. data/utils/enveomics/Scripts/NCBIacc2tax.rb +92 -0
  139. data/utils/enveomics/Scripts/Newick.autoprune.R +27 -0
  140. data/utils/enveomics/Scripts/RAxML-EPA.to_iToL.pl +228 -0
  141. data/utils/enveomics/Scripts/RecPlot2.compareIdentities.R +32 -0
  142. data/utils/enveomics/Scripts/RefSeq.download.bash +48 -0
  143. data/utils/enveomics/Scripts/SRA.download.bash +55 -0
  144. data/utils/enveomics/Scripts/TRIBS.plot-test.R +36 -0
  145. data/utils/enveomics/Scripts/TRIBS.test.R +39 -0
  146. data/utils/enveomics/Scripts/Table.barplot.R +31 -0
  147. data/utils/enveomics/Scripts/Table.df2dist.R +30 -0
  148. data/utils/enveomics/Scripts/Table.filter.pl +61 -0
  149. data/utils/enveomics/Scripts/Table.merge.pl +77 -0
  150. data/utils/enveomics/Scripts/Table.prefScore.R +60 -0
  151. data/utils/enveomics/Scripts/Table.replace.rb +69 -0
  152. data/utils/enveomics/Scripts/Table.round.rb +63 -0
  153. data/utils/enveomics/Scripts/Table.split.pl +57 -0
  154. data/utils/enveomics/Scripts/Taxonomy.silva2ncbi.rb +227 -0
  155. data/utils/enveomics/Scripts/VCF.KaKs.rb +147 -0
  156. data/utils/enveomics/Scripts/VCF.SNPs.rb +88 -0
  157. data/utils/enveomics/Scripts/aai.rb +421 -0
  158. data/utils/enveomics/Scripts/ani.rb +362 -0
  159. data/utils/enveomics/Scripts/anir.rb +137 -0
  160. data/utils/enveomics/Scripts/clust.rand.rb +102 -0
  161. data/utils/enveomics/Scripts/gi2tax.rb +103 -0
  162. data/utils/enveomics/Scripts/in_silico_GA_GI.pl +96 -0
  163. data/utils/enveomics/Scripts/lib/data/dupont_2012_essential.hmm.gz +0 -0
  164. data/utils/enveomics/Scripts/lib/data/lee_2019_essential.hmm.gz +0 -0
  165. data/utils/enveomics/Scripts/lib/enveomics.R +1 -0
  166. data/utils/enveomics/Scripts/lib/enveomics_rb/anir.rb +293 -0
  167. data/utils/enveomics/Scripts/lib/enveomics_rb/bm_set.rb +175 -0
  168. data/utils/enveomics/Scripts/lib/enveomics_rb/enveomics.rb +24 -0
  169. data/utils/enveomics/Scripts/lib/enveomics_rb/errors.rb +17 -0
  170. data/utils/enveomics/Scripts/lib/enveomics_rb/gmm_em.rb +30 -0
  171. data/utils/enveomics/Scripts/lib/enveomics_rb/jplace.rb +253 -0
  172. data/utils/enveomics/Scripts/lib/enveomics_rb/match.rb +88 -0
  173. data/utils/enveomics/Scripts/lib/enveomics_rb/og.rb +182 -0
  174. data/utils/enveomics/Scripts/lib/enveomics_rb/rbm.rb +49 -0
  175. data/utils/enveomics/Scripts/lib/enveomics_rb/remote_data.rb +74 -0
  176. data/utils/enveomics/Scripts/lib/enveomics_rb/seq_range.rb +237 -0
  177. data/utils/enveomics/Scripts/lib/enveomics_rb/stats/rand.rb +31 -0
  178. data/utils/enveomics/Scripts/lib/enveomics_rb/stats/sample.rb +152 -0
  179. data/utils/enveomics/Scripts/lib/enveomics_rb/stats.rb +3 -0
  180. data/utils/enveomics/Scripts/lib/enveomics_rb/utils.rb +74 -0
  181. data/utils/enveomics/Scripts/lib/enveomics_rb/vcf.rb +135 -0
  182. data/utils/enveomics/Scripts/ogs.annotate.rb +88 -0
  183. data/utils/enveomics/Scripts/ogs.core-pan.rb +160 -0
  184. data/utils/enveomics/Scripts/ogs.extract.rb +125 -0
  185. data/utils/enveomics/Scripts/ogs.mcl.rb +186 -0
  186. data/utils/enveomics/Scripts/ogs.rb +104 -0
  187. data/utils/enveomics/Scripts/ogs.stats.rb +131 -0
  188. data/utils/enveomics/Scripts/rbm-legacy.rb +172 -0
  189. data/utils/enveomics/Scripts/rbm.rb +108 -0
  190. data/utils/enveomics/Scripts/sam.filter.rb +148 -0
  191. data/utils/enveomics/Tests/Makefile +10 -0
  192. data/utils/enveomics/Tests/Mgen_M2288.faa +3189 -0
  193. data/utils/enveomics/Tests/Mgen_M2288.fna +8282 -0
  194. data/utils/enveomics/Tests/Mgen_M2321.fna +8288 -0
  195. data/utils/enveomics/Tests/Nequ_Kin4M.faa +2970 -0
  196. data/utils/enveomics/Tests/Xanthomonas_oryzae-PilA.tribs.Rdata +0 -0
  197. data/utils/enveomics/Tests/Xanthomonas_oryzae-PilA.txt +7 -0
  198. data/utils/enveomics/Tests/Xanthomonas_oryzae.aai-mat.tsv +17 -0
  199. data/utils/enveomics/Tests/Xanthomonas_oryzae.aai.tsv +137 -0
  200. data/utils/enveomics/Tests/a_mg.cds-go.blast.tsv +123 -0
  201. data/utils/enveomics/Tests/a_mg.reads-cds.blast.tsv +200 -0
  202. data/utils/enveomics/Tests/a_mg.reads-cds.counts.tsv +55 -0
  203. data/utils/enveomics/Tests/alkB.nwk +1 -0
  204. data/utils/enveomics/Tests/anthrax-cansnp-data.tsv +13 -0
  205. data/utils/enveomics/Tests/anthrax-cansnp-key.tsv +17 -0
  206. data/utils/enveomics/Tests/hiv1.faa +59 -0
  207. data/utils/enveomics/Tests/hiv1.fna +134 -0
  208. data/utils/enveomics/Tests/hiv2.faa +70 -0
  209. data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv +233 -0
  210. data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv.lim +1 -0
  211. data/utils/enveomics/Tests/hiv_mix-hiv1.blast.tsv.rec +233 -0
  212. data/utils/enveomics/Tests/phyla_counts.tsv +10 -0
  213. data/utils/enveomics/Tests/primate_lentivirus.ogs +11 -0
  214. data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-hiv1.rbm +9 -0
  215. data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-hiv2.rbm +8 -0
  216. data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv1-siv.rbm +6 -0
  217. data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv2-hiv2.rbm +9 -0
  218. data/utils/enveomics/Tests/primate_lentivirus.rbm/hiv2-siv.rbm +6 -0
  219. data/utils/enveomics/Tests/primate_lentivirus.rbm/siv-siv.rbm +6 -0
  220. data/utils/enveomics/build_enveomics_r.bash +45 -0
  221. data/utils/enveomics/enveomics.R/DESCRIPTION +31 -0
  222. data/utils/enveomics/enveomics.R/NAMESPACE +39 -0
  223. data/utils/enveomics/enveomics.R/R/autoprune.R +155 -0
  224. data/utils/enveomics/enveomics.R/R/barplot.R +184 -0
  225. data/utils/enveomics/enveomics.R/R/cliopts.R +135 -0
  226. data/utils/enveomics/enveomics.R/R/df2dist.R +154 -0
  227. data/utils/enveomics/enveomics.R/R/growthcurve.R +331 -0
  228. data/utils/enveomics/enveomics.R/R/prefscore.R +79 -0
  229. data/utils/enveomics/enveomics.R/R/recplot.R +354 -0
  230. data/utils/enveomics/enveomics.R/R/recplot2.R +1631 -0
  231. data/utils/enveomics/enveomics.R/R/tribs.R +583 -0
  232. data/utils/enveomics/enveomics.R/R/utils.R +80 -0
  233. data/utils/enveomics/enveomics.R/README.md +81 -0
  234. data/utils/enveomics/enveomics.R/data/growth.curves.rda +0 -0
  235. data/utils/enveomics/enveomics.R/data/phyla.counts.rda +0 -0
  236. data/utils/enveomics/enveomics.R/man/cash-enve.GrowthCurve-method.Rd +16 -0
  237. data/utils/enveomics/enveomics.R/man/cash-enve.RecPlot2-method.Rd +16 -0
  238. data/utils/enveomics/enveomics.R/man/cash-enve.RecPlot2.Peak-method.Rd +16 -0
  239. data/utils/enveomics/enveomics.R/man/enve.GrowthCurve-class.Rd +25 -0
  240. data/utils/enveomics/enveomics.R/man/enve.TRIBS-class.Rd +46 -0
  241. data/utils/enveomics/enveomics.R/man/enve.TRIBS.merge.Rd +23 -0
  242. data/utils/enveomics/enveomics.R/man/enve.TRIBStest-class.Rd +47 -0
  243. data/utils/enveomics/enveomics.R/man/enve.__prune.iter.Rd +23 -0
  244. data/utils/enveomics/enveomics.R/man/enve.__prune.reduce.Rd +23 -0
  245. data/utils/enveomics/enveomics.R/man/enve.__tribs.Rd +40 -0
  246. data/utils/enveomics/enveomics.R/man/enve.barplot.Rd +103 -0
  247. data/utils/enveomics/enveomics.R/man/enve.cliopts.Rd +67 -0
  248. data/utils/enveomics/enveomics.R/man/enve.col.alpha.Rd +24 -0
  249. data/utils/enveomics/enveomics.R/man/enve.col2alpha.Rd +19 -0
  250. data/utils/enveomics/enveomics.R/man/enve.df2dist.Rd +45 -0
  251. data/utils/enveomics/enveomics.R/man/enve.df2dist.group.Rd +44 -0
  252. data/utils/enveomics/enveomics.R/man/enve.df2dist.list.Rd +47 -0
  253. data/utils/enveomics/enveomics.R/man/enve.growthcurve.Rd +75 -0
  254. data/utils/enveomics/enveomics.R/man/enve.prefscore.Rd +50 -0
  255. data/utils/enveomics/enveomics.R/man/enve.prune.dist.Rd +44 -0
  256. data/utils/enveomics/enveomics.R/man/enve.recplot.Rd +139 -0
  257. data/utils/enveomics/enveomics.R/man/enve.recplot2-class.Rd +45 -0
  258. data/utils/enveomics/enveomics.R/man/enve.recplot2.ANIr.Rd +24 -0
  259. data/utils/enveomics/enveomics.R/man/enve.recplot2.Rd +77 -0
  260. data/utils/enveomics/enveomics.R/man/enve.recplot2.__counts.Rd +25 -0
  261. data/utils/enveomics/enveomics.R/man/enve.recplot2.__peakHist.Rd +21 -0
  262. data/utils/enveomics/enveomics.R/man/enve.recplot2.__whichClosestPeak.Rd +19 -0
  263. data/utils/enveomics/enveomics.R/man/enve.recplot2.changeCutoff.Rd +19 -0
  264. data/utils/enveomics/enveomics.R/man/enve.recplot2.compareIdentities.Rd +47 -0
  265. data/utils/enveomics/enveomics.R/man/enve.recplot2.coordinates.Rd +29 -0
  266. data/utils/enveomics/enveomics.R/man/enve.recplot2.corePeak.Rd +18 -0
  267. data/utils/enveomics/enveomics.R/man/enve.recplot2.extractWindows.Rd +45 -0
  268. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.Rd +36 -0
  269. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__em_e.Rd +19 -0
  270. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__em_m.Rd +19 -0
  271. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__emauto_one.Rd +27 -0
  272. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__mow_one.Rd +52 -0
  273. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.__mower.Rd +17 -0
  274. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.em.Rd +51 -0
  275. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.emauto.Rd +43 -0
  276. data/utils/enveomics/enveomics.R/man/enve.recplot2.findPeaks.mower.Rd +82 -0
  277. data/utils/enveomics/enveomics.R/man/enve.recplot2.peak-class.Rd +59 -0
  278. data/utils/enveomics/enveomics.R/man/enve.recplot2.seqdepth.Rd +27 -0
  279. data/utils/enveomics/enveomics.R/man/enve.recplot2.windowDepthThreshold.Rd +36 -0
  280. data/utils/enveomics/enveomics.R/man/enve.selvector.Rd +23 -0
  281. data/utils/enveomics/enveomics.R/man/enve.tribs.Rd +68 -0
  282. data/utils/enveomics/enveomics.R/man/enve.tribs.test.Rd +28 -0
  283. data/utils/enveomics/enveomics.R/man/enve.truncate.Rd +27 -0
  284. data/utils/enveomics/enveomics.R/man/growth.curves.Rd +14 -0
  285. data/utils/enveomics/enveomics.R/man/phyla.counts.Rd +13 -0
  286. data/utils/enveomics/enveomics.R/man/plot.enve.GrowthCurve.Rd +78 -0
  287. data/utils/enveomics/enveomics.R/man/plot.enve.TRIBS.Rd +46 -0
  288. data/utils/enveomics/enveomics.R/man/plot.enve.TRIBStest.Rd +45 -0
  289. data/utils/enveomics/enveomics.R/man/plot.enve.recplot2.Rd +125 -0
  290. data/utils/enveomics/enveomics.R/man/summary.enve.GrowthCurve.Rd +19 -0
  291. data/utils/enveomics/enveomics.R/man/summary.enve.TRIBS.Rd +19 -0
  292. data/utils/enveomics/enveomics.R/man/summary.enve.TRIBStest.Rd +19 -0
  293. data/utils/enveomics/globals.mk +8 -0
  294. data/utils/enveomics/manifest.json +9 -0
  295. data/utils/multitrim/Multitrim How-To.pdf +0 -0
  296. data/utils/multitrim/README.md +67 -0
  297. data/utils/multitrim/multitrim.py +1555 -0
  298. data/utils/multitrim/multitrim.yml +13 -0
  299. metadata +301 -5
@@ -0,0 +1,13 @@
1
+
2
+ \name{phyla.counts}
3
+ \docType{data}
4
+ \alias{phyla.counts}
5
+ \title{Counts of microbial phyla in four sites}
6
+ \description{
7
+ This data set gives the counts of phyla in four different
8
+ sites.
9
+ }
10
+ \usage{phyla.counts}
11
+ \format{A data frame with 9 rows (phyla) and 4 columns (sites).}
12
+ \keyword{datasets}
13
+
@@ -0,0 +1,78 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/growthcurve.R
3
+ \name{plot.enve.GrowthCurve}
4
+ \alias{plot.enve.GrowthCurve}
5
+ \title{Enveomics: Plot of Growth Curve}
6
+ \usage{
7
+ \method{plot}{enve.GrowthCurve}(
8
+ x,
9
+ col,
10
+ pt.alpha = 0.9,
11
+ ln.alpha = 1,
12
+ ln.lwd = 1,
13
+ ln.lty = 1,
14
+ band.alpha = 0.4,
15
+ band.density = NULL,
16
+ band.angle = 45,
17
+ xp.alpha = 0.5,
18
+ xp.lwd = 1,
19
+ xp.lty = 1,
20
+ pch = 19,
21
+ new = TRUE,
22
+ legend = new,
23
+ add.params = FALSE,
24
+ ...
25
+ )
26
+ }
27
+ \arguments{
28
+ \item{x}{An \code{\link{enve.GrowthCurve}} object to plot.}
29
+
30
+ \item{col}{Base colors to use for the different samples. Can be recycled.
31
+ By default, grey for one sample or rainbow colors for more than one.}
32
+
33
+ \item{pt.alpha}{Color alpha for the observed data points, using \code{col}
34
+ as a base.}
35
+
36
+ \item{ln.alpha}{Color alpha for the fitted growth curve, using \code{col}
37
+ as a base.}
38
+
39
+ \item{ln.lwd}{Line width for the fitted curve.}
40
+
41
+ \item{ln.lty}{Line type for the fitted curve.}
42
+
43
+ \item{band.alpha}{Color alpha for the confidence interval band of the
44
+ fitted growth curve, using \code{col} as a base.}
45
+
46
+ \item{band.density}{Density of the filling pattern in the interval band.
47
+ If \code{NULL}, a solid color is used.}
48
+
49
+ \item{band.angle}{Angle of the density filling pattern in the interval
50
+ band. Ignored if \code{band.density} is \code{NULL}.}
51
+
52
+ \item{xp.alpha}{Color alpha for the line connecting individual experiments,
53
+ using \code{col} as a base.}
54
+
55
+ \item{xp.lwd}{Width of line for the experiments.}
56
+
57
+ \item{xp.lty}{Type of line for the experiments.}
58
+
59
+ \item{pch}{Point character for observed data points.}
60
+
61
+ \item{new}{Should a new plot be generated? If \code{FALSE}, the existing
62
+ canvas is used.}
63
+
64
+ \item{legend}{Should the plot include a legend? If \code{FALSE}, no legend
65
+ is added. If \code{TRUE}, a legend is added in the bottom-right corner.
66
+ Otherwise, a legend is added in the position specified as \code{xy.coords}.}
67
+
68
+ \item{add.params}{Should the legend include the parameters of the fitted
69
+ model?}
70
+
71
+ \item{...}{Any other graphic parameters.}
72
+ }
73
+ \description{
74
+ Plots an \code{\link{enve.GrowthCurve}} object.
75
+ }
76
+ \author{
77
+ Luis M. Rodriguez-R [aut, cre]
78
+ }
@@ -0,0 +1,46 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/tribs.R
3
+ \name{plot.enve.TRIBS}
4
+ \alias{plot.enve.TRIBS}
5
+ \title{Enveomics: TRIBS Plot}
6
+ \usage{
7
+ \method{plot}{enve.TRIBS}(
8
+ x,
9
+ new = TRUE,
10
+ type = c("boxplot", "points"),
11
+ col = "#00000044",
12
+ pt.cex = 1/2,
13
+ pt.pch = 19,
14
+ pt.col = col,
15
+ ln.col = col,
16
+ ...
17
+ )
18
+ }
19
+ \arguments{
20
+ \item{x}{\code{\link{enve.TRIBS}} object to plot.}
21
+
22
+ \item{new}{Should a new canvas be drawn?}
23
+
24
+ \item{type}{Type of plot. The \strong{points} plot shows all the replicates, the
25
+ \strong{boxplot} plot represents the values found by
26
+ \code{\link[grDevices]{boxplot.stats}}
27
+ as areas, and plots the outliers as points.}
28
+
29
+ \item{col}{Color of the areas and/or the points.}
30
+
31
+ \item{pt.cex}{Size of the points.}
32
+
33
+ \item{pt.pch}{Points character.}
34
+
35
+ \item{pt.col}{Color of the points.}
36
+
37
+ \item{ln.col}{Color of the lines.}
38
+
39
+ \item{...}{Any additional parameters supported by \code{plot}.}
40
+ }
41
+ \description{
42
+ Plot an \code{\link{enve.TRIBS}} object.
43
+ }
44
+ \author{
45
+ Luis M. Rodriguez-R [aut, cre]
46
+ }
@@ -0,0 +1,45 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/tribs.R
3
+ \name{plot.enve.TRIBStest}
4
+ \alias{plot.enve.TRIBStest}
5
+ \title{Enveomics: TRIBS Plot Test}
6
+ \usage{
7
+ \method{plot}{enve.TRIBStest}(
8
+ x,
9
+ type = c("overlap", "difference"),
10
+ col = "#00000044",
11
+ col1 = col,
12
+ col2 = "#44001144",
13
+ ylab = "Probability",
14
+ xlim = range(attr(x, "dist.mids")),
15
+ ylim = c(0, max(c(attr(x, "all.dist"), attr(x, "sel.dist")))),
16
+ ...
17
+ )
18
+ }
19
+ \arguments{
20
+ \item{x}{\code{\link{enve.TRIBStest}} object to plot.}
21
+
22
+ \item{type}{What to plot. \code{overlap} generates a plot of the two contrasting empirical
23
+ PDFs (to compare against each other), \code{difference} produces a plot of the
24
+ differences between the empirical PDFs (to compare against zero).}
25
+
26
+ \item{col}{Main color of the plot if type=\code{difference}.}
27
+
28
+ \item{col1}{First color of the plot if type=\code{overlap}.}
29
+
30
+ \item{col2}{Second color of the plot if type=\code{overlap}.}
31
+
32
+ \item{ylab}{Y-axis label.}
33
+
34
+ \item{xlim}{X-axis limits.}
35
+
36
+ \item{ylim}{Y-axis limits.}
37
+
38
+ \item{...}{Any other graphical arguments.}
39
+ }
40
+ \description{
41
+ Plots an \code{\link{enve.TRIBStest}} object.
42
+ }
43
+ \author{
44
+ Luis M. Rodriguez-R [aut, cre]
45
+ }
@@ -0,0 +1,125 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/recplot2.R
3
+ \name{plot.enve.RecPlot2}
4
+ \alias{plot.enve.RecPlot2}
5
+ \title{Enveomics: Recruitment Plot (2)}
6
+ \usage{
7
+ \method{plot}{enve.RecPlot2}(
8
+ x,
9
+ layout = matrix(c(5, 5, 2, 1, 4, 3), nrow = 2),
10
+ panel.fun = list(),
11
+ widths = c(1, 7, 2),
12
+ heights = c(1, 2),
13
+ palette = grey((100:0)/100),
14
+ underlay.group = TRUE,
15
+ peaks.col = "darkred",
16
+ use.peaks,
17
+ id.lim = range(x$id.breaks),
18
+ pos.lim = range(x$pos.breaks),
19
+ pos.units = c("Mbp", "Kbp", "bp"),
20
+ mar = list(`1` = c(5, 4, 1, 1) + 0.1, `2` = c(ifelse(any(layout == 1), 1, 5), 4, 4, 1)
21
+ + 0.1, `3` = c(5, ifelse(any(layout == 1), 1, 4), 1, 2) + 0.1, `4` =
22
+ c(ifelse(any(layout == 1), 1, 5), ifelse(any(layout == 2), 1, 4), 4, 2) + 0.1, `5` =
23
+ c(5, 3, 4, 1) + 0.1, `6` = c(5, 4, 4, 2) + 0.1),
24
+ pos.splines = 0,
25
+ id.splines = 1/2,
26
+ in.lwd = ifelse(is.null(pos.splines) || pos.splines > 0, 1/2, 2),
27
+ out.lwd = ifelse(is.null(pos.splines) || pos.splines > 0, 1/2, 2),
28
+ id.lwd = ifelse(is.null(id.splines) || id.splines > 0, 1/2, 2),
29
+ in.col = "darkblue",
30
+ out.col = "lightblue",
31
+ id.col = "black",
32
+ breaks.col = "#AAAAAA40",
33
+ peaks.opts = list(),
34
+ ...
35
+ )
36
+ }
37
+ \arguments{
38
+ \item{x}{\code{\link{enve.RecPlot2}} object to plot.}
39
+
40
+ \item{layout}{Matrix indicating the position of the different panels in the layout,
41
+ where:
42
+ \itemize{
43
+ \item 0: Empty space
44
+ \item 1: Counts matrix
45
+ \item 2: position histogram (sequencing depth)
46
+ \item 3: identity histogram
47
+ \item 4: Populations histogram (histogram of sequencing depths)
48
+ \item 5: Color scale for the counts matrix (vertical)
49
+ \item 6: Color scale of the counts matrix (horizontal)
50
+ }
51
+ Only panels indicated here will be plotted. To plot only one panel
52
+ simply set this to the number of the panel you want to plot.}
53
+
54
+ \item{panel.fun}{List of functions to be executed after drawing each panel. Use the
55
+ indices in \code{layout} (as characters) as keys. Functions for indices
56
+ missing in \code{layout} are ignored. For example, to add a vertical line
57
+ at the 3Mbp mark in both the position histogram and the counts matrix:
58
+ \code{list('1'=function() abline(v=3), '2'=function() abline(v=3))}.
59
+ Note that the X-axis in both panels is in Mbp by default. To change
60
+ this behavior, set \code{pos.units} accordingly.}
61
+
62
+ \item{widths}{Relative widths of the columns of \code{layout}.}
63
+
64
+ \item{heights}{Relative heights of the rows of \code{layout}.}
65
+
66
+ \item{palette}{Colors to be used to represent the counts matrix, sorted from no hits
67
+ to the maximum sequencing depth.}
68
+
69
+ \item{underlay.group}{If TRUE, it indicates the in-group and out-group areas coloured based
70
+ on \code{in.col} and \code{out.col}. Requires support for semi-transparency.}
71
+
72
+ \item{peaks.col}{If not \code{NA}, it attempts to represent peaks in the population histogram
73
+ in the specified color. Set to \code{NA} to avoid peak-finding.}
74
+
75
+ \item{use.peaks}{A list of \code{\link{enve.RecPlot2.Peak}} objects, as returned by
76
+ \code{\link{enve.recplot2.findPeaks}}. If passed, \code{peaks.opts} is ignored.}
77
+
78
+ \item{id.lim}{Limits of identities to represent.}
79
+
80
+ \item{pos.lim}{Limits of positions to represent (in bp, regardless of \code{pos.units}).}
81
+
82
+ \item{pos.units}{Units in which the positions should be represented (powers of 1,000
83
+ base pairs).}
84
+
85
+ \item{mar}{Margins of the panels as a list, with the character representation of
86
+ the number of the panel as index (see \code{layout}).}
87
+
88
+ \item{pos.splines}{Smoothing parameter for the splines in the position histogram. Zero
89
+ (0) for no splines. Use \code{NULL} to automatically detect by leave-one-out
90
+ cross-validation.}
91
+
92
+ \item{id.splines}{Smoothing parameter for the splines in the identity histogram. Zero
93
+ (0) for no splines. Use \code{NULL} to automatically detect by leave-one-out
94
+ cross-validation.}
95
+
96
+ \item{in.lwd}{Line width for the sequencing depth of in-group matches.}
97
+
98
+ \item{out.lwd}{Line width for the sequencing depth of out-group matches.}
99
+
100
+ \item{id.lwd}{Line width for the identity histogram.}
101
+
102
+ \item{in.col}{Color associated to in-group matches.}
103
+
104
+ \item{out.col}{Color associated to out-group matches.}
105
+
106
+ \item{id.col}{Color for the identity histogram.}
107
+
108
+ \item{breaks.col}{Color of the vertical lines indicating sequence breaks.}
109
+
110
+ \item{peaks.opts}{Options passed to \code{\link{enve.recplot2.findPeaks}},
111
+ if \code{peaks.col} is not \code{NA}.}
112
+
113
+ \item{...}{Any other graphic parameters (currently ignored).}
114
+ }
115
+ \value{
116
+ Returns a list of \code{\link{enve.RecPlot2.Peak}} objects (see
117
+ \code{\link{enve.recplot2.findPeaks}}). If \code{peaks.col=NA} or
118
+ \code{layout} doesn't include 4, returns \code{NA}.
119
+ }
120
+ \description{
121
+ Plots an \code{\link{enve.RecPlot2}} object.
122
+ }
123
+ \author{
124
+ Luis M. Rodriguez-R [aut, cre]
125
+ }
@@ -0,0 +1,19 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/growthcurve.R
3
+ \name{summary.enve.GrowthCurve}
4
+ \alias{summary.enve.GrowthCurve}
5
+ \title{Enveomics: Summary of Growth Curve}
6
+ \usage{
7
+ \method{summary}{enve.GrowthCurve}(object, ...)
8
+ }
9
+ \arguments{
10
+ \item{object}{An \code{\link{enve.GrowthCurve}} object.}
11
+
12
+ \item{...}{No additional parameters are currently supported.}
13
+ }
14
+ \description{
15
+ Summary of an \code{\link{enve.GrowthCurve}} object.
16
+ }
17
+ \author{
18
+ Luis M. Rodriguez-R [aut, cre]
19
+ }
@@ -0,0 +1,19 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/tribs.R
3
+ \name{summary.enve.TRIBS}
4
+ \alias{summary.enve.TRIBS}
5
+ \title{Enveomics: TRIBS Summary}
6
+ \usage{
7
+ \method{summary}{enve.TRIBS}(object, ...)
8
+ }
9
+ \arguments{
10
+ \item{object}{\code{\link{enve.TRIBS}} object.}
11
+
12
+ \item{...}{No additional parameters are currently supported.}
13
+ }
14
+ \description{
15
+ Summary of an \code{\link{enve.TRIBS}} object.
16
+ }
17
+ \author{
18
+ Luis M. Rodriguez-R [aut, cre]
19
+ }
@@ -0,0 +1,19 @@
1
+ % Generated by roxygen2: do not edit by hand
2
+ % Please edit documentation in R/tribs.R
3
+ \name{summary.enve.TRIBStest}
4
+ \alias{summary.enve.TRIBStest}
5
+ \title{Enveomics: TRIBS Summary Test}
6
+ \usage{
7
+ \method{summary}{enve.TRIBStest}(object, ...)
8
+ }
9
+ \arguments{
10
+ \item{object}{\code{\link{enve.TRIBStest}} object.}
11
+
12
+ \item{...}{No additional parameters are currently supported.}
13
+ }
14
+ \description{
15
+ Summary of an \code{\link{enve.TRIBStest}} object.
16
+ }
17
+ \author{
18
+ Luis M. Rodriguez-R [aut, cre]
19
+ }
@@ -0,0 +1,8 @@
1
+ # Global variables for the Enve-omics collection
2
+
3
+ R=R
4
+ prefix=/usr/local
5
+ bindir=$(prefix)/bin
6
+ mandir=$(prefix)/man/man1
7
+ SCRIPTS := $(wildcard Scripts/*.*)
8
+
@@ -0,0 +1,9 @@
1
+ {
2
+ "_": ["This is not standard JSON, to parse use EnveJSON, available at:",
3
+ "https://github.com/lmrodriguezr/enveomics-gui/."],
4
+ "_include": [
5
+ "Manifest/categories.json",
6
+ "Manifest/examples.json",
7
+ "Manifest/tasks.json"
8
+ ]
9
+ }
@@ -0,0 +1,67 @@
1
+ # multitrim
2
+
3
+ This is the development script for the new MiGA trimming approach.
4
+
5
+ To install the requirements, create a conda environment using multitrim.yml. Navigate to the directory in which multitrim.yml is located, and enter the following command:
6
+
7
+ conda env create -f multitrim.yml
8
+
9
+ This will create a conda environment with the correct tools and will allow you to run the multitrim python script. Activate it using:
10
+
11
+ conda activate multitrim
12
+
13
+ The python script can be run for paired end reads as:
14
+
15
+ python3 multitrim.py -1 [FORWARD READS] -2 [REVERSE READS] --max -o [OUTPUT DIRECTORY]
16
+
17
+ For single-end reads, run it as:
18
+
19
+ python3 multitrim.py -u [SE READS] --max -o [OUTPUT DIRECTORY]
20
+
21
+ NOTE:
22
+
23
+ Currently Falco is under development. The post-trim QC may fail. If this happens, the python script will issue an error and the post trim QC HTML will be missing. Please email me if this happens.
24
+
25
+ # User Manual
26
+
27
+ https://docs.google.com/presentation/d/1U87oUzMn-t-lJwOv3oVLv2oR9CpEVGYJpXV8dsBU4zM/edit?usp=sharing
28
+
29
+ <hr />
30
+
31
+ # Workflow Overview
32
+
33
+ * A subsample of up to 100K reads is taken from the input(s)
34
+ * The subsamples are run through FaQCs with report only mode on (no trimming) to detect adapters. Possible adapters come from this file: https://github.com/bio-miga/miga/blob/master/utils/adapters.fa
35
+ * An adapter is considered present if FaQCs reports it in at least 0.1% of reads (the default threshold). All adapters that belong to the detected Illumina kit(s) are included, e.g. detecting any one Illumina SE adapter will include ALL Illumina SE adapters in the trim. The "families" of adapters can be seen at line breaks in the linked adapters.fa file
36
+ * Detected adapters are supplied to both FaQCs and Fastp in succession, so both tools attempt to trim adapters:
37
+ * First, FaQCs is run on the input reads with the -q 27 parameter, meaning that bases with < 27 quality score count against FaQCs' internal score parameter. This causes trimming to occur when enough <27 qual bases are found, and proceeds from both 5' and 3' ends separately.
38
+ * Second, Fastp is run on the post-FaQCs reads using a sliding window of size 3 and min avg. quality of 20. This is identical in behavior to trimmomatic's sliding window, but fastp is faster.
39
+ * Reads < 50 bp in length are removed.
40
+ * The final post-trim reads are output
41
+ * QC reports are performed on pre/post trim reads all at once.
42
+
43
+ <hr />
44
+
45
+ # Brief Summary
46
+
47
+ * Input reads -> FaQCs trims originals -> fastp trims FaQCs outputs -> output reads
48
+
49
+ <hr />
50
+
51
+ # Tools multitrim uses
52
+
53
+ Read Trimmers
54
+
55
+ * Tools for trimming:
56
+
57
+ * FaQCs: https://github.com/LANL-Bioinformatics/FaQCs
58
+
59
+ * Fastp: https://github.com/OpenGene/fastp
60
+
61
+ QC
62
+
63
+ * Falco: https://github.com/smithlabcode/falco/tree/master/src
64
+
65
+ Sampling
66
+
67
+ * SeqTK: https://github.com/lh3/seqtk
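
As a rough illustration of the quality-trimming step described in the Workflow Overview, the following Python sketch applies a sliding window of size 3 with a minimum mean quality of 20, then discards reads shorter than 50 bp. It is not taken from multitrim.py or from fastp: the function names `keep_length` and `trim_read` are hypothetical, and cutting the read at the start of the first failing window is an assumption about how such a window is commonly applied.

```python
from typing import List, Optional, Tuple


def keep_length(quals: List[int], window: int = 3, min_mean: float = 20.0) -> int:
    """Return how many leading bases survive a 5'-to-3' sliding-window scan.

    Assumption for illustration: the read is cut at the start of the first
    window whose mean quality is below `min_mean`. Reads shorter than the
    window are left untouched.
    """
    for start in range(max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)


def trim_read(
    seq: str, quals: List[int], min_len: int = 50
) -> Optional[Tuple[str, List[int]]]:
    """Trim one read; return None if the result is shorter than `min_len`."""
    keep = keep_length(quals)
    if keep < min_len:
        return None  # read would be removed from the output
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    # 60 high-quality bases followed by 5 low-quality bases
    quals = [30] * 60 + [10] * 5
    result = trim_read("A" * 65, quals)
    print(len(result[0]) if result else "discarded")
```

The two thresholds are kept as separate parameters (`min_mean` and `min_len`) because the workflow treats quality trimming and length filtering as distinct steps.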