RubyGems - viral_seq - Versions diffs - 1.0.11 → 1.1.1 - Mend

viral_seq 1.0.11 → 1.1.1


Files changed (20)

checksums.yaml +4 -4
data/.gitignore +0 -1
data/Gemfile.lock +1 -1
data/README.md +93 -11
data/bin/tcs +34 -6
data/bin/tcs_log +102 -0
data/docs/assets/img/cover.jpg +0 -0
data/docs/dr.json +67 -0
data/docs/sample_miseq_data/hivdr_control/r1.fastq.gz +0 -0
data/docs/sample_miseq_data/hivdr_control/r2.fastq.gz +0 -0
data/lib/viral_seq.rb +1 -1
data/lib/viral_seq/enumerable.rb +0 -10
data/lib/viral_seq/math.rb +3 -3
data/lib/viral_seq/seq_hash.rb +1 -1
data/lib/viral_seq/seq_hash_pair.rb +6 -4
data/lib/viral_seq/tcs_core.rb +34 -5
data/lib/viral_seq/tcs_dr.rb +71 -0
data/lib/viral_seq/tcs_json.rb +41 -10
data/lib/viral_seq/version.rb +2 -2
metadata +9 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eb5906a2a3f0c98fa84a15f7fd4b35160f766317a6603b18c62e2e2476af01fd
-  data.tar.gz: 435a32d9ce5078b18b633c14c1585c30835b48d3f013d24bc6b9cc98fef51a55
+  metadata.gz: 7a283f3a09cc5d9807e7622cd1ddf27197919955e85d6472b34fc14b66749c03
+  data.tar.gz: 4f90c5a9c7ea0ec148ba7d45ee88dc441f79da67a97654734194a773499ebb8e
 SHA512:
-  metadata.gz: a65fcfe551b59d2f022f96d009f089941a3bd293545e840ba294ce43b3cb087b2f3a7fef26829fe3b229c9469ca4e5c907362bbf05a5384947af97a12409a5aa
-  data.tar.gz: e4283b03e33aa67feb1dcf623d2613440bcc7299ab33ff77b7dce1ef9617b46e1763d17dda64d54b5fb3c6de37da66880abe43baae932ac36b70d0e2b0cc88d1
+  metadata.gz: 385a94eb93c3d8d9116c16a0d8af56ba714ba6191a454076acf881a036de80d1d598f3fcd1a4de841745ca08a1ad3e8bc028a30db9f96c19f3b217ef4583d652
+  data.tar.gz: 714d035b6f65863746cafb120c9cf6eccb8261f3eac69985bad96e5275351eec71aa3b744ee9b462e2dc3e0e199c2d4112386f6a2d7eef89b5b7824c1ab769be
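The hunk above only swaps digests. As a minimal sketch (not part of the gem), an unpacked `data.tar.gz` could be checked against the SHA256 value listed in `checksums.yaml` with Ruby's standard `digest` library. The helper name `sha256_matches?` and the file path in the usage comment are hypothetical.

```ruby
require 'digest'

# Hypothetical helper: true when the file's SHA256 hex digest equals `expected`.
def sha256_matches?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected.strip.downcase
end

# Usage (assumes data.tar.gz was extracted from the .gem archive beforehand):
# sha256_matches?("data.tar.gz",
#   "4f90c5a9c7ea0ec148ba7d45ee88dc441f79da67a97654734194a773499ebb8e")
```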

data/.gitignore CHANGED

@@ -2,7 +2,6 @@
 /.yardoc
 /_yardoc/
 /coverage/
-/doc/
 /pkg/
 /spec/reports/
 /tmp/

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    viral_seq (1.0.10)
+    viral_seq (1.0.13)
       colorize (~> 0.1)
       muscle_bio (~> 0.4)

data/README.md CHANGED

@@ -1,8 +1,24 @@
 # ViralSeq
+[![Gem Version](https://badge.fury.io/rb/viral_seq.svg)](https://rubygems.org/gems/viral_seq)
+![GitHub](https://img.shields.io/github/license/viralseq/viral_seq)
+![Gem](https://img.shields.io/gem/dt/viral_seq?color=%23E9967A)
+![GitHub last commit](https://img.shields.io/github/last-commit/viralseq/viral_seq?color=%2300BFFF)
+[![Join the chat at https://gitter.im/viral_seq/community](https://badges.gitter.im/viral_seq/community.svg)](https://gitter.im/viral_seq/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 A Ruby Gem containing bioinformatics tools for processing viral NGS data.
-Specifically for Primer-ID sequencing and HIV drug resistance analysis.
+Specifically for Primer ID sequencing and HIV drug resistance analysis.
+## Illustration for the Primer ID Sequencing
+![Primer ID Sequencing](./docs/assets/img/cover.jpg)
+### Reference readings on the Primer ID sequencing
+[Explantion of Primer ID sequencing](https://doi.org/10.21769/BioProtoc.3938)
+[Primer ID MiSeq protocol](https://doi.org/10.1128/JVI.00522-15)
+[Application of Primer ID sequencing in COVID-19 research](https://doi.org/10.1126/scitranslmed.abb5883)
 ## Install
@@ -14,20 +30,55 @@ Specifically for Primer-ID sequencing and HIV drug resistance analysis.
 ### Excutables
-Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
+### `tcs`
+Use executable `tcs` pipeline to process **Primer ID MiSeq sequencing** data.
+Example commands:
 ```bash
-    $ locator -i sequence.fasta -o sequence.fasta.csv
+    $ tcs -p params.json # run TCS pipeline with params.json
+    $ tcs -p params.json -i DIRECTORY
+    # run TCS pipeline with params.json and DIRECTORY
+    # if DIRECTORY is not defined in params.json
+    $ tcs -dr -i DIRECTORY
+    # run tcs-dr (MPID HIV drug resistance sequencing) pipeline
+    # DIRECTORY needs to be given.
+    $ tcs -j # CLI to generate params.json
+    $ tcs -h # print out the help
 ```
-Use executable `tcs` pipeline to process Primer ID MiSeq sequencing data.
+[sample params.json for the tcs-dr pipeline](./docs/dr.json)
+---
+### `tcs_log`
+Use `tcs_log` script to pool run logs and TCS fasta files after one batch of `tcs` jobs.
+Example file structure:
+```
+batch_tcs_jobs/
+      ├── lib1
+      ├── lib2
+      ├── lib3
+      ├── lib4
+      ├── ...
+```
+Example command:
 ```bash
-    $ tcs -p params.json # run TCS pipeline with params.json
-    $ tcs -j # CLI to generate params.json
-    $ tcs -h # print out the help
+    $ tcs_log batch_tcs_jobs
 ```
+---
+### `locator`
+Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
+```bash
+    $ locator -i sequence.fasta -o sequence.fasta.csv
+```
+---
 ## Some Examples
 Load all ViralSeq classes by requiring 'viral_seq.rb' in your Ruby scripts.
@@ -80,16 +131,47 @@ qc_seqhash.sdrm_hiv_pr(cut_off)
 ```
 ## Known issues
-  1. have a conflict with rails.
+  1. ~~have a conflict with rails.~~
+  2. ~~Update on 03032021. Still have conflict. But in rails gem file, can just use `requires: false` globally and only require "viral_seq" when the module is needed in controller.~~
+  3. The conflict seems to be resovled. It was from a combination of using `!` as a function for factorial and the gem name `viral_seq`. @_@
 ## Updates
-### Version 1.1.1-03022021
+### Version 1.1.1-04012021
+  1. Added warning when paired_raw_sequence less than 0.1% of total_raw_sequence.
+  2. Added option `-i WORKING_DIRECTORY` to the `tcs` script.
+  If the `params.json` file does not contain the path to the working directory, it will append path to the run params.
+  3. Added option `-dr` to the `tcs` script.
+### Version 1.1.0-03252021
+  1. Optimized the algorithm of end-join.
+  2. Fixed a bug in the `tcs` pipeline that sometimes combined tcs files are not saved.
+  3. Added `tcs_log` command to pool run logs and tcs files from one batch of tcs jobs.
+  4. Added the preset of MPID-HIVDR params file [***dr.json***](./docs/dr.json) in /docs.
+  5. Add `platform_format` option in the json generator of the `tcs` Pipeline.
+  Users can choose from 3 MiSeq platforms for processing their sequencing data.
+  MiSeq 300x7x300 is the default option.
+### Version 1.0.14-03052021
+  1. Add a function `ViralSeq::TcsCore.validate_file_name` to check MiSeq paired-end file names.
+### Version 1.0.13-03032021
+  1. Fixed the conflict with rails.
+### Version 1.0.12-03032021
+  1. Fixed an issue that may cause conflicts with ActiveRecord.
+### Version 1.0.11-03022021
-  1. Fixed a issue when calculating Poisson cutoff for minority mutations `ViralSeq::SeqHash.pm`.
+  1. Fixed an issue when calculating Poisson cutoff for minority mutations `ViralSeq::SeqHash.pm`.
   2. fixed an issue loading class 'OptionParser'in some ruby environments.
-### Version 1.1.0-11112020:
+### Version 1.0.10-11112020:
   1. Modularize TCS pipeline. Move key functions into /viral_seq/tcs_core.rb
   2. `tcs_json_generator` is removed. This CLI is delivered within the `tcs` pipeline, by running `tcs -j`. The scripts are included in the /viral_seq/tcs_json.rb

data/bin/tcs CHANGED

@@ -23,7 +23,7 @@
 # THE SOFTWARE.
 # Use JSON file as the run param
-# run tcs_json_generator.rb to generate param json file.
+# run `tcs -j` to generate param json file.
 require 'viral_seq'
 require 'json'
@@ -46,6 +46,14 @@ OptionParser.new do |opts|
     options[:params_json] = p
   end
+  opts.on("-i", "--input PATH_TO_WORKING_DIRECTORY", "Path to the working directory") do |p|
+    options[:input] = p
+  end
+  opts.on("-dr", "--dr_pipeline", "HIV drug resistance MPID pipeline") do |p|
+    options[:dr] = true
+  end
   opts.on("-h", "--help", "Prints this help") do
     puts opts
     exit
@@ -64,15 +72,21 @@ end.parse!
 if options[:json_generator]
   params = ViralSeq::TcsJson.generate
+elsif options[:dr]
+  params = ViralSeq::TcsDr::PARAMS
 elsif (options[:params_json] && File.exist?(options[:params_json]))
   params = JSON.parse(File.read(options[:params_json]), symbolize_names: true)
 else
   abort "No params JSON file found. Script terminated.".red
 end
-indir = params[:raw_sequence_dir]
+if options[:input]
+  indir = options[:input]
+else
+  indir = params[:raw_sequence_dir]
+end
-unless File.exist?(indir)
+unless indir and File.exist?(indir)
   abort "No input sequence directory found. Script terminated.".red.bold
 end
@@ -115,6 +129,12 @@ else
   error_rate = 0.02
 end
+if params[:platform_format]
+  $platform_sequencing_length = params[:platform_format]
+else
+  $platform_sequencing_length = 300
+end
 primers = params[:primer_pairs]
 if primers.empty?
   ViralSeq::TcsCore.log_and_abort log, "No primer information. Script terminated."
@@ -123,6 +143,7 @@ end
 primers.each do |primer|
   summary_json = {}
+  summary_json[:warnings] = []
   summary_json[:tcs_version] = ViralSeq::TCS_VERSION
   summary_json[:viralseq_version] = ViralSeq::VERSION
   summary_json[:runtime] = Time.now.to_s
@@ -175,6 +196,10 @@ primers.each do |primer|
   paired_seq_number = common_keys.size
   log.puts Time.now.to_s + "\t" +  "Paired raw sequences are : #{paired_seq_number.to_s}"
   summary_json[:paired_raw_sequence] = paired_seq_number
+  if paired_seq_number < raw_sequence_number * 0.001
+    summary_json[:warnings] <<
+      "WARNING: Filtered raw sequneces less than 0.1% of the total raw sequences. Possible contamination."
+  end
   common_keys.each do |seqtag|
     r1_seq = r1_passed_seq[seqtag]
@@ -273,7 +298,6 @@ primers.each do |primer|
       r1_sub_seq << bio_r1[seq_name]
       r2_sub_seq << bio_r2[seq_name]
     end
     #consensus name including the Primer ID and number of raw sequences of that Primer ID, library name and setname.
     consensus_name = ">" + primer_id + "_" + seq_with_same_primer_id.size.to_s + "_" + libname + "_" + region
     r1_consensus = ViralSeq::SeqHash.array(r1_sub_seq).consensus(majority_cut_off)
@@ -364,6 +388,7 @@ primers.each do |primer|
     shp = ViralSeq::SeqHashPair.fa(out_dir_consensus)
     joined_sh = end_join(out_dir_consensus, primer[:end_join_option], primer[:overlap])
     log.puts Time.now.to_s + "\t" + "Paired TCS number: " + joined_sh.size.to_s
     summary_json[:combined_tcs] = joined_sh.size
     if export_raw
@@ -433,12 +458,15 @@ primers.each do |primer|
       trim_end = primer[:trim_ref_end]
       trim_ref = primer[:trim_ref].to_sym
       joined_sh = joined_sh.trim(trim_start, trim_end, trim_ref)
-      joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
       if export_raw
         joined_sh_raw = joined_sh_raw.trim(trim_start, trim_end, trim_ref)
-        joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
       end
     end
+    joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
+    if export_raw
+      joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
+    end
   end
   File.open(outfile_log, "w") do |f|
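Three small decisions are added to `bin/tcs` in the hunks above: `-i` takes precedence over `raw_sequence_dir`, the platform read length falls back to 300, and a warning is recorded when paired reads fall below 0.1% of the raw reads. The sketch below restates them as standalone functions. The function names are hypothetical; the released script inlines this logic and stores the read length in a global variable.

```ruby
# Hypothetical helpers restating the inline logic of bin/tcs 1.1.1.

# `-i DIRECTORY` wins over the directory stored in params.json.
def resolve_input_dir(options, params)
  options[:input] || params[:raw_sequence_dir]
end

# Read length from params, defaulting to the 300x7x300 platform.
def platform_length(params)
  params[:platform_format] || 300
end

# True when paired reads are fewer than 0.1% of the raw reads.
def low_pairing?(paired_seq_number, raw_sequence_number)
  paired_seq_number < raw_sequence_number * 0.001
end

puts resolve_input_dir({ input: "run_42" }, { raw_sequence_dir: "old_dir" })
puts platform_length({})
puts low_pairing?(5, 10_000)
```

Keeping these as pure functions makes the precedence rules easy to test without touching the file system.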

data/bin/tcs_log ADDED

@@ -0,0 +1,102 @@
+#!/usr/bin/env ruby
+# pool run logs from one batch of tcs jobs
+# file structure:
+#   batch_tcs_jobs/
+#   ├── lib1
+#   ├── lib2
+#   ├── lib3
+#   ├── lib4
+#   ├── ...
+#
+# command example:
+#   $ tcs_log batch_tcs_jobs
+require 'viral_seq'
+require 'pathname'
+require 'json'
+require 'fileutils'
+indir = ARGV[0].chomp
+indir_basename = File.basename(indir)
+indir_dirname = File.dirname(indir)
+tcs_dir = File.join(indir_dirname, (indir_basename + "_tcs"))
+Dir.mkdir(tcs_dir) unless File.directory?(tcs_dir)
+libs = []
+Dir.chdir(indir) {libs = Dir.glob("*")}
+outdir2 = File.join(tcs_dir, "combined_TCS_per_lib")
+outdir3 = File.join(tcs_dir, "TCS_per_region")
+outdir4 = File.join(tcs_dir, "combined_TCS_per_region")
+Dir.mkdir(outdir2) unless File.directory?(outdir2)
+Dir.mkdir(outdir3) unless File.directory?(outdir3)
+Dir.mkdir(outdir4) unless File.directory?(outdir4)
+log_file = File.join(tcs_dir,"log.csv")
+log = File.open(log_file,'w')
+header = %w{
+  lib_name
+  Region
+  Raw_Sequences_per_barcode
+  R1_Raw
+  R2_Raw
+  Paired_Raw
+  Cutoff
+  PID_Length
+  Consensus1
+  Consensus2
+  Distinct_to_Raw
+  Resampling_index
+  Combined_TCS
+  Combined_TCS_after_QC
+  WARNINGS
+}
+log.puts header.join(',')
+libs.each do |lib|
+  Dir.mkdir(File.join(outdir2, lib)) unless File.directory?(File.join(outdir2, lib))
+  fasta_files = []
+  json_files = []
+  Dir.chdir(File.join(indir, lib)) do
+     fasta_files = Dir.glob("**/*.fasta")
+     json_files = Dir.glob("**/log.json")
+  end
+  fasta_files.each do |f|
+    path_array = Pathname(f).each_filename.to_a
+    region = path_array[0]
+    if path_array[-1] == "combined.fasta"
+      FileUtils.cp(File.join(indir, lib, f), File.join(outdir2, lib, (lib + "_" + region)))
+      Dir.mkdir(File.join(outdir4,region)) unless File.directory?(File.join(outdir4,region))
+      FileUtils.cp(File.join(indir, lib, f), File.join(outdir4, region, (lib + "_" + region)))
+    else
+      Dir.mkdir(File.join(outdir3,region)) unless File.directory?(File.join(outdir3,region))
+      Dir.mkdir(File.join(outdir3,region, lib)) unless File.directory?(File.join(outdir3,region, lib))
+      FileUtils.cp(File.join(indir, lib, f), File.join(outdir3, region, lib, (lib + "_" + region + "_" + path_array[-1])))
+    end
+  end
+  json_files.each do |f|
+    json_log = JSON.parse(File.read(File.join(indir, lib, f)), symbolize_names: true)
+    log.print [lib,
+               json_log[:primer_set_name],
+               json_log[:total_raw_sequence],
+               json_log[:r1_filtered_raw],
+               json_log[:r2_filtered_raw],
+               json_log[:paired_raw_sequence],
+               json_log[:consensus_cutoff],
+               json_log[:length_of_pid],
+               json_log[:total_tcs_with_ambiguities],
+               json_log[:total_tcs],
+               json_log[:distinct_to_raw],
+               json_log[:resampling_param],
+               json_log[:combined_tcs],
+               json_log[:combined_tcs_after_qc],
+               json_log[:warnings],
+             ].join(',') + "\n"
+  end
+end
+log.close
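The copy rules in the fasta loop of `tcs_log` can be summarized as a naming function. This is a sketch under assumptions: the helper name `pooled_destination` is hypothetical, and the relative paths in the usage lines (`RT/consensus/...`) are illustrative rather than taken from the diff.

```ruby
require 'pathname'

# Hypothetical helper mirroring the branch in tcs_log's fasta loop.
# rel_path is relative to the library directory.
def pooled_destination(lib, rel_path)
  parts  = Pathname(rel_path).each_filename.to_a
  region = parts.first # the script treats the first path component as the region
  if parts.last == "combined.fasta"
    # copied into combined_TCS_per_lib/<lib>/ and combined_TCS_per_region/<region>/
    { kind: :combined, name: "#{lib}_#{region}" }
  else
    # copied into TCS_per_region/<region>/<lib>/
    { kind: :per_region, name: "#{lib}_#{region}_#{parts.last}" }
  end
end

p pooled_destination("lib1", "RT/consensus/combined.fasta")
p pooled_destination("lib1", "RT/consensus/r1.fasta")
```

Note that, as in the script, the pooled name for a combined file carries no `.fasta` extension.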

data/docs/assets/img/cover.jpg ADDED

Binary file

data/docs/dr.json ADDED

@@ -0,0 +1,67 @@
+{
+  "platform_error_rate": 0.02,
+  "primer_pairs": [
+    {
+      "region": "RT",
+      "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCACTATAGGCTGTACTGTCCATTTATC",
+      "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNGGCCATTGACAGAAGAAAAAATAAAAGC",
+      "majority": 0.5,
+      "end_join": true,
+      "end_join_option": 1,
+      "overlap": 0,
+      "TCS_QC": true,
+      "ref_genome": "HXB2",
+      "ref_start": 2648,
+      "ref_end": 3257,
+      "indel": true,
+      "trim": false
+    },
+    {
+      "region": "PR",
+      "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNCAGTTTAACTTTTGGGCCATCCATTCC",
+      "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTCAGAGCAGACCAGAGCCAACAGCCCCA",
+      "majority": 0.5,
+      "end_join": true,
+      "end_join_option": 3,
+      "TCS_QC": true,
+      "ref_genome": "HXB2",
+      "ref_start": 0,
+      "ref_end": 2591,
+      "indel": true,
+      "trim": true,
+      "trim_ref": "HXB2",
+      "trim_ref_start": 2253,
+      "trim_ref_end": 2549
+    },
+    {
+      "region": "IN",
+      "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNATCGAATACTGCCATTTGTACTGC",
+      "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNAAAAGGAGAAGCCATGCATG",
+      "majority": 0.5,
+      "end_join": true,
+      "end_join_option": 3,
+      "overlap": 171,
+      "TCS_QC": true,
+      "ref_genome": "HXB2",
+      "ref_start": 4384,
+      "ref_end": 4751,
+      "indel": false,
+      "trim": false
+    },
+    {
+      "region": "V1V3",
+      "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCCATTTTGCTYTAYTRABVTTACAATRTGC",
+      "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTTATGGGATCAAAGCCTAAAGCCATGTGTA",
+      "majority": 0.5,
+      "end_join": true,
+      "end_join_option": 1,
+      "overlap": 0,
+      "TCS_QC": true,
+      "ref_genome": "HXB2",
+      "ref_start": 6585,
+      "ref_end": 7208,
+      "indel": true,
+      "trim": false
+    }
+  ]
+}
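A file shaped like `dr.json` is read by `bin/tcs` with `JSON.parse(..., symbolize_names: true)`. The sketch below parses a trimmed fragment (two of the four primer pairs, with fewer keys) to show how the keys come back as symbols. The constant names are hypothetical.

```ruby
require 'json'

# Trimmed fragment of docs/dr.json, for illustration only.
DR_FRAGMENT = <<~JSON
  {
    "platform_error_rate": 0.02,
    "primer_pairs": [
      { "region": "RT", "end_join_option": 1, "overlap": 0, "trim": false },
      { "region": "PR", "end_join_option": 3, "trim": true,
        "trim_ref": "HXB2", "trim_ref_start": 2253, "trim_ref_end": 2549 }
    ]
  }
JSON

DR_PARAMS = JSON.parse(DR_FRAGMENT, symbolize_names: true)

DR_PARAMS[:primer_pairs].each do |primer|
  # "overlap" is absent for PR in dr.json, so primer[:overlap] is nil there.
  puts "#{primer[:region]}: end_join_option=#{primer[:end_join_option]}, " \
       "overlap=#{primer[:overlap].inspect}"
end
```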

data/docs/sample_miseq_data/hivdr_control/r1.fastq.gz ADDED

Binary file

data/docs/sample_miseq_data/hivdr_control/r2.fastq.gz ADDED

Binary file

data/lib/viral_seq.rb CHANGED

@@ -37,6 +37,6 @@ require_relative "viral_seq/string"
 require_relative "viral_seq/version"
 require_relative "viral_seq/tcs_core"
 require_relative "viral_seq/tcs_json"
+require_relative "viral_seq/tcs_dr"
 require "muscle_bio"

data/lib/viral_seq/enumerable.rb CHANGED

@@ -3,10 +3,6 @@
 #   array = [1,2,3,4,5,6,7,8,9,10]
 #   array.median
 #   => 5.5
-# @example sum
-#   array = [1,2,3,4,5,6,7,8,9,10]
-#   array.sum
-#   => 55
 # @example average number (mean)
 #   array = [1,2,3,4,5,6,7,8,9,10]
 #   array.mean
@@ -45,12 +41,6 @@ module Enumerable
     len % 2 == 1 ? sorted[len/2] : (sorted[len/2 - 1] + sorted[len/2]).to_f / 2
   end
-  # generate summed value
-  # @return [Numeric] summed value
-  def sum
-     self.inject(0){|accum, i| accum + i }
-  end
   # generate mean number
   # @return [Float] mean value
   def mean
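The diff does not state why the custom `sum` was removed. One plausible reading (an assumption, not confirmed by the source) is that Ruby's core `Enumerable#sum`, available since Ruby 2.4, makes the monkey patch unnecessary. A short sketch using only core Ruby, with hypothetical constant names:

```ruby
VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

TOTAL = VALUES.sum                     # core Enumerable#sum, no monkey patch needed
MEAN  = VALUES.sum.fdiv(VALUES.size)   # float division, like the gem's `mean`

puts TOTAL # 55
puts MEAN  # 5.5
```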

data/lib/viral_seq/math.rb CHANGED

@@ -67,7 +67,7 @@ module ViralSeq
         @k = k
         @poisson_hash = {}
         (0..k).each do |n|
-          p = (rate**n * ::Math::E**(-rate))/!n
+          p = (rate**n * ::Math::E**(-rate))/n.factorial
           @poisson_hash[n] = p
         end
       end
@@ -155,9 +155,9 @@ class Integer
   # factorial method for an Integer
   # @return [Integer] factorial for given Integer
   # @example factorial for 5
-  #   !5
+  #   5.factorial
   #   => 120
-  def !
+  def factorial
     if self == 0
       return 1
     else
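The hunks above rename the factorial from `Integer#!` to `Integer#factorial`. Since `!` is Ruby's logical NOT, redefining it on `Integer` changes what `!n` means for every integer in the process, which matches the conflict described in the README. The sketch below computes the same Poisson term without patching `Integer` at all. The function names `factorial` and `poisson_pmf` are hypothetical and not part of the gem's API.

```ruby
# Standalone factorial; returns 1 for n == 0 because the range is empty.
def factorial(n)
  (1..n).reduce(1, :*)
end

# Poisson probability of observing k events at the given rate:
# rate**k * e**(-rate) / k!
def poisson_pmf(rate, k)
  (rate**k * Math.exp(-rate)) / factorial(k)
end

puts factorial(5) # 120
puts poisson_pmf(2.0, 0) == Math.exp(-2.0)
```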

data/lib/viral_seq/seq_hash.rb CHANGED

@@ -394,7 +394,6 @@ module ViralSeq
             end
           end
         end
         consensus_seq += call_consensus_base(max_base_list)
       end
       return consensus_seq
@@ -742,6 +741,7 @@ module ViralSeq
       seq_hash_unique_pass = []
       seq_hash_unique.each do |seq|
+        next if seq.nil?
         loc = ViralSeq::Sequence.new('', seq).locator(ref_option, path_to_muscle)
         next unless loc # if locator tool fails, skip this seq.
         if start_nt.include?(loc[0]) && end_nt.include?(loc[1])

data/lib/viral_seq/seq_hash_pair.rb CHANGED

@@ -110,19 +110,21 @@ module ViralSeq
       raise ArgumentError.new(":overlap has to be Integer, input #{overlap} invalid.") unless overlap.is_a? Integer
       raise ArgumentError.new(":diff has to be float or integer, input #{diff} invalid.") unless (diff.is_a? Integer or diff.is_a? Float)
       joined_seq = {}
-      seq_pair_hash.uniq_hash.each do |seq_pair, seq_names|
+      seq_pair_hash.each do |seq_name,seq_pair|
         r1_seq = seq_pair[0]
         r2_seq = seq_pair[1]
         if overlap.zero?
           joined_sequence = r1_seq + r2_seq
+        elsif diff.zero?
+          if r1_seq[-overlap..-1] == r2_seq[0,overlap]
+            joined_sequence= r1_seq + r2_seq[overlap..-1]
+          end
         elsif r1_seq[-overlap..-1].compare_with(r2_seq[0,overlap]) <= (overlap * diff)
           joined_sequence= r1_seq + r2_seq[overlap..-1]
         else
           next
         end
-        seq_names.each do |seq_name|
-          joined_seq[seq_name] = joined_sequence
-        end
+        joined_seq[seq_name] = joined_sequence if joined_sequence
       end
       joined_seq_hash = ViralSeq::SeqHash.new
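The reworked join now has three branches: plain concatenation when `overlap` is zero, an exact string comparison when `diff` is zero, and a mismatch-tolerant comparison otherwise. A minimal sketch of that per-pair decision follows. The names `join_pair` and `mismatches` are hypothetical, and `mismatches` stands in for the gem's `String#compare_with`, which is assumed here to count differing positions.

```ruby
# Assumed stand-in for compare_with: number of positions where a and b differ.
def mismatches(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Returns the joined sequence, or nil when the pair cannot be joined.
def join_pair(r1, r2, overlap, diff = 0.0)
  return r1 + r2 if overlap.zero?
  tail = r1[-overlap..-1] # last `overlap` bases of R1 (nil if R1 is too short)
  head = r2[0, overlap]   # first `overlap` bases of R2
  return nil if tail.nil?
  ok = diff.zero? ? tail == head : mismatches(tail, head) <= overlap * diff
  ok ? r1 + r2[overlap..-1] : nil
end

p join_pair("AAACCC", "CCCGGG", 3)       # exact overlap
p join_pair("AAACCC", "CCTGGG", 3)       # one mismatch, diff = 0
p join_pair("AAACCC", "CCTGGG", 3, 0.34) # one mismatch tolerated
```

The exact-match branch avoids a per-base comparison when no mismatches are allowed, which is one way to read the "optimized end-join" note in the changelog.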

data/lib/viral_seq/tcs_core.rb CHANGED

@@ -102,9 +102,9 @@ module ViralSeq
       end
       # sort array of file names to determine if there is potential errors
-      # input name_array array of file names
-      # output hash { }
-      # need to change for each file name have an error code. and a bool to show if all pass
+      # @param name_array [Array] array of file names
+      # @return [hash] name check results
       def validate_file_name(name_array)
         errors = {
                    file_type_error: [] ,
@@ -165,6 +165,13 @@ module ViralSeq
           end
         end
+        file_name_with_lib_name = {}
+        passed_libs.each do |lib_name, files|
+          files.each do |f|
+            file_name_with_lib_name[f] = lib_name
+          end
+        end
         passed_names = []
         passed_libs.values.each { |names| passed_names += names}
@@ -175,7 +182,27 @@ module ViralSeq
           pass = true
         end
-        return { errors: errors, all_pass: pass, passed_names: passed_names, passed_libs: passed_libs }
+        file_name_with_error_type = {}
+        errors.each do |type, files|
+          files.each do |f|
+            file_name_with_error_type[f] ||= []
+            file_name_with_error_type[f] << type.to_s.tr("_", "\s")
+          end
+        end
+        file_check = []
+        name_array.each do |name|
+          file_check_hash = {}
+          file_check_hash[:fileName] = name
+          file_check_hash[:errors] = file_name_with_error_type[name]
+          file_check_hash[:libName] = file_name_with_lib_name[name]
+          file_check << file_check_hash
+        end
+        return { allPass: pass, files: file_check }
       end
       # filter r1 raw sequences for non-specific primers.
@@ -278,7 +305,9 @@ module ViralSeq
       end
       def general_filter(seq)
-        if seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
+        if seq.size < $platform_sequencing_length
+          return false
+        elsif seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
           return false
         elsif seq =~ /A{11}/ # a string of poly-A indicates adaptor sequence
           return false
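The new length check in `general_filter` reads the global `$platform_sequencing_length`. As a sketch of an alternative, the same visible checks can take the minimum length as a keyword argument. The method name `passes_visible_filters?` is hypothetical, and only the branches shown in this hunk are reproduced; the released method may contain further checks that the diff does not display.

```ruby
# Visible checks only; branches outside this diff hunk are not reproduced.
def passes_visible_filters?(seq, min_length: 300)
  return false if seq.size < min_length # read shorter than the platform length
  return false if seq[1..-2] =~ /N/     # ambiguity outside the first/last position
  return false if seq =~ /A{11}/        # poly-A run, treated as adaptor sequence
  true
end

read = "ACGT" * 75 # 300 bases, no ambiguities
puts passes_visible_filters?(read)
puts passes_visible_filters?(read[0, 299])
puts passes_visible_filters?(read[0, 160], min_length: 150)
```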

data/lib/viral_seq/tcs_dr.rb ADDED

@@ -0,0 +1,71 @@
+module ViralSeq
+  class TcsDr
+    PARAMS = {:platform_error_rate=>0.02,
+               :primer_pairs=>
+                [{:region=>"RT",
+                  :cdna=>
+                   "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCACTATAGGCTGTACTGTCCATTTATC",
+                  :forward=>
+                   "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNGGCCATTGACAGAAGAAAAAATAAAAGC",
+                  :majority=>0.5,
+                  :end_join=>true,
+                  :end_join_option=>1,
+                  :overlap=>0,
+                  :TCS_QC=>true,
+                  :ref_genome=>"HXB2",
+                  :ref_start=>2648,
+                  :ref_end=>3257,
+                  :indel=>true,
+                  :trim=>false},
+                 {:region=>"PR",
+                  :cdna=>
+                   "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNCAGTTTAACTTTTGGGCCATCCATTCC",
+                  :forward=>
+                   "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTCAGAGCAGACCAGAGCCAACAGCCCCA",
+                  :majority=>0.5,
+                  :end_join=>true,
+                  :end_join_option=>3,
+                  :TCS_QC=>true,
+                  :ref_genome=>"HXB2",
+                  :ref_start=>0,
+                  :ref_end=>2591,
+                  :indel=>true,
+                  :trim=>true,
+                  :trim_ref=>"HXB2",
+                  :trim_ref_start=>2253,
+                  :trim_ref_end=>2549},
+                 {:region=>"IN",
+                  :cdna=>
+                   "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNATCGAATACTGCCATTTGTACTGC",
+                  :forward=>"GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNAAAAGGAGAAGCCATGCATG",
+                  :majority=>0.5,
+                  :end_join=>true,
+                  :end_join_option=>3,
+                  :overlap=>171,
+                  :TCS_QC=>true,
+                  :ref_genome=>"HXB2",
+                  :ref_start=>4384,
+                  :ref_end=>4751,
+                  :indel=>false,
+                  :trim=>false},
+                 {:region=>"V1V3",
+                  :cdna=>
+                   "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCCATTTTGCTYTAYTRABVTTACAATRTGC",
+                  :forward=>
+                   "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTTATGGGATCAAAGCCTAAAGCCATGTGTA",
+                  :majority=>0.5,
+                  :end_join=>true,
+                  :end_join_option=>1,
+                  :overlap=>0,
+                  :TCS_QC=>true,
+                  :ref_genome=>"HXB2",
+                  :ref_start=>6585,
+                  :ref_end=>7208,
+                  :indel=>true,
+                  :trim=>false}
+                  ]
+                }
+  end
+end

data/lib/viral_seq/tcs_json.rb CHANGED

@@ -13,6 +13,22 @@ module ViralSeq
         print '> '
         param[:raw_sequence_dir] = gets.chomp.rstrip
+        puts "Choose MiSeq Platform (1-3):\n1. 150x7x150\n2. 250x7x250\n3. 300x7x300 (default)"
+        print "> "
+        pf_option = gets.chomp.rstrip
+        # while ![1,2,3].include?(pf_option.to_i)
+        #   print "Entered MiSeq Platform #{pf_option.red.bold} not valid (choose 1-3), try again\n> "
+        #   pf_option = gets.chomp.rstrip
+        # end
+        case pf_option.to_i
+        when 1
+          param[:platform_format] = 150
+        when 2
+          param[:platform_format] = 250
+        else
+          param[:platform_format] = 300
+        end
         puts 'Enter the estimated platform error rate (for TCS cut-off calculation), default as ' + '0.02'.red.bold
         print '> '
         input_error = gets.chomp.rstrip.to_f
@@ -52,12 +68,12 @@ module ViralSeq
           if ej =~ /y|yes/i
             data[:end_join] = true
-            print "End-join option? Choose from (1-4):\n
-            1: simple join, no overlap
-            2: known overlap \n
-            3: unknow overlap, use sample consensus to determine overlap, all sequence pairs have same overlap\n
-            4: unknow overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap\n
-            > "
+            puts "End-join option? Choose from (1-4):"
+            puts "1: simple join, no overlap"
+            puts "2: known overlap"
+            puts "3: unknow overlap, use sample consensus to determine overlap, all sequence pairs have same overlap"
+            puts "4: unknow overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap"
+            print "> "
             ej_option = gets.chomp.rstrip
             while ![1,2,3,4].include?(ej_option.to_i)
               puts "Entered end-join option #{ej_option.red.bold} not valid (choose 1-4), try again"
@@ -138,7 +154,12 @@ module ViralSeq
         if save_option =~ /y|yes/i
           print "Path to save JSON file:\n> "
           path = gets.chomp.rstrip
-          File.open(path, 'w') {|f| f.puts JSON.pretty_generate(param)}
+          while !validate_path_name(path)
+            print "Entered path no valid, try again.\n".red.bold
+            print "Path to save JSON file:\n> "
+            path = gets.chomp.rstrip
+          end
+          File.open(validate_path_name(path), 'w') {|f| f.puts JSON.pretty_generate(param)}
         end
         print "\nDo you wish to execute tcs pipeline with the input params now? Y/N \n> "
@@ -147,7 +168,7 @@ module ViralSeq
         if rsp =~ /y/i
           return param
         else
-          abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`"
+          abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`".blue
         end
       end
@@ -172,7 +193,17 @@ module ViralSeq
               when 3
                 :MAC239
               end
-      end
-    end
+      end # end of get_ref
+      def validate_path_name(path)
+        if path.empty?
+          return false
+        elsif File.directory? path
+          return File.join(path, 'params.json')
+        elsif File.directory?(File.dirname(path))
+          return path
+        end
+      end # end of validate_path_name
+    end # end of class << self
   end # end TcsJson
 end # end main module
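The new `validate_path_name` resolves where the generated JSON is written: an existing directory gets `params.json` appended, a path whose parent directory exists is used as given, and anything else is rejected. The sketch below restates that rule as a standalone function. The name `resolve_json_path` is hypothetical, and unlike the original it returns `nil` (rather than `false`) for an empty path.

```ruby
require 'tmpdir'

# Hypothetical restatement of the path rule; returns nil when the path is unusable.
def resolve_json_path(path)
  return nil if path.empty?
  return File.join(path, 'params.json') if File.directory?(path)
  return path if File.directory?(File.dirname(path))
  nil
end

Dir.mktmpdir do |dir|
  puts resolve_json_path(dir)                                      # <dir>/params.json
  puts resolve_json_path(File.join(dir, 'run1.json'))              # used as given
  p resolve_json_path(File.join(dir, 'missing', 'run1.json'))      # nil
end
```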

data/lib/viral_seq/version.rb CHANGED

@@ -2,6 +2,6 @@
 # version info and histroy
 module ViralSeq
-  VERSION = "1.0.11"
-  TCS_VERSION = "2.1.1"
+  VERSION = "1.1.1"
+  TCS_VERSION = "2.3.0"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: viral_seq
 version: !ruby/object:Gem::Version
-  version: 1.0.11
+  version: 1.1.1
 platform: ruby
 authors:
 - Shuntai Zhou
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-03-02 00:00:00.000000000 Z
+date: 2021-04-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -90,6 +90,7 @@ email:
 executables:
 - locator
 - tcs
+- tcs_log
 extensions: []
 extra_rdoc_files: []
 files:
@@ -104,6 +105,11 @@ files:
 - Rakefile
 - bin/locator
 - bin/tcs
+- bin/tcs_log
+- docs/assets/img/cover.jpg
+- docs/dr.json
+- docs/sample_miseq_data/hivdr_control/r1.fastq.gz
+- docs/sample_miseq_data/hivdr_control/r2.fastq.gz
 - lib/viral_seq.rb
 - lib/viral_seq/constant.rb
 - lib/viral_seq/enumerable.rb
@@ -120,6 +126,7 @@ files:
 - lib/viral_seq/sequence.rb
 - lib/viral_seq/string.rb
 - lib/viral_seq/tcs_core.rb
+- lib/viral_seq/tcs_dr.rb
 - lib/viral_seq/tcs_json.rb
 - lib/viral_seq/version.rb
 - viral_seq.gemspec