RubyGems - viral_seq - Versions diffs - 1.1.0 → 1.2.2 - Mend

viral_seq 1.1.0 → 1.2.2

Files changed (22)

checksums.yaml +4 -4
data/Gemfile.lock +16 -3
data/README.md +99 -12
data/bin/tcs +54 -10
data/bin/tcs_log +20 -1
data/bin/tcs_sdrm +409 -0
data/docs/assets/img/cover.jpg +0 -0
data/{doc → docs}/dr.json +0 -1
data/docs/sample_miseq_data/hivdr_control/r1.fastq.gz +0 -0
data/docs/sample_miseq_data/hivdr_control/r2.fastq.gz +0 -0
data/lib/viral_seq.rb +5 -1
data/lib/viral_seq/constant.rb +41 -4
data/lib/viral_seq/hivdr.rb +1 -1
data/lib/viral_seq/muscle.rb +3 -2
data/lib/viral_seq/recency.rb +52 -0
data/lib/viral_seq/sdrm.rb +101 -35
data/lib/viral_seq/seq_hash.rb +24 -4
data/lib/viral_seq/sequence.rb +1 -84
data/lib/viral_seq/tcs_dr.rb +71 -0
data/lib/viral_seq/version.rb +2 -2
data/viral_seq.gemspec +11 -0
metadata +72 -5

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: ea453e452e6832e942512cdb94462c33af89ffd8295017806c9aa6ff7ec77ad4
-  data.tar.gz: 2bb89d193e0e84ebe0791882c53e226a0a934ea3b9d1e61f87b8ffff6c22af1b
+  metadata.gz: a235cae95121a8522a47620eb9f8c05a3e2e416084743cd23df43aff7870a2c4
+  data.tar.gz: f0ce3a9412774eed703b0b0b663e7bb2dccf340f3f558cffdca85e920291794d
 SHA512:
-  metadata.gz: 9dc0403ecaea119d3aa3e832305a0bd4f038fdb71789dcd036080fa89b0e454ee79001b6042df171364e4207a93b2d4d5747336b2fb7f8fb7d83103f5d641134
-  data.tar.gz: 510ccfce7d717b56d55e2477ae01124009d1f53f010635759cf2f69afe0132313e08db9abaae1ec6d8d894961beba1c2d70a637eafa9b57b05f0aac3372cd0ca
+  metadata.gz: b97f98e40b8257281bd29cee40942d16084cf175933fc8357838ebb2a9eede1ab93ba323dbf315afb300f0a7852b2c6d939235831124710fc6f16f109e3eafc5
+  data.tar.gz: 4d660da22c69ce1ff929ed7f67d2b03aad662bb0237e9a93d9a8ea6bd1866d8544ad108db9ab8a11eee2df992395e41b68ffc43a8d1dbb132cc1f83a897676ef

data/Gemfile.lock CHANGED

@@ -1,16 +1,27 @@
 PATH
   remote: .
   specs:
-    viral_seq (1.0.13)
-      colorize (~> 0.1)
-      muscle_bio (~> 0.4)
+    viral_seq (1.1.1)
+      colorize (>= 0.1)
+      combine_pdf (>= 1.0.0)
+      muscle_bio (>= 0.4)
+      prawn (>= 2.3.0)
+      prawn-table (>= 0.2.0)
 GEM
   remote: https://rubygems.org/
   specs:
     colorize (0.8.1)
+    combine_pdf (1.0.21)
+      ruby-rc4 (>= 0.1.5)
     diff-lcs (1.3)
     muscle_bio (0.4.0)
+    pdf-core (0.9.0)
+    prawn (2.4.0)
+      pdf-core (~> 0.9.0)
+      ttfunk (~> 1.7)
+    prawn-table (0.2.2)
+      prawn (>= 1.3.0, < 3.0.0)
     rake (13.0.1)
     rspec (3.8.0)
       rspec-core (~> 3.8.0)
@@ -25,6 +36,8 @@ GEM
       diff-lcs (>= 1.2.0, < 2.0)
       rspec-support (~> 3.8.0)
     rspec-support (3.8.0)
+    ruby-rc4 (0.1.5)
+    ttfunk (1.7.0)
 PLATFORMS
   ruby
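The lockfile shows the runtime dependencies moving from pessimistic constraints (`~> 0.1`) to open-ended minimums (`>= 0.1`). A minimal sketch using RubyGems' own `Gem::Requirement` shows the practical difference. The `1.0.0` version below is a hypothetical future colorize release, not one referenced in this diff.

```ruby
require "rubygems"

# 1.0.13 declared colorize (~> 0.1); 1.1.1 relaxes it to (>= 0.1).
pessimistic = Gem::Requirement.new("~> 0.1")
minimum     = Gem::Requirement.new(">= 0.1")

v081 = Gem::Version.new("0.8.1") # the colorize version in the lockfile
v100 = Gem::Version.new("1.0.0") # hypothetical future major release

pessimistic.satisfied_by?(v081) # => true
pessimistic.satisfied_by?(v100) # => false ("~> 0.1" means >= 0.1 and < 1.0)
minimum.satisfied_by?(v100)     # => true
```

In practice the relaxed constraints let Bundler resolve to new major versions of these gems, at the cost of no longer guarding against their breaking changes.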

data/README.md CHANGED

@@ -1,5 +1,11 @@
 # ViralSeq
+[![Gem Version](https://img.shields.io/gem/v/viral_seq?color=%2300e673&style=flat-square)](https://rubygems.org/gems/viral_seq)
+![GitHub](https://img.shields.io/github/license/viralseq/viral_seq)
+![Gem](https://img.shields.io/gem/dt/viral_seq?color=%23E9967A)
+![GitHub last commit](https://img.shields.io/github/last-commit/viralseq/viral_seq?color=%2300BFFF)
+[![Join the chat at https://gitter.im/viral_seq/community](https://badges.gitter.im/viral_seq/community.svg)](https://gitter.im/viral_seq/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 A Ruby Gem containing bioinformatics tools for processing viral NGS data.
 Specifically for Primer ID sequencing and HIV drug resistance analysis.
@@ -7,11 +13,12 @@ Specifically for Primer ID sequencing and HIV drug resistance analysis.
 ## Illustration for the Primer ID Sequencing
-![Primer ID Sequencing](https://storage.googleapis.com/tcs-dr-public/pid.png)
+![Primer ID Sequencing](./docs/assets/img/cover.jpg)
 ### Reference readings on the Primer ID sequencing
-[Primer ID JID paper](https://doi.org/10.21769/BioProtoc.3938)
-[Primer ID MiSeq protocol](https://doi.org/10.1128/JVI.00522-15)
+[Explanation of Primer ID sequencing](https://doi.org/10.21769/BioProtoc.3938)
+[Primer ID MiSeq protocol](https://doi.org/10.1128/JVI.00522-15)
+[Application of Primer ID sequencing in COVID-19 research](https://doi.org/10.1126/scitranslmed.abb5883)
 ## Install
@@ -24,14 +31,23 @@ Specifically for Primer ID sequencing and HIV drug resistance analysis.
 ### Executables
 ### `tcs`
-Use executable `tcs` pipeline to process **Primer ID MiSeq sequencing** data.
+Use executable `tcs` pipeline (v2.3.2) to process **Primer ID MiSeq sequencing** data.
 Example commands:
 ```bash
     $ tcs -p params.json # run TCS pipeline with params.json
+    $ tcs -p params.json -i DIRECTORY
+    # run TCS pipeline with params.json and DIRECTORY
+    # if DIRECTORY is not defined in params.json
+    $ tcs -dr -i DIRECTORY
+    # run tcs-dr (MPID HIV drug resistance sequencing) pipeline
+    # DIRECTORY needs to be given.
     $ tcs -j # CLI to generate params.json
     $ tcs -h # print out the help
 ```
+[sample params.json for the tcs-dr pipeline](./docs/dr.json)
 ---
 ### `tcs_log`
@@ -53,6 +69,44 @@ Example command:
     $ tcs_log batch_tcs_jobs
 ```
+---
+### `tcs_sdrm`
+Use `tcs_sdrm` pipeline for HIV-1 drug resistance mutation and recency.
+Example command:
+```bash
+    $ tcs_sdrm libs_dir
+```
+libs_dir file structure:
+```
+libs_dir/
+├── lib1
+  ├── lib1_RT
+  ├── lib1_PR
+  ├── lib1_IN
+  ├── lib1_V1V3
+├── lib2
+  ├── lib1_RT
+  ├── lib1_PR
+  ├── lib1_IN
+  ├── lib1_V1V3
+├── ...
+```
+Output data are written to a new directory named 'libs_dir_SDRM'.
+**Note: [R](https://www.r-project.org/) and the following R libraries are required:**
+- phangorn
+- ape
+- scales
+- ggforce
+- cowplot
+- magrittr
+- gridExtra
 ---
 ### `locator`
@@ -93,7 +147,7 @@ qc_seqhash = aligned_seqhash.hiv_seq_qc(2253, 2549, false, :HXB2)
 Further filter out sequences with Apobec3g/f hypermutations
 ```ruby
-qc_seqhash = qc_seqhash.a3g
+qc_seqhash = qc_seqhash.a3g[:filtered_seq]
 ```
 Calculate nucleotide diversity π
@@ -121,15 +175,48 @@ qc_seqhash.sdrm_hiv_pr(cut_off)
 ## Updates
+### Version 1.2.2-05272021
+  1. Fixed a bug in the `tcs` pipeline that sometimes causes `SystemStackError`.
+  `tcs` pipeline upgraded to v2.3.2
+### Version 1.2.1-05172021
+  1. Added a function in R to check and install missing R packages for `tcs_sdrm` pipeline.
+### Version 1.2.0-05102021
+  1. Added `tcs_sdrm` pipeline as an executable.
+  `tcs_sdrm` processes `tcs`-processed HIV MPID-NGS data for drug resistance mutations, recency and phylogenetic analysis.
+  2. Added function ViralSeq::SeqHash#sample.
+  3. Added recency determining function `ViralSeq::Recency::define`
+  4. Fixed a few bugs related to `tcs_sdrm`.
+### Version 1.1.2-04262021
+  1. Added function `ViralSeq::DRMs.sdrm_json` to export SDRM as json object.
+  2. Added a random string to the temp file names for `muscle_bio` to avoid issues when running scripts in parallel.
+  3. Added `--keep-original` flag to the `tcs` pipeline.
+### Version 1.1.1-04012021
+  1. Added a warning when paired_raw_sequence is less than 0.1% of total_raw_sequence.
+  2. Added option `-i WORKING_DIRECTORY` to the `tcs` script.
+  If the `params.json` file does not contain the path to the working directory, it will append path to the run params.
+  3. Added option `-dr` to the `tcs` script.
 ### Version 1.1.0-03252021
-    1. Optimized the algorithm of end-join.
-    2. Fixed a bug in the `tcs` pipeline that sometimes combined tcs files are not saved.
-    3. Added `tcs_log` command to pool run logs and tcs files from one batch of tcs jobs.
-    4. Added the preset of MPID-HIVDR params file ***dr.json*** in /doc.
-    5. Add `platform_format` option in the json generator of the `tcs` Pipeline.
-    Users can choose from 3 MiSeq platforms for processing their sequencing data.
-    MiSeq 300x7x300 is the default option.
+  1. Optimized the algorithm of end-join.
+  2. Fixed a bug in the `tcs` pipeline that sometimes caused combined tcs files not to be saved.
+  3. Added `tcs_log` command to pool run logs and tcs files from one batch of tcs jobs.
+  4. Added the preset of MPID-HIVDR params file [***dr.json***](./docs/dr.json) in /docs.
+  5. Added `platform_format` option to the json generator of the `tcs` pipeline.
+  Users can choose from 3 MiSeq platforms for processing their sequencing data.
+  MiSeq 300x7x300 is the default option.
 ### Version 1.0.14-03052021
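The 1.1.2 changelog entry says a random string is now added to the temp file names used for `muscle_bio`, so that parallel runs do not collide. The `muscle.rb` diff itself is not shown on this page, so the following is only a sketch of the idea under that assumption; `unique_temp_path` is a hypothetical helper, not part of the gem's API.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical helper: append a random hex suffix so two concurrent jobs
# never write to the same temporary alignment file.
def unique_temp_path(prefix, ext)
  File.join(Dir.tmpdir, "#{prefix}_#{SecureRandom.hex(8)}#{ext}")
end

in_a = unique_temp_path("muscle_in", ".fasta")
in_b = unique_temp_path("muscle_in", ".fasta")
puts in_a, in_b # two different paths under the system temp directory
```

Ruby's standard `Tempfile` and `Dir.mktmpdir` give the same uniqueness guarantee and also handle cleanup.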

data/bin/tcs CHANGED

@@ -46,11 +46,23 @@ OptionParser.new do |opts|
     options[:params_json] = p
   end
+  opts.on("-i", "--input PATH_TO_WORKING_DIRECTORY", "Path to the working directory") do |p|
+    options[:input] = p
+  end
+  opts.on("-dr", "--dr_pipeline", "HIV drug resistance MPID pipeline") do |p|
+    options[:dr] = true
+  end
   opts.on("-h", "--help", "Prints this help") do
     puts opts
     exit
   end
+  opts.on("--keep-original", "keep raw sequence files") do
+    options[:keep] = true
+  end
   opts.on("-v", "--version", "Version info") do
     puts "tcs version: " + ViralSeq::TCS_VERSION.red.bold
     puts "viral_seq version: " + ViralSeq::VERSION.red.bold
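The new `-dr` switch is declared with a two-letter short name. Ruby's `OptionParser` treats a short option as a single character, so `"-dr"` is read as `-d` with a required argument `r`, and that required-argument style also applies to `--dr_pipeline`. The sketch below repeats the two switch definitions from this hunk to show the consequence; it reflects standard `OptionParser` behavior and is not something the diff itself states.

```ruby
require "optparse"

options = {}
parser = OptionParser.new do |opts|
  # The two switch definitions added to bin/tcs in this release
  opts.on("-i", "--input PATH_TO_WORKING_DIRECTORY", "Path to the working directory") do |p|
    options[:input] = p
  end
  opts.on("-dr", "--dr_pipeline", "HIV drug resistance MPID pipeline") do |p|
    options[:dr] = true
    options[:dr_arg] = p # records what OptionParser passed to the block
  end
end

# The README form works, because "-dr" is parsed as "-d" with argument "r".
rest = parser.parse(["-dr", "-i", "DIR"])

# The long form also expects an argument, so on its own it fails.
missing = nil
begin
  parser.parse(["--dr_pipeline"])
rescue OptionParser::MissingArgument => e
  missing = e
end
```

So `tcs -dr -i DIRECTORY`, as documented in the README, behaves as intended, while `tcs --dr_pipeline -i DIRECTORY` would consume `-i` as the switch's argument.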
@@ -64,15 +76,21 @@ end.parse!
 if options[:json_generator]
   params = ViralSeq::TcsJson.generate
+elsif options[:dr]
+  params = ViralSeq::TcsDr::PARAMS
 elsif (options[:params_json] && File.exist?(options[:params_json]))
   params = JSON.parse(File.read(options[:params_json]), symbolize_names: true)
 else
   abort "No params JSON file found. Script terminated.".red
 end
-indir = params[:raw_sequence_dir]
+if options[:input]
+  indir = options[:input]
+else
+  indir = params[:raw_sequence_dir]
+end
-unless File.exist?(indir)
+unless indir and File.exist?(indir)
   abort "No input sequence directory found. Script terminated.".red.bold
 end
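With the new `-i` option, the working directory can come from the command line instead of `params.json`. The sketch restates that precedence as a hypothetical `resolve_indir` helper that returns `nil` where the script would abort. The `nil` check matters because `File.exist?(nil)` raises `TypeError` rather than returning `false`, which the 1.1.0 guard would hit whenever `raw_sequence_dir` was absent.

```ruby
require "tmpdir"

# Hypothetical helper mirroring bin/tcs: -i/--input overrides
# raw_sequence_dir from params.json.
def resolve_indir(options, params)
  indir = options[:input] || params[:raw_sequence_dir]
  # Check for nil before File.exist?, as the new
  # `unless indir and File.exist?(indir)` guard does.
  (indir && File.exist?(indir)) ? indir : nil
end

Dir.mktmpdir do |dir|
  # The command-line path wins even if params.json names another directory.
  puts resolve_indir({ input: dir }, { raw_sequence_dir: "/no/such/dir" })
end
p resolve_indir({}, {}) # nil, where bin/tcs would abort instead
```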
@@ -129,6 +147,7 @@ end
 primers.each do |primer|
   summary_json = {}
+  summary_json[:warnings] = []
   summary_json[:tcs_version] = ViralSeq::TCS_VERSION
   summary_json[:viralseq_version] = ViralSeq::VERSION
   summary_json[:runtime] = Time.now.to_s
@@ -140,6 +159,7 @@ primers.each do |primer|
   forward_primer = primer[:forward]
   export_raw = primer[:export_raw]
+  limit_raw = primer[:limit_raw]
   unless cdna_primer
     log.puts Time.now.to_s + "\t" + region + " does not have cDNA primer sequence. #{region} skipped."
@@ -181,6 +201,10 @@ primers.each do |primer|
   paired_seq_number = common_keys.size
   log.puts Time.now.to_s + "\t" +  "Paired raw sequences are : #{paired_seq_number.to_s}"
   summary_json[:paired_raw_sequence] = paired_seq_number
+  if paired_seq_number < raw_sequence_number * 0.001
+    summary_json[:warnings] <<
+      "WARNING: Filtered raw sequneces less than 0.1% of the total raw sequences. Possible contamination."
+  end
   common_keys.each do |seqtag|
     r1_seq = r1_passed_seq[seqtag]
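The new warning compares the paired-read count with 0.1% of the raw count (`raw_sequence_number * 0.001`). As a worked example with made-up numbers: with 200,000 raw reads the cutoff is about 200 paired reads, so 150 pairs would add the warning to `summary_json[:warnings]`. `low_pairing?` is a hypothetical name for the inline check.

```ruby
# Hypothetical helper restating the check in bin/tcs: flag a run when
# fewer than 0.1% of raw reads survive R1/R2 pairing.
def low_pairing?(paired_seq_number, raw_sequence_number)
  paired_seq_number < raw_sequence_number * 0.001
end

low_pairing?(150, 200_000) # => true
low_pairing?(250, 200_000) # => false
```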
@@ -242,7 +266,13 @@ primers.each do |primer|
     raw_r1_f = File.open(outfile_raw_r1, 'w')
     raw_r2_f = File.open(outfile_raw_r2, 'w')
-    bio_r1.keys.each do |k|
+    if limit_raw
+      raw_keys = bio_r1.keys.sample(limit_raw.to_i)
+    else
+      raw_keys = bio_r1.keys
+    end
+    raw_keys.each do |k|
       raw_r1_f.puts k + "_r1"
       raw_r2_f.puts k + "_r2"
       raw_r1_f.puts bio_r1[k]
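When `limit_raw` is set, only a random subset of read keys is exported. `Array#sample(n)` draws without replacement and never returns more elements than the array holds, and `to_i` turns a non-numeric string into 0, which would export nothing. The key names and limit values below are made up for illustration.

```ruby
raw_keys = %w[read1 read2 read3 read4 read5]

# limit_raw goes through to_i before sampling, as in bin/tcs.
limited    = raw_keys.sample("3".to_i)   # 3 distinct keys, random order
everything = raw_keys.sample(10)         # more than available: all 5, shuffled
nothing    = raw_keys.sample("all".to_i) # "all".to_i is 0, so no keys
```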
@@ -341,9 +371,21 @@ primers.each do |primer|
   # Primer ID distribution in .json file
   out_pid_json = File.join(out_dir_set, 'primer_id.json')
   pid_json = {}
-  pid_json[:primer_id_in_use] = Hash[*(primer_id_in_use.sort_by {|k, v| [-v,k]}.flatten)]
-  pid_json[:primer_id_distribution] = Hash[*(primer_id_dis.sort_by{|k,v| k}.flatten)]
-  pid_json[:primer_id_frequency] = Hash[*(primer_id_count.sort_by {|k, v| [-v,k]}.flatten)]
+  pid_json[:primer_id_in_use] = {}
+  primer_id_in_use.sort_by {|k, v| [-v,k]}.each do |k,v|
+    pid_json[:primer_id_in_use][k] = v
+  end
+  pid_json[:primer_id_distribution] = {}
+  primer_id_dis.sort_by{|k,v| k}.each do |k,v|
+    pid_json[:primer_id_distribution][k] = v
+  end
+  pid_json[:primer_id_frequency] = {}
+  primer_id_count.sort_by {|k,v| [-v,k]}.each do |k,v|
+    pid_json[:primer_id_frequency][k] = v
+  end
   File.open(out_pid_json, 'w') do |f|
     f.puts JSON.pretty_generate(pid_json)
   end
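This hunk replaces `Hash[*(pairs.flatten)]` with explicit loops. The 1.2.2 changelog mentions a fix for an occasional `SystemStackError` in `tcs`, and splatting a very large array into a method call can exhaust the VM stack in MRI, so this is likely that fix, although the diff does not say so explicitly. A sketch with made-up Primer ID counts shows that the loop and `Array#to_h` build the same ordered hash without a splat:

```ruby
primer_id_count = { "AAAA" => 3, "CCCC" => 5, "GGGG" => 5 }

# Descending count, ties broken by key, as in bin/tcs.
pairs = primer_id_count.sort_by { |k, v| [-v, k] }

# 1.1.0 style: every key and value becomes a separate argument to Hash[].
old_style = Hash[*pairs.flatten]

# Splat-free equivalents: the loop used in 1.2.2, or Array#to_h.
loop_style = {}
pairs.each { |k, v| loop_style[k] = v }
to_h_style = pairs.to_h

p to_h_style.keys # ["CCCC", "GGGG", "AAAA"]
```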
@@ -455,9 +497,11 @@ primers.each do |primer|
   end
 end
-log.puts Time.now.to_s + "\t" + "Removing raw sequence files..."
-File.unlink(r1_f)
-File.unlink(r2_f)
-log.puts Time.now.to_s + "\t" + "TCS pipeline successfuly exercuted."
+unless options[:keep]
+  log.puts Time.now.to_s + "\t" + "Removing raw sequence files..."
+  File.unlink(r1_f)
+  File.unlink(r2_f)
+end
+log.puts Time.now.to_s + "\t" + "TCS pipeline successfuly executed."
 log.close
 puts "DONE!"

data/bin/tcs_log CHANGED

@@ -37,8 +37,26 @@ Dir.mkdir(outdir4) unless File.directory?(outdir4)
 log_file = File.join(tcs_dir,"log.csv")
 log = File.open(log_file,'w')
-log.puts "lib name,Region,Raw Sequences per barcode,R1 Raw,R2 Raw,Paired Raw,Cutoff,PID Length,Consensus1,Consensus2,Distinct to Raw,Resampling index,Combined TCS,Combined TCS after QC"
+header = %w{
+  lib_name
+  Region
+  Raw_Sequences_per_barcode
+  R1_Raw
+  R2_Raw
+  Paired_Raw
+  Cutoff
+  PID_Length
+  Consensus1
+  Consensus2
+  Distinct_to_Raw
+  Resampling_index
+  Combined_TCS
+  Combined_TCS_after_QC
+  WARNINGS
+}
+log.puts header.join(',')
 libs.each do |lib|
   Dir.mkdir(File.join(outdir2, lib)) unless File.directory?(File.join(outdir2, lib))
   fasta_files = []
@@ -77,6 +95,7 @@ libs.each do |lib|
                json_log[:resampling_param],
                json_log[:combined_tcs],
                json_log[:combined_tcs_after_qc],
+               json_log[:warnings],
              ].join(',') + "\n"
   end
 end
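The new `WARNINGS` column is filled from `json_log[:warnings]`, which bin/tcs stores as an array. Inside the row passed to `.join(',')`, Ruby's `Array#join` flattens nested arrays, so a row with two warnings gains an extra field and a row with none ends in an empty field. The row values are made up, and the `CSV.generate_line` variant is only one possible alternative, not what `tcs_log` does.

```ruby
require "csv"

row = ["lib1", "RT", 1234]                      # made-up leading columns
two = ["WARNING: first", "WARNING: second"]

# Nested arrays are joined recursively with the same separator.
spilled = (row + [two]).join(",")
none    = (row + [[]]).join(",")

# Alternative: collapse the warnings into one field and let CSV quote it.
single = CSV.generate_line(row + [two.join("; ")])
```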

data/bin/tcs_sdrm ADDED

@@ -0,0 +1,409 @@
+#!/usr/bin/env ruby
+# tcs/sdrm pipeline for HIV-1 drug resistance mutation and recency
+#
+# command example:
+#   $ tcs_sdrm libs_dir
+#
+# lib_dir file structure:
+#   libs_dir
+#   ├── lib1
+#     ├── lib1_RT
+#     ├── lib1_PR
+#     ├── lib1_IN
+#     ├── lib1_V1V3
+#   ├── lib2
+#     ├── lib1_RT
+#     ├── lib1_PR
+#     ├── lib1_IN
+#     ├── lib1_V1V3
+#   ├── ...
+#
+# output data in a new dir as 'libs_dir_SDRM'
+require 'viral_seq'
+require 'json'
+require 'csv'
+require 'fileutils'
+require 'prawn'
+require 'prawn/table'
+require 'combine_pdf'
+unless ARGV[0] && File.directory?(ARGV[0])
+  abort "No sequence data provided. `tcs_sdrm` pipeline aborted. "
+end
+begin
+  r_version = `R --version`.split("\n")[0]
+  r_check = `R -e '#{ViralSeq::R_SCRIPT_CHECK_PACKAGES}' > /dev/null 2>&1`
+rescue Errno::ENOENT
+  abort '"R" is not installed. Install R at https://www.r-project.org/' +
+        "\n`tcs_sdrm` pipeline aborted."
+end
+def abstract_line(data)
+  return_data = data[3] + data[2] + data[4] + ":" +
+                (data[6].to_f * 100).round(2).to_s + "(" +
+                (data[7].to_f * 100).round(2).to_s + "-" +
+                (data[8].to_f * 100).round(2).to_s + "); "
+end
+# run params
+log = []
+log << { time: Time.now }
+log << { viral_seq_version: ViralSeq::VERSION }
+log << { tcs_version: ViralSeq::TCS_VERSION }
+log << { R_version: r_version}
+sdrm_list = {}
+sdrm_list[:nrti] = ViralSeq::DRMs.sdrm_json(:nrti)
+sdrm_list[:nnrti] = ViralSeq::DRMs.sdrm_json(:nnrti)
+sdrm_list[:hiv_pr] = ViralSeq::DRMs.sdrm_json(:hiv_pr)
+sdrm_list[:hiv_in] = ViralSeq::DRMs.sdrm_json(:hiv_in)
+log << { sdrm_list: sdrm_list }
+# input dir
+indir = ARGV[0]
+libs = Dir[indir + "/*"]
+log << { processed_libs: libs }
+#output dir
+outdir = indir + "_SDRM"
+Dir.mkdir(outdir) unless File.directory?(outdir)
+libs.each do |lib|
+  r_script = ViralSeq::R_SCRIPT.dup
+  next unless File.directory?(lib)
+  lib_name = File.basename(lib)
+  out_lib_dir = File.join(outdir, lib_name)
+  Dir.mkdir(out_lib_dir) unless File.directory?(out_lib_dir)
+  sub_seq_files = Dir[lib + "/*"]
+  seq_summary_file = File.join(out_lib_dir, (lib_name + "_summary.csv"))
+  seq_summary_out = File.open(seq_summary_file, "w")
+  seq_summary_out.puts 'Region,TCS,TCS with A3G/F hypermutation,TCS with stop codon,' +
+                       'TCS w/o hypermutation and stop codon,' +
+                       'Poisson cutoff for minority mutation (>=),Pi,Dist20'
+  point_mutation_file = File.join(out_lib_dir, (lib_name + "_substitution.csv"))
+  point_mutation_out = File.open(point_mutation_file, "w")
+  point_mutation_out.puts "region,TCS,AA position,wild type,mutation," +
+                          "number,percentage,95% CI low, 95% CI high, notes"
+  linkage_file = File.join(out_lib_dir, (lib_name + "_linkage.csv"))
+  linkage_out = File.open(linkage_file, "w")
+  linkage_out.puts "region,TCS,mutation linkage,number," +
+                   "percentage,95% CI low, 95% CI high, notes"
+  aa_report_file = File.join(out_lib_dir, (lib_name + "_aa.csv"))
+  aa_report_out = File.open(aa_report_file, "w")
+  aa_report_out.puts "region,ref.aa.positions,TCS.number," +
+                     ViralSeq::AMINO_ACID_LIST.join(",")
+  summary_json_file = File.join(out_lib_dir, (lib_name + "_summary.json"))
+  summary_json_out = File.open(summary_json_file,"w")
+  filtered_seq_dir = File.join(out_lib_dir, (lib_name + "_filtered_seq"))
+  Dir.mkdir(filtered_seq_dir) unless File.directory?(filtered_seq_dir)
+  aln_seq_dir = File.join(out_lib_dir, (lib_name + "_aln_seq"))
+  Dir.mkdir(aln_seq_dir) unless File.directory?(aln_seq_dir)
+  point_mutation_list = []
+  linkage_list = []
+  aa_report_list = []
+  summary_hash = {}
+  sub_seq_files.each do |sub_seq|
+    seq_basename = File.basename(sub_seq)
+    seqs = ViralSeq::SeqHash.fa(sub_seq)
+    next if seqs.size < 3
+    if seq_basename =~ /V1V3/i
+      summary_hash[:V1V3] = "#{seqs.size.to_s},NA,NA,NA,NA"
+      FileUtils.cp(sub_seq, filtered_seq_dir)
+    elsif seq_basename =~ /PR/i
+      a3g_check = seqs.a3g
+      a3g_seqs = a3g_check[:a3g_seq]
+      a3g_filtered_seqs = a3g_check[:filtered_seq]
+      stop_codon_check = a3g_filtered_seqs.stop_codon
+      stop_codon_seqs = stop_codon_check[:with_stop_codon]
+      filtered_seqs = stop_codon_check[:without_stop_codon]
+      poisson_minority_cutoff = filtered_seqs.pm
+      summary_hash[:PR] = [
+                            seqs.size.to_s,
+                            a3g_seqs.size.to_s,
+                            stop_codon_seqs.size.to_s,
+                            filtered_seqs.size.to_s,
+                            poisson_minority_cutoff.to_s
+                          ].join(',')
+      next if filtered_seqs.size < 3
+      filtered_seqs.write_nt_fa(File.join(filtered_seq_dir,seq_basename))
+      sdrm = filtered_seqs.sdrm_hiv_pr(poisson_minority_cutoff)
+      point_mutation_list += sdrm[0]
+      linkage_list += sdrm[1]
+      aa_report_list += sdrm[2]
+    elsif seq_basename =~/IN/i
+      a3g_check = seqs.a3g
+      a3g_seqs = a3g_check[:a3g_seq]
+      a3g_filtered_seqs = a3g_check[:filtered_seq]
+      stop_codon_check = a3g_filtered_seqs.stop_codon(2)
+      stop_codon_seqs = stop_codon_check[:with_stop_codon]
+      filtered_seqs = stop_codon_check[:without_stop_codon]
+      poisson_minority_cutoff = filtered_seqs.pm
+      summary_hash[:IN] = [
+                            seqs.size.to_s,
+                            a3g_seqs.size.to_s,
+                            stop_codon_seqs.size.to_s,
+                            filtered_seqs.size.to_s,
+                            poisson_minority_cutoff.to_s
+                          ].join(',')
+      next if filtered_seqs.size < 3
+      filtered_seqs.write_nt_fa(File.join(filtered_seq_dir,seq_basename))
+      sdrm = filtered_seqs.sdrm_hiv_in(poisson_minority_cutoff)
+      point_mutation_list += sdrm[0]
+      linkage_list += sdrm[1]
+      aa_report_list += sdrm[2]
+    elsif seq_basename =~/RT/i
+      rt_seq1 = {}
+      rt_seq2 = {}
+      seqs.dna_hash.each do |k,v|
+        rt_seq1[k] = v[0,267]
+        rt_seq2[k] = v[267..-1]
+      end
+      rt1 = ViralSeq::SeqHash.new(rt_seq1)
+      rt2 = ViralSeq::SeqHash.new(rt_seq2)
+      rt1_a3g = rt1.a3g
+      rt2_a3g = rt2.a3g
+      hypermut_seq_rt1 = rt1_a3g[:a3g_seq]
+      hypermut_seq_rt2 = rt2_a3g[:a3g_seq]
+      rt1_stop_codon = rt1.stop_codon(1)[:with_stop_codon]
+      rt2_stop_codon = rt2.stop_codon(2)[:with_stop_codon]
+      hypermut_seq_keys = (hypermut_seq_rt1.dna_hash.keys | hypermut_seq_rt2.dna_hash.keys)
+      stop_codon_seq_keys = (rt1_stop_codon.dna_hash.keys | rt2_stop_codon.dna_hash.keys)
+      reject_keys = (hypermut_seq_keys | stop_codon_seq_keys)
+      filtered_seqs = ViralSeq::SeqHash.new(seqs.dna_hash.reject {|k,v| reject_keys.include?(k) })
+      poisson_minority_cutoff = filtered_seqs.pm
+      summary_hash[:RT] = [
+                            seqs.size.to_s,
+                            hypermut_seq_keys.size.to_s,
+                            stop_codon_seq_keys.size.to_s,
+                            filtered_seqs.size.to_s,
+                            poisson_minority_cutoff.to_s
+                          ].join(',')
+      next if filtered_seqs.size < 3
+      filtered_seqs.write_nt_fa(File.join(filtered_seq_dir,seq_basename))
+      sdrm = filtered_seqs.sdrm_hiv_rt(poisson_minority_cutoff)
+      point_mutation_list += sdrm[0]
+      linkage_list += sdrm[1]
+      aa_report_list += sdrm[2]
+    end
+  end
+  point_mutation_list.each do |record|
+    point_mutation_out.puts record.join(",")
+  end
+  linkage_list.each do |record|
+    linkage_out.puts record.join(",")
+  end
+  aa_report_list.each do |record|
+    aa_report_out.puts record.join(",")
+  end
+  filtered_seq_files = Dir[filtered_seq_dir + "/*"]
+  out_r_csv = File.join(out_lib_dir, (lib_name + "_pi.csv"))
+  out_r_pdf = File.join(out_lib_dir, (lib_name + "_pi.pdf"))
+  if filtered_seq_files.size > 0
+    filtered_seq_files.each do |seq_file|
+      filtered_sh = ViralSeq::SeqHash.fa(seq_file)
+      next if filtered_sh.size < 3
+      aligned_sh = filtered_sh.random_select(1000).align
+      aligned_sh.write_nt_fa(File.join(aln_seq_dir, File.basename(seq_file)))
+    end
+    r_script.gsub!(/PATH_TO_FASTA/,aln_seq_dir)
+    File.unlink(out_r_csv) if File.exist?(out_r_csv)
+    File.unlink(out_r_pdf) if File.exist?(out_r_pdf)
+    r_script.gsub!(/OUTPUT_CSV/,out_r_csv)
+    r_script.gsub!(/OUTPUT_PDF/,out_r_pdf)
+    r_script_file = File.join(out_lib_dir, "/pi.R")
+    File.open(r_script_file,"w") {|line| line.puts r_script}
+    print `Rscript #{r_script_file} 1> /dev/null 2> /dev/null`
+    if File.exist?(out_r_csv)
+      pi_csv = File.readlines(out_r_csv)
+      pi_csv.each do |line|
+        line.chomp!
+        data = line.split(",")
+        tag = data[0].split("_")[-1].gsub(/\W/,"").to_sym
+        summary_hash[tag] += "," + data[1].to_f.round(4).to_s + "," + data[2].to_f.round(4).to_s
+      end
+      [:PR, :RT, :IN, :V1V3].each do |regions|
+        next unless summary_hash[regions]
+        seq_summary_out.puts regions.to_s + "," + summary_hash[regions]
+      end
+      File.unlink(out_r_csv)
+    end
+    File.unlink(r_script_file)
+  end
+  seq_summary_out.close
+  point_mutation_out.close
+  linkage_out.close
+  aa_report_out.close
+  summary_lines = File.readlines(seq_summary_file)
+  summary_lines.shift
+  tcs_PR = 0
+  tcs_RT = 0
+  tcs_IN = 0
+  tcs_V1V3 = 0
+  pi_RT = 0.0
+  pi_V1V3 = 0.0
+  dist20_RT = 0.0
+  dist20_V1V3 = 0.0
+  summary_lines.each do |line|
+      data = line.chomp.split(",")
+      if data[0] == "PR"
+          tcs_PR = data[4].to_i
+      elsif data[0] == "RT"
+          tcs_RT = data[4].to_i
+          pi_RT = data[6].to_f
+          dist20_RT = data[7].to_f
+      elsif data[0] == "IN"
+          tcs_IN = data[4].to_i
+      elsif data[0] == "V1V3"
+          tcs_V1V3 = data[1].to_i
+          pi_V1V3 = data[6].to_f
+          dist20_V1V3 = data[7].to_f
+      end
+  end
+  recency = ViralSeq::Recency.define(
+                              tcs_RT: tcs_RT,
+                              tcs_V1V3: tcs_V1V3,
+                              pi_RT: pi_RT,
+                              dist20_RT: dist20_RT,
+                              pi_V1V3: pi_V1V3,
+                              dist20_V1V3: dist20_V1V3
+                              )
+  sdrm_lines = File.readlines(point_mutation_file)
+  sdrm_lines.shift
+  sdrm_PR = ""
+  sdrm_RT = ""
+  sdrm_IN = ""
+  sdrm_lines.each do |line|
+      data = line.chomp.split(",")
+      next if data[-1] == "*"
+      if data[0] == "PR"
+          sdrm_PR += abstract_line(data)
+      elsif data[0] =~ /NRTI/
+          sdrm_RT += abstract_line(data)
+      elsif data[0] == "IN"
+          sdrm_IN += abstract_line(data)
+      end
+  end
+  summary_json = [
+    sample_id: lib_name,
+    tcs_PR: tcs_PR,
+    tcs_RT: tcs_RT,
+    tcs_IN: tcs_IN,
+    tcs_V1V3: tcs_V1V3,
+    pi_RT: pi_RT,
+    dist20_RT: dist20_RT,
+    dist20_V1V3: dist20_V1V3,
+    recency: recency,
+    sdrm_PR: sdrm_PR,
+    sdrm_RT: sdrm_RT,
+    sdrm_IN: sdrm_IN
+  ]
+  summary_json_out.puts JSON.pretty_generate(summary_json)
+  summary_json_out.close
+  csvs = [
+    {
+      name: "summary",
+      title: "Summary",
+      file: seq_summary_file,
+      newPDF: "",
+      table_width: [65,55,110,110,110,110,60,60],
+      extra_text: ""
+    },
+    {
+      name: "substitution",
+      title: "Surveillance Drug Resistance Mutations",
+      file: point_mutation_file,
+      newPDF: "",
+      table_width: [65,55,85,80,60,65,85,85,85,45],
+      extra_text: "* Mutation below Poisson cut-off for minority mutations"
+    },
+    {
+      name: "linkage",
+      title: "Mutation Linkage",
+      file: linkage_file,
+      newPDF: "",
+      table_width: [55,50,250,60,80,80,80,45],
+      extra_text: "* Mutation below Poisson cut-off for minority mutations"
+    }
+  ]
+  csvs.each do |csv|
+    file_name = File.join(out_lib_dir, (csv[:name] + ".pdf"))
+    next unless File.exist? csv[:file]
+    Prawn::Document.generate(file_name, :page_layout => :landscape) do |pdf|
+      pdf.text((File.basename(lib, ".*") + ': ' + csv[:title]),
+      :size => 20,
+      :align => :center,
+      :style => :bold)
+      pdf.move_down 20
+      table_data = CSV.open(csv[:file]).to_a
+      header = table_data.first
+      pdf.table(table_data,
+        :header => header,
+        :position => :center,
+        :column_widths => csv[:table_width],
+        :row_colors => ["B6B6B6", "FFFFFF"],
+        :cell_style => {:align => :center, :size => 10}) do |table|
+        table.row(0).style :font_style => :bold, :size => 12 #, :background_color => 'ff00ff'
+      end
+      pdf.move_down 5
+      pdf.text(csv[:extra_text], :size => 8, :align => :justify,)
+    end
+    csv[:newPDF] = file_name
+  end
+  pdf = CombinePDF.new
+  csvs.each do |csv|
+    pdf << CombinePDF.load(csv[:newPDF]) if File.exist?(csv[:newPDF])
+  end
+  pdf << CombinePDF.load(out_r_pdf) if File.exist?(out_r_pdf)
+  pdf.number_pages location: [:bottom_right],
+  number_format: "Swanstrom\'s lab HIV SDRM Pipeline, version #{$sdrm_version_number} by S.Z. and M.U.C.   Page %s",
+  font_size: 6,
+  opacity: 0.5
+  pdf.save File.join(out_lib_dir, (lib_name + ".pdf"))
+  csvs.each do |csv|
+    File.unlink csv[:newPDF]
+  end
+end
+log_file = File.join(File.dirname(indir), "sdrm_log.json")
+File.open(log_file, 'w') { |f| f.puts JSON.pretty_generate(log) }
+FileUtils.touch(File.join(outdir, ".done"))
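For RT, `tcs_sdrm` splits each sequence at nucleotide 267 (`v[0,267]` and `v[267..-1]`), runs the A3G/F hypermutation and stop-codon checks on both halves, and drops the union of flagged keys. Unlike PR and IN, where the stop-codon check runs on the A3G-filtered set, both RT checks see the unfiltered halves, so the hypermutation and stop-codon counts written to the RT summary can include the same TCS. The sketch uses plain arrays of hypothetical keys in place of `SeqHash` objects.

```ruby
# Hypothetical flagged keys from the two RT halves, standing in for the
# keys of a3g[:a3g_seq] and stop_codon(...)[:with_stop_codon].
all_keys       = %w[tcs1 tcs2 tcs3 tcs4 tcs5]
hypermut_rt1   = %w[tcs1]
hypermut_rt2   = %w[tcs1 tcs2]
stop_codon_rt1 = %w[tcs2]
stop_codon_rt2 = []

# Union within each category, then across categories (Array#| drops duplicates).
hypermut_keys   = hypermut_rt1 | hypermut_rt2     # ["tcs1", "tcs2"]
stop_codon_keys = stop_codon_rt1 | stop_codon_rt2 # ["tcs2"]
reject_keys     = hypermut_keys | stop_codon_keys # ["tcs1", "tcs2"]

kept = all_keys.reject { |k| reject_keys.include?(k) }

# The summary counts each category separately, so tcs2 is counted twice
# but removed once.
flagged_total = hypermut_keys.size + stop_codon_keys.size # 3
removed       = reject_keys.size                          # 2
```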