RubyGems - viral_seq - Versions diffs - 1.0.9 → 1.0.10 - Mend

viral_seq 1.0.9 → 1.0.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

checksums.yaml +4 -4
data/Gemfile.lock +1 -1
data/README.md +45 -32
data/bin/tcs +72 -141
data/lib/viral_seq.rb +3 -0
data/lib/viral_seq/seq_hash.rb +13 -6
data/lib/viral_seq/seq_hash_pair.rb +6 -0
data/lib/viral_seq/tcs_core.rb +303 -0
data/lib/viral_seq/tcs_json.rb +178 -0
data/lib/viral_seq/version.rb +2 -2
metadata +4 -4
data/bin/tcs_json_generator +0 -166

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4921d3609d6ffc7fd6fbafd7a4a86e5818d47ed855393addd68b20f28b9d214f
-  data.tar.gz: a9e18c01b287885f8f6238343d9633a52d4ae5ea061347e73bd4f3e86788b2a4
+  metadata.gz: 14d880e9f39b2b87892bec9d4377b358643c880cf32c81872cff51e1007bc23b
+  data.tar.gz: 6ee1c3293e2b0403a2eac033335f7575625b2d35f32127b5b57be53e94b4ec7d
 SHA512:
-  metadata.gz: dd21b57e17751f6c3e475f05b7a565d295ac7592b7c02f8d89ed49192834bee444f08ee9ebf48e41922c8caaf37a03651d5d0c9aa89d97ccc2edb9aad8224d5f
-  data.tar.gz: d1162424ea877d9839c179cacc330c81cd3508fcff07b64a1e753c7c706485d1dcb9a6b60aec9ce02ed33b91bbd4386ed58329c17e247ba086e7d81ed107bfd4
+  metadata.gz: 951b75ced84aa21cf5650baa6970f60a617d3f29d20c14acadacefabea23d6b584f25990453c2008f30197aaef055a94edbdbb45494bb12b6343d90bc6bd45fb
+  data.tar.gz: 68ac69b4ebd5438a8f73780db823c94aa5a78c7c26d02cfd6bec979244dd1d6452c3698ade0606ddbbaccc480ad85e603171c11648dbb0110c2f5dbb3355bb35
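
The SHA256 and SHA512 entries above are hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a minimal sketch, assuming `data.tar.gz` has already been extracted from the gem into the current directory, a digest can be recomputed with Ruby's standard `digest` library and compared with the recorded value. The helper name `digest_matches?` is hypothetical and not part of RubyGems or viral_seq.

```ruby
require 'digest'

# Hypothetical helper: true when the SHA256 hex digest of +bytes+ equals +expected_hex+.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# SHA256 value recorded for data.tar.gz of 1.0.10 in checksums.yaml above.
expected_data_sha256 = '6ee1c3293e2b0403a2eac033335f7575625b2d35f32127b5b57be53e94b4ec7d'

# Assumption: data.tar.gz was extracted beforehand (a .gem file is a plain tar archive).
if File.exist?('data.tar.gz')
  ok = digest_matches?(File.binread('data.tar.gz'), expected_data_sha256)
  puts(ok ? 'data.tar.gz matches checksums.yaml' : 'data.tar.gz does NOT match checksums.yaml')
end
```

The file is read with `File.binread` so that no newline or encoding translation alters the bytes before hashing.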

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    viral_seq (1.0.9)
+    viral_seq (1.0.10)
       colorize (~> 0.1)
       muscle_bio (~> 0.4)

data/README.md CHANGED

@@ -4,72 +4,76 @@ A Ruby Gem containing bioinformatics tools for processing viral NGS data.
 Specifically for Primer-ID sequencing and HIV drug resistance analysis.
-## Installation
+## Install
+```bash
     $ gem install viral_seq
+```
 ## Usage
-#### Load all ViralSeq classes by requiring 'viral_seq.rb'
+### Executables
-```ruby
-#!/usr/bin/env ruby
-require 'viral_seq'
-```
-#### Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
+Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
+```bash
     $ locator -i sequence.fasta -o sequence.fasta.csv
+```
+Use executable `tcs` pipeline to process Primer ID MiSeq sequencing data.
-#### Use executable `tcs` pipeline to process Primer ID MiSeq sequencing data. Parameter json file can be generated using `tcs_json_generator` or at https://tcs-dr-dept-tcs.cloudapps.unc.edu/generator.php
-    $ tcs params.json
-#### Use executable `tcs_json_generator` to generate params .json file for the `tcs` pipeline.
+```bash
+    $ tcs -p params.json # run TCS pipeline with params.json
+    $ tcs -j # CLI to generate params.json
+    $ tcs -h # print out the help
+```
-    $ tcs_json_generator
+## Some Examples
+Load all ViralSeq classes by requiring 'viral_seq.rb' in your Ruby scripts.
-## Some Examples
+```ruby
+#!/usr/bin/env ruby
+require 'viral_seq'
+```
-#### Load nucleotide sequences from a FASTA format sequence file
+Load nucleotide sequences from a FASTA format sequence file
 ```ruby
 my_seqhash = ViralSeq::SeqHash.fa('my_seq_file.fasta')
 ```
-#### Make an alignment (using MUSCLE)
+Make an alignment (using MUSCLE)
 ```ruby
 aligned_seqhash = my_seqhash.align
 ```
-#### Filter nucleotide sequences with the reference coordinates (HIV Protease)
+Filter nucleotide sequences with the reference coordinates (HIV Protease)
 ```ruby
 qc_seqhash = aligned_seqhash.hiv_seq_qc(2253, 2549, false, :HXB2)
 ```
-#### Further filter out sequences with Apobec3g/f hypermutations
+Further filter out sequences with Apobec3g/f hypermutations
 ```ruby
 qc_seqhash = qc_seqhash.a3g
 ```
-#### Calculate nucleotide diversity π
+Calculate nucleotide diversity π
 ```ruby
 qc_seqhash.pi
 ```
-#### Calculate cut-off for minority variants based on Poisson model
+Calculate cut-off for minority variants based on Poisson model
 ```ruby
 cut_off = qc_seqhash.pm
 ```
-#### Examine for drug resistance mutations for HIV PR region
+Examine for drug resistance mutations for HIV PR region
 ```ruby
 qc_seqhash.sdrm_hiv_pr(cut_off)
@@ -77,13 +81,22 @@ qc_seqhash.sdrm_hiv_pr(cut_off)
 ## Updates
-Version 1.0.9-07182020:
+### Version 1.1.0-11112020:
+  1. Modularize TCS pipeline. Move key functions into /viral_seq/tcs_core.rb
+  2. `tcs_json_generator` is removed. This CLI is delivered within the `tcs` pipeline, by running `tcs -j`. The scripts are included in the /viral_seq/tcs_json.rb
+  3. consensus model now includes a true simple majority model, where no nt needs to be over 50% to be called.
+  4. a few optimizations.
+  5. TCS 2.1.0 delivered.
+  6. Tried parallel processing. Cannot achieve the goal because `parallel` gem by default can't pool data from memory of child processors and `in_threads` does not help with the speed.
+### Version 1.0.9-07182020:
   1. Change ViralSeq::SeqHash#stop_codon and ViralSeq::SeqHash#a3g_hypermut return value to hash object.
   2. TCS pipeline updated to version 2.0.1. Add optional `export_raw: TRUE/FALSE` in json params. If `export_raw` is `TRUE`, raw sequence reads (have to pass quality filters) will be exported, along with TCS reads.
-Version 1.0.8-02282020:
+### Version 1.0.8-02282020:
   1. TCS pipeline (version 2.0.0) added as executable.
       tcs  -  main TCS pipeline script.
@@ -94,14 +107,14 @@ Version 1.0.8-02282020:
   3. Bug fix for several methods.
-Version 1.0.7-01282020:
+### Version 1.0.7-01282020:
   1. Several methods added, including
       ViralSeq::SeqHash#error_table
       ViralSeq::SeqHash#random_select
   2. Improved performance for several functions.
-Version 1.0.6-07232019:
+### Version 1.0.6-07232019:
   1. Several methods added to ViralSeq::SeqHash, including
       ViralSeq::SeqHash#size
@@ -110,33 +123,33 @@ Version 1.0.6-07232019:
       ViralSeq::SeqHash#mutation
   2. Update documentations and rspec samples.
-Version 1.0.5-07112019:
+### Version 1.0.5-07112019:
   1. Update ViralSeq::SeqHash#sequence_locator.
      Program will try to determine the direction (`+` or `-` of the query sequence)
   2. update executable `locator` to have a column of `direction` in output .csv file
-Version 1.0.4-07102019:
+### Version 1.0.4-07102019:
   1. Use home directory (Dir.home) instead of the directory of the script file for temp MUSCLE file.
   2. Fix bugs in bin `locator`
-Version 1.0.3-07102019:
+### Version 1.0.3-07102019:
   1. Bug fix.
-Version 1.0.2-07102019:
+### Version 1.0.2-07102019:
   1. Fixed a gem loading issue.
-Version 1.0.1-07102019:
+### Version 1.0.1-07102019:
   1. Add keyword argument :model to ViralSeq::SeqHashPair#join2.
   2. Add method ViralSeq::SeqHash#sequence_locator (also: #loc), a function to locate sequences on HIV/SIV reference genomes, as HIV Sequence Locator from LANL.
   3. Add executable 'locator'. An HIV/SIV sequence locator tool similar to LANL Sequence Locator.
   4. update documentations
-Version 1.0.0-07092019:
+### Version 1.0.0-07092019:
   1. Rewrote the whole ViralSeq gem, grouping methods into modules and classes under main Module::ViralSeq
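
The README now documents `tcs -p params.json`, and the executable reads that file with `JSON.parse(..., symbolize_names: true)`. The following is a partial sketch of such a file, assuming only the key names that appear in this diff (`raw_sequence_dir`, `platform_error_rate`, `primer_pairs`, `region`, `cdna`, `forward`, `majority`, `end_join`). All values are placeholders, and a real params file carries further keys that are not shown in this excerpt.

```ruby
require 'json'

# Placeholder values throughout; only the key names are taken from the diff.
params_text = JSON.pretty_generate(
  raw_sequence_dir: '/path/to/run1',
  platform_error_rate: 0.02,
  primer_pairs: [
    { region: 'PR', cdna: 'NNNNNNNNNCAGT', forward: 'ACGT', majority: 0, end_join: true }
  ]
)

# Same parsing call as in bin/tcs: keys come back as symbols.
params = JSON.parse(params_text, symbolize_names: true)
primer = params[:primer_pairs].first

# As of this release the fallback for a missing :majority is 0 (simple majority), not 0.5.
majority_cut_off = primer[:majority] || 0

puts params[:raw_sequence_dir]
puts majority_cut_off
```

Note that `0` is truthy in Ruby, so an explicit `majority: 0` in the file is kept as is rather than replaced by the fallback.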

data/bin/tcs CHANGED

@@ -28,114 +28,79 @@
 require 'viral_seq'
 require 'json'
 require 'colorize'
+require 'optparse'
+options = {}
-# calculate consensus cutoff
+banner = '-'*50 + "\n" +
+        '| The TCS Pipeline ' + "Version #{ViralSeq::TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |' + "\n" +
+        '-'*50 + "\n"
-def calculate_cut_off(m, error_rate = 0.02)
-  n = 0
-  case error_rate
-  when 0.005...0.015
-    if m <= 10
-      n = 2
-    else
-      n = 1.09*10**-26*m**6 + 7.82*10**-22*m**5 - 1.93*10**-16*m**4 + 1.01*10**-11*m**3 - 2.31*10**-7*m**2 + 0.00645*m + 2.872
-    end
+OptionParser.new do |opts|
+  opts.banner = banner + "Usage: tcs -j"
+  opts.on "-j", "--json_generator", "Command line interface to generate new params json file" do |j|
+    options[:json_generator] = true
+  end
-  when 0...0.005
-    if m <= 10
-      n = 2
-    else
-      n = -9.59*10**-27*m**6 + 3.27*10**-21*m**5 - 3.05*10**-16*m**4 + 1.2*10**-11*m**3 - 2.19*10**-7*m**2 + 0.004044*m + 2.273
-    end
+  opts.on("-p", "--params PARAMS_JSON", "Execute the pipeline with input params json file") do |p|
+    options[:params_json] = p
+  end
-  else
-    if m <= 10
-      n = 2
-    elsif m <= 8500
-      n = -1.24*10**-21*m**6 + 3.53*10**-17*m**5 - 3.90*10**-13*m**4 + 2.12*10**-9*m**3 - 6.06*10**-6*m**2 + 1.80*10**-2*m + 3.15
-    else
-      n = 0.0079 * m + 9.4869
-    end
+  opts.on("-h", "--help", "Prints this help") do
+    puts opts
+    exit
   end
-  n = n.round
-  n = 2 if n < 3
-  return n
-end
+  opts.on("-v", "--version", "Version info") do
+    puts "tcs version: " + ViralSeq::TCS_VERSION.red.bold
+    puts "viral_seq version: " + ViralSeq::VERSION.red.bold
+    exit
+  end
-puts "\n" + '-'*50
-puts '| The TCS Pipeline ' + "Version #{ViralSeq::TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
-puts '-'*50 + "\n"
+  # opts.on("--no-parallel", "toggle off parallel processing") do
+  #   options[:no_parallel] = true
+  # end
+end.parse!
-unless ARGV[0]
-  raise "No JSON param file found. Script terminated."
+if options[:json_generator]
+  params = ViralSeq::TcsJson.generate
+elsif (options[:params_json] && File.exist?(options[:params_json]))
+  params = JSON.parse(File.read(options[:params_json]), symbolize_names: true)
+else
+  abort "No params JSON file found. Script terminated.".red
 end
-params = JSON.parse(File.read(ARGV[0]), symbolize_names: true)
 indir = params[:raw_sequence_dir]
 unless File.exist?(indir)
-  raise "No input sequence directory found. Script terminated."
-end
-libname = File.basename(indir)
-# obtain R1 and R2 file path
-files = []
-Dir.chdir(indir) do
-  files = Dir.glob("*")
+  abort "No input sequence directory found. Script terminated.".red.bold
 end
-if files.empty?
-  raise "Input dir does not contain files. Script terminated."
-end
+# log file
-r1_f = ""
-r2_f = ""
-# unzip .fasta.gz
-def unzip_r(indir, f)
-  r_file = indir + "/" + f
-  if f =~ /.gz/
-    `gzip -d #{r_file}`
-    new_f = f.sub ".gz", ""
-    r_file = File.join(indir, new_f)
-  end
-  return r_file
-end
 runtime_log_file = File.join(indir,"runtime.log")
 log = File.open(runtime_log_file, "w")
 log.puts "TSC pipeline Version " + ViralSeq::TCS_VERSION.to_s
 log.puts "viral_seq Version " + ViralSeq::VERSION.to_s
 log.puts Time.now.to_s + "\t" + "Start TCS pipeline..."
+libname = File.basename indir
-files.each do |f|
-  t = f.split("_")
-  if t.size == 1
-    tag = f
-  else
-    tag = f.split("_")[1..-1].join("_")
-  end
+seq_files = ViralSeq::TcsCore.r1r2 indir
-  if tag =~ /r1/i
-    r1_f = unzip_r(indir, f)
-  elsif tag =~ /r2/i
-    r2_f = unzip_r(indir, f)
-  end
-end
-unless File.exist?(r1_f)
-  log.puts "R1 file not found. Script terminated."
-  raise "R1 file not found. Script terminated."
+if seq_files[:r1_file].size > 0 and seq_files[:r2_file].size > 0
+  r1_f = seq_files[:r1_file]
+  r2_f = seq_files[:r2_file]
+elsif seq_files[:r1_file].size > 0 and seq_files[:r2_file].empty?
+  exit_sig = "Missing R2 file. Aborted."
+elsif seq_files[:r2_file].size > 0 and seq_files[:r1_file].empty?
+  exit_sig = "Missing R1 file. Aborted."
+else
+  exit_sig = "Cannot determine R1 R2 file in #{indir}. Aborted."
 end
-unless File.exist?(r2_f)
-  log.puts "R2 file not found. Script terminated."
-  raise "R2 file not found. Script terminated."
+if exit_sig
+  ViralSeq::TcsCore.log_and_abort log, exit_sig
 end
 r1_fastq_sh = ViralSeq::SeqHash.fq(r1_f)
@@ -152,10 +117,10 @@ end
 primers = params[:primer_pairs]
 if primers.empty?
-  log.puts "No primer information. Script terminated."
-  raise "No primer information. Script terminated."
+  ViralSeq::TcsCore.log_and_abort log, "No primer information. Script terminated."
 end
 primers.each do |primer|
   summary_json = {}
   summary_json[:tcs_version] = ViralSeq::TCS_VERSION
@@ -179,66 +144,25 @@ primers.each do |primer|
   summary_json[:cdan_primer] = cdna_primer
   summary_json[:forward_primer] = forward_primer
-  primer[:majority] ? majority_cut_off = primer[:majority] : majority_cut_off = 0.5
+  primer[:majority] ? majority_cut_off = primer[:majority] : majority_cut_off = 0
   summary_json[:majority_cut_off] = majority_cut_off
   summary_json[:total_raw_sequence] = raw_sequence_number
   log.puts Time.now.to_s + "\t" +  "Processing #{region}..."
-  r1_raw = r1_fastq_sh.dna_hash
-  r2_raw = r2_fastq_sh.dna_hash
+  # filter R1
   log.puts Time.now.to_s + "\t" +  "filtering R1..."
-  # obtain biological forward primer sequence
-  if forward_primer.match(/(N+)(\w+)$/)
-    forward_n = $1.size
-    forward_bio_primer = $2
-  else
-    forward_n = 0
-    forward_bio_primer = forward_primer
-  end
-  forward_bio_primer_size = forward_bio_primer.size
-  forward_starting_number = forward_n + forward_bio_primer_size
-  # filter R1 sequences with forward primers.
-  forward_primer_ref = forward_bio_primer.nt_parser
-  r1_passed_seq = {}
-  r1_raw.each do |name,seq|
-    next if seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
-    next if seq =~ /A{11}/ # a string of poly-A indicates adaptor sequence
-    next if seq =~ /T{11}/ # a string of poly-T indicates adaptor sequence
-    primer_region_seq = seq[forward_n, forward_bio_primer_size]
-    if primer_region_seq =~ forward_primer_ref
-      r1_passed_seq[name.split("\s")[0]] = seq
-    end
-  end
+  filter_r1 = ViralSeq::TcsCore.filter_r1(r1_fastq_sh, forward_primer)
+  r1_passed_seq = filter_r1[:r1_passed_seq]
   log.puts Time.now.to_s + "\t" +  "R1 filtered: #{r1_passed_seq.size.to_s}"
   summary_json[:r1_filtered_raw] = r1_passed_seq.size
+  # filter R2
   log.puts Time.now.to_s + "\t" +  "filtering R2..."
-  # obtain biological reverse primer sequence
-  cdna_primer.match(/(N+)(\w+)$/)
-  pid_length = $1.size
-  cdna_bio_primer = $2
-  cdna_bio_primer_size = cdna_bio_primer.size
-  reverse_starting_number = pid_length + cdna_bio_primer_size
-  # filter R2 sequences with cDNA primers.
-  cdna_primer_ref = cdna_bio_primer.nt_parser
-  r2_passed_seq = {}
-  r2_raw.each do |name, seq|
-    next if seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
-    next if seq =~ /A{11}/ # a string of poly-A indicates adaptor sequence
-    next if seq =~ /T{11}/ # a string of poly-T indicates adaptor sequence
-    primer_region_seq = seq[pid_length, cdna_bio_primer_size]
-    if primer_region_seq =~ cdna_primer_ref
-      r2_passed_seq[name.split("\s")[0]] = seq
-    end
-  end
+  filter_r2 = ViralSeq::TcsCore.filter_r2(r2_fastq_sh, cdna_primer)
+  r2_passed_seq = filter_r2[:r2_passed_seq]
+  pid_length = filter_r2[:pid_length]
   log.puts Time.now.to_s + "\t" +  "R2 filtered: #{r2_passed_seq.size.to_s}"
   summary_json[:r2_filtered_raw] = r2_passed_seq.size
@@ -257,8 +181,8 @@ primers.each do |primer|
     r2_seq = r2_passed_seq[seqtag]
     pid = r2_seq[0, pid_length]
     id[seqtag] = pid
-    bio_r2[seqtag] = r2_seq[reverse_starting_number..-2]
-    bio_r1[seqtag] = r1_seq[forward_starting_number..-2]
+    bio_r2[seqtag] = r2_seq[filter_r2[:reverse_starting_number]..-2]
+    bio_r1[seqtag] = r1_seq[filter_r1[:forward_starting_number]..-2]
   end
   # TCS cut-off
@@ -278,11 +202,10 @@ primers.each do |primer|
   end
   max_id = primer_id_dis.keys.sort[-5..-1].mean
-  consensus_cutoff = calculate_cut_off(max_id,error_rate)
+  consensus_cutoff = ViralSeq::TcsCore.calculate_cut_off(max_id,error_rate)
   log.puts Time.now.to_s + "\t" +  "Consensus cut-off is #{consensus_cutoff.to_s}"
   summary_json[:consensus_cutoff] = consensus_cutoff
   summary_json[:length_of_pid] = pid_length
   log.puts Time.now.to_s + "\t" +  "Creating consensus..."
   # Primer ID over the cut-off
@@ -355,6 +278,8 @@ primers.each do |primer|
     consensus_name = ">" + primer_id + "_" + seq_with_same_primer_id.size.to_s + "_" + libname + "_" + region
     r1_consensus = ViralSeq::SeqHash.array(r1_sub_seq).consensus(majority_cut_off)
     r2_consensus = ViralSeq::SeqHash.array(r2_sub_seq).consensus(majority_cut_off)
+    # hide the following two lines if allowing sequence to have ambiguities.
     next if r1_consensus =~ /[^ATCG]/
     next if r2_consensus =~ /[^ATCG]/
@@ -404,6 +329,7 @@ primers.each do |primer|
   f1.close
   f2.close
+  # Primer ID distribution in .json file
   out_pid_json = File.join(out_dir_set, 'primer_id.json')
   pid_json = {}
   pid_json[:primer_id_in_use] = Hash[*(primer_id_in_use.sort_by {|k, v| [-v,k]}.flatten)]
@@ -413,11 +339,14 @@ primers.each do |primer|
     f.puts JSON.pretty_generate(pid_json)
   end
+  # start end-join
   def end_join(dir, option, overlap)
     shp = ViralSeq::SeqHashPair.fa(dir)
     case option
     when 1
       joined_sh = shp.join1()
+    when 2
+      joined_sh = shp.join1(overlap)
     when 3
       joined_sh = shp.join2
     when 4
@@ -489,9 +418,10 @@ primers.each do |primer|
       joined_sh = joined_sh.hiv_seq_qc(ref_start, ref_end, indel, ref_genome)
       if export_raw
-        joined_sh_raw = joined_sh.hiv_seq_qc(ref_start, ref_end, indel, ref_genome)
+        joined_sh_raw = joined_sh_raw.hiv_seq_qc(ref_start, ref_end, indel, ref_genome)
       end
     end
     log.puts Time.now.to_s + "\t" + "Paired TCS number after QC based on reference genome: " + joined_sh.size.to_s
     summary_json[:combined_tcs_after_qc] = joined_sh.size
     if primer[:trim]
@@ -499,10 +429,11 @@ primers.each do |primer|
       trim_end = primer[:trim_ref_end]
       trim_ref = primer[:trim_ref].to_sym
       joined_sh = joined_sh.trim(trim_start, trim_end, trim_ref)
-    end
-    joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
-    if export_raw
-      joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.fasta"))
+      joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
+      if export_raw
+        joined_sh_raw = joined_sh_raw.trim(trim_start, trim_end, trim_ref)
+        joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
+      end
     end
   end
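
The option handling introduced in `bin/tcs` can be tried in isolation. This is a minimal sketch, assuming only Ruby's standard `optparse` library; the helper name `parse_tcs_args` is hypothetical, and it parses an explicit array instead of `ARGV` so that it has no side effects. The `-h` and `-v` switches of the real script are left out because they call `exit`.

```ruby
require 'optparse'

# Hypothetical helper mirroring the -j and -p switches of the new `tcs` executable.
def parse_tcs_args(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: tcs [-j | -p PARAMS_JSON]'
    opts.on('-j', '--json_generator', 'Generate a new params JSON file interactively') do
      options[:json_generator] = true
    end
    opts.on('-p', '--params PARAMS_JSON', 'Run the pipeline with a params JSON file') do |path|
      options[:params_json] = path
    end
  end
  # parse (without "!") works on a copy, so the caller's array is left untouched.
  parser.parse(argv)
  options
end

p parse_tcs_args(['-p', 'params.json'])
p parse_tcs_args(['-j'])
```

Returning a plain hash keeps the later branching (`if options[:json_generator] ... elsif options[:params_json] ...`) independent of how the arguments were supplied.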

data/lib/viral_seq.rb CHANGED

@@ -35,5 +35,8 @@ require_relative "viral_seq/seq_hash_pair"
 require_relative "viral_seq/sequence"
 require_relative "viral_seq/string"
 require_relative "viral_seq/version"
+require_relative "viral_seq/tcs_core"
+require_relative "viral_seq/tcs_json"
 require "muscle_bio"

data/lib/viral_seq/seq_hash.rb CHANGED

@@ -9,7 +9,7 @@ module ViralSeq
   #     # align with MUSCLE
   #   filtered_seqhash = aligned_pr_seqhash.hiv_seq_qc(2253, 2549, false, :HXB2)
   #     # filter nt sequences with the reference coordinates
-  #   filtered_seqhash = aligned_pr_seqhash.stop_codon[1]
+  #   filtered_seqhash = aligned_pr_seqhash.stop_codon[:without_stop_codon]
   #     # return a new ViralSeq::SeqHash object without stop codons
   #   filtered_seqhash = filtered_seqhash.a3g[1]
   #     # further filter out sequences with A3G hypermutations
@@ -351,7 +351,7 @@ module ViralSeq
     # create one consensus sequence from @dna_hash with an optional majority cut-off for mixed bases.
-    # @param cutoff [Float] majority cut-off for calling consensus bases. default at simple majority (0.5), position with 15% "A" and 85% "G" will be called as "G" with 20% cut-off and as "R" with 10% cut-off.
+    # @param cutoff [Float] majority cut-off for calling consensus bases. default at (0.5), position with 15% "A" and 85% "G" will be called as "G" with 20% cut-off and as "R" with 10% cut-off. Using (0) applies a simple majority rule (no cut-off).
     # @return [String] consensus sequence
     # @example consensus sequence from an array of sequences.
     #   seq_array = %w{ ATTTTTTTTT
@@ -383,11 +383,18 @@ module ViralSeq
         base_count = all_base.count_freq
         max_base_list = []
-        base_count.each do |k,v|
-          if v/seq_size.to_f >= cutoff
-            max_base_list << k
+        if cutoff.zero?
+          max_count = base_count.values.max
+          max_base_hash = base_count.select {|_k,v| v == max_count}
+          max_base_list = max_base_hash.keys
+        else
+          base_count.each do |k,v|
+            if v/seq_size.to_f >= cutoff
+              max_base_list << k
+            end
           end
         end
         consensus_seq += call_consensus_base(max_base_list)
       end
       return consensus_seq
@@ -398,7 +405,7 @@ module ViralSeq
     #   # control pattern: G[YN|RC] -> A[YN|RC]
     #   # use the sample consensus to determine potential a3g sites
     #   # Two criteria to identify hypermutation
-    #   # 1. Fisher's exact test on the frequencies of G to A mutation at A3G positons vs. non-A3G positions
+    #   # 1. Fisher's exact test on the frequencies of G to A mutation at A3G positions vs. non-A3G positions
     #   # 2. Poisson distribution of G to A mutations at A3G positions, outliers sequences
     #   # note:  criteria 2 only applies on a sequence file containing more than 20 sequences,
     #   #        b/c Poisson model does not do well on small sample size.
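
The new `cutoff.zero?` branch changes which bases are kept at each alignment position. The following stand-alone sketch reproduces that selection step for a single position. The helper name `candidate_bases` is hypothetical and the code does not depend on the viral_seq gem; how the gem turns the resulting list into one consensus character is handled by `call_consensus_base`, which is not shown in this diff.

```ruby
# Hypothetical stand-alone helper mirroring the branch added to the consensus method.
# bases  - Array of single-character Strings observed at one alignment position
# cutoff - 0 keeps only the most frequent base(s); any other value is a minimum frequency
def candidate_bases(bases, cutoff)
  counts = Hash.new(0)
  bases.each { |base| counts[base] += 1 }
  if cutoff.zero?
    max_count = counts.values.max
    counts.select { |_base, n| n == max_count }.keys
  else
    counts.select { |_base, n| n / bases.size.to_f >= cutoff }.keys
  end
end

column = %w[A A G C T]         # "A" is the most frequent base but covers only 40%
p candidate_bases(column, 0.5) # threshold mode: no base reaches 50%
p candidate_bases(column, 0)   # simple majority mode: the most frequent base is kept
```

With a cut-off of 0.5 no base in this column qualifies, whereas with a cut-off of 0 the most frequent base is still selected, which matches the README note that no nucleotide needs to exceed 50% to be called. Ties in simple majority mode return every tied base.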

data/lib/viral_seq/seq_hash_pair.rb CHANGED

@@ -80,6 +80,12 @@ module ViralSeq
       alias_method :fa, :new_from_fasta
     end
+    # the size of nt sequence hash of the SeqHashPair object
+    # @return [Integer] size of nt sequence hash of the SeqHash object
+    def size
+      self.dna_hash.size
+    end
     # Pair-end join function for KNOWN overlap size.
     # @param overlap [Integer] how many bases are overlapped. `0` means no overlap, R1 and R2 will be simply put together.
     # @param diff [Integer, Float] the maximum mismatch rate allowed for the overlapping region. default at 0.0, i.e. no mis-match allowed.

data/lib/viral_seq/tcs_core.rb ADDED

@@ -0,0 +1,303 @@
+module ViralSeq
+  # Core functions for `tcs` pipeline
+  class TcsCore
+    class << self
+      # methods to calculate TCS consensus cut-off based on the maximum numbers of PIDs and platform error rate.
+      def calculate_cut_off(m, error_rate = 0.02)
+        n = 0
+        case error_rate
+        when 0.005...0.015
+          if m <= 10
+            n = 2
+          else
+            n = 1.09*10**-26*m**6 + 7.82*10**-22*m**5 - 1.93*10**-16*m**4 + 1.01*10**-11*m**3 - 2.31*10**-7*m**2 + 0.00645*m + 2.872
+          end
+        when 0...0.005
+          if m <= 10
+            n = 2
+          else
+            n = -9.59*10**-27*m**6 + 3.27*10**-21*m**5 - 3.05*10**-16*m**4 + 1.2*10**-11*m**3 - 2.19*10**-7*m**2 + 0.004044*m + 2.273
+          end
+        else
+          if m <= 10
+            n = 2
+          elsif m <= 8500
+            n = -1.24*10**-21*m**6 + 3.53*10**-17*m**5 - 3.90*10**-13*m**4 + 2.12*10**-9*m**3 - 6.06*10**-6*m**2 + 1.80*10**-2*m + 3.15
+          else
+            n = 0.0079 * m + 9.4869
+          end
+        end
+        n = n.round
+        n = 2 if n < 3
+        return n
+      end
+      # identify which file in the directory is R1 file, and which is R2 file based on file names
+      # input as directory (Dir object or a string of path)
+      # by default, .gz files will be unzipped.
+      # return as a hash of {r1_file: file1, r2_file: file2}
+      def r1r2(directory, unzip = true)
+        files = []
+        Dir.chdir(directory) { files = Dir.glob "*" }
+        r1_file = ""
+        r2_file = ""
+        files.each do |f|
+          tag = parser_file_name(f)[:tag]
+          if tag.include? "R1"
+            unzip ? r1_file = unzip_r(directory, f) : r1_file = File.join(directory, f)
+          elsif tag.include? "R2"
+            unzip ? r2_file = unzip_r(directory, f) : r2_file = File.join(directory, f)
+          end
+        end
+        return { r1_file: r1_file, r2_file: r2_file }
+      end # end of ViralSeq:TcsCore.r1r2
+      # sort directories containing multiple r1 and r2 files.
+      # use the library name (first string before "_") to separate libraries
+      # out_dir is the Dir object or string of the output directory, by default named as directory + "_sorted"
+      # return a hash as { with_both_r1_r2: [lib1, lib2, ...], missing_r1: [lib1, lib2, ...], missing_r2: [lib1, lib2, ...], error: [lib1, lib2, ...]}
+      def sort_by_lib(directory, out_dir = directory + "_sorted")
+        Dir.mkdir(out_dir) unless File.directory?(out_dir)
+        files = []
+        Dir.chdir(directory) {files = Dir.glob("*")}
+        files.each do |file|
+          path = File.join(directory,file)
+          index = file.split("_")[0]
+          index_dir = File.join(out_dir, index)
+          Dir.mkdir(index_dir) unless File.directory?(index_dir)
+          File.rename(path, File.join(index_dir, file))
+        end
+        return_obj = { with_both_r1_r2: [],
+                       missing_r1: [],
+                       missing_r2: [],
+                       error: []
+                      }
+        libs = []
+        Dir.chdir(out_dir) { libs = Dir.glob('*') }
+        libs.each do |lib|
+          file_check = ViralSeq::TcsCore.r1r2(File.join(out_dir, lib))
+          if !file_check[:r1_file].empty? and !file_check[:r2_file].empty?
+            return_obj[:with_both_r1_r2] << lib
+          elsif file_check[:r1_file].empty? and !file_check[:r2_file].empty?
+            return_obj[:missing_r1] << lib
+          elsif file_check[:r2_file].empty? and !file_check[:r1_file].empty?
+            return_obj[:missing_r2] << lib
+          else
+            return_obj[:error] << lib
+          end
+        end
+        return return_obj
+      end
+      # sort array of file names to determine if there is potential errors
+      # input name_array array of file names
+      # output hash { }
+      def validate_file_name(name_array)
+        errors = { file_type_error: [] ,
+                   missing_r1_file: [] ,
+                   missing_r2_file: [] ,
+                   extra_r1_r2_file: [],
+                   no_region_tag: [] ,
+                   multiple_region_tag: []}
+        passed_libs = {}
+        name_with_r1_r2 = []
+        name_array.each do |name|
+          tag = parser_file_name(name)[:tag]
+          if name !~ /\.fastq\Z|\.fastq\.gz\Z/
+            errors[:file_type_error] << name
+          elsif tag.count("R1") == 0 and tag.count("R2") == 0
+            errors[:no_region_tag] << name
+          elsif tag.count("R1") > 0 and tag.count("R2") > 0
+            errors[:multiple_region_tag] << name
+          elsif tag.count("R1") > 1 or tag.count("R2") > 1
+            errors[:multiple_region_tag] << name
+          else
+            name_with_r1_r2 << name
+          end
+        end
+        libs = {}
+        name_with_r1_r2.map do |name|
+          libname = parser_file_name(name)[:libname]
+          libs[libname] ||= []
+          libs[libname] << name
+        end
+        libs.each do |libname, files|
+          count_r1_file = 0
+          count_r2_file = 0
+          files.each do |name|
+            tag = parser_file_name(name)[:tag]
+            if tag.include? "R1"
+              count_r1_file += 1
+            elsif tag.include? "R2"
+              count_r2_file += 1
+            end
+          end
+          if count_r1_file > 1 or count_r2_file > 1
+            errors[:extra_r1_r2_file] += files
+          elsif count_r1_file.zero?
+            errors[:missing_r1_file] += files
+          elsif count_r2_file.zero?
+            errors[:missing_r2_file] += files
+          else
+            passed_libs[libname] = files
+          end
+        end
+        passed_names = []
+        passed_libs.values.each { |names| passed_names += names}
+        if passed_names.size < name_array.size
+          pass = false
+        else
+          pass = true
+        end
+        return { errors: errors, all_pass: pass, passed_names: passed_names, passed_libs: passed_libs }
+      end
+      # filter r1 raw sequences for non-specific primers.
+      # input r1_sh, SeqHash obj.
+      # return a Hash with the filtered name/sequence pairs and the forward starting position, as { r1_passed_seq: r1_passed_seq, forward_starting_number: forward_starting_number }
+      def filter_r1(r1_sh, forward_primer)
+        if forward_primer.match(/(N+)(\w+)$/)
+          forward_n = $1.size
+          forward_bio_primer = $2
+        else
+          forward_n = 0
+          forward_bio_primer = forward_primer
+        end
+        forward_bio_primer_size = forward_bio_primer.size
+        forward_starting_number = forward_n + forward_bio_primer_size
+        forward_primer_ref = forward_bio_primer.nt_parser
+        r1_passed_seq = {}
+        r1_raw = r1_sh.dna_hash
+        proc_filter = proc do |name|
+          seq = r1_raw[name]
+          next unless general_filter seq
+          primer_region_seq = seq[forward_n, forward_bio_primer_size]
+          if primer_region_seq =~ forward_primer_ref
+            new_name = remove_tag name
+            r1_passed_seq[new_name] = seq
+          end
+        end
+        r1_raw.keys.map do |name|
+          proc_filter.call name
+        end
+        return { r1_passed_seq: r1_passed_seq, forward_starting_number: forward_starting_number }
+      end # end of filter_r1
+      # filter r2 raw sequences for non-specific primers.
+      # input r2_sh, SeqHash obj.
+      # return filtered Hash of sequence name and seq pair, as well as the length of PID.
+      def filter_r2(r2_sh, cdna_primer)
+        r2_raw = r2_sh.dna_hash
+        cdna_primer.match(/(N+)(\w+)$/)
+        pid_length = $1.size
+        cdna_bio_primer = $2
+        cdna_bio_primer_size = cdna_bio_primer.size
+        reverse_starting_number = pid_length + cdna_bio_primer_size
+        cdna_primer_ref = cdna_bio_primer.nt_parser
+        r2_passed_seq = {}
+        proc_filter = proc do |name|
+          seq = r2_raw[name]
+          next unless general_filter seq
+          primer_region_seq = seq[pid_length, cdna_bio_primer_size]
+          if primer_region_seq =~ cdna_primer_ref
+            new_name = remove_tag name
+            r2_passed_seq[new_name] = seq
+          end
+        end
+        r2_raw.keys.map do |name|
+          proc_filter.call name
+        end
+        return { r2_passed_seq: r2_passed_seq, pid_length: pid_length, reverse_starting_number: reverse_starting_number }
+      end # end of filter_r2
+      # puts error message in the log file handler, and abort with the same infor
+      def log_and_abort(log, infor)
+        log.puts Time.now.to_s + "\t" + infor
+        log.close
+        abort infor.red.bold
+      end
+      private
+      def unzip_r(indir, f)
+        r_file = File.join(indir, f)
+        if f =~ /.gz/
+          `gzip -d #{r_file}`
+          new_f = f.sub ".gz", ""
+          r_file = File.join(indir, new_f)
+        end
+        return r_file
+      end
+      def parser_file_name(file_name)
+        t = file_name.split(".")[0].split("_")
+        if t.size == 1
+          libname = "lib"
+          tag = [ t[0].upcase ]
+        else
+          libname = t[0]
+          tag = t[1..-1].map(&:upcase)
+        end
+        return {libname: libname, tag: tag}
+      end
+      def general_filter(seq)
+        if seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
+          return false
+        elsif seq =~ /A{11}/ # a string of poly-A indicates adaptor sequence
+          return false
+        elsif seq =~ /T{11}/ # a string of poly-T indicates adaptor sequence
+          return false
+        else
+          return true
+        end
+      end
+      # remove region info tags from the raw MiSeq sequences.
+      def remove_tag(seq_name)
+        if seq_name =~ /\s/
+          new_tag = $`
+        else
+          new_tag = seq_name[0..-3]
+        end
+      end
+    end # end of class << self
+  end # end of TcsCore module
+end # end of main module
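
The read filter and file-name parsing added in tcs_core.rb can be tried in isolation. The following is a minimal standalone sketch, not the gem's API: `passes_general_filter?` and `parse_file_name` are hypothetical names that mirror the private `general_filter` and `parser_file_name` helpers shown in the diff above.

```ruby
# Hypothetical stand-in for the private general_filter helper in the diff.
# Rejects reads with an internal N or an 11-base poly-A / poly-T run.
def passes_general_filter?(seq)
  return false if seq[1..-2] =~ /N/     # ambiguity anywhere except first/last base
  return false if seq =~ /A{11}|T{11}/  # long homopolymer run suggests adaptor
  true
end

# Hypothetical stand-in for parser_file_name: splits "lib_tag.ext" into
# a library name and upper-cased tags; a name without "_" gets libname "lib".
def parse_file_name(file_name)
  parts = file_name.split(".")[0].split("_")
  if parts.size == 1
    { libname: "lib", tag: [parts[0].upcase] }
  else
    { libname: parts[0], tag: parts[1..-1].map(&:upcase) }
  end
end

p passes_general_filter?("NACGTN")
p passes_general_filter?("ACNGT")
p parse_file_name("lib1_r1.fastq")
```

An N is tolerated only at the first or last position, which is why the first call passes and the second does not.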

data/lib/viral_seq/tcs_json.rb ADDED

@@ -0,0 +1,178 @@
+module ViralSeq
+  class TcsJson
+    class << self
+      def generate
+        puts '-'*58
+        puts '| JSON Parameter Generator for ' + "TCS #{ViralSeq::TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
+        puts '-'*58 + "\n"
+        param = {}
+        puts 'Enter the path to the directory that contains the MiSeq paired-end R1 and R2 .fastq or .fastq.gz files'
+        print '> '
+        param[:raw_sequence_dir] = gets.chomp.rstrip
+        puts 'Enter the estimated platform error rate (for TCS cut-off calculation), default as ' + '0.02'.red.bold
+        print '> '
+        input_error = gets.chomp.rstrip.to_f
+        if input_error == 0.0
+          param[:platform_error_rate] = 0.02
+        else
+          param[:platform_error_rate] = input_error
+        end
+        param[:primer_pairs] = []
+        loop do
+          data = {}
+          puts "Enter the name for the sequenced region: "
+          print '> '
+          data[:region] = gets.chomp.rstrip
+          puts "Enter the #{"cDNA".red.bold} primer sequence: "
+          print '> '
+          data[:cdna] = gets.chomp.rstrip
+          puts "Enter the #{"forward".blue.bold} primer sequence: "
+          print '> '
+          data[:forward] = gets.chomp.rstrip
+          puts "Enter supermajority cut-off (0.5 - 1.0). Default: simple majority"
+          print '> '
+          mj = gets.chomp.rstrip.to_f
+          if (0.5..1.0).include?(mj)
+            data[:majority] = mj
+          else
+            data[:majority] = 0
+          end
+          print "Need end-join? Y/N \n> "
+          ej = gets.chomp.rstrip
+          if ej =~ /y|yes/i
+            data[:end_join] = true
+            print "End-join option? Choose from (1-4):\n
+            1: simple join, no overlap
+            2: known overlap \n
+            3: unknown overlap, use sample consensus to determine overlap, all sequence pairs have same overlap\n
+            4: unknown overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap\n
+            > "
+            ej_option = gets.chomp.rstrip
+            while ![1,2,3,4].include?(ej_option.to_i)
+              puts "Entered end-join option #{ej_option.red.bold} not valid (choose 1-4), try again"
+              ej_option = gets.chomp.rstrip # keep a String so .red.bold still works on the next retry
+            end
+            case ej_option.to_i
+            when 1
+              data[:end_join_option] = 1
+              data[:overlap] = 0
+            when 2
+              data[:end_join_option] = 1
+              print "overlap bases: \n> "
+              ol = gets.chomp.rstrip.to_i
+              data[:overlap] = ol
+            when 3
+              data[:end_join_option] = 3
+            when 4
+              data[:end_join_option] = 4
+            end
+            print "Need QC for TCS (supported for HIV-1 and SIV)? Y/N \n> "
+            qc = gets.chomp.rstrip
+            if qc =~ /y|yes/i
+              data[:TCS_QC] = true
+              data[:ref_genome] = get_ref
+              print "reference 5'end ref position or position range, 0 if no need to match this end \n> "
+              data[:ref_start] = gets.chomp.rstrip.to_i
+              print "reference 3'end ref position or position range: 0 if no need to match this end \n> "
+              data[:ref_end] = gets.chomp.rstrip.to_i
+              print "allow indels? (default as yes) Y/N \n> "
+              indel = gets.chomp.rstrip
+              if indel =~ /n|no/i
+                data[:indel] = false
+              else
+                data[:indel] = true
+              end
+            else
+              data[:TCS_QC] = false
+            end
+            print "Need trimming to a reference genome? Y/N \n> "
+            trim_option = gets.chomp.rstrip
+            if trim_option =~ /y|yes/i
+              data[:trim] = true
+              data[:trim_ref] = get_ref
+              print "reference 5'end ref position \n> "
+              data[:trim_ref_start] = gets.chomp.rstrip.to_i
+              print "reference 3'end ref position \n> "
+              data[:trim_ref_end] = gets.chomp.rstrip.to_i
+            else
+              data[:trim] = false
+            end
+          else
+            data[:end_join] = false
+          end
+          param[:primer_pairs] << data
+          print "Do you wish to continue? Y/N \n> "
+          continue_sig = gets.chomp.rstrip
+          break unless continue_sig =~ /y|yes/i
+        end
+        puts "\nYour JSON string is:"
+        puts JSON.pretty_generate(param)
+        print "\nDo you wish to save it as a file? Y/N \n> "
+        save_option = gets.chomp.rstrip
+        if save_option =~ /y|yes/i
+          print "Path to save JSON file:\n> "
+          path = gets.chomp.rstrip
+          File.open(path, 'w') {|f| f.puts JSON.pretty_generate(param)}
+        end
+        print "\nDo you wish to execute tcs pipeline with the input params now? Y/N \n> "
+        rsp = gets.chomp.rstrip
+        if rsp =~ /y/i
+          return param
+        else
+          abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`"
+        end
+      end
+      private
+      def get_ref
+        puts "Choose reference genome (1-3):"
+        puts "1. HIV-1 HXB2".red.bold
+        puts "2. HIV-1 NL4-3".blue.bold
+        puts "3. SIV MAC239".magenta.bold
+        print "> "
+        ref_option = gets.chomp.rstrip
+        while ![1,2,3].include?(ref_option.to_i)
+          print "Entered reference option #{ref_option.to_s.red.bold} not valid (choose 1-3), try again\n> "
+          ref_option = gets.chomp.rstrip.to_i
+        end
+        ref = case ref_option.to_i
+              when 1
+                :HXB2
+              when 2
+                :NL43
+              when 3
+                :MAC239
+              end
+      end
+    end
+  end # end TcsJson
+end # end main module
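
The generator's fall-back defaults rely on `String#to_f` returning `0.0` for empty or non-numeric input. The sketch below isolates that logic under stated assumptions: `parse_error_rate`, `parse_majority`, and `DEFAULT_ERROR_RATE` are hypothetical names for illustration and do not exist in viral_seq.

```ruby
# Hypothetical constant mirroring the 0.02 default used in the prompt above.
DEFAULT_ERROR_RATE = 0.02

# "" and non-numeric text both convert to 0.0, which triggers the default.
def parse_error_rate(input)
  value = input.strip.to_f
  value == 0.0 ? DEFAULT_ERROR_RATE : value
end

# Values outside 0.5..1.0 (including blank input) fall back to 0,
# which the diff above stores for the default (simple majority) case.
def parse_majority(input)
  value = input.strip.to_f
  (0.5..1.0).include?(value) ? value : 0
end

p parse_error_rate("")
p parse_error_rate("0.01")
p parse_majority("0.75")
p parse_majority("2")
```

One consequence of this design is that an explicit entry of `0` for the error rate cannot be distinguished from a blank line; both yield the default.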

data/lib/viral_seq/version.rb CHANGED

@@ -2,6 +2,6 @@
 # version info and history
 module ViralSeq
-  VERSION = "1.0.9"
-  TCS_VERSION = "2.0.1"
+  VERSION = "1.0.10"
+  TCS_VERSION = "2.1.0"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: viral_seq
 version: !ruby/object:Gem::Version
-  version: 1.0.9
+  version: 1.0.10
 platform: ruby
 authors:
 - Shuntai Zhou
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-07-19 00:00:00.000000000 Z
+date: 2020-11-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -90,7 +90,6 @@ email:
 executables:
 - locator
 - tcs
-- tcs_json_generator
 extensions: []
 extra_rdoc_files: []
 files:
@@ -105,7 +104,6 @@ files:
 - Rakefile
 - bin/locator
 - bin/tcs
-- bin/tcs_json_generator
 - lib/viral_seq.rb
 - lib/viral_seq/constant.rb
 - lib/viral_seq/enumerable.rb
@@ -120,6 +118,8 @@ files:
 - lib/viral_seq/seq_hash_pair.rb
 - lib/viral_seq/sequence.rb
 - lib/viral_seq/string.rb
+- lib/viral_seq/tcs_core.rb
+- lib/viral_seq/tcs_json.rb
 - lib/viral_seq/version.rb
 - viral_seq.gemspec
 homepage: https://github.com/ViralSeq/viral_seq

data/bin/tcs_json_generator DELETED

@@ -1,166 +0,0 @@
-#!/usr/bin/env ruby
-# TCS pipeline JSON params generator.
-require 'viral_seq'
-require 'colorize'
-require 'json'
-def get_ref
-  puts "Choose reference genome (1-3):"
-  puts "1. HIV-1 HXB2".red.bold
-  puts "2. HIV-1 NL4-3".blue.bold
-  puts "3. SIV MAC239".magenta.bold
-  print "> "
-  ref_option = gets.chomp.rstrip
-  while ![1,2,3].include?(ref_option.to_i)
-    print "Entered end-join option #{ref_option.to_s.red.bold} not valid (choose 1-3), try again\n> "
-    ref_option = gets.chomp.rstrip.to_i
-  end
-  ref = case ref_option.to_i
-        when 1
-          :HXB2
-        when 2
-          :NL43
-        when 3
-          :MAC239
-        end
-end
-puts "\n" + '-'*58
-puts '| JSON Parameter Generator for ' + "TCS #{ViralSeq::TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
-puts '-'*58 + "\n"
-param = {}
-puts 'Enter the path to the directory that contains the MiSeq pair-end R1 and R2 .fastq or .fastq.gz file'
-print '> '
-param[:raw_sequence_dir] = gets.chomp.rstrip
-puts 'Enter the estimated platform error rate (for TCS cut-off calculation), default as ' + '0.02'.red.bold
-print '> '
-input_error = gets.chomp.rstrip.to_f
-if input_error == 0.0
-  param[:platform_error_rate] = 0.02
-else
-  param[:platform_error_rate] = input_error
-end
-param[:primer_pairs] = []
-loop do
-  data = {}
-  puts "Enter the name for the sequenced region: "
-  print '> '
-  data[:region] = gets.chomp.rstrip
-  puts "Enter the #{"cDNA".red.bold} primer sequence: "
-  print '> '
-  data[:cdna] = gets.chomp.rstrip
-  puts "Enter the #{"forward".blue.bold} primer sequence: "
-  print '> '
-  data[:forward] = gets.chomp.rstrip
-  puts "Enter supermajority cut-off (0.5 - 0.9). Default: " + "0.5".blue.bold + " (simple majority)"
-  print '> '
-  mj = gets.chomp.rstrip.to_f
-  if (0.5..0.9).include?(mj)
-    data[:majority] = mj
-  else
-    data[:majority] = 0.5
-  end
-  print "Need end-join? Y/N \n> "
-  ej = gets.chomp.rstrip
-  if ej =~ /y|yes/i
-    data[:end_join] = true
-    print "End-join option? Choose from (1-4):\n
-    1: simple join, no overlap
-    2: known overlap \n
-    3: unknow overlap, use sample consensus to determine overlap, all sequence pairs have same overlap\n
-    4: unknow overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap\n
-    > "
-    ej_option = gets.chomp.rstrip
-    while ![1,2,3,4].include?(ej_option.to_i)
-      puts "Entered end-join option #{ej_option.red.bold} not valid (choose 1-4), try again"
-      ej_option = gets.chomp.rstrip.to_i
-    end
-    case ej_option.to_i
-    when 1
-      data[:end_join_option] = 1
-      data[:overlap] = 0
-    when 2
-      data[:end_join_option] = 1
-      print "overlap bases: \n> "
-      ol = gets.chomp.rstrip.to_i
-      data[:overlap] = ol
-    when 3
-      data[:end_join_option] = 3
-    when 4
-      data[:end_join_option] = 4
-    end
-    print "Need QC for TCS? (support for HIV-1 and SIV)? Y/N \n> "
-    qc = gets.chomp.rstrip
-    if qc =~ /y|yes/i
-      data[:TCS_QC] = true
-      data[:ref_genome] = get_ref
-      print "reference 5'end ref position or posiiton range, 0 if no need to match this end \n> "
-      data[:ref_start] = gets.chomp.rstrip.to_i
-      print "reference 3'end ref position or posiiton range: 0 if no need to match this end \n> "
-      data[:ref_end] = gets.chomp.rstrip.to_i
-      print "allow indels? (default as yes) Y/N \n> "
-      indel = gets.chomp.rstrip
-      if indel =~ /n|no/i
-        data[:indel] = false
-      else
-        data[:indel] = true
-      end
-    else
-      data[:TCS_QC] = false
-    end
-    print "Need trimming to a reference genome? Y/N \n> "
-    trim_option = gets.chomp.rstrip
-    if trim_option =~ /y|yes/i
-      data[:trim] = true
-      data[:trim_ref] = get_ref
-      print "reference 5'end ref position \n> "
-      data[:trim_ref_start] = gets.chomp.rstrip.to_i
-      print "reference 3'end ref position \n> "
-      data[:trim_ref_end] = gets.chomp.rstrip.to_i
-    else
-      data[:trim] = false
-    end
-  else
-    data[:end_join] = false
-  end
-  param[:primer_pairs] << data
-  print "Do you wish to conintue? Y/N \n> "
-  continue_sig = gets.chomp.rstrip
-  break unless continue_sig =~ /y|yes/i
-end
-puts "\nYour JSON string is:"
-puts JSON.pretty_generate(param)
-print "\nDo you wish to save it as a file? Y/N \n> "
-save_option = gets.chomp.rstrip
-if save_option =~ /y|yes/i
-  print "Path to save JSON file:\n> "
-  path = gets.chomp.rstrip
-  File.open(path, 'w') {|f| f.puts JSON.pretty_generate(param)}
-end