viral_seq 1.0.11 → 1.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb5906a2a3f0c98fa84a15f7fd4b35160f766317a6603b18c62e2e2476af01fd
4
- data.tar.gz: 435a32d9ce5078b18b633c14c1585c30835b48d3f013d24bc6b9cc98fef51a55
3
+ metadata.gz: 7a283f3a09cc5d9807e7622cd1ddf27197919955e85d6472b34fc14b66749c03
4
+ data.tar.gz: 4f90c5a9c7ea0ec148ba7d45ee88dc441f79da67a97654734194a773499ebb8e
5
5
  SHA512:
6
- metadata.gz: a65fcfe551b59d2f022f96d009f089941a3bd293545e840ba294ce43b3cb087b2f3a7fef26829fe3b229c9469ca4e5c907362bbf05a5384947af97a12409a5aa
7
- data.tar.gz: e4283b03e33aa67feb1dcf623d2613440bcc7299ab33ff77b7dce1ef9617b46e1763d17dda64d54b5fb3c6de37da66880abe43baae932ac36b70d0e2b0cc88d1
6
+ metadata.gz: 385a94eb93c3d8d9116c16a0d8af56ba714ba6191a454076acf881a036de80d1d598f3fcd1a4de841745ca08a1ad3e8bc028a30db9f96c19f3b217ef4583d652
7
+ data.tar.gz: 714d035b6f65863746cafb120c9cf6eccb8261f3eac69985bad96e5275351eec71aa3b744ee9b462e2dc3e0e199c2d4112386f6a2d7eef89b5b7824c1ab769be
data/.gitignore CHANGED
@@ -2,7 +2,6 @@
2
2
  /.yardoc
3
3
  /_yardoc/
4
4
  /coverage/
5
- /doc/
6
5
  /pkg/
7
6
  /spec/reports/
8
7
  /tmp/
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- viral_seq (1.0.10)
4
+ viral_seq (1.0.13)
5
5
  colorize (~> 0.1)
6
6
  muscle_bio (~> 0.4)
7
7
 
data/README.md CHANGED
@@ -1,8 +1,24 @@
1
1
  # ViralSeq
2
2
 
3
+ [![Gem Version](https://badge.fury.io/rb/viral_seq.svg)](https://rubygems.org/gems/viral_seq)
4
+ ![GitHub](https://img.shields.io/github/license/viralseq/viral_seq)
5
+ ![Gem](https://img.shields.io/gem/dt/viral_seq?color=%23E9967A)
6
+ ![GitHub last commit](https://img.shields.io/github/last-commit/viralseq/viral_seq?color=%2300BFFF)
7
+ [![Join the chat at https://gitter.im/viral_seq/community](https://badges.gitter.im/viral_seq/community.svg)](https://gitter.im/viral_seq/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
8
+
3
9
  A Ruby Gem containing bioinformatics tools for processing viral NGS data.
4
10
 
5
- Specifically for Primer-ID sequencing and HIV drug resistance analysis.
11
+ Specifically for Primer ID sequencing and HIV drug resistance analysis.
12
+
13
+ ## Illustration for the Primer ID Sequencing
14
+
15
+
16
+ ![Primer ID Sequencing](./docs/assets/img/cover.jpg)
17
+
18
+ ### Reference readings on the Primer ID sequencing
19
+ [Explanation of Primer ID sequencing](https://doi.org/10.21769/BioProtoc.3938)
20
+ [Primer ID MiSeq protocol](https://doi.org/10.1128/JVI.00522-15)
21
+ [Application of Primer ID sequencing in COVID-19 research](https://doi.org/10.1126/scitranslmed.abb5883)
6
22
 
7
23
  ## Install
8
24
 
@@ -14,20 +30,55 @@ Specifically for Primer-ID sequencing and HIV drug resistance analysis.
14
30
 
15
31
  ### Executables
16
32
 
17
- Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
33
+ ### `tcs`
34
+ Use executable `tcs` pipeline to process **Primer ID MiSeq sequencing** data.
18
35
 
36
+ Example commands:
19
37
  ```bash
20
- $ locator -i sequence.fasta -o sequence.fasta.csv
38
+ $ tcs -p params.json # run TCS pipeline with params.json
39
+ $ tcs -p params.json -i DIRECTORY
40
+ # run TCS pipeline with params.json and DIRECTORY
41
+ # if DIRECTORY is not defined in params.json
42
+ $ tcs -dr -i DIRECTORY
43
+ # run tcs-dr (MPID HIV drug resistance sequencing) pipeline
44
+ # DIRECTORY needs to be given.
45
+ $ tcs -j # CLI to generate params.json
46
+ $ tcs -h # print out the help
21
47
  ```
22
48
 
23
- Use executable `tcs` pipeline to process Primer ID MiSeq sequencing data.
49
+ [sample params.json for the tcs-dr pipeline](./docs/dr.json)
50
+
51
+ ---
52
+ ### `tcs_log`
53
+
54
+ Use `tcs_log` script to pool run logs and TCS fasta files after one batch of `tcs` jobs.
24
55
 
56
+
57
+ Example file structure:
58
+ ```
59
+ batch_tcs_jobs/
60
+ ├── lib1
61
+ ├── lib2
62
+ ├── lib3
63
+ ├── lib4
64
+ ├── ...
65
+ ```
66
+
67
+ Example command:
25
68
  ```bash
26
- $ tcs -p params.json # run TCS pipeline with params.json
27
- $ tcs -j # CLI to generate params.json
28
- $ tcs -h # print out the help
69
+ $ tcs_log batch_tcs_jobs
29
70
  ```
30
71
 
72
+ ---
73
+
74
+ ### `locator`
75
+ Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
76
+
77
+ ```bash
78
+ $ locator -i sequence.fasta -o sequence.fasta.csv
79
+ ```
80
+ ---
81
+
31
82
  ## Some Examples
32
83
 
33
84
  Load all ViralSeq classes by requiring 'viral_seq.rb' in your Ruby scripts.
@@ -80,16 +131,47 @@ qc_seqhash.sdrm_hiv_pr(cut_off)
80
131
  ```
81
132
  ## Known issues
82
133
 
83
- 1. have a conflict with rails.
134
+ 1. ~~have a conflict with rails.~~
135
+ 2. ~~Update on 03032021. Still have conflict. But in rails gem file, can just use `requires: false` globally and only require "viral_seq" when the module is needed in controller.~~
136
+ 3. The conflict seems to be resolved. It was from a combination of using `!` as a function for factorial and the gem name `viral_seq`. @_@
84
137
 
85
138
  ## Updates
86
139
 
87
- ### Version 1.1.1-03022021
140
+ ### Version 1.1.1-04012021
141
+
142
+ 1. Added a warning when paired_raw_sequence is less than 0.1% of total_raw_sequence.
143
+ 2. Added option `-i WORKING_DIRECTORY` to the `tcs` script.
144
+ If the `params.json` file does not contain the path to the working directory, it will append path to the run params.
145
+ 3. Added option `-dr` to the `tcs` script.
146
+
147
+ ### Version 1.1.0-03252021
148
+
149
+ 1. Optimized the algorithm of end-join.
150
+ 2. Fixed a bug in the `tcs` pipeline where combined tcs files were sometimes not saved.
151
+ 3. Added `tcs_log` command to pool run logs and tcs files from one batch of tcs jobs.
152
+ 4. Added the preset of MPID-HIVDR params file [***dr.json***](./docs/dr.json) in /docs.
153
+ 5. Add `platform_format` option in the json generator of the `tcs` Pipeline.
154
+ Users can choose from 3 MiSeq platforms for processing their sequencing data.
155
+ MiSeq 300x7x300 is the default option.
156
+
157
+ ### Version 1.0.14-03052021
158
+
159
+ 1. Add a function `ViralSeq::TcsCore.validate_file_name` to check MiSeq paired-end file names.
160
+
161
+ ### Version 1.0.13-03032021
162
+
163
+ 1. Fixed the conflict with rails.
164
+
165
+ ### Version 1.0.12-03032021
166
+
167
+ 1. Fixed an issue that may cause conflicts with ActiveRecord.
168
+
169
+ ### Version 1.0.11-03022021
88
170
 
89
- 1. Fixed a issue when calculating Poisson cutoff for minority mutations `ViralSeq::SeqHash.pm`.
171
+ 1. Fixed an issue when calculating Poisson cutoff for minority mutations `ViralSeq::SeqHash.pm`.
90
172
  2. Fixed an issue loading class 'OptionParser' in some Ruby environments.
91
173
 
92
- ### Version 1.1.0-11112020:
174
+ ### Version 1.0.10-11112020:
93
175
 
94
176
  1. Modularize TCS pipeline. Move key functions into /viral_seq/tcs_core.rb
95
177
  2. `tcs_json_generator` is removed. This CLI is delivered within the `tcs` pipeline, by running `tcs -j`. The scripts are included in the /viral_seq/tcs_json.rb
data/bin/tcs CHANGED
@@ -23,7 +23,7 @@
23
23
  # THE SOFTWARE.
24
24
 
25
25
  # Use JSON file as the run param
26
- # run tcs_json_generator.rb to generate param json file.
26
+ # run `tcs -j` to generate param json file.
27
27
 
28
28
  require 'viral_seq'
29
29
  require 'json'
@@ -46,6 +46,14 @@ OptionParser.new do |opts|
46
46
  options[:params_json] = p
47
47
  end
48
48
 
49
+ opts.on("-i", "--input PATH_TO_WORKING_DIRECTORY", "Path to the working directory") do |p|
50
+ options[:input] = p
51
+ end
52
+
53
+ opts.on("-dr", "--dr_pipeline", "HIV drug resistance MPID pipeline") do |p|
54
+ options[:dr] = true
55
+ end
56
+
49
57
  opts.on("-h", "--help", "Prints this help") do
50
58
  puts opts
51
59
  exit
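The `-i` option added in this hunk supplies a working directory that takes precedence over `raw_sequence_dir` from the params file. A minimal sketch of that precedence follows. The method names `parse_tcs_like_args` and `resolve_input_dir` are hypothetical, and the sketch uses a long-only `--dr` flag as a simplifying assumption rather than reproducing the script's `-dr` switch.

```ruby
require 'optparse'

# Hypothetical parser, modelled loosely on the options shown in this hunk.
def parse_tcs_like_args(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on("-p", "--params PARAMS_JSON", "Path to the params JSON file") { |p| options[:params_json] = p }
    opts.on("-i", "--input PATH", "Path to the working directory") { |p| options[:input] = p }
    # Assumption: a long-only flag, used here only to keep the sketch unambiguous.
    opts.on("--dr", "Use preset drug resistance params") { options[:dr] = true }
  end.parse(argv) # non-destructive: argv itself is left untouched
  options
end

# The command-line directory wins; the params file is the fallback.
# Returns nil when neither source names an existing path.
def resolve_input_dir(options, params)
  indir = options[:input] || params[:raw_sequence_dir]
  indir if indir && File.exist?(indir)
end
```

A caller could abort with a clear message whenever `resolve_input_dir` returns `nil`, which mirrors the `unless indir and File.exist?(indir)` guard in the script.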
@@ -64,15 +72,21 @@ end.parse!
64
72
 
65
73
  if options[:json_generator]
66
74
  params = ViralSeq::TcsJson.generate
75
+ elsif options[:dr]
76
+ params = ViralSeq::TcsDr::PARAMS
67
77
  elsif (options[:params_json] && File.exist?(options[:params_json]))
68
78
  params = JSON.parse(File.read(options[:params_json]), symbolize_names: true)
69
79
  else
70
80
  abort "No params JSON file found. Script terminated.".red
71
81
  end
72
82
 
73
- indir = params[:raw_sequence_dir]
83
+ if options[:input]
84
+ indir = options[:input]
85
+ else
86
+ indir = params[:raw_sequence_dir]
87
+ end
74
88
 
75
- unless File.exist?(indir)
89
+ unless indir and File.exist?(indir)
76
90
  abort "No input sequence directory found. Script terminated.".red.bold
77
91
  end
78
92
 
@@ -115,6 +129,12 @@ else
115
129
  error_rate = 0.02
116
130
  end
117
131
 
132
+ if params[:platform_format]
133
+ $platform_sequencing_length = params[:platform_format]
134
+ else
135
+ $platform_sequencing_length = 300
136
+ end
137
+
118
138
  primers = params[:primer_pairs]
119
139
  if primers.empty?
120
140
  ViralSeq::TcsCore.log_and_abort log, "No primer information. Script terminated."
@@ -123,6 +143,7 @@ end
123
143
 
124
144
  primers.each do |primer|
125
145
  summary_json = {}
146
+ summary_json[:warnings] = []
126
147
  summary_json[:tcs_version] = ViralSeq::TCS_VERSION
127
148
  summary_json[:viralseq_version] = ViralSeq::VERSION
128
149
  summary_json[:runtime] = Time.now.to_s
@@ -175,6 +196,10 @@ primers.each do |primer|
175
196
  paired_seq_number = common_keys.size
176
197
  log.puts Time.now.to_s + "\t" + "Paired raw sequences are : #{paired_seq_number.to_s}"
177
198
  summary_json[:paired_raw_sequence] = paired_seq_number
199
+ if paired_seq_number < raw_sequence_number * 0.001
200
+ summary_json[:warnings] <<
201
+ "WARNING: Filtered raw sequences less than 0.1% of the total raw sequences. Possible contamination."
202
+ end
178
203
 
179
204
  common_keys.each do |seqtag|
180
205
  r1_seq = r1_passed_seq[seqtag]
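The warning added in this hunk fires when the paired read count falls below 0.1% of the raw read count. The same threshold check can be sketched as a small pure function. The name `low_pairing_warning`, its parameters, and the message wording are hypothetical and not part of viral_seq.

```ruby
# Hypothetical helper: returns a warning string, or nil when the ratio is acceptable.
def low_pairing_warning(paired_count, raw_count, min_fraction = 0.001)
  return nil unless paired_count < raw_count * min_fraction

  "WARNING: paired raw sequences are below the expected fraction of total raw sequences."
end

# Example of collecting the result in a summary hash shaped like the script's.
summary = { warnings: [] }
warning = low_pairing_warning(5, 10_000)
summary[:warnings] << warning if warning
```

Keeping the check separate from the logging code makes the threshold easy to test on its own.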
@@ -273,7 +298,6 @@ primers.each do |primer|
273
298
  r1_sub_seq << bio_r1[seq_name]
274
299
  r2_sub_seq << bio_r2[seq_name]
275
300
  end
276
-
277
301
  #consensus name including the Primer ID and number of raw sequences of that Primer ID, library name and setname.
278
302
  consensus_name = ">" + primer_id + "_" + seq_with_same_primer_id.size.to_s + "_" + libname + "_" + region
279
303
  r1_consensus = ViralSeq::SeqHash.array(r1_sub_seq).consensus(majority_cut_off)
@@ -364,6 +388,7 @@ primers.each do |primer|
364
388
  shp = ViralSeq::SeqHashPair.fa(out_dir_consensus)
365
389
  joined_sh = end_join(out_dir_consensus, primer[:end_join_option], primer[:overlap])
366
390
  log.puts Time.now.to_s + "\t" + "Paired TCS number: " + joined_sh.size.to_s
391
+
367
392
  summary_json[:combined_tcs] = joined_sh.size
368
393
 
369
394
  if export_raw
@@ -433,12 +458,15 @@ primers.each do |primer|
433
458
  trim_end = primer[:trim_ref_end]
434
459
  trim_ref = primer[:trim_ref].to_sym
435
460
  joined_sh = joined_sh.trim(trim_start, trim_end, trim_ref)
436
- joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
437
461
  if export_raw
438
462
  joined_sh_raw = joined_sh_raw.trim(trim_start, trim_end, trim_ref)
439
- joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
440
463
  end
441
464
  end
465
+
466
+ joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
467
+ if export_raw
468
+ joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
469
+ end
442
470
  end
443
471
 
444
472
  File.open(outfile_log, "w") do |f|
data/bin/tcs_log ADDED
@@ -0,0 +1,102 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # pool run logs from one batch of tcs jobs
4
+ # file structure:
5
+ # batch_tcs_jobs/
6
+ # ├── lib1
7
+ # ├── lib2
8
+ # ├── lib3
9
+ # ├── lib4
10
+ # ├── ...
11
+ #
12
+ # command example:
13
+ # $ tcs_log batch_tcs_jobs
14
+
15
+ require 'viral_seq'
16
+ require 'pathname'
17
+ require 'json'
18
+ require 'fileutils'
19
+
20
+ indir = ARGV[0].chomp
21
+ indir_basename = File.basename(indir)
22
+ indir_dirname = File.dirname(indir)
23
+
24
+ tcs_dir = File.join(indir_dirname, (indir_basename + "_tcs"))
25
+ Dir.mkdir(tcs_dir) unless File.directory?(tcs_dir)
26
+
27
+ libs = []
28
+ Dir.chdir(indir) {libs = Dir.glob("*")}
29
+
30
+ outdir2 = File.join(tcs_dir, "combined_TCS_per_lib")
31
+ outdir3 = File.join(tcs_dir, "TCS_per_region")
32
+ outdir4 = File.join(tcs_dir, "combined_TCS_per_region")
33
+
34
+ Dir.mkdir(outdir2) unless File.directory?(outdir2)
35
+ Dir.mkdir(outdir3) unless File.directory?(outdir3)
36
+ Dir.mkdir(outdir4) unless File.directory?(outdir4)
37
+
38
+ log_file = File.join(tcs_dir,"log.csv")
39
+ log = File.open(log_file,'w')
40
+
41
+ header = %w{
42
+ lib_name
43
+ Region
44
+ Raw_Sequences_per_barcode
45
+ R1_Raw
46
+ R2_Raw
47
+ Paired_Raw
48
+ Cutoff
49
+ PID_Length
50
+ Consensus1
51
+ Consensus2
52
+ Distinct_to_Raw
53
+ Resampling_index
54
+ Combined_TCS
55
+ Combined_TCS_after_QC
56
+ WARNINGS
57
+ }
58
+
59
+ log.puts header.join(',')
60
+ libs.each do |lib|
61
+ Dir.mkdir(File.join(outdir2, lib)) unless File.directory?(File.join(outdir2, lib))
62
+ fasta_files = []
63
+ json_files = []
64
+ Dir.chdir(File.join(indir, lib)) do
65
+ fasta_files = Dir.glob("**/*.fasta")
66
+ json_files = Dir.glob("**/log.json")
67
+ end
68
+ fasta_files.each do |f|
69
+ path_array = Pathname(f).each_filename.to_a
70
+ region = path_array[0]
71
+ if path_array[-1] == "combined.fasta"
72
+ FileUtils.cp(File.join(indir, lib, f), File.join(outdir2, lib, (lib + "_" + region)))
73
+ Dir.mkdir(File.join(outdir4,region)) unless File.directory?(File.join(outdir4,region))
74
+ FileUtils.cp(File.join(indir, lib, f), File.join(outdir4, region, (lib + "_" + region)))
75
+ else
76
+ Dir.mkdir(File.join(outdir3,region)) unless File.directory?(File.join(outdir3,region))
77
+ Dir.mkdir(File.join(outdir3,region, lib)) unless File.directory?(File.join(outdir3,region, lib))
78
+ FileUtils.cp(File.join(indir, lib, f), File.join(outdir3, region, lib, (lib + "_" + region + "_" + path_array[-1])))
79
+ end
80
+ end
81
+
82
+ json_files.each do |f|
83
+ json_log = JSON.parse(File.read(File.join(indir, lib, f)), symbolize_names: true)
84
+ log.print [lib,
85
+ json_log[:primer_set_name],
86
+ json_log[:total_raw_sequence],
87
+ json_log[:r1_filtered_raw],
88
+ json_log[:r2_filtered_raw],
89
+ json_log[:paired_raw_sequence],
90
+ json_log[:consensus_cutoff],
91
+ json_log[:length_of_pid],
92
+ json_log[:total_tcs_with_ambiguities],
93
+ json_log[:total_tcs],
94
+ json_log[:distinct_to_raw],
95
+ json_log[:resampling_param],
96
+ json_log[:combined_tcs],
97
+ json_log[:combined_tcs_after_qc],
98
+ json_log[:warnings],
99
+ ].join(',') + "\n"
100
+ end
101
+ end
102
+ log.close
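The script builds each CSV row with `Array#join(',')`. Because `Array#join` flattens nested arrays, a `warnings` array holding more than one entry, or an entry containing a comma, would spill into extra columns. A minimal sketch of one way to keep such values in a single field, using hypothetical helpers `csv_field` and `csv_row` that are not part of the gem:

```ruby
# Hypothetical helpers for writing one log row.
# Arrays are collapsed into a single "; "-separated field, and any field
# containing a comma, quote, or newline is quoted in the usual CSV manner.
def csv_field(value)
  text = value.is_a?(Array) ? value.join("; ") : value.to_s
  text.match?(/[",\n]/) ? "\"#{text.gsub('"', '""')}\"" : text
end

def csv_row(values)
  values.map { |v| csv_field(v) }.join(",")
end

puts csv_row(["lib1", "RT", 1200, ["first warning, with comma", "second warning"]])
```

Ruby's `csv` library offers the same quoting; the hand-rolled version here only keeps the sketch dependency-free.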
Binary file
data/docs/dr.json ADDED
@@ -0,0 +1,67 @@
1
+ {
2
+ "platform_error_rate": 0.02,
3
+ "primer_pairs": [
4
+ {
5
+ "region": "RT",
6
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCACTATAGGCTGTACTGTCCATTTATC",
7
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNGGCCATTGACAGAAGAAAAAATAAAAGC",
8
+ "majority": 0.5,
9
+ "end_join": true,
10
+ "end_join_option": 1,
11
+ "overlap": 0,
12
+ "TCS_QC": true,
13
+ "ref_genome": "HXB2",
14
+ "ref_start": 2648,
15
+ "ref_end": 3257,
16
+ "indel": true,
17
+ "trim": false
18
+ },
19
+ {
20
+ "region": "PR",
21
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNCAGTTTAACTTTTGGGCCATCCATTCC",
22
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTCAGAGCAGACCAGAGCCAACAGCCCCA",
23
+ "majority": 0.5,
24
+ "end_join": true,
25
+ "end_join_option": 3,
26
+ "TCS_QC": true,
27
+ "ref_genome": "HXB2",
28
+ "ref_start": 0,
29
+ "ref_end": 2591,
30
+ "indel": true,
31
+ "trim": true,
32
+ "trim_ref": "HXB2",
33
+ "trim_ref_start": 2253,
34
+ "trim_ref_end": 2549
35
+ },
36
+ {
37
+ "region": "IN",
38
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNATCGAATACTGCCATTTGTACTGC",
39
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNAAAAGGAGAAGCCATGCATG",
40
+ "majority": 0.5,
41
+ "end_join": true,
42
+ "end_join_option": 3,
43
+ "overlap": 171,
44
+ "TCS_QC": true,
45
+ "ref_genome": "HXB2",
46
+ "ref_start": 4384,
47
+ "ref_end": 4751,
48
+ "indel": false,
49
+ "trim": false
50
+ },
51
+ {
52
+ "region": "V1V3",
53
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCCATTTTGCTYTAYTRABVTTACAATRTGC",
54
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTTATGGGATCAAAGCCTAAAGCCATGTGTA",
55
+ "majority": 0.5,
56
+ "end_join": true,
57
+ "end_join_option": 1,
58
+ "overlap": 0,
59
+ "TCS_QC": true,
60
+ "ref_genome": "HXB2",
61
+ "ref_start": 6585,
62
+ "ref_end": 7208,
63
+ "indel": true,
64
+ "trim": false
65
+ }
66
+ ]
67
+ }
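In `docs/dr.json`, the PR entry uses `end_join_option` 3 and carries no `overlap` key, while the other entries set one. A short sketch of loading such a file with symbolized keys and listing the regions that omit a given key follows. The constant `SAMPLE_PARAMS_JSON` is an abbreviated, hypothetical stand-in for the real file, and `regions_missing_key` is a hypothetical helper.

```ruby
require 'json'

# Abbreviated stand-in for docs/dr.json; only keys seen in that file are used.
SAMPLE_PARAMS_JSON = <<~PARAMS
  {
    "platform_error_rate": 0.02,
    "primer_pairs": [
      { "region": "RT", "end_join": true, "end_join_option": 1, "overlap": 0 },
      { "region": "PR", "end_join": true, "end_join_option": 3 }
    ]
  }
PARAMS

# Returns the region names whose primer entry lacks the given key.
def regions_missing_key(params, key)
  params[:primer_pairs]
    .reject { |primer| primer.key?(key) }
    .map { |primer| primer[:region] }
end

params = JSON.parse(SAMPLE_PARAMS_JSON, symbolize_names: true)
puts regions_missing_key(params, :overlap).inspect
```

For entries without the key, `primer[:overlap]` evaluates to `nil`, so downstream code needs to tolerate that value.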
data/lib/viral_seq.rb CHANGED
@@ -37,6 +37,6 @@ require_relative "viral_seq/string"
37
37
  require_relative "viral_seq/version"
38
38
  require_relative "viral_seq/tcs_core"
39
39
  require_relative "viral_seq/tcs_json"
40
-
40
+ require_relative "viral_seq/tcs_dr"
41
41
 
42
42
  require "muscle_bio"
@@ -3,10 +3,6 @@
3
3
  # array = [1,2,3,4,5,6,7,8,9,10]
4
4
  # array.median
5
5
  # => 5.5
6
- # @example sum
7
- # array = [1,2,3,4,5,6,7,8,9,10]
8
- # array.sum
9
- # => 55
10
6
  # @example average number (mean)
11
7
  # array = [1,2,3,4,5,6,7,8,9,10]
12
8
  # array.mean
@@ -45,12 +41,6 @@ module Enumerable
45
41
  len % 2 == 1 ? sorted[len/2] : (sorted[len/2 - 1] + sorted[len/2]).to_f / 2
46
42
  end
47
43
 
48
- # generate summed value
49
- # @return [Numeric] summed value
50
- def sum
51
- self.inject(0){|accum, i| accum + i }
52
- end
53
-
54
44
  # generate mean number
55
45
  # @return [Float] mean value
56
46
  def mean
@@ -67,7 +67,7 @@ module ViralSeq
67
67
  @k = k
68
68
  @poisson_hash = {}
69
69
  (0..k).each do |n|
70
- p = (rate**n * ::Math::E**(-rate))/!n
70
+ p = (rate**n * ::Math::E**(-rate))/n.factorial
71
71
  @poisson_hash[n] = p
72
72
  end
73
73
  end
@@ -155,9 +155,9 @@ class Integer
155
155
  # factorial method for an Integer
156
156
  # @return [Integer] factorial for given Integer
157
157
  # @example factorial for 5
158
- # !5
158
+ # 5.factorial
159
159
  # => 120
160
- def !
160
+ def factorial
161
161
  if self == 0
162
162
  return 1
163
163
  else
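The two hunks above replace the `!` override on `Integer` with a named `factorial` method, which the README lists as one cause of the Rails conflict. A standalone sketch of the same Poisson calculation follows. The helpers `factorial` and `poisson_table` are hypothetical names, written as plain methods rather than as a patch on `Integer`.

```ruby
# Hypothetical standalone helper; viral_seq defines Integer#factorial instead.
def factorial(n)
  raise ArgumentError, "n must be a non-negative Integer" unless n.is_a?(Integer) && n >= 0

  (1..n).reduce(1, :*) # an empty range yields the initial value 1, so 0! == 1
end

# Poisson probability P(n) = rate**n * e**(-rate) / n! for n in 0..k.
def poisson_table(rate, k)
  (0..k).each_with_object({}) do |n, table|
    table[n] = (rate**n * Math::E**(-rate)) / factorial(n)
  end
end

table = poisson_table(0.5, 5)
puts table[0] # probability of zero events at rate 0.5
```

Using an ordinary method name leaves Ruby's logical negation on integers untouched.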
@@ -394,7 +394,6 @@ module ViralSeq
394
394
  end
395
395
  end
396
396
  end
397
-
398
397
  consensus_seq += call_consensus_base(max_base_list)
399
398
  end
400
399
  return consensus_seq
@@ -742,6 +741,7 @@ module ViralSeq
742
741
  seq_hash_unique_pass = []
743
742
 
744
743
  seq_hash_unique.each do |seq|
744
+ next if seq.nil?
745
745
  loc = ViralSeq::Sequence.new('', seq).locator(ref_option, path_to_muscle)
746
746
  next unless loc # if locator tool fails, skip this seq.
747
747
  if start_nt.include?(loc[0]) && end_nt.include?(loc[1])
@@ -110,19 +110,21 @@ module ViralSeq
110
110
  raise ArgumentError.new(":overlap has to be Integer, input #{overlap} invalid.") unless overlap.is_a? Integer
111
111
  raise ArgumentError.new(":diff has to be float or integer, input #{diff} invalid.") unless (diff.is_a? Integer or diff.is_a? Float)
112
112
  joined_seq = {}
113
- seq_pair_hash.uniq_hash.each do |seq_pair, seq_names|
113
+ seq_pair_hash.each do |seq_name,seq_pair|
114
114
  r1_seq = seq_pair[0]
115
115
  r2_seq = seq_pair[1]
116
116
  if overlap.zero?
117
117
  joined_sequence = r1_seq + r2_seq
118
+ elsif diff.zero?
119
+ if r1_seq[-overlap..-1] == r2_seq[0,overlap]
120
+ joined_sequence= r1_seq + r2_seq[overlap..-1]
121
+ end
118
122
  elsif r1_seq[-overlap..-1].compare_with(r2_seq[0,overlap]) <= (overlap * diff)
119
123
  joined_sequence= r1_seq + r2_seq[overlap..-1]
120
124
  else
121
125
  next
122
126
  end
123
- seq_names.each do |seq_name|
124
- joined_seq[seq_name] = joined_sequence
125
- end
127
+ joined_seq[seq_name] = joined_sequence if joined_sequence
126
128
  end
127
129
 
128
130
  joined_seq_hash = ViralSeq::SeqHash.new
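The reworked join logic in this hunk has three cases: plain concatenation when `overlap` is zero, an exact string comparison when `diff` is zero, and a mismatch count against `overlap * diff` otherwise. A minimal sketch on plain strings follows. `join_pair` and `count_mismatches` are hypothetical names, and `count_mismatches` is only a stand-in for the gem's `String#compare_with`, whose exact behavior is not shown in this diff.

```ruby
# Stand-in for the gem's comparison method: counts positions that differ.
def count_mismatches(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Returns the joined sequence, or nil when the pair cannot be joined.
def join_pair(r1_seq, r2_seq, overlap, diff = 0.0)
  return r1_seq + r2_seq if overlap.zero?
  return nil if overlap > r1_seq.size || overlap > r2_seq.size

  r1_tail = r1_seq[-overlap..-1]
  r2_head = r2_seq[0, overlap]
  matched =
    if diff.zero?
      r1_tail == r2_head # fast path: no per-base comparison needed
    else
      count_mismatches(r1_tail, r2_head) <= overlap * diff
    end
  matched ? r1_seq + r2_seq[overlap..-1] : nil
end

puts join_pair("AAACCC", "CCCGGG", 3)
```

The exact-match branch avoids a per-base comparison when no mismatches are tolerated, which is presumably part of the end-join optimization noted in the README.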
@@ -102,9 +102,9 @@ module ViralSeq
102
102
  end
103
103
 
104
104
  # sort array of file names to determine if there are potential errors
105
- # input name_array array of file names
106
- # output hash { }
107
- # need to change for each file name have an error code. and a bool to show if all pass
105
+ # @param name_array [Array] array of file names
106
+ # @return [Hash] name check results
107
+
108
108
  def validate_file_name(name_array)
109
109
  errors = {
110
110
  file_type_error: [] ,
@@ -165,6 +165,13 @@ module ViralSeq
165
165
  end
166
166
  end
167
167
 
168
+ file_name_with_lib_name = {}
169
+ passed_libs.each do |lib_name, files|
170
+ files.each do |f|
171
+ file_name_with_lib_name[f] = lib_name
172
+ end
173
+ end
174
+
168
175
  passed_names = []
169
176
 
170
177
  passed_libs.values.each { |names| passed_names += names}
@@ -175,7 +182,27 @@ module ViralSeq
175
182
  pass = true
176
183
  end
177
184
 
178
- return { errors: errors, all_pass: pass, passed_names: passed_names, passed_libs: passed_libs }
185
+ file_name_with_error_type = {}
186
+
187
+ errors.each do |type, files|
188
+ files.each do |f|
189
+ file_name_with_error_type[f] ||= []
190
+ file_name_with_error_type[f] << type.to_s.tr("_", "\s")
191
+ end
192
+ end
193
+
194
+ file_check = []
195
+
196
+ name_array.each do |name|
197
+ file_check_hash = {}
198
+ file_check_hash[:fileName] = name
199
+ file_check_hash[:errors] = file_name_with_error_type[name]
200
+ file_check_hash[:libName] = file_name_with_lib_name[name]
201
+
202
+ file_check << file_check_hash
203
+ end
204
+
205
+ return { allPass: pass, files: file_check }
179
206
  end
180
207
 
181
208
  # filter r1 raw sequences for non-specific primers.
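`validate_file_name` now returns one record per input file (`fileName`, `errors`, `libName`) plus an `allPass` flag, instead of the earlier grouped hashes. The reshaping step can be sketched in isolation as below. The method `build_file_report` and the sample file names are hypothetical, and deriving `allPass` from "no errors recorded" is an assumption, since the original computes `pass` outside the lines shown.

```ruby
# Hypothetical reshaping helper, mirroring the loops added in this hunk.
def build_file_report(name_array, errors, passed_libs)
  lib_for = {}
  passed_libs.each { |lib_name, files| files.each { |f| lib_for[f] = lib_name } }

  errors_for = {}
  errors.each do |type, files|
    # :file_type_error becomes "file type error"
    files.each { |f| (errors_for[f] ||= []) << type.to_s.tr("_", " ") }
  end

  files = name_array.map do |name|
    { fileName: name, errors: errors_for[name], libName: lib_for[name] }
  end
  # Assumption: everything passes only when no error list has entries.
  { allPass: errors.values.all?(&:empty?), files: files }
end

names = ["lib1_R1.fastq.gz", "lib1_R2.fastq.gz", "notes.txt"]
report = build_file_report(
  names,
  { file_type_error: ["notes.txt"] },
  { "lib1" => ["lib1_R1.fastq.gz", "lib1_R2.fastq.gz"] }
)
puts report[:files].last.inspect
```

Callers written against the old keys (`errors`, `all_pass`, `passed_names`, `passed_libs`) would need updating, since the return shape changed.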
@@ -278,7 +305,9 @@ module ViralSeq
278
305
  end
279
306
 
280
307
  def general_filter(seq)
281
- if seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
308
+ if seq.size < $platform_sequencing_length
309
+ return false
310
+ elsif seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
282
311
  return false
283
312
  elsif seq =~ /A{11}/ # a string of poly-A indicates adaptor sequence
284
313
  return false
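The new length check reads the global `$platform_sequencing_length`, which `bin/tcs` sets from `platform_format`. A sketch of the same three checks with the minimum length passed explicitly follows. `passes_general_filter?` is a hypothetical name, and it covers only the conditions visible in this hunk, not any that may follow in the full method.

```ruby
# Hypothetical variant of the filter with the read length as a parameter.
def passes_general_filter?(seq, min_length = 300)
  return false if seq.size < min_length # shorter than the platform read length
  return false if seq[1..-2] =~ /N/     # ambiguity anywhere except first/last base
  return false if seq =~ /A{11}/        # poly-A run suggests adaptor sequence

  true
end

read = "ACGT" * 75 # 300 bases, no ambiguities
puts passes_general_filter?(read)
```

Passing the length as an argument is a design choice for the sketch only; it makes the filter testable without setting a global first.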
@@ -0,0 +1,71 @@
1
+ module ViralSeq
2
+
3
+ class TcsDr
4
+ PARAMS = {:platform_error_rate=>0.02,
5
+ :primer_pairs=>
6
+ [{:region=>"RT",
7
+ :cdna=>
8
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCACTATAGGCTGTACTGTCCATTTATC",
9
+ :forward=>
10
+ "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNGGCCATTGACAGAAGAAAAAATAAAAGC",
11
+ :majority=>0.5,
12
+ :end_join=>true,
13
+ :end_join_option=>1,
14
+ :overlap=>0,
15
+ :TCS_QC=>true,
16
+ :ref_genome=>"HXB2",
17
+ :ref_start=>2648,
18
+ :ref_end=>3257,
19
+ :indel=>true,
20
+ :trim=>false},
21
+ {:region=>"PR",
22
+ :cdna=>
23
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNCAGTTTAACTTTTGGGCCATCCATTCC",
24
+ :forward=>
25
+ "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTCAGAGCAGACCAGAGCCAACAGCCCCA",
26
+ :majority=>0.5,
27
+ :end_join=>true,
28
+ :end_join_option=>3,
29
+ :TCS_QC=>true,
30
+ :ref_genome=>"HXB2",
31
+ :ref_start=>0,
32
+ :ref_end=>2591,
33
+ :indel=>true,
34
+ :trim=>true,
35
+ :trim_ref=>"HXB2",
36
+ :trim_ref_start=>2253,
37
+ :trim_ref_end=>2549},
38
+ {:region=>"IN",
39
+ :cdna=>
40
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNATCGAATACTGCCATTTGTACTGC",
41
+ :forward=>"GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNAAAAGGAGAAGCCATGCATG",
42
+ :majority=>0.5,
43
+ :end_join=>true,
44
+ :end_join_option=>3,
45
+ :overlap=>171,
46
+ :TCS_QC=>true,
47
+ :ref_genome=>"HXB2",
48
+ :ref_start=>4384,
49
+ :ref_end=>4751,
50
+ :indel=>false,
51
+ :trim=>false},
52
+ {:region=>"V1V3",
53
+ :cdna=>
54
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCCATTTTGCTYTAYTRABVTTACAATRTGC",
55
+ :forward=>
56
+ "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTTATGGGATCAAAGCCTAAAGCCATGTGTA",
57
+ :majority=>0.5,
58
+ :end_join=>true,
59
+ :end_join_option=>1,
60
+ :overlap=>0,
61
+ :TCS_QC=>true,
62
+ :ref_genome=>"HXB2",
63
+ :ref_start=>6585,
64
+ :ref_end=>7208,
65
+ :indel=>true,
66
+ :trim=>false}
67
+ ]
68
+ }
69
+ end
70
+
71
+ end
@@ -13,6 +13,22 @@ module ViralSeq
13
13
  print '> '
14
14
  param[:raw_sequence_dir] = gets.chomp.rstrip
15
15
 
16
+ puts "Choose MiSeq Platform (1-3):\n1. 150x7x150\n2. 250x7x250\n3. 300x7x300 (default)"
17
+ print "> "
18
+ pf_option = gets.chomp.rstrip
19
+ # while ![1,2,3].include?(pf_option.to_i)
20
+ # print "Entered MiSeq Platform #{pf_option.red.bold} not valid (choose 1-3), try again\n> "
21
+ # pf_option = gets.chomp.rstrip
22
+ # end
23
+ case pf_option.to_i
24
+ when 1
25
+ param[:platform_format] = 150
26
+ when 2
27
+ param[:platform_format] = 250
28
+ else
29
+ param[:platform_format] = 300
30
+ end
31
+
16
32
  puts 'Enter the estimated platform error rate (for TCS cut-off calculation), default as ' + '0.02'.red.bold
17
33
  print '> '
18
34
  input_error = gets.chomp.rstrip.to_f
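The prompt added in this hunk maps a menu choice to a read length and treats anything other than 1 or 2 as the 300-base default. The same mapping can be sketched as a lookup table. The constants and the method `platform_format_for` are hypothetical names, not part of viral_seq.

```ruby
# Hypothetical lookup mirroring the case statement in the generator.
PLATFORM_READ_LENGTHS = { 1 => 150, 2 => 250, 3 => 300 }.freeze
DEFAULT_READ_LENGTH = 300

# Accepts raw user input; blank or unrecognized input falls back to the default.
def platform_format_for(choice)
  PLATFORM_READ_LENGTHS.fetch(choice.to_s.strip.to_i, DEFAULT_READ_LENGTH)
end

puts platform_format_for("2")
```

A table keeps the accepted choices in one place if more platforms are added later.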
@@ -52,12 +68,12 @@ module ViralSeq
52
68
  if ej =~ /y|yes/i
53
69
  data[:end_join] = true
54
70
 
55
- print "End-join option? Choose from (1-4):\n
56
- 1: simple join, no overlap
57
- 2: known overlap \n
58
- 3: unknow overlap, use sample consensus to determine overlap, all sequence pairs have same overlap\n
59
- 4: unknow overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap\n
60
- > "
71
+ puts "End-join option? Choose from (1-4):"
72
+ puts "1: simple join, no overlap"
73
+ puts "2: known overlap"
74
+ puts "3: unknown overlap, use sample consensus to determine overlap, all sequence pairs have same overlap"
75
+ puts "4: unknown overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap"
76
+ print "> "
61
77
  ej_option = gets.chomp.rstrip
62
78
  while ![1,2,3,4].include?(ej_option.to_i)
63
79
  puts "Entered end-join option #{ej_option.red.bold} not valid (choose 1-4), try again"
@@ -138,7 +154,12 @@ module ViralSeq
138
154
  if save_option =~ /y|yes/i
139
155
  print "Path to save JSON file:\n> "
140
156
  path = gets.chomp.rstrip
141
- File.open(path, 'w') {|f| f.puts JSON.pretty_generate(param)}
157
+ while !validate_path_name(path)
158
+ print "Entered path not valid, try again.\n".red.bold
159
+ print "Path to save JSON file:\n> "
160
+ path = gets.chomp.rstrip
161
+ end
162
+ File.open(validate_path_name(path), 'w') {|f| f.puts JSON.pretty_generate(param)}
142
163
  end
143
164
 
144
165
  print "\nDo you wish to execute tcs pipeline with the input params now? Y/N \n> "
@@ -147,7 +168,7 @@ module ViralSeq
147
168
  if rsp =~ /y/i
148
169
  return param
149
170
  else
150
- abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`"
171
+ abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`".blue
151
172
  end
152
173
 
153
174
  end
@@ -172,7 +193,17 @@ module ViralSeq
172
193
  when 3
173
194
  :MAC239
174
195
  end
175
- end
176
- end
196
+ end # end of get_ref
197
+
198
+ def validate_path_name(path)
199
+ if path.empty?
200
+ return false
201
+ elsif File.directory? path
202
+ return File.join(path, 'params.json')
203
+ elsif File.directory?(File.dirname(path))
204
+ return path
205
+ end
206
+ end # end of validate_path_name
207
+ end # end of class << self
177
208
  end # end TcsJson
178
209
  end # end main module
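`validate_path_name` returns `false` for an empty path, a usable file path in the two directory cases, and implicitly `nil` when neither directory test succeeds, so the `while !validate_path_name(path)` loop retries on both falsy values. A sketch that makes each outcome explicit follows. `save_path_for` is a hypothetical name, and returning `nil` for every rejected path is a simplification of the original's `false`/`nil` mix.

```ruby
# Hypothetical variant: returns the file path to write to, or nil if unusable.
def save_path_for(path)
  return nil if path.nil? || path.strip.empty?
  # An existing directory gets the default file name appended.
  return File.join(path, "params.json") if File.directory?(path)
  # Otherwise the parent directory must already exist.
  return path if File.directory?(File.dirname(path))

  nil
end

puts save_path_for(".").inspect
```

Computing the target once and reusing it would also avoid calling the validator a second time when opening the file.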
@@ -2,6 +2,6 @@
2
2
  # version info and history
3
3
 
4
4
  module ViralSeq
5
- VERSION = "1.0.11"
6
- TCS_VERSION = "2.1.1"
5
+ VERSION = "1.1.1"
6
+ TCS_VERSION = "2.3.0"
7
7
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: viral_seq
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.11
4
+ version: 1.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shuntai Zhou
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2021-03-02 00:00:00.000000000 Z
12
+ date: 2021-04-01 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bundler
@@ -90,6 +90,7 @@ email:
90
90
  executables:
91
91
  - locator
92
92
  - tcs
93
+ - tcs_log
93
94
  extensions: []
94
95
  extra_rdoc_files: []
95
96
  files:
@@ -104,6 +105,11 @@ files:
104
105
  - Rakefile
105
106
  - bin/locator
106
107
  - bin/tcs
108
+ - bin/tcs_log
109
+ - docs/assets/img/cover.jpg
110
+ - docs/dr.json
111
+ - docs/sample_miseq_data/hivdr_control/r1.fastq.gz
112
+ - docs/sample_miseq_data/hivdr_control/r2.fastq.gz
107
113
  - lib/viral_seq.rb
108
114
  - lib/viral_seq/constant.rb
109
115
  - lib/viral_seq/enumerable.rb
@@ -120,6 +126,7 @@ files:
120
126
  - lib/viral_seq/sequence.rb
121
127
  - lib/viral_seq/string.rb
122
128
  - lib/viral_seq/tcs_core.rb
129
+ - lib/viral_seq/tcs_dr.rb
123
130
  - lib/viral_seq/tcs_json.rb
124
131
  - lib/viral_seq/version.rb
125
132
  - viral_seq.gemspec