viral_seq 1.0.11 → 1.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb5906a2a3f0c98fa84a15f7fd4b35160f766317a6603b18c62e2e2476af01fd
4
- data.tar.gz: 435a32d9ce5078b18b633c14c1585c30835b48d3f013d24bc6b9cc98fef51a55
3
+ metadata.gz: 7a283f3a09cc5d9807e7622cd1ddf27197919955e85d6472b34fc14b66749c03
4
+ data.tar.gz: 4f90c5a9c7ea0ec148ba7d45ee88dc441f79da67a97654734194a773499ebb8e
5
5
  SHA512:
6
- metadata.gz: a65fcfe551b59d2f022f96d009f089941a3bd293545e840ba294ce43b3cb087b2f3a7fef26829fe3b229c9469ca4e5c907362bbf05a5384947af97a12409a5aa
7
- data.tar.gz: e4283b03e33aa67feb1dcf623d2613440bcc7299ab33ff77b7dce1ef9617b46e1763d17dda64d54b5fb3c6de37da66880abe43baae932ac36b70d0e2b0cc88d1
6
+ metadata.gz: 385a94eb93c3d8d9116c16a0d8af56ba714ba6191a454076acf881a036de80d1d598f3fcd1a4de841745ca08a1ad3e8bc028a30db9f96c19f3b217ef4583d652
7
+ data.tar.gz: 714d035b6f65863746cafb120c9cf6eccb8261f3eac69985bad96e5275351eec71aa3b744ee9b462e2dc3e0e199c2d4112386f6a2d7eef89b5b7824c1ab769be
data/.gitignore CHANGED
@@ -2,7 +2,6 @@
2
2
  /.yardoc
3
3
  /_yardoc/
4
4
  /coverage/
5
- /doc/
6
5
  /pkg/
7
6
  /spec/reports/
8
7
  /tmp/
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- viral_seq (1.0.10)
4
+ viral_seq (1.0.13)
5
5
  colorize (~> 0.1)
6
6
  muscle_bio (~> 0.4)
7
7
 
data/README.md CHANGED
@@ -1,8 +1,24 @@
1
1
  # ViralSeq
2
2
 
3
+ [![Gem Version](https://badge.fury.io/rb/viral_seq.svg)](https://rubygems.org/gems/viral_seq)
4
+ ![GitHub](https://img.shields.io/github/license/viralseq/viral_seq)
5
+ ![Gem](https://img.shields.io/gem/dt/viral_seq?color=%23E9967A)
6
+ ![GitHub last commit](https://img.shields.io/github/last-commit/viralseq/viral_seq?color=%2300BFFF)
7
+ [![Join the chat at https://gitter.im/viral_seq/community](https://badges.gitter.im/viral_seq/community.svg)](https://gitter.im/viral_seq/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
8
+
3
9
  A Ruby Gem containing bioinformatics tools for processing viral NGS data.
4
10
 
5
- Specifically for Primer-ID sequencing and HIV drug resistance analysis.
11
+ Specifically for Primer ID sequencing and HIV drug resistance analysis.
12
+
13
+ ## Illustration for the Primer ID Sequencing
14
+
15
+
16
+ ![Primer ID Sequencing](./docs/assets/img/cover.jpg)
17
+
18
+ ### Reference readings on the Primer ID sequencing
19
+ [Explanation of Primer ID sequencing](https://doi.org/10.21769/BioProtoc.3938)
20
+ [Primer ID MiSeq protocol](https://doi.org/10.1128/JVI.00522-15)
21
+ [Application of Primer ID sequencing in COVID-19 research](https://doi.org/10.1126/scitranslmed.abb5883)
6
22
 
7
23
  ## Install
8
24
 
@@ -14,20 +30,55 @@ Specifically for Primer-ID sequencing and HIV drug resistance analysis.
14
30
 
15
31
  ### Executables
16
32
 
17
- Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
33
+ ### `tcs`
34
+ Use the `tcs` executable to run the pipeline that processes **Primer ID MiSeq sequencing** data.
18
35
 
36
+ Example commands:
19
37
  ```bash
20
- $ locator -i sequence.fasta -o sequence.fasta.csv
38
+ $ tcs -p params.json # run TCS pipeline with params.json
39
+ $ tcs -p params.json -i DIRECTORY
40
+ # run TCS pipeline with params.json and DIRECTORY
41
+ # if DIRECTORY is not defined in params.json
42
+ $ tcs -dr -i DIRECTORY
43
+ # run tcs-dr (MPID HIV drug resistance sequencing) pipeline
44
+ # DIRECTORY needs to be given.
45
+ $ tcs -j # CLI to generate params.json
46
+ $ tcs -h # print out the help
21
47
  ```
22
48
 
23
- Use executable `tcs` pipeline to process Primer ID MiSeq sequencing data.
49
+ [sample params.json for the tcs-dr pipeline](./docs/dr.json)
50
+
51
+ ---
52
+ ### `tcs_log`
53
+
54
+ Use the `tcs_log` script to pool run logs and TCS FASTA files after one batch of `tcs` jobs.
24
55
 
56
+
57
+ Example file structure:
58
+ ```
59
+ batch_tcs_jobs/
60
+ ├── lib1
61
+ ├── lib2
62
+ ├── lib3
63
+ ├── lib4
64
+ ├── ...
65
+ ```
66
+
67
+ Example command:
25
68
  ```bash
26
- $ tcs -p params.json # run TCS pipeline with params.json
27
- $ tcs -j # CLI to generate params.json
28
- $ tcs -h # print out the help
69
+ $ tcs_log batch_tcs_jobs
29
70
  ```
30
71
 
72
+ ---
73
+
74
+ ### `locator`
75
+ Use the `locator` executable from a terminal to get the coordinates of sequences on the HIV/SIV reference genome from a FASTA file.
76
+
77
+ ```bash
78
+ $ locator -i sequence.fasta -o sequence.fasta.csv
79
+ ```
80
+ ---
81
+
31
82
  ## Some Examples
32
83
 
33
84
  Load all ViralSeq classes by requiring 'viral_seq.rb' in your Ruby scripts.
@@ -80,16 +131,47 @@ qc_seqhash.sdrm_hiv_pr(cut_off)
80
131
  ```
81
132
  ## Known issues
82
133
 
83
- 1. have a conflict with rails.
134
+ 1. ~~have a conflict with rails.~~
135
+ 2. ~~Update on 03032021: the conflict persists. In the Rails Gemfile, use `require: false` globally and only `require "viral_seq"` where the module is needed, e.g. in a controller.~~
136
+ 3. The conflict seems to be resolved. It came from a combination of using `!` as a method name for factorial and the gem name `viral_seq`. @_@
84
137
 
85
138
  ## Updates
86
139
 
87
- ### Version 1.1.1-03022021
140
+ ### Version 1.1.1-04012021
141
+
142
+ 1. Added a warning when paired_raw_sequence is less than 0.1% of total_raw_sequence.
143
+ 2. Added option `-i WORKING_DIRECTORY` to the `tcs` script.
144
+ If the `params.json` file does not contain the path to the working directory, it will append path to the run params.
145
+ 3. Added option `-dr` to the `tcs` script.
146
+
147
+ ### Version 1.1.0-03252021
148
+
149
+ 1. Optimized the algorithm of end-join.
150
+ 2. Fixed a bug in the `tcs` pipeline where combined TCS files were sometimes not saved.
151
+ 3. Added `tcs_log` command to pool run logs and tcs files from one batch of tcs jobs.
152
+ 4. Added the preset of MPID-HIVDR params file [***dr.json***](./docs/dr.json) in /docs.
153
+ 5. Add `platform_format` option in the json generator of the `tcs` Pipeline.
154
+ Users can choose from 3 MiSeq platforms for processing their sequencing data.
155
+ MiSeq 300x7x300 is the default option.
156
+
157
+ ### Version 1.0.14-03052021
158
+
159
+ 1. Add a function `ViralSeq::TcsCore.validate_file_name` to check MiSeq paired-end file names.
160
+
161
+ ### Version 1.0.13-03032021
162
+
163
+ 1. Fixed the conflict with rails.
164
+
165
+ ### Version 1.0.12-03032021
166
+
167
+ 1. Fixed an issue that may cause conflicts with ActiveRecord.
168
+
169
+ ### Version 1.0.11-03022021
88
170
 
89
- 1. Fixed a issue when calculating Poisson cutoff for minority mutations `ViralSeq::SeqHash.pm`.
171
+ 1. Fixed an issue when calculating Poisson cutoff for minority mutations `ViralSeq::SeqHash.pm`.
90
172
  2. Fixed an issue loading class 'OptionParser' in some Ruby environments.
91
173
 
92
- ### Version 1.1.0-11112020:
174
+ ### Version 1.0.10-11112020:
93
175
 
94
176
  1. Modularize TCS pipeline. Move key functions into /viral_seq/tcs_core.rb
95
177
  2. `tcs_json_generator` is removed. This CLI is delivered within the `tcs` pipeline, by running `tcs -j`. The scripts are included in the /viral_seq/tcs_json.rb
data/bin/tcs CHANGED
@@ -23,7 +23,7 @@
23
23
  # THE SOFTWARE.
24
24
 
25
25
  # Use JSON file as the run param
26
- # run tcs_json_generator.rb to generate param json file.
26
+ # run `tcs -j` to generate param json file.
27
27
 
28
28
  require 'viral_seq'
29
29
  require 'json'
@@ -46,6 +46,14 @@ OptionParser.new do |opts|
46
46
  options[:params_json] = p
47
47
  end
48
48
 
49
+ opts.on("-i", "--input PATH_TO_WORKING_DIRECTORY", "Path to the working directory") do |p|
50
+ options[:input] = p
51
+ end
52
+
53
+ opts.on("-dr", "--dr_pipeline", "HIV drug resistance MPID pipeline") do |p|
54
+ options[:dr] = true
55
+ end
56
+
49
57
  opts.on("-h", "--help", "Prints this help") do
50
58
  puts opts
51
59
  exit
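The new `-i` option interacts with the params file later in the script: a directory given on the command line is used first, and `params[:raw_sequence_dir]` is the fallback. A minimal sketch of that precedence, assuming hypothetical helper names (`parse_cli`, `resolve_input_dir`) and a hypothetical long name for `-p`; it also uses a long-only `--dr` flag for simplicity, which differs from the `-dr` spelling in the diff:

```ruby
require 'optparse'

# Hypothetical helper, not part of the gem: parse a tcs-like argument list.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    # The long name "--params" is an assumption made for this sketch.
    opts.on('-p', '--params PARAMS_JSON', 'Path to the params JSON file') do |path|
      options[:params_json] = path
    end
    opts.on('-i', '--input PATH_TO_WORKING_DIRECTORY', 'Path to the working directory') do |path|
      options[:input] = path
    end
    # Long-only flag chosen for this sketch; the diff spells it "-dr".
    opts.on('--dr', 'Use the preset HIV drug resistance params') do
      options[:dr] = true
    end
  end
  parser.parse(argv) # non-destructive: argv is left untouched
  options
end

# A directory given on the command line takes precedence over the params file.
def resolve_input_dir(options, params)
  options[:input] || params[:raw_sequence_dir]
end

options = parse_cli(%w[--dr -i ./run1])
puts resolve_input_dir(options, {}).inspect
```

Returning `nil` when neither source provides a directory lets the caller keep a single `unless indir and File.exist?(indir)` style guard, as the script does.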
@@ -64,15 +72,21 @@ end.parse!
64
72
 
65
73
  if options[:json_generator]
66
74
  params = ViralSeq::TcsJson.generate
75
+ elsif options[:dr]
76
+ params = ViralSeq::TcsDr::PARAMS
67
77
  elsif (options[:params_json] && File.exist?(options[:params_json]))
68
78
  params = JSON.parse(File.read(options[:params_json]), symbolize_names: true)
69
79
  else
70
80
  abort "No params JSON file found. Script terminated.".red
71
81
  end
72
82
 
73
- indir = params[:raw_sequence_dir]
83
+ if options[:input]
84
+ indir = options[:input]
85
+ else
86
+ indir = params[:raw_sequence_dir]
87
+ end
74
88
 
75
- unless File.exist?(indir)
89
+ unless indir and File.exist?(indir)
76
90
  abort "No input sequence directory found. Script terminated.".red.bold
77
91
  end
78
92
 
@@ -115,6 +129,12 @@ else
115
129
  error_rate = 0.02
116
130
  end
117
131
 
132
+ if params[:platform_format]
133
+ $platform_sequencing_length = params[:platform_format]
134
+ else
135
+ $platform_sequencing_length = 300
136
+ end
137
+
118
138
  primers = params[:primer_pairs]
119
139
  if primers.empty?
120
140
  ViralSeq::TcsCore.log_and_abort log, "No primer information. Script terminated."
@@ -123,6 +143,7 @@ end
123
143
 
124
144
  primers.each do |primer|
125
145
  summary_json = {}
146
+ summary_json[:warnings] = []
126
147
  summary_json[:tcs_version] = ViralSeq::TCS_VERSION
127
148
  summary_json[:viralseq_version] = ViralSeq::VERSION
128
149
  summary_json[:runtime] = Time.now.to_s
@@ -175,6 +196,10 @@ primers.each do |primer|
175
196
  paired_seq_number = common_keys.size
176
197
  log.puts Time.now.to_s + "\t" + "Paired raw sequences are : #{paired_seq_number.to_s}"
177
198
  summary_json[:paired_raw_sequence] = paired_seq_number
199
+ if paired_seq_number < raw_sequence_number * 0.001
200
+ summary_json[:warnings] <<
201
+ "WARNING: Filtered raw sequences less than 0.1% of the total raw sequences. Possible contamination."
202
+ end
178
203
 
179
204
  common_keys.each do |seqtag|
180
205
  r1_seq = r1_passed_seq[seqtag]
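The warning added in this hunk is a simple ratio check: it fires when the paired count falls below 0.1% of the raw count. A minimal sketch of the same comparison as a standalone function, assuming a hypothetical name (`low_pairing_warning`) and a message wording of my own rather than the one in the script:

```ruby
# Threshold taken from the hunk above (0.001, i.e. 0.1%).
LOW_PAIRING_FRACTION = 0.001

# Hypothetical helper: returns a warning string, or nil when the ratio is fine.
def low_pairing_warning(paired_count, raw_count, min_fraction: LOW_PAIRING_FRACTION)
  return nil unless paired_count < raw_count * min_fraction

  format('WARNING: paired raw sequences (%d) are less than %.1f%% of total raw sequences (%d).',
         paired_count, min_fraction * 100, raw_count)
end

warnings = []
message = low_pairing_warning(50, 100_000)
warnings << message if message
puts warnings
```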
@@ -273,7 +298,6 @@ primers.each do |primer|
273
298
  r1_sub_seq << bio_r1[seq_name]
274
299
  r2_sub_seq << bio_r2[seq_name]
275
300
  end
276
-
277
301
  #consensus name including the Primer ID and number of raw sequences of that Primer ID, library name and setname.
278
302
  consensus_name = ">" + primer_id + "_" + seq_with_same_primer_id.size.to_s + "_" + libname + "_" + region
279
303
  r1_consensus = ViralSeq::SeqHash.array(r1_sub_seq).consensus(majority_cut_off)
@@ -364,6 +388,7 @@ primers.each do |primer|
364
388
  shp = ViralSeq::SeqHashPair.fa(out_dir_consensus)
365
389
  joined_sh = end_join(out_dir_consensus, primer[:end_join_option], primer[:overlap])
366
390
  log.puts Time.now.to_s + "\t" + "Paired TCS number: " + joined_sh.size.to_s
391
+
367
392
  summary_json[:combined_tcs] = joined_sh.size
368
393
 
369
394
  if export_raw
@@ -433,12 +458,15 @@ primers.each do |primer|
433
458
  trim_end = primer[:trim_ref_end]
434
459
  trim_ref = primer[:trim_ref].to_sym
435
460
  joined_sh = joined_sh.trim(trim_start, trim_end, trim_ref)
436
- joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
437
461
  if export_raw
438
462
  joined_sh_raw = joined_sh_raw.trim(trim_start, trim_end, trim_ref)
439
- joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
440
463
  end
441
464
  end
465
+
466
+ joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
467
+ if export_raw
468
+ joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.raw.fasta"))
469
+ end
442
470
  end
443
471
 
444
472
  File.open(outfile_log, "w") do |f|
data/bin/tcs_log ADDED
@@ -0,0 +1,102 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # pool run logs from one batch of tcs jobs
4
+ # file structure:
5
+ # batch_tcs_jobs/
6
+ # ├── lib1
7
+ # ├── lib2
8
+ # ├── lib3
9
+ # ├── lib4
10
+ # ├── ...
11
+ #
12
+ # command example:
13
+ # $ tcs_log batch_tcs_jobs
14
+
15
+ require 'viral_seq'
16
+ require 'pathname'
17
+ require 'json'
18
+ require 'fileutils'
19
+
20
+ indir = ARGV[0].chomp
21
+ indir_basename = File.basename(indir)
22
+ indir_dirname = File.dirname(indir)
23
+
24
+ tcs_dir = File.join(indir_dirname, (indir_basename + "_tcs"))
25
+ Dir.mkdir(tcs_dir) unless File.directory?(tcs_dir)
26
+
27
+ libs = []
28
+ Dir.chdir(indir) {libs = Dir.glob("*")}
29
+
30
+ outdir2 = File.join(tcs_dir, "combined_TCS_per_lib")
31
+ outdir3 = File.join(tcs_dir, "TCS_per_region")
32
+ outdir4 = File.join(tcs_dir, "combined_TCS_per_region")
33
+
34
+ Dir.mkdir(outdir2) unless File.directory?(outdir2)
35
+ Dir.mkdir(outdir3) unless File.directory?(outdir3)
36
+ Dir.mkdir(outdir4) unless File.directory?(outdir4)
37
+
38
+ log_file = File.join(tcs_dir,"log.csv")
39
+ log = File.open(log_file,'w')
40
+
41
+ header = %w{
42
+ lib_name
43
+ Region
44
+ Raw_Sequences_per_barcode
45
+ R1_Raw
46
+ R2_Raw
47
+ Paired_Raw
48
+ Cutoff
49
+ PID_Length
50
+ Consensus1
51
+ Consensus2
52
+ Distinct_to_Raw
53
+ Resampling_index
54
+ Combined_TCS
55
+ Combined_TCS_after_QC
56
+ WARNINGS
57
+ }
58
+
59
+ log.puts header.join(',')
60
+ libs.each do |lib|
61
+ Dir.mkdir(File.join(outdir2, lib)) unless File.directory?(File.join(outdir2, lib))
62
+ fasta_files = []
63
+ json_files = []
64
+ Dir.chdir(File.join(indir, lib)) do
65
+ fasta_files = Dir.glob("**/*.fasta")
66
+ json_files = Dir.glob("**/log.json")
67
+ end
68
+ fasta_files.each do |f|
69
+ path_array = Pathname(f).each_filename.to_a
70
+ region = path_array[0]
71
+ if path_array[-1] == "combined.fasta"
72
+ FileUtils.cp(File.join(indir, lib, f), File.join(outdir2, lib, (lib + "_" + region)))
73
+ Dir.mkdir(File.join(outdir4,region)) unless File.directory?(File.join(outdir4,region))
74
+ FileUtils.cp(File.join(indir, lib, f), File.join(outdir4, region, (lib + "_" + region)))
75
+ else
76
+ Dir.mkdir(File.join(outdir3,region)) unless File.directory?(File.join(outdir3,region))
77
+ Dir.mkdir(File.join(outdir3,region, lib)) unless File.directory?(File.join(outdir3,region, lib))
78
+ FileUtils.cp(File.join(indir, lib, f), File.join(outdir3, region, lib, (lib + "_" + region + "_" + path_array[-1])))
79
+ end
80
+ end
81
+
82
+ json_files.each do |f|
83
+ json_log = JSON.parse(File.read(File.join(indir, lib, f)), symbolize_names: true)
84
+ log.print [lib,
85
+ json_log[:primer_set_name],
86
+ json_log[:total_raw_sequence],
87
+ json_log[:r1_filtered_raw],
88
+ json_log[:r2_filtered_raw],
89
+ json_log[:paired_raw_sequence],
90
+ json_log[:consensus_cutoff],
91
+ json_log[:length_of_pid],
92
+ json_log[:total_tcs_with_ambiguities],
93
+ json_log[:total_tcs],
94
+ json_log[:distinct_to_raw],
95
+ json_log[:resampling_param],
96
+ json_log[:combined_tcs],
97
+ json_log[:combined_tcs_after_qc],
98
+ json_log[:warnings],
99
+ ].join(',') + "\n"
100
+ end
101
+ end
102
+ log.close
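One detail of the row-writing code above is worth checking: `Array#join` flattens nested arrays, so the `:warnings` array is joined with the same comma as the columns. With a single comma-free warning the row stays aligned, but several warnings, or a warning containing a comma, would add extra columns. A minimal sketch of one alternative, assuming Ruby's `csv` library is available and using a reduced, hypothetical column set:

```ruby
require 'csv'

# Reduced column set for illustration; the script writes more columns.
HEADER = %w[lib_name Region Paired_Raw Combined_TCS WARNINGS].freeze

# Hypothetical helper: build one row, collapsing all warnings into one field.
def log_row(lib, json_log)
  [
    lib,
    json_log[:primer_set_name],
    json_log[:paired_raw_sequence],
    json_log[:combined_tcs],
    Array(json_log[:warnings]).join('; ')
  ]
end

# Made-up log content, in the shape of the parsed log.json.
SAMPLE_LOG = {
  primer_set_name: 'RT',
  paired_raw_sequence: 12,
  combined_tcs: 3,
  warnings: ['first warning, with a comma', 'second warning']
}.freeze

# CSV quotes any field that contains a comma, so columns stay aligned.
CSV_TEXT = CSV.generate do |csv|
  csv << HEADER
  csv << log_row('lib1', SAMPLE_LOG)
end

puts CSV_TEXT
```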
Binary file
data/docs/dr.json ADDED
@@ -0,0 +1,67 @@
1
+ {
2
+ "platform_error_rate": 0.02,
3
+ "primer_pairs": [
4
+ {
5
+ "region": "RT",
6
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCACTATAGGCTGTACTGTCCATTTATC",
7
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNGGCCATTGACAGAAGAAAAAATAAAAGC",
8
+ "majority": 0.5,
9
+ "end_join": true,
10
+ "end_join_option": 1,
11
+ "overlap": 0,
12
+ "TCS_QC": true,
13
+ "ref_genome": "HXB2",
14
+ "ref_start": 2648,
15
+ "ref_end": 3257,
16
+ "indel": true,
17
+ "trim": false
18
+ },
19
+ {
20
+ "region": "PR",
21
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNCAGTTTAACTTTTGGGCCATCCATTCC",
22
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTCAGAGCAGACCAGAGCCAACAGCCCCA",
23
+ "majority": 0.5,
24
+ "end_join": true,
25
+ "end_join_option": 3,
26
+ "TCS_QC": true,
27
+ "ref_genome": "HXB2",
28
+ "ref_start": 0,
29
+ "ref_end": 2591,
30
+ "indel": true,
31
+ "trim": true,
32
+ "trim_ref": "HXB2",
33
+ "trim_ref_start": 2253,
34
+ "trim_ref_end": 2549
35
+ },
36
+ {
37
+ "region": "IN",
38
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNATCGAATACTGCCATTTGTACTGC",
39
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNAAAAGGAGAAGCCATGCATG",
40
+ "majority": 0.5,
41
+ "end_join": true,
42
+ "end_join_option": 3,
43
+ "overlap": 171,
44
+ "TCS_QC": true,
45
+ "ref_genome": "HXB2",
46
+ "ref_start": 4384,
47
+ "ref_end": 4751,
48
+ "indel": false,
49
+ "trim": false
50
+ },
51
+ {
52
+ "region": "V1V3",
53
+ "cdna": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCCATTTTGCTYTAYTRABVTTACAATRTGC",
54
+ "forward": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTTATGGGATCAAAGCCTAAAGCCATGTGTA",
55
+ "majority": 0.5,
56
+ "end_join": true,
57
+ "end_join_option": 1,
58
+ "overlap": 0,
59
+ "TCS_QC": true,
60
+ "ref_genome": "HXB2",
61
+ "ref_start": 6585,
62
+ "ref_end": 7208,
63
+ "indel": true,
64
+ "trim": false
65
+ }
66
+ ]
67
+ }
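A params file in this shape can be inspected before a run. The sketch below parses a small stand-in document with `JSON.parse(..., symbolize_names: true)`, as the `tcs` script does, and reports per-region settings. The sample primers are made up, `summarize_primer_pairs` is a hypothetical helper, and treating the first run of `N` in the cDNA primer as the Primer ID block is an assumption of this sketch:

```ruby
require 'json'

# Made-up stand-in for a params file; sequences are not real primers.
SAMPLE_PARAMS_JSON = JSON.generate(
  platform_error_rate: 0.02,
  primer_pairs: [
    { region: 'A', cdna: 'ACGT' + 'N' * 11 + 'CAGT', forward: 'ACGTNNNNGGCC',
      end_join: true, end_join_option: 1, overlap: 0, trim: false },
    { region: 'B', cdna: 'ACGT' + 'N' * 9 + 'CAGT', forward: 'ACGTNNNNTTAA',
      end_join: true, end_join_option: 3, trim: true }
  ]
)

# Hypothetical helper: one summary hash per primer pair.
def summarize_primer_pairs(json_text)
  params = JSON.parse(json_text, symbolize_names: true)
  params.fetch(:primer_pairs).map do |primer|
    n_run = primer.fetch(:cdna)[/N+/] # first run of N, nil if there is none
    {
      region: primer.fetch(:region),
      n_run_length: n_run ? n_run.size : 0,
      overlap: primer[:overlap], # nil when the key is absent
      trim: primer.fetch(:trim, false)
    }
  end
end

summarize_primer_pairs(SAMPLE_PARAMS_JSON).each { |row| puts row.inspect }
```

Note that an entry may omit `overlap` entirely, as the `PR` entry above does, so callers should expect `nil` there.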
data/lib/viral_seq.rb CHANGED
@@ -37,6 +37,6 @@ require_relative "viral_seq/string"
37
37
  require_relative "viral_seq/version"
38
38
  require_relative "viral_seq/tcs_core"
39
39
  require_relative "viral_seq/tcs_json"
40
-
40
+ require_relative "viral_seq/tcs_dr"
41
41
 
42
42
  require "muscle_bio"
@@ -3,10 +3,6 @@
3
3
  # array = [1,2,3,4,5,6,7,8,9,10]
4
4
  # array.median
5
5
  # => 5.5
6
- # @example sum
7
- # array = [1,2,3,4,5,6,7,8,9,10]
8
- # array.sum
9
- # => 55
10
6
  # @example average number (mean)
11
7
  # array = [1,2,3,4,5,6,7,8,9,10]
12
8
  # array.mean
@@ -45,12 +41,6 @@ module Enumerable
45
41
  len % 2 == 1 ? sorted[len/2] : (sorted[len/2 - 1] + sorted[len/2]).to_f / 2
46
42
  end
47
43
 
48
- # generate summed value
49
- # @return [Numeric] summed value
50
- def sum
51
- self.inject(0){|accum, i| accum + i }
52
- end
53
-
54
44
  # generate mean number
55
45
  # @return [Float] mean value
56
46
  def mean
@@ -67,7 +67,7 @@ module ViralSeq
67
67
  @k = k
68
68
  @poisson_hash = {}
69
69
  (0..k).each do |n|
70
- p = (rate**n * ::Math::E**(-rate))/!n
70
+ p = (rate**n * ::Math::E**(-rate))/n.factorial
71
71
  @poisson_hash[n] = p
72
72
  end
73
73
  end
@@ -155,9 +155,9 @@ class Integer
155
155
  # factorial method for an Integer
156
156
  # @return [Integer] factorial for given Integer
157
157
  # @example factorial for 5
158
- # !5
158
+ # 5.factorial
159
159
  # => 120
160
- def !
160
+ def factorial
161
161
  if self == 0
162
162
  return 1
163
163
  else
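The rename from `Integer#!` to `Integer#factorial` matters because `!` is Ruby's logical negation operator, so redefining it on `Integer` changes how `!n` behaves everywhere the gem is loaded. A minimal sketch of the same Poisson calculation without reopening `Integer` at all, assuming a hypothetical module name (`PoissonSketch`):

```ruby
# Hypothetical helper module; not part of viral_seq.
module PoissonSketch
  module_function

  # n! for a non-negative Integer, computed iteratively.
  def factorial(n)
    raise ArgumentError, 'n must be a non-negative Integer' unless n.is_a?(Integer) && n >= 0

    (1..n).reduce(1, :*)
  end

  # P(X = n) for a Poisson distribution with the given rate.
  def poisson_pmf(rate, n)
    (rate**n * Math.exp(-rate)) / factorial(n)
  end

  # Hash of n => P(X = n) for n in 0..k, mirroring @poisson_hash in the hunk above.
  def poisson_table(rate, k)
    (0..k).each_with_object({}) { |n, table| table[n] = poisson_pmf(rate, n) }
  end
end

PoissonSketch.poisson_table(0.5, 3).each { |n, p| puts "#{n}\t#{p}" }
```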
@@ -394,7 +394,6 @@ module ViralSeq
394
394
  end
395
395
  end
396
396
  end
397
-
398
397
  consensus_seq += call_consensus_base(max_base_list)
399
398
  end
400
399
  return consensus_seq
@@ -742,6 +741,7 @@ module ViralSeq
742
741
  seq_hash_unique_pass = []
743
742
 
744
743
  seq_hash_unique.each do |seq|
744
+ next if seq.nil?
745
745
  loc = ViralSeq::Sequence.new('', seq).locator(ref_option, path_to_muscle)
746
746
  next unless loc # if locator tool fails, skip this seq.
747
747
  if start_nt.include?(loc[0]) && end_nt.include?(loc[1])
@@ -110,19 +110,21 @@ module ViralSeq
110
110
  raise ArgumentError.new(":overlap has to be Integer, input #{overlap} invalid.") unless overlap.is_a? Integer
111
111
  raise ArgumentError.new(":diff has to be float or integer, input #{diff} invalid.") unless (diff.is_a? Integer or diff.is_a? Float)
112
112
  joined_seq = {}
113
- seq_pair_hash.uniq_hash.each do |seq_pair, seq_names|
113
+ seq_pair_hash.each do |seq_name,seq_pair|
114
114
  r1_seq = seq_pair[0]
115
115
  r2_seq = seq_pair[1]
116
116
  if overlap.zero?
117
117
  joined_sequence = r1_seq + r2_seq
118
+ elsif diff.zero?
119
+ if r1_seq[-overlap..-1] == r2_seq[0,overlap]
120
+ joined_sequence= r1_seq + r2_seq[overlap..-1]
121
+ end
118
122
  elsif r1_seq[-overlap..-1].compare_with(r2_seq[0,overlap]) <= (overlap * diff)
119
123
  joined_sequence= r1_seq + r2_seq[overlap..-1]
120
124
  else
121
125
  next
122
126
  end
123
- seq_names.each do |seq_name|
124
- joined_seq[seq_name] = joined_sequence
125
- end
127
+ joined_seq[seq_name] = joined_sequence if joined_sequence
126
128
  end
127
129
 
128
130
  joined_seq_hash = ViralSeq::SeqHash.new
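The reworked join has three cases: no overlap (plain concatenation), exact match required (`diff` is zero, so a direct string comparison replaces the mismatch count), and a tolerated mismatch fraction. A minimal sketch of that decision for a single pair, assuming hypothetical names (`join_pair`, `mismatch_count`); `mismatch_count` stands in for the gem's `String#compare_with`, whose body is not shown in this diff and is assumed here to count differing positions:

```ruby
# Stand-in for String#compare_with (assumed to count differing positions).
def mismatch_count(a, b)
  a.each_char.zip(b.each_char).count { |x, y| x != y }
end

# Hypothetical helper: returns the joined sequence, or nil if the ends do not agree.
def join_pair(r1_seq, r2_seq, overlap: 0, diff: 0.0)
  return r1_seq + r2_seq if overlap.zero?
  return nil if overlap > r1_seq.size || overlap > r2_seq.size

  r1_tail = r1_seq[-overlap..-1]
  r2_head = r2_seq[0, overlap]
  matched =
    if diff.zero?
      r1_tail == r2_head # exact match, no per-base comparison needed
    else
      mismatch_count(r1_tail, r2_head) <= overlap * diff
    end
  matched ? r1_seq + r2_seq[overlap..-1] : nil
end

puts join_pair('AAAACCGG', 'CCGGTTTT', overlap: 4).inspect
puts join_pair('AAAACCGG', 'CCGATTTT', overlap: 4).inspect
```

Returning `nil` for a failed join plays the same role as the `if joined_sequence` guard in the hunk: pairs that do not agree are simply left out of the result.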
@@ -102,9 +102,9 @@ module ViralSeq
102
102
  end
103
103
 
104
104
  # sort array of file names to determine if there are potential errors
105
- # input name_array array of file names
106
- # output hash { }
107
- # need to change for each file name have an error code. and a bool to show if all pass
105
+ # @param name_array [Array] array of file names
106
+ # @return [Hash] name check results
107
+
108
108
  def validate_file_name(name_array)
109
109
  errors = {
110
110
  file_type_error: [] ,
@@ -165,6 +165,13 @@ module ViralSeq
165
165
  end
166
166
  end
167
167
 
168
+ file_name_with_lib_name = {}
169
+ passed_libs.each do |lib_name, files|
170
+ files.each do |f|
171
+ file_name_with_lib_name[f] = lib_name
172
+ end
173
+ end
174
+
168
175
  passed_names = []
169
176
 
170
177
  passed_libs.values.each { |names| passed_names += names}
@@ -175,7 +182,27 @@ module ViralSeq
175
182
  pass = true
176
183
  end
177
184
 
178
- return { errors: errors, all_pass: pass, passed_names: passed_names, passed_libs: passed_libs }
185
+ file_name_with_error_type = {}
186
+
187
+ errors.each do |type, files|
188
+ files.each do |f|
189
+ file_name_with_error_type[f] ||= []
190
+ file_name_with_error_type[f] << type.to_s.tr("_", "\s")
191
+ end
192
+ end
193
+
194
+ file_check = []
195
+
196
+ name_array.each do |name|
197
+ file_check_hash = {}
198
+ file_check_hash[:fileName] = name
199
+ file_check_hash[:errors] = file_name_with_error_type[name]
200
+ file_check_hash[:libName] = file_name_with_lib_name[name]
201
+
202
+ file_check << file_check_hash
203
+ end
204
+
205
+ return { allPass: pass, files: file_check }
179
206
  end
180
207
 
181
208
  # filter r1 raw sequences for non-specific primers.
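The return value of `validate_file_name` changes shape here: instead of separate `errors`, `passed_names` and `passed_libs` collections, each input file gets one record with `:fileName`, `:errors` and `:libName`. A file without errors has `errors: nil`, because the lookup hash has no entry for it. A minimal sketch of consuming that shape, using a hand-written sample (file and library names are made up) and hypothetical helper names:

```ruby
# Hand-written sample in the shape returned by the new code path.
SAMPLE_RESULT = {
  allPass: false,
  files: [
    { fileName: 'lib1_R1.fastq.gz', errors: nil, libName: 'lib1' },
    { fileName: 'lib1_R2.fastq.gz', errors: nil, libName: 'lib1' },
    { fileName: 'notes.txt', errors: ['file type error'], libName: nil }
  ]
}.freeze

# Hypothetical helper: one readable line per file that has errors.
def failed_files(result)
  result[:files]
    .reject { |file| file[:errors].nil? || file[:errors].empty? }
    .map { |file| "#{file[:fileName]}: #{file[:errors].join(', ')}" }
end

# Hypothetical helper: library name => file names, skipping unassigned files.
def files_by_library(result)
  result[:files]
    .select { |file| file[:libName] }
    .group_by { |file| file[:libName] }
    .transform_values { |files| files.map { |file| file[:fileName] } }
end

puts failed_files(SAMPLE_RESULT)
puts files_by_library(SAMPLE_RESULT).inspect
```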
@@ -278,7 +305,9 @@ module ViralSeq
278
305
  end
279
306
 
280
307
  def general_filter(seq)
281
- if seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
308
+ if seq.size < $platform_sequencing_length
309
+ return false
310
+ elsif seq[1..-2] =~ /N/ # sequences with ambiguities except the 1st and last position removed
282
311
  return false
283
312
  elsif seq =~ /A{11}/ # a string of poly-A indicates adaptor sequence
284
313
  return false
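The new length check reads the global `$platform_sequencing_length`, which the `tcs` script sets from `params[:platform_format]` (default 300). A minimal sketch of the same filter with the length passed in explicitly, which is easier to test in isolation. The name `passes_general_filter?` is hypothetical, and only the three checks visible in this hunk are reproduced; the real method may apply more:

```ruby
# Hypothetical variant of general_filter that takes the minimum length as an argument.
def passes_general_filter?(seq, min_length: 300)
  return false if seq.size < min_length # read shorter than the platform read length
  return false if seq[1..-2] =~ /N/     # ambiguities, except at the first and last position
  return false if seq =~ /A{11}/        # poly-A run, indicating adaptor sequence

  true
end

full_length = 'ACGT' * 75 # 300 bases
puts passes_general_filter?(full_length)
puts passes_general_filter?(full_length, min_length: 301)
```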
@@ -0,0 +1,71 @@
1
+ module ViralSeq
2
+
3
+ class TcsDr
4
+ PARAMS = {:platform_error_rate=>0.02,
5
+ :primer_pairs=>
6
+ [{:region=>"RT",
7
+ :cdna=>
8
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCACTATAGGCTGTACTGTCCATTTATC",
9
+ :forward=>
10
+ "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNGGCCATTGACAGAAGAAAAAATAAAAGC",
11
+ :majority=>0.5,
12
+ :end_join=>true,
13
+ :end_join_option=>1,
14
+ :overlap=>0,
15
+ :TCS_QC=>true,
16
+ :ref_genome=>"HXB2",
17
+ :ref_start=>2648,
18
+ :ref_end=>3257,
19
+ :indel=>true,
20
+ :trim=>false},
21
+ {:region=>"PR",
22
+ :cdna=>
23
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNCAGTTTAACTTTTGGGCCATCCATTCC",
24
+ :forward=>
25
+ "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTCAGAGCAGACCAGAGCCAACAGCCCCA",
26
+ :majority=>0.5,
27
+ :end_join=>true,
28
+ :end_join_option=>3,
29
+ :TCS_QC=>true,
30
+ :ref_genome=>"HXB2",
31
+ :ref_start=>0,
32
+ :ref_end=>2591,
33
+ :indel=>true,
34
+ :trim=>true,
35
+ :trim_ref=>"HXB2",
36
+ :trim_ref_start=>2253,
37
+ :trim_ref_end=>2549},
38
+ {:region=>"IN",
39
+ :cdna=>
40
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNATCGAATACTGCCATTTGTACTGC",
41
+ :forward=>"GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNAAAAGGAGAAGCCATGCATG",
42
+ :majority=>0.5,
43
+ :end_join=>true,
44
+ :end_join_option=>3,
45
+ :overlap=>171,
46
+ :TCS_QC=>true,
47
+ :ref_genome=>"HXB2",
48
+ :ref_start=>4384,
49
+ :ref_end=>4751,
50
+ :indel=>false,
51
+ :trim=>false},
52
+ {:region=>"V1V3",
53
+ :cdna=>
54
+ "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNCAGTCCATTTTGCTYTAYTRABVTTACAATRTGC",
55
+ :forward=>
56
+ "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAGNNNNTTATGGGATCAAAGCCTAAAGCCATGTGTA",
57
+ :majority=>0.5,
58
+ :end_join=>true,
59
+ :end_join_option=>1,
60
+ :overlap=>0,
61
+ :TCS_QC=>true,
62
+ :ref_genome=>"HXB2",
63
+ :ref_start=>6585,
64
+ :ref_end=>7208,
65
+ :indel=>true,
66
+ :trim=>false}
67
+ ]
68
+ }
69
+ end
70
+
71
+ end
@@ -13,6 +13,22 @@ module ViralSeq
13
13
  print '> '
14
14
  param[:raw_sequence_dir] = gets.chomp.rstrip
15
15
 
16
+ puts "Choose MiSeq Platform (1-3):\n1. 150x7x150\n2. 250x7x250\n3. 300x7x300 (default)"
17
+ print "> "
18
+ pf_option = gets.chomp.rstrip
19
+ # while ![1,2,3].include?(pf_option.to_i)
20
+ # print "Entered MiSeq Platform #{pf_option.red.bold} not valid (choose 1-3), try again\n> "
21
+ # pf_option = gets.chomp.rstrip
22
+ # end
23
+ case pf_option.to_i
24
+ when 1
25
+ param[:platform_format] = 150
26
+ when 2
27
+ param[:platform_format] = 250
28
+ else
29
+ param[:platform_format] = 300
30
+ end
31
+
16
32
  puts 'Enter the estimated platform error rate (for TCS cut-off calculation), default as ' + '0.02'.red.bold
17
33
  print '> '
18
34
  input_error = gets.chomp.rstrip.to_f
@@ -52,12 +68,12 @@ module ViralSeq
52
68
  if ej =~ /y|yes/i
53
69
  data[:end_join] = true
54
70
 
55
- print "End-join option? Choose from (1-4):\n
56
- 1: simple join, no overlap
57
- 2: known overlap \n
58
- 3: unknow overlap, use sample consensus to determine overlap, all sequence pairs have same overlap\n
59
- 4: unknow overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap\n
60
- > "
71
+ puts "End-join option? Choose from (1-4):"
72
+ puts "1: simple join, no overlap"
73
+ puts "2: known overlap"
74
+ puts "3: unknown overlap, use sample consensus to determine overlap, all sequence pairs have same overlap"
75
+ puts "4: unknown overlap, determine overlap by individual sequence pairs, sequence pairs can have different overlap"
76
+ print "> "
61
77
  ej_option = gets.chomp.rstrip
62
78
  while ![1,2,3,4].include?(ej_option.to_i)
63
79
  puts "Entered end-join option #{ej_option.red.bold} not valid (choose 1-4), try again"
@@ -138,7 +154,12 @@ module ViralSeq
138
154
  if save_option =~ /y|yes/i
139
155
  print "Path to save JSON file:\n> "
140
156
  path = gets.chomp.rstrip
141
- File.open(path, 'w') {|f| f.puts JSON.pretty_generate(param)}
157
+ while !validate_path_name(path)
158
+ print "Entered path not valid, try again.\n".red.bold
159
+ print "Path to save JSON file:\n> "
160
+ path = gets.chomp.rstrip
161
+ end
162
+ File.open(validate_path_name(path), 'w') {|f| f.puts JSON.pretty_generate(param)}
142
163
  end
143
164
 
144
165
  print "\nDo you wish to execute tcs pipeline with the input params now? Y/N \n> "
@@ -147,7 +168,7 @@ module ViralSeq
147
168
  if rsp =~ /y/i
148
169
  return param
149
170
  else
150
- abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`"
171
+ abort "Params json file generated. You can execute tcs pipeline using `tcs -p [params.json]`".blue
151
172
  end
152
173
 
153
174
  end
@@ -172,7 +193,17 @@ module ViralSeq
172
193
  when 3
173
194
  :MAC239
174
195
  end
175
- end
176
- end
196
+ end # end of get_ref
197
+
198
+ def validate_path_name(path)
199
+ if path.empty?
200
+ return false
201
+ elsif File.directory? path
202
+ return File.join(path, 'params.json')
203
+ elsif File.directory?(File.dirname(path))
204
+ return path
205
+ end
206
+ end # end of validate_path_name
207
+ end # end of class << self
177
208
  end # end TcsJson
178
209
  end # end main module
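`validate_path_name` doubles as a validator and a normalizer: it returns `false` for an empty string, a default file name inside an existing directory, the path itself when its parent directory exists, and `nil` (implicitly) otherwise. A minimal sketch of the same rules with explicit return values, assuming a hypothetical name (`resolve_params_path`) and returning `nil` for every rejected input:

```ruby
require 'tmpdir'

# Hypothetical helper: returns a writable target path, or nil if the input is unusable.
def resolve_params_path(path, default_name: 'params.json')
  return nil if path.nil? || path.strip.empty?
  return File.join(path, default_name) if File.directory?(path)
  return path if File.directory?(File.dirname(path))

  nil
end

Dir.mktmpdir do |dir|
  puts resolve_params_path(dir)
  puts resolve_params_path(File.join(dir, 'missing', 'run.json')).inspect
end
```

A bare file name such as `run.json` is accepted, because `File.dirname` returns `.` for it and the current directory exists.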
@@ -2,6 +2,6 @@
2
2
  # version info and histroy
3
3
 
4
4
  module ViralSeq
5
- VERSION = "1.0.11"
6
- TCS_VERSION = "2.1.1"
5
+ VERSION = "1.1.1"
6
+ TCS_VERSION = "2.3.0"
7
7
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: viral_seq
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.11
4
+ version: 1.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shuntai Zhou
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2021-03-02 00:00:00.000000000 Z
12
+ date: 2021-04-01 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bundler
@@ -90,6 +90,7 @@ email:
90
90
  executables:
91
91
  - locator
92
92
  - tcs
93
+ - tcs_log
93
94
  extensions: []
94
95
  extra_rdoc_files: []
95
96
  files:
@@ -104,6 +105,11 @@ files:
104
105
  - Rakefile
105
106
  - bin/locator
106
107
  - bin/tcs
108
+ - bin/tcs_log
109
+ - docs/assets/img/cover.jpg
110
+ - docs/dr.json
111
+ - docs/sample_miseq_data/hivdr_control/r1.fastq.gz
112
+ - docs/sample_miseq_data/hivdr_control/r2.fastq.gz
107
113
  - lib/viral_seq.rb
108
114
  - lib/viral_seq/constant.rb
109
115
  - lib/viral_seq/enumerable.rb
@@ -120,6 +126,7 @@ files:
120
126
  - lib/viral_seq/sequence.rb
121
127
  - lib/viral_seq/string.rb
122
128
  - lib/viral_seq/tcs_core.rb
129
+ - lib/viral_seq/tcs_dr.rb
123
130
  - lib/viral_seq/tcs_json.rb
124
131
  - lib/viral_seq/version.rb
125
132
  - viral_seq.gemspec