viral_seq 1.0.8 → 1.0.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8d79f0676fb23cdc25fb3b0161b5665ecfe082e2401f40a1de3a782d9fb3d52a
4
- data.tar.gz: 01a09f4cfca1274bfb1b870cdad62614def01fdaded727ce9100eec377962401
3
+ metadata.gz: 4921d3609d6ffc7fd6fbafd7a4a86e5818d47ed855393addd68b20f28b9d214f
4
+ data.tar.gz: a9e18c01b287885f8f6238343d9633a52d4ae5ea061347e73bd4f3e86788b2a4
5
5
  SHA512:
6
- metadata.gz: 042f11da57209003bc84b0f7c764a9953f0ca6c1fcd00a5e943be531162bc06c9d54e3c4ceb1305c91fe5795894e3da394a196899a4f1df83d97b826c5582411
7
- data.tar.gz: b2b2bfb9a8e6d023f610b19311a1a1ea331fbaa804cf20aebc3a34f6b049240ec43fe10e92b9f00feef3fd78e922fe0ed39281146693358998020036b9553504
6
+ metadata.gz: dd21b57e17751f6c3e475f05b7a565d295ac7592b7c02f8d89ed49192834bee444f08ee9ebf48e41922c8caaf37a03651d5d0c9aa89d97ccc2edb9aad8224d5f
7
+ data.tar.gz: d1162424ea877d9839c179cacc330c81cd3508fcff07b64a1e753c7c706485d1dcb9a6b60aec9ce02ed33b91bbd4386ed58329c17e247ba086e7d81ed107bfd4
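The SHA256 entries above can be compared against a locally downloaded archive. A minimal sketch, assuming the archive has already been extracted to a local path; `sha256_matches?` is a hypothetical helper, not part of viral_seq or RubyGems:

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 digest with an expected
# hex string, such as one copied from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end
```

Called as `sha256_matches?('data.tar.gz', '<digest from checksums.yaml>')`, it returns `true` only when the digests agree.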
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- viral_seq (1.0.8)
4
+ viral_seq (1.0.9)
5
5
  colorize (~> 0.1)
6
6
  muscle_bio (~> 0.4)
7
7
 
@@ -11,7 +11,7 @@ GEM
11
11
  colorize (0.8.1)
12
12
  diff-lcs (1.3)
13
13
  muscle_bio (0.4.0)
14
- rake (10.5.0)
14
+ rake (13.0.1)
15
15
  rspec (3.8.0)
16
16
  rspec-core (~> 3.8.0)
17
17
  rspec-expectations (~> 3.8.0)
@@ -31,7 +31,7 @@ PLATFORMS
31
31
 
32
32
  DEPENDENCIES
33
33
  bundler (~> 2.0)
34
- rake (~> 10.0)
34
+ rake (~> 13.0)
35
35
  rspec (~> 3.0)
36
36
  viral_seq!
37
37
 
data/README.md CHANGED
@@ -12,101 +12,133 @@ Specifically for Primer-ID sequencing and HIV drug resistance analysis.
12
12
 
13
13
  #### Load all ViralSeq classes by requiring 'viral_seq.rb'
14
14
 
15
- #!/usr/bin/env ruby
16
- require 'viral_seq'
15
+ ```ruby
16
+ #!/usr/bin/env ruby
17
+ require 'viral_seq'
18
+ ```
17
19
 
18
20
  #### Use executable `locator` to get the coordinates of the sequences on HIV/SIV reference genome from a FASTA file through a terminal
19
21
 
20
22
  $ locator -i sequence.fasta -o sequence.fasta.csv
21
23
 
24
+
25
+ #### Use executable `tcs` pipeline to process Primer ID MiSeq sequencing data. Parameter json file can be generated using `tcs_json_generator` or at https://tcs-dr-dept-tcs.cloudapps.unc.edu/generator.php
26
+
27
+ $ tcs params.json
28
+
29
+ #### Use executable `tcs_json_generator` to generate params .json file for the `tcs` pipeline.
30
+
31
+ $ tcs_json_generator
32
+
33
+
22
34
  ## Some Examples
23
35
 
24
36
  #### Load nucleotide sequences from a FASTA format sequence file
25
37
 
26
- my_seqhash = ViralSeq::SeqHash.fa('my_seq_file.fasta')
38
+ ```ruby
39
+ my_seqhash = ViralSeq::SeqHash.fa('my_seq_file.fasta')
40
+ ```
27
41
 
28
42
  #### Make an alignment (using MUSCLE)
29
43
 
30
- aligned_seqhash = my_seqhash.align
44
+ ```ruby
45
+ aligned_seqhash = my_seqhash.align
46
+ ```
31
47
 
32
48
  #### Filter nucleotide sequences with the reference coordinates (HIV Protease)
33
49
 
34
- qc_seqhash = aligned_seqhash.hiv_seq_qc(2253, 2549, false, :HXB2)
50
+ ```ruby
51
+ qc_seqhash = aligned_seqhash.hiv_seq_qc(2253, 2549, false, :HXB2)
52
+ ```
35
53
 
36
54
  #### Further filter out sequences with Apobec3g/f hypermutations
37
55
 
38
- qc_seqhash = qc_seqhash.a3g
56
+ ```ruby
57
+ qc_seqhash = qc_seqhash.a3g
58
+ ```
39
59
 
40
60
  #### Calculate nucleotide diversity π
41
61
 
42
- qc_seqhash.pi
62
+ ```ruby
63
+ qc_seqhash.pi
64
+ ```
43
65
 
44
66
  #### Calculate cut-off for minority variants based on Poisson model
45
67
 
46
- cut_off = qc_seqhash.pm
68
+ ```ruby
69
+ cut_off = qc_seqhash.pm
70
+ ```
47
71
 
48
72
  #### Examine for drug resistance mutations for HIV PR region
49
73
 
50
- qc_seqhash.sdrm_hiv_pr(cut_off)
74
+ ```ruby
75
+ qc_seqhash.sdrm_hiv_pr(cut_off)
76
+ ```
51
77
 
52
78
  ## Updates
53
79
 
80
+ Version 1.0.9-07182020:
81
+
82
+ 1. Change ViralSeq::SeqHash#stop_codon and ViralSeq::SeqHash#a3g_hypermut return value to hash object.
83
+
84
+ 2. TCS pipeline updated to version 2.0.1. Add optional `export_raw: TRUE/FALSE` in json params. If `export_raw` is `TRUE`, raw sequence reads that pass the quality filters will be exported along with the TCS reads.
85
+
54
86
  Version 1.0.8-02282020:
55
87
 
56
- 1. TCS pipeline added as executable.
57
- tcs - main TCS pipeline script.
58
- tcs_json_generator - step-by-step script to generate json file for tcs pipeline.
88
+ 1. TCS pipeline (version 2.0.0) added as executable.
89
+ tcs - main TCS pipeline script.
90
+ tcs_json_generator - step-by-step script to generate json file for tcs pipeline.
59
91
 
60
- 2. Methods added:
61
- ViralSeq::SeqHash#trim
92
+ 2. Methods added:
93
+ ViralSeq::SeqHash#trim
62
94
 
63
- 3. Bug fix for several methods.
95
+ 3. Bug fix for several methods.
64
96
 
65
97
  Version 1.0.7-01282020:
66
98
 
67
- 1. Several methods added, including
68
- ViralSeq::SeqHash#error_table
69
- ViralSeq::SeqHash#random_select
70
- 2. Improved performance for several functions.
99
+ 1. Several methods added, including
100
+ ViralSeq::SeqHash#error_table
101
+ ViralSeq::SeqHash#random_select
102
+ 2. Improved performance for several functions.
71
103
 
72
104
  Version 1.0.6-07232019:
73
105
 
74
- 1. Several methods added to ViralSeq::SeqHash, including
75
- ViralSeq::SeqHash#size
76
- ViralSeq::SeqHash#+
77
- ViralSeq::SeqHash#write_nt_fa
78
- ViralSeq::SeqHash#mutation
79
- 2. Update documentations and rspec samples.
106
+ 1. Several methods added to ViralSeq::SeqHash, including
107
+ ViralSeq::SeqHash#size
108
+ ViralSeq::SeqHash#+
109
+ ViralSeq::SeqHash#write_nt_fa
110
+ ViralSeq::SeqHash#mutation
111
+ 2. Update documentations and rspec samples.
80
112
 
81
113
  Version 1.0.5-07112019:
82
114
 
83
- 1. Update ViralSeq::SeqHash#sequence_locator.
84
- Program will try to determine the direction (`+` or `-` of the query sequence)
85
- 2. update executable `locator` to have a column of `direction` in output .csv file
115
+ 1. Update ViralSeq::SeqHash#sequence_locator.
116
+ Program will try to determine the direction (`+` or `-` of the query sequence)
117
+ 2. update executable `locator` to have a column of `direction` in output .csv file
86
118
 
87
119
  Version 1.0.4-07102019:
88
120
 
89
- 1. Use home directory (Dir.home) instead of the directory of the script file for temp MUSCLE file.
90
- 2. Fix bugs in bin `locator`
121
+ 1. Use home directory (Dir.home) instead of the directory of the script file for temp MUSCLE file.
122
+ 2. Fix bugs in bin `locator`
91
123
 
92
124
  Version 1.0.3-07102019:
93
125
 
94
- 1. Bug fix.
126
+ 1. Bug fix.
95
127
 
96
128
  Version 1.0.2-07102019:
97
129
 
98
- 1. Fixed a gem loading issue.
130
+ 1. Fixed a gem loading issue.
99
131
 
100
132
  Version 1.0.1-07102019:
101
133
 
102
- 1. Add keyword argument :model to ViralSeq::SeqHashPair#join2.
103
- 2. Add method ViralSeq::SeqHash#sequence_locator (also: #loc), a function to locate sequences on HIV/SIV reference genomes, as HIV Sequence Locator from LANL.
104
- 3. Add executable 'locator'. An HIV/SIV sequence locator tool similar to LANL Sequence Locator.
105
- 4. update documentations
134
+ 1. Add keyword argument :model to ViralSeq::SeqHashPair#join2.
135
+ 2. Add method ViralSeq::SeqHash#sequence_locator (also: #loc), a function to locate sequences on HIV/SIV reference genomes, as HIV Sequence Locator from LANL.
136
+ 3. Add executable 'locator'. An HIV/SIV sequence locator tool similar to LANL Sequence Locator.
137
+ 4. update documentations
106
138
 
107
139
  Version 1.0.0-07092019:
108
140
 
109
- 1. Rewrote the whole ViralSeq gem, grouping methods into modules and classes under main Module::ViralSeq
141
+ 1. Rewrote the whole ViralSeq gem, grouping methods into modules and classes under main Module::ViralSeq
110
142
 
111
143
  ## Development
112
144
 
data/bin/tcs CHANGED
@@ -29,69 +29,6 @@ require 'viral_seq'
29
29
  require 'json'
30
30
  require 'colorize'
31
31
 
32
- # updated the ViralSeq module. Push with the new version.
33
-
34
- module ViralSeq
35
- class SeqHash
36
- def self.new_from_fastq(fastq_file)
37
- count = 0
38
- sequence_a = []
39
- quality_a = []
40
- count_seq = 0
41
-
42
- File.open(fastq_file,'r') do |file|
43
- file.readlines.collect do |line|
44
- count +=1
45
- count_m = count % 4
46
- if count_m == 1
47
- line.tr!('@','>')
48
- sequence_a << line.chomp
49
- quality_a << line.chomp
50
- count_seq += 1
51
- elsif count_m == 2
52
- sequence_a << line.chomp
53
- elsif count_m == 0
54
- quality_a << line.chomp
55
- end
56
- end
57
- end
58
- sequence_hash = Hash[sequence_a.each_slice(2).to_a]
59
- quality_hash = Hash[quality_a.each_slice(2).to_a]
60
-
61
- seq_hash = ViralSeq::SeqHash.new
62
- seq_hash.dna_hash = sequence_hash
63
- seq_hash.qc_hash = quality_hash
64
- seq_hash.title = File.basename(fastq_file,".*")
65
- seq_hash.file = fastq_file
66
- return seq_hash
67
- end # end of ::new_from_fastq
68
-
69
- class << self
70
- alias_method :fq, :new_from_fastq
71
- end
72
- end
73
- end
74
-
75
- module ViralSeq
76
- class SeqHash
77
- def trim(start_nt, end_nt, ref_option = :HXB2, path_to_muscle = false)
78
- seq_hash = self.dna_hash.dup
79
- seq_hash_unique = seq_hash.uniq_hash
80
- trimmed_seq_hash = {}
81
- seq_hash_unique.each do |seq, names|
82
- trimmed_seq = ViralSeq::Sequence.new('', seq).sequence_clip(start_nt, end_nt, ref_option, path_to_muscle).dna
83
- names.each do |name|
84
- trimmed_seq_hash[name] = trimmed_seq
85
- end
86
- end
87
- return_seq_hash = self.dup
88
- return_seq_hash.dna_hash = trimmed_seq_hash
89
- return return_seq_hash
90
- end
91
- end
92
- end
93
-
94
- # end of additonal methods. Delete before publish
95
32
 
96
33
  # calculate consensus cutoff
97
34
 
@@ -127,12 +64,9 @@ def calculate_cut_off(m, error_rate = 0.02)
127
64
  return n
128
65
  end
129
66
 
130
-
131
- TCS_VERSION = "2.0.0"
132
-
133
- puts "\n" + '-'*58
134
- puts '| JSON Parameter Generator for ' + "TCS #{TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
135
- puts '-'*58 + "\n"
67
+ puts "\n" + '-'*50
68
+ puts '| The TCS Pipeline ' + "Version #{ViralSeq::TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
69
+ puts '-'*50 + "\n"
136
70
 
137
71
  unless ARGV[0]
138
72
  raise "No JSON param file found. Script terminated."
@@ -173,7 +107,7 @@ def unzip_r(indir, f)
173
107
  end
174
108
  runtime_log_file = File.join(indir,"runtime.log")
175
109
  log = File.open(runtime_log_file, "w")
176
- log.puts "TSC pipeline Version " + TCS_VERSION.to_s
110
+ log.puts "TSC pipeline Version " + ViralSeq::TCS_VERSION.to_s
177
111
  log.puts "viral_seq Version " + ViralSeq::VERSION.to_s
178
112
  log.puts Time.now.to_s + "\t" + "Start TCS pipeline..."
179
113
 
@@ -224,7 +158,7 @@ end
224
158
 
225
159
  primers.each do |primer|
226
160
  summary_json = {}
227
- summary_json[:tcs_version] = TCS_VERSION
161
+ summary_json[:tcs_version] = ViralSeq::TCS_VERSION
228
162
  summary_json[:viralseq_version] = ViralSeq::VERSION
229
163
  summary_json[:runtime] = Time.now.to_s
230
164
 
@@ -233,6 +167,9 @@ primers.each do |primer|
233
167
 
234
168
  cdna_primer = primer[:cdna]
235
169
  forward_primer = primer[:forward]
170
+
171
+ export_raw = primer[:export_raw]
172
+
236
173
  unless cdna_primer
237
174
  log.puts Time.now.to_s + "\t" + region + " does not have cDNA primer sequence. #{region} skipped."
238
175
  end
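The hunk above reads `export_raw` straight from each primer entry and later uses it in a bare `if`. The README writes the values as `TRUE/FALSE`; if a params file quoted them as strings, Ruby would treat `"FALSE"` as truthy. A minimal sketch of a defensive coercion, assuming the params are parsed with symbolized keys; `flag_enabled?` is a hypothetical helper, not part of the `tcs` script:

```ruby
require 'json'

# Hypothetical helper: coerce a parsed JSON value to a strict boolean.
# JSON true/false parse to Ruby booleans, but a quoted "FALSE" parses to a
# non-empty String, which Ruby treats as truthy in a bare `if`.
def flag_enabled?(value)
  case value
  when true, false then value
  when String then value.strip.casecmp?('true')
  else false
  end
end

# Illustrative params; only keys that appear in the diff are used.
json = '{"primer_pairs":[{"export_raw":true},{"export_raw":"FALSE"},{}]}'
params = JSON.parse(json, symbolize_names: true)
flags = params[:primer_pairs].map { |primer| flag_enabled?(primer[:export_raw]) }
puts flags.inspect
```

A missing key yields `nil`, which the helper maps to `false`, matching the optional nature of the parameter.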
@@ -363,10 +300,30 @@ primers.each do |primer|
363
300
  out_dir_consensus = File.join(out_dir_set, "consensus")
364
301
  Dir.mkdir(out_dir_consensus) unless File.directory?(out_dir_consensus)
365
302
 
366
- outfile_r1 = File.join(out_dir_consensus, 'r1.txt')
367
- outfile_r2 = File.join(out_dir_consensus, 'r2.txt')
303
+ outfile_r1 = File.join(out_dir_consensus, 'r1.fasta')
304
+ outfile_r2 = File.join(out_dir_consensus, 'r2.fasta')
368
305
  outfile_log = File.join(out_dir_set, 'log.json')
369
306
 
307
+ # if export_raw is true, create dir for raw sequence
308
+ if export_raw
309
+ out_dir_raw = File.join(out_dir_set, "raw")
310
+ Dir.mkdir(out_dir_raw) unless File.directory?(out_dir_raw)
311
+ outfile_raw_r1 = File.join(out_dir_raw, 'r1.raw.fasta')
312
+ outfile_raw_r2 = File.join(out_dir_raw, 'r2.raw.fasta')
313
+ raw_r1_f = File.open(outfile_raw_r1, 'w')
314
+ raw_r2_f = File.open(outfile_raw_r2, 'w')
315
+
316
+ bio_r1.keys.each do |k|
317
+ raw_r1_f.puts k + "_r1"
318
+ raw_r2_f.puts k + "_r2"
319
+ raw_r1_f.puts bio_r1[k]
320
+ raw_r2_f.puts bio_r2[k].rc
321
+ end
322
+
323
+ raw_r1_f.close
324
+ raw_r2_f.close
325
+ end
326
+
370
327
  # create TCS
371
328
 
372
329
  pid_seqtag_hash = {}
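The raw-export block above opens two files and closes them by hand. The same output can be produced with the block form of `File.open`, which closes the files even if an exception is raised. A minimal sketch, assuming `.rc` in the diff is a reverse complement; `reverse_complement` and `write_raw_pairs` are stand-ins introduced here, not viral_seq methods:

```ruby
require 'tmpdir'

# Stand-in for the `.rc` String method used in the diff (assumed to be a
# reverse complement); handles plain A/C/G/T only.
def reverse_complement(seq)
  seq.reverse.tr('ACGTacgt', 'TGCAtgca')
end

# Hypothetical helper: write paired raw reads as FASTA. Both files are
# closed automatically when the blocks exit.
def write_raw_pairs(out_dir, r1_reads, r2_reads)
  r1_path = File.join(out_dir, 'r1.raw.fasta')
  r2_path = File.join(out_dir, 'r2.raw.fasta')
  File.open(r1_path, 'w') do |r1_f|
    File.open(r2_path, 'w') do |r2_f|
      r1_reads.each do |name, seq|
        r1_f.puts "#{name}_r1", seq
        r2_f.puts "#{name}_r2", reverse_complement(r2_reads.fetch(name))
      end
    end
  end
  [r1_path, r2_path]
end
```

The `_r1` and `_r2` suffixes match the three-character suffix that the later QC hunk strips with `k[0..-4]`.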
@@ -456,19 +413,30 @@ primers.each do |primer|
456
413
  f.puts JSON.pretty_generate(pid_json)
457
414
  end
458
415
 
459
- if primer[:end_join]
460
- log.puts Time.now.to_s + "\t" + "Start end-pairing for TCS..."
461
- shp = ViralSeq::SeqHashPair.fa(out_dir_consensus)
462
- case primer[:end_join_option]
416
+ def end_join(dir, option, overlap)
417
+ shp = ViralSeq::SeqHashPair.fa(dir)
418
+ case option
463
419
  when 1
464
- joined_sh = shp.join1(primer[:overlap])
420
+ joined_sh = shp.join1()
465
421
  when 3
466
422
  joined_sh = shp.join2
467
423
  when 4
468
424
  joined_sh = shp.join2(model: :indiv)
469
425
  end
426
+ return joined_sh
427
+ end
428
+
429
+ if primer[:end_join]
430
+ log.puts Time.now.to_s + "\t" + "Start end-pairing for TCS..."
431
+ shp = ViralSeq::SeqHashPair.fa(out_dir_consensus)
432
+ joined_sh = end_join(out_dir_consensus, primer[:end_join_option], primer[:overlap])
470
433
  log.puts Time.now.to_s + "\t" + "Paired TCS number: " + joined_sh.size.to_s
471
434
  summary_json[:combined_tcs] = joined_sh.size
435
+
436
+ if export_raw
437
+ joined_sh_raw = end_join(out_dir_raw, primer[:end_join_option], primer[:overlap])
438
+ end
439
+
472
440
  else
473
441
  File.open(outfile_log, "w") do |f|
474
442
  f.puts JSON.pretty_generate(summary_json)
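In the hunk above, `end_join` accepts an `overlap` argument but calls `shp.join1()` without it, whereas the removed code called `shp.join1(primer[:overlap])`. The diff does not say whether that is intentional. A minimal sketch of a dispatcher that forwards the value, using a stub in place of `ViralSeq::SeqHashPair` so it runs without the gem; the stub's method signatures are assumptions:

```ruby
# Stub standing in for ViralSeq::SeqHashPair; it only records the calls.
FakePair = Struct.new(:calls) do
  def join1(overlap = 0)
    calls << [:join1, overlap]
    :joined_by_overlap
  end

  def join2(model: nil)
    calls << [:join2, model]
    :joined_by_alignment
  end
end

# Dispatch on the end_join option, forwarding overlap to join1.
# Returns nil for any other option, as the bare `case` in the diff does.
def end_join(pair, option, overlap)
  case option
  when 1 then pair.join1(overlap)
  when 3 then pair.join2
  when 4 then pair.join2(model: :indiv)
  end
end
```

Taking the pair object as an argument, rather than building it from a directory inside the method, also keeps the dispatcher testable without files on disk.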
@@ -501,8 +469,28 @@ primers.each do |primer|
501
469
  joined_seq[seq_name] = seq + new_r2_seq[seq_name]
502
470
  end
503
471
  joined_sh = ViralSeq::SeqHash.new(joined_seq)
472
+
473
+ if export_raw
474
+ r1_sh_raw = ViralSeq::SeqHash.fa(outfile_raw_r1)
475
+ r2_sh_raw = ViralSeq::SeqHash.fa(outfile_raw_r2)
476
+ r1_sh_raw = r1_sh_raw.hiv_seq_qc(ref_start, (0..(ViralSeq::RefSeq.get(ref_genome).size - 1)), indel, ref_genome)
477
+ r2_sh_raw = r2_sh_raw.hiv_seq_qc((0..(ViralSeq::RefSeq.get(ref_genome).size - 1)), ref_end, indel, ref_genome)
478
+ new_r1_seq_raw = r1_sh_raw.dna_hash.each_with_object({}) {|(k, v), h| h[k[0..-4]] = v}
479
+ new_r2_seq_raw = r2_sh_raw.dna_hash.each_with_object({}) {|(k, v), h| h[k[0..-4]] = v}
480
+ joined_seq_raw = {}
481
+ new_r1_seq_raw.each do |seq_name, seq|
482
+ next unless seq
483
+ next unless new_r2_seq_raw[seq_name]
484
+ joined_seq_raw[seq_name] = seq + new_r2_seq_raw[seq_name]
485
+ end
486
+ joined_sh_raw = ViralSeq::SeqHash.new(joined_seq_raw)
487
+ end
504
488
  else
505
489
  joined_sh = joined_sh.hiv_seq_qc(ref_start, ref_end, indel, ref_genome)
490
+
491
+ if export_raw
492
+ joined_sh_raw = joined_sh.hiv_seq_qc(ref_start, ref_end, indel, ref_genome)
493
+ end
506
494
  end
507
495
  log.puts Time.now.to_s + "\t" + "Paired TCS number after QC based on reference genome: " + joined_sh.size.to_s
508
496
  summary_json[:combined_tcs_after_qc] = joined_sh.size
@@ -512,7 +500,10 @@ primers.each do |primer|
512
500
  trim_ref = primer[:trim_ref].to_sym
513
501
  joined_sh = joined_sh.trim(trim_start, trim_end, trim_ref)
514
502
  end
515
- joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.txt"))
503
+ joined_sh.write_nt_fa(File.join(out_dir_consensus, "combined.fasta"))
504
+ if export_raw
505
+ joined_sh_raw.write_nt_fa(File.join(out_dir_raw, "combined.fasta"))
506
+ end
516
507
  end
517
508
 
518
509
  File.open(outfile_log, "w") do |f|
@@ -2,6 +2,7 @@
2
2
 
3
3
  # TCS pipeline JSON params generator.
4
4
 
5
+ require 'viral_seq'
5
6
  require 'colorize'
6
7
  require 'json'
7
8
 
@@ -26,10 +27,8 @@ def get_ref
26
27
  end
27
28
  end
28
29
 
29
- TCS_VERSION = "2.0.0"
30
-
31
30
  puts "\n" + '-'*58
32
- puts '| JSON Parameter Generator for ' + "TCS #{TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
31
+ puts '| JSON Parameter Generator for ' + "TCS #{ViralSeq::TCS_VERSION}".red.bold + " by " + "Shuntai Zhou".blue.bold + ' |'
33
32
  puts '-'*58 + "\n"
34
33
 
35
34
  param = {}
@@ -48,8 +47,8 @@ else
48
47
  end
49
48
 
50
49
  param[:primer_pairs] = []
51
- continue = true
52
- while continue
50
+
51
+ loop do
53
52
  data = {}
54
53
  puts "Enter the name for the sequenced region: "
55
54
  print '> '
@@ -147,14 +146,11 @@ while continue
147
146
  data[:end_join] = false
148
147
  end
149
148
 
149
+ param[:primer_pairs] << data
150
150
  print "Do you wish to continue? Y/N \n> "
151
151
  continue_sig = gets.chomp.rstrip
152
- if continue_sig =~ /y|yes/i
153
- continue = true
154
- else
155
- continue = false
156
- end
157
- param[:primer_pairs] << data
152
+ break unless continue_sig =~ /y|yes/i
153
+
158
154
  end
159
155
 
160
156
  puts "\nYour JSON string is:"
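The generator change above replaces a flag-controlled `while` with `loop do ... break unless`. Note that `/y|yes/i` is unanchored, so any answer containing a `y` (for example `nay`) continues the loop. A minimal sketch of the same pattern with an anchored match; `collect_entries` and the `:region` key are hypothetical and only illustrate the control flow:

```ruby
require 'stringio'

# Hypothetical helper: collect entries until the user declines to continue.
def collect_entries(input: $stdin, output: $stdout)
  entries = []
  loop do
    output.print "Enter the name for the sequenced region: \n> "
    entries << { region: input.gets.to_s.strip }
    output.print "Do you wish to continue? Y/N \n> "
    # Anchored so only "y" or "yes" (any case) continues the loop.
    break unless input.gets.to_s.strip.match?(/\Ay(es)?\z/i)
  end
  entries
end
```

Passing the input and output streams as arguments lets the loop be exercised with `StringIO` instead of a terminal.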
@@ -313,22 +313,22 @@ module ViralSeq
313
313
 
314
314
  # screen for sequences with stop codons.
315
315
  # @param (see #translate)
316
- # @return [Array] of two elements [seqhash_stop_codon, seqhash_no_stop_codon],
316
+ # @return [Hash] of two SeqHash objects {with_stop_codon: seqHash, without_stop_codon: seqHash},
317
317
  #
318
- # # seqhash_stop_codon: ViralSeq::SeqHash object with stop codons
319
- # # seqhash_no_stop_codon: ViralSeq::SeqHash object without stop codons
318
+ # # :with_stop_codon : ViralSeq::SeqHash object with stop codons
319
+ # # :without_stop_codon: ViralSeq::SeqHash object without stop codons
320
320
  # @example given a hash of sequences, return a sub-hash with sequences only contains stop codons
321
321
  # my_seqhash = ViralSeq::SeqHash.fa('my_fasta_file.fasta')
322
322
  # my_seqhash.dna_hash
323
323
  # => {">seq1"=>"ATAAGAACG", ">seq2"=>"ATATGAACG", ">seq3"=>"ATGAGAACG", ">seq4"=>"TATTAGACG", ">seq5"=>"CGCTGAACG"}
324
- # stop_codon_seqhash = my_seqhash.stop_codon[0]
324
+ # stop_codon_seqhash = my_seqhash.stop_codon[:with_stop_codon]
325
325
  # stop_codon_seqhash.dna_hash
326
326
  # => {">seq2"=>"ATATGAACG", ">seq4"=>"TATTAGACG", ">seq5"=>"CGCTGAACG"}
327
327
  # stop_codon_seqhash.aa_hash
328
328
  # => {">seq2"=>"I*T", ">seq4"=>"Y*T", ">seq5"=>"R*T"}
329
329
  # stop_codon_seqhash.title
330
330
  # => "my_fasta_file_stop"
331
- # filtered_seqhash = my_seqhash.stop_codon[1]
331
+ # filtered_seqhash = my_seqhash.stop_codon[:without_stop_codon]
332
332
  # filtered_seqhash.aa_hash
333
333
  # {">seq1"=>"IRT", ">seq3"=>"MRT"}
334
334
 
@@ -343,7 +343,10 @@ module ViralSeq
343
343
  seqhash1.title = self.title + "_stop"
344
344
  keys2 = aa_seqs.keys - keys
345
345
  seqhash2 = self.sub(keys2)
346
- return [seqhash1, seqhash2]
346
+ return {
347
+ with_stop_codon: seqhash1,
348
+ without_stop_codon: seqhash2
349
+ }
347
350
  end #end of #stop_codon
348
351
 
349
352
 
@@ -399,10 +402,10 @@ module ViralSeq
399
402
  # # 2. Poisson distribution of G to A mutations at A3G positions, outliers sequences
400
403
  # # note: criteria 2 only applies on a sequence file containing more than 20 sequences,
401
404
  # # b/c Poisson model does not do well on small sample size.
402
- # @return [Array] three values.
403
- # first value, `array[0]`: a ViralSeq:SeqHash object for sequences with hypermutations
404
- # second value, `array[1]`: a ViralSeq:SeqHash object for sequences without hypermutations
405
- # third value, `array[2]`: a two-demensional array `[[a,b], [c,d]]` for statistic_info, including the following information,
405
+ # @return [Hash] three pairs.
406
+ # :a3g_seq: a ViralSeq::SeqHash object for sequences with hypermutations
407
+ # :filtered_seq : a ViralSeq::SeqHash object for sequences without hypermutations
408
+ # :stats : a two-dimensional array `[[a,b], [c,d]]` for statistic_info, including the following information,
406
409
  # # sequence tag
407
410
  # # G to A mutation numbers at potential a3g positions
408
411
  # # total potential a3g G positions
@@ -413,17 +416,17 @@ module ViralSeq
413
416
  # @example identify apobec3gf mutations from a sequence fasta file
414
417
  # my_seqhash = ViralSeq::SeqHash.fa('spec/sample_files/sample_a3g_sequence1.fasta')
415
418
  # hypermut = my_seqhash.a3g
416
- # hypermut[0].dna_hash.keys
419
+ # hypermut[:a3g_seq].dna_hash.keys
417
420
  # => [">Seq7", ">Seq14"]
418
- # hypermut[1].dna_hash.keys
421
+ # hypermut[:filtered_seq].dna_hash.keys
419
422
  # => [">Seq1", ">Seq2", ">Seq5"]
420
- # hypermut[2]
423
+ # hypermut[:stats]
421
424
  # => [[">Seq7", 23, 68, 1, 54, 18.26, 4.308329383112348e-06], [">Seq14", 45, 68, 9, 54, 3.97, 5.2143571971582974e-08]]
422
425
  #
423
426
  # @example identify apobec3gf mutations from another sequence fasta file
424
427
  # my_seqhash = ViralSeq::SeqHash.fa('spec/sample_files/sample_a3g_sequence2.fasta')
425
428
  # hypermut = my_seqhash.a3g
426
- # hypermut[2]
429
+ # hypermut[:stats]
427
430
  # => [[">CTAACACTCA_134_a3g-sample2", 4, 35, 0, 51, Infinity, 0.02465676660128911], [">ATAGTGCCCA_60_a3g-sample2", 4, 35, 1, 51, 5.83, 0.1534487353839561]]
428
431
  # # notice sequence ">ATAGTGCCCA_60_a3g-sample2" has a p value at 0.15, greater than 0.05,
429
432
  # # but it is still called as hypermutation sequence b/c it's Poisson outlier sequence.
@@ -516,7 +519,10 @@ module ViralSeq
516
519
  hm_seq_hash.title = self.title + "_hypermut"
517
520
  hm_seq_hash.file = self.file
518
521
  filtered_seq_hash = self.sub(self.dna_hash.keys - hm_hash.keys)
519
- return [hm_seq_hash, filtered_seq_hash, hm_hash.values]
522
+ return { a3g_seq: hm_seq_hash,
523
+ filtered_seq: filtered_seq_hash,
524
+ stats: hm_hash.values
525
+ }
520
526
  end #end of #a3g_hypermut
521
527
 
522
528
  alias_method :a3g, :a3g_hypermut
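With the Hash return value, the statistics rows are reached through `:stats` rather than index `2`. A minimal sketch that picks out rows whose p value (the last column, per the doc comment above) exceeds 0.05, which the comment suggests were called via the Poisson-outlier criterion. The sample rows are copied from the `@example` above; `outlier_only_tags` is a hypothetical helper:

```ruby
# Sample rows copied from the doc comment; the last column is the p value.
hypermut = {
  stats: [
    ['>CTAACACTCA_134_a3g-sample2', 4, 35, 0, 51, Float::INFINITY, 0.02465676660128911],
    ['>ATAGTGCCCA_60_a3g-sample2', 4, 35, 1, 51, 5.83, 0.1534487353839561]
  ]
}

# Hypothetical helper: tags of rows whose p value is above the threshold.
def outlier_only_tags(stats, alpha = 0.05)
  stats.select { |row| row.last > alpha }.map(&:first)
end

puts outlier_only_tags(hypermut.fetch(:stats)).inspect
```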
@@ -730,6 +736,7 @@ module ViralSeq
730
736
 
731
737
  seq_hash_unique.each do |seq|
732
738
  loc = ViralSeq::Sequence.new('', seq).locator(ref_option, path_to_muscle)
739
+ next unless loc # if locator tool fails, skip this seq.
733
740
  if start_nt.include?(loc[0]) && end_nt.include?(loc[1])
734
741
  if indel
735
742
  seq_hash_unique_pass << seq
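The added `next unless loc` guard skips sequences for which the locator returns nothing, instead of raising on `loc[0]`. A minimal sketch of the same guard in isolation, with the locator passed in as a block so the example runs without MUSCLE; `filter_by_location` is a hypothetical helper:

```ruby
# Hypothetical helper: keep sequences whose located start and end fall in
# the given ranges. The block stands in for the locator and may return nil.
def filter_by_location(seqs, start_range, end_range)
  seqs.select do |seq|
    loc = yield(seq)
    next false unless loc # locator failed, skip this sequence
    start_range.include?(loc[0]) && end_range.include?(loc[1])
  end
end
```

For example, `filter_by_location(seqs, 2253..2253, 2549..2549) { |s| locate(s) }` would drop both out-of-range and unlocatable sequences, where `locate` is whatever locator the caller supplies.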
@@ -1151,7 +1158,7 @@ module ViralSeq
1151
1158
  # @param ref_option [Symbol], name of reference genomes, options are `:HXB2`, `:NL43`, `:MAC239`
1152
1159
  # @param path_to_muscle [String], path to the muscle executable, if not provided, use MuscleBio to run Muscle
1153
1160
  # @return [ViralSeq::SeqHash] a new ViralSeq::SeqHash object with trimmed sequences
1154
-
1161
+
1155
1162
  def trim(start_nt, end_nt, ref_option = :HXB2, path_to_muscle = false)
1156
1163
  seq_hash = self.dna_hash.dup
1157
1164
  seq_hash_unique = seq_hash.uniq_hash
@@ -2,6 +2,6 @@
2
2
  # version info and history
3
3
 
4
4
  module ViralSeq
5
- VERSION = "1.0.8"
6
- TCS_VERSION = "2.0.0"
5
+ VERSION = "1.0.9"
6
+ TCS_VERSION = "2.0.1"
7
7
  end
@@ -26,7 +26,7 @@ Gem::Specification.new do |spec|
26
26
  spec.post_install_message = "Thanks for installing!"
27
27
 
28
28
  spec.add_development_dependency "bundler", "~> 2.0"
29
- spec.add_development_dependency "rake", "~> 10.0"
29
+ spec.add_development_dependency "rake", "~> 13.0"
30
30
  spec.add_development_dependency "rspec", "~> 3.0"
31
31
 
32
32
  # muscle_bio gem required
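The development dependency moves from `~> 10.0` to `~> 13.0`. The pessimistic operator allows any 13.x release but not 14.0, which can be checked with RubyGems' own classes. A minimal sketch; `rake_allowed?` is a hypothetical helper:

```ruby
require 'rubygems'

# Hypothetical helper: does a rake version satisfy the new constraint?
def rake_allowed?(version, constraint = '~> 13.0')
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[10.5.0 13.0.1 13.2.0 14.0.0].each do |v|
  puts "#{v}: #{rake_allowed?(v)}"
end
```

This is consistent with the lockfile hunk earlier in the diff, where the resolved rake version changes from 10.5.0 to 13.0.1.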
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: viral_seq
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.8
4
+ version: 1.0.9
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shuntai Zhou
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2020-02-29 00:00:00.000000000 Z
12
+ date: 2020-07-19 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bundler
@@ -31,14 +31,14 @@ dependencies:
31
31
  requirements:
32
32
  - - "~>"
33
33
  - !ruby/object:Gem::Version
34
- version: '10.0'
34
+ version: '13.0'
35
35
  type: :development
36
36
  prerelease: false
37
37
  version_requirements: !ruby/object:Gem::Requirement
38
38
  requirements:
39
39
  - - "~>"
40
40
  - !ruby/object:Gem::Version
41
- version: '10.0'
41
+ version: '13.0'
42
42
  - !ruby/object:Gem::Dependency
43
43
  name: rspec
44
44
  requirement: !ruby/object:Gem::Requirement