crb-blast 0.3.1 → 0.4.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 9f7c5625ff9b0713d25b0c8a4c28c56aa1316cb3
4
- data.tar.gz: eaebeeedcfcc156bc7743a47683e8ce509abbe38
3
+ metadata.gz: 8ae488cec70a923add9cbe9e6f4a4fe0d69471d8
4
+ data.tar.gz: 7628e84ef78228bcc268d7deb6eb2a667131d405
5
5
  SHA512:
6
- metadata.gz: 6fc0fbe6d7f2399a68409e6211f013a457a95937665d6acd391d809df0c829583e6d7f5f1b6a12cf50aa6310731f1e206c5f25349100b1e295f37f1ea5a75baf
7
- data.tar.gz: 8519e10ea5de7822542d0ce05d94dbe956e9151237b49bcab89bbab74d460acfa93f9beed8ce9dcf1816b5d49365d837daf87d48c998cfde4a14266053e2f142
6
+ metadata.gz: f52df39062d0adf04187fca27635a4822fd2cdfb41944518773985bc08504243c02bd63564fa44656cf910909c9d3d1004c56a6f251d9918d07d2ef88f38eb2f
7
+ data.tar.gz: 1e46630e56d52f9056ab92a27eb33171aee066885b418855f9c9b61afccba31b5f34f2de2b53a0592d564e710cef3cf7f1f46a6eb148e57ffbb76b1025f24110
data/.gitignore ADDED
@@ -0,0 +1,13 @@
1
+ *.lock
2
+ *.nhr
3
+ *.nin
4
+ *.nsq
5
+ *.phr
6
+ *.pin
7
+ *.psq
8
+ *.blast
9
+ coverage
10
+ *~
11
+ *.gem
12
+ *fa
13
+ *tsv
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "https://rubygems.org"
2
+
3
+ gemspec
data/README.md ADDED
@@ -0,0 +1,95 @@
1
+ CRB-BLAST
2
+ =========
3
+
4
+ Conditional Reciprocal Best BLAST - high confidence ortholog assignment.
5
+
6
+ ### What is Conditional Reciprocal Best BLAST?
7
+
8
+ CRB-BLAST is a novel method for finding orthologs between one set of sequences and another. This is particularly useful in genome and transcriptome annotation.
9
+
10
+ CRB-BLAST initially performs a standard reciprocal best BLAST. It does this by performing BLAST alignments of query->target and target->query. Reciprocal best BLAST hits are those where the best match for any given query sequence in the query->target alignment is also the best hit of the match in the reverse (target->query) alignment.
11
+
12
+ Reciprocal best BLAST is a very conservative way to assign orthologs. The main innovation in CRB-BLAST is to learn an appropriate e-value cutoff to apply to each pairwise alignment by taking into account the overall relatedness of the two datasets being compared. This is done by fitting a function to the distribution of alignment e-values over sequence lengths. The function provides the e-value cutoff for a sequence of given length.
13
+
14
+ CRB-BLAST greatly improves the accuracy of ortholog assignment for de-novo transcriptome assembly ([Aubry et al. 2014](http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1004365)).
15
+
16
+ The CRB-BLAST algorithm was designed by [Steve Kelly](http://www.stevekellylab.com), and this implementation is by Chris Boursnell and Richard Smith-Unna. The original reference implementation from the paper is available for online use at http://www.stevekellylab.com/software/conditional-orthology-assignment.
17
+
18
+ ### Installation
19
+
20
+ To install CRB-BLAST, simply use rubygems:
21
+
22
+ `gem install crb-blast`
23
+
24
+ ### Prerequisites
25
+
26
+ - NCBI BLAST+ (preferably the latest version) should be installed and in your PATH.
27
+ - Ruby v2.0 or later. If you don't have Ruby, we suggest installing it with [RVM](http://rvm.io).
28
+
29
+ `\curl -sSL https://get.rvm.io | bash -s stable --ruby`
30
+
31
+
32
+ ### Usage
33
+
34
+ CRB-BLAST can be run from the command-line as a standalone program, or used as a library in your own code.
35
+
36
+ #### Command-line usage
37
+
38
+ CRB-BLAST can be run from the command line with:
39
+
40
+ ```
41
+ crb-blast
42
+ ```
43
+
44
+ The options are
45
+
46
+ ```
47
+ --query, -q <s>: query fasta file in nucleotide format
48
+ --target, -t <s>: target fasta file as nucleotide or protein
49
+ --evalue, -e <f>: e-value cut off for BLAST. Format 1e-5 (default: 1.0e-05)
50
+ --threads, -h <i>: number of threads to run BLAST with (default: 1)
51
+ --output, -o <s>: output file as tsv
52
+ --split, -s: split the fasta files into chunks and run multiple blast
53
+ jobs and then combine them.
54
+ --help, -l: Show this message
55
+ ```
56
+
57
+ An example command is:
58
+
59
+ ```bash
60
+ crb-blast --query assembly.fa --target reference_proteins.fa --threads 8 --output annotation.tsv
61
+ ```
62
+
63
+ #### Library usage
64
+
65
+ To include the gem in your code just `require 'crb-blast'`
66
+
67
+ A quick example:
68
+
69
+ ```ruby
70
+ blaster = CRB_Blast.new('test/query.fasta', 'test/target.fasta')
71
+ blaster.run(1e-5, 4, true) # to run with an evalue cutoff of 1e-5 and 4 threads
72
+ ```
73
+
74
+ A longer example with each step at a time:
75
+
76
+ ```ruby
77
+ blaster = CRB_Blast.new('test/query.fasta', 'test/target.fasta')
78
+ blaster.makedb
79
+ blaster.run_blast(1e-5, 6, true)
80
+ blaster.load_outputs
81
+ blaster.find_reciprocals
82
+ blaster.find_secondaries
83
+ ```
84
+
85
+ ### Getting help
86
+
87
+ Please use the issue tracker if you find bugs or have trouble running CRB-BLAST.
88
+
89
+ Chris Boursnell <cmb211@cam.ac.uk> maintains this software.
90
+
91
+ ### License
92
+
93
+ This is adademic software - please cite us if you use it in your work.
94
+
95
+ CRB-BLAST is released under the MIT license.
data/Rakefile ADDED
@@ -0,0 +1,8 @@
1
+ require 'rake/testtask'
2
+
3
+ Rake::TestTask.new do |t|
4
+ t.libs << 'test'
5
+ end
6
+
7
+ desc "Run tests"
8
+ task :default => :test
data/bin/crb-blast CHANGED
@@ -11,7 +11,7 @@ require 'bindeps'
11
11
  opts = Trollop::options do
12
12
  banner <<-EOS
13
13
 
14
- CRB-Blast v0.3 by Chris Boursnell <cmb211@cam.ac.uk>
14
+ CRB-Blast v0.3.2 by Chris Boursnell <cmb211@cam.ac.uk>
15
15
 
16
16
  Conditional Reciprocal Best BLAST
17
17
 
@@ -48,7 +48,7 @@ EOS
48
48
 
49
49
  opt :split,
50
50
  "split the fasta files into chunks and run multiple blast jobs and then"+
51
- "combine them."
51
+ " combine them."
52
52
  end
53
53
 
54
54
  Trollop::die :query, "must exist" if !File.exist?(opts[:query])
@@ -58,7 +58,7 @@ gem_dir = Gem.loaded_specs['crb-blast'].full_gem_path
58
58
  gem_deps = File.join(gem_dir, 'deps', 'deps.yaml')
59
59
  Bindeps.require gem_deps
60
60
 
61
- blaster = CRB_Blast.new(opts.query, opts.target)
61
+ blaster = CRB_Blast::CRB_Blast.new(opts.query, opts.target)
62
62
  dbs = blaster.makedb
63
63
  run = blaster.run_blast(opts.evalue, opts.threads, opts.split)
64
64
  load = blaster.load_outputs
data/build ADDED
@@ -0,0 +1 @@
1
+ gem build crb-blast.gemspec
data/crb-blast.gemspec ADDED
@@ -0,0 +1,26 @@
1
+ Gem::Specification.new do |gem|
2
+ gem.name = 'crb-blast'
3
+ gem.version = '0.4.0'
4
+ gem.date = '2014-07-23'
5
+ gem.summary = "Run conditional reciprocal best blast"
6
+ gem.description = "See summary"
7
+ gem.authors = ["Chris Boursnell", "Richard Smith-Unna"]
8
+ gem.email = 'cmb211@cam.ac.uk'
9
+ gem.files = `git ls-files`.split("\n")
10
+ gem.executables = ["crb-blast"]
11
+ gem.require_paths = %w( lib )
12
+ gem.homepage = 'http://rubygems.org/gems/crb-blast'
13
+ gem.license = 'MIT'
14
+
15
+ gem.add_dependency 'trollop', '~> 2.0'
16
+ gem.add_dependency 'bio', '~> 1.4', '>= 1.4.3'
17
+ gem.add_dependency 'which', '0.0.2'
18
+ gem.add_dependency 'threach', '~> 0.2', '>= 0.2.0'
19
+ gem.add_dependency 'bindeps', '~> 0.0', '>= 0.0.7'
20
+
21
+ gem.add_development_dependency 'rake', '~> 10.3', '>= 10.3.2'
22
+ gem.add_development_dependency 'turn', '~> 0.9', '>= 0.9.7'
23
+ gem.add_development_dependency 'simplecov', '~> 0.8', '>= 0.8.2'
24
+ gem.add_development_dependency 'shoulda-context', '~> 1.2', '>= 1.2.1'
25
+ gem.add_development_dependency 'coveralls', '~> 0.7'
26
+ end
data/deps/deps.yaml ADDED
@@ -0,0 +1,27 @@
1
+ blastplus:
2
+ binaries:
3
+ - makeblastdb
4
+ - blastn
5
+ - tblastn
6
+ - blastp
7
+ - blastx
8
+ - tblastx
9
+ - makembindex
10
+ - psiblast
11
+ - rpsblast
12
+ - blastdbcmd
13
+ - segmasker
14
+ - dustmasker
15
+ - blast_formatter
16
+ - windowmasker
17
+ - blastdb_aliastool
18
+ - deltablast
19
+ - rpstblastn
20
+ - blastdbcheck
21
+ version:
22
+ number: '2.2.29'
23
+ command: 'blastx -version'
24
+ url:
25
+ 64bit:
26
+ macosx: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.29/ncbi-blast-2.2.29+-universal-macosx.tar.gz
27
+ linux: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.29/ncbi-blast-2.2.29+-x64-linux.tar.gz
data/lib/crb-blast.rb CHANGED
@@ -1,447 +1,3 @@
1
- #!/usr/bin/env ruby
2
-
3
- require 'bio'
4
- require 'which'
5
- require 'hit'
6
- require 'threach'
7
-
8
- class Bio::FastaFormat
9
- def isNucl?
10
- Bio::Sequence.guess(self.seq, 0.9, 500) == Bio::Sequence::NA
11
- end
12
-
13
- def isProt?
14
- Bio::Sequence.guess(self.seq, 0.9, 500) == Bio::Sequence::AA
15
- end
16
- end
17
-
18
- class CRB_Blast
19
-
20
- include Which
21
-
22
- attr_accessor :query_name, :target_name, :reciprocals
23
- attr_accessor :missed
24
- attr_accessor :target_is_prot, :query_is_prot
25
- attr_accessor :query_results, :target_results, :working_dir
26
-
27
- def initialize query, target, output=nil
28
- @query = query
29
- @target = target
30
- if output.nil?
31
- #@working_dir = File.expand_path(File.dirname(query)) # no trailing /
32
- @working_dir = "."
33
- else
34
- @working_dir = File.expand_path(output)
35
- mkcmd = "mkdir #{@working_dir}"
36
- if !Dir.exist?(@working_dir)
37
- puts mkcmd
38
- `#{mkcmd}`
39
- end
40
- end
41
- @makedb_path = which('makeblastdb')
42
- raise 'makeblastdb was not in the PATH' if @makedb_path.empty?
43
- @blastn_path = which('blastn')
44
- raise 'blastn was not in the PATH' if @blastn_path.empty?
45
- @tblastn_path = which('tblastn')
46
- raise 'tblastn was not in the PATH' if @tblastn_path.empty?
47
- @blastx_path = which('blastx')
48
- raise 'blastx was not in the PATH' if @blastx_path.empty?
49
- @blastp_path = which('blastp')
50
- raise 'blastp was not in the PATH' if @blastp_path.empty?
51
- @makedb_path = @makedb_path.first
52
- @blastn_path = @blastn_path.first
53
- @tblastn_path = @tblastn_path.first
54
- @blastx_path = @blastx_path.first
55
- @blastp_path = @blastp_path.first
56
- end
57
-
58
- #
59
- # makes a blast database from the query and the target
60
- #
61
- def makedb
62
- # only scan the first few hundred entries
63
- n = 100
64
- # check if the query is a nucl or prot seq
65
- query_file = Bio::FastaFormat.open(@query)
66
- count_p=0
67
- count=0
68
- query_file.take(n).each do |entry|
69
- count_p += 1 if entry.isProt?
70
- count += 1
71
- end
72
- if count_p > count*0.9
73
- @query_is_prot = true
74
- else
75
- @query_is_prot = false
76
- end
77
-
78
- # check if the target is a nucl or prot seq
79
- target_file = Bio::FastaFormat.open(@target)
80
- count_p=0
81
- count=0
82
- target_file.take(n).each do |entry|
83
- count_p += 1 if entry.isProt?
84
- count += 1
85
- end
86
- if count_p > count*0.9
87
- @target_is_prot = true
88
- else
89
- @target_is_prot = false
90
- end
91
- # construct the output database names
92
- @query_name = File.basename(@query).split('.')[0..-2].join('.')
93
- @target_name = File.basename(@target).split('.')[0..-2].join('.')
94
-
95
- # check if the databases already exist in @working_dir
96
- make_query_db_cmd = "#{@makedb_path} -in #{@query}"
97
- make_query_db_cmd << " -dbtype nucl " if !@query_is_prot
98
- make_query_db_cmd << " -dbtype prot " if @query_is_prot
99
- make_query_db_cmd << " -title #{query_name} "
100
- make_query_db_cmd << " -out #{@working_dir}/#{query_name}"
101
- db_query = "#{query_name}.nsq" if !@query_is_prot
102
- db_query = "#{query_name}.psq" if @query_is_prot
103
- if !File.exists?("#{@working_dir}/#{db_query}")
104
- `#{make_query_db_cmd}`
105
- end
106
-
107
- make_target_db_cmd = "#{@makedb_path} -in #{@target}"
108
- make_target_db_cmd << " -dbtype nucl " if !@target_is_prot
109
- make_target_db_cmd << " -dbtype prot " if @target_is_prot
110
- make_target_db_cmd << " -title #{target_name} "
111
- make_target_db_cmd << " -out #{@working_dir}/#{target_name}"
112
-
113
- db_target = "#{target_name}.nsq" if !@target_is_prot
114
- db_target = "#{target_name}.psq" if @target_is_prot
115
- if !File.exists?("#{@working_dir}/#{db_target}")
116
- `#{make_target_db_cmd}`
117
- end
118
- @databases = true
119
- [@query_name, @target_name]
120
- end
121
-
122
- def run_blast(evalue, threads, split)
123
- if @databases
124
- @output1 = "#{@working_dir}/#{query_name}_into_#{target_name}.1.blast"
125
- @output2 = "#{@working_dir}/#{target_name}_into_#{query_name}.2.blast"
126
- if @query_is_prot
127
- if @target_is_prot
128
- bin1 = "#{@blastp_path} "
129
- bin2 = "#{@blastp_path} "
130
- else
131
- bin1 = "#{@tblastn_path} "
132
- bin2 = "#{@blastx_path} "
133
- end
134
- else
135
- if @target_is_prot
136
- bin1 = "#{@blastx_path} "
137
- bin2 = "#{@tblastn_path} "
138
- else
139
- bin1 = "#{@blastn_path} "
140
- bin2 = "#{@blastn_path} "
141
- end
142
- end
143
- if split and threads > 1
144
- run_blast_with_splitting evalue, threads, bin1, bin2
145
- else
146
- run_blast_with_threads evalue, threads, bin1, bin2
147
- end
148
- return true
149
- else
150
- return false
151
- end
152
- end
153
-
154
- def run_blast_with_threads evalue, threads, bin1, bin2
155
- # puts "running blast with #{threads} threads"
156
- cmd1 = "#{bin1} -query #{@query} -db #{@working_dir}/#{@target_name} "
157
- cmd1 << " -out #{@output1} -evalue #{evalue} "
158
- cmd1 << " -outfmt \"6 std qlen slen\" "
159
- cmd1 << " -max_target_seqs 50 "
160
- cmd1 << " -num_threads #{threads}"
161
-
162
- cmd2 = "#{bin2} -query #{@target} -db #{@working_dir}/#{@query_name} "
163
- cmd2 << " -out #{@output2} -evalue #{evalue} "
164
- cmd2 << " -outfmt \"6 std qlen slen\" "
165
- cmd2 << " -max_target_seqs 50 "
166
- cmd2 << " -num_threads #{threads}"
167
- if !File.exist?("#{@output1}")
168
- `#{cmd1}`
169
- end
170
-
171
- if !File.exist?("#{@output2}")
172
- `#{cmd2}`
173
- end
174
- end
175
-
176
- def run_blast_with_splitting evalue, threads, bin1, bin2
177
- # puts "running blast by splitting input into #{threads} pieces"
178
- blasts=[]
179
- files = split_input(@query, threads)
180
- files.threach(threads) do |thread|
181
- cmd1 = "#{bin1} -query #{thread} -db #{@working_dir}/#{@target_name} "
182
- cmd1 << " -out #{thread}.blast -evalue #{evalue} "
183
- cmd1 << " -outfmt \"6 std qlen slen\" "
184
- cmd1 << " -max_target_seqs 50 "
185
- cmd1 << " -num_threads 1"
186
- if !File.exists?("#{thread}.blast")
187
- `#{cmd1}`
188
- end
189
- blasts << "#{thread}.blast"
190
- end
191
- cat_cmd = "cat "
192
- cat_cmd << blasts.join(" ")
193
- cat_cmd << " > #{@output1}"
194
- `#{cat_cmd}`
195
- files.each do |file|
196
- File.delete(file) if File.exist?(file)
197
- end
198
- blasts.each do |b|
199
- File.delete(b) # delete intermediate blast output files
200
- end
201
-
202
- blasts=[]
203
- files = split_input(@target, threads)
204
- files.threach(threads) do |thread|
205
- cmd2 = "#{bin2} -query #{thread} -db #{@working_dir}/#{@query_name} "
206
- cmd2 << " -out #{thread}.blast -evalue #{evalue} "
207
- cmd2 << " -outfmt \"6 std qlen slen\" "
208
- cmd2 << " -max_target_seqs 50 "
209
- cmd2 << " -num_threads 1"
210
- if !File.exists?("#{thread}.blast")
211
- `#{cmd2}`
212
- end
213
- blasts << "#{thread}.blast"
214
- end
215
- cat_cmd = "cat "
216
- cat_cmd << blasts.join(" ")
217
- cat_cmd << " > #{@output2}"
218
- `#{cat_cmd}`
219
- files.each do |file|
220
- File.delete(file) if File.exist?(file)
221
- end
222
- blasts.each do |b|
223
- File.delete(b) # delete intermediate blast output files
224
- end
225
-
226
- end
227
-
228
- def split_input filename, pieces
229
- input = {}
230
- name = nil
231
- seq=""
232
- File.open(filename).each_line do |line|
233
- if line =~ /^>(.*)$/
234
- if name
235
- input[name]=seq
236
- seq=""
237
- end
238
- name = $1
239
- else
240
- seq << line.chomp
241
- end
242
- end
243
- input[name]=seq
244
- # construct list of output file handles
245
- outputs=[]
246
- output_files=[]
247
- pieces.times do |n|
248
- outfile = "#{filename}_chunk_#{n}.fasta"
249
- outfile = File.expand_path(outfile)
250
- outputs[n] = File.open("#{outfile}", "w")
251
- output_files[n] = "#{outfile}"
252
- end
253
- # write sequences
254
- count=0
255
- input.each_pair do |name, seq|
256
- outputs[count].write(">#{name}\n")
257
- outputs[count].write("#{seq}\n")
258
- count += 1
259
- count %= pieces
260
- end
261
- outputs.each do |out|
262
- out.close
263
- end
264
- output_files
265
- end
266
-
267
- def load_outputs
268
- if File.exist?("#{@working_dir}/reciprocal_hits.txt")
269
- # puts "reciprocal output already exists"
270
- else
271
- @query_results = Hash.new
272
- @target_results = Hash.new
273
- q_count=0
274
- t_count=0
275
- if !File.exists?("#{@output1}")
276
- raise RuntimeError.new("can't find #{@output1}")
277
- end
278
- if !File.exists?("#{@output2}")
279
- raise RuntimeError.new("can't find #{@output2}")
280
- end
281
- if File.exists?("#{@output1}") and File.exists?("#{@output2}")
282
- File.open("#{@output1}").each_line do |line|
283
- cols = line.chomp.split("\t")
284
- hit = Hit.new(cols)
285
- @query_results[hit.query] = [] if !@query_results.has_key?(hit.query)
286
- @query_results[hit.query] << hit
287
- q_count += 1
288
- end
289
- File.open("#{@output2}").each_line do |line|
290
- cols = line.chomp.split("\t")
291
- hit = Hit.new(cols)
292
- @target_results[hit.query] = [] if !@target_results.has_key?(hit.query)
293
- @target_results[hit.query] << hit
294
- t_count += 1
295
- end
296
- else
297
- raise "need to run blast first"
298
- end
299
- end
300
- [q_count, t_count]
301
- end
302
-
303
- # fills @reciprocals with strict reciprocal hits from the blast results
304
- def find_reciprocals
305
- if File.exist?("#{@working_dir}/reciprocal_hits.txt")
306
- # puts "reciprocal output already exists"
307
- else
308
- @reciprocals = Hash.new
309
- @missed = Hash.new
310
- @evalues = []
311
- @longest = 0
312
- hits = 0
313
- @query_results.each_pair do |query_id, list_of_hits|
314
- list_of_hits.each_with_index do |target_hit, query_index|
315
- if @target_results.has_key?(target_hit.target)
316
- list_of_hits_2 = @target_results[target_hit.target]
317
- list_of_hits_2.each_with_index do |query_hit2, target_index|
318
- if query_index == 0 && target_index == 0 &&
319
- query_id == query_hit2.target
320
- e = target_hit.evalue.to_f
321
- e = 1e-200 if e==0
322
- e = -Math.log10(e)
323
- if !@reciprocals.key?(query_id)
324
- @reciprocals[query_id] = []
325
- end
326
- @reciprocals[query_id] << target_hit
327
- hits += 1
328
- @longest = target_hit.alnlen if target_hit.alnlen > @longest
329
- @evalues << {:e => e, :length => target_hit.alnlen}
330
- elsif query_id == query_hit2.target
331
- if !@missed.key?(query_id)
332
- @missed[query_id] = []
333
- end
334
- @missed[query_id] << target_hit
335
- end
336
- end
337
- end
338
- end
339
- end
340
- end
341
- return hits
342
- end
343
-
344
- def find_secondaries
345
-
346
- if File.exist?("#{@working_dir}/reciprocal_hits.txt")
347
- # puts "reciprocal output already exists"
348
- else
349
- length_hash = Hash.new
350
- fitting = Hash.new
351
- @evalues.each do |h|
352
- length_hash[h[:length]] = [] if !length_hash.key?(h[:length])
353
- length_hash[h[:length]] << h
354
- end
355
-
356
- (10..@longest).each do |centre|
357
- e = 0
358
- count = 0
359
- s = centre*0.1
360
- s = s.to_i
361
- s = 5 if s < 5
362
- (-s..s).each do |side|
363
- if length_hash.has_key?(centre+side)
364
- length_hash[centre+side].each do |point|
365
- e += point[:e]
366
- count += 1
367
- end
368
- end
369
- end
370
- if count>0
371
- mean = e/count
372
- fitting[centre] = mean
373
- end
374
- end
375
- hits = 0
376
- @missed.each_pair do |id, list|
377
- list.each do |hit|
378
- l = hit.alnlen.to_i
379
- e = hit.evalue
380
- e = 1e-200 if e==0
381
- e = -Math.log10(e)
382
- if fitting.has_key?(l)
383
- if e >= fitting[l]
384
- if !@reciprocals.key?(id)
385
- @reciprocals[id] = []
386
- found=false
387
- @reciprocals[id].each do |existing_hit|
388
- if existing_hit.query == hit.query &&
389
- existing_hit.target == hit.target
390
- found=true
391
- end
392
- end
393
- if !found
394
- @reciprocals[id] << hit
395
- hits += 1
396
- end
397
- end
398
- end
399
- end
400
- end
401
- end
402
- end
403
- return hits
404
- end
405
-
406
- def clear_memory
407
- # running lots of jobs at the same time was keeping a lot of stuff in
408
- # memory that you might not want so this empties out those big hashes.
409
- @query_results = nil
410
- @target_results = nil
411
- end
412
-
413
- def run evalue, threads, split
414
- makedb
415
- run_blast evalue, threads, split
416
- load_outputs
417
- find_reciprocals
418
- find_secondaries
419
- end
420
-
421
- def size
422
- hits=0
423
- @reciprocals.each_pair do |key, list|
424
- list.each do |hit|
425
- hits += 1
426
- end
427
- end
428
- hits
429
- end
430
-
431
- def write_output
432
- s=""
433
- unless @reciprocals.nil?
434
- @reciprocals.each_pair do |query_id, hits|
435
- hits.each do |hit|
436
- s << "#{hit}\n"
437
- end
438
- end
439
- File.open("#{@working_dir}/reciprocal_hits.txt", "w") {|f| f.write s }
440
- end
441
- end
442
-
443
- def has_reciprocal? contig
444
- return true if @reciprocals.has_key?(contig)
445
- return false
446
- end
447
- end
1
+ require 'crb-blast/cmd'
2
+ require 'crb-blast/hit'
3
+ require 'crb-blast/crb-blast'