npsearch 0.0.1

@@ -0,0 +1,68 @@
1
+ # NeuroPeptideSearch (NpSearch)
2
+ [![Gem Version](https://badge.fury.io/rb/NpSearch.svg)](http://badge.fury.io/rb/NpSearch)
3
+ [![Build Status](https://travis-ci.org/IsmailM/NeuroPeptideSearch.svg?branch=master)](https://travis-ci.org/IsmailM/NeuroPeptideSearch)
4
+ [![Dependency Status](https://gemnasium.com/IsmailM/NeuroPeptideSearch.svg)](https://gemnasium.com/IsmailM/NeuroPeptideSearch)
5
+ [![Inline docs](http://inch-ci.org/github/IsmailM/NeuroPeptideSearch.png?branch=master)](http://inch-ci.org/github/IsmailM/NeuroPeptideSearch)
6
+
7
+ > A tool to identify novel neuropeptides.
8
+
9
+ NpSearch (NeuroPeptideSearch) is a program that searches for potential neuropeptide precursors based on motifs commonly found in neuropeptides. Ideally, the input is transcriptome or protein data, since there are no introns to worry about and the signal peptide is attached to the front of the precursor.
10
+
11
+ Currently, the program produces a long list of sequences that fulfil all the requirements to be a potential neuropeptide precursor. This list needs further manual analysis to pick out the likely neuropeptides. Future versions of the program will automatically analyse the output file and extract a list of highly likely neuropeptides.
12
+
13
+ NpSearch produces a number of files. The final output is written both as a FASTA file and as a colour-coded HTML file that can be opened in any web browser or even in a word processor.
14
+
15
+ Note: For this program to work, you will need to obtain a copy of SignalP 4.1 from CBS at "http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp" and link it to the program. Alternatively, you can supply a text file of SignalP output as input to the program.
16
+
17
+ ** Currently only supported on Mac OS & Linux
18
+
19
+ If you use this program, please cite us:
20
+
21
+ Moghul I, Rowe M, Priyam A, Elphick M & Wurm Y <em>(in prep)</em> NpSearch: A Tool to Identify Novel Neuropeptides
22
+
23
+ ## Installation
24
+
25
+ 1. Open a terminal and run:
26
+ ```
27
+ $ gem install npsearch
28
+ ```
29
+ ## Usage
30
+
31
+ * Usage: npsearch [Options] -i [Input File] -o [Output Folder Name]
32
+
33
+ * Mandatory Options:
34
+
35
+ -i, --input [file] The input file (in fasta format). Can be a relative or a full
36
+ path.
37
+ -o, --output [folder name] The path to the output folder. This will be created if the
38
+ folder does not exist.
39
+
40
+ * Optional Options:
41
+ -m, --motif [Query Motif] By default NpSearch only searches for dibasic cleavage site
42
+ ("KR", "RR" or "KK"). This option allows one to change the
43
+ set of cleavage sites to be searched.
44
+ The period "." can be used to denote any character. Multiple
45
+ motifs query can be used by using a pipeline character ("|")
46
+ between each query and putting the motif query in speech marks
47
+ e.g. "KR|RR|R..R"
48
+ Advanced Users: Regular expressions are supported.
49
+ -c, --cut_off N Changes the minimum Open Reading
50
+ Frame from the default 10 amino acid residues to N amino acid
51
+ residues.
52
+ -s, --signalp_file [file] Is used to supply the signal peptide results to the program.
53
+ These signal peptide results must be created using the SignalP
54
+ program (Version 4.x), downloadable from CBS. If this argument
55
+ isn't supplied, then NpSearch will try to run a local version
56
+ of the Signal P script.
57
+ -e, --extract_orf Only extracts the Open Reading Frames.
58
+ -v, --verbose Provides more information on each step taken in this program.
59
+ -h, --help Display this screen
60
+ --version Shows version
61
+
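As a rough illustration of the `-m` option described above, the sketch below treats a motif string such as "KR|RR|R..R" as an ordinary Ruby regular expression and lists where it matches in a peptide. The helper name `motif_hits` and the sample sequence are made up for this example and are not part of NpSearch; the tool's own matching logic may differ.

```ruby
# Hypothetical helper (not part of NpSearch): returns [position, site] pairs
# for every place the motif matches, including overlapping sites.
def motif_hits(sequence, motif)
  pattern = Regexp.new(motif)
  hits = []
  offset = 0
  while (m = pattern.match(sequence, offset))
    hits << [m.begin(0), m[0]]
    offset = m.begin(0) + 1 # advance by one residue so overlaps are reported
  end
  hits
end

# Made-up peptide; "." in the motif stands for any residue.
seq = 'MKTAKRGGRAARLL'
motif_hits(seq, 'KR|RR|R..R').each do |position, site|
  puts "#{position}\t#{site}"
end
```

Stepping the search offset by one residue, rather than jumping past each hit, is a deliberate choice here: cleavage sites can overlap, and a plain `String#scan` would skip the second site of an overlapping pair.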
62
+ ## Contributing
63
+
64
+ 1. Fork it
65
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
66
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
67
+ 4. Push to the branch (`git push origin my-new-feature`)
68
+ 5. Create new Pull Request
@@ -0,0 +1,14 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ task default: [:build]
5
+ desc 'Installs the ruby gem'
6
+ task :build do
7
+ exec("gem build np_search.gemspec && gem install ./NpSearch-#{NpSearch::VERSION}.gem")
8
+ end
9
+
10
+ task :test do
11
+ Rake::TestTask.new do |t|
12
+ t.pattern = 'test/test_np_search.rb'
13
+ end
14
+ end
@@ -0,0 +1,165 @@
1
+ #!/usr/bin/env ruby
2
+ require 'optparse'
3
+
4
+ require 'npsearch'
5
+ require 'npsearch/arg_validator'
6
+ require 'npsearch/version'
7
+
8
+ opt = {}
9
+ optparse = OptionParser.new do |opts|
10
+ opts.banner = <<Banner
11
+
12
+ * Usage: npsearch [Options] -i [Input File] -o [Output Folder Name]
13
+
14
+ * Mandatory Options:
15
+
16
+ Banner
17
+
18
+ opt[:input_file] = nil
19
+ opts.on('-i', '--input [file]', 'Path to the input fasta file') do |f|
20
+ opt[:input_file] = f
21
+ end
22
+
23
+ opts.separator ''
24
+ opts.separator '* Optional Options:'
25
+
26
+ opt[:motif] = 'KR|RR|KK'
27
+ opts.on('-m', '--motif [Query Motif]', 'By default NpSearch only searches',
28
+ ' for dibasic cleavage site ("KR", "RR" or "KK"). This option allows',
29
+ ' one to change the set of cleavage sites to be searched.',
30
+ ' The period "." can be used to denote any character. Multiple',
31
+ ' motifs query can be used by using a pipeline character ("|")',
32
+ ' between each query and putting the motif query in speech marks',
33
+ ' e.g. "KR|RR|R..R"',
34
+ ' Advanced Users: Regular expressions are supported.') do |motif|
35
+ opt[:motif] = motif
36
+ end
37
+
38
+ opt[:cut_off] = 10
39
+ opts.on('-c', '--cut_off N', Integer, 'Changes the minimum Open Reading',
40
+ ' Frame from the default 10 amino acid residues to N amino acid',
41
+ ' residues.') do |n|
42
+ opt[:cut_off] = n
43
+ end
44
+
45
+ opt[:signalp_file] = nil
46
+ opts.on('-s', '--signalp_file [file]',
47
+ 'Is used to supply the signal peptide results to the program. These',
48
+ ' signal peptide results must be created using the SignalP program',
49
+ " (Version 4.x), downloadable from CBS. If this argument isn't ",
50
+ ' supplied, then NpSearch will try to run a local version of the',
51
+ ' Signal P script.') do |signalp_file|
52
+ opt[:signalp_file] = signalp_file
53
+ end
54
+
55
+ opt[:extract_orf] = false
56
+ opts.on('-e', '--extract_orf', 'Only extracts the Open Reading Frames.') do
57
+ opt[:extract_orf] = true
58
+ end
59
+
60
+ opt[:verbose] = false
61
+ opts.on('-v', '--verbose', 'Provides more information on each step taken',
62
+ ' in this program.') do
63
+ opt[:verbose] = true
64
+ end
65
+
66
+ opts.on('-h', '--help', 'Display this screen') do
67
+ puts opts
68
+ exit
69
+ end
70
+
71
+ opts.on('--version', 'Shows version') do
72
+ puts NpSearch::VERSION
73
+ exit
74
+ end
75
+ end
76
+ optparse.parse!
77
+
78
+ NpSearch.init(opt)
79
+ NpSearch.run
80
+
81
+
82
+
83
+
84
+ # ############# Argument Validation...##############
85
+ # arg_vldr = NpSearch::ArgValidators.new(opt[:verbose])
86
+ # input_type = arg_vldr.arg(opt[:motif], opt[:input], opt[:output_dir],
87
+ # opt[:cut_off], opt[:extract_orf], opt[:signalp_file],
88
+ # optparse.help)
89
+
90
+ # ############# General Validation...##############
91
+ # vldr = NpSearch::Validators.new
92
+ # vldr.output_dir(opt[:output_dir])
93
+ # if opt[:signalp_file].nil? && opt[:extract_orf] == false
94
+ # sp_dir = vldr.signalp_dir
95
+ # end
96
+
97
+ # ############# Converting input file to Bio::FastaFormat. #############
98
+ # input_read = NpSearch::Input.read(opt[:input], input_type)
99
+
100
+ # ############# Extract_ORF #############
101
+ # if input_type == 'genetic'
102
+ # # Translate Sequences in all 6 frames
103
+ # translated = NpSearch::Translation.translate(input_read)
104
+ # translated.to_fasta('translated seq.', "#{opt[:output_dir]}/1_protein.fa")
105
+ # # Extract all possible ORF that are longer than the ORF_min_length
106
+ # orf = NpSearch::Translation.extract_orf(translated, opt[:cut_off])
107
+ # orf.to_fasta('Open Reading Frames', "#{opt[:output_dir]}/2_orf.fa")
108
+
109
+ # if opt[:extract_orf]
110
+ # puts "\nSuccess: All output files created in the directory:" \
111
+ # "#{opt[:output_dir]}'.\n "
112
+ # exit
113
+ # end
114
+ # end
115
+
116
+ # ############# Setting up more variables...##############
117
+ # if opt[:motif] == 'neuro_clv'
118
+ # motif = 'KK|KR|RR|' \
119
+ # 'R..R|R....R|R......R|H..R|H....R|H......R|K..R|K....R|K......R'
120
+ # else
121
+ # motif = opt[:motif]
122
+ # end
123
+ # vldr.motif_type(motif)
124
+
125
+ # if input_type == 'genetic'
126
+ # sp_input_file = "#{opt[:output_dir]}/2_orf.fa"
127
+ # sp_hash = orf
128
+ # file_number = 3
129
+ # else # i.e. if the input is protein
130
+ # sp_input_file = opt[:input]
131
+ # sp_hash = input_read
132
+ # file_number = 1
133
+ # end
134
+
135
+ # if opt[:signalp_file].nil?
136
+ # sp_out_file = "#{opt[:output_dir]}/#{file_number}_signalp_out.txt"
137
+ # file_number += 1
138
+ # NpSearch::Signalp.signalp(sp_dir, sp_input_file, sp_out_file)
139
+ # else
140
+ # sp_out_file = opt[:signalp_file]
141
+ # file_number = 1
142
+ # end
143
+
144
+ # ############# Signal P Results file Validation #############
145
+ # vldr.sp_results(sp_out_file)
146
+
147
+ # ############# Extract sequences with a signal peptide #############
148
+ # secretome = NpSearch::Analysis.parse(sp_out_file, sp_hash, motif)
149
+ # secretome.to_fasta('secretome file',
150
+ # "#{opt[:output_dir]}/#{file_number}_secretome.fa")
151
+ # file_number += 1
152
+
153
+ # ############# Remove any duplicate data #############
154
+ # flattened_seq = NpSearch::Analysis.flattener(secretome)
155
+
156
+ # ############# Creating Output Files #############
157
+ # flattened_seq.to_fasta('fasta output file',
158
+ # "#{opt[:output_dir]}/#{file_number}_output.fa")
159
+ # flattened_seq.to_html(motif,
160
+ # "#{opt[:output_dir]}/#{file_number}_output.html")
161
+
162
+ # ############# Success #############
163
+ # puts # a blank line.
164
+ # puts "Success: All output files created in the directory:'#{opt[:output_dir]}'."
165
+ # puts # a blank line
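The banner in this script lists `-o` as a mandatory option, but the active parser only registers `-i`. A minimal sketch of how both options could be declared and checked with `OptionParser` follows. The method name `parse_cli` is hypothetical, and the `:output_dir` key is borrowed from the commented-out code above rather than from the live code path, so treat this as one possible shape, not the gem's behaviour.

```ruby
require 'optparse'

# Hypothetical wrapper (not part of NpSearch): parses argv and insists that
# both the input file and the output folder were given.
def parse_cli(argv)
  opt = { input_file: nil, output_dir: nil }
  parser = OptionParser.new do |opts|
    opts.on('-i', '--input FILE', 'Path to the input fasta file') do |file|
      opt[:input_file] = file
    end
    opts.on('-o', '--output DIR', 'Path to the output folder') do |dir|
      opt[:output_dir] = dir
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched

  missing = opt.select { |_, value| value.nil? }.keys
  raise OptionParser::MissingArgument, missing.join(', ') unless missing.empty?
  opt
end

p parse_cli(%w[-i transcriptome.fa -o results])
```

Writing the placeholder as `FILE` rather than `[file]` tells `OptionParser` that the value is required, so `-i` with nothing after it is rejected at parse time.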
@@ -0,0 +1,96 @@
1
+ require 'bio'
2
+ require 'fileutils'
3
+
4
+ # require 'npsearch/arg_validator'
5
+ require 'npsearch/logger'
6
+ require 'npsearch/sequence'
7
+ require 'npsearch/signalp'
8
+ require 'npsearch/pool'
9
+
10
+ # Top level module / namespace.
11
+ module NpSearch
12
+ class <<self
13
+ MIN_ORF_SIZE = 40 # amino acids (including potential signal peptide)
14
+
15
+ attr_accessor :opt
16
+ attr_accessor :sequences
17
+
18
+ def logger
19
+ @logger ||= Logger.new(STDERR, @opt[:verbose])
20
+ end
21
+
22
+ def init(opt)
23
+ # @opt = args_validation(opt)
24
+ @opt = opt
25
+ @sequences = []
26
+ @opt[:num_threads] = 8
27
+ @opt[:type] = guess_sequence_type
28
+ @opt[:signalp_path] = '/Volumes/Data/programs/signalp-4.1/signalp'
29
+ @pool = Pool.new(@opt[:num_threads]) if @opt[:num_threads] > 1
30
+ end
31
+
32
+ def run
33
+ iterate_input_file
34
+ # score_sequence
35
+ # scan(?<=(KR|RR|KK))(\w+?)(?=(KR|RR|KK|$))
36
+ @sequences.each { |s| puts ">#{s.id}\n#{s.seq}" }
37
+ end
38
+
39
+ private
40
+
41
+ def iterate_input_file
42
+ biofastafile = Bio::FlatFile.open(Bio::FastaFormat, @opt[:input_file])
43
+ biofastafile.each_entry do |entry|
44
+ if @opt[:num_threads] > 1
45
+ @pool.schedule(entry) { |e| initialise_seqs(e) }
46
+ else
47
+ initialise_seqs(entry)
48
+ end
49
+ end
50
+ @pool.shutdown if @opt[:num_threads] > 1
51
+ end
52
+
53
+ def initialise_seqs(entry)
54
+ if @opt[:type] == :protein
55
+ initialise_protein_seq(entry.entry_id, entry.aaseq)
56
+ else
57
+ initialise_transcriptomic_seq(entry.entry_id, entry.naseq)
58
+ end
59
+ end
60
+
61
+ def initialise_protein_seq(id, seq)
62
+ sp = Signalp.analyse_sequence(seq)
63
+ @sequences << Sequence.new(id, seq, sp) if sp[:sp] == 'Y'
64
+ end
65
+
66
+ def initialise_transcriptomic_seq(id, naseq)
67
+ (1..6).each do |f|
68
+ translated_seq = naseq.translate(f)
69
+ orfs = translated_seq.to_s.scan(/(?=(M\w{#{MIN_ORF_SIZE},}))./).flatten
70
+ initialise_orfs(id, orfs, f)
71
+ end
72
+ end
73
+
74
+ def initialise_orfs(id, orfs, frame)
75
+ idx = 0
76
+ orfs.each do |orf|
77
+ sp = Signalp.analyse_sequence(orf)
78
+ next if sp[:sp] == 'N'
79
+ seq = Sequence.new(id, orf, sp)
80
+ seq.translated_frame = frame
81
+ seq.orf_index = idx
82
+ @sequences << seq
83
+ idx += 1
84
+ end
85
+ end
86
+
87
+ def guess_sequence_type
88
+ fasta_content = IO.binread(@opt[:input_file])
89
+ # removing non-letter and ambiguous characters
90
+ cleaned_sequence = fasta_content.gsub(/[^A-Z]|[NX]/i, '')
91
+ return nil if cleaned_sequence.length < 10 # conservative
92
+ type = Bio::Sequence.new(cleaned_sequence).guess(0.9)
93
+ (type == Bio::Sequence::NA) ? :nucleotide : :protein
94
+ end
95
+ end
96
+ end
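The ORF extraction in `initialise_transcriptomic_seq` relies on a zero-width lookahead so that every methionine can start its own candidate, even when candidates overlap. The sketch below isolates that regular expression on a made-up translated sequence. The method name `candidate_orfs` and the short minimum length are assumptions chosen to keep the example readable; the code above uses `MIN_ORF_SIZE = 40`.

```ruby
# Hypothetical stand-alone version of the scan used above.
# The lookahead captures "M" plus at least min_len word characters without
# consuming them; the trailing "." then advances the scan by one residue.
def candidate_orfs(protein, min_len)
  protein.scan(/(?=(M\w{#{min_len},}))./).flatten
end

# "*" marks a stop codon in the translated frame and is not matched by \w,
# so every candidate ends just before the next stop.
translated = 'AAMKLLMPQRS*MAA'
puts candidate_orfs(translated, 5).inspect
puts candidate_orfs(translated, 4).inspect
```

With the lower threshold the second methionine also yields a candidate that is nested inside the first, which is what the lookahead is there for.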
@@ -0,0 +1,264 @@
1
+ module NpSearch
2
+ class ArgValidators
3
+
4
+
5
+ # Changes the logger level to output extra info when the verbose option is
6
+ # true.
7
+ def initialize(verbose_opt)
8
+ LOG.level = Logger::INFO if verbose_opt == true
9
+ end
10
+
11
+ # Runs all the arguments method...
12
+ def arg(motif, input, output_dir, orf_min_length, extract_orf,
13
+ signalp_file, help_banner)
14
+ comp_arg(input, motif, output_dir, extract_orf, help_banner)
15
+ input_type = guess_input_type(input)
16
+ extract_orf_conflict(input_type, extract_orf)
17
+ input_sp_file_conflict(input_type, signalp_file)
18
+ orf_min_length(orf_min_length)
19
+ input_type
20
+ end
21
+
22
+ # Ensures that the compulsory input arguments are supplied...
23
+ def comp_arg(input, motif, output_dir, extract_orf, help_banner)
24
+ comp_arg_error(motif, 'Query Motif ("-m" option)') if extract_orf == false
25
+ comp_arg_error(input, 'Input file ("-i option")')
26
+ comp_arg_error(output_dir, 'Output Folder ("-o" option)')
27
+ return unless input.nil? || (motif.nil? && extract_orf == false)
28
+ puts help_banner
29
+ exit
30
+ end
31
+
32
+ # Ensures that a message is provided for all missing compulsory args.
33
+ # Run from comp_arg method
34
+ def comp_arg_error(arg, message)
35
+ puts 'Usage Error: No ' + message + ' is supplied' if arg.nil?
36
+ end
37
+
38
+ # Guesses the type of data within the input file on the first 100 lines of
39
+ # the file (ignoring all identifiers, i.e. lines that start with a '>').
40
+ # It has a 80% threshold.
41
+ def guess_input_type(input_file)
42
+ input_file_format(input_file)
43
+ sequences = []
44
+ File.open(input_file, 'r') do |file_stream|
45
+ file_stream.readlines[0..100].each do |line|
46
+ sequences << line.to_s unless line.match(/^>/)
47
+ end
48
+ end
49
+ type = Bio::Sequence.new(sequences.map(&:strip).join).guess(0.8)
50
+ if type == Bio::Sequence::NA
51
+ input_type = 'genetic'
52
+ elsif type == Bio::Sequence::AA
53
+ input_type = 'protein'
54
+ end
55
+ input_type
56
+ end
57
+
58
+ # Ensures that the input file a) exists b) is not empty and c) is a fasta
59
+ # file. Run from the guess_input_type method.
60
+ def input_file_format(input_file)
61
+ unless File.exist?(input_file)
62
+ fail ArgumentError.new("Critical Error: The input file '#{input_file}'" \
63
+ ' does not exist.')
64
+ end
65
+ if File.zero?(input_file)
66
+ fail ArgumentError.new("Critical Error: The input file '#{input_file}'" \
67
+ ' is empty.')
68
+ end
69
+ unless File.probably_fasta?(input_file)
70
+ fail ArgumentError.new("Critical Error: The input file '#{input_file}'" \
71
+ ' does not seem to be in fasta format. Only' \
72
+ ' input files in fasta format are supported.')
73
+ end
74
+ end
75
+
76
+ # Ensures that the extract_orf option is only used with genetic data.
77
+ def extract_orf_conflict(input_type, extract_orf)
78
+ return unless input_type == 'protein' && extract_orf == true
79
+ fail ArgumentError.new('Usage Error: Conflicting arguments detected:' \
80
+ ' Protein data detected within the input file,' \
81
+ ' when using the Extract_ORF option (option' \
82
+ ' "-e"). This option is only available when' \
83
+ ' input file contains genetic data.')
84
+ end
85
+
86
+ # Ensures that the protein data (or open reading frames) are supplied as
87
+ # the input file when the signal p output file is passed.
88
+ def input_sp_file_conflict(input_type, signalp_file)
89
+ return unless input_type == 'genetic' && !signalp_file.nil?
90
+ fail ArgumentError.new('Usage Error: Conflicting arguments detected' \
91
+ ': Genetic data detected within the input file' \
92
+ ' when using the Signal P Input Option (Option' \
93
+ ' "-s"). The Signal P input Option requires the' \
94
+ ' input of two files: the Signal P Script Result' \
95
+ ' files (at the "-s" option) and the protein' \
96
+ ' data file used to run the Signal P Script.')
97
+ end
98
+
99
+ # Ensures that the ORF minimum length is a number. Any digits after the
100
+ # decimal place are ignored.
101
+ def orf_min_length(orf_min_length)
102
+ return unless orf_min_length.to_i < 1
103
+ fail ArgumentError.new('Usage Error: The Open Reading Frames minimum' \
104
+ ' length can only be a full integer.')
105
+ end
106
+ end
107
+
108
+ class Validators
109
+ # Checks for the presence of the output directory; if not found, it asks
110
+ # the user whether they want to create the output directory.
111
+ def output_dir(output_dir)
112
+ unless File.directory? output_dir # If output_dir doesn't exist
113
+ fail IOError, "\n\nThe output directory does not exist\n\n"
114
+ end
115
+ rescue IOError
116
+ puts # a blank line
117
+ puts 'The output directory does not exist.'
118
+ puts # a blank line
119
+ puts "The directory '#{output_dir}' will be created in this location."
120
+ puts 'Do you want to continue? [y/n]'
121
+ print '> '
122
+ inp = $stdin.gets.chomp
123
+ until inp.downcase == 'n' || inp.downcase == 'y' || inp == ''
124
+ puts # a blank line
125
+ puts "The input: '#{inp}' is not recognised - 'y' or 'n' are the" \
126
+ ' only recognisable inputs.'
127
+ puts 'Please try again.'
128
+ puts "The directory '#{output_dir}' will be created in this" \
129
+ ' location.'
130
+ puts 'Do you want to continue? [y/n]'
131
+ print '> '
132
+ inp = $stdin.gets.chomp
133
+ end
134
+ if inp.downcase == 'y' || inp == ''
135
+ FileUtils.mkdir_p "#{output_dir}"
136
+ puts 'Created output directory...'
137
+ elsif inp.downcase == 'n'
138
+ raise ArgumentError.new('Critical Error: An output directory is' \
139
+ ' required; please create an output directory' \
140
+ ' and then try again.')
141
+ end
142
+ end
143
+
144
+ # Ensures that the Signal P Script is present. If not found in the home
145
+ # directory, it asks the user for its location.
146
+ def signalp_dir
147
+ signalp_dir = "#{Dir.home}/SignalPeptide"
148
+ if File.exist? "#{signalp_dir}/signalp"
149
+ signalp_directory = signalp_dir
150
+ else
151
+ begin
152
+ fail IOError.new('The Signal P Script directory cannot be found at' \
153
+ " the following location: '#{signalp_dir}/'.")
154
+ rescue IOError
155
+ puts # a blank line
156
+ puts 'Error: The Signal P Script directory cannot be found at the' \
157
+ " following location: '#{signalp_dir}/'."
158
+ puts # a blank line
159
+ puts 'Please enter the full path or a relative path to the Signal' \
160
+ ' P Script directory (i.e. to the folder containing the' \
161
+ ' Signal P script). Refer to the online tutorial for more help'
162
+ print '> '
163
+ inp = $stdin.gets.chomp
164
+ until (File.exist? "#{signalp_dir}/signalp") ||
165
+ (File.exist? "#{inp}/signalp")
166
+ puts # a blank line
167
+ puts 'The Signal P directory cannot be found at the following' \
168
+ " location: '#{inp}'"
169
+ puts 'Please enter the full path or a relative path to the Signal' \
170
+ ' Peptide directory again.'
171
+ print '> '
172
+ inp = $stdin.gets.chomp
173
+ end
174
+ signalp_directory = inp
175
+ puts # a blank line
176
+ puts "The Signal P directory has been found at '#{signalp_directory}'"
177
+ FileUtils.ln_s "#{signalp_directory}", "#{Dir.home}/SignalPeptide",
178
+ force: true
179
+ puts # a blank line
180
+ end
181
+ end
182
+ signalp_directory
183
+ end
184
+
185
+ # Ensures that the supported version of the Signal P Script has been linked
186
+ # to NpSearch. Run from the 'sp_results' method.
187
+ def sp_version(input_file)
188
+ File.open(input_file, 'r') do |file_stream|
189
+ first_line = file_stream.readline
190
+ if first_line.match(/# SignalP-4\.1/)
191
+ return true
192
+ else
193
+ return false
194
+ end
195
+ end
196
+ end
197
+
198
+ # Ensures that the critical columns in the tabular results produced by the
199
+ # Signal P script are conserved. Run from the 'sp_results' method.
200
+ def sp_column(input_file)
200
+ File.open(input_file, 'r') do |file_stream|
202
+ secondline = file_stream.readlines[1]
203
+ row = secondline.gsub(/\s+/m, ' ').chomp.split(' ')
204
+ if row[1] == 'name' && row[4] == 'Ymax' && row[5] == 'pos' &&
204
+ row[9] == 'D'
206
+ return true
207
+ else
208
+ return false
209
+ end
210
+ end
211
+ end
212
+
213
+ # Ensure that the right version of the Signal P script is used (via
214
+ # 'sp_version' Method). If the wrong signal p script has been linked to
215
+ # NpSearch, check whether the critical columns in the tabular results
216
+ # produced by the Signal P Script are conserved (via 'sp_column'
217
+ # Method).
218
+ def sp_results(signalp_output_file)
219
+ return if sp_version(signalp_output_file)
220
+ # i.e. if Signal P is the wrong version
221
+ if sp_column(signalp_output_file) # If wrong version but correct columns
222
+ puts # a blank line
223
+ puts 'Warning: The wrong version of signalp has been linked.' \
224
+ ' However, the signal peptide output file still seems to' \
225
+ ' be in the right format.'
226
+ else
227
+ puts # a blank line
228
+ puts 'Warning: The wrong version of the signal p has been linked' \
229
+ ' and the signal peptide output is in an unrecognised format.'
230
+ puts 'Continuing may give you meaningless results.'
231
+ end
232
+ puts # a blank line
233
+ puts 'Do you still want to continue? [y/n]'
234
+ print '> '
235
+ inp = $stdin.gets.chomp
236
+ until inp.downcase == 'n' || inp.downcase == 'y'
237
+ puts # a blank line
238
+ puts "The input: '#{inp}' is not recognised - 'y' or 'n' are the" \
239
+ ' only recognisable inputs.'
240
+ puts 'Please try again.'
+ print '> '
+ inp = $stdin.gets.chomp
241
+ end
242
+ if inp.downcase == 'y'
243
+ puts 'Continuing.'
244
+ elsif inp.downcase == 'n'
245
+ fail IOError.new('Critical Error: NpSearch only supports SignalP 4.1' \
246
+ ' (downloadable from CBS). Please ensure the correct version' \
247
+ ' of the signal p script is downloaded.')
248
+ end
249
+ end
250
+
251
+ # Guesses the type of the data in the supplied motif. It ignores all
252
+ # non-word characters (e.g. '|' that is used for regex). It has a 90%
253
+ # threshold.
254
+ def motif_type(motif)
255
+ motif_seq = Bio::Sequence.new(motif.gsub(/\W/, ''))
256
+ type = motif_seq.guess(0.9)
257
+ return unless type.to_s != 'Bio::Sequence::AA'
258
+ fail IOError.new('Critical Error: There seems to be an error in' \
259
+ ' processing the motif. Please ensure that the motif' \
260
+ ' contains amino acid residues that you wish to search' \
261
+ ' for.')
262
+ end
263
+ end
264
+ end
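Both `output_dir` and `sp_results` hand-roll the same "ask until the answer is y or n" loop, and such a loop only terminates if it reads a fresh answer on every pass. A minimal sketch of a shared prompt follows. The method name `confirm?` and its `input:`/`output:` keyword arguments are hypothetical and do not exist in NpSearch; they are there so the loop can be exercised without a terminal.

```ruby
require 'stringio'

# Hypothetical helper (not part of NpSearch): repeats the question until the
# reply is 'y' or 'n', reading a new line on every iteration.
def confirm?(question, input: $stdin, output: $stdout)
  loop do
    output.puts "#{question} [y/n]"
    output.print '> '
    answer = input.gets
    return false if answer.nil? # end of input is treated as "no"

    case answer.chomp.downcase
    when 'y' then return true
    when 'n' then return false
    end
    output.puts "The input '#{answer.chomp}' is not recognised; " \
                "please answer 'y' or 'n'."
  end
end

# Scripted input stands in for a user who first types something invalid.
scripted = StringIO.new("maybe\ny\n")
puts confirm?('Do you still want to continue?', input: scripted,
                                                output: StringIO.new)
```

Unlike `output_dir` above, this sketch does not treat an empty reply as "yes"; that default would be easy to add as another `when` branch if it is wanted.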