npsearch 0.0.1

@@ -0,0 +1,68 @@
1
+ # NeuroPeptideSearch (NpSearch)
2
+ [![Gem Version](https://badge.fury.io/rb/NpSearch.svg)](http://badge.fury.io/rb/NpSearch)
3
+ [![Build Status](https://travis-ci.org/IsmailM/NeuroPeptideSearch.svg?branch=master)](https://travis-ci.org/IsmailM/NeuroPeptideSearch)
4
+ [![Dependency Status](https://gemnasium.com/IsmailM/NeuroPeptideSearch.svg)](https://gemnasium.com/IsmailM/NeuroPeptideSearch)
5
+ [![Inline docs](http://inch-ci.org/github/IsmailM/NeuroPeptideSearch.png?branch=master)](http://inch-ci.org/github/IsmailM/NeuroPeptideSearch)
6
+
7
+ > A tool to identify novel neuropeptides.
8
+
9
+ NpSearch (NeuroPeptideSearch) is a program that searches for potential neuropeptide precursors based on the motifs commonly found in neuropeptide precursors. Ideally, the input is transcriptome or protein data, since there are no introns to worry about and the signal peptide is attached to the front of the precursor.
10
+
11
+ Currently, the program produces a long list of sequences that fulfil all the requirements to be a potential neuropeptide. This list needs to be further analysed to find potential neuropeptides. Future versions of the program will automatically analyse the output file and extract a list of highly likely neuropeptides.
12
+
13
+ NpSearch produces a number of files. The final output is written both as a fasta file and as a colour-coded html file that can be opened in any web browser or even in a word processor.
14
+
15
+ Note: For this program to work, you will need to obtain a copy of SignalP 4.1 from CBS at "http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp" and link it to the program. Alternatively, you can supply an output text file from SignalP as input to the program.
16
+
17
+ ** Currently only supported on Mac OS & Linux
18
+
19
+ If you use this program, please cite us:
20
+
21
+ Moghul I, Rowe M, Priyam A, Elphick M & Wurm Y <em>(in prep)</em> NpSearch: A Tool to Identify Novel Neuropeptides
22
+
23
+ ## Installation
24
+
25
+ 1. Simply open the terminal and type this
26
+ ```
27
+ $ gem install npsearch
28
+ ```
29
+ ## Usage
30
+
31
+ * Usage: npsearch [Options] -i [Input File] -o [Output Folder Name]
32
+
33
+ * Mandatory Options:
34
+
35
+ -i, --input [file] The input file (in fasta format). Can be a relative or a full
36
+ path.
37
+ -o, --output [folder name] The path to the output folder. This will be created if the
38
+ folder does not exist.
39
+
40
+ * Optional Options:
41
+ -m, --motif [Query Motif] By default NpSearch only searches for dibasic cleavage sites
42
+ ("KR", "RR" or "KK"). This option allows one to change the
43
+ set of cleavage sites to be searched.
44
+ The period "." can be used to denote any character. Multiple
45
+ motifs can be combined by placing a pipe character ("|")
46
+ between them and putting the whole query in quotation marks
47
+ e.g. "KR|RR|R..R"
48
+ Advanced Users: Regular expressions are supported.
49
+ -c, --cut_off N Changes the minimum Open Reading
50
+ Frame length from the default 10 amino acid residues to N amino acid
51
+ residues.
52
+ -s, --signalp_file [file] Is used to supply the signal peptide results to the program.
53
+ These signal peptide results must be created using the SignalP
54
+ program (Version 4.x), downloadable from CBS. If this argument
55
+ isn't supplied, then NpSearch will try to run a local version
56
+ of the Signal P script.
57
+ -e, --extract_orf Only extracts the Open Reading Frames.
58
+ -v, --verbose Provides more information on each step taken in this program.
59
+ -h, --help Display this screen
60
+ --version Shows version
61
+
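As a rough illustration of how a `-m` motif query behaves, the sketch below treats the query as a Ruby regular expression and lists every cleavage site it matches in a protein sequence. The helper name `cleavage_sites` and the example sequences are made up for this illustration and are not part of NpSearch; only the default motif `KR|RR|KK` and the `"KR|RR|R..R"` example come from the option text above.

```ruby
# Hypothetical helper (not part of NpSearch): list every match of a
# cleavage-site motif in a protein sequence as [start_position, matched_text].
def cleavage_sites(sequence, motif = 'KR|RR|KK')
  sites = []
  # The motif sits inside a zero-width lookahead, so overlapping sites
  # (for example the "KR" and "RR" in "KRR") are all reported.
  sequence.scan(/(?=(#{motif}))/) do
    match = Regexp.last_match
    sites << [match.begin(0), match[1]]
  end
  sites
end

# "." stands for any residue, "|" separates alternative motifs.
p cleavage_sites('MAKRGGRSSRAA', 'KR|RR|R..R')
p cleavage_sites('AKRRB')
```

Positions are 0-based. Because the query is interpolated directly into a regular expression, any regex syntax is accepted, which matches the "Advanced Users" note in the option text.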
62
+ ## Contributing
63
+
64
+ 1. Fork it
65
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
66
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
67
+ 4. Push to the branch (`git push origin my-new-feature`)
68
+ 5. Create new Pull Request
@@ -0,0 +1,14 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ task default: [:build]
5
+ desc 'Installs the ruby gem'
6
+ task :build do
7
+ exec("gem build np_search.gemspec && gem install ./NpSearch-#{NpSearch::VERSION}.gem")
8
+ end
9
+
10
+ task :test do
11
+ Rake::TestTask.new do |t|
12
+ t.pattern = 'test/test_np_search.rb'
13
+ end
14
+ end
@@ -0,0 +1,165 @@
1
+ #!/usr/bin/env ruby
2
+ require 'optparse'
3
+
4
+ require 'npsearch'
5
+ require 'npsearch/arg_validator'
6
+ require 'npsearch/version'
7
+
8
+ opt = {}
9
+ optparse = OptionParser.new do |opts|
10
+ opts.banner = <<Banner
11
+
12
+ * Usage: npsearch [Options] -i [Input File] -o [Output Folder Name]
13
+
14
+ * Mandatory Options:
15
+
16
+ Banner
17
+
18
+ opt[:input_file] = nil
19
+ opts.on('-i', '--input [file]', 'Path to the input fasta file') do |f|
20
+ opt[:input_file] = f
21
+ end
22
+
23
+ opts.separator ''
24
+ opts.separator '* Optional Options:'
25
+
26
+ opt[:motif] = 'KR|RR|KK'
27
+ opts.on('-m', '--motif [Query Motif]', 'By default NpSearch only searches',
28
+ ' for dibasic cleavage sites ("KR", "RR" or "KK"). This option allows',
29
+ ' one to change the set of cleavage sites to be searched.',
30
+ ' The period "." can be used to denote any character. Multiple',
31
+ ' motifs can be combined by placing a pipe character ("|")',
32
+ ' between them and putting the whole query in quotation marks',
33
+ ' e.g. "KR|RR|R..R"',
34
+ ' Advanced Users: Regular expressions are supported.') do |motif|
35
+ opt[:motif] = motif
36
+ end
37
+
38
+ opt[:cut_off] = 10
39
+ opts.on('-c', '--cut_off N', Integer, 'Changes the minimum Open Reading',
40
+ ' Frame from the default 10 amino acid residues to N amino acid',
41
+ ' residues.') do |n|
42
+ opt[:cut_off] = n
43
+ end
44
+
45
+ opt[:signalp_file] = nil
46
+ opts.on('-s', '--signalp_file [file]',
47
+ 'Is used to supply the signal peptide results to the program. These',
48
+ ' signal peptide results must be created using the SignalP program',
49
+ " (Version 4.x), downloadable from CBS. If this argument isn't ",
50
+ ' supplied, then NpSearch will try to run a local version of the',
51
+ ' Signal P script.') do |signalp_file|
52
+ opt[:signalp_file] = signalp_file
53
+ end
54
+
55
+ opt[:extract_orf] = false
56
+ opts.on('-e', '--extract_orf', 'Only extracts the Open Reading Frames.') do
57
+ opt[:extract_orf] = true
58
+ end
59
+
60
+ opt[:verbose] = false
61
+ opts.on('-v', '--verbose', 'Provides more information on each step taken',
62
+ ' in this program.') do
63
+ opt[:verbose] = true
64
+ end
65
+
66
+ opts.on('-h', '--help', 'Display this screen') do
67
+ puts opts
68
+ exit
69
+ end
70
+
71
+ opts.on('--version', 'Shows version') do
72
+ puts NpSearch::VERSION
73
+ exit
74
+ end
75
+ end
76
+ optparse.parse!
77
+
78
+ NpSearch.init(opt)
79
+ NpSearch.run
80
+
81
+
82
+
83
+
84
+ # ############# Argument Validation...##############
85
+ # arg_vldr = NpSearch::ArgValidators.new(opt[:verbose])
86
+ # input_type = arg_vldr.arg(opt[:motif], opt[:input], opt[:output_dir],
87
+ # opt[:cut_off], opt[:extract_orf], opt[:signalp_file],
88
+ # optparse.help)
89
+
90
+ # ############# General Validation...##############
91
+ # vldr = NpSearch::Validators.new
92
+ # vldr.output_dir(opt[:output_dir])
93
+ # if opt[:signalp_file].nil? && opt[:extract_orf] == false
94
+ # sp_dir = vldr.signalp_dir
95
+ # end
96
+
97
+ # ############# Converting input file to Bio::FastaFormat. #############
98
+ # input_read = NpSearch::Input.read(opt[:input], input_type)
99
+
100
+ # ############# Extract_ORF #############
101
+ # if input_type == 'genetic'
102
+ # # Translate Sequences in all 6 frames
103
+ # translated = NpSearch::Translation.translate(input_read)
104
+ # translated.to_fasta('translated seq.', "#{opt[:output_dir]}/1_protein.fa")
105
+ # # Extract all possible ORF that are longer than the ORF_min_length
106
+ # orf = NpSearch::Translation.extract_orf(translated, opt[:cut_off])
107
+ # orf.to_fasta('Open Reading Frames', "#{opt[:output_dir]}/2_orf.fa")
108
+
109
+ # if opt[:extract_orf]
110
+ # puts "\nSuccess: All output files created in the directory:" \
111
+ # "#{opt[:output_dir]}'.\n "
112
+ # exit
113
+ # end
114
+ # end
115
+
116
+ # ############# Setting up more variables...##############
117
+ # if opt[:motif] == 'neuro_clv'
118
+ # motif = 'KK|KR|RR|' \
119
+ # 'R..R|R....R|R......R|H..R|H....R|H......R|K..R|K....R|K......R'
120
+ # else
121
+ # motif = opt[:motif]
122
+ # end
123
+ # vldr.motif_type(motif)
124
+
125
+ # if input_type == 'genetic'
126
+ # sp_input_file = "#{opt[:output_dir]}/2_orf.fa"
127
+ # sp_hash = orf
128
+ # file_number = 3
129
+ # else # i.e. if the input is protein
130
+ # sp_input_file = opt[:input]
131
+ # sp_hash = input_read
132
+ # file_number = 1
133
+ # end
134
+
135
+ # if opt[:signalp_file].nil?
136
+ # sp_out_file = "#{opt[:output_dir]}/#{file_number}_signalp_out.txt"
137
+ # file_number += 1
138
+ # NpSearch::Signalp.signalp(sp_dir, sp_input_file, sp_out_file)
139
+ # else
140
+ # sp_out_file = opt[:signalp_file]
141
+ # file_number = 1
142
+ # end
143
+
144
+ # ############# Signal P Results file Validation #############
145
+ # vldr.sp_results(sp_out_file)
146
+
147
+ # ############# Extract sequences with a signal peptide #############
148
+ # secretome = NpSearch::Analysis.parse(sp_out_file, sp_hash, motif)
149
+ # secretome.to_fasta('secretome file',
150
+ # "#{opt[:output_dir]}/#{file_number}_secretome.fa")
151
+ # file_number += 1
152
+
153
+ # ############# Remove any duplicate data #############
154
+ # flattened_seq = NpSearch::Analysis.flattener(secretome)
155
+
156
+ # ############# Creating Output Files #############
157
+ # flattened_seq.to_fasta('fasta output file',
158
+ # "#{opt[:output_dir]}/#{file_number}_output.fa")
159
+ # flattened_seq.to_html(motif,
160
+ # "#{opt[:output_dir]}/#{file_number}_output.html")
161
+
162
+ # ############# Success #############
163
+ # puts # a blank line.
164
+ # puts "Success: All output files created in the directory:'#{opt[:output_dir]}'."
165
+ # puts # a blank line
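The usage banner in this script advertises `-o [Output Folder Name]`, but the parser shown in this version does not register an `-o` option. A minimal sketch of how such an option could be wired up with `OptionParser` follows. The wrapper `parse_args`, the reduced option set, and the option descriptions are assumptions for illustration; the `:output_dir` key is borrowed from the commented-out code above.

```ruby
require 'optparse'

# Hypothetical wrapper (not in the gem): parse an argument array into an
# options hash, using the same defaults as the script above.
def parse_args(argv)
  opt = { input_file: nil, output_dir: nil, motif: 'KR|RR|KK', cut_off: 10 }
  parser = OptionParser.new do |opts|
    opts.on('-i', '--input FILE', 'Path to the input fasta file') do |f|
      opt[:input_file] = f
    end
    # Assumed option: mirrors the "-o [Output Folder Name]" in the banner.
    opts.on('-o', '--output DIR', 'Path to the output folder') do |d|
      opt[:output_dir] = d
    end
    opts.on('-m', '--motif MOTIF', 'Cleavage-site motif query') do |m|
      opt[:motif] = m
    end
    # Passing Integer makes OptionParser convert and validate the value.
    opts.on('-c', '--cut_off N', Integer, 'Minimum ORF length') do |n|
      opt[:cut_off] = n
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  opt
end

p parse_args(%w[-i input.fa -o results -c 25])
```

Writing `FILE` rather than `[file]` makes the argument mandatory, so `-i` with no value raises `OptionParser::MissingArgument` instead of silently yielding `nil`.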
@@ -0,0 +1,96 @@
1
+ require 'bio'
2
+ require 'fileutils'
3
+
4
+ # require 'npsearch/arg_validator'
5
+ require 'npsearch/logger'
6
+ require 'npsearch/sequence'
7
+ require 'npsearch/signalp'
8
+ require 'npsearch/pool'
9
+
10
+ # Top level module / namespace.
11
+ module NpSearch
12
+ class <<self
13
+ MIN_ORF_SIZE = 40 # amino acids (including potential signal peptide)
14
+
15
+ attr_accessor :opt
16
+ attr_accessor :sequences
17
+
18
+ def logger
19
+ @logger ||= Logger.new(STDERR, @opt[:verbose])
20
+ end
21
+
22
+ def init(opt)
23
+ # @opt = args_validation(opt)
24
+ @opt = opt
25
+ @sequences = []
26
+ @opt[:num_threads] = 8
27
+ @opt[:type] = guess_sequence_type
28
+ @opt[:signalp_path] = '/Volumes/Data/programs/signalp-4.1/signalp'
29
+ @pool = Pool.new(@opt[:num_threads]) if @opt[:num_threads] > 1
30
+ end
31
+
32
+ def run
33
+ iterate_input_file
34
+ # score_sequence
35
+ # scan(?<=(KR|RR|KK))(\w+?)(?=(KR|RR|KK|$))
36
+ @sequences.each { |s| puts ">#{s.id}\n#{s.seq}" }
37
+ end
38
+
39
+ private
40
+
41
+ def iterate_input_file
42
+ biofastafile = Bio::FlatFile.open(Bio::FastaFormat, @opt[:input_file])
43
+ biofastafile.each_entry do |entry|
44
+ if @opt[:num_threads] > 1
45
+ @pool.schedule(entry) { |e| initialise_seqs(e) }
46
+ else
47
+ initialise_seqs(entry)
48
+ end
49
+ end
50
+ @pool.shutdown if @opt[:num_threads] > 1
51
+ end
52
+
53
+ def initialise_seqs(entry)
54
+ if @opt[:type] == :protein
55
+ initialise_protein_seq(entry.entry_id, entry.aaseq)
56
+ else
57
+ initialise_transcriptomic_seq(entry.entry_id, entry.naseq)
58
+ end
59
+ end
60
+
61
+ def initialise_protein_seq(id, seq)
62
+ sp = Signalp.analyse_sequence(seq)
63
+ @sequences << Sequence.new(id, seq, sp) if sp[:sp] == 'Y'
64
+ end
65
+
66
+ def initialise_transcriptomic_seq(id, naseq)
67
+ (1..6).each do |f|
68
+ translated_seq = naseq.translate(f)
69
+ orfs = translated_seq.to_s.scan(/(?=(M\w{#{MIN_ORF_SIZE},}))./).flatten
70
+ initialise_orfs(id, orfs, f)
71
+ end
72
+ end
73
+
74
+ def initialise_orfs(id, orfs, frame)
75
+ idx = 0
76
+ orfs.each do |orf|
77
+ sp = Signalp.analyse_sequence(orf)
78
+ next if sp[:sp] == 'N'
79
+ seq = Sequence.new(id, orf, sp)
80
+ seq.translated_frame = frame
81
+ seq.orf_index = idx
82
+ @sequences << seq
83
+ idx += 1
84
+ end
85
+ end
86
+
87
+ def guess_sequence_type
88
+ fasta_content = IO.binread(@opt[:input_file])
89
+ # removing non-letter and ambiguous characters
90
+ cleaned_sequence = fasta_content.gsub(/[^A-Z]|[NX]/i, '')
91
+ return nil if cleaned_sequence.length < 10 # conservative
92
+ type = Bio::Sequence.new(cleaned_sequence).guess(0.9)
93
+ (type == Bio::Sequence::NA) ? :nucleotide : :protein
94
+ end
95
+ end
96
+ end
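The ORF extraction in `initialise_transcriptomic_seq` relies on a lookahead regular expression. The stand-alone sketch below reproduces that pattern outside the module to show what it returns. The method name `candidate_orfs`, the `min_size` parameter (standing in for `MIN_ORF_SIZE`), and the toy sequence are assumptions for illustration.

```ruby
# Hypothetical stand-alone version of the ORF scan used above.
def candidate_orfs(translated, min_size)
  # The lookahead captures a methionine followed by at least min_size
  # residues without consuming them; the trailing "." then advances the scan
  # by a single residue, so nested ORFs are reported as well. A stop codon
  # ("*") ends a run because \w does not match it.
  translated.scan(/(?=(M\w{#{min_size},}))./).flatten
end

p candidate_orfs('AMKLMVV*MA', 3)
p candidate_orfs('AMKLMVV*MA', 2)
```

With a minimum of 3 only `MKLMVV` qualifies. Lowering it to 2 also returns the nested `MVV`, which starts at the internal methionine. The final `MA` is never returned because only one residue follows its methionine.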
@@ -0,0 +1,264 @@
1
+ module NpSearch
2
+ class ArgValidators
3
+
4
+
5
+ # Changes the logger level to output extra info when the verbose option is
6
+ # true.
7
+ def initialize(verbose_opt)
8
+ LOG.level = Logger::INFO if verbose_opt == true
9
+ end
10
+
11
+ # Runs all the argument validation methods.
12
+ def arg(motif, input, output_dir, orf_min_length, extract_orf,
13
+ signalp_file, help_banner)
14
+ comp_arg(input, motif, output_dir, extract_orf, help_banner)
15
+ input_type = guess_input_type(input)
16
+ extract_orf_conflict(input_type, extract_orf)
17
+ input_sp_file_conflict(input_type, signalp_file)
18
+ orf_min_length(orf_min_length)
19
+ input_type
20
+ end
21
+
22
+ # Ensures that the compulsory input arguments are supplied...
23
+ def comp_arg(input, motif, output_dir, extract_orf, help_banner)
24
+ comp_arg_error(motif, 'Query Motif ("-m" option)') if extract_orf == false
25
+ comp_arg_error(input, 'Input file ("-i option")')
26
+ comp_arg_error(output_dir, 'Output Folder ("-o" option)')
27
+ return unless input.nil? || (motif.nil? && extract_orf == false)
28
+ puts help_banner
29
+ exit
30
+ end
31
+
32
+ # Ensures that a message is provided for all missing compulsory args.
33
+ # Run from comp_arg method
34
+ def comp_arg_error(arg, message)
35
+ puts 'Usage Error: No ' + message + ' is supplied' if arg.nil?
36
+ end
37
+
38
+ # Guesses the type of data within the input file from the first 100 lines
39
+ # of the file (ignoring all identifiers, i.e. lines that start with '>').
40
+ # It uses an 80% threshold.
41
+ def guess_input_type(input_file)
42
+ input_file_format(input_file)
43
+ sequences = []
44
+ File.open(input_file, 'r') do |file_stream|
45
+ file_stream.readlines[0..100].each do |line|
46
+ sequences << line.to_s unless line.match(/^>/)
47
+ end
48
+ end
49
+ type = Bio::Sequence.new(sequences.map(&:strip).join).guess(0.8)
50
+ if type == Bio::Sequence::NA
51
+ input_type = 'genetic'
52
+ elsif type == Bio::Sequence::AA
53
+ input_type = 'protein'
54
+ end
55
+ input_type
56
+ end
57
+
58
+ # Ensures that the input file a) exists b) is not empty and c) is a fasta
59
+ # file. Run from the guess_input_type method.
60
+ def input_file_format(input_file)
61
+ unless File.exist?(input_file)
62
+ fail ArgumentError, "Critical Error: The input file '#{input_file}'" \
63
+ ' does not exist.'
64
+ end
65
+ if File.zero?(input_file)
66
+ fail ArgumentError, "Critical Error: The input file '#{input_file}'" \
67
+ ' is empty.'
68
+ end
69
+ unless File.probably_fasta?(input_file)
70
+ fail ArgumentError, "Critical Error: The input file '#{input_file}'" \
71
+ ' does not seem to be in fasta format. Only' \
72
+ ' input files in fasta format are supported.'
73
+ end
74
+ end
75
+
76
+ # Ensures that the extract_orf option is only used with genetic data.
77
+ def extract_orf_conflict(input_type, extract_orf)
78
+ return unless input_type == 'protein' && extract_orf == true
79
+ fail ArgumentError, 'Usage Error: Conflicting arguments detected:' \
80
+ ' Protein data detected within the input file,' \
81
+ ' when using the Extract_ORF option (option' \
82
+ ' "-e"). This option is only available when' \
83
+ ' the input file contains genetic data.'
84
+ end
85
+
86
+ # Ensures that the protein data (or open reading frames) are supplied as
87
+ # the input file when the signal p output file is passed.
88
+ def input_sp_file_conflict(input_type, signalp_file)
89
+ return unless input_type == 'genetic' && !signalp_file.nil?
90
+ fail ArgumentError, 'Usage Error: Conflicting arguments detected' \
91
+ ': Genetic data detected within the input file' \
92
+ ' when using the Signal P Input Option (Option' \
93
+ ' "-s"). The Signal P input Option requires the' \
94
+ ' input of two files: the Signal P Script Result' \
95
+ ' files (at the "-s" option) and the protein' \
96
+ ' data file used to run the Signal P Script.'
97
+ end
98
+
99
+ # Ensures that the ORF minimum length is a number. Any digits after the
100
+ # decimal place are ignored.
101
+ def orf_min_length(orf_min_length)
102
+ return unless orf_min_length.to_i < 1
103
+ fail ArgumentError, 'Usage Error: The Open Reading Frames minimum' \
104
+ ' length can only be a positive integer.'
106
+ end
107
+
108
+ class Validators
109
+ # Checks for the presence of the output directory; if not found, it asks
110
+ # the user whether they want to create the output directory.
111
+ def output_dir(output_dir)
112
+ unless File.directory? output_dir # If output_dir doesn't exist
113
+ fail IOError, "\n\nThe output directory does not exist\n\n"
114
+ end
115
+ rescue IOError
116
+ puts # a blank line
117
+ puts 'The output directory does not exist.'
118
+ puts # a blank line
119
+ puts "The directory '#{output_dir}' will be created in this location."
120
+ puts 'Do you want to continue? [y/n]'
121
+ print '> '
122
+ inp = $stdin.gets.chomp
123
+ until inp.downcase == 'n' || inp.downcase == 'y' || inp == ''
124
+ puts # a blank line
125
+ puts "The input: '#{inp}' is not recognised - 'y' or 'n' are the" \
126
+ ' only recognisable inputs.'
127
+ puts 'Please try again.'
128
+ puts "The directory '#{output_dir}' will be created in this" \
129
+ ' location.'
130
+ puts 'Do you want to continue? [y/n]'
131
+ print '> '
132
+ inp = $stdin.gets.chomp
133
+ end
134
+ if inp.downcase == 'y' || inp == ''
135
+ FileUtils.mkdir_p "#{output_dir}"
136
+ puts 'Created output directory...'
137
+ elsif inp.downcase == 'n'
138
+ raise ArgumentError, 'Critical Error: An output directory is' \
139
+ ' required; please create an output directory' \
140
+ ' and then try again.'
141
+ end
142
+ end
143
+
144
+ # Ensures that the Signal P Script is present. If not found in the home
145
+ # directory, it asks the user for its location.
146
+ def signalp_dir
147
+ signalp_dir = "#{Dir.home}/SignalPeptide"
148
+ if File.exist? "#{signalp_dir}/signalp"
149
+ signalp_directory = signalp_dir
150
+ else
151
+ begin
152
+ fail IOError, 'The Signal P Script directory cannot be found at' \
153
+ " the following location: '#{signalp_dir}/'."
154
+ rescue IOError
155
+ puts # a blank line
156
+ puts 'Error: The Signal P Script directory cannot be found at the' \
157
+ " following location: '#{signalp_dir}/'."
158
+ puts # a blank line
159
+ puts 'Please enter the full path or a relative path to the Signal' \
160
+ ' P Script directory (i.e. to the folder containing the' \
161
+ ' Signal P script). Refer to the online tutorial for more help'
162
+ print '> '
163
+ inp = $stdin.gets.chomp
164
+ until (File.exist? "#{signalp_dir}/signalp") ||
165
+ (File.exist? "#{inp}/signalp")
166
+ puts # a blank line
167
+ puts 'The Signal P directory cannot be found at the following' \
168
+ " location: '#{inp}'"
169
+ puts 'Please enter the full path or a relative path to the Signal' \
170
+ ' Peptide directory again.'
171
+ print '> '
172
+ inp = $stdin.gets.chomp
173
+ end
174
+ signalp_directory = inp
175
+ puts # a blank line
176
+ puts "The Signal P directory has been found at '#{signalp_directory}'"
177
+ FileUtils.ln_s "#{signalp_directory}", "#{Dir.home}/SignalPeptide",
178
+ force: true
179
+ puts # a blank line
180
+ end
181
+ end
182
+ signalp_directory
183
+ end
184
+
185
+ # Ensures that the supported version of the Signal P Script has been linked
186
+ # to NpSearch. Run from the 'sp_results' method.
187
+ def sp_version(input_file)
188
+ File.open(input_file, 'r') do |file_stream|
189
+ first_line = file_stream.readline
190
+ if first_line.match(/# SignalP-4.1/)
191
+ return true
192
+ else
193
+ return false
194
+ end
195
+ end
196
+ end
197
+
198
+ # Ensures that the critical columns in the tabular results produced by the
199
+ # Signal P script are conserved. Run from the 'sp_results' method.
200
+ def sp_column(input_file)
201
+ File.open(input_file, 'r') do |file_stream|
202
+ secondline = file_stream.readlines[1]
203
+ row = secondline.gsub(/\s+/m, ' ').chomp.split(' ')
204
+ if row[1] == 'name' && row[4] == 'Ymax' && row[5] == 'pos' &&
205
+ row[9] == 'D'
206
+ return true
207
+ else
208
+ return false
209
+ end
210
+ end
211
+ end
212
+
213
+ # Ensure that the right version of the Signal P script is used (via
214
+ # 'sp_version' Method). If the wrong signal p script has been linked to
215
+ # NpSearch, check whether the critical columns in the tabular results
216
+ # produced by the Signal P Script are conserved (via 'sp_column'
217
+ # Method).
218
+ def sp_results(signalp_output_file)
219
+ return if sp_version(signalp_output_file)
220
+ # i.e. if Signal P is the wrong version
221
+ if sp_column(signalp_output_file) # If wrong version but correct columns
222
+ puts # a blank line
223
+ puts 'Warning: The wrong version of signalp has been linked.' \
224
+ ' However, the signal peptide output file still seems to' \
225
+ ' be in the right format.'
226
+ else
227
+ puts # a blank line
228
+ puts 'Warning: The wrong version of the signal p has been linked' \
229
+ ' and the signal peptide output is in an unrecognised format.'
230
+ puts 'Continuing may give you meaningless results.'
231
+ end
232
+ puts # a blank line
233
+ puts 'Do you still want to continue? [y/n]'
234
+ print '> '
235
+ inp = $stdin.gets.chomp
236
+ until inp.downcase == 'n' || inp.downcase == 'y'
237
+ puts # a blank line
238
+ puts "The input: '#{inp}' is not recognised - 'y' or 'n' are the" \
239
+ ' only recognisable inputs.'
240
+ puts 'Please try again.'
+ print '> '
+ inp = $stdin.gets.chomp
241
+ end
242
+ if inp.downcase == 'y'
243
+ puts 'Continuing.'
244
+ elsif inp.downcase == 'n'
245
+ fail IOError, 'Critical Error: NpSearch only supports SignalP 4.1' \
246
+ ' (downloadable from CBS). Please ensure that this version' \
247
+ ' of the SignalP script is downloaded.'
248
+ end
249
+ end
250
+
251
+ # Guesses the type of the data in the supplied motif. It ignores all
252
+ # non-word characters (e.g. '|' that is used for regex). It has a 90%
253
+ # threshold.
254
+ def motif_type(motif)
255
+ motif_seq = Bio::Sequence.new(motif.gsub(/\W/, ''))
256
+ type = motif_seq.guess(0.9)
257
+ return unless type.to_s != 'Bio::Sequence::AA'
258
+ fail IOError, 'Critical Error: There seems to be an error in' \
259
+ ' processing the motif. Please ensure that the motif' \
260
+ ' contains amino acid residues that you wish to search' \
261
+ ' for.'
262
+ end
263
+ end
264
+ end
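The column check in `sp_column` can be tried in isolation on a header line. The sketch below assumes that the second line of SignalP 4.1 short-format output is the column header shown in `SAMPLE_HEADER`; that header text, the constant names, and the helper `signalp_columns_ok?` are assumptions for illustration. Only the four column positions and labels are taken from the validator above.

```ruby
# Column positions and labels that the validator above treats as critical.
EXPECTED_COLUMNS = { 1 => 'name', 4 => 'Ymax', 5 => 'pos', 9 => 'D' }.freeze

# Assumed header line of SignalP 4.1 short-format output (illustrative only).
SAMPLE_HEADER = '# name   Cmax  pos  Ymax  pos  Smax  pos  Smean   D   ?  ' \
                'Dmaxcut  Networks-used'.freeze

# Hypothetical helper: true only when every critical column carries the
# expected label at the expected position.
def signalp_columns_ok?(header_line)
  row = header_line.split # splits on any run of whitespace
  EXPECTED_COLUMNS.all? { |index, label| row[index] == label }
end

p signalp_columns_ok?(SAMPLE_HEADER)
p signalp_columns_ok?('# name Cmax pos Smax')
```

Requiring every label to match with `==` means a single shifted or renamed column makes the check fail, which is the conservative behaviour for a format guard.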