sequence_logo 1.0.2
- data/.gitignore +17 -0
- data/Gemfile +4 -0
- data/LICENSE +22 -0
- data/README.md +61 -0
- data/Rakefile +5 -0
- data/bin/create_all_logos +3 -0
- data/bin/generate_logo +3 -0
- data/bin/pmflogo +3 -0
- data/lib/sequence_logo.rb +7 -0
- data/lib/sequence_logo/assets/nucl_simpa/a.png +0 -0
- data/lib/sequence_logo/assets/nucl_simpa/c.png +0 -0
- data/lib/sequence_logo/assets/nucl_simpa/g.png +0 -0
- data/lib/sequence_logo/assets/nucl_simpa/t.png +0 -0
- data/lib/sequence_logo/exec/create_all_logos.rb +25 -0
- data/lib/sequence_logo/exec/generate_logo.rb +18 -0
- data/lib/sequence_logo/exec/pmflogo.rb +26 -0
- data/lib/sequence_logo/pmflogo_lib.rb +193 -0
- data/lib/sequence_logo/version.rb +3 -0
- data/lib/sequence_logo/ytilib.rb +9 -0
- data/lib/sequence_logo/ytilib/addon.rb +247 -0
- data/lib/sequence_logo/ytilib/bismark.rb +71 -0
- data/lib/sequence_logo/ytilib/hack1.rb +75 -0
- data/lib/sequence_logo/ytilib/infocod.rb +108 -0
- data/lib/sequence_logo/ytilib/iupac.rb +92 -0
- data/lib/sequence_logo/ytilib/pm.rb +562 -0
- data/lib/sequence_logo/ytilib/pmsd.rb +99 -0
- data/lib/sequence_logo/ytilib/randoom.rb +131 -0
- data/lib/sequence_logo/ytilib/ytilib.rb +147 -0
- data/sequence_logo.gemspec +21 -0
- metadata +103 -0
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2012 Ilya Vorontsov
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,61 @@
+# SequenceLogo
+
+SequenceLogo is a tool for drawing sequence logos of motifs. It takes Positional Count Matrices (PCMs) as input and generates PNG logos for each motif. It can also create a logo for the reverse complement, or generate logos for a whole collection of motifs.
+Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike Stephens. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence (see http://weblogo.berkeley.edu/).
+
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'sequence_logo'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install sequence_logo
+
+## Usage
+
+SequenceLogo consists of three tools:
+* The most flexible tool, **pmflogo**, generates a single logo for a single motif. Its usage format is fairly involved:
+
+  pmflogo \<input_file\> \<output_logo_filename\> [words_count] [x_unit=100] [y_size=200] [icd_mode=discrete|weblogo] [revcomp=no|yes] [scheme=nucl_simpa] [paper_mode=no|yes] [threshold_lines=yes|no]
+
+  Any optional argument can be set to 'default'; skipped trailing parameters also take their default values. In the example below, words_count and icd_mode are set to 'default', and scheme, paper_mode and threshold_lines are skipped: `pmflogo motif.pcm logo.png default 30 60 default yes`
+
+  Required arguments:
+  * input_file can be in PCM format (file extension .pat or .pcm), in FASTA format (file extensions .mfa, .fasta, .plain), in SMall BiSMark format (.xml), or in IUPAC format (any other extension).
+  * output_logo_filename is the PNG logo file that will be generated (the .png extension must be included in the name)
+
+  Optional parameters:
+  * words_count [=default] is a float that represents the alignment weight. If words_count is set to 'default', it is obtained from the input (if the input is a PCM or IUPAC). In some cases (when a PPM is used) words_count can't be obtained. In that case a discrete logo can't be drawn, and a weblogo is drawn instead.
+  * x_unit - width of a single letter
+  * y_size - full height of an image
+  * icd_mode - information content mode
+  * revcomp - create logo for a direct or reverse-complement orientation
+  * scheme - nucleotide images folder name (by default only one scheme is used)
+  * paper_mode - if paper_mode is true, threshold lines aren't drawn and a border is drawn instead
+  * threshold_lines - lines at the levels icd2of4, icdThc (=icd3of4) and icdTlc, relative to icd4of4
+
+* Tool **generate_logo** generates two logos (direct and reverse-complement) for a single motif with some reasonable defaults and puts them in logo_folder
+
+  generate_logo \<motif_filename\> [logo_folder = directory of input motif file]
+
+* Tool **create_all_logos** generates two logos (direct and reverse-complement) with some reasonable defaults for each motif in a folder and puts all logos in logo_folder
+
+  create_all_logos \<motifs_folder\> \<logo_folder\>
+
+
+## Contributing
+
+1. Fork it
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Added some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create new Pull Request
+
+Copyright (c) 2011-2012 Ivan Kulakovskiy (author), Ilya Vorontsov (refactoring and gemification)
data/Rakefile
ADDED
data/bin/generate_logo
ADDED
data/bin/pmflogo
ADDED
Binary file
|
Binary file
|
Binary file
|
Binary file
|
data/lib/sequence_logo/exec/create_all_logos.rb
ADDED
@@ -0,0 +1,25 @@
+require 'sequence_logo'
+require 'fileutils'
+
+motifs_folder = ARGV.shift
+unless motifs_folder && Dir.exist?(motifs_folder)
+  puts('Specified input folder does not exist')
+  exit(1)
+end
+
+logo_folder = ARGV.shift
+unless logo_folder
+  puts('Output logo folder must be specified')
+  exit(1)
+end
+
+Dir.mkdir(logo_folder) unless Dir.exist?(logo_folder)
+
+Dir.glob(File.join(motifs_folder, '*')).to_enum.each do |filename|
+  filename_wo_ext = File.basename(filename, File.extname(filename))
+  direct_output = File.join(logo_folder,"#{filename_wo_ext}_direct.png")
+  revcomp_output = File.join(logo_folder,"#{filename_wo_ext}_revcomp.png")
+
+  draw_logo(filename, direct_output, words_count: 'default', x_unit: 30, y_size: 60, icd_mode: 'discrete', revcomp: 'direct')
+  draw_logo(filename, revcomp_output, words_count: 'default', x_unit: 30, y_size: 60, icd_mode: 'discrete', revcomp: 'revcomp')
+end
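The naming scheme used by the script above (one `_direct.png` and one `_revcomp.png` per motif file) can be sketched in isolation. This is only an illustration: `logo_output_paths` is a hypothetical helper that does not exist in the gem, and the example paths are made up.

```ruby
# Hypothetical helper (not part of sequence_logo): derive the two output
# paths that the script builds for one motif file.
def logo_output_paths(motif_path, logo_folder)
  # Strip the directory and the extension, e.g. 'motifs/example.pcm' -> 'example'
  base = File.basename(motif_path, File.extname(motif_path))
  {
    direct:  File.join(logo_folder, "#{base}_direct.png"),
    revcomp: File.join(logo_folder, "#{base}_revcomp.png")
  }
end

paths = logo_output_paths('motifs/example.pcm', 'logos')
puts paths[:direct]   # prints "logos/example_direct.png"
puts paths[:revcomp]  # prints "logos/example_revcomp.png"
```

Because only the last extension is stripped, two inputs such as `example.pcm` and `example.pat` in the same folder would map to the same output names.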
data/lib/sequence_logo/exec/generate_logo.rb
ADDED
@@ -0,0 +1,18 @@
+require 'sequence_logo'
+require 'fileutils'
+
+filename = ARGV.shift
+unless filename && File.exist?(filename)
+  puts 'Existing input file should be specified'
+  exit(1)
+end
+
+logo_dir = ARGV.shift || File.dirname(filename)
+FileUtils.mkdir(logo_dir) unless Dir.exist?(logo_dir)
+
+filename_wo_ext = File.basename(filename, File.extname(filename))
+direct_output = File.join(logo_dir,"#{filename_wo_ext}_direct.png")
+revcomp_output = File.join(logo_dir,"#{filename_wo_ext}_revcomp.png")
+
+draw_logo(filename, direct_output, x_unit: 30, y_size: 60, revcomp: 'direct')
+draw_logo(filename, revcomp_output, x_unit: 30, y_size: 60, revcomp: 'revcomp')
data/lib/sequence_logo/exec/pmflogo.rb
ADDED
@@ -0,0 +1,26 @@
+# pmflogo <input_file> <output_logo_filename> [words_count] [x_unit=100] [y_size=200] [icd_mode=discrete|weblogo] [revcomp=no|yes] [scheme=nucl_simpa] [paper_mode=no|yes] [threshold_lines=yes|no]
+# Any optional argument can be set as 'default' e.g.
+# pmflogo motif.pcm logo.png default 30 60 default yes
+# skipped parameters are also substituted as default (in example above icd_mode is default, and also scheme, paper_mode and threshold_lines)
+
+require 'sequence_logo'
+
+if ARGV.size < 2
+  puts('At least two arguments must be specified, see usage of pmflogo')
+  exit(2)
+end
+
+input_file, output_logo_filename = ARGV.shift(2)
+unless File.exist?(input_file)
+  puts('Specified input file does not exist')
+  exit(1)
+end
+
+options = {}
+options[:words_count] = ARGV.shift
+options[:x_unit], options[:y_size] = ARGV.shift(2)
+options[:icd_mode], options[:revcomp], options[:scheme], options[:paper_mode], options[:threshold_lines] = ARGV.shift(5)
+
+options.reject!{|k,v| v.nil?}
+
+draw_logo(input_file, output_logo_filename, options)
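How positional arguments, skipped arguments and the `'default'` placeholder combine can be shown with a small standalone sketch. It is not the gem's code: `OPTION_ORDER`, `DEFAULTS` and `resolve_options` are hypothetical names, the default values are copied from the `default_options` hash in `draw_logo`, and `zip(...).to_h` assumes Ruby 2.1 or newer.

```ruby
# Order of the optional positional arguments, as listed in the usage line.
OPTION_ORDER = [:words_count, :x_unit, :y_size, :icd_mode,
                :revcomp, :scheme, :paper_mode, :threshold_lines].freeze

# Default values, copied from draw_logo's default_options hash.
DEFAULTS = { words_count: nil, x_unit: 100, y_size: 200, icd_mode: 'discrete',
             revcomp: false, scheme: 'nucl_simpa', paper_mode: false,
             threshold_lines: true }.freeze

# Hypothetical helper: pair positional values with option names, drop
# skipped (nil) values and explicit 'default' placeholders, then merge
# what is left over the defaults.
def resolve_options(args)
  given = OPTION_ORDER.zip(args).to_h
  given = given.reject { |_name, value| value.nil? || value == 'default' }
  DEFAULTS.merge(given)
end

# Same optional arguments as in: pmflogo motif.pcm logo.png default 30 60 default yes
opts = resolve_options(%w[default 30 60 default yes])
puts opts[:x_unit]    # prints "30"
puts opts[:icd_mode]  # prints "discrete"
puts opts[:revcomp]   # prints "yes"
```

Values that come from the command line stay strings here; in the gem, `draw_logo` converts them afterwards (for example with `to_i` and `to_sym`).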
data/lib/sequence_logo/pmflogo_lib.rb
ADDED
@@ -0,0 +1,193 @@
+require 'sequence_logo/ytilib'
+require 'RMagick'
+
+class PPM
+  def get_ppm
+    self
+  end
+
+  def get_line(v)
+    ( (v - icd4of4) / icd4of4 ).abs
+  end
+
+  def get_logo(icd_mode)
+    if icd_mode == :weblogo
+      get_logo_weblogo
+    else
+      get_logo_discrete
+    end
+  end
+
+
+  def get_logo_weblogo
+    rseq = []
+    @matrix['A'].each_index { |i|
+      rseq << 2 + ['A','C','G','T'].inject(0) { |sum, l|
+        pn = @matrix[l][i]
+        sum += (pn == 0) ? 0 : pn * Math.log(pn) / Math.log(2)
+      }
+    }
+
+    mat = {'A'=>[], 'C'=>[], 'G'=>[], 'T'=>[]}
+    @matrix['A'].each_index { |i|
+      ['A','C','G','T'].each { |l|
+        mat[l][i]= @matrix[l][i] * rseq[i] / 2 # so we can handle a '2 bit' scale here
+      }
+    }
+
+    mat
+  end
+
+  def get_logo_discrete
+    checkerr("words count is undefined") { !words_count }
+
+    rseq = []
+    @matrix['A'].each_index { |i|
+      rseq << (icd4of4 == 0 ? 1.0 : ( (infocod(i) - icd4of4) / icd4of4 ).abs)
+    }
+
+    mat = {'A'=>[], 'C'=>[], 'G'=>[], 'T'=>[]}
+    @matrix['A'].each_index { |i|
+      ['A','C','G','T'].each { |l|
+        mat[l][i] = @matrix[l][i] * rseq[i]
+      }
+    }
+
+    mat
+  end
+end
+
+def get_ppm_from_file(in_file_name, words_count)
+  case File.ext_wo_name(in_file_name)
+  when 'pat', 'pcm'
+    pm = PM.load(in_file_name)
+    pm.fixwc if pm.words_count
+  when 'mfa', 'fasta', 'plain'
+    pm = PM.new_pcm(Ytilib.read_seqs2array(in_file_name))
+  when 'xml'
+    pm = PM.from_bismark(Bismark.new(in_file_name).elements["//PPM"])
+  when in_file_name
+    pm = PPM.from_IUPAC(in_file_name.upcase)
+  end
+  pm = pm.get_ppm
+  pm.words_count = words_count if words_count
+  pm
+end
+
+def create_canvas(x_size, y_size, icd_mode, paper_mode, threshold_lines, pm)
+
+  i_logo = Magick::ImageList.new
+  if paper_mode
+    i_logo.new_image(x_size, y_size)
+  else
+    if icd_mode == :discrete
+      i_logo.new_image(x_size, y_size, Magick::HatchFill.new('white', 'white'))
+      if threshold_lines
+        dr = Magick::Draw.new
+        dr.fill('transparent')
+
+        dr.stroke_width(y_size / 200.0)
+        dr.stroke_dasharray(7,7)
+
+        line2of4 = y_size - pm.get_line(pm.icd2of4) * y_size
+        lineThc = y_size - pm.get_line(pm.icdThc) * y_size
+        lineTlc = y_size - pm.get_line(pm.icdTlc) * y_size
+
+        dr.stroke('silver')
+        dr.line(0, line2of4, x_size, line2of4)
+        dr.line(0, lineThc, x_size, lineThc)
+        dr.line(0, lineTlc, x_size, lineTlc)
+
+        dr.draw(i_logo)
+      end
+    else
+      i_logo.new_image(x_size, y_size, Magick::HatchFill.new('white', 'bisque'))
+    end
+  end
+  i_logo
+end
+
+def letter_images(scheme_dir)
+  if File.exist?(File.join(scheme_dir,'a.png'))
+    lp = {'A' => File.join(scheme_dir,'a.png'), 'C' => File.join(scheme_dir,'c.png'), 'G' => File.join(scheme_dir,'g.png'), 'T' => File.join(scheme_dir,'t.png')}
+  elsif File.exist?(File.join(scheme_dir,'a.gif'))
+    lp = {'A' => File.join(scheme_dir,'a.gif'), 'C' => File.join(scheme_dir,'c.gif'), 'G' => File.join(scheme_dir,'g.gif'), 'T' => File.join(scheme_dir,'t.gif')}
+  else
+    raise "Scheme does not exist in folder #{scheme_dir}"
+  end
+  i_letters = Magick::ImageList.new(lp['A'], lp['C'], lp['G'], lp['T'])
+end
+
+def draw_letters_on_canvas(i_logo, i_letters, matrix, y_size, x_unit)
+  matrix['A'].each_index { |i|
+    y_pos = 0
+    sorted_letters = ['A', 'C', 'G', 'T'].collect { |letter| {:score => matrix[letter][i], :letter => letter} }.sort_by { |pair| pair[:score] }.collect { |pair| pair[:letter] }.reverse
+    sorted_letters.each { |letter|
+      next if y_size * matrix[letter][i] <= 1
+      letter_index = {'A' => 0, 'C' => 1, 'G' => 2, 'T' => 3}[letter]
+      y_block = (y_size * matrix[letter][i]).round
+      i_logo << i_letters[letter_index].dup.resize(x_unit, y_block)
+      y_pos += y_block
+      i_logo.cur_image.page = Magick::Rectangle.new(0, 0, i * x_unit, y_size - y_pos )
+    }
+  }
+end
+
+
+def draw_logo(in_file_name, out_file_name, options = {})
+  default_options = { words_count: nil,
+                      x_unit: 100,
+                      y_size: 200,
+                      icd_mode: 'discrete',
+                      revcomp: false,
+                      scheme: 'nucl_simpa',
+                      paper_mode: false,
+                      threshold_lines: true }
+
+  options = options.reject{|k,v| v == 'default' || v == :default}
+  options = default_options.merge( options )
+
+  x_unit = options[:x_unit].to_i
+  y_size = options[:y_size].to_i
+  icd_mode = options[:icd_mode].to_sym
+  scheme = options[:scheme]
+
+  words_count = options[:words_count]
+  words_count = words_count.to_f if words_count
+
+  revcomp = options[:revcomp]
+  revcomp = false if revcomp == 'no' || revcomp == 'false' || revcomp == 'direct'
+
+  paper_mode = options[:paper_mode]
+  paper_mode = false if paper_mode == 'no' || paper_mode == 'false'
+
+  threshold_lines = options[:threshold_lines]
+  threshold_lines = false if threshold_lines == 'no' || threshold_lines == 'false'
+
+  ########################
+
+  pm = get_ppm_from_file(in_file_name, words_count)
+  checkerr("bad input file") { pm == nil }
+
+  x_size = x_unit * pm.length
+
+
+  unless pm.words_count
+    report "words count for PM is undefined, assuming weblogo mode"
+    icd_mode = :weblogo
+  end
+
+  i_logo = create_canvas(x_size, y_size, icd_mode, paper_mode, threshold_lines, pm)
+
+  pm.revcomp! if revcomp
+  matrix = pm.get_logo(icd_mode)
+
+  scheme_dir = File.join(SequenceLogo::AssetsPath, scheme)
+  i_letters = letter_images(scheme_dir)
+  draw_letters_on_canvas(i_logo, i_letters, matrix, y_size, x_unit)
+
+  i_logo = i_logo.flatten_images
+  i_logo.cur_image.border!(x_unit / 100 + 1, x_unit / 100 + 1, icd_mode == :discrete ? "green" : "red") if paper_mode
+
+  i_logo.write(out_file_name)
+end
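The weblogo scaling in `get_logo_weblogo` can be reproduced without RMagick or the `PPM` class. The sketch below follows the same arithmetic on a plain hash of probability columns. `column_information` and `weblogo_heights` are hypothetical names chosen for illustration, the sample matrix is invented, and `Array#sum` with a block assumes Ruby 2.4 or newer.

```ruby
LETTERS = %w[A C G T].freeze

# Information content of one column in bits: 2 + sum(p * log2(p)).
# Zero probabilities contribute nothing (avoids log2(0)).
def column_information(probs)
  2 + probs.sum { |p| p.zero? ? 0.0 : p * Math.log2(p) }
end

# Relative letter heights per column: probability times the column's
# information content, divided by 2 so that a fully conserved column
# fills the whole image height (the '2 bit' scale).
def weblogo_heights(ppm)
  length = ppm['A'].length
  LETTERS.each_with_object({}) do |letter, heights|
    heights[letter] = Array.new(length) do |i|
      info = column_information(LETTERS.map { |l| ppm[l][i] })
      ppm[letter][i] * info / 2
    end
  end
end

# Invented two-column matrix: column 0 is always A, column 1 is uniform.
ppm = { 'A' => [1.0, 0.25], 'C' => [0.0, 0.25],
        'G' => [0.0, 0.25], 'T' => [0.0, 0.25] }
heights = weblogo_heights(ppm)
puts heights['A'][0]  # 1.0: a fully conserved position uses the full height
puts heights['A'][1]  # 0.0: a uniform position carries no information
```

`draw_letters_on_canvas` then multiplies each of these fractions by `y_size` to get the pixel height of a letter, and skips letters that would be one pixel tall or less.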
data/lib/sequence_logo/ytilib.rb
ADDED
@@ -0,0 +1,9 @@
+require 'sequence_logo/ytilib/ytilib'
+require 'sequence_logo/ytilib/addon'
+require 'sequence_logo/ytilib/iupac'
+require 'sequence_logo/ytilib/pm'
+require 'sequence_logo/ytilib/pmsd'
+require 'sequence_logo/ytilib/randoom'
+require 'sequence_logo/ytilib/bismark'
+require 'sequence_logo/ytilib/hack1'
+require 'sequence_logo/ytilib/infocod'
data/lib/sequence_logo/ytilib/addon.rb
ADDED
@@ -0,0 +1,247 @@
+#!/usr/bin/ruby
+
+def File.ext_wo_name(what)
+  return what if what.rindex(".") == nil
+  what = File.basename(what)
+  "#{what}"[what.rindex(".")+1..-1]
+end
+
+def File.name_wo_ext(what)
+  return what if what.rindex(".") == nil
+  what = File.basename(what)
+  "#{what}"[0...what.rindex(".")]
+end
+
+class Float
+  def round_to(x)
+    (self * 10**x).round.to_f / 10**x
+  end
+
+  def cut_to(x)
+    (self.abs * 10**x).floor.to_f * (self == 0.0 ? 0 : self/self.abs).round / 10**x
+  end
+end
+
+class Array
+  def shuffle
+    arr = self.dup
+    arr.size.downto 2 do |j|
+      r = rand j
+      arr[j-1], arr[r] = arr[r], arr[j-1]
+    end
+    arr
+  end
+
+  def shuffle!
+    (size - 1).downto 1 do |i|
+      j = rand(i + 1)
+      self[i], self[j] = self[j], self[i]
+    end
+    self
+  end
+
+  def average
+    self.empty? ? nil : self.inject(0) { |sum,s| sum += s } / self.size
+  end
+  alias mean average
+
+  def variance
+    return self.collect { |s| s*s }.average - average**2
+  end
+
+  def sum
+    self.inject(self[0]) { |sum,s| sum += s} - self[0]
+  end
+
+end
+
+class String
+
+  def compl!
+    self.tr!("acgtACGT", "tgcaTGCA")
+    return self
+  end
+
+  def compl
+    return self.tr("acgtACGT", "tgcaTGCA")
+  end
+
+  alias comp! compl!
+  alias complement! compl!
+  alias comp compl
+  alias complement compl
+
+  def revcomp
+    return comp.reverse
+  end
+
+  def revcomp!
+    return comp!.reverse!
+  end
+
+  def to_id
+    return self.gsub(/[^.\w]/, '_').upcase
+  end
+
+end
+
+# Also this can be done in a more sophisticated way
+=begin
+String.class_eval do
+  def to_id
+    return self.gsub(/[^.\w]/, '_')
+  end
+end
+=end
+
+class String
+  # The opposite of String::next / String::succ. It is impossible to be a
+  # *complete* opposite because both "9".next = "10" and "09".next = "10";
+  # if going backwards from "10" there's no way to know whether the result
+  # should be "09" or "9". Where the first ranged character is about to
+  # underflow and the next character is within the same range the result
+  # is shrunk down - that is, "10" goes to "9", "aa" goes to "z"; any non-
+  # range prefix or suffix is OK, e.g. "+!$%10-=+" goes to "+!$%9-=+".
+  # Items in the middle of a string don't do this - e.g. "12.10" goes to
+  # "12.09", to match how "next" would work as best as possible.
+  #
+  # The standard "next" function works on strings that contain *no*
+  # alphanumeric characters, using character codes. This implementation
+  # of "prev" does *not* work on such strings - while strings may contain
+  # any characters you like, only the alphanumeric components are operated
+  # upon.
+  #
+  # Should total underflow result, "nil" will be returned - e.g. "00".prev
+  # returns 'nil', as does "a".prev. This is done even if there are other
+  # characters in the string that were not touched - e.g. "+0.0".prev
+  # also returns "nil". Broadly speaking, a "nil" return value is used for
+  # any attempt to find the previous value of a string that could not have
+  # been generated using "next" in the first place.
+  #
+  # As with "next" sometimes the result of "prev" can be a little obscure
+  # so it is often best to try out things using "irb" if unsure. Note in
+  # particular that software revision numbers do not necessarily behave
+  # predictably, because they don't with "next"! E.g. "12.4.9" might go to
+  # "12.4.10" for a revision number, but "12.4.9".next = "12.5.0". Thus
+  # "12.5.0".prev = "12.4.9" and "12.4.10".prev = "12.4.09" (because the
+  # only way to make "12.4.10" using "next" is to start at "12.4.09").
+  #
+  # Since 'succ' (successor) is an alias for 'next', so 'pred'
+  # (predecessor) is an alias for 'prev'.
+  #
+  def prev(collapse = false)
+    str = self.dup
+    early_exit = false
+    any_done = false
+    ranges = [
+      ('0'[0]..'9'[0]),
+      ('a'[0]..'z'[0]),
+      ('A'[0]..'Z'[0]),
+      nil
+    ]
+
+    # Search forward for the first in-range character. If found check
+    # to see if that character is "1", "a" or "A". If it is, record
+    # its index (from 0 to string length - 1). We'll need this if
+    # underflows wrap as far as the found byte because in that case
+    # this first found byte should be deleted ("aa..." -> "z...",
+    # "10..." -> "9...").
+
+    first_ranged = nil
+
+    for index in (1..str.length)
+      byte = str[index - 1]
+
+      # Determine whether or not the current byte is a number, lower case
+      # or upper case letter. We expect 'select' to only find one matching
+      # array entry in 'ranges', thus we dereference index 0 after the
+      # 'end' to put a matching range from within 'ranges' into 'within',
+      # or 'nil' for any unmatched byte.
+
+      within = ranges.select do |range|
+        range.nil? or range.include?(byte)
+      end [0]
+
+      unless within.nil?
+        case within.first
+        when '0'[0]
+          match_byte = '1'[0]
+        else
+          match_byte = within.first
+        end
+
+        first_ranged = index - 1 if (byte == match_byte)
+        first_within = within
+        break
+      end
+    end
+
+    for index in (1..str.length)
+
+      # Process the input string in reverse character order - fetch the
+      # bytes via negative index.
+
+      byte = str[-index]
+
+      within = ranges.select do |range|
+        range.nil? or range.include?(byte)
+      end [0]
+
+      # Skip this letter unless within a known range. Otherwise note that
+      # at least one byte was able to be processed.
+
+      next if within.nil?
+      any_done = true
+
+      # Decrement the current byte. If it is still within its range, set
+      # the byte and bail out - we're finished. Flag the early exit. If
+      # the byte is no longer within range, wrap the character around
+      # and continue the loop to carry the decrement to an earlier byte.
+
+      byte = byte - 1
+
+      if (within.include? byte)
+        str[-index] = byte
+        early_exit = true
+        break
+      else
+        str[-index] = within.last
+
+        # If we've just wrapped around a character immediately after the
+        # one found right at the start ('0', 'a' or 'A') then this first
+        # ranged character should be deleted (so "10" -> "09"
+
+        if (first_ranged != nil and first_within.include?(byte + 1) and (first_ranged - str.length) == -(index + 1))
+          str.slice!(-(index + 1))
+          early_exit = true
+          break
+        end
+      end
+
+    end # From outer 'for' loop
+
+    # If we did process at least one byte but we did not exit early, then
+    # the loop completed due to carrying a decrement to other bytes. This
+    # means an underflow condition - return 'nil'.
+
+    if (any_done == true and early_exit == false)
+      return nil
+    else
+      return str
+    end
+  end
+
+  # As (extended) String::pred / String::prev, but modifies the string in
+  # place rather than returning a copy. If underflow occurs, the string
+  # will be unchanged. Returns 'self'.
+  #
+  def prev!
+    new_str = prev
+    self.replace(new_str) unless new_str.nil?
+    return self
+  end
+
+  alias pred prev
+  alias pred! prev!
+
+end
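The reverse-complement helpers added to `String` above rely on `String#tr` followed by `reverse`. The same idea as a standalone method, without reopening `String`, might look like the sketch below. `reverse_complement` is a hypothetical name, not something the gem defines.

```ruby
# Hypothetical standalone equivalent of String#revcomp from the hunk above:
# complement each nucleotide (case preserved), then reverse the sequence.
def reverse_complement(seq)
  seq.tr('acgtACGT', 'tgcaTGCA').reverse
end

puts reverse_complement('ACGTtg')  # prints "caACGT"
puts reverse_complement('AANCC')   # prints "GGNTT"
```

Characters outside `acgtACGT`, such as `N`, pass through `tr` unchanged, so ambiguous positions are kept but not complemented.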