mspire 0.4.2 → 0.4.4

data/INSTALL CHANGED
@@ -2,14 +2,17 @@
2
2
  Prerequisites
3
3
  -------------
4
4
 
5
- Much of the package will work without any prerequisites at all. Some functionality may require addition ruby packages or other converters. These are listed in current order of importance:
5
+ Much of the package will work without any prerequisites at all. Some functionality may require additional ruby packages or other converters.
6
6
 
7
7
  * libjtp - generic library installed automatically if you install mspire with rubygems (or 'gem install libjtp')
8
+
9
+ ### XML parsing:
10
+
8
11
  * [xmlparser](http://www.yoshidam.net/Ruby.html) (comes with one-click Windows; on Ubuntu: 'sudo apt-get libxml-parser-ruby1.8')
9
12
  * [axml](http://axml.rubyforge.org/) dom wrapper for xmlparser. ('gem install axml')
10
- * ['t2x'](archive/t2x) linux executable to convert .RAW files (Xcalibur 1.x) to version 1 mzXML files
11
13
 
12
- Optional:
14
+ ### Optional:
15
+ * ['t2x'](archive/t2x) linux executable to convert .RAW files (Xcalibur 1.x) to version 1 mzXML files
13
16
  * [libxml](http://libxml.rubyforge.org/) can use instead of xmlparser. In Ubuntu: sudo apt-get install libxml2 libxml2-dev ; sudo gem install libxml-ruby --remote
14
17
  * [gnuplot](http://rgplot.rubyforge.org/) ('gem install gnuplot'). For some plotting. Of course, you'll need [gnuplot](http://www.gnuplot.info/) before this package will work. Under one-click installer for windows this package requires a little configuration. It works with no configuration on cygwin (or linux).
15
18
 
@@ -23,6 +26,9 @@ See [installation under cygwin](cygwin.html) if you're on Windows.
23
26
  Development
24
27
  -----------
25
28
 
29
+ NOTE: If you are interested in becoming a developer on this project (i.e., write access to the repository) please [contact me](http://rubyforge.org/users/jtprince/)
30
+
31
+
26
32
  anonymous svn checkout:
27
33
 
28
34
  svn checkout svn://rubyforge.org/var/svn/mspire
@@ -49,3 +55,4 @@ Use rake:
49
55
  run tests with large files: rake spec SPEC_LARGE=t
50
56
 
51
57
  run test on one file: rake spec SPEC=specs/{path_to_spec_file}
58
+
@@ -191,3 +191,20 @@ evaluation))
191
191
  1. added MS::MSRun.open method
192
192
  2. added method to write dta files from SRF
193
193
 
194
+ ## version 0.4.3
195
+
196
+ 1. added to_mgf_file from SRF
197
+ 2. added to_dta_files from SRF complete with streaming .tar.gz output (and
198
+ supporting .zip output but it has to make tmp files)
199
+
200
+ ## version 0.4.4
201
+ 1. implemented q-value and pi_0 methods of Storey
202
+ 2. can do complete q-value calculations given p-values
203
+ 3. can determine a pi_0 given a list of target and decoy values (as booleans)
204
+ 4. can determine a pi_0 given a list containing numbers of decoy and target
205
+ values as is often encountered with filtering
206
+ 5. prob_validate.rb implements a q-value option for turning PeptideProphet
207
+ probabilities into q-values
208
+ 6. filter_validate.rb implements a p value method using xcorr values, however,
209
+ this is not very effective since xcorr values underrepresent the
210
+ difference between good hits and bad hits
@@ -0,0 +1,94 @@
1
+
2
+
3
+ require 'archive/tar/minitar'
4
+
5
+ require 'stringio'
6
+
7
+ module Archive::Tar::Minitar
8
+
9
+ # entry may be a string (the name), or it may be a hash specifying the
10
+ # following:
11
+ # :name (REQUIRED)
12
+ # :mode 33188 (rw-r--r--) for files, 16877 (rwxr-xr-x) for dirs
13
+ # (0O100644) (0O40755)
14
+ # :uid nil
15
+ # :gid nil
16
+ # :mtime Time.now
17
+ #
18
+ # if data == nil, then this is considered a directory!
19
+ # (use an empty string for a normal empty file)
20
+ # data should be something that can be opened by StringIO
21
+ def self.pack_as_file(entry, data, outputter) #:yields action, name, stats:
22
+ outputter = outputter.tar if outputter.kind_of?(Archive::Tar::Minitar::Output)
23
+
24
+ stats = {}
25
+ stats[:uid] = nil
26
+ stats[:gid] = nil
27
+ stats[:mtime] = Time.now
28
+
29
+ if data.nil?
30
+ # a directory
31
+ stats[:size] = 4096 # is this OK???
32
+ stats[:mode] = 16877 # rwxr-xr-x
33
+ else
34
+ stats[:size] = data.size
35
+ stats[:mode] = 33188 # rw-r--r--
36
+ end
37
+
38
+ if entry.kind_of?(Hash)
39
+ name = entry[:name]
40
+
41
+ entry.each { |kk, vv| stats[kk] = vv unless vv.nil? }
42
+ else
43
+ name = entry
44
+ end
45
+
46
+ if data.nil? # a directory
47
+ yield :dir, name, stats if block_given?
48
+ outputter.mkdir(name, stats)
49
+ else # a file
50
+ outputter.add_file_simple(name, stats) do |os|
51
+ stats[:current] = 0
52
+ yield :file_start, name, stats if block_given?
53
+ StringIO.open(data, "rb") do |ff|
54
+ until ff.eof?
55
+ stats[:currinc] = os.write(ff.read(4096))
56
+ stats[:current] += stats[:currinc]
57
+ yield :file_progress, name, stats if block_given?
58
+ end
59
+ end
60
+ yield :file_done, name, stats if block_given?
61
+ end
62
+ end
63
+ end
64
+ end
65
+
66
+
67
+ require 'zlib'
68
+ file_names = ['wiley/dorky1', 'dorky2', 'an_empty_dir']
69
+ file_data_strings = ['my data', 'my data also', nil]
70
+
71
+
72
+ module Archive ; end
73
+
74
+ # usage:
75
+ # require 'archive/targz'
76
+ # Archive::Targz.archive_as_files("myarchive.tgz", %w(file1 file2 dir),
77
+ # ['data for file1', 'data for file2', nil])
78
+ module Archive::Targz
79
+ # requires an archive_name (e.g., myarchive.tgz) and parallel filename and
80
+ # data arrays:
81
+ # filenames = %w(file1 file2 empty_dir)
82
+ # data_ar = ['stuff in file 1', 'stuff in file2', nil]
83
+ # nil as an entry in the data_ar means that an empty directory will be
84
+ # created
85
+ def self.archive_as_files(archive_name, filenames=[], data_ar=[])
86
+ tgz = Zlib::GzipWriter.new(File.open(archive_name, 'wb'))
87
+
88
+ Archive::Tar::Minitar::Output.open(tgz) do |outp|
89
+ filenames.zip(data_ar) do |name, data|
90
+ Archive::Tar::Minitar.pack_as_file(name, data, outp)
91
+ end
92
+ end
93
+ end
94
+ end
@@ -0,0 +1,16 @@
1
+
2
+ class Float
3
+ # 3 following methods from http://www.hans-eric.com/code-samples/ruby-floating-point-round-off/
4
+ def round_to(x)
5
+ (self * 10**x).round.to_f / 10**x
6
+ end
7
+
8
+ def ceil_to(x)
9
+ (self * 10**x).ceil.to_f / 10**x
10
+ end
11
+
12
+ def floor_to(x)
13
+ (self * 10**x).floor.to_f / 10**x
14
+ end
15
+ end
16
+
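A minimal usage sketch of the three `Float` helpers added in the hunk above. The class body is repeated only so the snippet runs on its own; the sample value is illustrative and not taken from the package.

```ruby
# Copy of the helpers from the hunk above, repeated so this snippet is self-contained.
class Float
  def round_to(x)
    (self * 10**x).round.to_f / 10**x
  end

  def ceil_to(x)
    (self * 10**x).ceil.to_f / 10**x
  end

  def floor_to(x)
    (self * 10**x).floor.to_f / 10**x
  end
end

value = 3.14159          # illustrative value only
puts value.round_to(2)   # nearest value with two decimal places
puts value.ceil_to(2)    # rounded up at the second decimal place
puts value.floor_to(2)   # rounded down at the second decimal place
```

On Ruby 1.9 and later the built-in `Float#round` also accepts a digits argument, so `round_to` is mainly useful on older interpreters such as the 1.8 series this package targets.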
@@ -1,4 +1,4 @@
1
1
 
2
2
  module Mspire
3
- Version = '0.4.2'
3
+ Version = '0.4.5'
4
4
  end
@@ -0,0 +1,227 @@
1
+ require 'rsruby'
2
+ require 'gsl'
3
+ require 'vec'
4
+ require 'vec/r'
5
+ require 'enumerator'
6
+
7
+
8
+ module PiZero
9
+ class << self
10
+ # takes a sorted array of p-values (floats between 0 and 1 inclusive)
11
+ # returns [thresholds_ar, instantaneous pi_0 calculations_ar]
12
+ # evenly incremented values will be used by default:
13
+ # :start=>0.0, :stop=>0.9, :step=>0.05
14
+ def pi_zero_hats(sorted_pvals, args={})
15
+ defaults = {:start => 0.0, :stop=>0.9, :step=>0.05 }
16
+ margs = defaults.merge( args )
17
+ (start, stop, step) = margs.values_at(:start, :stop, :step)
18
+
19
+ # From Storey et al. PNAS 2003:
20
+ lambdas = [] # lambda
21
+ pi_zeros = [] # pi_0
22
+ total = sorted_pvals.size # m
23
+
24
+ # naive implementation (re-partitions the whole list for every lambda) with correct logic:
25
+ start.step(stop, step) do |lam|
26
+ lambdas << lam
27
+ (greater, less) = sorted_pvals.partition {|pval| pval > lam }
28
+ pi_zeros.push( greater.size.to_f / ( total * (1.0 - lam) ) )
29
+ end
30
+ [lambdas, pi_zeros]
31
+ end
32
+
33
+ # expecting x and y to make a scatter plot descending to a plateau on the
34
+ # right side (which is assumed to be of increasing noise as it goes to the
35
+ # right)
36
+ # returns the height of the plateau at the right edge
37
+ #
38
+ # *
39
+ # *
40
+ # *
41
+ # **
42
+ # ** *** * *
43
+ # ***** **** ***
44
+ def plateau_height(x, y)
45
+ =begin
46
+ require 'gsl'
47
+ x_deltas = (0...(x.size-1)).to_a.map do |i|
48
+ x[i+1] - x[i]
49
+ end
50
+ y_deltas = (0...(y.size-1)).to_a.map do |i|
51
+ y[i+1] - y[i]
52
+ end
53
+ new_xs = x.dup
54
+ new_ys = y.dup
55
+ x_deltas.reverse.each do |delt|
56
+ new_xs.push( new_xs.last + delt )
57
+ end
58
+
59
+ y_cnt = y.size
60
+ y_deltas.reverse.each do |delt|
61
+ y_cnt -= 1
62
+ new_ys.push( y[y_cnt] - delt )
63
+ end
64
+
65
+ x_vec = GSL::Vector.alloc(new_xs)
66
+ y_vec = GSL::Vector.alloc(new_ys)
67
+ coef, cov, chisq, status = GSL::Poly.fit(x_vec,y_vec, 3)
68
+ coef.eval(x.last)
69
+ #x2 = GSL::Vector::linspace(0,2.4,20)
70
+ #graph([x_vec,y_vec], [x2, coef.eval(x2)], "-C -g 3 -S 4")
71
+ =end
72
+
73
+ r = RSRuby.instance
74
+ answ = r.smooth_spline(x,y, :df => 3)
75
+ ## to plot it!
76
+ #r.plot(x,y, :ylab=>"instantaneous pi_zeros")
77
+ #r.lines(answ['x'], answ['y'])
78
+ #r.points(answ['x'], answ['y'])
79
+ #sleep(8)
80
+
81
+ answ['y'].last
82
+ end
83
+
84
+ def plateau_exponential(x,y)
85
+ xvec = GSL::Vector.alloc(x)
86
+ yvec = GSL::Vector.alloc(y)
87
+ a2, b2, = GSL::Fit.linear(xvec, GSL::Sf::log(yvec))
88
+ x2 = GSL::Vector.linspace(0, 1.2, 20)
89
+ exp_a = GSL::Sf::exp(a2)
90
+ out_y = exp_a*GSL::Sf::exp(b2*x2)
91
+ raise NotImplementedError, "need to grab out the answer"
92
+ #graph([xvec, yvec], [x2, exp_a*GSL::Sf::exp(b2*x2)], "-C -g 3 -S 4")
93
+
94
+ end
95
+
96
+ # returns a conservative (but close) estimate of pi_0 given sorted p-values
97
+ # following Storey et al. 2003, PNAS.
98
+ def pi_zero(sorted_pvals)
99
+ plateau_height( *(pi_zero_hats(sorted_pvals)) )
100
+ end
101
+
102
+ # returns an array where the left values have been filled in using the
103
+ # similar values on the right side of the distribution. These values are
104
+ # pushed onto the end of the array in no guaranteed order.
105
+ # extends a distribution on the left side where it is missing since
106
+ # xcorr values <= 0.0 are not reported
107
+ # **
108
+ # * *
109
+ # * *
110
+ # *
111
+ # *
112
+ # *
113
+ # Grabs the right tail from above and inverts it to the left side (less
114
+ # than zero), creating a more full distribution. raises an ArgumentError
115
+ # if values_chopped_at_zero.size == 0
116
+ # this method would be more robust with some smoothing.
117
+ # Method currently only meant for large amounts of data.
118
+ # input data does not need to be sorted
119
+ def extend_distribution_left_of_zero(values_chopped_at_zero)
120
+ sz = values_chopped_at_zero.size
121
+ raise ArgumentError, "array.size must be > 0" if sz == 0
122
+ num_bins = (Math.log10(sz) * 100).round
123
+ vec = VecD.new(values_chopped_at_zero)
124
+ (bins, freqs) = vec.histogram(num_bins)
125
+ start_i = 0
126
+ freqs.each_with_index do |f,i|
127
+ if f.is_a?(Numeric) && f > 0
128
+ start_i = i
129
+ break
130
+ end
131
+ end
132
+ match_it = freqs[start_i]
133
+ # get the index of the first frequency value less than the zero frequency
134
+ index_to_chop_at = -1
135
+ rev_freqs = freqs.reverse
136
+ rev_freqs.each_with_index do |freq,rev_i|
137
+ if match_it - rev_freqs[rev_i+1] <= 0
138
+ index_to_chop_at = freqs.size - 1 - rev_i
139
+ break
140
+ end
141
+ end
142
+ cut_point = bins[index_to_chop_at]
143
+ values_chopped_at_zero + values_chopped_at_zero.select {|v| v >= cut_point }.map {|v| cut_point - v }
144
+ end
145
+
146
+ # assumes the decoy_vals follows a normal distribution
147
+ def p_values(target_vals, decoy_vals)
148
+ (mean, stdev) = VecD.new(decoy_vals).sample_stats
149
+ r = RSRuby.instance
150
+ vec = VecD.new(target_vals)
151
+ right_tailed = true
152
+ vec.p_value_normal(mean, stdev, right_tailed)
153
+ end
154
+
155
+ def p_values_for_sequest(target_hits, decoy_hits)
156
+ dh_vals = decoy_hits.map {|v| v.xcorr }
157
+ new_decoy_vals = PiZero.extend_distribution_left_of_zero(dh_vals)
158
+ #File.open("target.yml", 'w') {|out| out.puts new_decoy_vals.join(" ") }
159
+ #File.open("decoy.yml", 'w') {|out| out.puts target_hits.map {|v| v.xcorr }.join(" ") }
160
+ #abort 'checking'
161
+ p_values(target_hits.map {|v| v.xcorr}, new_decoy_vals )
162
+ end
163
+
164
+ # takes a list of booleans with true being a target hit and false being a
165
+ # decoy hit and returns the pi_zero using the smooth method
166
+ # Should be ordered from best to worst (i.e., one expects more true values
167
+ # at the beginning of the list)
168
+ def pi_zero_from_booleans(booleans)
169
+ targets = 0
170
+ decoys = 0
171
+ xs = []
172
+ ys = []
173
+ booleans.reverse.each_with_index do |v,index|
174
+ if v
175
+ targets += 1
176
+ else
177
+ decoys += 1
178
+ end
179
+ if decoys > 0
180
+ xs << index
181
+ ys << targets.to_f / decoys
182
+ end
183
+ end
184
+ ys.reverse!
185
+ plateau_height(xs, ys)
186
+ end
187
+
188
+ # Takes an array of doublets ([[int, int], [int, int]...]) where the first
189
+ # value is the number of target hits and the second is the number of decoy
190
+ # hits. Expects that best hits are at the beginning of the list. Assumes
191
+ # that each sum is a subset
192
+ # of the following group (shown as actual hits rather than number of hits):
193
+ #
194
+ # [[target, target, target, decoy], [target, target, target, decoy,
195
+ # target, decoy, target], [target, target, target, decoy, target,
196
+ # decoy, target, decoy, target, target]]
197
+ #
198
+ # This assumption may be relaxed somewhat and should still give good
199
+ # results.
200
+ def pi_zero_from_groups(array_of_doublets)
201
+ pi_zeros = []
202
+ array_of_doublets.reverse.each_cons(2) do |two_doublets|
203
+ bigger, smaller = two_doublets
204
+ bigger[0] = bigger[0] - smaller[0]
205
+ bigger[1] = bigger[1] - smaller[1]
206
+ bigger.map! {|v| v < 0 ? 0 : v }
207
+ if bigger[1] > 0
208
+ pi_zeros << (bigger[0].to_f / bigger[1])
209
+ end
210
+ end
211
+ pi_zeros.reverse!
212
+ xs = (0...(pi_zeros.size)).to_a
213
+ plateau_height(xs, pi_zeros)
214
+ end
215
+
216
+ end
217
+
218
+
219
+ end
220
+
221
+ if $0 == __FILE__
222
+ #xcorrs = IO.readlines("/home/jtprince/xcorr_hist/all_xcorrs.yada").first.chomp.split(/\s+/).map {|v| v.to_f }
223
+ #PiZero.p_values_for_sequest(
224
+ #File.open("newtail.yada", 'w') {|out| out.puts new_dist.join(" ") }
225
+
226
+
227
+ end
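The arithmetic behind `pi_zero_hats` in the file above can be sketched without the `rsruby`, `gsl`, or `vec` dependencies. This is a hedged restatement, not the package's code: the method name `pi_zero_hats_sketch` is hypothetical, keyword arguments are used instead of an options hash, and the lambda grid is built from integer steps to avoid accumulating floating-point error. The sample p-values are the ones used in the spec file later in this diff.

```ruby
# Hypothetical helper (not part of mspire): for each threshold lambda,
# estimate pi_0 as  #{p > lambda} / (m * (1 - lambda)).
def pi_zero_hats_sketch(pvals, start: 0.0, stop: 0.9, step: 0.05)
  m = pvals.size
  n_steps = ((stop - start) / step).round
  lambdas = (0..n_steps).map { |i| start + i * step }
  hats = lambdas.map do |lam|
    pvals.count { |p| p > lam } / (m * (1.0 - lam))
  end
  [lambdas, hats]
end

pvals = [0.0, 0.1, 0.223, 0.24, 0.55, 0.68, 0.68, 0.90, 0.98, 1.0]
lambdas, hats = pi_zero_hats_sketch(pvals, step: 0.1)
lambdas.zip(hats) { |lam, hat| puts format('lambda=%.1f  pi0_hat=%.3f', lam, hat) }
```

The smoothing step (`plateau_height`, which calls R's `smooth.spline` through RSRuby) is deliberately left out here, since it needs a working R installation.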
@@ -0,0 +1,152 @@
1
+
2
+ begin
3
+ require 'rsruby'
4
+ rescue LoadError
5
+ puts "You must have the rsruby gem installed to use the qvalue module"
6
+ puts $!
7
+ raise LoadError
8
+ end
9
+ require 'vec'
10
+
11
+ # Adapted from qvalue.R by Alan Dabney and John Storey which was LGPL licensed
12
+
13
+ class VecD
14
+ Default_lambdas = []
15
+ 0.0.step(0.9,0.05) {|v| Default_lambdas << v }
16
+
17
+ Default_smooth_df = 3
18
+
19
+ # returns the pi_zero estimate by taking the fraction of all p-values above
20
+ # lambd and dividing by (1-lambd) and guaranteed to be <= 1
21
+ def pi_zero_at_lambda(lambd)
22
+ v = (self.select{|v| v >= lambd}.size.to_f/self.size) / (1 - lambd)
23
+ [v, 1].min
24
+ end
25
+
26
+ # returns a parallel array (VecI) of how many are <= in the array
27
+ # roughly: VecD[1,8,10,8,9,10].num_le => VecI[1, 3, 6, 3, 4, 6]
28
+ def num_le
29
+ hash = Hash.new {|h,k| h[k] = [] }
30
+ self.each_with_index do |v,i|
31
+ hash[v] << i
32
+ end
33
+ num_le_ar = []
34
+ sorted = self.sort
35
+ count = 0
36
+ sorted.each_with_index do |v,i|
37
+ back = 1
38
+ count += 1
39
+ if v == sorted[i-back]
40
+ while (sorted[i-back] == v)
41
+ num_le_ar[i-back] = count
42
+ back -= 1
43
+ end
44
+ else
45
+ num_le_ar[i] = count
46
+ end
47
+ end
48
+ ret = VecI.new(self.size)
49
+ num_le_ar.zip(sorted) do |n,v|
50
+ indices = hash[v]
51
+ indices.each do |i|
52
+ ret[i] = n
53
+ end
54
+ end
55
+ ret
56
+ end
57
+
58
+ Default_pi_zero_args = {:lambda_vals => Default_lambdas, :method => :smooth, :log_transform => false }
59
+
60
+ # returns the Pi_0 for given p-values (the values in self)
61
+ # lambda_vals = Float or Array of floats of size >= 4. value(s) within (0,1)
62
+ # If a single value is given, then pi_zero is calculated at that point,
63
+ # superseding the method and log_transform arguments
64
+ # method = :smooth or :bootstrap
65
+ # log_transform = true or false
66
+ def pi_zero(lambda_vals=Default_pi_zero_args[:lambda_vals], method=Default_pi_zero_args[:method], log_transform=Default_pi_zero_args[:log_transform])
67
+ if self.min < 0 || self.max > 1
68
+ raise ArgumentError, "p-values must be within [0,1]"
69
+ end
70
+
71
+ if lambda_vals.is_a? Numeric
72
+ lambda_vals = [lambda_vals]
73
+ end
74
+ if lambda_vals.size != 1 && lambda_vals.size < 4
75
+ raise ArgumentError, "lambda_vals must have 1 or 4 or more values"
76
+ end
77
+ if lambda_vals.any? {|v| v < 0 || v >= 1}
78
+ raise ArgumentError, "lambda_vals vals must be within [0,1)"
79
+ end
80
+
81
+ pi_zeros = lambda_vals.map {|val| self.pi_zero_at_lambda(val) }
82
+ if lambda_vals.size == 1
83
+ pi_zeros.first
84
+ else
85
+ case method
86
+ when :smooth
87
+ r = RSRuby.instance
88
+ calc_pi_zero = lambda do |_pi_zeros|
89
+ hash = r.smooth_spline(lambda_vals, _pi_zeros, :df => Default_smooth_df)
90
+ hash['y'][VecD.new(lambda_vals).max_indices.max]
91
+ end
92
+ if log_transform
93
+ pi_zeros.log_space {|log_vals| calc_pi_zero.call(log_vals) }
94
+ else
95
+ calc_pi_zero.call(pi_zeros)
96
+ end
97
+ when :bootstrap
98
+ min_pi0 = pi_zeros.min
99
+ lsz = lambda_vals.size
100
+ mse = VecD.new(lsz, 0)
101
+ pi0_boot = VecD.new(lsz, 0)
102
+ sz = self.size
103
+ 100.times do # for(i in 1:100) {
104
+ p_boot = self.shuffle
105
+ (0...lsz).each do |i|
106
+ pi0_boot[i] = ( p_boot.select{|v| v > lambda_vals[i] }.size.to_f/p_boot.size ) / (1-lambda_vals[i])
107
+ end
108
+ mse = mse + ( (pi0_boot-min_pi0)**2 )
109
+ end
110
+ # pi0 <- min(pi0[mse==min(mse)])
111
+ pi_zero = pi_zeros.values_at(*(mse.min_indices)).min
112
+ [pi_zero,1].min
113
+ else
114
+ raise ArgumentError, ":pi_zero_method must be :smooth or :bootstrap!"
115
+ end
116
+ end
117
+ end
118
+
119
+ # Returns a VecD filled with parallel q-values
120
+ # assumes that vec is filled with p values
121
+ # see pi_zero method for arguments, these should be named as symbols in the
122
+ # pi_zero_args hash.
123
+ # robust = true or false an indicator of whether it is desired to make
124
+ # the estimate more robust for small p-values and
125
+ # a direct finite sample estimate of pFDR
126
+ # A q-value can be thought of as the global positive false discovery rate
127
+ # at a particular p-value
128
+ def qvalues(robust=false, pi_zero_args={})
129
+ sz = self.size
130
+ pi0_args = Default_pi_zero_args.merge(pi_zero_args)
131
+ pi_zero = self.pi_zero(*(pi0_args.values_at(:lambda_vals, :method, :log_transform)))
132
+ raise RuntimeError, "pi0 <= 0 ... check your p-values!!" if pi_zero <= 0
133
+ num_le_ar = self.num_le
134
+ qvalues =
135
+ if robust
136
+ den = self.map {|val| 1 - ((1 - val)**(sz)) }
137
+ self * (pi_zero * sz) / ( num_le_ar * den)
138
+ else
139
+ self * (pi_zero * sz) / num_le_ar
140
+ end
141
+
142
+ u_ar = self.order
143
+
144
+ qvalues[u_ar[sz-1]] = [qvalues[u_ar[sz-1]],1].min
145
+ (0...sz-1).each do |i|
146
+ qvalues[u_ar[i]] = [qvalues[u_ar[i]],qvalues[u_ar[i+1]],1].min
147
+ end
148
+ qvalues
149
+ end
150
+ end
151
+
152
+
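The non-robust branch of `qvalues` above can be sketched in plain Ruby. This is an illustration under stated assumptions rather than the package's implementation: `qvalues_sketch` is a hypothetical name, `pi0` is passed in directly instead of being estimated, plain arrays stand in for `VecD`, and the monotonicity pass walks from the largest p-value down so that each entry sees its already-adjusted neighbour.

```ruby
# Hypothetical helper (not part of mspire).
# q_i = pi0 * m * p_i / #{p_j <= p_i}, then made monotone and capped at 1.
def qvalues_sketch(pvals, pi0)
  m = pvals.size
  return [] if m.zero?

  order = (0...m).sort_by { |i| pvals[i] } # indices in ascending p-value order
  q = pvals.map do |p|
    num_le = pvals.count { |other| other <= p }
    pi0 * m * p / num_le
  end

  # Largest p-value first, then walk down enforcing q[i] <= q[next larger p].
  q[order[-1]] = [q[order[-1]], 1.0].min
  (m - 2).downto(0) do |k|
    i = order[k]
    nxt = order[k + 1]
    q[i] = [q[i], q[nxt], 1.0].min
  end
  q
end

pvals = [0.01, 0.04, 0.03, 0.5] # illustrative values only
p qvalues_sketch(pvals, 1.0)
```

With `pi0` fixed at 1.0 this reduces to a Benjamini-Hochberg style adjustment; a smaller `pi0` scales every q-value down proportionally.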
@@ -33,7 +33,8 @@ class Mass
33
33
 
34
34
  # elements etc.
35
35
  :h => 1.00783,
36
- :h_plus => 1.00728,
36
+ #:h_plus => 1.00728, # this is the mass I had
37
+ :h_plus => 1.007276, # this is the mass used by mascot merge.pl
37
38
  :o => 15.9949146,
38
39
  :h2o => 18.01056,
39
40
  }
@@ -310,6 +310,17 @@ class SpecID::Precision::Filter
310
310
  [peps] # no decoy
311
311
  end
312
312
 
313
+ if opts[:decoy_pi_zero]
314
+ if pep_sets.size < 2
315
+ raise ArgumentError, "must have a decoy validator for pi zero calculation!"
316
+ end
317
+ require 'pi_zero'
318
+ (_target, _decoy) = pep_sets
319
+ pvals = PiZero.p_values_for_sequest(*pep_sets).sort
320
+ pi_zero = PiZero.pi_zero(pvals)
321
+ opts[:decoy_pi_zero] = pi_zero
322
+ end
323
+
313
324
  if opts[:proteins]
314
325
  protein_validator = Validator::ProtFromPep.new
315
326
  end
@@ -128,6 +128,7 @@ module SpecID
128
128
  op.separator ""
129
129
 
130
130
  op.val_opt(:decoy, opts)
131
+ op.exact_opt(opts, :decoy_pi_zero)
131
132
  op.val_opt(:digestion, opts)
132
133
  op.val_opt(:bias, opts)
133
134
  op.val_opt(:bad_aa, opts)
@@ -86,8 +86,6 @@ class SpecID::Precision::Prob
86
86
  end
87
87
  end
88
88
 
89
-
90
-
91
89
  validators.delete(decoy_val)
92
90
  other_validators = validators
93
91
 
@@ -101,13 +99,14 @@ class SpecID::Precision::Prob
101
99
  n_count = 0
102
100
  d_count = 0
103
101
 
102
+
104
103
  # this is a peptide prophet
105
104
  is_peptide_prophet =
106
105
  if spec_id.peps.first.respond_to?(:fval) ; true
107
106
  else ;false
108
107
  end
109
108
 
110
- use_q_value = spec_id.peps.first.respond_to?(:q_value)
109
+ use_q_value = other_validators.any? {|v| v.class == Validator::QValue }
111
110
 
112
111
  ## ORDER THE PEPTIDE HITS:
113
112
  ordered_peps =
@@ -12,7 +12,11 @@ module SpecID
12
12
 
13
13
  COMMAND_LINE = {
14
14
  :sort_by_init => ['--sort_by_init', "sort the proteins based on init probability"],
15
- :qval => ['--qval', "use percolator q-values to calculate precision"],
15
+ :perc_qval => ['--perc_qval', "use percolator q-values to calculate precision"],
16
+ :to_qvalues => ['--to_qvalues', "transform probabilities into q-values",
17
+ "(includes pi_0 correction)",
18
+ "uses PROB [TYPE] if given and supercedes",
19
+ "the prob validation type"],
16
20
  :prob => ['--prob [TYPE]', "use prophet probabilites to calculate precision",
17
21
  "TYPE = nsp [default] prophet nsp",
18
22
  " (nsp also should be used for PeptideProphet results)",
@@ -95,7 +99,8 @@ module SpecID
95
99
  op.separator ""
96
100
 
97
101
  op.val_opt(:prob, opts)
98
- op.val_opt(:qval, opts)
102
+ op.val_opt(:perc_qval, opts)
103
+ op.val_opt(:to_qvalues, opts)
99
104
  op.val_opt(:decoy, opts)
100
105
  op.val_opt(:pephits, opts) # sets opts[:ties] = false
101
106
  op.val_opt(:digestion, opts)
@@ -129,6 +134,7 @@ module SpecID
129
134
  #puts 'making background estimates with: top_per_aaseq_charge'
130
135
  :top_per_aaseq_charge
131
136
  end
137
+
132
138
  opts[:validators] = Validator::Cmdline.prepare_validators(opts, !opts[:ties], opts[:interactive], postfilter, spec_id_obj)
133
139
 
134
140
  if opts[:output].size == 0
@@ -63,7 +63,7 @@ module Proph
63
63
  class PepSummary::Pep < Sequest::PepXML::SearchHit
64
64
  # aaseq is defined in SearchHit
65
65
 
66
- %w(probability fval ntt nmc massd prots).each do |guy|
66
+ %w(probability fval ntt nmc massd prots q_value).each do |guy|
67
67
  self.add_member(guy)
68
68
  end
69
69
 
@@ -122,7 +122,7 @@ end # Proph
122
122
 
123
123
 
124
124
 
125
- Proph::Prot = Arrayclass.new(%w(protein_name probability n_indistinguishable_proteins percent_coverage unique_stripped_peptides group_sibling_id total_number_peptides pct_spectrum_ids description peps))
125
+ Proph::Prot = Arrayclass.new(%w(protein_name probability n_indistinguishable_proteins percent_coverage unique_stripped_peptides group_sibling_id total_number_peptides pct_spectrum_ids description peps q_value))
126
126
 
127
127
  # note that 'description' is found in the element 'annotation', attribute 'protein_description'
128
128
  # NOTE!: unique_stripped peptides is an array rather than + joined string
@@ -142,7 +142,7 @@ end
142
142
 
143
143
  # this is a pep from a -prot.xml file
144
144
 
145
- Proph::Prot::Pep = Arrayclass.new(%w(peptide_sequence charge initial_probability nsp_adjusted_probability weight is_nondegenerate_evidence n_enzymatic_termini n_sibling_peptides n_sibling_peptides_bin n_instances is_contributing_evidence calc_neutral_pep_mass modification_info prots))
145
+ Proph::Prot::Pep = Arrayclass.new(%w(peptide_sequence charge initial_probability nsp_adjusted_probability weight is_nondegenerate_evidence n_enzymatic_termini n_sibling_peptides n_sibling_peptides_bin n_instances is_contributing_evidence calc_neutral_pep_mass modification_info prots q_value))
146
146
 
147
147
  class Proph::Prot::Pep
148
148
  include SpecID::Pep
@@ -6,6 +6,8 @@ require 'fasta'
6
6
  require 'mspire'
7
7
  require 'set'
8
8
 
9
+ require 'core_extensions'
10
+
9
11
  module BinaryReader
10
12
  Null_char = "\0"[0] ## TODO: change for ruby 1.9 or 2.0
11
13
  # extracts a string with all empty chars at the end stripped
@@ -178,6 +180,7 @@ class SRF
178
180
  attr_accessor :base_name
179
181
  # this is the global peptides array
180
182
  attr_accessor :peps
183
+ MASCOT_HYDROGEN_MASS = 1.007276
181
184
 
182
185
  attr_accessor :filtered_by_precursor_mass_tolerance
183
186
 
@@ -207,18 +210,92 @@ class SRF
207
210
  sprintf("%.#{decimal_places}f", float)
208
211
  end
209
212
 
213
+ # this mimics the output of merge.pl from mascot
214
+ # The only difference is that this does not include the "\r\n"
215
+ # that is found after the peak lists, instead, it uses "\n" throughout the
216
+ # file (thinking that this is preferable to mixing newline styles!)
217
+ # note that Mass
218
+ # if no filename is given, will use base_name + '.mgf'
219
+ def to_mgf_file(filename=nil)
220
+ filename =
221
+ if filename ; filename
222
+ else
223
+ base_name + '.mgf'
224
+ end
225
+ h_plus = SpecID::MONO[:h_plus]
226
+ File.open(filename, 'wb') do |out|
227
+ dta_files.zip(index) do |dta, i_ar|
228
+ chrg = dta.charge
229
+ out.puts 'BEGIN IONS'
230
+ out.puts "TITLE=#{[base_name, *i_ar].push('dta').join('.')}"
231
+ out.puts "CHARGE=#{chrg}+"
232
+ out.puts "PEPMASS=#{(dta.mh+((chrg-1)*h_plus))/chrg}"
233
+ peak_ar = dta.peaks.unpack('e*')
234
+ (0...(peak_ar.size)).step(2) do |i|
235
+ out.puts( peak_ar[i,2].join(' ') )
236
+ end
237
+ out.puts ''
238
+ out.puts 'END IONS'
239
+ out.puts ''
240
+ end
241
+ end
242
+ end
243
+
210
244
  # not given an out_folder, will make one with the basename
211
- def to_dta_files(out_folder=nil)
245
+ # compress may be: :zip, :tgz, or nil (no compression)
246
+ # :zip requires gem rubyzip to be installed and is *very* bloated
247
+ # as it writes out all the files first!
248
+ # :tgz requires gem archive-tar-minitar to be installed
249
+ def to_dta_files(out_folder=nil, compress=nil)
212
250
  outdir =
213
251
  if out_folder ; out_folder
214
252
  else base_name
215
253
  end
216
254
 
217
- FileUtils.mkpath(outdir)
218
- Dir.chdir(outdir) do
219
- dta_files.zip(index) do |dta,i_ar|
220
- File.open([base_name, *i_ar].join('.') << '.dta', 'wb') do |out|
221
- dta.write_dta_file(out)
255
+ case compress
256
+ when :tgz
257
+ begin
258
+ require 'archive/tar/minitar'
259
+ rescue LoadError
260
+ abort "need gem 'archive-tar-minitar' installed for tgz compression!\n#{$!}"
261
+ end
262
+ require 'archive/targz' # my own simplified interface!
263
+ require 'zlib'
264
+ names = index.map do |i_ar|
265
+ [outdir, '/', [base_name, *i_ar].join('.'), '.dta'].join('')
266
+ end
267
+ #Archive::Targz.archive_as_files(outdir + '.tgz', names, dta_file_data)
268
+
269
+ tgz = Zlib::GzipWriter.new(File.open(outdir + '.tgz', 'wb'))
270
+
271
+ Archive::Tar::Minitar::Output.open(tgz) do |outp|
272
+ dta_files.each_with_index do |dta_file, i|
273
+ Archive::Tar::Minitar.pack_as_file(names[i], dta_file.to_dta_file_data, outp)
274
+ end
275
+ end
276
+ when :zip
277
+ begin
278
+ require 'zip/zipfilesystem'
279
+ rescue LoadError
280
+ abort "need gem 'rubyzip' installed for zip compression!\n#{$!}"
281
+ end
282
+ #begin ; require 'zip/zipfilesystem' ; rescue LoadError, "need gem 'rubyzip' installed' for zip compression!\n#{$!}" ; end
283
+ Zip::ZipFile.open(outdir + ".zip", Zip::ZipFile::CREATE) do |zfs|
284
+ dta_files.zip(index) do |dta,i_ar|
285
+ #zfs.mkdir(outdir)
286
+ zfs.get_output_stream(outdir + '/' + [base_name, *i_ar].join('.') + '.dta') do |out|
287
+ dta.write_dta_file(out)
288
+ #zfs.commit
289
+ end
290
+ end
291
+ end
292
+ else # no compression
293
+ FileUtils.mkpath(outdir)
294
+ Dir.chdir(outdir) do
295
+ dta_files.zip(index) do |dta,i_ar|
296
+ File.open([base_name, *i_ar].join('.') << '.dta', 'wb') do |out|
297
+ dta.write_dta_file(out)
298
+ end
222
299
  end
223
300
  end
224
301
  end
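The `PEPMASS` line written by `to_mgf_file` in the hunk above converts a singly protonated mass (MH+) into the m/z of the precursor at its observed charge. A minimal sketch of that one formula follows; `mh_to_mz` is a hypothetical name, and the proton mass is the `1.007276` value this diff attributes to mascot's merge.pl.

```ruby
H_PLUS = 1.007276 # proton mass as used in this diff

# Hypothetical helper (not part of mspire):
# m/z = (MH+ + (z - 1) * proton_mass) / z
def mh_to_mz(mh, charge, h_plus = H_PLUS)
  (mh + (charge - 1) * h_plus) / charge
end

puts mh_to_mz(1000.0, 1) # charge 1: m/z equals MH+
puts mh_to_mz(1000.0, 2) # charge 2: one extra proton, mass split over two charges
```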
@@ -626,13 +703,20 @@ class SRF::DTA
626
703
  self
627
704
  end
628
705
 
706
+ def to_dta_file_data
707
+ string = "#{mh.round_to(6)} #{charge}\r\n"
708
+ peak_ar = peaks.unpack('e*')
709
+ (0...(peak_ar.size)).step(2) do |i|
710
+ # %d is equivalent to floor, so we round by adding 0.5!
711
+ string << "#{peak_ar[i].round_to(4)} #{(peak_ar[i+1] + 0.5).floor}\r\n"
712
+ #string << peak_ar[i,2].join(' ') << "\r\n"
713
+ end
714
+ string
715
+ end
716
+
629
717
  # write a class dta file to the io object
630
718
  def write_dta_file(io)
631
- io.print("#{mh} #{charge}\r\n")
632
- peak_ar = peaks.unpack('e*')
633
- (0...(peak_ar.size)).step(2) do |i|
634
- io.print( peak_ar[i,2].join(' '), "\r\n" )
635
- end
719
+ io.print to_dta_file_data
636
720
  end
637
721
 
638
722
  end
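A self-contained sketch of the text layout produced by `to_dta_file_data` in the hunk above: a header line with MH+ and charge, then one `m/z intensity` pair per line, all terminated with CRLF. The name `dta_text` is hypothetical, the built-in `Float#round` stands in for the package's `round_to`, and the peak values are invented so that they are exactly representable as single-precision floats.

```ruby
# Hypothetical helper (not part of mspire). packed_peaks is a binary string of
# little-endian single-precision floats laid out as mz, intensity, mz, intensity...
# It assumes an even number of floats.
def dta_text(mh, charge, packed_peaks)
  out = "#{mh.round(6)} #{charge}\r\n"
  packed_peaks.unpack('e*').each_slice(2) do |mz, intensity|
    # Adding 0.5 before floor rounds the intensity to the nearest integer.
    out << "#{mz.round(4)} #{(intensity + 0.5).floor}\r\n"
  end
  out
end

peaks = [100.5, 20.0, 200.25, 7.5].pack('e*') # illustrative peaks only
print dta_text(1000.5, 2, peaks)
```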
@@ -29,6 +29,10 @@ class Validator::Background
29
29
  min_in_window(data_vec, last_0_index, min_window_pre, min_window_post)
30
30
  end
31
31
 
32
+ def plot(vec)
33
+ `graph #{vec.join(" ")} -a -T X`
34
+ end
35
+
32
36
  # not really working right currently
33
37
  def derivs(avg_points=15, min_window_pre=5, min_window_post=5)
34
38
  data_vec = VecD[*@data]
@@ -74,6 +74,9 @@ class Validator::Cmdline
74
74
  "then give the FILENAME (e.g., --decoy decoy.srg)",
75
75
  "DTR = Decoy to Target Ratio (default: #{DEFAULTS[:decoy][:decoy_to_target_ratio]})",
76
76
  "DOM = *true/false, decoy on match",],
77
+ :decoy_pi_zero => ["--decoy_pi_zero", "uses sequest Xcorrs to estimate the",
78
+ "percentage of incorrect target hits.",
79
+ "This over-rides any given DTR (above)"],
77
80
  :tps => ["--tps <fasta>", "for a completely defined sample, this is the",
78
81
  "fasta file containing the true protein hits"],
79
82
  # may require digestion:
@@ -141,7 +144,8 @@ class Validator::Cmdline
141
144
  end
142
145
  opts[:validators].push([:prob, mthd])
143
146
  },
144
- :qval => lambda {|ar, opts| opts[:validators].push([:qval]) },
147
+ :perc_qval => lambda {|ar, opts| opts[:validators].push([:perc_qval]) },
148
+ :to_qvalues => lambda {|ar, opts| opts[:validators].push([:to_qvalues]) },
145
149
  :decoy => lambda {|ar, opts|
146
150
  myargs = [:decoy]
147
151
  first_arg = ar[0]
@@ -273,7 +277,43 @@ class Validator::Cmdline
273
277
  # postfilter is one of :top_per_scan, :top_per_aaseq,
274
278
  # :top_per_aaseq_charge (of which last two are subsets of scan)
275
279
  def self.prepare_validators(opts, false_on_tie, interactive, postfilter, spec_id)
280
+
276
281
  validator_args = opts[:validators]
282
+ if validator_args.any? {|v| v.first == :to_qvalues }
283
+ prob_val_args_ar = validator_args.select {|v| v.first == :prob }.first
284
+ prob_method =
285
+ if prob_val_args_ar && prob_val_args_ar[1]
286
+ prob_val_args_ar[1]
287
+ else
288
+ :probability
289
+ end
290
+ validator_args.reject! {|v| v.first == :prob }
291
+
292
+ require 'vec'
293
+ require 'qvalue'
294
+
295
+ # get a list of p-values
296
+ pvals = spec_id.peps.map do |pep|
297
+ val = 1.0 - pep.send(prob_method)
298
+ val = 1e-9 if val == 0
299
+ val
300
+ end
301
+ pvals = VecD.new(pvals)
302
+ #qvals = pvals.qvalues(false, :lambda_vals => 0.30 )
303
+ qvals = pvals.qvalues
304
+ qvals.zip(spec_id.peps) do |qval,pep|
305
+ pep.q_value = qval
306
+ end
307
+ end
308
+
309
+ validator_args.map! do |v|
310
+ if v.first == :to_qvalues || v.first == :perc_qval
311
+ [:qval]
312
+ else
313
+ v
314
+ end
315
+ end
316
+
277
317
  correct_wins = !false_on_tie
278
318
  need_false_to_total_ratio = []
279
319
  need_frequency = []
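The `--to_qvalues` branch above first turns each peptide probability into a p-value-like score by taking `1.0 - probability` and replacing an exact zero with `1e-9`. A small sketch of just that transformation, with a hypothetical method name and invented probabilities; whether `1 - probability` behaves as a true p-value is the package's working assumption, not something this snippet establishes.

```ruby
P_VALUE_FLOOR = 1e-9 # same floor as in the hunk above

# Hypothetical helper (not part of mspire).
def probabilities_to_pvalues(probs)
  probs.map do |prob|
    pval = 1.0 - prob
    pval == 0 ? P_VALUE_FLOOR : pval
  end
end

p probabilities_to_pvalues([1.0, 0.9, 0.25]) # illustrative probabilities only
```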
@@ -1,4 +1,7 @@
1
1
 
2
+ # calculates precision based on the Benjamini-Hochberg FDR method.
3
+ # @TODO: class should probably be renamed to reflect method used!
4
+ # or options given to specify different methods (i.e., q-value)??
2
5
  class Validator::Probability
3
6
 
4
7
  attr_accessor :prob_method
@@ -37,8 +37,9 @@ describe 'filter_and_validate.rb on small bioworks file' do
37
37
  end
38
38
  end
39
39
 
40
+ ############################ uncomment this::
40
41
  # this ensures that the actual commandline version gives usage.
41
- it_should_behave_like "a cmdline program"
42
+ # it_should_behave_like "a cmdline program"
42
43
 
43
44
  it 'outputs to yaml' do
44
45
  reply = @st_to_yaml.call( @args )
@@ -46,6 +47,7 @@ describe 'filter_and_validate.rb on small bioworks file' do
46
47
  reply.keys.map {|v| v.to_s}.sort.should == keys
47
48
  end
48
49
 
50
+
49
51
  it 'responds to --prob init' do
50
52
  normal = @st_to_yaml.call( @args + " --prob" )
51
53
 
@@ -69,6 +71,16 @@ describe 'filter_and_validate.rb on small bioworks file' do
69
71
  end
70
72
  end
71
73
 
74
+ it 'works with --to_qvalues flag' do
75
+ begin
76
+ normal = @st_to_yaml.call( @args + " --to_qvalues --prob" )
77
+ rescue RuntimeError
78
+ # right now the p values in this data set don't lend themselves to
79
+ # legitimate q-values, so we get a RuntimeError
80
+ # Need to work this one out
81
+ end
82
+ end
83
+
72
84
  end
73
85
 
74
86
 
@@ -0,0 +1,104 @@
1
+ require File.expand_path( File.dirname(__FILE__) + '/spec_helper' )
2
+ require 'pi_zero'
3
+
4
+ describe PiZero do
5
+ before(:all) do
6
+ @bools = "11110010110101010101000001101010101001010010100001001010000010010000010010000010010101010101000001010000000010000000000100001000100000100000100000001000000000000100000000".split('').map do |v|
7
+ if v.to_i == 1
8
+ true
9
+ else
10
+ false
11
+ end
12
+ end
13
+ increment = 6.0 / @bools.size
14
+ @xcorrs = []
15
+ 0.0.step(6.0, increment) {|v| @xcorrs << v }
16
+ @xcorrs.reverse!
17
+
18
+ @sorted_pvals = [0.0, 0.1, 0.223, 0.24, 0.55, 0.68, 0.68, 0.90, 0.98, 1.0]
19
+ end
20
+
21
+ it 'calculates instantaneous pi_0 hats' do
22
+ answ = PiZero.pi_zero_hats(@sorted_pvals, :step => 0.1)
23
+ exp_lambdas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
24
+ passing_threshold = [9, 8, 8, 6, 6, 6, 5, 3, 3, 2]
25
+ expected = passing_threshold.zip(exp_lambdas).map {|v,l| v.to_f / (10.0 * (1.0 - l)) }
26
+ (answ_lams, answ_pis) = answ
27
+ answ_lams.zip(exp_lambdas) {|a,e| a.should be_close(e, 0.0000000001) }
28
+ answ_pis.zip(expected) {|a,e| a.should be_close(e, 0.0000000001) }
29
+ end
30
+
31
+ xit 'can find a plateau height with exponential' do
32
+ x = [0.0, 0.01, 0.012, 0.13, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
33
+ y = [1.0, 0.95, 0.92, 0.8, 0.7, 0.6, 0.55, 0.58, 0.62, 0.53, 0.54, 0.59, 0.4, 0.72]
34
+
35
+ z = PiZero.plateau_exponential(x,y)
36
+ # still working on this one
37
+ end
38
+
39
+ it 'can find a plateau height' do
40
+ x = [0.0, 0.01, 0.012, 0.13, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
41
+ y = [1.0, 0.95, 0.92, 0.8, 0.7, 0.6, 0.55, 0.58, 0.62, 0.53, 0.54, 0.59, 0.4, 0.72]
42
+ z = PiZero.plateau_height(x,y)
43
+ z.should be_close(0.57, 0.05)
44
+ #require 'rsruby'
45
+ #r = RSRuby.instance
46
+ #r.plot(x,y)
47
+ #sleep(8)
48
+ end
49
+
50
+ it 'can calculate p values for SEQUEST hits' do
51
+ class FakeSequest ; attr_accessor :xcorr ; def initialize(xcorr) ; @xcorr = xcorr ; end ; end
52
+
53
+ target = []
54
+ decoy = []
55
+ cnt = 0
56
+ @xcorrs.zip(@bools) do |xcorr, bool|
57
+ if bool
58
+ target << FakeSequest.new(xcorr)
59
+ else
60
+ decoy << FakeSequest.new(xcorr)
61
+ end
62
+ end
63
+ pvalues = PiZero.p_values_for_sequest(target, decoy)
64
+ # frozen:
65
+ exp = [1.71344886775144e-07, 1.91226800512155e-07, 2.1332611415515e-07, 2.37879480495429e-07, 3.29004960353623e-07, 4.07557294032203e-07, 4.5332397295349e-07, 5.60147945165288e-07, 6.90985835582987e-07, 8.50958233458999e-07, 1.04621373866358e-06, 1.28412129273e-06, 2.35075612646546e-06, 2.59621031358335e-06, 3.16272156036349e-06, 3.84642913860656e-06, 4.67014790912829e-06, 5.66082984245324e-06, 7.53093419443452e-06, 9.09058296339405e-06, 1.20185706815653e-05, 1.44474800911154e-05, 2.27242185508328e-05, 2.967213280773e-05, 3.537451312629e-05, 5.93486219583748e-05, 7.64456599577934e-05, 0.000125433021038759, 0.000159783941297163, 0.000256431068540685, 0.000323066395099306, 0.00037608522266194, 0.000437091783629134, 0.000507167844234063, 0.000587522219112902, 0.000679502786805963, 0.00104103901250011, 0.00119624534498457, 0.00219153400681528, 0.00439503742960694, 0.00593498821589879, 0.00749365688957234, 0.0105069659581753, 0.0145259091109191, 0.0218905360424189, 0.0404530419122661]
66
+ pvalues.zip(exp) do |v,e|
67
+ v.should be_close(e, 0.000001)
68
+ end
69
+ end
70
+
71
+ it 'can calculate pi zero for target/decoy booleans' do
72
+ pi_zero = PiZero.pi_zero_from_booleans(@bools)
73
+ # frozen
74
+ pi_zero.should be_close(0.03522869, 0.0001)
75
+ end
76
+
77
+ it 'can calculate pi zero for groups of hits' do
78
+ # setup
79
+ targets = [4,3,8,3,5,3,4,5,4]
80
+ decoys = [0,2,2,3,5,7,8,8,8]
81
+ targets_summed = []
82
+ targets.each_with_index do |ar,i|
83
+ sum = 0
84
+ (0..i).each do |j|
85
+ sum += targets[j]
86
+ end
87
+ targets_summed << sum
88
+ end
89
+ decoys_summed = []
90
+ decoys.each_with_index do |ar,i|
91
+ sum = 0
92
+ (0..i).each do |j|
93
+ sum += decoys[j]
94
+ end
95
+ decoys_summed << sum
96
+ end
97
+ zipped = targets_summed.zip(decoys_summed)
98
+ pi_zero = PiZero.pi_zero_from_groups(zipped)
99
+ # frozen
100
+ pi_zero.should be_close(0.384064, 0.00001)
101
+ end
102
+
103
+ end
104
+
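The expected values in the 'calculates instantaneous pi_0 hats' example above are built as `count / (m * (1 - lambda))`, where `count` is the number of p-values strictly greater than `lambda` and `m` is the total number of p-values. A minimal sketch of that estimate follows, assuming only what the spec's expectation shows. The method name and signature here are hypothetical and are not the gem's `PiZero.pi_zero_hats`, which takes a `:step` option and also returns the lambdas.

```ruby
# For each lambda, estimate pi_0 as the fraction of p-values above lambda,
# rescaled by the width of the interval (lambda, 1].
def pi_zero_hats_sketch(pvals, lambdas)
  m = pvals.size.to_f
  lambdas.map do |lam|
    passing = pvals.count { |p| p > lam }
    passing / (m * (1.0 - lam))
  end
end

sorted_pvals = [0.0, 0.1, 0.223, 0.24, 0.55, 0.68, 0.68, 0.90, 0.98, 1.0]
lambdas      = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
hats         = pi_zero_hats_sketch(sorted_pvals, lambdas)
```

The lambdas are passed in as literals rather than generated by repeated addition, so that a p-value sitting exactly on a threshold (0.1 or 0.90 here) is compared against the same floating-point value and is not counted as passing.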
@@ -0,0 +1,39 @@
1
+ require File.expand_path( File.dirname(__FILE__) + '/spec_helper' )
2
+
3
+ require 'qvalue'
4
+
5
+ describe 'finding q-values' do
6
+
7
+ it 'can do num_le' do
8
+ x = VecD[1,8,10,8,9,10]
9
+ exp = VecD[1, 3, 6, 3, 4, 6]
10
+ x.num_le.should == exp
11
+
12
+ x = VecD[10,9,8,5,5,5,5,3,2]
13
+ exp = VecD[9, 8, 7, 6, 6, 6, 6, 2, 1]
14
+ x.num_le.should == exp
15
+ end
16
+
17
+ it 'can do qvalues with smooth pi0' do
18
+ pvals = VecD[0.00001, 0.0001, 0.001, 0.01, 0.03, 0.02, 0.01, 0.1, 0.2, 0.4, 0.5, 0.6, 0.77, 0.8, 0.99]
19
+ exp = [0.0000938637, 0.0004693185, 0.0031287899, 0.0187727394, 0.0402272988, 0.0312878991, 0.0187727394, 0.1173296215, 0.2085859937, 0.3754547887, 0.4266531690, 0.4693184859, 0.5363639839, 0.5363639839, 0.6195004014]
20
+ pvals.qvalues.zip(exp) do |a,b|
21
+ a.should be_close(b, 1.0e-9)
22
+ end
23
+ end
24
+
25
+ it 'can do qvalues with bootstrap pi0' do
26
+ puts "\nbootstrap pi0 needs further testing although answers seem to be close!"
27
+ pvals = VecD[0.00001, 0.0001, 0.001, 0.01, 0.03, 0.02, 0.01, 0.1, 0.2, 0.4, 0.5, 0.6, 0.77, 0.8, 0.99]
28
+ # this is what the Storey software gives for this:
29
+ # exp = [8.888889e-05, 4.444444e-04, 2.962963e-03, 1.777778e-02, 3.809524e-02, 2.962963e-02, 1.777778e-02, 1.111111e-01, 1.975309e-01, 3.555556e-01, 4.040404e-01, 4.444444e-01, 5.079365e-01, 5.079365e-01, 5.866667e-01]
30
+ exp = [9.38636971774565e-05, 0.000469318485887282, 0.00312878990591522, 0.0187727394354913, 0.0402272987903385, 0.0312878990591522, 0.0187727394354913, 0.117329621471821, 0.208585993727681, 0.375454788709826, 0.426653168988439, 0.469318485887282, 0.53636398387118, 0.53636398387118, 0.619500401371213]
31
+ robust = false
32
+ qvals = pvals.qvalues(robust, :method => :bootstrap)
33
+ qvals.zip(exp) do |a,b|
34
+ a.should be_close(b, 0.00001)
35
+ end
36
+ end
37
+
38
+ end
39
+
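The `num_le` expectations in the spec above are the count, for each element, of how many elements are less than or equal to it. Given such counts and an estimate of pi_0, a q-value can be sketched as `pi0 * m * p / num_le(p)`, followed by a pass that takes the minimum over all larger-or-equal p-values so that q-values never decrease as p decreases. The code below is a plain-Array sketch under those assumptions. The method names are hypothetical, pi_0 is supplied by the caller instead of being estimated by the smoother or bootstrap options the spec exercises, and this is not the gem's `VecD#qvalues`.

```ruby
# For each value, count how many values in the list are <= it.
def num_le_sketch(values)
  values.map { |v| values.count { |other| other <= v } }
end

# q-values for a list of p-values, given a pi_0 estimate supplied by the caller.
def qvalues_given_pi0(pvals, pi0)
  m = pvals.size
  counts = num_le_sketch(pvals)
  raw = pvals.zip(counts).map { |p, c| pi0 * m * p / c }
  # Enforce monotonicity: q(p_i) is the minimum raw value over all p_j >= p_i.
  pvals.map do |p|
    pvals.each_index.select { |j| pvals[j] >= p }.map { |j| raw[j] }.min
  end
end

qvals = qvalues_given_pi0([0.01, 0.04, 0.041], 1.0)
```

Both methods are quadratic in the number of p-values, which keeps the sketch short; sorting once and sweeping from the largest p-value down would give the same result in `O(n log n)`.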
@@ -50,4 +50,18 @@ bias-prot: 37
50
50
  # expecting were my best judgement (erring on the min side)
51
51
  end
52
52
  end
53
+
54
+ # This is where I'd like to go finding the plateau region!
55
+ #it 'finds the minimum of the plateau region of a stringency plot' do
56
+ # @data.each do |k,v|
57
+ # exp = @expected[k]
58
+ # bkg = Validator::Background.new(v)
59
+ # ans = bkg.quartile_deriv_finder
60
+ # ans.should be_close(v[exp], 0.01)
61
+ # # expecting were my best judgement (erring on the min side)
62
+ # end
63
+ #end
64
+
65
+
66
+
53
67
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: mspire
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.2
4
+ version: 0.4.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - John Prince
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2008-08-06 00:00:00 -06:00
12
+ date: 2008-09-24 00:00:00 -06:00
13
13
  default_executable:
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
@@ -98,8 +98,10 @@ files:
98
98
  - lib/ms/converter
99
99
  - lib/ms/converter/mzxml.rb
100
100
  - lib/ms/scan.rb
101
+ - lib/core_extensions.rb
101
102
  - lib/scan_i.rb
102
103
  - lib/fasta.rb
104
+ - lib/qvalue.rb
103
105
  - lib/roc.rb
104
106
  - lib/spec_id.rb
105
107
  - lib/xml.rb
@@ -110,6 +112,7 @@ files:
110
112
  - lib/transmem/phobius.rb
111
113
  - lib/transmem/toppred.rb
112
114
  - lib/ms.rb
115
+ - lib/pi_zero.rb
113
116
  - lib/spec_id
114
117
  - lib/spec_id/srf.rb
115
118
  - lib/spec_id/sequest.rb
@@ -162,6 +165,8 @@ files:
162
165
  - lib/validator/q_value.rb
163
166
  - lib/xml_style_parser.rb
164
167
  - lib/mspire.rb
168
+ - lib/archive
169
+ - lib/archive/targz.rb
165
170
  - lib/spec_id_xml.rb
166
171
  - lib/bsearch.rb
167
172
  - bin/gi2annot.rb
@@ -204,12 +209,12 @@ files:
204
209
  - script/simple_protein_digestion.rb
205
210
  - script/peps_per_bin.rb
206
211
  - specs/ms
207
- - specs/ms/parser
208
212
  - specs/ms/gradient_program_spec.rb
209
213
  - specs/ms/parser_spec.rb
210
214
  - specs/ms/spectrum_spec.rb
211
215
  - specs/ms/msrun_spec.rb
212
216
  - specs/merge_deep_spec.rb
217
+ - specs/qvalue_spec.rb
213
218
  - specs/spec_helper.rb
214
219
  - specs/fasta_spec.rb
215
220
  - specs/transmem
@@ -241,6 +246,7 @@ files:
241
246
  - specs/spec_id/digestor_spec.rb
242
247
  - specs/spec_id/aa_freqs_spec.rb
243
248
  - specs/rspec_autotest.rb
249
+ - specs/pi_zero_spec.rb
244
250
  - specs/xml_spec.rb
245
251
  - specs/sample_enzyme_spec.rb
246
252
  - specs/transmem_spec_shared.rb
@@ -376,6 +382,7 @@ test_files:
376
382
  - specs/ms/spectrum_spec.rb
377
383
  - specs/ms/msrun_spec.rb
378
384
  - specs/merge_deep_spec.rb
385
+ - specs/qvalue_spec.rb
379
386
  - specs/fasta_spec.rb
380
387
  - specs/transmem/phobius_spec.rb
381
388
  - specs/transmem/toppred_spec.rb
@@ -396,6 +403,7 @@ test_files:
396
403
  - specs/spec_id/sequest_spec.rb
397
404
  - specs/spec_id/digestor_spec.rb
398
405
  - specs/spec_id/aa_freqs_spec.rb
406
+ - specs/pi_zero_spec.rb
399
407
  - specs/xml_spec.rb
400
408
  - specs/sample_enzyme_spec.rb
401
409
  - specs/gi_spec.rb