mspire-mascot-dat 0.0.1
- data/.document +5 -0
- data/.rspec +1 -0
- data/LICENSE.txt +21 -0
- data/README.md +46 -0
- data/Rakefile +39 -0
- data/VERSION +1 -0
- data/lib/mspire/mascot/dat/index.rb +93 -0
- data/lib/mspire/mascot/dat/peptide.rb +95 -0
- data/lib/mspire/mascot/dat/query.rb +69 -0
- data/lib/mspire/mascot/dat.rb +59 -0
- data/mspire-mascot-dat.gemspec +70 -0
- data/spec/mspire/mascot/dat/index_spec.rb +44 -0
- data/spec/mspire/mascot/dat/peptide_spec.rb +24 -0
- data/spec/mspire/mascot/dat/query_spec.rb +33 -0
- data/spec/mspire/mascot/dat_spec.rb +94 -0
- data/spec/reference/dat_format_reference.md +667 -0
- data/spec/reference/two_spectra_decoy_F004129.png +0 -0
- data/spec/reference/two_spectra_no_decoy_F004128.png +0 -0
- data/spec/spec_helper.rb +11 -0
- data/spec/testfiles/F004128.dat +897 -0
- data/spec/testfiles/F004129.dat +1259 -0
- data/spec/testfiles/two_spectra.mgf +864 -0
- metadata +133 -0
@@ -0,0 +1,667 @@
|
|
1
|
+
# Results File
|
2
|
+
|
3
|
+
The results file contains the search results together with the search
|
4
|
+
input parameters and MS data. This means that a results file contains
|
5
|
+
everything necessary to generate a report, repeat the search at a later
|
6
|
+
date, or act as the self-contained input file to a project database or LIMS.
|
7
|
+
The contents are divided into logical sections:
|
8
|
+
|
9
|
+
1. Search parameters
|
10
|
+
2. Mass values
|
11
|
+
3. Quantitation method (if used)
|
12
|
+
4. Unimod extract
|
13
|
+
5. Enzyme definition
|
14
|
+
6. Taxonomy (if a taxonomy filter was used)
|
15
|
+
7. Misc. header information
|
16
|
+
8. Summary results (for Protein Summary)
|
17
|
+
9. Mixtures (if PMF)
|
18
|
+
10. Summary of decoy results (if automatic decoy)
|
19
|
+
11. Summary of error tolerant results (if automatic ET)
|
20
|
+
12. Mixtures in decoy results (if automatic decoy PMF)
|
21
|
+
13. Peptides (if SQ or MIS)
|
22
|
+
14. Decoy peptides (if SQ or MIS and automatic decoy)
|
23
|
+
15. Error tolerant peptides (if SQ or MIS and automatic ET)
|
24
|
+
16. Proteins (if SQ or MIS)
|
25
|
+
17. Query data, one block for each query
|
26
|
+
18. Index
|
27
|
+
|
28
|
+
### General Notes
|
29
|
+
|
30
|
+
1. Values are shown in italics
|
31
|
+
2. Scripts are written so that label case doesn’t matter.
|
32
|
+
3. Labels are used to assist readability, but kept short to minimise
|
33
|
+
file size
|
34
|
+
4. Parameters are grouped logically
|
35
|
+
5. Order of blocks is not important except that the index block
|
36
|
+
must be the last block. Presence of blank lines within the index
|
37
|
+
block may cause a problem.
|
38
|
+
6. Because the MIME type is defined as an unknown application,
|
39
|
+
if this file passes through a mail agent, it will be treated as an
|
40
|
+
“octet stream” and encoded “base64” for transmission.
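
The block layout described above can be read with a small splitter. This is a minimal sketch, not the gem's own reader: it assumes the boundary string used in the examples of this document, straight double quotes around the section name, and it drops blank lines inside a block. The method name `split_sections` is hypothetical.

```ruby
# Split the text of a results file into { section name => array of body lines }.
# Assumption: every block starts with a boundary line followed by a
# 'Content-Type: application/x-Mascot; name="..."' line.
def split_sections(text, boundary = "--gc0p4Jq0M2Yt08jU534c0p")
  sections = {}
  current = nil
  text.each_line(chomp: true) do |line|
    if line.start_with?(boundary)
      current = nil                      # a boundary closes the open block
    elsif (m = line.match(%r{\AContent-Type: application/x-Mascot; name="([^"]+)"}))
      current = sections[m[1]] = []      # start collecting a new block
    elsif current && !line.empty?
      current << line
    end
  end
  sections
end

sample = <<~DAT
  --gc0p4Jq0M2Yt08jU534c0p
  Content-Type: application/x-Mascot; name="parameters"

  SEARCH=PMF
  DB=MSDB
  --gc0p4Jq0M2Yt08jU534c0p
  Content-Type: application/x-Mascot; name="index"

  parameters=1
  --gc0p4Jq0M2Yt08jU534c0p--
DAT

sections = split_sections(sample)
puts sections.keys.inspect
```

For large files it is usually cheaper to read the index block first and jump straight to the block of interest rather than scanning the whole file.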
|
41
|
+
|
42
|
+
## Search parameters
|
43
|
+
|
44
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
45
|
+
Content-Type: application/x-Mascot; name="parameters"
|
46
|
+
|
47
|
+
USERNAME=user name in plain text
|
48
|
+
USEREMAIL=email address in plain text
|
49
|
+
SEARCH=PMF
|
50
|
+
COM=search title text
|
51
|
+
DB=MSDB
|
52
|
+
CLE=Trypsin
|
53
|
+
MASS=Monoisotopic
|
54
|
+
MODS=Mod 1,Mod 2
|
55
|
+
.
|
56
|
+
.
|
57
|
+
.
|
58
|
+
RULES=1,2,5,6,8,9,13,14
|
59
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
60
|
+
|
61
|
+
The Parameters section contains the complete set of parameter values
|
62
|
+
from the search form apart from the contents of the uploaded data file or
|
63
|
+
the query window. Labels must be unique, independent of case. Where a
|
64
|
+
parameter can be multivalued (e.g. mods) the values are listed on one
|
65
|
+
line separated by commas.
|
66
|
+
RULES contains a list of the rule numbers that define the instrument
|
67
|
+
type in the configuration file fragmentation_rules. The rule numbers
|
68
|
+
are listed explicitly because the contents of the configuration file may
|
69
|
+
have changed since the search was run.
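
A minimal sketch of reading the parameters block, assuming the lines have already been separated from the rest of the file. Labels are lower-cased because they are unique independent of case. The `MULTIVALUED` list is an assumption made for illustration; the format itself does not mark which labels are multivalued.

```ruby
# Labels treated as comma separated lists (an assumption for this sketch).
MULTIVALUED = %w[mods it_mods rules].freeze

# Turn "LABEL=value" lines into a hash keyed by lower-cased label.
def parse_parameters(lines)
  lines.each_with_object({}) do |line, params|
    label, value = line.split("=", 2)
    next if value.nil?                   # skip lines without "="
    key = label.downcase
    params[key] = MULTIVALUED.include?(key) ? value.split(",") : value
  end
end

params = parse_parameters([
  "SEARCH=PMF",
  "MODS=Mod 1,Mod 2",
  "RULES=1,2,5,6,8,9,13,14"
])
puts params["mods"].inspect
```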
|
70
|
+
|
71
|
+
## Masses
|
72
|
+
|
73
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
74
|
+
Content-Type: application/x-Mascot; name="masses"
|
75
|
+
|
76
|
+
A=71.037110
|
77
|
+
B=114.534940
|
78
|
+
C=160.030649
|
79
|
+
D=115.026940
|
80
|
+
E=129.042590
|
81
|
+
F=147.068410
|
82
|
+
G=57.021460
|
83
|
+
H=137.058910
|
84
|
+
I=113.084060
|
85
|
+
J=0.000000
|
86
|
+
K=128.094960
|
87
|
+
L=113.084060
|
88
|
+
M=131.040480
|
89
|
+
N=114.042930
|
90
|
+
O=0.000000
|
91
|
+
P=97.052760
|
92
|
+
Q=128.058580
|
93
|
+
R=156.101110
|
94
|
+
S=87.032030
|
95
|
+
T=101.047680
|
96
|
+
U=150.953630
|
97
|
+
V=99.068410
|
98
|
+
W=186.079310
|
99
|
+
X=111.000000
|
100
|
+
Y=163.063330
|
101
|
+
Z=128.550590
|
102
|
+
Hydrogen=1.007825
|
103
|
+
Carbon=12.000000
|
104
|
+
Nitrogen=14.003074
|
105
|
+
Oxygen=15.994915
|
106
|
+
Electron=0.000549
|
107
|
+
C_term=17.002740
|
108
|
+
N_term=1.007825
|
109
|
+
delta1=15.994919,Oxidation (M)
|
110
|
+
NeutralLoss1=0.000000
|
111
|
+
FixedMod1=57.021469, Carbamidomethyl (C)
|
112
|
+
FixedModResidues1=C
|
113
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
114
|
+
|
115
|
+
This block contains “actual” mass values. That is, average or monoisotopic
|
116
|
+
residue masses, including any fixed modifications; C and N terminus
|
117
|
+
groups also include any fixed modifications.
|
118
|
+
|
119
|
+
FixedMod1, FixedMod2, etc., records the delta mass and name for each
|
120
|
+
fixed modification as comma separated values. FixedModResidues1 gives
|
121
|
+
the site specificity. If multiple residues are affected, they are listed as a
|
122
|
+
string, e.g. STY. If there was a neutral loss, the delta mass is given by
|
123
|
+
the value of FixedModNeutralLoss1.
|
124
|
+
|
125
|
+
FixedModn=delta, Name
|
126
|
+
FixedModResiduesn=[A-Z]|C_term|N_term
|
127
|
+
FixedModNeutralLossn=mass
|
128
|
+
|
129
|
+
Fixed modifications cannot have peptide neutral losses or multiple neutral
losses, and cannot be protein-terminal or residue-terminal. In all these
cases, fixed modifications are automatically converted into variable ones.
|
132
|
+
|
133
|
+
Variable modifications are reported in delta1, delta2, etc. Each entry
|
134
|
+
defines the difference in mass introduced by the modification together
|
135
|
+
with the name of the modification, separated by a comma. If a variable
|
136
|
+
modification suffers a neutral loss on fragmentation, the delta is specified
by a NeutralLossn entry. By definition, this is always a master
|
138
|
+
neutral loss. If there are multiple neutral losses, then two more lines
|
139
|
+
appear:
|
140
|
+
|
141
|
+
NeutralLossn_master=mass[[,mass] ...]
|
142
|
+
NeutralLossn_slave=mass[[,mass] ...]
|
143
|
+
|
144
|
+
The first neutral loss (defined by NeutralLossn) has an implicit index
|
145
|
+
number of 1. Any additional neutral losses (defined by
|
146
|
+
NeutralLossn_master or followed by NeutralLossn_slave) have implicit
|
147
|
+
index numbers of 2 and up.
|
148
|
+
|
149
|
+
If a modification includes a required or optional neutral loss from the
|
150
|
+
precursor, this is recorded as follows:
|
151
|
+
|
152
|
+
ReqPepNeutralLossn=mass[[,mass] ...]
|
153
|
+
PepNeutralLossn=mass[[,mass] ...]
|
154
|
+
|
155
|
+
Error-tolerant modifications are not listed in the masses section.
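
The variable modification entries can be collected into records as sketched below. The sketch assumes the masses block has already been read into a hash of label to raw value; the `Modification` struct and the method name `variable_mods` are hypothetical, and only the single `NeutralLossn` entry is handled (not the `_master` and `_slave` lists).

```ruby
Modification = Struct.new(:index, :delta, :name, :neutral_loss)

# Collect delta1, delta2, ... together with the matching NeutralLossn entry.
def variable_mods(masses)
  masses.each_with_object([]) do |(key, value), mods|
    m = key.match(/\Adelta(\d+)\z/)
    next unless m
    delta, name = value.split(",", 2)    # "15.994919,Oxidation (M)"
    loss = masses["NeutralLoss#{m[1]}"]
    mods << Modification.new(m[1].to_i, delta.to_f, name.strip, loss && loss.to_f)
  end
end

masses = {
  "delta1"       => "15.994919,Oxidation (M)",
  "NeutralLoss1" => "0.000000",
  "FixedMod1"    => "57.021469, Carbamidomethyl (C)"
}
mods = variable_mods(masses)
puts mods.first.name
```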
|
156
|
+
|
157
|
+
## Quantitation
|
158
|
+
|
159
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
160
|
+
Content-Type: application/x-Mascot; name="quantitation"

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<quantitation majorVersion="1" minorVersion="0"
    xmlns="http://www.matrixscience.com/xmlns/schema/quantitation_1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.matrixscience.com/xmlns/schema/quantitation_1 quantitation_1.xsd">
<method constrain_search="false" description="15N metabolic labelling"
    min_num_peptides="2" name="15N Metabolic [MD]" prot_score_type="mudpit"
    protein_ratio_type="weighted" report_detail="true" require_bold_red="true"
    show_sub_sets="0.5" sig_threshold_value="0.05">
<component name="light">
<isotope/>
</component>
<component name="heavy">
<isotope>
<old>N</old>
<new>15N</new>
</isotope>
|
182
|
+
|
183
|
+
This section is an extract from quantitation.xml containing the
|
184
|
+
quantitation method specified for the search. For more details and a link
|
185
|
+
to the schema, refer to the Mascot HTML help pages for quantitation.
|
186
|
+
|
187
|
+
## Unimod
|
188
|
+
|
189
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
190
|
+
Content-Type: application/x-Mascot; name="unimod"

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<umod:unimod xmlns:umod="http://www.unimod.org/xmlns/schema/unimod_2"
    majorVersion="2" minorVersion="0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.unimod.org/xmlns/schema/unimod_2 unimod_2.xsd">
<umod:elements>
<umod:elem avge_mass="1.00794" full_name="Hydrogen" mono_mass="1.007825035" title="H"/>
<umod:elem avge_mass="2.014101779" full_name="Deuterium" mono_mass="2.014101779" title="2H"/>
<umod:elem avge_mass="6.941" full_name="Lithium" mono_mass="7.016003" title="Li"/>
<umod:elem avge_mass="12.0107" full_name="Carbon" mono_mass="12" title="C"/>
|
206
|
+
|
207
|
+
This section is an extract from unimod.xml containing data for the
|
208
|
+
elements, amino_acids, and any modifications specified in the search
|
209
|
+
form. For more details and a link to the schema, refer to the help pages
|
210
|
+
at www.unimod.org
|
211
|
+
|
212
|
+
## Enzyme
|
213
|
+
|
214
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
215
|
+
Content-Type: application/x-Mascot; name="enzyme"
|
216
|
+
|
217
|
+
Title:Trypsin
|
218
|
+
Cleavage:KR
|
219
|
+
Restrict:P
|
220
|
+
Cterm
|
221
|
+
*
|
222
|
+
|
223
|
+
This section is simply an extract from the enzyme file. Syntax details can
|
224
|
+
be found in Chapter 6
|
225
|
+
|
226
|
+
## Taxonomy
|
227
|
+
|
228
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
229
|
+
Content-Type: application/x-Mascot; name="taxonomy"
|
230
|
+
|
231
|
+
Title:. . . . . . . . . . . . . . . . Homo sapiens (human)
|
232
|
+
Include: 9606
|
233
|
+
Exclude:
|
234
|
+
*
|
235
|
+
|
236
|
+
This section is simply an extract from the taxonomy file. Syntax details
|
237
|
+
can be found in Chapter 9
|
238
|
+
|
239
|
+
## Header
|
240
|
+
|
241
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
242
|
+
Content-Type: application/x-Mascot; name="header"
|
243
|
+
|
244
|
+
sequences=number of sequences in DB
|
245
|
+
sequences_after_tax=number of sequences after taxonomy filter
|
246
|
+
residues=number of residues in DB
|
247
|
+
distribution=see below
|
248
|
+
exec_time=search time in seconds
|
249
|
+
date=timestamp (seconds since Jan 1st 1970)
|
250
|
+
time=time in hh:mm:ss
|
251
|
+
queries=number of queries, (>= 1)
|
252
|
+
max_hits=maximum number of hits to be listed
|
253
|
+
version=version information
|
254
|
+
fastafile=full path to database fasta file
|
255
|
+
release=filename of actual database used - e.g. Owl_31.fasta
|
256
|
+
taskid=unique task identifier for searches submitted asynchronously
|
257
|
+
pmf_num_queries_used=number of mass values selected for PMF match
|
258
|
+
pmf_queries_used=comma separated list of selected query numbers
|
259
|
+
Warn0=
|
260
|
+
Warn1=
|
261
|
+
Warn2=
|
262
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
263
|
+
|
264
|
+
The Header section contains general values, used in the master results
|
265
|
+
page header paragraph.
|
266
|
+
Distribution is a comma separated list of values that represent a
|
267
|
+
histogram of the complete protein score distribution. The first value is
|
268
|
+
the number of entries with score 0, the second is the number of entries
|
269
|
+
with score 1, and so on, up to the maximum score for the search. Scores
|
270
|
+
are converted to integers by truncation. This distribution is only meaningful
for a peptide mass fingerprint search.
|
272
|
+
If intensity values are supplied for a peptide mass fingerprint, Mascot
|
273
|
+
iterates the experimental peaks to find the set that gives the best score.
|
274
|
+
The number of values selected is reported in pmf_num_queries_used
|
275
|
+
and the selected queries listed in pmf_queries_used.
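
As a small illustration of the distribution value, the sketch below turns the comma separated list into a score histogram and looks up a protein score after truncating it to an integer. The method name `score_histogram` and the sample values are made up for the example.

```ruby
# Map integer score => number of database entries with that score.
def score_histogram(distribution)
  distribution.split(",").each_with_index.map { |count, score| [score, count.to_i] }.to_h
end

histogram = score_histogram("5,3,0,1")   # illustrative values only
protein_score = 1.87
puts histogram[protein_score.truncate]   # entries sharing the truncated score
```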
|
276
|
+
|
277
|
+
## Summary results
|
278
|
+
|
279
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
280
|
+
Content-Type: application/x-Mascot; name="summary"
|
281
|
+
|
282
|
+
qmass1=Mr
|
283
|
+
qexp1=m/z for query 1,
|
284
|
+
charge
|
285
|
+
qintensity1=intensity value for query1 (if available)
|
286
|
+
qmatch1=Total number of peptide mass matches for query1 in database
|
287
|
+
qplughole1=Threshold score for homologous peptide match (MIS only)
|
288
|
+
qmass2=...
|
289
|
+
qexp2=...
|
290
|
+
qintensity2=
|
291
|
+
qmatch2=...
|
292
|
+
qplughole2=...
|
293
|
+
.
|
294
|
+
.
|
295
|
+
.
|
296
|
+
qmassn=...
|
297
|
+
qexpn=...
|
298
|
+
qintensityn=
|
299
|
+
qmatchn=...
|
300
|
+
qplugholen=...
|
301
|
+
num_hits=number of hits in the summary block (<= max_hits)
|
302
|
+
h1=accession string,
|
303
|
+
total protein score,
|
304
|
+
obsolete,
|
305
|
+
intact protein mass
|
306
|
+
h1_text=title text
|
307
|
+
h1_frame=frame_number (between 1 and 6, for nucleic acid only)
|
308
|
+
h1_q1=missed cleavages, (-1 indicates no match)
|
309
|
+
peptide Mr,
|
310
|
+
delta,
|
311
|
+
start,
|
312
|
+
end,
|
313
|
+
number of ions matched,
|
314
|
+
peptide string,
|
315
|
+
peaks used from Ions1,
|
316
|
+
variable modifications string,
|
317
|
+
ions score,
|
318
|
+
multiplicity,
|
319
|
+
ion series found,
|
320
|
+
peaks used from Ions2,
|
321
|
+
peaks used from Ions3,
|
322
|
+
total area of matched peaks
|
323
|
+
h1_q1_et_mods=modification mass,
|
324
|
+
neutral loss mass,
|
325
|
+
modification description
|
326
|
+
h1_q1_et_mods_master=neutral loss mass[[,neutral loss mass] ... ]
|
327
|
+
h1_q1_et_mods_slave=neutral loss mass[[,neutral loss mass] ... ]
|
328
|
+
h1_q1_primary_nl=neutral loss string
|
329
|
+
h1_q1_na_diff=original NA sequence,
|
330
|
+
modified NA sequence
|
331
|
+
h1_q1_tag=tagNum:startPos:endPos:seriesID,...
|
332
|
+
h1_q1_drange=startPos:endPos
|
333
|
+
h1_q1_terms=residue,residue
|
334
|
+
h1_q1_subst=pos1,ambig1,matched1 ... ,posn,ambign,matchedn
|
335
|
+
h1_q2=...
|
336
|
+
.
|
337
|
+
.
|
338
|
+
.
|
339
|
+
h1_qm=...
|
340
|
+
h2=...
|
341
|
+
.
|
342
|
+
.
|
343
|
+
.
|
344
|
+
hn_qm=...
|
345
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
346
|
+
|
347
|
+
Where a parameter has multiple values, these are shown on separate
|
348
|
+
lines for clarity. In the actual result file, all values for a parameter are
|
349
|
+
on a single line and there are no spaces or tabs between values.
|
350
|
+
Variable modifications is a string of digits, one digit for the N terminus,
|
351
|
+
one for each residue and one for the C terminus. Each digit specifies the
|
352
|
+
modification used to obtain the match: 0 indicates no modification, 1
|
353
|
+
indicates delta1, 2 indicates delta2 etc., in the masses section. If the
|
354
|
+
number of modifications exceeds 9, the letters A to W are used to represent
modifications 10 to 32. X is used to indicate a modification found in
|
356
|
+
error tolerant mode.
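
The encoding of the variable modifications string can be decoded character by character, as in this sketch. The method name `decode_mod_string` and the `:error_tolerant` marker are choices made for the example; the returned numbers are the n of the deltan entries in the masses section.

```ruby
# Decode a variable modifications string into one entry per position:
# nil for no modification, an Integer n for deltan, :error_tolerant for X.
# Position 0 is the N terminus and the last position is the C terminus.
def decode_mod_string(mod_string)
  mod_string.each_char.map do |ch|
    case ch
    when "0"      then nil
    when "1".."9" then ch.to_i
    when "A".."W" then ch.ord - "A".ord + 10   # A => 10 ... W => 32
    when "X"      then :error_tolerant
    else raise ArgumentError, "unexpected character #{ch.inspect}"
    end
  end
end

puts decode_mod_string("0001000").inspect
```

For a peptide of five residues the string has seven characters, so the `1` in the example above marks delta1 on the third residue.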
|
357
|
+
neutral loss string is the same concept as the variable mod string,
|
358
|
+
except each character represents the index of the primary neutral loss
|
359
|
+
(one of the master NL). Any position that is not modified, or where the
|
360
|
+
mod has no neutral loss, is set to 0. hn_qm_primary_nl will only be
|
361
|
+
output if the string contains at least one non-zero character.
|
362
|
+
If a new modification is found in an error tolerant search, its position is
|
363
|
+
marked by X, and details are recorded in an additional entry,
|
364
|
+
hn_qm_et_mods. If the error tolerant search is of a nucleic acid database,
and the modification is a single base change in the primary sequence,
the two mass fields will be set to zero, and one of the keywords
|
367
|
+
NA_INSERTION, NA_DELETION, or NA_SUBSTITUTION will appear in the
|
368
|
+
description field. The additional parameter hn_qm_na_diff is then used
|
369
|
+
to record the ‘before’ and ‘after’ nucleic acid sequences.
|
370
|
+
*Ion series* is a string of 19 digits representing the ion series:
|
371
|
+
|
372
|
+
a
|
373
|
+
place holder
|
374
|
+
a++
|
375
|
+
b
|
376
|
+
place holder
|
377
|
+
b++
|
378
|
+
y
|
379
|
+
place holder
|
380
|
+
y++
|
381
|
+
c
|
382
|
+
c++
|
383
|
+
x
|
384
|
+
x++
|
385
|
+
z
|
386
|
+
z++
|
387
|
+
z+H
|
388
|
+
z+H++
|
389
|
+
z+2H
|
390
|
+
z+2H++
|
391
|
+
|
392
|
+
A digit is set to 1 if the corresponding series contains more than just
|
393
|
+
random matches and 2 if the series contributes to the score.
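
The 19 digit ion series string can be mapped onto the series names listed above. This is a sketch only: the constant `ION_SERIES`, the method name, and the `:non_random` and `:scoring` symbols are hypothetical labels for the digit values 1 and 2.

```ruby
# Series names in string order; nil marks the place holder positions.
ION_SERIES = [
  "a", nil, "a++", "b", nil, "b++", "y", nil, "y++",
  "c", "c++", "x", "x++", "z", "z++", "z+H", "z+H++", "z+2H", "z+2H++"
].freeze

# Return { series name => status } for every series flagged 1 or 2.
def decode_ion_series(digits)
  unless digits.length == ION_SERIES.length
    raise ArgumentError, "expected #{ION_SERIES.length} digits"
  end
  result = {}
  ION_SERIES.each_with_index do |series, i|
    next if series.nil?
    case digits[i]
    when "1" then result[series] = :non_random   # more than random matches
    when "2" then result[series] = :scoring      # contributes to the score
    end
  end
  result
end

puts decode_ion_series("0002001" + "0" * 12).inspect
```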
|
394
|
+
Multiplicity is the number of peptide mass matches for a query in a
protein.
|
396
|
+
For each sequence tag, four colon separated values are output: 1-based
|
397
|
+
tag number, 1-based residue position where tag starts, 1-based residue
|
398
|
+
position where tag ends, ion series into which the tag was matched:
|
399
|
+
|
400
|
+
-1 means no matches for the tag
|
401
|
+
0 “a” series (single charge)
|
402
|
+
1 “a-NH3” series (single charge)
|
403
|
+
2 “a” series (double charge)
|
404
|
+
3 “b” series (single charge)
|
405
|
+
4 “b-NH3” series (single charge)
|
406
|
+
5 “b” series (double charge)
|
407
|
+
6 “y” series (single charge)
|
408
|
+
7 “y-NH3” series (single charge)
|
409
|
+
8 “y” series (double charge)
|
410
|
+
9 “c” series (single charge)
|
411
|
+
10 “c” series (double charge)
|
412
|
+
11 “x” series (single charge)
|
413
|
+
12 “x” series (double charge)
|
414
|
+
13 “z” series (single charge)
|
415
|
+
14 “z” series (double charge)
|
416
|
+
15 “a-H2O” series (single charge)
|
417
|
+
16 “a-H2O” series (double charge)
|
418
|
+
17 “b-H2O” series (single charge)
|
419
|
+
18 “b-H2O” series (double charge)
|
420
|
+
19 “y-H2O” series (single charge)
|
421
|
+
20 “y-H2O” series (double charge)
|
422
|
+
21 “a-NH3” series (double charge)
|
423
|
+
22 “b-NH3” series (double charge)
|
424
|
+
23 “y-NH3” series (double charge)
|
425
|
+
25 “internal yb” series (single charge)
|
426
|
+
26 “internal ya” series (single charge)
|
427
|
+
27 “z+H” series (single charge)
|
428
|
+
28 “z+H” series (double charge)
|
429
|
+
29 high-energy “d” and “d’” series (single charge)
|
430
|
+
31 high-energy “v” series (single charge)
|
431
|
+
32 high-energy “w” and “w’” series (single charge)
|
432
|
+
33 “z+2H” series (single charge)
|
433
|
+
34 “z+2H” series (double charge)
|
434
|
+
|
435
|
+
If there are multiple tags for a query, comma separated groups of these
|
436
|
+
numbers are output for each tag.
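
A minimal sketch of reading an hn_qm_tag value, assuming the comma separated groups of four colon separated integers described above. The `Tag` struct and `parse_tags` are hypothetical names, and the sample value is invented.

```ruby
Tag = Struct.new(:number, :start_pos, :end_pos, :series_id)

# "1:2:4:6,2:0:0:-1" => one Tag per comma separated group.
def parse_tags(value)
  value.split(",").map { |group| Tag.new(*group.split(":").map(&:to_i)) }
end

tags = parse_tags("1:2:4:6,2:0:0:-1")
tags.each { |t| puts "tag #{t.number}: series #{t.series_id}" }
```

A `series_id` of -1 means the tag was not matched; the other values index the series table above (6 is the singly charged y series).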
|
437
|
+
hn_qm_drange is output for a query that includes an error tolerant
|
438
|
+
sequence tag. It defines the range of positions within which an unsuspected
modification has been located. For a peptide of 10 residues,
|
440
|
+
position 0 would indicate the amino terminus and position 11 would
|
441
|
+
indicate the carboxy terminus. If there is no location information, the
|
442
|
+
range is output as 0,256
|
443
|
+
|
444
|
+
hn_qm_terms shows the residues that bracket the peptide in the protein.
|
445
|
+
If the peptide forms the terminus of the protein, then a hyphen is used
|
446
|
+
instead.
|
447
|
+
|
448
|
+
hn_qm_subst is output when the matched peptide contained an ambiguous
residue (B, X, or Z). The argument is one or more triplets of comma
|
450
|
+
separated values. For each triplet, the first value is the residue position,
|
451
|
+
the second is the ambiguous residue, and the third is the residue that
|
452
|
+
has been substituted to obtain the reported match.
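
The triplets in hn_qm_subst can be grouped three values at a time, as sketched here. The `Substitution` struct, the method name, and the sample value are assumptions for illustration.

```ruby
Substitution = Struct.new(:position, :ambiguous, :matched)

# "3,B,D,7,Z,E" => residue position, ambiguous residue, substituted residue.
def parse_subst(value)
  value.split(",").each_slice(3).map do |pos, ambig, matched|
    Substitution.new(pos.to_i, ambig, matched)
  end
end

substitutions = parse_subst("3,B,D,7,Z,E")
puts substitutions.map(&:to_a).inspect
```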
|
453
|
+
|
454
|
+
For a large MS/MS search, num_hits is set to zero, and the summary
|
455
|
+
block only contains entries for qmassn, qexpn, qmatchn,
|
456
|
+
qplugholen. The threshold for switching to this mode is specified using
|
457
|
+
two parameters in the Options section of mascot.dat. SplitDataFileSize
|
458
|
+
is the size of the search process in bytes, (default 10000000), and
|
459
|
+
SplitNumberOfQueries is the size of the search in queries, (default
|
460
|
+
1000).
|
461
|
+
|
462
|
+
If this is a two-pass search, either an automatic decoy database search or
|
463
|
+
an automatic error tolerant search, a second summary block appears,
|
464
|
+
containing the second set of results. The section name is either
|
465
|
+
et_summary or decoy_summary. The syntax of the contents is identical to that of the summary block.
|
466
|
+
|
467
|
+
## Mixture
|
468
|
+
|
469
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
470
|
+
Content-Type: application/x-Mascot; name="mixture"
|
471
|
+
|
472
|
+
num_hits=number of mixtures found
|
473
|
+
h1_score=total score for mixture 1
|
474
|
+
h1_numprot=number of proteins in mixture 1
|
475
|
+
h1_nummatch=number of queries matched
|
476
|
+
h1_m1=accession string for protein component 1
|
477
|
+
h1_m2=accession string for protein component 2
|
478
|
+
.
|
479
|
+
.
|
480
|
+
.
|
481
|
+
h1_mm=accession string for protein component m
|
482
|
+
h2_score=
|
483
|
+
.
|
484
|
+
.
|
485
|
+
.
|
486
|
+
hn_mm=
|
487
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
488
|
+
|
489
|
+
The Mixture section is only output for a peptide mass fingerprint. If any
|
490
|
+
statistically significant protein mixtures are found, the mixture components
are summarised. For details of individual components, use the
|
492
|
+
accession strings to refer back to the Summary section.
|
493
|
+
|
494
|
+
If this is an automatic decoy database search, a second mixture block
|
495
|
+
appears, containing the second set of results. The section name is
|
496
|
+
decoy_mixture. The syntax of the contents is identical to that of the mixture block.
|
497
|
+
|
498
|
+
## Peptides
|
499
|
+
|
500
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
501
|
+
Content-Type: application/x-Mascot; name="peptides"
|
502
|
+
|
503
|
+
q1_p1=missed cleavages, (-1 indicates no match)
|
504
|
+
peptide Mr,
|
505
|
+
delta,
|
506
|
+
number of ions matched,
|
507
|
+
peptide string,
|
508
|
+
peaks used from Ions1,
|
509
|
+
variable modifications string,
|
510
|
+
ions score,
|
511
|
+
ion series found,
|
512
|
+
peaks used from Ions2,
|
513
|
+
peaks used from Ions3;
|
514
|
+
“accession string”: data for first protein
|
515
|
+
frame number:
|
516
|
+
start:
|
517
|
+
end:
|
518
|
+
multiplicity,
|
519
|
+
“accession string”: data for second protein
|
520
|
+
frame number:
|
521
|
+
start:
|
522
|
+
end:
|
523
|
+
multiplicity,
|
524
|
+
etc.
|
525
|
+
q1_p1_et_mods=modification mass,
|
526
|
+
neutral loss mass,
|
527
|
+
modification description
|
528
|
+
q1_p1_et_mods_master=neutral loss mass[[,neutral loss mass] ... ]
|
529
|
+
q1_p1_et_mods_slave=neutral loss mass[[,neutral loss mass] ... ]
|
530
|
+
q1_p1_primary_nl=neutral loss string
|
531
|
+
q1_p1_na_diff=original NA sequence,
|
532
|
+
modified NA sequence
|
533
|
+
q1_p1_tag=tagNum:startPos:endPos:seriesID,...
|
534
|
+
q1_p1_drange=startPos:endPos
|
535
|
+
q1_p1_terms=residue,residue[[:residue,residue] ... ]
|
536
|
+
q1_p1_subst=pos1,ambig1,matched1 ... ,posn,ambign,matchedn
|
537
|
+
q1_p1_comp=quantitation component name
|
538
|
+
q1_p2=...
|
539
|
+
.
|
540
|
+
.
|
541
|
+
.
|
542
|
+
qn_pm=...
|
543
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
544
|
+
|
545
|
+
Each line contains the data for a peptide match followed by data for at
|
546
|
+
least one protein in which the peptide was found.
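
A sketch of splitting one qn_pm value into its peptide fields and protein references, following the field order listed above. It is not the gem's parser: the struct names, the method name, and the sample line are invented, and it assumes accession strings are enclosed in straight double quotes and contain no double quote themselves.

```ruby
PeptideHit = Struct.new(
  :missed_cleavages, :mr, :delta, :ions_matched, :sequence, :peaks_ions1,
  :var_mods, :ions_score, :ion_series, :peaks_ions2, :peaks_ions3, :proteins
)
ProteinRef = Struct.new(:accession, :frame, :start, :stop, :multiplicity)

# Parse the text after "qn_pm="; returns nil when the value is -1 (no match).
def parse_peptide(value)
  peptide_part, protein_part = value.split(";", 2)
  f = peptide_part.split(",")
  return nil if f[0] == "-1"
  proteins = protein_part.to_s.scan(/"([^"]*)":(\d+):(\d+):(\d+):(\d+)/).map do |acc, *nums|
    ProteinRef.new(acc, *nums.map(&:to_i))
  end
  PeptideHit.new(f[0].to_i, f[1].to_f, f[2].to_f, f[3].to_i, f[4], f[5].to_i,
                 f[6], f[7].to_f, f[8], f[9].to_i, f[10].to_i, proteins)
end

line = '0,1234.5678,0.0012,5,PEPTIDER,20,0000000000,45.2,' \
       '0002001000000000000,0,0;"ACC1":0:10:17:1,"ACC2":0:55:62:1'
hit = parse_peptide(line)
puts "#{hit.sequence} found in #{hit.proteins.map(&:accession).join(', ')}"
```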
|
547
|
+
|
548
|
+
If there are multiple entries in the database containing the matched peptide,
|
549
|
+
there will be a corresponding number of pairs of bracketing residues
|
550
|
+
listed in qn_pm_terms.
|
551
|
+
|
552
|
+
Otherwise, individual field descriptions are identical to those for the
|
553
|
+
Summary section
|
554
|
+
|
555
|
+
If this is a two-pass search, either an automatic decoy database search or
|
556
|
+
an automatic error tolerant search, a second peptides block appears,
|
557
|
+
containing the second set of results. The section name is either
|
558
|
+
et_peptides or decoy_peptides. The syntax of the contents is identical to that of the peptides block.
|
559
|
+
|
560
|
+
## Proteins
|
561
|
+
|
562
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
563
|
+
Content-Type: application/x-Mascot; name="proteins"
|
564
|
+
|
565
|
+
“accession string”=protein mass,
|
566
|
+
“title text”
|
567
|
+
.
|
568
|
+
.
|
569
|
+
.
|
570
|
+
“accession string”=protein mass,
|
571
|
+
“title text”
|
572
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
573
|
+
|
574
|
+
This block contains reference data for the proteins listed in the peptides
|
575
|
+
block.
|
576
|
+
|
577
|
+
## Input data for query n
|
578
|
+
|
579
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
580
|
+
Content-Type: application/x-Mascot; name="queryn"
|
581
|
+
|
582
|
+
title=query title
|
583
|
+
index=query index
|
584
|
+
seq1=sequence qualifier (e.g. N-ABCDEF)
|
585
|
+
seq2=...
|
586
|
+
.
|
587
|
+
.
|
588
|
+
.
|
589
|
+
seqn=
|
590
|
+
comp1=composition qualifier (e.g. 0[P]2[W])
|
591
|
+
comp2=...
|
592
|
+
.
|
593
|
+
.
|
594
|
+
.
|
595
|
+
compn=...
|
596
|
+
PepTol=peptide tolerance qualifier (e.g. 2.000000,Da)
|
597
|
+
IT_MODS=Mod 1[,Mod 2[,...]]
|
598
|
+
INSTRUMENT=instrument identifier, (e.g. ESI-TRAP)
|
599
|
+
RULES=1,2,5,6,8,9,13,14
|
600
|
+
INTERNALS=min mass,max mass
|
601
|
+
CHARGE=charge state (e.g. 2+)
|
602
|
+
RTINSECONDS=a[[-b][,c[-d]]]
|
603
|
+
SCANS=a[[-b][,c[-d]]]
|
604
|
+
tag1=sequence tag (e.g. t,889.4,[QK]S,1104.54)
|
605
|
+
.
|
606
|
+
.
|
607
|
+
.
|
608
|
+
tagn=...
|
609
|
+
mass_min=lowest mass
|
610
|
+
mass_max=highest mass
|
611
|
+
int_min=lowest intensity
|
612
|
+
int_max=highest intensity
|
613
|
+
num_vals=number of mass values
|
614
|
+
num_used1=-1 (obsolete)
|
615
|
+
ions1=1344.65:34.3,1365.41:13.2
|
616
|
+
ions2=y-1344.65:34.3,1365.41:13.2
|
617
|
+
ions3=b-1344.65:34.3,1365.41:13.2
|
618
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
619
|
+
|
620
|
+
Value “queryn” runs from “query1” (no leading zeros). ionsn values are
|
621
|
+
sorted so that the matched values come first.
|
622
|
+
|
623
|
+
Most searches will only require a few of these fields. For example, a
|
624
|
+
peptide mass fingerprint would only include the charge field.
|
625
|
+
|
626
|
+
The index is a 0 based record of the original query order before sorting by
|
627
|
+
Mr
|
628
|
+
|
629
|
+
ions2 and ions3 are only required when fragment ions are specified in a
|
630
|
+
sequence query as being N-terminal or C-terminal series.
|
631
|
+
The first field in a tagn value is t for a standard sequence tag and e for
|
632
|
+
an error tolerant sequence tag
|
633
|
+
|
634
|
+
Some search parameters can be defined in the local scope of a query.
|
635
|
+
These are CHARGE, COMP, INSTRUMENT, IT_MODS, TOL, TOLU.
|
636
|
+
Any that are used are listed here. If the MGF file contained scan range
|
637
|
+
information in terms of seconds or scans, this is written to
|
638
|
+
RTINSECONDS and/or SCANS.
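
The ionsn values in a query block are lists of mass:intensity pairs, optionally prefixed with a series letter as in the ions2 and ions3 examples above. A minimal sketch of reading them follows; the method name `parse_ions` is hypothetical and the prefix handling assumes a single lower-case letter followed by a hyphen.

```ruby
# Returns [series, peaks] where series is nil, "y", "b", ... and peaks is
# an array of [mass, intensity] pairs in file order.
def parse_ions(value)
  series = nil
  if (m = value.match(/\A([a-z])-(.*)\z/m))
    series = m[1]
    value = m[2]
  end
  peaks = value.split(",").map { |pair| pair.split(":").map(&:to_f) }
  [series, peaks]
end

series, peaks = parse_ions("y-1344.65:34.3,1365.41:13.2")
puts "#{series}: #{peaks.inspect}"
```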
|
639
|
+
|
640
|
+
## Index
|
641
|
+
|
642
|
+
--gc0p4Jq0M2Yt08jU534c0p
|
643
|
+
Content-Type: application/x-Mascot; name="index"
|
644
|
+
|
645
|
+
parameters=4
|
646
|
+
masses=78
|
647
|
+
unimod=116
|
648
|
+
enzyme=322
|
649
|
+
taxonomy=329
|
650
|
+
header=336
|
651
|
+
summary=351
|
652
|
+
et_summary=6059
|
653
|
+
peptides=6473
|
654
|
+
et_peptides=7143
|
655
|
+
proteins=7292
|
656
|
+
query1=7362
|
657
|
+
query2=7374
|
658
|
+
.
|
659
|
+
.
|
660
|
+
.
|
661
|
+
query81=8322
|
662
|
+
query82=8334
|
663
|
+
--gc0p4Jq0M2Yt08jU534c0p--
|
664
|
+
|
665
|
+
Values in index are the line number offsets of the section “Content-Type:”
lines (starting from 0 for the first line of the file).
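
Because the offsets are 0 based line numbers, a section header can be looked up directly once the file is held as an array of lines. The sketch below assumes exactly that; the method names and the tiny sample file are made up, and a real reader would more likely seek within the file than load it whole.

```ruby
# Turn "name=offset" lines from the index block into { name => Integer }.
def parse_index(lines)
  lines.each_with_object({}) do |line, index|
    name, offset = line.split("=", 2)
    index[name] = Integer(offset) if offset
  end
end

# Return the "Content-Type:" line that starts the named section.
def section_header(file_lines, index, name)
  file_lines[index.fetch(name)]
end

file_lines = [
  "--gc0p4Jq0M2Yt08jU534c0p",
  'Content-Type: application/x-Mascot; name="parameters"',
  "",
  "SEARCH=PMF",
  "--gc0p4Jq0M2Yt08jU534c0p",
  'Content-Type: application/x-Mascot; name="index"',
  "",
  "parameters=1",
  "--gc0p4Jq0M2Yt08jU534c0p--"
]
index = parse_index(["parameters=1"])
puts section_header(file_lines, index, "parameters")
```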
|
667
|
+
|
Binary file
|
Binary file
|
data/spec/spec_helper.rb
ADDED
@@ -0,0 +1,11 @@
|
|
1
|
+
require 'rspec'
|
2
|
+
|
3
|
+
# Requires supporting files with custom matchers and macros, etc,
|
4
|
+
# in ./support/ and its subdirectories.
|
5
|
+
#Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each {|f| require f}
|
6
|
+
|
7
|
+
RSpec.configure do |config|
|
8
|
+
config.treat_symbols_as_metadata_keys_with_true_values = true
|
9
|
+
end
|
10
|
+
|
11
|
+
TESTFILES = __dir__ + "/testfiles"
|