python-katlas 0.1.4__tar.gz → 0.2.1__tar.gz
- {python_katlas-0.1.4 → python_katlas-0.2.1}/LICENSE +0 -0
- {python_katlas-0.1.4 → python_katlas-0.2.1}/MANIFEST.in +0 -0
- python_katlas-0.2.1/PKG-INFO +492 -0
- {python_katlas-0.1.4 → python_katlas-0.2.1}/README.md +206 -124
- python_katlas-0.2.1/katlas/__init__.py +1 -0
- python_katlas-0.2.1/katlas/_modidx.py +141 -0
- python_katlas-0.2.1/katlas/common.py +10 -0
- python_katlas-0.2.1/katlas/compare.py +118 -0
- python_katlas-0.2.1/katlas/data.py +544 -0
- python_katlas-0.2.1/katlas/hierarchical.py +20 -0
- python_katlas-0.2.1/katlas/lo.py +69 -0
- python_katlas-0.2.1/katlas/pathway.py +170 -0
- python_katlas-0.2.1/katlas/plot.py +356 -0
- python_katlas-0.2.1/katlas/pspa.py +138 -0
- python_katlas-0.2.1/katlas/pssm.py +375 -0
- python_katlas-0.2.1/katlas/scoring.py +325 -0
- python_katlas-0.2.1/katlas/utils.py +236 -0
- python_katlas-0.2.1/pyproject.toml +59 -0
- python_katlas-0.2.1/python_katlas.egg-info/PKG-INFO +492 -0
- {python_katlas-0.1.4 → python_katlas-0.2.1}/python_katlas.egg-info/SOURCES.txt +11 -8
- {python_katlas-0.1.4 → python_katlas-0.2.1}/python_katlas.egg-info/dependency_links.txt +0 -0
- {python_katlas-0.1.4 → python_katlas-0.2.1}/python_katlas.egg-info/entry_points.txt +0 -0
- python_katlas-0.2.1/python_katlas.egg-info/requires.txt +15 -0
- {python_katlas-0.1.4 → python_katlas-0.2.1}/python_katlas.egg-info/top_level.txt +0 -0
- python_katlas-0.1.4/PKG-INFO +0 -416
- python_katlas-0.1.4/katlas/__init__.py +0 -1
- python_katlas-0.1.4/katlas/_modidx.py +0 -109
- python_katlas-0.1.4/katlas/core.py +0 -816
- python_katlas-0.1.4/katlas/dl.py +0 -357
- python_katlas-0.1.4/katlas/feature.py +0 -295
- python_katlas-0.1.4/katlas/imports.py +0 -7
- python_katlas-0.1.4/katlas/plot.py +0 -670
- python_katlas-0.1.4/katlas/train.py +0 -233
- python_katlas-0.1.4/python_katlas.egg-info/PKG-INFO +0 -416
- python_katlas-0.1.4/python_katlas.egg-info/not-zip-safe +0 -1
- python_katlas-0.1.4/python_katlas.egg-info/requires.txt +0 -19
- python_katlas-0.1.4/settings.ini +0 -44
- python_katlas-0.1.4/setup.py +0 -57
- {python_katlas-0.1.4 → python_katlas-0.2.1}/setup.cfg +0 -0
|
File without changes
|
|
File without changes
|
|
@@ -0,0 +1,492 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: python-katlas
|
|
3
|
+
Version: 0.2.1
|
|
4
|
+
Summary: tools for predicting kinome specificities
|
|
5
|
+
Author-email: lily <lcai888666@gmail.com>
|
|
6
|
+
License: Apache-2.0
|
|
7
|
+
Project-URL: Repository, https://github.com/sky1ove/katlas
|
|
8
|
+
Project-URL: Documentation, https://sky1ove.github.io/katlas
|
|
9
|
+
Keywords: nbdev,jupyter,notebook,python
|
|
10
|
+
Classifier: Natural Language :: English
|
|
11
|
+
Classifier: Intended Audience :: Developers
|
|
12
|
+
Classifier: Development Status :: 3 - Alpha
|
|
13
|
+
Classifier: Programming Language :: Python :: 3
|
|
14
|
+
Classifier: Programming Language :: Python :: 3 :: Only
|
|
15
|
+
Requires-Python: >=3.10
|
|
16
|
+
Description-Content-Type: text/markdown
|
|
17
|
+
License-File: LICENSE
|
|
18
|
+
Requires-Dist: pandas
|
|
19
|
+
Requires-Dist: gdown
|
|
20
|
+
Requires-Dist: pyarrow
|
|
21
|
+
Requires-Dist: tqdm
|
|
22
|
+
Requires-Dist: logomaker-kinase
|
|
23
|
+
Requires-Dist: seaborn
|
|
24
|
+
Requires-Dist: reactome2py
|
|
25
|
+
Requires-Dist: scikit-learn
|
|
26
|
+
Requires-Dist: biopython
|
|
27
|
+
Requires-Dist: python-kplot>=0.0.2
|
|
28
|
+
Requires-Dist: filelock>=3.25.2
|
|
29
|
+
Provides-Extra: dev
|
|
30
|
+
Requires-Dist: nbdev; extra == "dev"
|
|
31
|
+
Requires-Dist: jupyterlab>=3.6.8; extra == "dev"
|
|
32
|
+
Dynamic: license-file
|
|
33
|
+
|
|
34
|
+
# KATLAS
|
|
35
|
+
|
|
36
|
+
|
|
37
|
+
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
|
|
38
|
+
|
|
39
|
+
<img alt="Katlas logo" width="600" caption="Katlas logo" src="https://github.com/sky1ove/katlas/raw/main/logo.png" id="logo"/>
|
|
40
|
+
|
|
41
|
+
KATLAS is a repository containing python tools to predict kinases given
|
|
42
|
+
a substrate sequence. It also contains datasets of kinase substrate
|
|
43
|
+
specificities and human phosphoproteomics.
|
|
44
|
+
|
|
45
|
+
***References***: Please cite the appropriate papers if KATLAS is
|
|
46
|
+
helpful to your research.
|
|
47
|
+
|
|
48
|
+
- KATLAS was described in the paper \[Computational Decoding of Human
|
|
49
|
+
Kinome Substrate Specificities and Functions\]
|
|
50
|
+
|
|
51
|
+
- The positional scanning peptide array (PSPA) data is from the paper [An
|
|
52
|
+
atlas of substrate specificities for the human serine/threonine
|
|
53
|
+
kinome](https://www.nature.com/articles/s41586-022-05575-3) and the paper
|
|
54
|
+
[The intrinsic substrate specificity of the human tyrosine
|
|
55
|
+
kinome](https://www.nature.com/articles/s41586-024-07407-y)
|
|
56
|
+
|
|
57
|
+
- The kinase substrate datasets used for generating PSSMs are derived
|
|
58
|
+
from
|
|
59
|
+
[PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/)
|
|
60
|
+
and the paper [Large-scale Discovery of Substrates of the Human
|
|
61
|
+
Kinome](https://www.nature.com/articles/s41598-019-46385-4)
|
|
62
|
+
|
|
63
|
+
- Phosphorylation sites are acquired from
|
|
64
|
+
[PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/),
|
|
65
|
+
the paper [The functional landscape of the human
|
|
66
|
+
phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
|
|
67
|
+
and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
|
|
68
|
+
[LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
|
|
69
|
+
|
|
70
|
+
## Web applications
|
|
71
|
+
|
|
72
|
+
Users can now run the analysis directly on the web without needing to
|
|
73
|
+
code.
|
|
74
|
+
|
|
75
|
+
Check out our latest web platform:
|
|
76
|
+
[kinase-atlas.com](https://kinase-atlas.com/)
|
|
77
|
+
|
|
78
|
+
## Install
|
|
79
|
+
|
|
80
|
+
``` bash
|
|
81
|
+
pip install python-katlas
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
## Import
|
|
85
|
+
|
|
86
|
+
``` python
|
|
87
|
+
from katlas.common import *
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
# Quick start
|
|
91
|
+
|
|
92
|
+
We provide two methods to score substrate sequences:
|
|
93
|
+
|
|
94
|
+
- Computational Data-Driven Method (CDDM)
|
|
95
|
+
- Positional Scanning Peptide Array (PSPA)
|
|
96
|
+
|
|
97
|
+
We consider the input in two formats:
|
|
98
|
+
|
|
99
|
+
- a single input string (phosphorylation site)
|
|
100
|
+
- a csv/dataframe that contains a column of phosphorylation sites
|
|
101
|
+
|
|
102
|
+
Input sequences are handled in two cases:
|
|
103
|
+
|
|
104
|
+
- all uppercase
|
|
105
|
+
- contains lowercase letters indicating phosphorylation status
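
As a rough illustration of the two cases above, the sketch below picks a parameter-set name from the casing of a sequence. `pick_params_name` is a hypothetical helper and not part of KATLAS; the names `CDDM_upper` and `CDDM` are the ones passed to `Params(...)` in the site-scoring examples of this README, and the selection rule itself is an assumption.

```python
def pick_params_name(seq: str) -> str:
    """Hypothetical helper: choose a parameter-set name from sequence casing."""
    # Assumption: any lowercase letter marks a phosphorylated residue,
    # so the case-sensitive parameter set is the appropriate one.
    has_lowercase = any(ch.islower() for ch in seq)
    return "CDDM" if has_lowercase else "CDDM_upper"


print(pick_params_name("AAAAAAASGAGSDN"))   # all uppercase
print(pick_params_name("AAAAAAAsGGAGsDN"))  # lowercase marks phosphorylation
```

The returned name could then be passed on as `Params(name)`, mirroring the calls shown in the examples.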
|
|
106
|
+
|
|
107
|
+
## Quick start
|
|
108
|
+
|
|
109
|
+
### Site scoring
|
|
110
|
+
|
|
111
|
+
CDDM, all capital
|
|
112
|
+
|
|
113
|
+
``` python
|
|
114
|
+
predict_kinase('AAAAAAASGAGSDN',**Params("CDDM_upper"))
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0S', '1G', '2A', '3G', '4S', '5D', '6N']
|
|
118
|
+
|
|
119
|
+
GCN2 4.556
|
|
120
|
+
MPSK1 4.425
|
|
121
|
+
MEKK2 4.253
|
|
122
|
+
WNK3 4.213
|
|
123
|
+
WNK1 4.064
|
|
124
|
+
...
|
|
125
|
+
PDK1 -25.077
|
|
126
|
+
PDHK3 -25.346
|
|
127
|
+
CLK2 -27.251
|
|
128
|
+
ROR2 -27.582
|
|
129
|
+
DDR1 -53.581
|
|
130
|
+
Length: 328, dtype: float64
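
The `considering string` line in the output above labels each residue with its position relative to the phosphorylation site. A minimal sketch of that labelling, assuming the site sits at index `len(seq) // 2` (this is inferred from the printed output, not from the library source; `label_positions` is a hypothetical name):

```python
def label_positions(seq: str) -> list[str]:
    """Hypothetical helper: tag each residue with its offset from the centre."""
    # Assumption: the phosphorylation site is the residue at len(seq) // 2.
    center = len(seq) // 2
    return [f"{i - center}{aa}" for i, aa in enumerate(seq)]


print(label_positions("AAAAAAASGAGSDN"))
```

For the 14-residue input used above, this places `S` at position 0 and ends at `6N`, which matches the printed list.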
|
|
131
|
+
|
|
132
|
+
CDDM, with lower case indicating phosphorylation status
|
|
133
|
+
|
|
134
|
+
``` python
|
|
135
|
+
predict_kinase('AAAAAAAsGGAGsDN',**Params("CDDM"))
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0s', '1G', '2G', '3A', '4G', '5s', '6D', '7N']
|
|
139
|
+
|
|
140
|
+
ROR1 8.355
|
|
141
|
+
WNK1 4.907
|
|
142
|
+
WNK2 4.782
|
|
143
|
+
ERK5 4.466
|
|
144
|
+
RIPK2 4.045
|
|
145
|
+
...
|
|
146
|
+
DDR1 -29.393
|
|
147
|
+
TNNI3K -29.884
|
|
148
|
+
CHAK1 -31.775
|
|
149
|
+
VRK1 -45.287
|
|
150
|
+
BRAF -49.403
|
|
151
|
+
Length: 328, dtype: float64
|
|
152
|
+
|
|
153
|
+
PSPA, with lower case indicating phosphorylation status
|
|
154
|
+
|
|
155
|
+
``` python
|
|
156
|
+
predict_kinase('AEEKEyHsEGG',**Params("PSPA"))
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2s', '3E', '4G', '5G']
|
|
160
|
+
|
|
161
|
+
kinase
|
|
162
|
+
EGFR 4.013
|
|
163
|
+
FGFR4 3.568
|
|
164
|
+
ZAP70 3.412
|
|
165
|
+
CSK 3.241
|
|
166
|
+
SYK 3.209
|
|
167
|
+
...
|
|
168
|
+
JAK1 -3.837
|
|
169
|
+
DDR2 -4.421
|
|
170
|
+
TNK2 -4.534
|
|
171
|
+
TNNI3K_TYR -4.651
|
|
172
|
+
TNK1 -5.320
|
|
173
|
+
Length: 93, dtype: float64
|
|
174
|
+
|
|
175
|
+
To replicate the results from The Kinase Library (PSPA)
|
|
176
|
+
|
|
177
|
+
Check this link: [The Kinase
|
|
178
|
+
Library](https://kinase-library.mit.edu/site?s=AEEKEy*HSEGG&pp=false&scp=true),
|
|
179
|
+
and rank by log2(score); it shows the same results as below (with
|
|
180
|
+
slight differences due to rounding).
|
|
181
|
+
|
|
182
|
+
``` python
|
|
183
|
+
out = predict_kinase('AEEKEyHSEGG',**Params("PSPA"))
|
|
184
|
+
out
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2S', '3E', '4G', '5G']
|
|
188
|
+
|
|
189
|
+
kinase
|
|
190
|
+
EGFR 3.181
|
|
191
|
+
FGFR4 2.390
|
|
192
|
+
CSK 2.308
|
|
193
|
+
ZAP70 2.068
|
|
194
|
+
SYK 1.998
|
|
195
|
+
...
|
|
196
|
+
EPHA1 -3.501
|
|
197
|
+
FES -3.699
|
|
198
|
+
TNK1 -4.269
|
|
199
|
+
TNK2 -4.577
|
|
200
|
+
DDR2 -4.920
|
|
201
|
+
Length: 93, dtype: float64
|
|
202
|
+
|
|
203
|
+
- So far [The Kinase Library](https://kinase-library.phosphosite.org)
|
|
204
|
+
considers all ***tyr sequences*** as uppercase regardless of whether or
|
|
205
|
+
not they contain lowercase letters, which is a small bug and should be fixed
|
|
206
|
+
soon.
|
|
207
|
+
- A kinase name ending in “\_TYR” indicates a dual-specificity kinase tested
|
|
208
|
+
in the PSPA tyrosine setting, which has not been included in
|
|
209
|
+
kinase-library yet.
|
|
210
|
+
|
|
211
|
+
We can also calculate the percentile score using a reference score
|
|
212
|
+
sheet.
|
|
213
|
+
|
|
214
|
+
``` python
|
|
215
|
+
# Percentile reference sheet
|
|
216
|
+
y_pct = Data.get_pspa_tyr_pct()
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
``` python
|
|
220
|
+
get_pct('AEEKEyHSEGG',pct_ref = y_pct,**Params("PSPA_y"))
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0Y', '1H', '2S', '3E', '4G', '5G']
|
|
224
|
+
|
|
225
|
+
<div>
|
|
226
|
+
<style scoped>
|
|
227
|
+
.dataframe tbody tr th:only-of-type {
|
|
228
|
+
vertical-align: middle;
|
|
229
|
+
}
|
|
230
|
+
.dataframe tbody tr th {
|
|
231
|
+
vertical-align: top;
|
|
232
|
+
}
|
|
233
|
+
.dataframe thead th {
|
|
234
|
+
text-align: right;
|
|
235
|
+
}
|
|
236
|
+
</style>
|
|
237
|
+
|
|
238
|
+
| | log2(score) | percentile |
|
|
239
|
+
|-------|-------------|------------|
|
|
240
|
+
| EGFR | 3.181 | 96.787423 |
|
|
241
|
+
| FGFR4 | 2.390 | 94.012303 |
|
|
242
|
+
| CSK | 2.308 | 95.201640 |
|
|
243
|
+
| ZAP70 | 2.068 | 88.380041 |
|
|
244
|
+
| SYK | 1.998 | 85.522898 |
|
|
245
|
+
| ... | ... | ... |
|
|
246
|
+
| EPHA1 | -3.501 | 12.139440 |
|
|
247
|
+
| FES | -3.699 | 21.216678 |
|
|
248
|
+
| TNK1 | -4.269 | 5.481887 |
|
|
249
|
+
| TNK2 | -4.577 | 2.050581 |
|
|
250
|
+
| DDR2 | -4.920 | 10.403281 |
|
|
251
|
+
|
|
252
|
+
<p>93 rows × 2 columns</p>
|
|
253
|
+
</div>
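
One way to picture the percentile column above: for a given kinase, a site's score is ranked against a reference distribution of scores for that kinase. The sketch below is an assumption about the general idea, not the implementation of `get_pct`; `percentile_of` and the toy reference values are hypothetical, and the inclusive "less than or equal" definition is one of several possible conventions.

```python
from bisect import bisect_right


def percentile_of(score: float, reference_scores: list[float]) -> float:
    """Hypothetical helper: percent of reference scores that are <= score."""
    ref = sorted(reference_scores)
    # bisect_right counts how many reference values are <= score.
    return 100.0 * bisect_right(ref, score) / len(ref)


# Toy reference distribution (illustrative numbers only).
toy_reference = [0.0, 1.0, 2.0, 3.0]
print(percentile_of(2.0, toy_reference))
```

Because percentiles are computed per kinase, two kinases with similar log2(score) values can land at quite different percentiles, as in the table above.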
|
|
254
|
+
|
|
255
|
+
### Site scoring in a df
|
|
256
|
+
|
|
257
|
+
Load your csv:
|
|
258
|
+
|
|
259
|
+
``` python
|
|
260
|
+
# df = pd.read_csv('your_file.csv')
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
Or load a demo df
|
|
264
|
+
|
|
265
|
+
``` python
|
|
266
|
+
# Load a demo df with phosphorylation sites
|
|
267
|
+
df = Data.get_ochoa_site().head()
|
|
268
|
+
df.iloc[:,-2:]
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
<div>
|
|
272
|
+
<style scoped>
|
|
273
|
+
.dataframe tbody tr th:only-of-type {
|
|
274
|
+
vertical-align: middle;
|
|
275
|
+
}
|
|
276
|
+
.dataframe tbody tr th {
|
|
277
|
+
vertical-align: top;
|
|
278
|
+
}
|
|
279
|
+
.dataframe thead th {
|
|
280
|
+
text-align: right;
|
|
281
|
+
}
|
|
282
|
+
</style>
|
|
283
|
+
|
|
284
|
+
| | site_seq | gene_site |
|
|
285
|
+
|-----|-----------------|----------------|
|
|
286
|
+
| 0 | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
|
|
287
|
+
| 1 | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
|
|
288
|
+
| 2 | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
|
|
289
|
+
| 3 | KSRFTEYSMTSSVMR | A0A075B6Q4_S68 |
|
|
290
|
+
| 4 | FTEYSMTSSVMRRNE | A0A075B6Q4_S71 |
|
|
291
|
+
|
|
292
|
+
</div>
|
|
293
|
+
|
|
294
|
+
Set the column name and param to calculate
|
|
295
|
+
|
|
296
|
+
Here we choose `Params("CDDM_upper")`, as the sequences in the demo df are all
|
|
297
|
+
uppercase. You can also choose other params.
|
|
298
|
+
|
|
299
|
+
``` python
|
|
300
|
+
results = predict_kinase_df(df,'site_seq',**Params("CDDM_upper"))
|
|
301
|
+
results
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
input dataframe has a length 5
|
|
305
|
+
Preprocessing
|
|
306
|
+
Finish preprocessing
|
|
307
|
+
Merging reference
|
|
308
|
+
Finish merging
|
|
309
|
+
|
|
310
|
+
<div>
|
|
311
|
+
<style scoped>
|
|
312
|
+
.dataframe tbody tr th:only-of-type {
|
|
313
|
+
vertical-align: middle;
|
|
314
|
+
}
|
|
315
|
+
.dataframe tbody tr th {
|
|
316
|
+
vertical-align: top;
|
|
317
|
+
}
|
|
318
|
+
.dataframe thead th {
|
|
319
|
+
text-align: right;
|
|
320
|
+
}
|
|
321
|
+
</style>
|
|
322
|
+
|
|
323
|
+
| | SRC | EPHA3 | FES | NTRK3 | ALK | ABL1 | FLT3 | EPHA8 | EPHB2 | EPHB1 | ... | VRK1 | PKMYT1 | GRK3 | CAMK1B | CDC7 | SMMLCK | ROR1 | GAK | MAST2 | BRAF |
|
|
324
|
+
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|
|
325
|
+
| 0 | -2.440640 | -0.818753 | -1.663990 | -0.738991 | -2.047628 | -3.602344 | -3.200998 | -0.935176 | -1.388444 | -1.859450 | ... | -17.103237 | -113.698143 | -16.848783 | -41.520172 | -41.646187 | 1.284159 | -26.566362 | -69.165062 | -17.706400 | -87.763214 |
|
|
326
|
+
| 1 | -3.838486 | -2.735969 | -2.533986 | -2.150399 | -3.792498 | -4.725527 | -5.711791 | -4.534240 | -3.148449 | -2.511518 | ... | -67.889053 | -68.652641 | -45.833855 | -64.171600 | -39.465572 | -65.061722 | -109.561707 | -85.911224 | -60.105064 | -63.889122 |
|
|
327
|
+
| 2 | -2.610423 | -2.370090 | -3.235637 | -1.508413 | -2.571347 | -3.740941 | -3.025596 | -3.373504 | -2.776297 | -3.060740 | ... | -15.798462 | -45.905319 | -61.440742 | -67.695694 | -55.047962 | -42.135216 | -38.501572 | -62.624382 | -56.119389 | -107.060989 |
|
|
328
|
+
| 3 | -5.180541 | -4.201880 | -5.766463 | -3.038421 | -3.836897 | -4.249900 | -5.029885 | -5.411311 | -4.713308 | -4.827825 | ... | -96.978317 | -83.419777 | -22.559393 | -110.611588 | -63.283070 | -37.240440 | -24.497492 | -112.878151 | -43.538158 | -60.348518 |
|
|
329
|
+
| 4 | -2.844254 | -3.322700 | -3.681745 | -1.766435 | -2.666579 | -3.748774 | -4.083619 | -3.912834 | -3.724181 | -3.948160 | ... | -35.824612 | -87.983566 | -83.312317 | -107.162407 | -61.478374 | -85.793571 | -43.738819 | -47.004211 | -42.281624 | -59.518513 |
|
|
330
|
+
|
|
331
|
+
<p>5 rows × 328 columns</p>
|
|
332
|
+
</div>
|
|
333
|
+
|
|
334
|
+
``` python
|
|
335
|
+
results.iloc[0].sort_values(ascending=False)
|
|
336
|
+
```
|
|
337
|
+
|
|
338
|
+
TLK2 8.264621
|
|
339
|
+
GCN2 8.101542
|
|
340
|
+
TLK1 7.693897
|
|
341
|
+
HRI 6.691402
|
|
342
|
+
PLK3 6.579368
|
|
343
|
+
...
|
|
344
|
+
NIK -64.605148
|
|
345
|
+
SRPK2 -67.300667
|
|
346
|
+
GAK -69.165062
|
|
347
|
+
BRAF -87.763214
|
|
348
|
+
PKMYT1 -113.698143
|
|
349
|
+
Name: 0, Length: 328, dtype: float32
|
|
350
|
+
|
|
351
|
+
## Dataset
|
|
352
|
+
|
|
353
|
+
Besides calculating sequence scores, we also provide multiple datasets
|
|
354
|
+
of phosphorylation sites.
|
|
355
|
+
|
|
356
|
+
### CPTAC pan-cancer phosphoproteomics
|
|
357
|
+
|
|
358
|
+
``` python
|
|
359
|
+
df = Data.get_cptac_ensembl_site()
|
|
360
|
+
df.head(3)
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
<div>
|
|
364
|
+
<style scoped>
|
|
365
|
+
.dataframe tbody tr th:only-of-type {
|
|
366
|
+
vertical-align: middle;
|
|
367
|
+
}
|
|
368
|
+
.dataframe tbody tr th {
|
|
369
|
+
vertical-align: top;
|
|
370
|
+
}
|
|
371
|
+
.dataframe thead th {
|
|
372
|
+
text-align: right;
|
|
373
|
+
}
|
|
374
|
+
</style>
|
|
375
|
+
|
|
376
|
+
| | gene | site | site_seq | protein | gene_name | gene_site | protein_site |
|
|
377
|
+
|----|----|----|----|----|----|----|----|
|
|
378
|
+
| 0 | ENSG00000003056.8 | S267 | DDQLGEESEERDDHL | ENSP00000000412.3 | M6PR | M6PR_S267 | ENSP00000000412_S267 |
|
|
379
|
+
| 1 | ENSG00000003056.8 | S267 | DDQLGEESEERDDHL | ENSP00000440488.2 | M6PR | M6PR_S267 | ENSP00000440488_S267 |
|
|
380
|
+
| 2 | ENSG00000048028.11 | S1053 | PPTIRPNSPYDLCSR | ENSP00000003302.4 | USP28 | USP28_S1053 | ENSP00000003302_S1053 |
|
|
381
|
+
|
|
382
|
+
</div>
|
|
383
|
+
|
|
384
|
+
### [Ochoa et al. human phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3)
|
|
385
|
+
|
|
386
|
+
``` python
|
|
387
|
+
df = Data.get_ochoa_site()
|
|
388
|
+
df.head(3)
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
<div>
|
|
392
|
+
<style scoped>
|
|
393
|
+
.dataframe tbody tr th:only-of-type {
|
|
394
|
+
vertical-align: middle;
|
|
395
|
+
}
|
|
396
|
+
.dataframe tbody tr th {
|
|
397
|
+
vertical-align: top;
|
|
398
|
+
}
|
|
399
|
+
.dataframe thead th {
|
|
400
|
+
text-align: right;
|
|
401
|
+
}
|
|
402
|
+
</style>
|
|
403
|
+
|
|
404
|
+
| | uniprot | position | residue | is_disopred | disopred_score | log10_hotspot_pval_min | isHotspot | uniprot_position | functional_score | current_uniprot | name | gene | Sequence | is_valid | site_seq | gene_site |
|
|
405
|
+
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|
|
406
|
+
| 0 | A0A075B6Q4 | 24 | S | 1.0 | 0.91 | 6.839384 | 1.0 | A0A075B6Q4_24 | 0.149257 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
|
|
407
|
+
| 1 | A0A075B6Q4 | 35 | S | 1.0 | 0.87 | 9.192622 | 0.0 | A0A075B6Q4_35 | 0.136966 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
|
|
408
|
+
| 2 | A0A075B6Q4 | 57 | S | 0.0 | 0.28 | 0.818834 | 0.0 | A0A075B6Q4_57 | 0.125364 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
|
|
409
|
+
|
|
410
|
+
</div>
|
|
411
|
+
|
|
412
|
+
### PhosphoSitePlus human phosphorylation sites
|
|
413
|
+
|
|
414
|
+
``` python
|
|
415
|
+
df = Data.get_psp_human_site()
|
|
416
|
+
df.head(3)
|
|
417
|
+
```
|
|
418
|
+
|
|
419
|
+
<div>
|
|
420
|
+
<style scoped>
|
|
421
|
+
.dataframe tbody tr th:only-of-type {
|
|
422
|
+
vertical-align: middle;
|
|
423
|
+
}
|
|
424
|
+
.dataframe tbody tr th {
|
|
425
|
+
vertical-align: top;
|
|
426
|
+
}
|
|
427
|
+
.dataframe thead th {
|
|
428
|
+
text-align: right;
|
|
429
|
+
}
|
|
430
|
+
</style>
|
|
431
|
+
|
|
432
|
+
| | gene | protein | uniprot | site | gene_site | SITE_GRP_ID | species | site_seq | LT_LIT | MS_LIT | MS_CST | CST_CAT# | Ambiguous_Site |
|
|
433
|
+
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|
|
434
|
+
| 0 | YWHAB | 14-3-3 beta | P31946 | T2 | YWHAB_T2 | 15718712 | human | \_\_\_\_\_\_MtMDksELV | NaN | 3.0 | 1.0 | None | 0 |
|
|
435
|
+
| 1 | YWHAB | 14-3-3 beta | P31946 | S6 | YWHAB_S6 | 15718709 | human | \_\_MtMDksELVQkAk | NaN | 8.0 | NaN | None | 0 |
|
|
436
|
+
| 2 | YWHAB | 14-3-3 beta | P31946 | Y21 | YWHAB_Y21 | 3426383 | human | LAEQAERyDDMAAAM | NaN | NaN | 4.0 | None | 0 |
|
|
437
|
+
|
|
438
|
+
</div>
|
|
439
|
+
|
|
440
|
+
### Unique sites of combined Ochoa & PhosphoSitePlus
|
|
441
|
+
|
|
442
|
+
``` python
|
|
443
|
+
df = Data.get_combine_site_psp_ochoa()
|
|
444
|
+
df.head(3)
|
|
445
|
+
```
|
|
446
|
+
|
|
447
|
+
<div>
|
|
448
|
+
<style scoped>
|
|
449
|
+
.dataframe tbody tr th:only-of-type {
|
|
450
|
+
vertical-align: middle;
|
|
451
|
+
}
|
|
452
|
+
.dataframe tbody tr th {
|
|
453
|
+
vertical-align: top;
|
|
454
|
+
}
|
|
455
|
+
.dataframe thead th {
|
|
456
|
+
text-align: right;
|
|
457
|
+
}
|
|
458
|
+
</style>
|
|
459
|
+
|
|
460
|
+
| | uniprot | gene | site | site_seq | source | AM_pathogenicity | CDDM_upper | CDDM_max_score |
|
|
461
|
+
|----|----|----|----|----|----|----|----|----|
|
|
462
|
+
| 0 | A0A024R4G9 | C19orf48 | S20 | ITGSRLLSMVPGPAR | psp | NaN | PRKX,AKT1,PKG1,P90RSK,HIPK4,AKT3,HIPK1,PKACB,H... | 2.407041 |
|
|
463
|
+
| 1 | A0A075B6Q4 | None | S24 | VDDEKGDSNDDYDSA | ochoa | NaN | CK2A2,CK2A1,GRK7,GRK5,CK1G1,CK1A,IKKA,CK1G2,CA... | 2.295654 |
|
|
464
|
+
| 2 | A0A075B6Q4 | None | S35 | YDSAGLLSDEDCMSV | ochoa | NaN | CK2A2,CK2A1,IKKA,ATM,IKKB,CAMK1D,MARK2,GRK7,IK... | 2.488683 |
|
|
465
|
+
|
|
466
|
+
</div>
|
|
467
|
+
|
|
468
|
+
## Phosphorylation site sequence example
|
|
469
|
+
|
|
470
|
+
***All capital - 15 length (-7 to +7)***
|
|
471
|
+
|
|
472
|
+
- QSEEEKLSPSPTTED
|
|
473
|
+
- TLQHVPDYRQNVYIP
|
|
474
|
+
- TMGLSARYGPQFTLQ
|
|
475
|
+
|
|
476
|
+
***All capital - 10 length (-5 to +4)***
|
|
477
|
+
|
|
478
|
+
- SRDPHYQDPH
|
|
479
|
+
- LDNPDYQQDF
|
|
480
|
+
- AAAAASGGAG
|
|
481
|
+
|
|
482
|
+
***With lowercase - (-7 to +7)***
|
|
483
|
+
|
|
484
|
+
- QsEEEKLsPsPTTED
|
|
485
|
+
- TLQHVPDyRQNVYIP
|
|
486
|
+
- TMGLsARyGPQFTLQ
|
|
487
|
+
|
|
488
|
+
***With lowercase - (-5 to +4)***
|
|
489
|
+
|
|
490
|
+
- sRDPHyQDPH
|
|
491
|
+
- LDNPDyQQDF
|
|
492
|
+
- AAAAAsGGAG
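
Fixed-width windows like the ones listed above can be cut from a full protein sequence around a 1-based site position. The sketch below is a minimal illustration under stated assumptions: `extract_site_window` is a hypothetical helper and not a KATLAS function, and padding with `_` at the protein termini is assumed from the `_`-padded `site_seq` values in the PhosphoSitePlus table of this README. The test sequence is only the visible prefix of the truncated Ochoa `Sequence` column.

```python
def extract_site_window(protein_seq: str, position: int,
                        left: int = 7, right: int = 7, pad: str = "_") -> str:
    """Hypothetical helper: return residues from -left to +right around a site."""
    idx = position - 1  # convert the 1-based site position to a 0-based index
    if not 0 <= idx < len(protein_seq):
        raise ValueError("position is outside the protein sequence")
    start, end = idx - left, idx + right + 1
    core = protein_seq[max(start, 0):min(end, len(protein_seq))]
    # Assumption: pad with "_" where the window runs past either terminus.
    return pad * max(0, -start) + core + pad * max(0, end - len(protein_seq))


# Visible prefix of the A0A075B6Q4 sequence shown in the Ochoa table.
prefix = "MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT"
print(extract_site_window(prefix, 24))
print(extract_site_window(prefix, 6, left=5, right=4))
```

With the defaults this yields the 15-residue (-7 to +7) format; passing `left=5, right=4` gives the 10-residue (-5 to +4) format.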
|