python-katlas 0.0.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- katlas/__init__.py +1 -0
- katlas/_modidx.py +110 -0
- katlas/core.py +769 -0
- katlas/dl.py +355 -0
- katlas/feature.py +290 -0
- katlas/imports.py +7 -0
- katlas/plot.py +663 -0
- katlas/train.py +231 -0
- python_katlas-0.0.1.dist-info/LICENSE +201 -0
- python_katlas-0.0.1.dist-info/METADATA +402 -0
- python_katlas-0.0.1.dist-info/RECORD +14 -0
- python_katlas-0.0.1.dist-info/WHEEL +5 -0
- python_katlas-0.0.1.dist-info/entry_points.txt +2 -0
- python_katlas-0.0.1.dist-info/top_level.txt +1 -0
|
@@ -0,0 +1,402 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: python-katlas
|
|
3
|
+
Version: 0.0.1
|
|
4
|
+
Summary: tools for predicting kinome specificities
|
|
5
|
+
Home-page: https://github.com/sky1ove/python-katlas
|
|
6
|
+
Author: lily
|
|
7
|
+
Author-email: lcai888666@gmail.com
|
|
8
|
+
License: Apache Software License 2.0
|
|
9
|
+
Keywords: nbdev jupyter notebook python
|
|
10
|
+
Classifier: Development Status :: 4 - Beta
|
|
11
|
+
Classifier: Intended Audience :: Developers
|
|
12
|
+
Classifier: Natural Language :: English
|
|
13
|
+
Classifier: Programming Language :: Python :: 3.7
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.8
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
+
Classifier: License :: OSI Approved :: Apache Software License
|
|
18
|
+
Requires-Python: >=3.7
|
|
19
|
+
Description-Content-Type: text/markdown
|
|
20
|
+
License-File: LICENSE
|
|
21
|
+
Requires-Dist: fastai (>=2.7.12)
|
|
22
|
+
Requires-Dist: pandas
|
|
23
|
+
Requires-Dist: logomaker
|
|
24
|
+
Requires-Dist: seaborn
|
|
25
|
+
Requires-Dist: rdkit
|
|
26
|
+
Requires-Dist: fairscale
|
|
27
|
+
Requires-Dist: fair-esm
|
|
28
|
+
Requires-Dist: umap-learn
|
|
29
|
+
Requires-Dist: adjustText
|
|
30
|
+
Requires-Dist: bokeh
|
|
31
|
+
Requires-Dist: fastbook
|
|
32
|
+
Requires-Dist: biopython
|
|
33
|
+
Requires-Dist: scikit-learn (>=1.3.0)
|
|
34
|
+
Requires-Dist: statsmodels
|
|
35
|
+
Requires-Dist: openpyxl
|
|
36
|
+
Provides-Extra: dev
|
|
37
|
+
Requires-Dist: nbdev ; extra == 'dev'
|
|
38
|
+
Requires-Dist: pyngrok ; extra == 'dev'
|
|
39
|
+
|
|
40
|
+
# KATLAS
|
|
41
|
+
|
|
42
|
+
|
|
43
|
+
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
|
|
44
|
+
|
|
45
|
+
<a target="_blank" href="https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/index.ipynb">
|
|
46
|
+
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
|
|
47
|
+
</a>
|
|
48
|
+
|
|
49
|
+
<img alt="Katlas logo" width="700" caption="Katlas logo" src="https://github.com/sky1ove/katlas/raw/main/dataset/images/logo.png" id="logo"/>
|
|
50
|
+
|
|
51
|
+
KATLAS is a repository containing Python tools to predict kinases given
|
|
52
|
+
a substrate sequence. It also contains datasets of kinase substrate
|
|
53
|
+
specificities and human phosphoproteomics.
|
|
54
|
+
|
|
55
|
+
***References***: Please cite the appropriate papers if KATLAS is
|
|
56
|
+
helpful to your research.
|
|
57
|
+
|
|
58
|
+
- KATLAS was described in the paper \[Decoding Human Kinome
|
|
59
|
+
Specificities through a Computational Data-Driven Approach
|
|
60
|
+
(manuscript)\]
|
|
61
|
+
|
|
62
|
+
- The positional scanning peptide array (PSPA) data are from the paper [An
|
|
63
|
+
atlas of substrate specificities for the human serine/threonine
|
|
64
|
+
kinome](https://www.nature.com/articles/s41586-022-05575-3) and paper
|
|
65
|
+
[The intrinsic substrate specificity of the human tyrosine
|
|
66
|
+
kinome](https://www.nature.com/articles/s41586-024-07407-y)
|
|
67
|
+
|
|
68
|
+
- The kinase substrate datasets used for generating PSSMs are derived
|
|
69
|
+
from
|
|
70
|
+
[PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/)
|
|
71
|
+
and paper [Large-scale Discovery of Substrates of the Human
|
|
72
|
+
Kinome](https://www.nature.com/articles/s41598-019-46385-4)
|
|
73
|
+
|
|
74
|
+
- Phosphorylation sites are acquired from
|
|
75
|
+
[PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/),
|
|
76
|
+
paper [The functional landscape of the human
|
|
77
|
+
phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
|
|
78
|
+
and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
|
|
79
|
+
[LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
|
|
80
|
+
|
|
81
|
+
## Tutorials on Colab
|
|
82
|
+
|
|
83
|
+
- 1. [Substrate scoring on a single substrate
|
|
84
|
+
sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
|
|
85
|
+
- 2. [High throughput substrate scoring on phosphoproteomics
|
|
86
|
+
dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
|
|
87
|
+
- 3. [Query a protein’s phosphorylation sites and predict their
|
|
88
|
+
upstream
|
|
89
|
+
kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
|
|
90
|
+
- 4. [Kinase enrichment analysis for AKT
|
|
91
|
+
inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
|
|
92
|
+
/ [Kinase enrichment analysis for EGFR
|
|
93
|
+
inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
|
|
94
|
+
|
|
95
|
+
## Install
|
|
96
|
+
|
|
97
|
+
Install the latest version from GitHub:
|
|
98
|
+
|
|
99
|
+
``` python
|
|
100
|
+
!pip install git+https://github.com/sky1ove/katlas.git -Uqq
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
## Import
|
|
104
|
+
|
|
105
|
+
``` python
|
|
106
|
+
from katlas.core import *
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
# Quick start
|
|
110
|
+
|
|
111
|
+
We provide two methods to score a substrate sequence:
|
|
112
|
+
|
|
113
|
+
- Computational Data-Driven Method (CDDM)
|
|
114
|
+
- Positional Scanning Peptide Array (PSPA)
|
|
115
|
+
|
|
116
|
+
We consider the input in two formats:
|
|
117
|
+
|
|
118
|
+
- a single input string (phosphorylation site)
|
|
119
|
+
- a csv/dataframe that contains a column of phosphorylation sites
|
|
120
|
+
|
|
121
|
+
Input sequences can be written in two ways:
|
|
122
|
+
|
|
123
|
+
- all capital
|
|
124
|
+
- containing lowercase letters that indicate phosphorylation status
|
|
125
|
+
|
|
126
|
+
## Single sequence as input
|
|
127
|
+
|
|
128
|
+
### CDDM, all capital
|
|
129
|
+
|
|
130
|
+
``` python
|
|
131
|
+
predict_kinase('AAAAAAASGGAGSDN',**param_CDDM_upper)
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0S', '1G', '2G', '3A', '4G', '5S', '6D', '7N']
|
|
135
|
+
|
|
136
|
+
kinase
|
|
137
|
+
PAK6 2.032
|
|
138
|
+
ULK3 2.032
|
|
139
|
+
PRKX 2.012
|
|
140
|
+
ATR 1.991
|
|
141
|
+
PRKD1 1.988
|
|
142
|
+
...
|
|
143
|
+
DDR2 0.928
|
|
144
|
+
EPHA4 0.928
|
|
145
|
+
TEK 0.921
|
|
146
|
+
KIT 0.915
|
|
147
|
+
FGFR3 0.910
|
|
148
|
+
Length: 289, dtype: float64
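
The `considering string` line in the output above labels each residue with its offset from the central phospho-acceptor. A minimal sketch of that labelling, assuming the acceptor sits at index `len(seq) // 2`; `label_positions` is a hypothetical helper written for illustration and is not part of `katlas`:

``` python
from typing import List


def label_positions(seq: str) -> List[str]:
    """Label each residue with its offset from the central residue."""
    center = len(seq) // 2  # assumed index of the phospho-acceptor
    return [f"{i - center}{aa}" for i, aa in enumerate(seq)]


tokens = label_positions("AAAAAAASGGAGSDN")
print(tokens)
```

The same rule also places position 0 on the lowercase `y` in the 11-residue example `AEEKEyHsEGG`.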
|
|
149
|
+
|
|
150
|
+
### CDDM, with lower case indicating phosphorylation status
|
|
151
|
+
|
|
152
|
+
``` python
|
|
153
|
+
predict_kinase('AAAAAAAsGGAGsDN',**param_CDDM)
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0s', '1G', '2G', '3A', '4G', '5s', '6D', '7N']
|
|
157
|
+
|
|
158
|
+
kinase
|
|
159
|
+
ULK3 1.987
|
|
160
|
+
PAK6 1.981
|
|
161
|
+
PRKD1 1.946
|
|
162
|
+
PIM3 1.944
|
|
163
|
+
PRKX 1.939
|
|
164
|
+
...
|
|
165
|
+
EPHA4 0.905
|
|
166
|
+
EGFR 0.900
|
|
167
|
+
TEK 0.898
|
|
168
|
+
FGFR3 0.894
|
|
169
|
+
KIT 0.882
|
|
170
|
+
Length: 289, dtype: float64
|
|
171
|
+
|
|
172
|
+
### PSPA, with lower case indicating phosphorylation status
|
|
173
|
+
|
|
174
|
+
``` python
|
|
175
|
+
predict_kinase('AEEKEyHsEGG',**param_PSPA).head()
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2s', '3E', '4G', '5G']
|
|
179
|
+
|
|
180
|
+
kinase
|
|
181
|
+
EGFR 4.013
|
|
182
|
+
FGFR4 3.568
|
|
183
|
+
ZAP70 3.412
|
|
184
|
+
CSK 3.241
|
|
185
|
+
SYK 3.209
|
|
186
|
+
dtype: float64
|
|
187
|
+
|
|
188
|
+
### To replicate the results from The Kinase Library (PSPA)
|
|
189
|
+
|
|
190
|
+
Check this link: [The Kinase
|
|
191
|
+
Library](https://kinase-library.phosphosite.org/site?s=AEEKEy*HsEGG&pp=false&scp=true),
|
|
192
|
+
and rank by log2(score); it shows the same results as below (with
|
|
193
|
+
slight differences due to rounding).
|
|
194
|
+
|
|
195
|
+
``` python
|
|
196
|
+
predict_kinase('AEEKEyHSEGG',**param_PSPA).head(10)
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2S', '3E', '4G', '5G']
|
|
200
|
+
|
|
201
|
+
kinase
|
|
202
|
+
EGFR 3.181
|
|
203
|
+
FGFR4 2.390
|
|
204
|
+
CSK 2.308
|
|
205
|
+
ZAP70 2.068
|
|
206
|
+
SYK 1.998
|
|
207
|
+
PDHK1_TYR 1.922
|
|
208
|
+
RET 1.732
|
|
209
|
+
MATK 1.688
|
|
210
|
+
FLT1 1.627
|
|
211
|
+
BMPR2_TYR 1.456
|
|
212
|
+
dtype: float64
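
Because log2 is monotonic, ranking by log2(score) gives the same order as ranking by the raw score. A small sketch with made-up raw scores; the kinase names and numbers are placeholders, not values from KATLAS or The Kinase Library:

``` python
import math

# Hypothetical raw scores, for illustration only
raw_scores = {"KINASE_A": 9.0, "KINASE_B": 0.25, "KINASE_C": 4.0}

# Convert to log2 and sort from highest to lowest
log2_scores = {k: math.log2(v) for k, v in raw_scores.items()}
ranked = sorted(log2_scores.items(), key=lambda kv: kv[1], reverse=True)

for kinase, score in ranked:
    print(f"{kinase}\t{score:.3f}")
```

Raw scores below 1 become negative on the log2 scale, which is why low-ranking kinases can show negative values.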
|
|
213
|
+
|
|
214
|
+
- So far, [The Kinase Library](https://kinase-library.phosphosite.org)
|
|
215
|
+
treats all ***tyr sequences*** as uppercase regardless of whether or
|
|
216
|
+
not they contain lowercase letters, which is a small bug and should be fixed
|
|
217
|
+
soon.
|
|
218
|
+
- A kinase name ending in “\_TYR” indicates a dual-specificity kinase tested
|
|
219
|
+
in the PSPA tyrosine setting, which has not been included in
|
|
220
|
+
kinase-library yet.
|
|
221
|
+
|
|
222
|
+
We can also calculate the percentile score using a reference score
|
|
223
|
+
sheet.
|
|
224
|
+
|
|
225
|
+
``` python
|
|
226
|
+
# Percentile reference sheet
|
|
227
|
+
y_pct = Data.get_pspa_tyr_pct()
|
|
228
|
+
|
|
229
|
+
get_pct('AEEKEyHSEGG',**param_PSPA_y, pct_ref = y_pct)
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0Y', '1H', '2S', '3E', '4G', '5G']
|
|
233
|
+
|
|
234
|
+
|
|
235
|
+
|
|
236
|
+
| | log2(score) | percentile |
|
|
237
|
+
|-------|-------------|------------|
|
|
238
|
+
| EGFR | 3.181 | 96.787423 |
|
|
239
|
+
| FGFR4 | 2.390 | 94.012303 |
|
|
240
|
+
| CSK | 2.308 | 95.201640 |
|
|
241
|
+
| ZAP70 | 2.068 | 88.380041 |
|
|
242
|
+
| SYK | 1.998 | 85.522898 |
|
|
243
|
+
| ... | ... | ... |
|
|
244
|
+
| EPHA1 | -3.501 | 12.139440 |
|
|
245
|
+
| FES | -3.699 | 21.216678 |
|
|
246
|
+
| TNK1 | -4.269 | 5.481887 |
|
|
247
|
+
| TNK2 | -4.577 | 2.050581 |
|
|
248
|
+
| DDR2 | -4.920 | 10.403281 |
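
One common definition of a percentile is the share of reference scores that fall at or below the query score. A minimal sketch of that idea with NumPy, assuming a one-dimensional array of reference scores for a single kinase; the numbers are made up and `percentile_of` is a hypothetical helper, not the `get_pct` implementation:

``` python
import numpy as np


def percentile_of(score: float, reference: np.ndarray) -> float:
    """Percentage of reference scores less than or equal to `score`."""
    return float((reference <= score).mean() * 100)


# Hypothetical reference distribution for a single kinase
reference = np.array([-4.0, -2.5, -1.0, 0.0, 0.5, 1.2, 2.0, 3.5])
pct = percentile_of(1.5, reference)
print(pct)
```

Each kinase has its own reference distribution, so a higher log2(score) does not always mean a higher percentile; CSK and FGFR4 in the table above are one such case.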
|
|
249
|
+
|
|
250
|
+
|
|
251
|
+
|
|
252
|
+
## High-throughput substrate scoring on a dataframe
|
|
253
|
+
|
|
254
|
+
### Load your csv
|
|
255
|
+
|
|
256
|
+
``` python
|
|
257
|
+
# df = pd.read_csv('your_file.csv')
|
|
258
|
+
```
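
Before scoring your own file, it can help to confirm that the sequence column is present and complete. A small sketch using an in-memory CSV as a stand-in for `your_file.csv`; the column names follow the demo table in this README and may differ in your data:

``` python
import io

import pandas as pd

# Stand-in for the contents of your own file
csv_text = """site_seq,gene_site
VDDEKGDSNDDYDSA,A0A075B6Q4_S24
YDSAGLLSDEDCMSV,A0A075B6Q4_S35
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

# Confirm the sequence column exists and count missing values
assert "site_seq" in df_demo.columns
print(df_demo["site_seq"].isna().sum())
```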
|
|
259
|
+
|
|
260
|
+
### Load a demo df
|
|
261
|
+
|
|
262
|
+
``` python
|
|
263
|
+
# Load a demo df with phosphorylation sites
|
|
264
|
+
df = Data.get_ochoa_site().head()
|
|
265
|
+
df.iloc[:,-2:]
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
|
|
269
|
+
| | site_seq | gene_site |
|
|
270
|
+
|-----|-----------------|----------------|
|
|
271
|
+
| 0 | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
|
|
272
|
+
| 1 | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
|
|
273
|
+
| 2 | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
|
|
274
|
+
| 3 | KSRFTEYSMTSSVMR | A0A075B6Q4_S68 |
|
|
275
|
+
| 4 | FTEYSMTSSVMRRNE | A0A075B6Q4_S71 |
|
|
276
|
+
|
|
277
|
+
|
|
278
|
+
|
|
279
|
+
### Set the column name and param to calculate
|
|
280
|
+
|
|
281
|
+
Here we choose param_CDDM_upper, as the sequences in the demo df are all
|
|
282
|
+
uppercase. You can also choose other params.
|
|
283
|
+
|
|
284
|
+
``` python
|
|
285
|
+
results = predict_kinase_df(df,'site_seq',**param_CDDM_upper)
|
|
286
|
+
results
|
|
287
|
+
```
|
|
288
|
+
|
|
289
|
+
input dataframe has a length 5
|
|
290
|
+
Preprocessing
|
|
291
|
+
Finish preprocessing
|
|
292
|
+
Calculating position: [-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
|
|
293
|
+
|
|
294
|
+
100%|██████████| 289/289 [00:05<00:00, 56.64it/s]
|
|
295
|
+
|
|
296
|
+
|
|
297
|
+
|
|
298
|
+
| kinase | SRC | EPHA3 | FES | NTRK3 | ALK | EPHA8 | ABL1 | FLT3 | EPHB2 | FYN | ... | MEK5 | PKN2 | MAP2K7 | MRCKB | HIPK3 | CDK8 | BUB1 | MEKK3 | MAP2K3 | GRK1 |
|
|
299
|
+
|--------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|-----|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
|
|
300
|
+
| 0 | 0.991760 | 1.093712 | 1.051750 | 1.067134 | 1.013682 | 1.097519 | 0.966379 | 0.982464 | 1.054986 | 1.055910 | ... | 1.314859 | 1.635470 | 1.652251 | 1.622672 | 1.362973 | 1.797155 | 1.305198 | 1.423618 | 1.504941 | 1.872020 |
|
|
301
|
+
| 1 | 0.910262 | 0.953743 | 0.942327 | 0.950601 | 0.872694 | 0.932586 | 0.846899 | 0.826662 | 0.915020 | 0.942713 | ... | 1.175454 | 1.402006 | 1.430392 | 1.215826 | 1.569373 | 1.716455 | 1.270999 | 1.195081 | 1.223082 | 1.793290 |
|
|
302
|
+
| 2 | 0.849866 | 0.899910 | 0.848895 | 0.879652 | 0.874959 | 0.899414 | 0.839200 | 0.836523 | 0.858040 | 0.867269 | ... | 1.408003 | 1.813739 | 1.454786 | 1.084522 | 1.352556 | 1.524663 | 1.377839 | 1.173830 | 1.305691 | 1.811849 |
|
|
303
|
+
| 3 | 0.803826 | 0.836527 | 0.800759 | 0.894570 | 0.839905 | 0.781001 | 0.847847 | 0.807040 | 0.805877 | 0.801402 | ... | 1.110307 | 1.703637 | 1.795092 | 1.469653 | 1.549936 | 1.491344 | 1.446922 | 1.055452 | 1.534895 | 1.741090 |
|
|
304
|
+
| 4 | 0.822793 | 0.796532 | 0.792343 | 0.839882 | 0.810122 | 0.781420 | 0.805251 | 0.795022 | 0.790380 | 0.864538 | ... | 1.062617 | 1.357689 | 1.485945 | 1.249266 | 1.456078 | 1.422782 | 1.376471 | 1.089629 | 1.121309 | 1.697524 |
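
A score table shaped like the one above (one row per site, one column per kinase) can be reduced to the top-ranked kinases per site with plain pandas. A sketch on a toy table with made-up values; `scores` here stands in for the `results` dataframe:

``` python
import pandas as pd

# Toy score table: rows are sites, columns are kinases (made-up values)
scores = pd.DataFrame(
    {"SRC": [0.99, 0.91], "PKN2": [1.64, 1.40], "GRK1": [1.87, 1.79]},
    index=[0, 1],
)

# Highest-scoring kinase per site, and the top two per site
best = scores.idxmax(axis=1)
top2 = scores.apply(lambda row: row.nlargest(2).index.tolist(), axis=1)
print(best.tolist())
print(top2.tolist())
```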
|
|
305
|
+
|
|
306
|
+
|
|
307
|
+
|
|
308
|
+
## Phosphorylation sites
|
|
309
|
+
|
|
310
|
+
Besides calculating sequence scores, we also provide multiple datasets
|
|
311
|
+
of phosphorylation sites.
|
|
312
|
+
|
|
313
|
+
### CPTAC pan-cancer phosphoproteomics
|
|
314
|
+
|
|
315
|
+
``` python
|
|
316
|
+
df = Data.get_cptac_ensembl_site()
|
|
317
|
+
df.head(3)
|
|
318
|
+
```
|
|
319
|
+
|
|
320
|
+
|
|
321
|
+
|
|
322
|
+
| | gene | site | site_seq | protein | gene_name | gene_site | protein_site |
|
|
323
|
+
|-----|--------------------|-------|-----------------|-------------------|-----------|-------------|-----------------------|
|
|
324
|
+
| 0 | ENSG00000003056.8 | S267 | DDQLGEESEERDDHL | ENSP00000000412.3 | M6PR | M6PR_S267 | ENSP00000000412_S267 |
|
|
325
|
+
| 1 | ENSG00000003056.8 | S267 | DDQLGEESEERDDHL | ENSP00000440488.2 | M6PR | M6PR_S267 | ENSP00000440488_S267 |
|
|
326
|
+
| 2 | ENSG00000048028.11 | S1053 | PPTIRPNSPYDLCSR | ENSP00000003302.4 | USP28 | USP28_S1053 | ENSP00000003302_S1053 |
|
|
327
|
+
|
|
328
|
+
|
|
329
|
+
|
|
330
|
+
### [Ochoa et al. human phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3)
|
|
331
|
+
|
|
332
|
+
``` python
|
|
333
|
+
df = Data.get_ochoa_site()
|
|
334
|
+
df.head(3)
|
|
335
|
+
```
|
|
336
|
+
|
|
337
|
+
|
|
338
|
+
| | uniprot | position | residue | is_disopred | disopred_score | log10_hotspot_pval_min | isHotspot | uniprot_position | functional_score | current_uniprot | name | gene | Sequence | is_valid | site_seq | gene_site |
|
|
339
|
+
|-----|------------|----------|---------|-------------|----------------|------------------------|-----------|------------------|------------------|-----------------|------------------|------|---------------------------------------------------|----------|-----------------|----------------|
|
|
340
|
+
| 0 | A0A075B6Q4 | 24 | S | True | 0.91 | 6.839384 | True | A0A075B6Q4_24 | 0.149257 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
|
|
341
|
+
| 1 | A0A075B6Q4 | 35 | S | True | 0.87 | 9.192622 | False | A0A075B6Q4_35 | 0.136966 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
|
|
342
|
+
| 2 | A0A075B6Q4 | 57 | S | False | 0.28 | 0.818834 | False | A0A075B6Q4_57 | 0.125364 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
|
|
343
|
+
|
|
344
|
+
|
|
345
|
+
|
|
346
|
+
### PhosphoSitePlus human phosphorylation sites
|
|
347
|
+
|
|
348
|
+
``` python
|
|
349
|
+
df = Data.get_psp_human_site()
|
|
350
|
+
df.head(3)
|
|
351
|
+
```
|
|
352
|
+
|
|
353
|
+
|
|
354
|
+
| | gene | protein | uniprot | site | gene_site | SITE_GRP_ID | species | site_seq | LT_LIT | MS_LIT | MS_CST | CST_CAT# | Ambiguous_Site |
|
|
355
|
+
|-----|-------|-------------|---------|------|-----------|-------------|---------|-----------------------|--------|--------|--------|----------|----------------|
|
|
356
|
+
| 0 | YWHAB | 14-3-3 beta | P31946 | T2 | YWHAB_T2 | 15718712 | human | \_\_\_\_\_\_MtMDksELV | NaN | 3.0 | 1.0 | None | 0 |
|
|
357
|
+
| 1 | YWHAB | 14-3-3 beta | P31946 | S6 | YWHAB_S6 | 15718709 | human | \_\_MtMDksELVQkAk | NaN | 8.0 | NaN | None | 0 |
|
|
358
|
+
| 2 | YWHAB | 14-3-3 beta | P31946 | Y21 | YWHAB_Y21 | 3426383 | human | LAEQAERyDDMAAAM | NaN | NaN | 4.0 | None | 0 |
|
|
359
|
+
|
|
360
|
+
|
|
361
|
+
|
|
362
|
+
### Unique sites of combined Ochoa & PhosphoSitePlus
|
|
363
|
+
|
|
364
|
+
``` python
|
|
365
|
+
df = Data.get_combine_site_psp_ochoa()
|
|
366
|
+
df.head(3)
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
|
|
370
|
+
| | site_seq | gene_site | gene | source | num_site | acceptor | -7 | -6 | -5 | -4 | ... | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|
|
371
|
+
|-----|-----------------|------------|-------|--------|----------|----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
|
|
372
|
+
| 0 | AAAAAAASGGAGSDN | PBX1_S136 | PBX1 | ochoa | 1 | S | A | A | A | A | ... | A | A | S | G | G | A | G | S | D | N |
|
|
373
|
+
| 1 | AAAAAAASGGGVSPD | PBX2_S146 | PBX2 | ochoa | 1 | S | A | A | A | A | ... | A | A | S | G | G | G | V | S | P | D |
|
|
374
|
+
| 2 | AAAAAAASGVTTGKP | CLASR_S349 | CLASR | ochoa | 1 | S | A | A | A | A | ... | A | A | S | G | V | T | T | G | K | P |
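
The `-7` to `7` columns above hold one residue per position. A sketch of how such columns could be derived from `site_seq` with pandas, assuming every sequence has 15 residues; this is an illustration, not necessarily how the packaged dataset was built:

``` python
import pandas as pd

sites = pd.DataFrame({"site_seq": ["AAAAAAASGGAGSDN", "AAAAAAASGGGVSPD"]})

# One column per position, labelled -7 ... 7
positions = list(range(-7, 8))
per_position = pd.DataFrame(
    [list(seq) for seq in sites["site_seq"]],
    columns=positions,
    index=sites.index,
)
print(per_position[[-1, 0, 1]])
```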
|
|
375
|
+
|
|
376
|
+
|
|
377
|
+
|
|
378
|
+
## Phosphorylation site sequence example
|
|
379
|
+
|
|
380
|
+
***All capital - 15 length (-7 to +7)***
|
|
381
|
+
|
|
382
|
+
- QSEEEKLSPSPTTED
|
|
383
|
+
- TLQHVPDYRQNVYIP
|
|
384
|
+
- TMGLSARYGPQFTLQ
|
|
385
|
+
|
|
386
|
+
***All capital - 10 length (-5 to +4)***
|
|
387
|
+
|
|
388
|
+
- SRDPHYQDPH
|
|
389
|
+
- LDNPDYQQDF
|
|
390
|
+
- AAAAASGGAG
|
|
391
|
+
|
|
392
|
+
***With lowercase - (-7 to +7)***
|
|
393
|
+
|
|
394
|
+
- QsEEEKLsPsPTTED
|
|
395
|
+
- TLQHVPDyRQNVYIP
|
|
396
|
+
- TMGLsARyGPQFTLQ
|
|
397
|
+
|
|
398
|
+
***With lowercase - (-5 to +4)***
|
|
399
|
+
|
|
400
|
+
- sRDPHyQDPH
|
|
401
|
+
- LDNPDyQQDF
|
|
402
|
+
- AAAAAsGGAG
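
The two notations carry the same residues; the lowercase form additionally marks which residues are phosphorylated. A sketch that converts a lowercase-annotated sequence to uppercase and reports the offsets of the marked residues, assuming the central acceptor sits at index `len(seq) // 2`; `phospho_offsets` is a hypothetical helper, not part of `katlas`:

``` python
from typing import List, Tuple


def phospho_offsets(seq: str) -> Tuple[str, List[int]]:
    """Return the uppercase sequence and offsets of lowercase residues."""
    center = len(seq) // 2  # assumed index of the central acceptor
    offsets = [i - center for i, aa in enumerate(seq) if aa.islower()]
    return seq.upper(), offsets


upper, offsets = phospho_offsets("QsEEEKLsPsPTTED")
print(upper, offsets)
```

With this convention the 10-residue examples span -5 to +4, so the acceptor is the sixth residue.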
|
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
katlas/__init__.py,sha256=sXLh7g3KC4QCFxcZGBTpG2scR7hmmBsMjq6LqRptkRg,22
|
|
2
|
+
katlas/_modidx.py,sha256=wuIOxQQtyUyUDt8xnoZYyHfJAjnWMcoSYO6D3PXUFGE,10996
|
|
3
|
+
katlas/core.py,sha256=25yF0J2RBO_Fup1dUQA_h6Tfwcs96-A5uuzdf_lCpo0,34975
|
|
4
|
+
katlas/dl.py,sha256=Rm1EO6oGTiHpqp4EA2xAvbIUnh608FPYOdzndRGKVkc,10849
|
|
5
|
+
katlas/feature.py,sha256=3zgTuCnXqH1e0LGZ2Hkvan852PiaIHxj27cg_TJfKzo,11471
|
|
6
|
+
katlas/imports.py,sha256=-ZphRU8K1KspxMpgRxisE0OskrCw3S8JR8tvmeXBRY0,147
|
|
7
|
+
katlas/plot.py,sha256=vB3gv0aaCNERW1CtdDWqM4jIZOx1auGWwi_1I22xBa0,23630
|
|
8
|
+
katlas/train.py,sha256=s0ucsZVaixCTZPz-XAI2J7zQDeGkiYEJKOc2dFTYsAc,7625
|
|
9
|
+
python_katlas-0.0.1.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
|
|
10
|
+
python_katlas-0.0.1.dist-info/METADATA,sha256=3yYodyC6FDFo2E4vGk4DgDuJHGGK0PWIXXyIivPFk_s,15256
|
|
11
|
+
python_katlas-0.0.1.dist-info/WHEEL,sha256=EVRjI69F5qVjm_YgqcTXPnTAv3BfSUr0WVAHuSP3Xoo,92
|
|
12
|
+
python_katlas-0.0.1.dist-info/entry_points.txt,sha256=SF3xDlCmE84ECTBIMDo_FNg1aXGX2-lXkCvH5o4VgpM,34
|
|
13
|
+
python_katlas-0.0.1.dist-info/top_level.txt,sha256=pKBKw9KOSJgnnFkoilkDij_iJ_tJbIO4XnrSXIleqNc,7
|
|
14
|
+
python_katlas-0.0.1.dist-info/RECORD,,
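
Each RECORD line lists a path, a hash and a size in bytes. In the wheel format the hash is the SHA-256 digest encoded as URL-safe base64 with the trailing `=` padding removed. A minimal sketch of how such a line can be recomputed to check a file; `record_entry` and the example path are hypothetical:

``` python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD line: path, unpadded urlsafe-base64 SHA-256, size."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# Example with an empty file body
print(record_entry("pkg/empty.py", b""))
```

Comparing the recomputed line with the one in RECORD shows whether a file was altered after the wheel was built.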
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
katlas
|