python-katlas 0.1.0__tar.gz → 0.1.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {python-katlas-0.1.0/python_katlas.egg-info → python-katlas-0.1.1}/PKG-INFO +13 -10
- {python-katlas-0.1.0 → python-katlas-0.1.1}/README.md +12 -9
- python-katlas-0.1.1/katlas/__init__.py +1 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/_modidx.py +1 -2
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/core.py +43 -51
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/dl.py +14 -14
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/feature.py +9 -9
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/plot.py +20 -20
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/train.py +7 -7
- {python-katlas-0.1.0 → python-katlas-0.1.1/python_katlas.egg-info}/PKG-INFO +13 -10
- {python-katlas-0.1.0 → python-katlas-0.1.1}/settings.ini +1 -1
- python-katlas-0.1.0/katlas/__init__.py +0 -1
- {python-katlas-0.1.0 → python-katlas-0.1.1}/LICENSE +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/MANIFEST.in +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/imports.py +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/SOURCES.txt +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/dependency_links.txt +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/entry_points.txt +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/not-zip-safe +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/requires.txt +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/top_level.txt +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/setup.cfg +0 -0
- {python-katlas-0.1.0 → python-katlas-0.1.1}/setup.py +0 -0
--- python-katlas-0.1.0/python_katlas.egg-info/PKG-INFO
+++ python-katlas-0.1.1/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-katlas
-Version: 0.1.0
+Version: 0.1.1
 Summary: tools for predicting kinome specificities
 Home-page: https://github.com/sky1ove/python-katlas
 Author: lily
@@ -60,6 +60,13 @@ helpful to your research.
 phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
 and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
 [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+
+
+## Web applications
+
+Users can now run the analysis directly on the web without needing to code.
+
+Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
 
 ## Tutorials on Colab
 
@@ -67,20 +74,16 @@ helpful to your research.
 sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
 - 2. [High throughput substrate scoring on phosphoproteomics
 dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
-- 3. [
-
-
-- 4. [Kinase enrichment analysis for AKT
-inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
-/ [Kinase enrichment analysis for EGFR
-inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+- 3. [Kinase enrichment analysis for AKT
+inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
+
 
 ## Install
 
-Install the latest version through
+Install the latest version through pip
 
 ``` python
-
+pip install python-katlas -Uq
 ```
 
 ## Import
--- python-katlas-0.1.0/README.md
+++ python-katlas-0.1.1/README.md
@@ -38,6 +38,13 @@ helpful to your research.
 phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
 and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
 [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+
+
+## Web applications
+
+Users can now run the analysis directly on the web without needing to code.
+
+Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
 
 ## Tutorials on Colab
 
@@ -45,20 +52,16 @@ helpful to your research.
 sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
 - 2. [High throughput substrate scoring on phosphoproteomics
 dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
-- 3. [
-
-
-- 4. [Kinase enrichment analysis for AKT
-inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
-/ [Kinase enrichment analysis for EGFR
-inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+- 3. [Kinase enrichment analysis for AKT
+inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
+
 
 ## Install
 
-Install the latest version through
+Install the latest version through pip
 
 ``` python
-
+pip install python-katlas -Uq
 ```
 
 ## Import
--- /dev/null
+++ python-katlas-0.1.1/katlas/__init__.py
@@ -0,0 +1 @@
+__version__ = "0.1.0"
--- python-katlas-0.1.0/katlas/_modidx.py
+++ python-katlas-0.1.1/katlas/_modidx.py
@@ -46,13 +46,12 @@ d = { 'settings': { 'branch': 'main',
                 'katlas.core.get_one_kinase': ('core.html#get_one_kinase', 'katlas/core.py'),
                 'katlas.core.get_pct': ('core.html#get_pct', 'katlas/core.py'),
                 'katlas.core.get_pct_df': ('core.html#get_pct_df', 'katlas/core.py'),
-                'katlas.core.
+                'katlas.core.get_pvalue': ('core.html#get_pvalue', 'katlas/core.py'),
                 'katlas.core.get_unique_site': ('core.html#get_unique_site', 'katlas/core.py'),
                 'katlas.core.multiply': ('core.html#multiply', 'katlas/core.py'),
                 'katlas.core.multiply_func': ('core.html#multiply_func', 'katlas/core.py'),
                 'katlas.core.predict_kinase': ('core.html#predict_kinase', 'katlas/core.py'),
                 'katlas.core.predict_kinase_df': ('core.html#predict_kinase_df', 'katlas/core.py'),
-                'katlas.core.query_gene': ('core.html#query_gene', 'katlas/core.py'),
                 'katlas.core.raw2norm': ('core.html#raw2norm', 'katlas/core.py'),
                 'katlas.core.sumup': ('core.html#sumup', 'katlas/core.py')},
 'katlas.dl': { 'katlas.dl.CNN1D_1': ('dl.html#cnn1d_1', 'katlas/dl.py'),
--- python-katlas-0.1.0/katlas/core.py
+++ python-katlas-0.1.1/katlas/core.py
@@ -6,7 +6,7 @@
 __all__ = ['param_PSPA_st', 'param_PSPA_y', 'param_PSPA', 'param_CDDM', 'param_CDDM_upper', 'Data', 'CPTAC', 'convert_string',
 'checker', 'STY2sty', 'cut_seq', 'get_dict', 'multiply_func', 'multiply', 'sumup', 'predict_kinase',
 'predict_kinase_df', 'get_pct', 'get_pct_df', 'get_unique_site', 'extract_site_seq', 'get_freq',
-'
+'get_pvalue', 'get_metaP', 'raw2norm', 'get_one_kinase']
 
 # %% ../nbs/00_core.ipynb 4
 import math, pandas as pd, numpy as np
@@ -14,7 +14,7 @@ from tqdm import tqdm
 from scipy.stats import chi2
 from typing import Callable
 from functools import partial
-from scipy.stats import ttest_ind
+from scipy.stats import ttest_ind, mannwhitneyu, wilcoxon
 from statsmodels.stats.multitest import multipletests
 
 # %% ../nbs/00_core.ipynb 7
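The import change above swaps the single `ttest_ind` for three alternative scipy tests. As an editorial aside, here is a minimal sketch of how the three differ on toy data (the arrays are made up for illustration; in the new `get_pvalue`, the two unpaired tests are additionally called with `nan_policy='omit'`):

```python
# Illustrative sketch, not katlas code: the three scipy tests that the
# updated core.py imports, applied to two small groups of values.
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, wilcoxon

group1 = np.array([1.2, 1.5, 1.1, 1.4, 1.3])
group2 = np.array([2.1, 2.4, 1.9, 2.2, 2.0])

t_stat, t_p = ttest_ind(group1, group2)     # parametric, unpaired
u_stat, u_p = mannwhitneyu(group1, group2)  # rank-based, unpaired
w_stat, w_p = wilcoxon(group1, group2)      # rank-based, paired (equal-length samples)

print(t_p, u_p, w_p)
```

Note that `wilcoxon` operates on paired differences, which is why the diff below handles it separately: it cannot silently omit a missing value the way the unpaired tests can.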
@@ -550,7 +550,7 @@ def predict_kinase_df(df, seq_col, ref, func, to_lower=False, to_upper=False):
     # Return results as a DataFrame
     return out
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 53
 def get_pct(site,ref,func,pct_ref):
 
     "Replicate the precentile results from The Kinase Library."
@@ -575,7 +575,7 @@ def get_pct(site,ref,func,pct_ref):
     final.columns=['log2(score)','percentile']
     return final
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 59
 def get_pct_df(score_df, # output from predict_kinase_df
                pct_ref, # a reference df for percentile calculation
               ):
@@ -600,7 +600,7 @@ def get_pct_df(score_df, # output from predict_kinase_df
 
     return percentiles_df
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 64
 def get_unique_site(df:pd.DataFrame = None,# dataframe that contains phosphorylation sites
                     seq_col: str='site_seq', # column name of site sequence
                     id_col: str='gene_site' # column name of site id
@@ -616,7 +616,7 @@ def get_unique_site(df:pd.DataFrame = None,# dataframe that contains phosphoryla
 
     return unique
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 67
 def extract_site_seq(df: pd.DataFrame, # dataframe that contains protein sequence
                      seq_col: str, # column name of protein sequence
                      position_col: str # column name of position 0
@@ -642,7 +642,7 @@ def extract_site_seq(df: pd.DataFrame, # dataframe that contains protein sequenc
 
     return np.array(data)
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 72
 def get_freq(df_k: pd.DataFrame, # a dataframe for a single kinase that contains phosphorylation sequence splitted by their position
              aa_order = [i for i in 'PGACSTVILMFYWHKRQNDEsty'], # amino acid to include in the full matrix
              aa_order_paper = [i for i in 'PGACSTVILMFYWHKRQNDEsty'], # amino acid to include in the partial matrix
@@ -683,35 +683,16 @@ def get_freq(df_k: pd.DataFrame, # a dataframe for a single kinase that contains
 
     return paper,full
 
-# %% ../nbs/00_core.ipynb
-def
-
-    "Query gene in the phosphoproteomics dataset"
-
-    # query gene in the dataframe
-    df_gene = df[df.gene_site.str.contains(f'{gene}_')]
-
-    # sort dataframe based on position
-    sort_position = df_gene.gene_site.str.split('_').str[-1].str[1:].astype(int).sort_values().index
-    df_gene = df_gene.loc[sort_position]
-
-    return df_gene
-
-# %% ../nbs/00_core.ipynb 83
-def get_ttest(df,
+# %% ../nbs/00_core.ipynb 76
+def get_pvalue(df,
               columns1, # list of column names for group1
               columns2, # list of column names for group2
+              test_method = 'mann_whitney', # 'student_t', 'mann_whitney', 'wilcoxon'
               FC_method = 'median', # or mean
-              alpha=0.05, # significance level in multipletests for p_adj
-              correction_method='fdr_bh', # method in multipletests for p_adj
              ):
-    """
-    Performs t-tests and calculates log2 fold change between two groups of columns in a DataFrame.
-    NaN p-values are excluded from the multiple testing correction.
 
-
-
-    """
+    "Performs statistical tests and calculates difference between the median or mean of two groups of columns."
+
     group1 = df[columns1]
     group2 = df[columns2]
 
@@ -726,24 +707,36 @@ def get_ttest(df,
     # As phosphoproteomics data has already been log transformed, we can directly use subtraction
     FCs = m2 - m1
 
-    # Perform
-
+    # Perform the chosen test and handle NaN p-values
+    if test_method == 'student_t': # data is normally distributed, non-paired
+        test_func = ttest_ind
+    elif test_method == 'mann_whitney': # not normally distributed, non-paired, mann_whitney considers the rank, ignore the differences
+        test_func = mannwhitneyu
+    elif test_method == 'wilcoxon': # not normally distributed, paired
+        test_func = wilcoxon
+
+    t_results = []
+    for idx in tqdm(df.index, desc=f"Computing {test_method} tests"):
+        try:
+            if test_method == 'wilcoxon': # as wilcoxon is paired, if lack a paired sample, just give nan, as default nanpolicy is propagate (gives nan if nan in input)
+                stat, pvalue = test_func(group1.loc[idx], group2.loc[idx])
+            else:
+                stat, pvalue = test_func(group1.loc[idx], group2.loc[idx], nan_policy='omit')
+        except ValueError: # Handle cases with insufficient data
+            pvalue = np.nan
+        t_results.append(pvalue)
 
     # Exclude NaN p-values before multiple testing correction
-    p_values =
-    valid_p_values = np.
+    p_values = np.array(t_results, dtype=float) # Ensure the correct data type
+    valid_p_values = p_values[~np.isnan(p_values)]
 
-    # valid_p_values = np.array(p_values)
-    valid_p_values = valid_p_values[~np.isnan(valid_p_values)]
-
     # Adjust for multiple testing on valid p-values only
-    reject, pvals_corrected, _, _ = multipletests(valid_p_values, alpha=
-
+    reject, pvals_corrected, _, _ = multipletests(valid_p_values, alpha=0.05, method='fdr_bh')
+
     # Create a full list of corrected p-values including NaNs
-    full_pvals_corrected = np.
-    full_pvals_corrected[:] = np.nan
+    full_pvals_corrected = np.full_like(p_values, np.nan)
     np.place(full_pvals_corrected, ~np.isnan(p_values), pvals_corrected)
-
+
     # Adjust the significance accordingly
     full_reject = np.zeros_like(p_values, dtype=bool)
     np.place(full_reject, ~np.isnan(p_values), reject)
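The rewritten block masks out NaN p-values, runs Benjamini-Hochberg correction on the valid ones via `multipletests`, and scatters the corrected values back into place. A standalone sketch of that NaN-aware pattern in plain NumPy, with the BH step written out instead of calling statsmodels (`fdr_bh_nan` is an illustrative name, not a katlas function):

```python
import numpy as np

def fdr_bh_nan(p_values):
    """Benjamini-Hochberg adjustment that ignores NaN entries and returns
    NaN in their original positions, mirroring the diff's mask ->
    multipletests -> np.place sequence."""
    p = np.asarray(p_values, dtype=float)
    valid = ~np.isnan(p)
    pv = p[valid]
    n = pv.size
    order = np.argsort(pv)
    scaled = pv[order] * n / np.arange(1, n + 1)     # p * m / rank
    adj = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    corrected = np.empty(n)
    corrected[order] = np.clip(adj, 0, 1)
    out = np.full_like(p, np.nan)                    # cf. np.full_like in the diff
    out[valid] = corrected
    return out

print(fdr_bh_nan([0.01, 0.04, 0.03, np.nan, 0.005]))
```

Keeping NaNs out of the correction matters: feeding them to the ranking step would either raise or inflate the effective number of tests.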
@@ -752,22 +745,21 @@ def get_ttest(df,
     results = pd.DataFrame({
         'log2FC': FCs,
         'p_value': p_values,
-        'p_adj': full_pvals_corrected
-        'significant': full_reject
+        'p_adj': full_pvals_corrected
     })
 
     results['p_value'] = results['p_value'].astype(float)
-
+
     def get_signed_logP(r,p_col):
         log10 = -np.log10(r[p_col])
         return -log10 if r['log2FC']<0 else log10
-
+
     results['signed_logP'] = results.apply(partial(get_signed_logP,p_col='p_value'),axis=1)
     results['signed_logPadj'] = results.apply(partial(get_signed_logP,p_col='p_adj'),axis=1)
-
+
     return results
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 77
 def get_metaP(p_values):
 
     "Use Fisher's method to calculate a combined p value given a list of p values; this function also allows negative p values (negative correlation)"
@@ -779,7 +771,7 @@ def get_metaP(p_values):
 
     return score
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 80
 def raw2norm(df: pd.DataFrame, # single kinase's df has position as index, and single amino acid as columns
              PDHK: bool=False, # whether this kinase belongs to PDHK family
             ):
@@ -802,7 +794,7 @@ def raw2norm(df: pd.DataFrame, # single kinase's df has position as index, and s
 
     return df2
 
-# %% ../nbs/00_core.ipynb
+# %% ../nbs/00_core.ipynb 82
 def get_one_kinase(df: pd.DataFrame, #stacked dataframe (paper's raw data)
                    kinase:str, # a specific kinase
                    normalize: bool=False, # normalize according to the paper; special for PDHK1/4
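For context on the `get_metaP` docstring above: Fisher's method combines k p-values through the statistic X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null. A stdlib-only sketch of the plain (unsigned) case, using the closed-form chi-square survival function for even degrees of freedom (`fisher_combine` is an illustrative name; katlas's `get_metaP` additionally carries a sign for negative correlations, which this sketch omits):

```python
import math

def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df.
    For even df the chi-square survival function is a Poisson tail,
    so no scipy is needed here."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # sf of chi-square with df = 2k, closed form for even df
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

print(fisher_combine([0.05, 0.05]))
```

Combining two p-values of 0.05 gives roughly 0.017, i.e. stronger evidence than either alone, which is the behavior `get_metaP` relies on when aggregating per-sample p-values.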
--- python-katlas-0.1.0/katlas/dl.py
+++ python-katlas-0.1.1/katlas/dl.py
@@ -6,7 +6,7 @@
 __all__ = ['def_device', 'seed_everything', 'GeneralDataset', 'get_sampler', 'MLP_1', 'CNN1D_1', 'init_weights', 'lin_wn',
 'conv_wn', 'CNN1D_2', 'train_dl', 'train_dl_cv', 'predict_dl']
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 5
 from fastbook import *
 import fastcore.all as fc,torch.nn.init as init
 from fastai.callback.training import GradientClip
@@ -22,7 +22,7 @@ from sklearn.model_selection import *
 from sklearn.metrics import mean_squared_error
 from scipy.stats import spearmanr,pearsonr
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 7
 def seed_everything(seed=123):
     random.seed(seed)
     os.environ['PYTHONHASHSEED'] = str(seed)
@@ -32,10 +32,10 @@ def seed_everything(seed=123):
     torch.backends.cudnn.deterministic = True
     torch.backends.cudnn.benchmark = False
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 9
 def_device = 'mps' if torch.backends.mps.is_available() else 'cuda' if torch.cuda.is_available() else 'cpu'
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 14
 class GeneralDataset:
     def __init__(self,
                  df, # a dataframe of values
@@ -62,7 +62,7 @@ class GeneralDataset:
         y = torch.Tensor(self.y[index])
         return X, y
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 18
 def get_sampler(info,col):
 
     "For imbalanced data, get higher weights for less-represented samples"
@@ -82,7 +82,7 @@ def get_sampler(info,col):
 
     return sampler
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 24
 def MLP_1(num_features,
           num_targets,
           hidden_units = [512, 218],
@@ -112,7 +112,7 @@ def MLP_1(num_features,
 
     return model
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 30
 class CNN1D_1(Module):
 
     def __init__(self,
@@ -137,12 +137,12 @@ class CNN1D_1(Module):
         x = self.fc2(x)
         return x
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 34
 def init_weights(m, leaky=0.):
     "Initiate any Conv layer with Kaiming norm."
     if isinstance(m, (nn.Conv1d,nn.Conv2d,nn.Conv3d)): init.kaiming_normal_(m.weight, a=leaky)
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 35
 def lin_wn(ni,nf,dp=0.1,act=nn.SiLU):
     "Weight norm of linear."
     layers = nn.Sequential(
@@ -152,7 +152,7 @@ def lin_wn(ni,nf,dp=0.1,act=nn.SiLU):
     if act: layers.append(act())
     return layers
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 36
 def conv_wn(ni, nf, ks=3, stride=1, padding=1, dp=0.1,act=nn.ReLU):
     "Weight norm of conv."
     layers = nn.Sequential(
@@ -162,7 +162,7 @@ def conv_wn(ni, nf, ks=3, stride=1, padding=1, dp=0.1,act=nn.ReLU):
     if act: layers.append(act())
     return layers
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 37
 class CNN1D_2(nn.Module):
 
     def __init__(self, ni, nf, amp_scale = 16):
@@ -212,7 +212,7 @@ class CNN1D_2(nn.Module):
 
         return x
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 41
 def train_dl(df,
              feat_col,
             target_col,
@@ -275,7 +275,7 @@ def train_dl(df,
 
     return target, pred
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 46
 @fc.delegates(train_dl)
 def train_dl_cv(df,
                 feat_col,
@@ -325,7 +325,7 @@ def train_dl_cv(df,
 
     return oof, metrics
 
-# %% ../nbs/04_DL.ipynb
+# %% ../nbs/04_DL.ipynb 54
 def predict_dl(df,
                feat_col,
               target_col,
--- python-katlas-0.1.0/katlas/feature.py
+++ python-katlas-0.1.1/katlas/feature.py
@@ -5,7 +5,7 @@
 # %% auto 0
 __all__ = ['get_rdkit', 'get_morgan', 'get_esm', 'get_t5', 'get_t5_bfd', 'reduce_feature', 'remove_hi_corr', 'preprocess']
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 5
 from fastbook import *
 import torch,re,joblib,gc,esm
 from tqdm.notebook import tqdm; tqdm.pandas()
@@ -30,7 +30,7 @@ from umap.umap_ import UMAP
 
 set_config(transform_output="pandas")
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 8
 def get_rdkit(df: pd.DataFrame, # a dataframe that contains smiles
               col:str = "SMILES", # colname of smile
               normalize: bool = True, # normalize features using StandardScaler()
@@ -49,7 +49,7 @@ def get_rdkit(df: pd.DataFrame, # a dataframe that contains smiles
     # feature_df = feature_df.reset_index()
     return feature_df
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 12
 def get_morgan(df: pd.DataFrame, # a dataframe that contains smiles
                col: str = "SMILES", # colname of smile
                radius=3
@@ -61,7 +61,7 @@ def get_morgan(df: pd.DataFrame, # a dataframe that contains smiles
     fp_df.columns = "morgan_" + fp_df.columns.astype(str)
     return fp_df
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 16
 def get_esm(df:pd.DataFrame, # a dataframe that contains amino acid sequence
             col: str = 'sequence', # colname of amino acid sequence
             model_name: str = "esm2_t33_650M_UR50D", # Name of the ESM model to use for the embeddings.
@@ -128,7 +128,7 @@ def get_esm(df:pd.DataFrame, # a dataframe that contains amino acid sequence
 
     return df_feature
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 20
 def get_t5(df: pd.DataFrame,
            col: str = 'sequence'
           ):
@@ -170,7 +170,7 @@ def get_t5(df: pd.DataFrame,
 
     return T5_feature
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 23
 def get_t5_bfd(df:pd.DataFrame,
                col: str = 'sequence'
               ):
@@ -212,7 +212,7 @@ def get_t5_bfd(df:pd.DataFrame,
 
    return T5_feature
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 27
 def reduce_feature(df: pd.DataFrame,
                    method: str='pca', # dimensionality reduction method, accept both capital and lower case
                    complexity: int=20, # None for PCA; perfplexity for TSNE, recommend: 30; n_neigbors for UMAP, recommend: 15
@@ -258,7 +258,7 @@ def reduce_feature(df: pd.DataFrame,
 
     return embedding_df
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 30
 def remove_hi_corr(df: pd.DataFrame,
                    thr: float=0.98 # threshold
                   ):
@@ -278,7 +278,7 @@ def remove_hi_corr(df: pd.DataFrame,
 
     return df
 
-# %% ../nbs/01_feature.ipynb
+# %% ../nbs/01_feature.ipynb 34
 def preprocess(df: pd.DataFrame,
                thr: float=0.98):
 
@@ -7,7 +7,7 @@ __all__ = ['set_sns', 'get_color_dict', 'logo_func', 'get_logo', 'get_logo2', 'p
|
|
|
7
7
|
'plot_cluster', 'plot_bokeh', 'plot_count', 'plot_bar', 'plot_group_bar', 'plot_box', 'plot_corr',
|
|
8
8
|
'draw_corr', 'get_AUCDF', 'plot_confusion_matrix']
|
|
9
9
|
|
|
10
|
-
# %% ../nbs/02_plot.ipynb
|
|
10
|
+
# %% ../nbs/02_plot.ipynb 5
|
|
11
11
|
import joblib,logomaker
|
|
12
12
|
import fastcore.all as fc, pandas as pd, numpy as np, seaborn as sns
|
|
13
13
|
from adjustText import adjust_text
|
|
@@ -32,14 +32,14 @@ from bokeh.layouts import column
|
|
|
32
32
|
from bokeh.palettes import Category20_20
|
|
33
33
|
from itertools import cycle
|
|
34
34
|
|
|
35
|
-
# %% ../nbs/02_plot.ipynb
|
|
35
|
+
# %% ../nbs/02_plot.ipynb 7
|
|
36
36
|
def set_sns():
|
|
37
37
|
"Set seaborn resolution for notebook display"
|
|
38
38
|
sns.set(rc={"figure.dpi":300, 'savefig.dpi':300})
|
|
39
39
|
sns.set_context('notebook')
|
|
40
40
|
sns.set_style("ticks")
|
|
41
41
|
|
|
42
|
-
# %% ../nbs/02_plot.ipynb
|
|
42
|
+
# %% ../nbs/02_plot.ipynb 8
|
|
43
43
|
def get_color_dict(categories, # list of names to assign color
|
|
44
44
|
palette: str='tab20', # choose from sns.color_palette
|
|
45
45
|
):
|
|
@@ -49,7 +49,7 @@ def get_color_dict(categories, # list of names to assign color
|
|
|
49
49
|
color_map = {category: next(color_cycle) for category in categories}
|
|
50
50
|
return color_map
|
|
51
51
|
|
|
52
|
-
# %% ../nbs/02_plot.ipynb
|
|
52
|
+
# %% ../nbs/02_plot.ipynb 12
|
|
53
53
|
def logo_func(df:pd.DataFrame, # a dataframe that contains ratios for each amino acid at each position
|
|
54
54
|
title: str='logo', # title of the motif logo
|
|
55
55
|
):
|
|
@@ -81,7 +81,7 @@ def logo_func(df:pd.DataFrame, # a dataframe that contains ratios for each amino
|
|
|
81
81
|
logo.ax.set_yticks([])
|
|
82
82
|
logo.ax.set_title(title)
|
|
83
83
|
|
|
84
|
-
# %% ../nbs/02_plot.ipynb
|
|
84
|
+
# %% ../nbs/02_plot.ipynb 13
|
|
85
85
|
def get_logo(df: pd.DataFrame, # stacked Dataframe with kinase as index, substrates as columns
|
|
86
86
|
kinase: str, # a specific kinase name in index
|
|
87
87
|
):
|
|
@@ -120,7 +120,7 @@ def get_logo(df: pd.DataFrame, # stacked Dataframe with kinase as index, substra
|
|
|
120
120
|
# plot logo
|
|
121
121
|
logo_func(ratio2, kinase)
|
|
122
122
|
|
|
123
|
-
# %% ../nbs/02_plot.ipynb
|
|
123
|
+
# %% ../nbs/02_plot.ipynb 17
|
|
124
124
|
def get_logo2(full: pd.DataFrame, # a dataframe that contains the full matrix of a kinase, with index as amino acid, and columns as positions
|
|
125
125
|
title: str = 'logo', # title of the graph
|
|
126
126
|
):
|
|
@@ -159,7 +159,7 @@ def get_logo2(full: pd.DataFrame, # a dataframe that contains the full matrix of
|
|
|
159
159
|
|
|
160
160
|
logo_func(ratio2,title)
|
|
161
161
|
|
|
162
|
-
# %% ../nbs/02_plot.ipynb
|
|
162
|
+
# %% ../nbs/02_plot.ipynb 20
|
|
163
163
|
@fc.delegates(sns.scatterplot)
|
|
164
164
|
def plot_rank(sorted_df: pd.DataFrame, # a sorted dataframe
|
|
165
165
|
x: str, # column name for x axis
|
|
@@ -203,7 +203,7 @@ def plot_rank(sorted_df: pd.DataFrame, # a sorted dataframe
|
|
|
203
203
|
|
|
204
204
|
plt.tight_layout()
|
|
205
205
|
|
|
206
|
-
# %% ../nbs/02_plot.ipynb
|
|
206
|
+
# %% ../nbs/02_plot.ipynb 24
|
|
207
207
|
@fc.delegates(sns.histplot)
|
|
208
208
|
def plot_hist(df: pd.DataFrame, # a dataframe that contain values for plot
|
|
209
209
|
x: str, # column name of values
|
|
@@ -220,7 +220,7 @@ def plot_hist(df: pd.DataFrame, # a dataframe that contain values for plot
|
|
|
220
220
|
plt.figure(figsize=figsize)
|
|
221
221
|
sns.histplot(data=df,x=x,**hist_params,**kwargs)
|
|
222
222
|
|
|
223
|
-
# %% ../nbs/02_plot.ipynb
|
|
223
|
+
# %% ../nbs/02_plot.ipynb 28
|
|
224
224
|
@fc.delegates(sns.heatmap)
|
|
225
225
|
def plot_heatmap(matrix, # a matrix of values
|
|
226
226
|
title: str='heatmap', # title of the heatmap
|
|
@@ -235,7 +235,7 @@ def plot_heatmap(matrix, # a matrix of values
|
|
|
235
235
|
sns.heatmap(matrix, cmap=cmap, annot=False,**kwargs)
|
|
236
236
|
plt.title(title)
|
|
237
237
|
|
|
238
|
-
# %% ../nbs/02_plot.ipynb
|
|
238
|
+
# %% ../nbs/02_plot.ipynb 32
|
|
239
239
|
@fc.delegates(sns.scatterplot)
|
|
240
240
|
def plot_2d(X: pd.DataFrame, # a dataframe that has first column to be x, and second column to be y
|
|
241
241
|
**kwargs, # arguments for sns.scatterplot
|
|
@@ -244,7 +244,7 @@ def plot_2d(X: pd.DataFrame, # a dataframe that has first column to be x, and se
|
|
|
244
244
|
plt.figure(figsize=(7,7))
|
|
245
245
|
sns.scatterplot(data = X,x=X.columns[0],y=X.columns[1],alpha=0.7,**kwargs)
|
|
246
246
|
|
|
247
|
-
# %% ../nbs/02_plot.ipynb
|
|
247
|
+
# %% ../nbs/02_plot.ipynb 34
|
|
248
248
|
def plot_cluster(df: pd.DataFrame, # a dataframe of values that is waited for dimensionality reduction
|
|
249
249
|
method: str='pca', # dimensionality reduction method, choose from pca, umap, and tsne
|
|
250
250
|
hue: str=None, # colname of color
|
|
@@ -269,7 +269,7 @@ def plot_cluster(df: pd.DataFrame, # a dataframe of values that is waited for di
|
|
|
269
269
|
texts = [plt.text(embedding_df[x_col][i], embedding_df[y_col][i], name_list[i],fontsize=8) for i in range(len(embedding_df))]
|
|
270
270
|
{python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/plot.py (+20 -20)

271 271      adjust_text(texts, arrowprops=dict(arrowstyle='-', color='black'))
272     -  # %% ../nbs/02_plot.ipynb
    272 +  # %% ../nbs/02_plot.ipynb 38
273 273    def plot_bokeh(X:pd.DataFrame, # a dataframe of two columns from dimensionality reduction
274 274                   idx, # pd.Series or list that indicates identities for searching box
275 275                   hue:None, # pd.Series or list that indicates category for each sample

@@ -367,7 +367,7 @@ def plot_bokeh(X:pd.DataFrame, # a dataframe of two columns from dimensionality
367 367        layout = column(autocomplete, p)
368 368        show(layout)
369 369
370     -  # %% ../nbs/02_plot.ipynb
    370 +  # %% ../nbs/02_plot.ipynb 41
371 371    def plot_count(cnt, # from df['x'].value_counts()
372 372                   tick_spacing: float= None, # tick spacing for x axis
373 373                   palette: str='tab20'):

@@ -383,7 +383,7 @@ def plot_count(cnt, # from df['x'].value_counts()
383 383        if tick_spacing is not None:
384 384            ax.xaxis.set_major_locator(MultipleLocator(tick_spacing))
385 385
386     -  # %% ../nbs/02_plot.ipynb
    386 +  # %% ../nbs/02_plot.ipynb 43
387 387    @fc.delegates(sns.barplot)
388 388    def plot_bar(df,
389 389                 value, # colname of value

@@ -438,7 +438,7 @@ def plot_bar(df,
438 438
439 439        plt.gca().spines[['right', 'top']].set_visible(False)
440 440
441     -  # %% ../nbs/02_plot.ipynb
    441 +  # %% ../nbs/02_plot.ipynb 46
442 442    @fc.delegates(sns.barplot)
443 443    def plot_group_bar(df,
444 444                       value_cols, # list of column names for values, the order depends on the first item

@@ -481,7 +481,7 @@ def plot_group_bar(df,
481 481        plt.gca().spines[['right', 'top']].set_visible(False)
482 482        plt.legend(fontsize=fontsize) # if change legend location, use loc='upper right'
483 483
484     -  # %% ../nbs/02_plot.ipynb
    484 +  # %% ../nbs/02_plot.ipynb 49
485 485    @fc.delegates(sns.boxplot)
486 486    def plot_box(df,
487 487                 value, # colname of value

@@ -523,7 +523,7 @@ def plot_box(df,
523 523        # plt.gca().spines[['right', 'top']].set_visible(False)
524 524
525 525
526     -  # %% ../nbs/02_plot.ipynb
    526 +  # %% ../nbs/02_plot.ipynb 52
527 527    @fc.delegates(sns.regplot)
528 528    def plot_corr(x, # x axis values, or colname of x axis
529 529                  y, # y axis values, or colname of y axis

@@ -560,7 +560,7 @@ def plot_corr(x, # x axis values, or colname of x axis
560 560                 transform=plt.gca().transAxes,
561 561                 ha='center', va='center')
562 562
563     -  # %% ../nbs/02_plot.ipynb
    563 +  # %% ../nbs/02_plot.ipynb 56
564 564    def draw_corr(corr):
565 565
566 566        "plot heatmap from df.corr()"

@@ -572,7 +572,7 @@ def draw_corr(corr):
572 572        plt.figure(figsize=(20, 16)) # Set the figure size
573 573        sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, mask=mask, fmt='.2f')
574 574
575     -  # %% ../nbs/02_plot.ipynb
    575 +  # %% ../nbs/02_plot.ipynb 60
576 576    def get_AUCDF(df,col, reverse=False,plot=True,xlabel='Rank of reported kinase'):
577 577
578 578        "Plot CDF curve and get relative area under the curve"

@@ -637,7 +637,7 @@ def get_AUCDF(df,col, reverse=False,plot=True,xlabel='Rank of reported kinase'):
637 637
638 638        return AUCDF
639 639
640     -  # %% ../nbs/02_plot.ipynb
    640 +  # %% ../nbs/02_plot.ipynb 63
641 641    def plot_confusion_matrix(target, # pd.Series
642 642                              pred, # pd.Series
643 643                              class_names:list=['0','1'],
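Aside from the version bump, every change in katlas/plot.py above is the same mechanical edit: the nbdev export markers (`# %% ../nbs/02_plot.ipynb`) now carry the index of the notebook cell each function was exported from, which nbdev uses to map module source back to its originating cell. A minimal sketch of how such markers could be parsed — the regex and helper below are illustrative, not part of katlas:

```python
import re

# nbdev export convention: each exported cell is preceded by
# "# %% <notebook path> [<cell index>]"; the index is optional in older files.
MARKER = re.compile(r"^# %% (?P<nb>\S+\.ipynb)(?: (?P<cell>\d+))?$")

def parse_markers(src: str):
    "Return (notebook, cell_index) for every nbdev cell marker in a module's source."
    out = []
    for line in src.splitlines():
        m = MARKER.match(line)
        if m:
            cell = m.group("cell")
            out.append((m.group("nb"), int(cell) if cell is not None else None))
    return out

src = ("# %% ../nbs/02_plot.ipynb 38\n"
       "def plot_bokeh(X): ...\n"
       "# %% ../nbs/02_plot.ipynb 41\n"
       "def plot_count(cnt): ...")
print(parse_markers(src))  # [('../nbs/02_plot.ipynb', 38), ('../nbs/02_plot.ipynb', 41)]
```

With the cell indices present, tooling can jump from a function in the generated module straight to its source cell; the pre-0.1.1 markers lacked that index.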
{python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/train.py (+7 -7)

@@ -5,7 +5,7 @@
5   5      # %% auto 0
6   6      __all__ = ['get_splits', 'split_data', 'score_each', 'train_ml', 'train_ml_cv', 'predict_ml']
7   7
8       -  # %% ../nbs/03_ML.ipynb
    8   +  # %% ../nbs/03_ML.ipynb 5
9   9      # katlas
10  10     from .core import Data
11  11     from .feature import *

@@ -29,7 +29,7 @@ from sklearn.ensemble import *
29  29     from sklearn import set_config
30  30     set_config(transform_output="pandas")
31  31
32      -  # %% ../nbs/03_ML.ipynb
    32  +  # %% ../nbs/03_ML.ipynb 8
33  33     def get_splits(df: pd.DataFrame, # df contains info for split
34  34                    stratified: str=None, # colname to make stratified kfold; sampling from different groups
35  35                    group: str=None, # colname to make group kfold; test and train are from different groups

@@ -79,7 +79,7 @@ def get_splits(df: pd.DataFrame, # df contains info for split
79  79
80  80         return splits
81  81
82      -  # %% ../nbs/03_ML.ipynb
    82  +  # %% ../nbs/03_ML.ipynb 13
83  83     def split_data(df: pd.DataFrame, # dataframe of values
84  84                    feat_col: list, # feature columns
85  85                    target_col: list, # target columns

@@ -95,7 +95,7 @@ def split_data(df: pd.DataFrame, # dataframe of values
95  95
96  96         return X_train, y_train, X_test, y_test
97  97
98      -  # %% ../nbs/03_ML.ipynb
    98  +  # %% ../nbs/03_ML.ipynb 17
99  99     def score_each(target: pd.DataFrame, # target dataframe
100 100                    pred: pd.DataFrame, # predicted dataframe
101 101                    absolute = True, # if absolute, take average with absolute values for pearson/spearman

@@ -134,7 +134,7 @@ def score_each(target: pd.DataFrame, # target dataframe
134 134
135 135        return mse,pearson_mean, metrics_df
136 136
137     -  # %% ../nbs/03_ML.ipynb
    137 +  # %% ../nbs/03_ML.ipynb 22
138 138    def train_ml(df, # dataframe of values
139 139                 feat_col, # feature columns
140 140                 target_col, # target columns

@@ -169,7 +169,7 @@ def train_ml(df, # dataframe of values
169 169
170 170        return y_test, y_pred
171 171
172     -  # %% ../nbs/03_ML.ipynb
    172 +  # %% ../nbs/03_ML.ipynb 25
173 173    def train_ml_cv( df, # dataframe of values
174 174                    feat_col, # feature columns
175 175                    target_col, # target columns

@@ -213,7 +213,7 @@ def train_ml_cv( df, # dataframe of values
213 213
214 214        return oof, metrics
215 215
216     -  # %% ../nbs/03_ML.ipynb
    216 +  # %% ../nbs/03_ML.ipynb 32
217 217    def predict_ml(df, # Dataframe that contains features
218 218                   feat_col, # feature columns
219 219                   target_col=None,
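The katlas/train.py hunks likewise touch only the export markers, but the context lines sketch the intended flow: get_splits builds (train, test) index pairs (optionally stratified or grouped, likely via scikit-learn's KFold variants given the sklearn imports above), split_data materializes the X/y frames, and train_ml_cv returns out-of-fold predictions plus metrics. A self-contained sketch of the plain, unstratified k-fold case — pure Python, illustrative only, not the katlas implementation:

```python
def kfold_splits(n_samples: int, n_folds: int = 5):
    "Return a list of (train_idx, test_idx) pairs, like get_splits without stratification or groups."
    idx = list(range(n_samples))
    fold_size, rem = divmod(n_samples, n_folds)
    splits, start = [], 0
    for k in range(n_folds):
        # the first `rem` folds absorb one extra sample each
        stop = start + fold_size + (1 if k < rem else 0)
        test = idx[start:stop]
        train = idx[:start] + idx[stop:]
        splits.append((train, test))
        start = stop
    return splits

splits = kfold_splits(10, n_folds=5)
# every sample appears in exactly one test fold
assert sorted(i for _, test in splits for i in test) == list(range(10))
```

Downstream code like train_ml_cv would iterate over such pairs, fit on the train indices, predict on the test indices, and assemble the predictions into the `oof` result.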
PKG-INFO (+13 -10)

@@ -1,6 +1,6 @@
1   1      Metadata-Version: 2.1
2   2      Name: python-katlas
3       -  Version: 0.1.0
    3   +  Version: 0.1.1
4   4      Summary: tools for predicting kinome specificities
5   5      Home-page: https://github.com/sky1ove/python-katlas
6   6      Author: lily

@@ -60,6 +60,13 @@ helpful to your research.
60  60     phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
61  61     and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
62  62     [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
    63  +
    64  +
    65  +  ## Web applications
    66  +
    67  +  Users can now run the analysis directly on the web without needing to code.
    68  +
    69  +  Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
63  70
64  71     ## Tutorials on Colab
65  72

@@ -67,20 +74,16 @@ helpful to your research.
67  74     sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
68  75     - 2. [High throughput substrate scoring on phosphoproteomics
69  76     dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
70      -  - 3. [
71      -
72      -
73      -  - 4. [Kinase enrichment analysis for AKT
74      -  inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
75      -  / [Kinase enrichment analysis for EGFR
76      -  inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
    77  +  - 3. [Kinase enrichment analysis for AKT
    78  +  inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
    79  +
77  80
78  81     ## Install
79  82
80      -  Install the latest version through
    83  +  Install the latest version through pip
81  84
82  85     ``` python
83      -
    86  +  pip install python-katlas -Uq
84  87     ```
85  88
86  89     ## Import
python-katlas-0.1.0/katlas/__init__.py (+0 -1)

@@ -1 +0,0 @@
1       -  __version__ = "0.0.9"
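The deleted `__init__.py` pinned a stale `__version__ = "0.0.9"`; the 0.1.1 package ships a regenerated one-line `__init__.py`, and the one-line change to settings.ini suggests the version is maintained there, as is standard for nbdev projects, where settings.ini is the single source of truth. A sketch of reading the version from such a file — this assumes the standard nbdev `[DEFAULT]` layout and is not katlas code:

```python
import configparser

def read_nbdev_version(ini_text: str) -> str:
    "Extract the package version from an nbdev-style settings.ini."
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # nbdev stores lib_name, version, etc. as top-level (DEFAULT) keys
    return cfg["DEFAULT"]["version"]

print(read_nbdev_version("[DEFAULT]\nlib_name = katlas\nversion = 0.1.1\n"))  # 0.1.1
```

Keeping the version in one config file avoids exactly the drift seen here, where the packaged metadata said 0.1.0 while the module reported 0.0.9.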
The remaining files are unchanged between 0.1.0 and 0.1.1: LICENSE, MANIFEST.in, katlas/imports.py, python_katlas.egg-info/SOURCES.txt, dependency_links.txt, entry_points.txt, not-zip-safe, requires.txt, top_level.txt, setup.cfg, and setup.py.