python-katlas 0.1.0__tar.gz → 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (23)
  1. {python-katlas-0.1.0/python_katlas.egg-info → python-katlas-0.1.1}/PKG-INFO +13 -10
  2. {python-katlas-0.1.0 → python-katlas-0.1.1}/README.md +12 -9
  3. python-katlas-0.1.1/katlas/__init__.py +1 -0
  4. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/_modidx.py +1 -2
  5. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/core.py +43 -51
  6. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/dl.py +14 -14
  7. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/feature.py +9 -9
  8. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/plot.py +20 -20
  9. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/train.py +7 -7
  10. {python-katlas-0.1.0 → python-katlas-0.1.1/python_katlas.egg-info}/PKG-INFO +13 -10
  11. {python-katlas-0.1.0 → python-katlas-0.1.1}/settings.ini +1 -1
  12. python-katlas-0.1.0/katlas/__init__.py +0 -1
  13. {python-katlas-0.1.0 → python-katlas-0.1.1}/LICENSE +0 -0
  14. {python-katlas-0.1.0 → python-katlas-0.1.1}/MANIFEST.in +0 -0
  15. {python-katlas-0.1.0 → python-katlas-0.1.1}/katlas/imports.py +0 -0
  16. {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/SOURCES.txt +0 -0
  17. {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/dependency_links.txt +0 -0
  18. {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/entry_points.txt +0 -0
  19. {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/not-zip-safe +0 -0
  20. {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/requires.txt +0 -0
  21. {python-katlas-0.1.0 → python-katlas-0.1.1}/python_katlas.egg-info/top_level.txt +0 -0
  22. {python-katlas-0.1.0 → python-katlas-0.1.1}/setup.cfg +0 -0
  23. {python-katlas-0.1.0 → python-katlas-0.1.1}/setup.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: python-katlas
- Version: 0.1.0
+ Version: 0.1.1
  Summary: tools for predicting kinome specificities
  Home-page: https://github.com/sky1ove/python-katlas
  Author: lily
@@ -60,6 +60,13 @@ helpful to your research.
  phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
  and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
  [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+
+
+ ## Web applications
+
+ Users can now run the analysis directly on the web without needing to code.
+
+ Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)

  ## Tutorials on Colab

@@ -67,20 +74,16 @@ helpful to your research.
  sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
  - 2. [High throughput substrate scoring on phosphoproteomics
  dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
- - 3. [Query a protein’s phosphorylation sites and predict their
- upstream
- kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
- - 4. [Kinase enrichment analysis for AKT
- inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
- / [Kinase enrichment analysis for EGFR
- inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+ - 3. [Kinase enrichment analysis for AKT
+ inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
+

  ## Install

- Install the latest version through git
+ Install the latest version through pip

  ``` python
- !pip install git+https://github.com/sky1ove/katlas.git -Uqq
+ pip install python-katlas -Uq
  ```

  ## Import
@@ -38,6 +38,13 @@ helpful to your research.
  phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
  and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
  [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+
+
+ ## Web applications
+
+ Users can now run the analysis directly on the web without needing to code.
+
+ Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)

  ## Tutorials on Colab

@@ -45,20 +52,16 @@ helpful to your research.
  sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
  - 2. [High throughput substrate scoring on phosphoproteomics
  dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
- - 3. [Query a protein’s phosphorylation sites and predict their
- upstream
- kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
- - 4. [Kinase enrichment analysis for AKT
- inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
- / [Kinase enrichment analysis for EGFR
- inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+ - 3. [Kinase enrichment analysis for AKT
+ inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
+

  ## Install

- Install the latest version through git
+ Install the latest version through pip

  ``` python
- !pip install git+https://github.com/sky1ove/katlas.git -Uqq
+ pip install python-katlas -Uq
  ```

  ## Import
@@ -0,0 +1 @@
+ __version__ = "0.1.0"
@@ -46,13 +46,12 @@ d = { 'settings': { 'branch': 'main',
  'katlas.core.get_one_kinase': ('core.html#get_one_kinase', 'katlas/core.py'),
  'katlas.core.get_pct': ('core.html#get_pct', 'katlas/core.py'),
  'katlas.core.get_pct_df': ('core.html#get_pct_df', 'katlas/core.py'),
- 'katlas.core.get_ttest': ('core.html#get_ttest', 'katlas/core.py'),
+ 'katlas.core.get_pvalue': ('core.html#get_pvalue', 'katlas/core.py'),
  'katlas.core.get_unique_site': ('core.html#get_unique_site', 'katlas/core.py'),
  'katlas.core.multiply': ('core.html#multiply', 'katlas/core.py'),
  'katlas.core.multiply_func': ('core.html#multiply_func', 'katlas/core.py'),
  'katlas.core.predict_kinase': ('core.html#predict_kinase', 'katlas/core.py'),
  'katlas.core.predict_kinase_df': ('core.html#predict_kinase_df', 'katlas/core.py'),
- 'katlas.core.query_gene': ('core.html#query_gene', 'katlas/core.py'),
  'katlas.core.raw2norm': ('core.html#raw2norm', 'katlas/core.py'),
  'katlas.core.sumup': ('core.html#sumup', 'katlas/core.py')},
  'katlas.dl': { 'katlas.dl.CNN1D_1': ('dl.html#cnn1d_1', 'katlas/dl.py'),
@@ -6,7 +6,7 @@
  __all__ = ['param_PSPA_st', 'param_PSPA_y', 'param_PSPA', 'param_CDDM', 'param_CDDM_upper', 'Data', 'CPTAC', 'convert_string',
  'checker', 'STY2sty', 'cut_seq', 'get_dict', 'multiply_func', 'multiply', 'sumup', 'predict_kinase',
  'predict_kinase_df', 'get_pct', 'get_pct_df', 'get_unique_site', 'extract_site_seq', 'get_freq',
- 'query_gene', 'get_ttest', 'get_metaP', 'raw2norm', 'get_one_kinase']
+ 'get_pvalue', 'get_metaP', 'raw2norm', 'get_one_kinase']

  # %% ../nbs/00_core.ipynb 4
  import math, pandas as pd, numpy as np
@@ -14,7 +14,7 @@ from tqdm import tqdm
  from scipy.stats import chi2
  from typing import Callable
  from functools import partial
- from scipy.stats import ttest_ind
+ from scipy.stats import ttest_ind, mannwhitneyu, wilcoxon
  from statsmodels.stats.multitest import multipletests

  # %% ../nbs/00_core.ipynb 7
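The new imports above bring in two additional hypothesis tests alongside `ttest_ind`. A minimal, self-contained sketch (toy data only, not the katlas API) of how the three scipy tests are called on two groups of values, matching the call patterns used later in this diff:

```python
# Toy illustration of the three scipy tests get_pvalue can dispatch to.
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, wilcoxon

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, 8)  # e.g. control sample intensities for one site
g2 = rng.normal(1.0, 1.0, 8)  # e.g. treated sample intensities for the same site

# Unpaired tests accept nan_policy='omit' to skip missing intensities
t_stat, t_p = ttest_ind(g1, g2, nan_policy='omit')
u_stat, u_p = mannwhitneyu(g1, g2, nan_policy='omit')

# wilcoxon is a paired test: equal-length, matched samples
w_stat, w_p = wilcoxon(g1, g2)
```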
@@ -550,7 +550,7 @@ def predict_kinase_df(df, seq_col, ref, func, to_lower=False, to_upper=False):
  # Return results as a DataFrame
  return out

- # %% ../nbs/00_core.ipynb 56
+ # %% ../nbs/00_core.ipynb 53
  def get_pct(site,ref,func,pct_ref):

  "Replicate the precentile results from The Kinase Library."
@@ -575,7 +575,7 @@ def get_pct(site,ref,func,pct_ref):
  final.columns=['log2(score)','percentile']
  return final

- # %% ../nbs/00_core.ipynb 62
+ # %% ../nbs/00_core.ipynb 59
  def get_pct_df(score_df, # output from predict_kinase_df
  pct_ref, # a reference df for percentile calculation
  ):
@@ -600,7 +600,7 @@ def get_pct_df(score_df, # output from predict_kinase_df

  return percentiles_df

- # %% ../nbs/00_core.ipynb 67
+ # %% ../nbs/00_core.ipynb 64
  def get_unique_site(df:pd.DataFrame = None,# dataframe that contains phosphorylation sites
  seq_col: str='site_seq', # column name of site sequence
  id_col: str='gene_site' # column name of site id
@@ -616,7 +616,7 @@ def get_unique_site(df:pd.DataFrame = None,# dataframe that contains phosphoryla

  return unique

- # %% ../nbs/00_core.ipynb 70
+ # %% ../nbs/00_core.ipynb 67
  def extract_site_seq(df: pd.DataFrame, # dataframe that contains protein sequence
  seq_col: str, # column name of protein sequence
  position_col: str # column name of position 0
@@ -642,7 +642,7 @@ def extract_site_seq(df: pd.DataFrame, # dataframe that contains protein sequenc

  return np.array(data)

- # %% ../nbs/00_core.ipynb 75
+ # %% ../nbs/00_core.ipynb 72
  def get_freq(df_k: pd.DataFrame, # a dataframe for a single kinase that contains phosphorylation sequence splitted by their position
  aa_order = [i for i in 'PGACSTVILMFYWHKRQNDEsty'], # amino acid to include in the full matrix
  aa_order_paper = [i for i in 'PGACSTVILMFYWHKRQNDEsty'], # amino acid to include in the partial matrix
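The hunks above only renumber notebook cell markers around `get_pct`/`get_pct_df`, whose docstring says they replicate percentile results from The Kinase Library. As background, a heavily simplified sketch of the percentile idea (illustrative assumption only; `ref_scores` and `site_score` are made-up names, and this is not the katlas implementation):

```python
# Rank one site's score against a kinase's reference score distribution.
import numpy as np

ref_scores = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical reference scores
site_score = 0.5
# Fraction of reference scores at or below this score, as a percentile
percentile = (ref_scores <= site_score).mean() * 100
```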
@@ -683,35 +683,16 @@ def get_freq(df_k: pd.DataFrame, # a dataframe for a single kinase that contains

  return paper,full

- # %% ../nbs/00_core.ipynb 79
- def query_gene(df,gene):
-
- "Query gene in the phosphoproteomics dataset"
-
- # query gene in the dataframe
- df_gene = df[df.gene_site.str.contains(f'{gene}_')]
-
- # sort dataframe based on position
- sort_position = df_gene.gene_site.str.split('_').str[-1].str[1:].astype(int).sort_values().index
- df_gene = df_gene.loc[sort_position]
-
- return df_gene
-
- # %% ../nbs/00_core.ipynb 83
- def get_ttest(df,
+ # %% ../nbs/00_core.ipynb 76
+ def get_pvalue(df,
  columns1, # list of column names for group1
  columns2, # list of column names for group2
+ test_method = 'mann_whitney', # 'student_t', 'mann_whitney', 'wilcoxon'
  FC_method = 'median', # or mean
- alpha=0.05, # significance level in multipletests for p_adj
- correction_method='fdr_bh', # method in multipletests for p_adj
  ):
- """
- Performs t-tests and calculates log2 fold change between two groups of columns in a DataFrame.
- NaN p-values are excluded from the multiple testing correction.

- Returns:
- DataFrame: Results including log2FC, p-values, adjusted p-values, significance, signed log10 P value, and signed log10 Padj
- """
+ "Performs statistical tests and calculates difference between the median or mean of two groups of columns."
+
  group1 = df[columns1]
  group2 = df[columns2]

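The renamed `get_pvalue` keeps the `FC_method` parameter for the fold change. Because phosphoproteomics intensities are already log-transformed (as the code comments below note), the log2 fold change reduces to a plain difference of group medians or means. A minimal sketch with made-up column names:

```python
# Toy illustration of the log2FC convention: difference of group medians
# on already-log2-transformed data.
import pandas as pd

df = pd.DataFrame({'c1': [10.0, 12.0], 'c2': [10.5, 12.5],   # control columns
                   't1': [11.0, 13.2], 't2': [11.5, 12.8]})  # treated columns
m1 = df[['c1', 'c2']].median(axis=1)
m2 = df[['t1', 't2']].median(axis=1)
log2FC = m2 - m1  # equals log2(treated/control) in linear space
```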
@@ -726,24 +707,36 @@ def get_ttest(df,
  # As phosphoproteomics data has already been log transformed, we can directly use subtraction
  FCs = m2 - m1

- # Perform t-tests and handle NaN p-values
- t_results = [ttest_ind(group1.loc[idx], group2.loc[idx], nan_policy='omit') for idx in tqdm(df.index, desc="Computing t-tests")]
+ # Perform the chosen test and handle NaN p-values
+ if test_method == 'student_t': # data is normally distributed, non-paired
+ test_func = ttest_ind
+ elif test_method == 'mann_whitney': # not normally distributed, non-paired, mann_whitney considers the rank, ignore the differences
+ test_func = mannwhitneyu
+ elif test_method == 'wilcoxon': # not normally distributed, paired
+ test_func = wilcoxon
+
+ t_results = []
+ for idx in tqdm(df.index, desc=f"Computing {test_method} tests"):
+ try:
+ if test_method == 'wilcoxon': # as wilcoxon is paired, if lack a paired sample, just give nan, as default nanpolicy is propagate (gives nan if nan in input)
+ stat, pvalue = test_func(group1.loc[idx], group2.loc[idx])
+ else:
+ stat, pvalue = test_func(group1.loc[idx], group2.loc[idx], nan_policy='omit')
+ except ValueError: # Handle cases with insufficient data
+ pvalue = np.nan
+ t_results.append(pvalue)

  # Exclude NaN p-values before multiple testing correction
- p_values = [result.pvalue if result.pvalue is not np.nan else np.nan for result in t_results]
- valid_p_values = np.array(p_values, dtype=float) # Ensure the correct data type
+ p_values = np.array(t_results, dtype=float) # Ensure the correct data type
+ valid_p_values = p_values[~np.isnan(p_values)]

- # valid_p_values = np.array(p_values)
- valid_p_values = valid_p_values[~np.isnan(valid_p_values)]
-
  # Adjust for multiple testing on valid p-values only
- reject, pvals_corrected, _, _ = multipletests(valid_p_values, alpha=alpha, method=correction_method)
-
+ reject, pvals_corrected, _, _ = multipletests(valid_p_values, alpha=0.05, method='fdr_bh')
+
  # Create a full list of corrected p-values including NaNs
- full_pvals_corrected = np.empty_like(p_values)
- full_pvals_corrected[:] = np.nan
+ full_pvals_corrected = np.full_like(p_values, np.nan)
  np.place(full_pvals_corrected, ~np.isnan(p_values), pvals_corrected)
-
+
  # Adjust the significance accordingly
  full_reject = np.zeros_like(p_values, dtype=bool)
  np.place(full_reject, ~np.isnan(p_values), reject)
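The hunk above tightens the NaN handling around `multipletests`: NaN p-values are dropped before Benjamini-Hochberg correction, then re-inserted so the output stays index-aligned. A self-contained sketch of that pattern on toy p-values:

```python
# NaN-aware FDR correction: correct only the valid p-values, then re-insert
# NaNs at their original positions with np.place.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, np.nan, 0.04, 0.8, np.nan])
mask = ~np.isnan(p_values)

reject, p_adj, _, _ = multipletests(p_values[mask], alpha=0.05, method='fdr_bh')

full_p_adj = np.full_like(p_values, np.nan)  # same shape, NaN everywhere
np.place(full_p_adj, mask, p_adj)            # fill corrected values back in
# full_p_adj -> [0.003, nan, 0.06, 0.8, nan]
```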
@@ -752,22 +745,21 @@ def get_ttest(df,
  results = pd.DataFrame({
  'log2FC': FCs,
  'p_value': p_values,
- 'p_adj': full_pvals_corrected,
- 'significant': full_reject
+ 'p_adj': full_pvals_corrected
  })

  results['p_value'] = results['p_value'].astype(float)
-
+
  def get_signed_logP(r,p_col):
  log10 = -np.log10(r[p_col])
  return -log10 if r['log2FC']<0 else log10
-
+
  results['signed_logP'] = results.apply(partial(get_signed_logP,p_col='p_value'),axis=1)
  results['signed_logPadj'] = results.apply(partial(get_signed_logP,p_col='p_adj'),axis=1)
-
+
  return results

- # %% ../nbs/00_core.ipynb 84
+ # %% ../nbs/00_core.ipynb 77
  def get_metaP(p_values):

  "Use Fisher's method to calculate a combined p value given a list of p values; this function also allows negative p values (negative correlation)"
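The `get_signed_logP` helper retained in the hunk above carries the sign of `log2FC` onto `-log10(p)`, which is the usual convention for volcano-style plots. A small standalone run of that helper on toy values:

```python
# Signed -log10(p): positive for up-regulated sites, negative for down.
import numpy as np
import pandas as pd

results = pd.DataFrame({'log2FC': [1.2, -0.8], 'p_value': [0.01, 0.001]})

def get_signed_logP(r, p_col):
    log10 = -np.log10(r[p_col])
    return -log10 if r['log2FC'] < 0 else log10

results['signed_logP'] = results.apply(
    lambda r: get_signed_logP(r, 'p_value'), axis=1)
# rows -> 2.0 and -3.0
```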
@@ -779,7 +771,7 @@ def get_metaP(p_values):

  return score

- # %% ../nbs/00_core.ipynb 87
+ # %% ../nbs/00_core.ipynb 80
  def raw2norm(df: pd.DataFrame, # single kinase's df has position as index, and single amino acid as columns
  PDHK: bool=False, # whether this kinase belongs to PDHK family
  ):
@@ -802,7 +794,7 @@ def raw2norm(df: pd.DataFrame, # single kinase's df has position as index, and s

  return df2

- # %% ../nbs/00_core.ipynb 89
+ # %% ../nbs/00_core.ipynb 82
  def get_one_kinase(df: pd.DataFrame, #stacked dataframe (paper's raw data)
  kinase:str, # a specific kinase
  normalize: bool=False, # normalize according to the paper; special for PDHK1/4
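`get_metaP`, whose docstring appears above, combines p-values with Fisher's method (its extra handling of negative p-values for negative correlations is katlas-specific and not shown here). A textbook sketch of the plain method, using the `chi2` import already present in core.py:

```python
# Fisher's method: X = -2 * sum(ln p_i) follows a chi-squared distribution
# with 2k degrees of freedom under the null.
import numpy as np
from scipy.stats import chi2

def fisher_combine(p_values):
    stat = -2 * np.sum(np.log(p_values))
    return chi2.sf(stat, 2 * len(p_values))  # combined p-value

combined = fisher_combine([0.01, 0.03, 0.2])
```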
@@ -6,7 +6,7 @@
  __all__ = ['def_device', 'seed_everything', 'GeneralDataset', 'get_sampler', 'MLP_1', 'CNN1D_1', 'init_weights', 'lin_wn',
  'conv_wn', 'CNN1D_2', 'train_dl', 'train_dl_cv', 'predict_dl']

- # %% ../nbs/04_DL.ipynb 4
+ # %% ../nbs/04_DL.ipynb 5
  from fastbook import *
  import fastcore.all as fc,torch.nn.init as init
  from fastai.callback.training import GradientClip
@@ -22,7 +22,7 @@ from sklearn.model_selection import *
  from sklearn.metrics import mean_squared_error
  from scipy.stats import spearmanr,pearsonr

- # %% ../nbs/04_DL.ipynb 6
+ # %% ../nbs/04_DL.ipynb 7
  def seed_everything(seed=123):
  random.seed(seed)
  os.environ['PYTHONHASHSEED'] = str(seed)
@@ -32,10 +32,10 @@ def seed_everything(seed=123):
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False

- # %% ../nbs/04_DL.ipynb 8
+ # %% ../nbs/04_DL.ipynb 9
  def_device = 'mps' if torch.backends.mps.is_available() else 'cuda' if torch.cuda.is_available() else 'cpu'

- # %% ../nbs/04_DL.ipynb 13
+ # %% ../nbs/04_DL.ipynb 14
  class GeneralDataset:
  def __init__(self,
  df, # a dataframe of values
@@ -62,7 +62,7 @@ class GeneralDataset:
  y = torch.Tensor(self.y[index])
  return X, y

- # %% ../nbs/04_DL.ipynb 17
+ # %% ../nbs/04_DL.ipynb 18
  def get_sampler(info,col):

  "For imbalanced data, get higher weights for less-represented samples"
@@ -82,7 +82,7 @@ def get_sampler(info,col):

  return sampler

- # %% ../nbs/04_DL.ipynb 23
+ # %% ../nbs/04_DL.ipynb 24
  def MLP_1(num_features,
  num_targets,
  hidden_units = [512, 218],
@@ -112,7 +112,7 @@ def MLP_1(num_features,

  return model

- # %% ../nbs/04_DL.ipynb 29
+ # %% ../nbs/04_DL.ipynb 30
  class CNN1D_1(Module):

  def __init__(self,
@@ -137,12 +137,12 @@ class CNN1D_1(Module):
  x = self.fc2(x)
  return x

- # %% ../nbs/04_DL.ipynb 33
+ # %% ../nbs/04_DL.ipynb 34
  def init_weights(m, leaky=0.):
  "Initiate any Conv layer with Kaiming norm."
  if isinstance(m, (nn.Conv1d,nn.Conv2d,nn.Conv3d)): init.kaiming_normal_(m.weight, a=leaky)

- # %% ../nbs/04_DL.ipynb 34
+ # %% ../nbs/04_DL.ipynb 35
  def lin_wn(ni,nf,dp=0.1,act=nn.SiLU):
  "Weight norm of linear."
  layers = nn.Sequential(
@@ -152,7 +152,7 @@ def lin_wn(ni,nf,dp=0.1,act=nn.SiLU):
  if act: layers.append(act())
  return layers

- # %% ../nbs/04_DL.ipynb 35
+ # %% ../nbs/04_DL.ipynb 36
  def conv_wn(ni, nf, ks=3, stride=1, padding=1, dp=0.1,act=nn.ReLU):
  "Weight norm of conv."
  layers = nn.Sequential(
@@ -162,7 +162,7 @@ def conv_wn(ni, nf, ks=3, stride=1, padding=1, dp=0.1,act=nn.ReLU):
  if act: layers.append(act())
  return layers

- # %% ../nbs/04_DL.ipynb 36
+ # %% ../nbs/04_DL.ipynb 37
  class CNN1D_2(nn.Module):

  def __init__(self, ni, nf, amp_scale = 16):
@@ -212,7 +212,7 @@ class CNN1D_2(nn.Module):

  return x

- # %% ../nbs/04_DL.ipynb 40
+ # %% ../nbs/04_DL.ipynb 41
  def train_dl(df,
  feat_col,
  target_col,
@@ -275,7 +275,7 @@ def train_dl(df,

  return target, pred

- # %% ../nbs/04_DL.ipynb 45
+ # %% ../nbs/04_DL.ipynb 46
  @fc.delegates(train_dl)
  def train_dl_cv(df,
  feat_col,
@@ -325,7 +325,7 @@ def train_dl_cv(df,

  return oof, metrics

- # %% ../nbs/04_DL.ipynb 53
+ # %% ../nbs/04_DL.ipynb 54
  def predict_dl(df,
  feat_col,
  target_col,
@@ -5,7 +5,7 @@
  # %% auto 0
  __all__ = ['get_rdkit', 'get_morgan', 'get_esm', 'get_t5', 'get_t5_bfd', 'reduce_feature', 'remove_hi_corr', 'preprocess']

- # %% ../nbs/01_feature.ipynb 4
+ # %% ../nbs/01_feature.ipynb 5
  from fastbook import *
  import torch,re,joblib,gc,esm
  from tqdm.notebook import tqdm; tqdm.pandas()
@@ -30,7 +30,7 @@ from umap.umap_ import UMAP

  set_config(transform_output="pandas")

- # %% ../nbs/01_feature.ipynb 7
+ # %% ../nbs/01_feature.ipynb 8
  def get_rdkit(df: pd.DataFrame, # a dataframe that contains smiles
  col:str = "SMILES", # colname of smile
  normalize: bool = True, # normalize features using StandardScaler()
@@ -49,7 +49,7 @@ def get_rdkit(df: pd.DataFrame, # a dataframe that contains smiles
  # feature_df = feature_df.reset_index()
  return feature_df

- # %% ../nbs/01_feature.ipynb 11
+ # %% ../nbs/01_feature.ipynb 12
  def get_morgan(df: pd.DataFrame, # a dataframe that contains smiles
  col: str = "SMILES", # colname of smile
  radius=3
@@ -61,7 +61,7 @@ def get_morgan(df: pd.DataFrame, # a dataframe that contains smiles
  fp_df.columns = "morgan_" + fp_df.columns.astype(str)
  return fp_df

- # %% ../nbs/01_feature.ipynb 15
+ # %% ../nbs/01_feature.ipynb 16
  def get_esm(df:pd.DataFrame, # a dataframe that contains amino acid sequence
  col: str = 'sequence', # colname of amino acid sequence
  model_name: str = "esm2_t33_650M_UR50D", # Name of the ESM model to use for the embeddings.
@@ -128,7 +128,7 @@ def get_esm(df:pd.DataFrame, # a dataframe that contains amino acid sequence

  return df_feature

- # %% ../nbs/01_feature.ipynb 19
+ # %% ../nbs/01_feature.ipynb 20
  def get_t5(df: pd.DataFrame,
  col: str = 'sequence'
  ):
@@ -170,7 +170,7 @@ def get_t5(df: pd.DataFrame,

  return T5_feature

- # %% ../nbs/01_feature.ipynb 22
+ # %% ../nbs/01_feature.ipynb 23
  def get_t5_bfd(df:pd.DataFrame,
  col: str = 'sequence'
  ):
@@ -212,7 +212,7 @@ def get_t5_bfd(df:pd.DataFrame,

  return T5_feature

- # %% ../nbs/01_feature.ipynb 26
+ # %% ../nbs/01_feature.ipynb 27
  def reduce_feature(df: pd.DataFrame,
  method: str='pca', # dimensionality reduction method, accept both capital and lower case
  complexity: int=20, # None for PCA; perfplexity for TSNE, recommend: 30; n_neigbors for UMAP, recommend: 15
@@ -258,7 +258,7 @@ def reduce_feature(df: pd.DataFrame,

  return embedding_df

- # %% ../nbs/01_feature.ipynb 29
+ # %% ../nbs/01_feature.ipynb 30
  def remove_hi_corr(df: pd.DataFrame,
  thr: float=0.98 # threshold
  ):
@@ -278,7 +278,7 @@ def remove_hi_corr(df: pd.DataFrame,

  return df

- # %% ../nbs/01_feature.ipynb 33
+ # %% ../nbs/01_feature.ipynb 34
  def preprocess(df: pd.DataFrame,
  thr: float=0.98):

@@ -7,7 +7,7 @@ __all__ = ['set_sns', 'get_color_dict', 'logo_func', 'get_logo', 'get_logo2', 'p
  'plot_cluster', 'plot_bokeh', 'plot_count', 'plot_bar', 'plot_group_bar', 'plot_box', 'plot_corr',
  'draw_corr', 'get_AUCDF', 'plot_confusion_matrix']

- # %% ../nbs/02_plot.ipynb 4
+ # %% ../nbs/02_plot.ipynb 5
  import joblib,logomaker
  import fastcore.all as fc, pandas as pd, numpy as np, seaborn as sns
  from adjustText import adjust_text
@@ -32,14 +32,14 @@ from bokeh.layouts import column
  from bokeh.palettes import Category20_20
  from itertools import cycle

- # %% ../nbs/02_plot.ipynb 6
+ # %% ../nbs/02_plot.ipynb 7
  def set_sns():
  "Set seaborn resolution for notebook display"
  sns.set(rc={"figure.dpi":300, 'savefig.dpi':300})
  sns.set_context('notebook')
  sns.set_style("ticks")

- # %% ../nbs/02_plot.ipynb 7
+ # %% ../nbs/02_plot.ipynb 8
  def get_color_dict(categories, # list of names to assign color
  palette: str='tab20', # choose from sns.color_palette
  ):
@@ -49,7 +49,7 @@ def get_color_dict(categories, # list of names to assign color
  color_map = {category: next(color_cycle) for category in categories}
  return color_map

- # %% ../nbs/02_plot.ipynb 11
+ # %% ../nbs/02_plot.ipynb 12
  def logo_func(df:pd.DataFrame, # a dataframe that contains ratios for each amino acid at each position
  title: str='logo', # title of the motif logo
  ):
@@ -81,7 +81,7 @@ def logo_func(df:pd.DataFrame, # a dataframe that contains ratios for each amino
  logo.ax.set_yticks([])
  logo.ax.set_title(title)

- # %% ../nbs/02_plot.ipynb 12
+ # %% ../nbs/02_plot.ipynb 13
  def get_logo(df: pd.DataFrame, # stacked Dataframe with kinase as index, substrates as columns
  kinase: str, # a specific kinase name in index
  ):
@@ -120,7 +120,7 @@ def get_logo(df: pd.DataFrame, # stacked Dataframe with kinase as index, substra
  # plot logo
  logo_func(ratio2, kinase)

- # %% ../nbs/02_plot.ipynb 16
+ # %% ../nbs/02_plot.ipynb 17
  def get_logo2(full: pd.DataFrame, # a dataframe that contains the full matrix of a kinase, with index as amino acid, and columns as positions
  title: str = 'logo', # title of the graph
  ):
@@ -159,7 +159,7 @@ def get_logo2(full: pd.DataFrame, # a dataframe that contains the full matrix of

  logo_func(ratio2,title)

- # %% ../nbs/02_plot.ipynb 19
+ # %% ../nbs/02_plot.ipynb 20
  @fc.delegates(sns.scatterplot)
  def plot_rank(sorted_df: pd.DataFrame, # a sorted dataframe
  x: str, # column name for x axis
@@ -203,7 +203,7 @@ def plot_rank(sorted_df: pd.DataFrame, # a sorted dataframe

  plt.tight_layout()

- # %% ../nbs/02_plot.ipynb 23
+ # %% ../nbs/02_plot.ipynb 24
  @fc.delegates(sns.histplot)
  def plot_hist(df: pd.DataFrame, # a dataframe that contain values for plot
  x: str, # column name of values
@@ -220,7 +220,7 @@ def plot_hist(df: pd.DataFrame, # a dataframe that contain values for plot
  plt.figure(figsize=figsize)
  sns.histplot(data=df,x=x,**hist_params,**kwargs)

- # %% ../nbs/02_plot.ipynb 27
+ # %% ../nbs/02_plot.ipynb 28
  @fc.delegates(sns.heatmap)
  def plot_heatmap(matrix, # a matrix of values
  title: str='heatmap', # title of the heatmap
@@ -235,7 +235,7 @@ def plot_heatmap(matrix, # a matrix of values
  sns.heatmap(matrix, cmap=cmap, annot=False,**kwargs)
  plt.title(title)

- # %% ../nbs/02_plot.ipynb 31
+ # %% ../nbs/02_plot.ipynb 32
  @fc.delegates(sns.scatterplot)
  def plot_2d(X: pd.DataFrame, # a dataframe that has first column to be x, and second column to be y
  **kwargs, # arguments for sns.scatterplot
@@ -244,7 +244,7 @@ def plot_2d(X: pd.DataFrame, # a dataframe that has first column to be x, and se
  plt.figure(figsize=(7,7))
  sns.scatterplot(data = X,x=X.columns[0],y=X.columns[1],alpha=0.7,**kwargs)

- # %% ../nbs/02_plot.ipynb 33
+ # %% ../nbs/02_plot.ipynb 34
  def plot_cluster(df: pd.DataFrame, # a dataframe of values that is waited for dimensionality reduction
  method: str='pca', # dimensionality reduction method, choose from pca, umap, and tsne
  hue: str=None, # colname of color
@@ -269,7 +269,7 @@ def plot_cluster(df: pd.DataFrame, # a dataframe of values that is waited for di
  texts = [plt.text(embedding_df[x_col][i], embedding_df[y_col][i], name_list[i],fontsize=8) for i in range(len(embedding_df))]
  adjust_text(texts, arrowprops=dict(arrowstyle='-', color='black'))

- # %% ../nbs/02_plot.ipynb 37
+ # %% ../nbs/02_plot.ipynb 38
  def plot_bokeh(X:pd.DataFrame, # a dataframe of two columns from dimensionality reduction
  idx, # pd.Series or list that indicates identities for searching box
  hue:None, # pd.Series or list that indicates category for each sample
@@ -367,7 +367,7 @@ def plot_bokeh(X:pd.DataFrame, # a dataframe of two columns from dimensionality
  layout = column(autocomplete, p)
  show(layout)

- # %% ../nbs/02_plot.ipynb 40
+ # %% ../nbs/02_plot.ipynb 41
  def plot_count(cnt, # from df['x'].value_counts()
  tick_spacing: float= None, # tick spacing for x axis
  palette: str='tab20'):
@@ -383,7 +383,7 @@ def plot_count(cnt, # from df['x'].value_counts()
  if tick_spacing is not None:
  ax.xaxis.set_major_locator(MultipleLocator(tick_spacing))

- # %% ../nbs/02_plot.ipynb 42
+ # %% ../nbs/02_plot.ipynb 43
  @fc.delegates(sns.barplot)
  def plot_bar(df,
  value, # colname of value
@@ -438,7 +438,7 @@ def plot_bar(df,

  plt.gca().spines[['right', 'top']].set_visible(False)

- # %% ../nbs/02_plot.ipynb 45
+ # %% ../nbs/02_plot.ipynb 46
  @fc.delegates(sns.barplot)
  def plot_group_bar(df,
  value_cols, # list of column names for values, the order depends on the first item
@@ -481,7 +481,7 @@ def plot_group_bar(df,
  plt.gca().spines[['right', 'top']].set_visible(False)
  plt.legend(fontsize=fontsize) # if change legend location, use loc='upper right'

- # %% ../nbs/02_plot.ipynb 48
+ # %% ../nbs/02_plot.ipynb 49
  @fc.delegates(sns.boxplot)
  def plot_box(df,
  value, # colname of value
@@ -523,7 +523,7 @@ def plot_box(df,
  # plt.gca().spines[['right', 'top']].set_visible(False)


- # %% ../nbs/02_plot.ipynb 51
+ # %% ../nbs/02_plot.ipynb 52
  @fc.delegates(sns.regplot)
  def plot_corr(x, # x axis values, or colname of x axis
  y, # y axis values, or colname of y axis
@@ -560,7 +560,7 @@ def plot_corr(x, # x axis values, or colname of x axis
  transform=plt.gca().transAxes,
  ha='center', va='center')

- # %% ../nbs/02_plot.ipynb 55
+ # %% ../nbs/02_plot.ipynb 56
  def draw_corr(corr):

  "plot heatmap from df.corr()"
@@ -572,7 +572,7 @@ def draw_corr(corr):
  plt.figure(figsize=(20, 16)) # Set the figure size
  sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, mask=mask, fmt='.2f')

- # %% ../nbs/02_plot.ipynb 59
+ # %% ../nbs/02_plot.ipynb 60
  def get_AUCDF(df,col, reverse=False,plot=True,xlabel='Rank of reported kinase'):

  "Plot CDF curve and get relative area under the curve"
@@ -637,7 +637,7 @@ def get_AUCDF(df,col, reverse=False,plot=True,xlabel='Rank of reported kinase'):

  return AUCDF

- # %% ../nbs/02_plot.ipynb 62
+ # %% ../nbs/02_plot.ipynb 63
  def plot_confusion_matrix(target, # pd.Series
  pred, # pd.Series
  class_names:list=['0','1'],
@@ -5,7 +5,7 @@
 # %% auto 0
 __all__ = ['get_splits', 'split_data', 'score_each', 'train_ml', 'train_ml_cv', 'predict_ml']
 
-# %% ../nbs/03_ML.ipynb 4
+# %% ../nbs/03_ML.ipynb 5
 # katlas
 from .core import Data
 from .feature import *
@@ -29,7 +29,7 @@ from sklearn.ensemble import *
 from sklearn import set_config
 set_config(transform_output="pandas")
 
-# %% ../nbs/03_ML.ipynb 7
+# %% ../nbs/03_ML.ipynb 8
 def get_splits(df: pd.DataFrame, # df contains info for split
                stratified: str=None, # colname to make stratified kfold; sampling from different groups
                group: str=None, # colname to make group kfold; test and train are from different groups
@@ -79,7 +79,7 @@ def get_splits(df: pd.DataFrame, # df contains info for split
 
     return splits
 
-# %% ../nbs/03_ML.ipynb 12
+# %% ../nbs/03_ML.ipynb 13
 def split_data(df: pd.DataFrame, # dataframe of values
                feat_col: list, # feature columns
               target_col: list, # target columns
@@ -95,7 +95,7 @@ def split_data(df: pd.DataFrame, # dataframe of values
 
     return X_train, y_train, X_test, y_test
 
-# %% ../nbs/03_ML.ipynb 16
+# %% ../nbs/03_ML.ipynb 17
 def score_each(target: pd.DataFrame, # target dataframe
                pred: pd.DataFrame, # predicted dataframe
               absolute = True, # if absolute, take average with absolute values for pearson/spearman
@@ -134,7 +134,7 @@ def score_each(target: pd.DataFrame, # target dataframe
 
     return mse,pearson_mean, metrics_df
 
-# %% ../nbs/03_ML.ipynb 21
+# %% ../nbs/03_ML.ipynb 22
 def train_ml(df, # dataframe of values
              feat_col, # feature columns
             target_col, # target columns
@@ -169,7 +169,7 @@ def train_ml(df, # dataframe of values
 
     return y_test, y_pred
 
-# %% ../nbs/03_ML.ipynb 24
+# %% ../nbs/03_ML.ipynb 25
 def train_ml_cv( df, # dataframe of values
                 feat_col, # feature columns
                target_col, # target columns
@@ -213,7 +213,7 @@ def train_ml_cv( df, # dataframe of values
 
     return oof, metrics
 
-# %% ../nbs/03_ML.ipynb 31
+# %% ../nbs/03_ML.ipynb 32
 def predict_ml(df, # Dataframe that contains features
                feat_col, # feature columns
               target_col=None,
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-katlas
-Version: 0.1.0
+Version: 0.1.1
 Summary: tools for predicting kinome specificities
 Home-page: https://github.com/sky1ove/python-katlas
 Author: lily
@@ -60,6 +60,13 @@ helpful to your research.
 phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
 and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
 [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+
+## Web applications
+
+Users can now run the analysis directly on the web without needing to code.
+
+Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
 
 ## Tutorials on Colab
 
@@ -67,20 +74,16 @@ helpful to your research.
   sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
 - 2. [High throughput substrate scoring on phosphoproteomics
   dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
-- 3. [Query a protein’s phosphorylation sites and predict their
-  upstream
-  kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
-- 4. [Kinase enrichment analysis for AKT
-  inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
-  / [Kinase enrichment analysis for EGFR
-  inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+- 3. [Kinase enrichment analysis for AKT
+  inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
+
 
 ## Install
 
-Install the latest version through git
+Install the latest version through pip
 
 ``` python
-!pip install git+https://github.com/sky1ove/katlas.git -Uqq
+pip install python-katlas -Uq
 ```
 
 ## Import
@@ -5,7 +5,7 @@
 ### Python library ###
 repo = python-katlas
 lib_name = %(repo)s
-version = 0.1.0
+version = 0.1.1
 min_python = 3.7
 license = apache2
 black_formatting = False
@@ -1 +0,0 @@
-__version__ = "0.0.9"
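The README hunk above switches the install command from a git URL to `pip install python-katlas -Uq`, and the `__init__.py` change bumps the recorded version. After upgrading, the installed distribution version can be checked from the standard library alone; this is a small sketch (the distribution name `python-katlas` is taken from the diff, and `importlib.metadata` requires Python 3.8+):

```python
import importlib.metadata

def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return "not installed"

# This diff bumps python-katlas from 0.1.0 to 0.1.1, so an up-to-date
# environment should report "0.1.1" here.
print(installed_version("python-katlas"))
```

Note that the distribution name queried here (`python-katlas`, as published on PyPI) differs from the import name (`katlas`); `importlib.metadata` wants the former.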