PyPI - python-katlas - Versions diffs - 0.0.9__tar.gz → 0.1.1__tar.gz - Mend

python-katlas 0.0.9.tar.gz → 0.1.1.tar.gz

Files changed (23)

{python-katlas-0.0.9/python_katlas.egg-info → python-katlas-0.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-katlas
-Version: 0.0.9
+Version: 0.1.1
 Summary: tools for predicting kinome specificities
 Home-page: https://github.com/sky1ove/python-katlas
 Author: lily
@@ -60,6 +60,13 @@ helpful to your research.
   phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
   and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
   [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+## Web applications
+Users can now run the analysis directly on the web without needing to code.
+Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
 ## Tutorials on Colab
@@ -67,20 +74,16 @@ helpful to your research.
       sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
 - 2.  [High throughput substrate scoring on phosphoproteomics
       dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
-- 3.  [Query a protein’s phosphorylation sites and predict their
-      upstream
-      kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
-- 4.  [Kinase enrichment analysis for AKT
-      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
-      / [Kinase enrichment analysis for EGFR
-      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+- 3.  [Kinase enrichment analysis for AKT
+      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
 ## Install
-Install the latest version through git
+Install the latest version through pip
 ``` python
-!pip install git+https://github.com/sky1ove/katlas.git -Uqq
+pip install python-katlas -Uq
 ```
 ## Import

{python-katlas-0.0.9 → python-katlas-0.1.1}/README.md RENAMED

@@ -38,6 +38,13 @@ helpful to your research.
   phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
   and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
   [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+## Web applications
+Users can now run the analysis directly on the web without needing to code.
+Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
 ## Tutorials on Colab
@@ -45,20 +52,16 @@ helpful to your research.
       sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
 - 2.  [High throughput substrate scoring on phosphoproteomics
       dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
-- 3.  [Query a protein’s phosphorylation sites and predict their
-      upstream
-      kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
-- 4.  [Kinase enrichment analysis for AKT
-      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
-      / [Kinase enrichment analysis for EGFR
-      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+- 3.  [Kinase enrichment analysis for AKT
+      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
 ## Install
-Install the latest version through git
+Install the latest version through pip
 ``` python
-!pip install git+https://github.com/sky1ove/katlas.git -Uqq
+pip install python-katlas -Uq
 ```
 ## Import

python-katlas-0.1.1/katlas/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.1.0"
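
The added `__init__.py` sets `__version__` to "0.1.0", while PKG-INFO and settings.ini in this same release say 0.1.1. A minimal sketch of how that kind of drift could be detected, assuming a hypothetical helper named `versions_agree`; the two strings are copied from this diff rather than read from an installed package.

```python
def versions_agree(module_version: str, dist_version: str) -> bool:
    """Return True when both dotted version strings have the same numeric parts."""
    def parse(v: str) -> tuple:
        # "0.1.1" -> (0, 1, 1); assumes purely numeric components
        return tuple(int(part) for part in v.strip().split("."))
    return parse(module_version) == parse(dist_version)

# Values as they appear in this diff
module_version = "0.1.0"  # katlas/__init__.py
dist_version = "0.1.1"    # PKG-INFO and settings.ini

agree = versions_agree(module_version, dist_version)
print(agree)
```

With the package installed, `importlib.metadata.version("python-katlas")` and `katlas.__version__` could supply the two inputs instead of the literals.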

{python-katlas-0.0.9 → python-katlas-0.1.1}/katlas/_modidx.py RENAMED

@@ -46,13 +46,12 @@ d = { 'settings': { 'branch': 'main',
                              'katlas.core.get_one_kinase': ('core.html#get_one_kinase', 'katlas/core.py'),
                              'katlas.core.get_pct': ('core.html#get_pct', 'katlas/core.py'),
                              'katlas.core.get_pct_df': ('core.html#get_pct_df', 'katlas/core.py'),
-                             'katlas.core.get_ttest': ('core.html#get_ttest', 'katlas/core.py'),
+                             'katlas.core.get_pvalue': ('core.html#get_pvalue', 'katlas/core.py'),
                              'katlas.core.get_unique_site': ('core.html#get_unique_site', 'katlas/core.py'),
                              'katlas.core.multiply': ('core.html#multiply', 'katlas/core.py'),
                              'katlas.core.multiply_func': ('core.html#multiply_func', 'katlas/core.py'),
                              'katlas.core.predict_kinase': ('core.html#predict_kinase', 'katlas/core.py'),
                              'katlas.core.predict_kinase_df': ('core.html#predict_kinase_df', 'katlas/core.py'),
-                             'katlas.core.query_gene': ('core.html#query_gene', 'katlas/core.py'),
                              'katlas.core.raw2norm': ('core.html#raw2norm', 'katlas/core.py'),
                              'katlas.core.sumup': ('core.html#sumup', 'katlas/core.py')},
             'katlas.dl': { 'katlas.dl.CNN1D_1': ('dl.html#cnn1d_1', 'katlas/dl.py'),
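
The index change above means `get_ttest` is now exported as `get_pvalue` and `query_gene` is no longer exported at all, so code written against 0.0.9 that imports either old name would need updating. For callers who still want the `query_gene` behavior, the sketch below keeps a local copy based on the body removed from `katlas/core.py` in this diff. The toy DataFrame is made up for illustration and is not package data.

```python
import pandas as pd

def query_gene(df: pd.DataFrame, gene: str) -> pd.DataFrame:
    "Local copy of the helper removed in 0.1.1: rows for one gene, sorted by site position."
    # keep rows whose gene_site contains '<gene>_' (substring match, as in 0.0.9)
    df_gene = df[df.gene_site.str.contains(f'{gene}_')]
    # 'AKT1_S473' -> 'S473' -> 473, then order the rows by that position
    position = df_gene.gene_site.str.split('_').str[-1].str[1:].astype(int)
    return df_gene.loc[position.sort_values().index]

# hypothetical toy data
sites = pd.DataFrame({'gene_site': ['AKT1_S473', 'EGFR_Y1068', 'AKT1_T308']})
akt1 = query_gene(sites, 'AKT1')
print(akt1.gene_site.tolist())
```

Because the match is a substring test, a gene name that ends with the query (for example a hypothetical `XAKT1_S10`) would also be returned; that mirrors the removed code rather than fixing it.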

{python-katlas-0.0.9 → python-katlas-0.1.1}/katlas/core.py RENAMED

@@ -6,7 +6,7 @@
 __all__ = ['param_PSPA_st', 'param_PSPA_y', 'param_PSPA', 'param_CDDM', 'param_CDDM_upper', 'Data', 'CPTAC', 'convert_string',
            'checker', 'STY2sty', 'cut_seq', 'get_dict', 'multiply_func', 'multiply', 'sumup', 'predict_kinase',
            'predict_kinase_df', 'get_pct', 'get_pct_df', 'get_unique_site', 'extract_site_seq', 'get_freq',
-           'query_gene', 'get_ttest', 'get_metaP', 'raw2norm', 'get_one_kinase']
+           'get_pvalue', 'get_metaP', 'raw2norm', 'get_one_kinase']
 # %% ../nbs/00_core.ipynb 4
 import math, pandas as pd, numpy as np
@@ -14,7 +14,7 @@ from tqdm import tqdm
 from scipy.stats import chi2
 from typing import Callable
 from functools import partial
-from scipy.stats import ttest_ind
+from scipy.stats import ttest_ind, mannwhitneyu, wilcoxon
 from statsmodels.stats.multitest import multipletests
 # %% ../nbs/00_core.ipynb 7
@@ -451,17 +451,18 @@ def predict_kinase(input_string: str, # site sequence
 # %% ../nbs/00_core.ipynb 41
 # PSPA
-param_PSPA_st = {'ref':Data.get_pspa_st_norm(), 'func':multiply} # Johnson et al. Nature official
-param_PSPA_y = {'ref':Data.get_pspa_tyr_norm(), 'func':multiply}
-param_PSPA = {'ref':Data.get_pspa_all_norm(), 'func':multiply}
+param_PSPA_st = {'ref':Data.get_pspa_st_norm().astype('float32'), 'func':multiply} # Johnson et al. Nature official
+param_PSPA_y = {'ref':Data.get_pspa_tyr_norm().astype('float32'), 'func':multiply}
+param_PSPA = {'ref':Data.get_pspa_all_norm().astype('float32'), 'func':multiply}
 # Kinase-substrate dataset, CDDM
-param_CDDM = {'ref':Data.get_cddm(), 'func':sumup}
-param_CDDM_upper = {'ref':Data.get_cddm_upper(), 'func':sumup, 'to_upper':True} # specific for all uppercase
+param_CDDM = {'ref':Data.get_cddm().astype('float32'), 'func':sumup}
+param_CDDM_upper = {'ref':Data.get_cddm_upper().astype('float32'), 'func':sumup, 'to_upper':True} # specific for all uppercase
-# %% ../nbs/00_core.ipynb 45
+# %% ../nbs/00_core.ipynb 46
 def predict_kinase_df(df, seq_col, ref, func, to_lower=False, to_upper=False):
     print('input dataframe has a length', df.shape[0])
     print('Preprocessing')
@@ -493,12 +494,20 @@ def predict_kinase_df(df, seq_col, ref, func, to_lower=False, to_upper=False):
     df['keys'] = df['site_seq'].apply(get_dict)
     input_keys_df  = df[['keys']].explode('keys').reset_index()
     input_keys_df.columns = ['input_index', 'key']
     ref_T = ref.T
-    merged_df = input_keys_df.merge(ref_T, left_on='key', right_index=True, how='inner')
+    input_keys_df = input_keys_df.set_index('key')
+    print('Merging reference')
+    merged_df = input_keys_df.merge(ref_T, left_index=True, right_index=True, how='inner')
+    print('Finish merging')
     if func == sumup:
-        grouped_df = merged_df.drop(columns=['key']).groupby('input_index').sum()
+        grouped_df = merged_df.groupby('input_index').sum()
         out = grouped_df.reindex(df.index)
     elif func==multiply:
@@ -514,7 +523,7 @@ def predict_kinase_df(df, seq_col, ref, func, to_lower=False, to_upper=False):
             kinase_df = kinase_df.rename(columns={kinase: 'value'})
             # Compute log_value
-            kinase_df['log_value'] = np.log2(kinase_df['value'],where=kinase_df['value']>0)
+            kinase_df['log_value'] = np.log2(kinase_df['value'].where(kinase_df['value'] > 0))
             # Group by 'input_index' and compute sum and count
             grouped = kinase_df.dropna().groupby('input_index')
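
The hunk above swaps `np.log2(values, where=values>0)` for `np.log2(values.where(values > 0))`. In NumPy, a `where=` mask without an `out=` array leaves the masked-out positions uninitialized, whereas pandas `Series.where` sets them to NaN, which the following `dropna()` can then discard. A small sketch of the new form on made-up values:

```python
import numpy as np
import pandas as pd

# made-up scores; zero cannot be log-transformed
value = pd.Series([4.0, 0.0, 0.5])

# Series.where keeps entries that satisfy the condition and sets the rest to NaN
masked = value.where(value > 0)
log_value = np.log2(masked)

# NaN rows drop out before any grouping, as with the dropna() call in the diff
kept = log_value.dropna()
print(log_value.tolist(), kept.tolist())
```
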
@@ -541,7 +550,7 @@ def predict_kinase_df(df, seq_col, ref, func, to_lower=False, to_upper=False):
     # Return results as a DataFrame
     return out
-# %% ../nbs/00_core.ipynb 54
+# %% ../nbs/00_core.ipynb 53
 def get_pct(site,ref,func,pct_ref):
     "Replicate the precentile results from The Kinase Library."
@@ -566,7 +575,7 @@ def get_pct(site,ref,func,pct_ref):
     final.columns=['log2(score)','percentile']
     return final
-# %% ../nbs/00_core.ipynb 60
+# %% ../nbs/00_core.ipynb 59
 def get_pct_df(score_df, # output from predict_kinase_df
                pct_ref, # a reference df for percentile calculation
               ):
@@ -591,7 +600,7 @@ def get_pct_df(score_df, # output from predict_kinase_df
     return percentiles_df
-# %% ../nbs/00_core.ipynb 65
+# %% ../nbs/00_core.ipynb 64
 def get_unique_site(df:pd.DataFrame = None,# dataframe that contains phosphorylation sites
                     seq_col: str='site_seq', # column name of site sequence
                     id_col: str='gene_site' # column name of site id
@@ -607,7 +616,7 @@ def get_unique_site(df:pd.DataFrame = None,# dataframe that contains phosphoryla
     return unique
-# %% ../nbs/00_core.ipynb 68
+# %% ../nbs/00_core.ipynb 67
 def extract_site_seq(df: pd.DataFrame, # dataframe that contains protein sequence
                      seq_col: str, # column name of protein sequence
                      position_col: str # column name of position 0
@@ -633,7 +642,7 @@ def extract_site_seq(df: pd.DataFrame, # dataframe that contains protein sequenc
     return np.array(data)
-# %% ../nbs/00_core.ipynb 73
+# %% ../nbs/00_core.ipynb 72
 def get_freq(df_k: pd.DataFrame, # a dataframe for a single kinase that contains phosphorylation sequence splitted by their position
              aa_order = [i for i in 'PGACSTVILMFYWHKRQNDEsty'], # amino acid to include in the full matrix
              aa_order_paper = [i for i in 'PGACSTVILMFYWHKRQNDEsty'], # amino acid to include in the partial matrix
@@ -674,35 +683,16 @@ def get_freq(df_k: pd.DataFrame, # a dataframe for a single kinase that contains
     return paper,full
-# %% ../nbs/00_core.ipynb 77
-def query_gene(df,gene):
-    "Query gene in the phosphoproteomics dataset"
-    # query gene in the dataframe
-    df_gene = df[df.gene_site.str.contains(f'{gene}_')]
-    # sort dataframe based on position
-    sort_position = df_gene.gene_site.str.split('_').str[-1].str[1:].astype(int).sort_values().index
-    df_gene = df_gene.loc[sort_position]
-    return df_gene
-# %% ../nbs/00_core.ipynb 81
-def get_ttest(df,
+# %% ../nbs/00_core.ipynb 76
+def get_pvalue(df,
               columns1, # list of column names for group1
               columns2, # list of column names for group2
+              test_method = 'mann_whitney', # 'student_t', 'mann_whitney', 'wilcoxon'
               FC_method = 'median', # or mean
-              alpha=0.05, # significance level in multipletests for p_adj
-              correction_method='fdr_bh', # method in multipletests for p_adj
              ):
-    """
-    Performs t-tests and calculates log2 fold change between two groups of columns in a DataFrame.
-    NaN p-values are excluded from the multiple testing correction.
-    Returns:
-    DataFrame: Results including log2FC, p-values, adjusted p-values, significance, signed log10 P value, and signed log10 Padj
-    """
+    "Performs statistical tests and calculates difference between the median or mean of two groups of columns."
     group1 = df[columns1]
     group2 = df[columns2]
@@ -717,24 +707,36 @@ def get_ttest(df,
     # As phosphoproteomics data has already been log transformed, we can directly use subtraction
     FCs = m2 - m1
-    # Perform t-tests and handle NaN p-values
-    t_results = [ttest_ind(group1.loc[idx], group2.loc[idx], nan_policy='omit') for idx in tqdm(df.index, desc="Computing t-tests")]
+    # Perform the chosen test and handle NaN p-values
+    if test_method == 'student_t': # data is normally distributed, non-paired
+        test_func = ttest_ind
+    elif test_method == 'mann_whitney': # not normally distributed, non-paired, mann_whitney considers the rank, ignore the differences
+        test_func = mannwhitneyu
+    elif test_method == 'wilcoxon': # not normally distributed, paired
+        test_func = wilcoxon
+    t_results = []
+    for idx in tqdm(df.index, desc=f"Computing {test_method} tests"):
+        try:
+            if test_method == 'wilcoxon': # as wilcoxon is paired, if lack a paired sample, just give nan, as default nanpolicy is propagate (gives nan if nan in input)
+                stat, pvalue = test_func(group1.loc[idx], group2.loc[idx])
+            else:
+                stat, pvalue = test_func(group1.loc[idx], group2.loc[idx], nan_policy='omit')
+        except ValueError:  # Handle cases with insufficient data
+            pvalue = np.nan
+        t_results.append(pvalue)
     # Exclude NaN p-values before multiple testing correction
-    p_values = [result.pvalue if result.pvalue is not np.nan else np.nan for result in t_results]
-    valid_p_values = np.array(p_values, dtype=float)  # Ensure the correct data type
+    p_values = np.array(t_results, dtype=float)  # Ensure the correct data type
+    valid_p_values = p_values[~np.isnan(p_values)]
-    # valid_p_values = np.array(p_values)
-    valid_p_values = valid_p_values[~np.isnan(valid_p_values)]
     # Adjust for multiple testing on valid p-values only
-    reject, pvals_corrected, _, _ = multipletests(valid_p_values, alpha=alpha, method=correction_method)
+    reject, pvals_corrected, _, _ = multipletests(valid_p_values, alpha=0.05, method='fdr_bh')
     # Create a full list of corrected p-values including NaNs
-    full_pvals_corrected = np.empty_like(p_values)
-    full_pvals_corrected[:] = np.nan
+    full_pvals_corrected = np.full_like(p_values, np.nan)
     np.place(full_pvals_corrected, ~np.isnan(p_values), pvals_corrected)
     # Adjust the significance accordingly
     full_reject = np.zeros_like(p_values, dtype=bool)
     np.place(full_reject, ~np.isnan(p_values), reject)
@@ -743,22 +745,21 @@ def get_ttest(df,
     results = pd.DataFrame({
         'log2FC': FCs,
         'p_value': p_values,
-        'p_adj': full_pvals_corrected,
-        'significant': full_reject
+        'p_adj': full_pvals_corrected
     })
     results['p_value'] = results['p_value'].astype(float)
     def get_signed_logP(r,p_col):
         log10 = -np.log10(r[p_col])
         return -log10 if r['log2FC']<0 else log10
     results['signed_logP'] = results.apply(partial(get_signed_logP,p_col='p_value'),axis=1)
     results['signed_logPadj'] = results.apply(partial(get_signed_logP,p_col='p_adj'),axis=1)
     return results
-# %% ../nbs/00_core.ipynb 82
+# %% ../nbs/00_core.ipynb 77
 def get_metaP(p_values):
     "Use Fisher's method to calculate a combined p value given a list of p values; this function also allows negative p values (negative correlation)"
@@ -770,7 +771,7 @@ def get_metaP(p_values):
     return score
-# %% ../nbs/00_core.ipynb 85
+# %% ../nbs/00_core.ipynb 80
 def raw2norm(df: pd.DataFrame, # single kinase's df has position as index, and single amino acid as columns
              PDHK: bool=False, # whether this kinase belongs to PDHK family
             ):
@@ -793,7 +794,7 @@ def raw2norm(df: pd.DataFrame, # single kinase's df has position as index, and s
     return df2
-# %% ../nbs/00_core.ipynb 87
+# %% ../nbs/00_core.ipynb 82
 def get_one_kinase(df: pd.DataFrame, #stacked dataframe (paper's raw data)
                    kinase:str, # a specific kinase
                    normalize: bool=False, # normalize according to the paper; special for PDHK1/4
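
In the `get_pvalue` hunks above, a failed test yields a NaN p-value, only the valid p-values go through multiple-testing correction, and the corrected values are written back into a NaN-filled array with `np.full_like` and `np.place`. The sketch below illustrates that pattern. As an assumption for the sake of a dependency-light example, it implements Benjamini-Hochberg by hand in NumPy instead of calling `statsmodels.stats.multitest.multipletests` as the package does; `bh_adjust` and `adjust_with_nan` are hypothetical names.

```python
import numpy as np

def bh_adjust(p: np.ndarray) -> np.ndarray:
    "Benjamini-Hochberg adjusted p-values for an array that contains no NaN."
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted

def adjust_with_nan(p_values) -> np.ndarray:
    "Correct only the valid p-values and leave NaN where a test produced none."
    p_values = np.array(p_values, dtype=float)
    valid = ~np.isnan(p_values)
    full = np.full_like(p_values, np.nan)
    np.place(full, valid, bh_adjust(p_values[valid]))
    return full

# made-up p-values; the second test is treated as failed
p_adj = adjust_with_nan([0.01, np.nan, 0.04, 0.03])
print(p_adj)
```

Note that in 0.1.1 the correction settings are fixed at `alpha=0.05` and `method='fdr_bh'`, and the `significant` column is no longer part of the returned DataFrame.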

{python-katlas-0.0.9 → python-katlas-0.1.1}/katlas/dl.py RENAMED

@@ -6,7 +6,7 @@
 __all__ = ['def_device', 'seed_everything', 'GeneralDataset', 'get_sampler', 'MLP_1', 'CNN1D_1', 'init_weights', 'lin_wn',
            'conv_wn', 'CNN1D_2', 'train_dl', 'train_dl_cv', 'predict_dl']
-# %% ../nbs/04_DL.ipynb 4
+# %% ../nbs/04_DL.ipynb 5
 from fastbook import *
 import fastcore.all as fc,torch.nn.init as init
 from fastai.callback.training import GradientClip
@@ -22,7 +22,7 @@ from sklearn.model_selection import *
 from sklearn.metrics import mean_squared_error
 from scipy.stats import spearmanr,pearsonr
-# %% ../nbs/04_DL.ipynb 6
+# %% ../nbs/04_DL.ipynb 7
 def seed_everything(seed=123):
     random.seed(seed)
     os.environ['PYTHONHASHSEED'] = str(seed)
@@ -32,10 +32,10 @@ def seed_everything(seed=123):
     torch.backends.cudnn.deterministic = True
     torch.backends.cudnn.benchmark = False
-# %% ../nbs/04_DL.ipynb 8
+# %% ../nbs/04_DL.ipynb 9
 def_device = 'mps' if torch.backends.mps.is_available() else 'cuda' if torch.cuda.is_available() else 'cpu'
-# %% ../nbs/04_DL.ipynb 13
+# %% ../nbs/04_DL.ipynb 14
 class GeneralDataset:
     def __init__(self,
                  df, # a dataframe of values
@@ -62,7 +62,7 @@ class GeneralDataset:
             y = torch.Tensor(self.y[index])
             return X, y
-# %% ../nbs/04_DL.ipynb 17
+# %% ../nbs/04_DL.ipynb 18
 def get_sampler(info,col):
     "For imbalanced data, get higher weights for less-represented samples"
@@ -82,7 +82,7 @@ def get_sampler(info,col):
     return sampler
-# %% ../nbs/04_DL.ipynb 23
+# %% ../nbs/04_DL.ipynb 24
 def MLP_1(num_features,
           num_targets,
           hidden_units = [512, 218],
@@ -112,7 +112,7 @@ def MLP_1(num_features,
     return model
-# %% ../nbs/04_DL.ipynb 29
+# %% ../nbs/04_DL.ipynb 30
 class CNN1D_1(Module):
     def __init__(self,
@@ -137,12 +137,12 @@ class CNN1D_1(Module):
         x = self.fc2(x)
         return x
-# %% ../nbs/04_DL.ipynb 33
+# %% ../nbs/04_DL.ipynb 34
 def init_weights(m, leaky=0.):
     "Initiate any Conv layer with Kaiming norm."
     if isinstance(m, (nn.Conv1d,nn.Conv2d,nn.Conv3d)): init.kaiming_normal_(m.weight, a=leaky)
-# %% ../nbs/04_DL.ipynb 34
+# %% ../nbs/04_DL.ipynb 35
 def lin_wn(ni,nf,dp=0.1,act=nn.SiLU):
     "Weight norm of linear."
     layers =  nn.Sequential(
@@ -152,7 +152,7 @@ def lin_wn(ni,nf,dp=0.1,act=nn.SiLU):
     if act: layers.append(act())
     return layers
-# %% ../nbs/04_DL.ipynb 35
+# %% ../nbs/04_DL.ipynb 36
 def conv_wn(ni, nf, ks=3, stride=1, padding=1, dp=0.1,act=nn.ReLU):
     "Weight norm of conv."
     layers =  nn.Sequential(
@@ -162,7 +162,7 @@ def conv_wn(ni, nf, ks=3, stride=1, padding=1, dp=0.1,act=nn.ReLU):
     if act: layers.append(act())
     return layers
-# %% ../nbs/04_DL.ipynb 36
+# %% ../nbs/04_DL.ipynb 37
 class CNN1D_2(nn.Module):
     def __init__(self, ni, nf, amp_scale = 16):
@@ -212,7 +212,7 @@ class CNN1D_2(nn.Module):
         return x
-# %% ../nbs/04_DL.ipynb 40
+# %% ../nbs/04_DL.ipynb 41
 def train_dl(df,
             feat_col,
             target_col,
@@ -275,7 +275,7 @@ def train_dl(df,
     return target, pred
-# %% ../nbs/04_DL.ipynb 45
+# %% ../nbs/04_DL.ipynb 46
 @fc.delegates(train_dl)
 def train_dl_cv(df,
                 feat_col,
@@ -325,7 +325,7 @@ def train_dl_cv(df,
     return oof, metrics
-# %% ../nbs/04_DL.ipynb 53
+# %% ../nbs/04_DL.ipynb 54
 def predict_dl(df,
                feat_col,
                target_col,

{python-katlas-0.0.9 → python-katlas-0.1.1}/katlas/feature.py RENAMED

@@ -5,7 +5,7 @@
 # %% auto 0
 __all__ = ['get_rdkit', 'get_morgan', 'get_esm', 'get_t5', 'get_t5_bfd', 'reduce_feature', 'remove_hi_corr', 'preprocess']
-# %% ../nbs/01_feature.ipynb 4
+# %% ../nbs/01_feature.ipynb 5
 from fastbook import *
 import torch,re,joblib,gc,esm
 from tqdm.notebook import tqdm; tqdm.pandas()
@@ -30,7 +30,7 @@ from umap.umap_ import UMAP
 set_config(transform_output="pandas")
-# %% ../nbs/01_feature.ipynb 7
+# %% ../nbs/01_feature.ipynb 8
 def get_rdkit(df: pd.DataFrame, # a dataframe that contains smiles
               col:str = "SMILES", # colname of smile
               normalize: bool = True, # normalize features using StandardScaler()
@@ -49,7 +49,7 @@ def get_rdkit(df: pd.DataFrame, # a dataframe that contains smiles
     # feature_df = feature_df.reset_index()
     return feature_df
-# %% ../nbs/01_feature.ipynb 11
+# %% ../nbs/01_feature.ipynb 12
 def get_morgan(df: pd.DataFrame, # a dataframe that contains smiles
                col: str = "SMILES", # colname of smile
                radius=3
@@ -61,7 +61,7 @@ def get_morgan(df: pd.DataFrame, # a dataframe that contains smiles
     fp_df.columns = "morgan_" + fp_df.columns.astype(str)
     return fp_df
-# %% ../nbs/01_feature.ipynb 15
+# %% ../nbs/01_feature.ipynb 16
 def get_esm(df:pd.DataFrame, # a dataframe that contains amino acid sequence
             col: str = 'sequence', # colname of amino acid sequence
             model_name: str = "esm2_t33_650M_UR50D", # Name of the ESM model to use for the embeddings.
@@ -128,7 +128,7 @@ def get_esm(df:pd.DataFrame, # a dataframe that contains amino acid sequence
         return df_feature
-# %% ../nbs/01_feature.ipynb 19
+# %% ../nbs/01_feature.ipynb 20
 def get_t5(df: pd.DataFrame,
            col: str = 'sequence'
            ):
@@ -170,7 +170,7 @@ def get_t5(df: pd.DataFrame,
     return T5_feature
-# %% ../nbs/01_feature.ipynb 22
+# %% ../nbs/01_feature.ipynb 23
 def get_t5_bfd(df:pd.DataFrame,
                col: str = 'sequence'
                ):
@@ -212,7 +212,7 @@ def get_t5_bfd(df:pd.DataFrame,
     return T5_feature
-# %% ../nbs/01_feature.ipynb 26
+# %% ../nbs/01_feature.ipynb 27
 def reduce_feature(df: pd.DataFrame,
                    method: str='pca', # dimensionality reduction method, accept both capital and lower case
                    complexity: int=20, # None for PCA; perfplexity for TSNE, recommend: 30; n_neigbors for UMAP, recommend: 15
@@ -258,7 +258,7 @@ def reduce_feature(df: pd.DataFrame,
     return embedding_df
-# %% ../nbs/01_feature.ipynb 29
+# %% ../nbs/01_feature.ipynb 30
 def remove_hi_corr(df: pd.DataFrame,
                    thr: float=0.98 # threshold
                    ):
@@ -278,7 +278,7 @@ def remove_hi_corr(df: pd.DataFrame,
     return df
-# %% ../nbs/01_feature.ipynb 33
+# %% ../nbs/01_feature.ipynb 34
 def preprocess(df: pd.DataFrame,
                thr: float=0.98):

{python-katlas-0.0.9 → python-katlas-0.1.1}/katlas/plot.py RENAMED

@@ -7,7 +7,7 @@ __all__ = ['set_sns', 'get_color_dict', 'logo_func', 'get_logo', 'get_logo2', 'p
            'plot_cluster', 'plot_bokeh', 'plot_count', 'plot_bar', 'plot_group_bar', 'plot_box', 'plot_corr',
            'draw_corr', 'get_AUCDF', 'plot_confusion_matrix']
-# %% ../nbs/02_plot.ipynb 4
+# %% ../nbs/02_plot.ipynb 5
 import joblib,logomaker
 import fastcore.all as fc, pandas as pd, numpy as np, seaborn as sns
 from adjustText import adjust_text
@@ -32,14 +32,14 @@ from bokeh.layouts import column
 from bokeh.palettes import Category20_20
 from itertools import cycle
-# %% ../nbs/02_plot.ipynb 6
+# %% ../nbs/02_plot.ipynb 7
 def set_sns():
     "Set seaborn resolution for notebook display"
     sns.set(rc={"figure.dpi":300, 'savefig.dpi':300})
     sns.set_context('notebook')
     sns.set_style("ticks")
-# %% ../nbs/02_plot.ipynb 7
+# %% ../nbs/02_plot.ipynb 8
 def get_color_dict(categories, # list of names to assign color
                    palette: str='tab20', # choose from sns.color_palette
                    ):
@@ -49,7 +49,7 @@ def get_color_dict(categories, # list of names to assign color
     color_map = {category: next(color_cycle) for category in categories}
     return color_map
-# %% ../nbs/02_plot.ipynb 11
+# %% ../nbs/02_plot.ipynb 12
 def logo_func(df:pd.DataFrame, # a dataframe that contains ratios for each amino acid at each position
               title: str='logo', # title of the motif logo
              ):
@@ -81,7 +81,7 @@ def logo_func(df:pd.DataFrame, # a dataframe that contains ratios for each amino
     logo.ax.set_yticks([])
     logo.ax.set_title(title)
-# %% ../nbs/02_plot.ipynb 12
+# %% ../nbs/02_plot.ipynb 13
 def get_logo(df: pd.DataFrame, # stacked Dataframe with kinase as index, substrates as columns
              kinase: str, # a specific kinase name in index
              ):
@@ -120,7 +120,7 @@ def get_logo(df: pd.DataFrame, # stacked Dataframe with kinase as index, substra
     # plot logo
     logo_func(ratio2, kinase)
-# %% ../nbs/02_plot.ipynb 16
+# %% ../nbs/02_plot.ipynb 17
 def get_logo2(full: pd.DataFrame, # a dataframe that contains the full matrix of a kinase, with index as amino acid, and columns as positions
               title: str = 'logo', # title of the graph
               ):
@@ -159,7 +159,7 @@ def get_logo2(full: pd.DataFrame, # a dataframe that contains the full matrix of
     logo_func(ratio2,title)
-# %% ../nbs/02_plot.ipynb 19
+# %% ../nbs/02_plot.ipynb 20
 @fc.delegates(sns.scatterplot)
 def plot_rank(sorted_df: pd.DataFrame, # a sorted dataframe
               x: str, # column name for x axis
@@ -203,7 +203,7 @@ def plot_rank(sorted_df: pd.DataFrame, # a sorted dataframe
     plt.tight_layout()
-# %% ../nbs/02_plot.ipynb 23
+# %% ../nbs/02_plot.ipynb 24
 @fc.delegates(sns.histplot)
 def plot_hist(df: pd.DataFrame, # a dataframe that contain values for plot
               x: str, # column name of values
@@ -220,7 +220,7 @@ def plot_hist(df: pd.DataFrame, # a dataframe that contain values for plot
     plt.figure(figsize=figsize)
     sns.histplot(data=df,x=x,**hist_params,**kwargs)
-# %% ../nbs/02_plot.ipynb 27
+# %% ../nbs/02_plot.ipynb 28
 @fc.delegates(sns.heatmap)
 def plot_heatmap(matrix, # a matrix of values
                  title: str='heatmap', # title of the heatmap
@@ -235,7 +235,7 @@ def plot_heatmap(matrix, # a matrix of values
     sns.heatmap(matrix, cmap=cmap, annot=False,**kwargs)
     plt.title(title)
-# %% ../nbs/02_plot.ipynb 31
+# %% ../nbs/02_plot.ipynb 32
 @fc.delegates(sns.scatterplot)
 def plot_2d(X: pd.DataFrame, # a dataframe that has first column to be x, and second column to be y
             **kwargs, # arguments for sns.scatterplot
@@ -244,7 +244,7 @@ def plot_2d(X: pd.DataFrame, # a dataframe that has first column to be x, and se
     plt.figure(figsize=(7,7))
     sns.scatterplot(data = X,x=X.columns[0],y=X.columns[1],alpha=0.7,**kwargs)
-# %% ../nbs/02_plot.ipynb 33
+# %% ../nbs/02_plot.ipynb 34
 def plot_cluster(df: pd.DataFrame, # a dataframe of values that is waited for dimensionality reduction
                  method: str='pca', # dimensionality reduction method, choose from pca, umap, and tsne
                  hue: str=None, # colname of color
@@ -269,7 +269,7 @@ def plot_cluster(df: pd.DataFrame, # a dataframe of values that is waited for di
         texts = [plt.text(embedding_df[x_col][i], embedding_df[y_col][i], name_list[i],fontsize=8) for i in range(len(embedding_df))]
         adjust_text(texts, arrowprops=dict(arrowstyle='-', color='black'))
-# %% ../nbs/02_plot.ipynb 37
+# %% ../nbs/02_plot.ipynb 38
 def plot_bokeh(X:pd.DataFrame, # a dataframe of two columns from dimensionality reduction
                idx, # pd.Series or list that indicates identities for searching box
                hue:None, # pd.Series or list that indicates category for each sample
@@ -367,7 +367,7 @@ def plot_bokeh(X:pd.DataFrame, # a dataframe of two columns from dimensionality
     layout = column(autocomplete, p)
     show(layout)
-# %% ../nbs/02_plot.ipynb 40
+# %% ../nbs/02_plot.ipynb 41
 def plot_count(cnt, # from df['x'].value_counts()
                tick_spacing: float= None, # tick spacing for x axis
                palette: str='tab20'):
@@ -383,7 +383,7 @@ def plot_count(cnt, # from df['x'].value_counts()
     if tick_spacing is not None:
         ax.xaxis.set_major_locator(MultipleLocator(tick_spacing))
-# %% ../nbs/02_plot.ipynb 42
+# %% ../nbs/02_plot.ipynb 43
 @fc.delegates(sns.barplot)
 def plot_bar(df,
              value, # colname of value
@@ -438,7 +438,7 @@ def plot_bar(df,
     plt.gca().spines[['right', 'top']].set_visible(False)
-# %% ../nbs/02_plot.ipynb 45
+# %% ../nbs/02_plot.ipynb 46
 @fc.delegates(sns.barplot)
 def plot_group_bar(df,
                    value_cols,  # list of column names for values, the order depends on the first item
@@ -481,7 +481,7 @@ def plot_group_bar(df,
     plt.gca().spines[['right', 'top']].set_visible(False)
     plt.legend(fontsize=fontsize) # if change legend location, use loc='upper right'
-# %% ../nbs/02_plot.ipynb 48
+# %% ../nbs/02_plot.ipynb 49
 @fc.delegates(sns.boxplot)
 def plot_box(df,
              value, # colname of value
@@ -523,7 +523,7 @@ def plot_box(df,
     # plt.gca().spines[['right', 'top']].set_visible(False)
-# %% ../nbs/02_plot.ipynb 51
+# %% ../nbs/02_plot.ipynb 52
 @fc.delegates(sns.regplot)
 def plot_corr(x, # x axis values, or colname of x axis
               y, # y axis values, or colname of y axis
@@ -560,7 +560,7 @@ def plot_corr(x, # x axis values, or colname of x axis
             transform=plt.gca().transAxes,
              ha='center', va='center')
-# %% ../nbs/02_plot.ipynb 55
+# %% ../nbs/02_plot.ipynb 56
 def draw_corr(corr):
     "plot heatmap from df.corr()"
@@ -572,7 +572,7 @@ def draw_corr(corr):
     plt.figure(figsize=(20, 16))  # Set the figure size
     sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, mask=mask, fmt='.2f')
-# %% ../nbs/02_plot.ipynb 59
+# %% ../nbs/02_plot.ipynb 60
 def get_AUCDF(df,col, reverse=False,plot=True,xlabel='Rank of reported kinase'):
     "Plot CDF curve and get relative area under the curve"
@@ -637,7 +637,7 @@ def get_AUCDF(df,col, reverse=False,plot=True,xlabel='Rank of reported kinase'):
     return AUCDF
-# %% ../nbs/02_plot.ipynb 62
+# %% ../nbs/02_plot.ipynb 63
 def plot_confusion_matrix(target, # pd.Series
                           pred, # pd.Series
                           class_names:list=['0','1'],

{python-katlas-0.0.9 → python-katlas-0.1.1}/katlas/train.py RENAMED

@@ -5,7 +5,7 @@
 # %% auto 0
 __all__ = ['get_splits', 'split_data', 'score_each', 'train_ml', 'train_ml_cv', 'predict_ml']
-# %% ../nbs/03_ML.ipynb 4
+# %% ../nbs/03_ML.ipynb 5
 # katlas
 from .core import Data
 from .feature import *
@@ -29,7 +29,7 @@ from sklearn.ensemble import *
 from sklearn import set_config
 set_config(transform_output="pandas")
-# %% ../nbs/03_ML.ipynb 7
+# %% ../nbs/03_ML.ipynb 8
 def get_splits(df: pd.DataFrame, # df contains info for split
                stratified: str=None, # colname to make stratified kfold; sampling from different groups
                group: str=None, # colname to make group kfold; test and train are from different groups
@@ -79,7 +79,7 @@ def get_splits(df: pd.DataFrame, # df contains info for split
     return splits
-# %% ../nbs/03_ML.ipynb 12
+# %% ../nbs/03_ML.ipynb 13
 def split_data(df: pd.DataFrame, # dataframe of values
                feat_col: list, # feature columns
                target_col: list, # target columns
@@ -95,7 +95,7 @@ def split_data(df: pd.DataFrame, # dataframe of values
     return X_train, y_train, X_test, y_test
-# %% ../nbs/03_ML.ipynb 16
+# %% ../nbs/03_ML.ipynb 17
 def score_each(target: pd.DataFrame, # target dataframe
               pred: pd.DataFrame, # predicted dataframe
               absolute = True, # if absolute, take average with absolute values for pearson/spearman
@@ -134,7 +134,7 @@ def score_each(target: pd.DataFrame, # target dataframe
     return mse,pearson_mean, metrics_df
-# %% ../nbs/03_ML.ipynb 21
+# %% ../nbs/03_ML.ipynb 22
 def train_ml(df, # dataframe of values
              feat_col, # feature columns
              target_col, # target columns
@@ -169,7 +169,7 @@ def train_ml(df, # dataframe of values
     return y_test, y_pred
-# %% ../nbs/03_ML.ipynb 24
+# %% ../nbs/03_ML.ipynb 25
 def train_ml_cv( df, # dataframe of values
                  feat_col, # feature columns
                  target_col,  # target columns
@@ -213,7 +213,7 @@ def train_ml_cv( df, # dataframe of values
     return oof, metrics
-# %% ../nbs/03_ML.ipynb 31
+# %% ../nbs/03_ML.ipynb 32
 def predict_ml(df, # Dataframe that contains features
                feat_col, # feature columns
                target_col=None,

{python-katlas-0.0.9 → python-katlas-0.1.1/python_katlas.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-katlas
-Version: 0.0.9
+Version: 0.1.1
 Summary: tools for predicting kinome specificities
 Home-page: https://github.com/sky1ove/python-katlas
 Author: lily
@@ -60,6 +60,13 @@ helpful to your research.
   phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
   and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
   [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
+## Web applications
+Users can now run the analysis directly on the web without needing to code.
+Check out our latest web: [kinase-atlas.com](https://kinase-atlas.com/)
 ## Tutorials on Colab
@@ -67,20 +74,16 @@ helpful to your research.
       sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
 - 2.  [High throughput substrate scoring on phosphoproteomics
       dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
-- 3.  [Query a protein’s phosphorylation sites and predict their
-      upstream
-      kinases](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03_query_gene.ipynb)
-- 4.  [Kinase enrichment analysis for AKT
-      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04a_enrichment_AKTi.ipynb)
-      / [Kinase enrichment analysis for EGFR
-      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_04b_enrichment_EGFRi.ipynb)
+- 3.  [Kinase enrichment analysis for AKT
+      inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
 ## Install
-Install the latest version through git
+Install the latest version through pip
 ``` python
-!pip install git+https://github.com/sky1ove/katlas.git -Uqq
+pip install python-katlas -Uq
 ```
 ## Import

{python-katlas-0.0.9 → python-katlas-0.1.1}/settings.ini RENAMED

@@ -5,7 +5,7 @@
 ### Python library ###
 repo = python-katlas
 lib_name = %(repo)s
-version = 0.0.9
+version = 0.1.1
 min_python = 3.7
 license = apache2
 black_formatting = False