PyPI - enzymetk - Versions diffs - 0.0.2__tar.gz → 0.0.6__tar.gz - Mend

enzymetk-0.0.2.tar.gz → enzymetk-0.0.6.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

{enzymetk-0.0.2 → enzymetk-0.0.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.2
+Metadata-Version: 2.4
 Name: enzymetk
-Version: 0.0.2
+Version: 0.0.6
 Home-page: https://github.com/arianemora/enzyme-tk/
 Author: Ariane Mora
 Author-email: ariane.n.mora@gmail.com
@@ -18,17 +18,12 @@ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: fair-esm
 Requires-Dist: scikit-learn
 Requires-Dist: numpy
 Requires-Dist: seaborn
 Requires-Dist: sciutil
-Requires-Dist: pandas==2.1.4
+Requires-Dist: pandas
 Requires-Dist: biopython
-Requires-Dist: sentence_transformers
-Requires-Dist: pubchempy
-Requires-Dist: pyfaidx
-Requires-Dist: spacy
 Dynamic: author
 Dynamic: author-email
 Dynamic: classifier
@@ -37,6 +32,7 @@ Dynamic: description-content-type
 Dynamic: home-page
 Dynamic: keywords
 Dynamic: license
+Dynamic: license-file
 Dynamic: project-url
 Dynamic: requires-dist
 Dynamic: requires-python
@@ -45,6 +41,9 @@ Dynamic: requires-python
 Enzyme-tk is a collection of tools for enzyme engineering, setup as interoperable modules that act on dataframes. These modules are designed to be imported into pipelines for specific function. For this reason, `steps` as each module is called (e.g. finding similar proteins with `BLAST` would be considered a step) are designed to be as light as possible. An example of a pipeline is the [annotate-e](https://github.com/ArianeMora/annotate-e)  ` pipeline, this acts to annotate a fasta with an ensemble of methods (each is designated as an Enzyme-tk step).
+**If you have any issues installing, let me know - this has been tested only on Linux/Ubuntu. Please post an issue!**
 ## Installation
 ## Install base package to import modules
@@ -71,6 +70,7 @@ This is a work-in progress! e.g. some tools (e.g. proteInfer and CLEAN) require
 Here are some of the tools that have been implemented to be chained together as a pipeline:
+[boltz2](https://github.com/jwohlwend/boltz)
 [mmseqs2](https://github.com/soedinglab/mmseqs2)
 [foldseek](https://github.com/steineggerlab/foldseek)
 [diamond](https://github.com/bbuchfink/diamond)
@@ -89,6 +89,7 @@ Here are some of the tools that have been implemented to be chained together as
 [fasttree](https://morgannprice.github.io/fasttree/)
 [Porechop](https://github.com/rrwick/Porechop)
 [prokka](https://github.com/tseemann/prokka)
 ## Things to note
 All the tools use the conda env of `enzymetk` by default.
@@ -120,6 +121,8 @@ The steps are the main building blocks of the pipeline. They are responsible for
 BLAST is a tool for searching a database of sequences for similar sequences. Here you can either pass a database that you have already created or pass the sequences as part of your dataframe and pass the label column (this needs to have two values: reference and query) reference refers to sequences that you want to search against and query refers to sequences that you want to search for.
+Note you need to have installed the BLAST environment.
 ```python
 id_col = 'Entry'
 seq_col = 'Sequence'
@@ -148,6 +151,34 @@ df = pd.DataFrame(rows, columns=[id_col, seq_col])
 print(df)
 df << (ActiveSitePred(id_col, seq_col, squidly_dir, num_threads) >> Save('tmp/squidly_as_pred.pkl'))
+```
+### Boltz2
+Boltz2 is a model for predicting structures. Note you need docko installed as I run via that.
+Below is an example using boltz with 4 threads, and uses a cofactor (intermediate in this case). Just set to be None for a single substrate version.
+```
+import sys
+from enzymetk.dock_boltz_step import Boltz
+from enzymetk.save_step import Save
+import pandas as pd
+import os
+os.environ['MKL_THREADING_LAYER'] = 'GNU'
+output_dir = 'tmp/'
+num_threads = 4
+id_col = 'Entry'
+seq_col = 'Sequence'
+substrate_col = 'Substrate'
+intermediate_col = 'Intermediate'
+rows = [['P0DP23_boltz_8999', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p1', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP23_boltz_p2', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p3', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p4', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]']]
+df = pd.DataFrame(rows, columns=[id_col, seq_col, substrate_col, intermediate_col])
+df << (Boltz(id_col, seq_col, substrate_col, intermediate_col, f'{output_dir}', num_threads) >> Save(f'{output_dir}test.pkl'))
 ```
 ### Chai
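The dependency changes in the PKG-INFO hunk can be summarised mechanically. The sketch below is not part of the package; it only parses the `Requires-Dist` lines copied from the two versions shown in this diff and reports what was dropped and what was added. The helper name `requires_dist` is an assumption made for illustration.

```python
def requires_dist(metadata_text: str) -> set:
    """Collect the Requires-Dist entries from PKG-INFO style text."""
    prefix = "Requires-Dist:"
    return {
        line[len(prefix):].strip()
        for line in metadata_text.splitlines()
        if line.startswith(prefix)
    }


# Requires-Dist lines as they appear in 0.0.2 (copied from the hunk).
OLD = """\
Requires-Dist: fair-esm
Requires-Dist: scikit-learn
Requires-Dist: numpy
Requires-Dist: seaborn
Requires-Dist: sciutil
Requires-Dist: pandas==2.1.4
Requires-Dist: biopython
Requires-Dist: sentence_transformers
Requires-Dist: pubchempy
Requires-Dist: pyfaidx
Requires-Dist: spacy
"""

# Requires-Dist lines as they appear in 0.0.6 (copied from the hunk).
NEW = """\
Requires-Dist: scikit-learn
Requires-Dist: numpy
Requires-Dist: seaborn
Requires-Dist: sciutil
Requires-Dist: pandas
Requires-Dist: biopython
"""

removed = sorted(requires_dist(OLD) - requires_dist(NEW))
added = sorted(requires_dist(NEW) - requires_dist(OLD))
print("removed:", removed)
print("added:", added)
```

Read this way, 0.0.6 unpins pandas and no longer declares fair-esm, sentence_transformers, pubchempy, pyfaidx, or spacy, so steps that import those packages presumably need them installed separately.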

{enzymetk-0.0.2 → enzymetk-0.0.6}/README.md RENAMED

@@ -2,6 +2,9 @@
 Enzyme-tk is a collection of tools for enzyme engineering, setup as interoperable modules that act on dataframes. These modules are designed to be imported into pipelines for specific function. For this reason, `steps` as each module is called (e.g. finding similar proteins with `BLAST` would be considered a step) are designed to be as light as possible. An example of a pipeline is the [annotate-e](https://github.com/ArianeMora/annotate-e)  ` pipeline, this acts to annotate a fasta with an ensemble of methods (each is designated as an Enzyme-tk step).
+**If you have any issues installing, let me know - this has been tested only on Linux/Ubuntu. Please post an issue!**
 ## Installation
 ## Install base package to import modules
@@ -28,6 +31,7 @@ This is a work-in progress! e.g. some tools (e.g. proteInfer and CLEAN) require
 Here are some of the tools that have been implemented to be chained together as a pipeline:
+[boltz2](https://github.com/jwohlwend/boltz)
 [mmseqs2](https://github.com/soedinglab/mmseqs2)
 [foldseek](https://github.com/steineggerlab/foldseek)
 [diamond](https://github.com/bbuchfink/diamond)
@@ -46,6 +50,7 @@ Here are some of the tools that have been implemented to be chained together as
 [fasttree](https://morgannprice.github.io/fasttree/)
 [Porechop](https://github.com/rrwick/Porechop)
 [prokka](https://github.com/tseemann/prokka)
 ## Things to note
 All the tools use the conda env of `enzymetk` by default.
@@ -77,6 +82,8 @@ The steps are the main building blocks of the pipeline. They are responsible for
 BLAST is a tool for searching a database of sequences for similar sequences. Here you can either pass a database that you have already created or pass the sequences as part of your dataframe and pass the label column (this needs to have two values: reference and query) reference refers to sequences that you want to search against and query refers to sequences that you want to search for.
+Note you need to have installed the BLAST environment.
 ```python
 id_col = 'Entry'
 seq_col = 'Sequence'
@@ -105,6 +112,34 @@ df = pd.DataFrame(rows, columns=[id_col, seq_col])
 print(df)
 df << (ActiveSitePred(id_col, seq_col, squidly_dir, num_threads) >> Save('tmp/squidly_as_pred.pkl'))
+```
+### Boltz2
+Boltz2 is a model for predicting structures. Note you need docko installed as I run via that.
+Below is an example using boltz with 4 threads, and uses a cofactor (intermediate in this case). Just set to be None for a single substrate version.
+```
+import sys
+from enzymetk.dock_boltz_step import Boltz
+from enzymetk.save_step import Save
+import pandas as pd
+import os
+os.environ['MKL_THREADING_LAYER'] = 'GNU'
+output_dir = 'tmp/'
+num_threads = 4
+id_col = 'Entry'
+seq_col = 'Sequence'
+substrate_col = 'Substrate'
+intermediate_col = 'Intermediate'
+rows = [['P0DP23_boltz_8999', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p1', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP23_boltz_p2', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p3', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p4', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]']]
+df = pd.DataFrame(rows, columns=[id_col, seq_col, substrate_col, intermediate_col])
+df << (Boltz(id_col, seq_col, substrate_col, intermediate_col, f'{output_dir}', num_threads) >> Save(f'{output_dir}test.pkl'))
 ```
 ### Chai
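The README examples chain steps with `>>` and feed a dataframe in with `<<`. The diff does not show how `enzymetk.step.Step` implements this, so the following is only a sketch of one way such operators can be wired up in Python. The classes `Step`, `Pipeline`, `Upper`, and `Record` here are hypothetical stand-ins, and a plain list plays the role of the dataframe.

```python
class Step:
    """Hypothetical stand-in for a pipeline step; not the library's code."""

    def execute(self, data):
        raise NotImplementedError

    def __rshift__(self, other):
        # step_a >> step_b builds a pipeline that runs step_a, then step_b.
        return Pipeline([self, other])

    def __rlshift__(self, data):
        # data << step: Python falls back to this reflected method when
        # the left operand does not implement << for a Step.
        return self.execute(data)


class Pipeline(Step):
    """Runs its steps in order, passing each result to the next step."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __rshift__(self, other):
        return Pipeline(self.steps + [other])

    def execute(self, data):
        for step in self.steps:
            data = step.execute(data)
        return data


class Upper(Step):
    """Toy step: upper-case every sequence."""

    def execute(self, data):
        return [s.upper() for s in data]


class Record(Step):
    """Toy step playing the role of Save: remembers what it received."""

    def __init__(self):
        self.seen = None

    def execute(self, data):
        self.seen = list(data)
        return data


record = Record()
result = ["malw", "mrll"] << (Upper() >> record)
```

The parentheses matter in this sketch: `<<` and `>>` share the same precedence and group left to right, so without them `data << Upper()` would run first and its result would then be shifted by `record`.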

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/__init__.py RENAMED

@@ -22,34 +22,11 @@ Date: March 2025
 __title__ = 'enzymetk'
 __description__ = 'Toolkit for enzymes and what not'
 __url__ = 'https://github.com/arianemora/enzyme-tk/'
-__version__ = '0.0.2'
+__version__ = '0.0.6'
 __author__ = 'Ariane Mora'
 __author_email__ = 'ariane.n.mora@gmail.com'
 __license__ = 'GPL3'
-# from enzymetk.step import *
-# from enzymetk.generate_msa_step import ClustalOmega
-# from enzymetk.annotateEC_CLEAN_step import CLEAN
-# from enzymetk.annotateEC_proteinfer_step import ProteInfer
-# from enzymetk.dock_chai_step import Chai
-# from enzymetk.dock_vina_step import Vina
-# from enzymetk.embedchem_chemberta_step import ChemBERT
-# from enzymetk.embedchem_rxnfp_step import RxnFP
-# from enzymetk.embedchem_selformer_step import SelFormer
-# from enzymetk.embedchem_unimol_step import UniMol
-# from enzymetk.embedprotein_esm_step import EmbedESM
-# from enzymetk.generate_tree_step import FastTree
-# from enzymetk.inpaint_ligandMPNN_step import LigandMPNN
-# from enzymetk.metagenomics_porechop_trim_reads_step import PoreChop
-# from enzymetk.metagenomics_prokka_annotate_genes import Prokka
-# #from enzymetk.predict_activity_step import
-# from enzymetk.predict_catalyticsite_step import ActiveSitePred
-# from enzymetk.sequence_search_blast import BLAST
-# from enzymetk.similarity_foldseek_step import FoldSeek
-# from enzymetk.similarity_mmseqs_step import MMseqs
-# from enzymetk.similarity_reaction_step import ReactionDist
-# from enzymetk.similarity_substrate_step import SubstrateDist

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/annotateEC_CLEAN_step.py RENAMED

@@ -116,7 +116,7 @@ class CLEAN(Step):
                 print(output_filenames)
                 for sub_df in output_filenames:
                     df = pd.concat([df, sub_df])
-                return df
+                return self.__filter_df(df)
             else:
-                return self.__execute([df, tmp_dir])
+                return self.__filter_df(self.__execute([df, tmp_dir]))
                 return df

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/annotateEC_CREEP_step.py RENAMED

@@ -38,7 +38,7 @@ class CREEP(Step):
         self.args_extract = args_extract
         self.args_retrieval = args_retrieval
-    def __execute(self, df: pd.DataFrame, tmp_dir: str) -> pd.DataFrame:
+    def __execute(self, df: pd.DataFrame, tmp_dir: str):
         tmp_dir = '/disk1/ariane/vscode/degradeo/pipeline/tmp/'
         input_filename = f'{tmp_dir}/creepasjkdkajshdkja.csv'
         df.to_csv(input_filename, index=False)
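In the context lines of this hunk, `__execute` reassigns `tmp_dir` to a fixed absolute path and writes to a fixed file name, so the caller's `tmp_dir` is ignored and concurrent runs would share one input file. A minimal sketch of the usual alternative follows, using only the standard library; `write_step_input` is a hypothetical helper, not something the package provides.

```python
import csv
import os
import tempfile
import uuid


def write_step_input(rows, header, tmp_dir=None):
    """Write rows to a uniquely named CSV inside tmp_dir and return its path.

    tmp_dir is the caller-supplied scratch directory; a fresh temporary
    directory is created when it is None.
    """
    if tmp_dir is None:
        tmp_dir = tempfile.mkdtemp(prefix="creep_")
    os.makedirs(tmp_dir, exist_ok=True)
    # A random suffix keeps parallel runs from overwriting each other.
    path = os.path.join(tmp_dir, f"input_{uuid.uuid4().hex}.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


with tempfile.TemporaryDirectory() as scratch:
    csv_path = write_step_input([["P1", "MALW"]], ["Entry", "Sequence"], scratch)
    written_in_scratch = os.path.dirname(csv_path) == scratch
    with open(csv_path) as handle:
        written = handle.read().splitlines()
```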

enzymetk-0.0.6/enzymetk/dock_boltz_step.py ADDED

@@ -0,0 +1,46 @@
+from enzymetk.step import Step
+import pandas as pd
+from docko.boltz import run_boltz_affinity
+import logging
+import numpy as np
+from multiprocessing.dummy import Pool as ThreadPool
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+class Boltz(Step):
+    def __init__(self, id_col: str, seq_col: str, substrate_col: str, intermediate_col: str, output_dir: str, num_threads: int):
+        self.id_col = id_col
+        self.seq_col = seq_col
+        self.substrate_col = substrate_col
+        self.intermediate_col = intermediate_col
+        self.output_dir = output_dir or None
+        self.num_threads = num_threads or 1
+    def __execute(self, df: pd.DataFrame) -> pd.DataFrame:
+        output_filenames = []
+        for run_id, seq, substrate, intermediate in df[[self.id_col, self.seq_col, self.substrate_col, self.intermediate_col]].values:
+            # Might have an issue if the things are not correctly installed in the same dicrectory
+            if not isinstance(substrate, str):
+                substrate = ''
+            print(run_id, seq, substrate)
+            run_boltz_affinity(run_id, seq, substrate, self.output_dir, intermediate)
+            output_filenames.append(f'{self.output_dir}/{run_id}/')
+        return output_filenames
+    def execute(self, df: pd.DataFrame) -> pd.DataFrame:
+        if self.output_dir:
+            if self.num_threads > 1:
+                pool = ThreadPool(self.num_threads)
+                df_list = np.array_split(df, self.num_threads)
+                results = pool.map(self.__execute, df_list)
+            else:
+                results = self.__execute(df)
+            df['output_dir'] = results
+            return df
+        else:
+            print('No output directory provided')
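One thing to watch in `Boltz.execute` as added here: with `num_threads > 1`, `pool.map` returns one list per chunk, so `results` holds `num_threads` entries rather than one per row, and in general that will not line up with the rows of `df` when assigned to `df['output_dir']`. The sketch below shows the flattening step with a stand-in worker and plain lists instead of a dataframe. `split_evenly` and `fake_run` are hypothetical names, and no Boltz code is called.

```python
from itertools import chain
from multiprocessing.dummy import Pool as ThreadPool


def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def fake_run(chunk):
    """Stand-in for the per-chunk worker: one output directory per run id."""
    return [f"tmp/{run_id}/" for run_id in chunk]


run_ids = ["p0", "p1", "p2", "p3", "p4"]  # five rows, as in the README example
num_threads = 4

with ThreadPool(num_threads) as pool:
    per_chunk = pool.map(fake_run, split_evenly(run_ids, num_threads))

# per_chunk has one list per chunk (4 here); flatten to one entry per row (5).
output_dirs = list(chain.from_iterable(per_chunk))
```

Because `pool.map` preserves input order and the chunks are contiguous, the flattened list is in the same order as the original rows.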

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/dock_chai_step.py RENAMED

@@ -11,16 +11,17 @@ logger.setLevel(logging.INFO)
 class Chai(Step):
-    def __init__(self, id_col: str, seq_col: str, substrate_col: str, output_dir: str, num_threads: int):
+    def __init__(self, id_col: str, seq_col: str, substrate_col: str, cofactor_col: str, output_dir: str, num_threads: int):
         self.id_col = id_col
         self.seq_col = seq_col
         self.substrate_col = substrate_col
+        self.cofactor_col = cofactor_col
         self.output_dir = output_dir or None
         self.num_threads = num_threads or 1
     def __execute(self, df: pd.DataFrame, tmp_dir: str) -> pd.DataFrame:
         output_filenames = []
-        for run_id, seq, substrate in df[[self.id_col, self.seq_col, self.substrate_col]].values:
+        for run_id, seq, substrate, cofactor in df[[self.id_col, self.seq_col, self.substrate_col, self.cofactor_col]].values:
             # Might have an issue if the things are not correctly installed in the same dicrectory
             if not isinstance(substrate, str):
                 substrate = ''
@@ -28,7 +29,8 @@ class Chai(Step):
             run_chai(run_id, # name
                     seq, # sequence
                     substrate, # ligand as smiles
-                    tmp_dir)
+                    tmp_dir,
+                    cofactor) # cofactor as smiles
             output_filenames.append(f'{tmp_dir}/{run_id}/')
         return output_filenames

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/dock_vina_step.py RENAMED

@@ -4,6 +4,7 @@ from docko.docko import *
 import logging
 import numpy as np
 import os
+from pathlib import Path
 from multiprocessing.dummy import Pool as ThreadPool
 logger = logging.getLogger(__name__)
@@ -21,34 +22,66 @@ class Vina(Step):
         self.substrate_col = substrate_col
         self.substrate_name_col = substrate_name_col
         self.active_site_col = active_site_col  # Expects active site residues as a string separated by |
-        self.output_dir = output_dir or None
+        self.output_dir = Path( output_dir) or None
         self.num_threads = num_threads or 1
     def __execute(self, df: pd.DataFrame) -> pd.DataFrame:
         output_filenames = []
         # ToDo: update to create from sequence if the path doesn't exist.
         for label, structure_path, seq, substrate_smiles, substrate_name, residues in df[[self.id_col, self.structure_col, self.sequence_col, self.substrate_col, self.substrate_name_col, self.active_site_col]].values:
-            os.system(f'mkdir {self.output_dir}{label}')
             try:
+                structure_path = str(structure_path)
                 residues = str(residues)
                 residues = [int(r) + 1 for r in residues.split('|')]
-                if not os.path.exists(f'{structure_path}'):
-                    # Try get the AF2 structure we expect the label to be the uniprot id
-                    get_alphafold_structure(label, f'{self.output_dir}{label}/{label}_AF2.pdb')
-                    structure_path = f'{self.output_dir}{label}/{label}_AF2.pdb'
-                clean_one_pdb(f'{structure_path}', f'{self.output_dir}{label}/{label}.pdb')
-                pdb_to_pdbqt_protein(f'{self.output_dir}{label}/{label}.pdb', f'{self.output_dir}{label}/{label}.pdbqt')
-                score = dock(sequence='', protein_name=label, smiles=substrate_smiles, ligand_name=substrate_name, residues=residues,
-                            protein_dir=f'{self.output_dir}', ligand_dir=f'{self.output_dir}', output_dir=f'{self.output_dir}{label}/', pH=7.4,
-                            method='vina', size_x=10.0, size_y=10.0, size_z=10.0)
-                output_filename = f'{self.output_dir}{label}/{label}.pdb'
-                output_filenames.append(output_filename)
+                label_dir = self.output_dir / label
+                label_dir.mkdir(parents=True, exist_ok=True)
+                structure_path = Path(structure_path)
+                if not structure_path.exists():
+                    # Try to download AF2 structure
+                    get_alphafold_structure(label, label_dir / f"{label}_AF2.pdb")
+                    structure_path = label_dir / f"{label}_AF2.pdb"
+                # Skip if still not found
+                if not structure_path.exists():
+                    print(f"Skipping {label}: AF2 structure not found.")
+                    output_filenames.append(None)
+                    continue
+                # Proceed with docking
+                pdb_path = label_dir / f"{label}.pdb"
+                pdbqt_path = label_dir / f"{label}.pdbqt"
+                clean_one_pdb(str(structure_path), str(pdb_path))
+                pdb_to_pdbqt_protein(str(pdb_path), str(pdbqt_path))
+                score = dock(
+                    sequence='',
+                    protein_name=label,
+                    smiles=substrate_smiles,
+                    ligand_name=substrate_name,
+                    residues=residues,
+                    protein_dir=str(self.output_dir),
+                    ligand_dir=str(self.output_dir),
+                    output_dir=str(label_dir),
+                    pH=7.4,
+                    method='vina',
+                    size_x=10.0,
+                    size_y=10.0,
+                    size_z=10.0
+                )
+                output_filenames.append(str(pdb_path))
             except Exception as e:
                 print(f'Error docking {label}: {e}')
                 output_filenames.append(None)
         return output_filenames
     def execute(self, df: pd.DataFrame) -> pd.DataFrame:
         if self.output_dir:
             if self.num_threads > 1:
@@ -60,4 +93,4 @@ class Vina(Step):
             df['output_dir'] = results
             return df
         else:
-            print('No output directory provided')
+            print('No output directory provided')
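A note on `self.output_dir = Path( output_dir) or None` in this hunk: a `Path` object is always truthy (even `Path('')`, which equals `Path('.')`), and `Path(None)` raises `TypeError`, so the `or None` fallback never takes effect and the later `if self.output_dir:` check always passes. A small sketch of an explicit check is below. `normalise_output_dir` and `parse_residues` are hypothetical helpers written for illustration; the latter only mirrors the `int(r) + 1` parsing shown in the diff.

```python
from pathlib import Path
from typing import Optional, Union


def normalise_output_dir(output_dir: Optional[Union[str, Path]]) -> Optional[Path]:
    """Return a Path for a usable directory argument, or None if missing or empty."""
    if output_dir is None or str(output_dir) == "":
        return None
    return Path(output_dir)


def parse_residues(residues) -> list:
    """Turn a '|'-separated residue string such as '10|25' into ints shifted by one."""
    return [int(r) + 1 for r in str(residues).split("|")]


out = normalise_output_dir("tmp")
label_dir = out / "P0DP23"           # per-label directory, as in the new code
empty_path_is_truthy = bool(Path(""))  # why `Path(x) or None` cannot yield None
```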

enzymetk-0.0.6/enzymetk/embedprotein_esm3_step.py ADDED

@@ -0,0 +1,71 @@
+# ESM 3 script
+from esm.sdk.api import ESMProtein
+from tempfile import TemporaryDirectory
+import torch
+import os
+import pandas as pd
+from esm.models.esm3 import ESM3
+from esm.sdk.api import ESMProtein, SamplingConfig
+from huggingface_hub import login
+from enzymetk.step import Step
+import numpy as np
+from tqdm import tqdm
+# CUDA setup
+os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   # see issue #152
+cuda = True
+DEVICE = torch.device("cuda" if cuda else "cpu")
+device = DEVICE
+class EmbedESM3(Step):
+    def __init__(self, id_col: str, seq_col: str, extraction_method='mean', num_threads=1,
+                 tmp_dir: str = None, env_name: str = 'enzymetk', save_tensors=False): # type: ignore
+        login()
+        self.client = ESM3.from_pretrained("esm3-open").to("cuda")
+        self.seq_col = seq_col
+        self.id_col = id_col
+        self.num_threads = num_threads or 1
+        self.extraction_method = extraction_method
+        self.tmp_dir = tmp_dir
+        self.env_name = env_name
+        self.save_tensors = save_tensors
+    def __execute(self, df: pd.DataFrame, tmp_dir: str) -> pd.DataFrame:
+        client = self.client
+        means = []
+        for id, seq in tqdm(df[[self.id_col, self.seq_col]].values):
+            protein = ESMProtein(
+                sequence=(
+                    seq
+                )
+            )
+            protein_tensor = client.encode(protein)
+            output = client.forward_and_sample(
+                protein_tensor, SamplingConfig(return_per_residue_embeddings=True)
+            )
+            if self.save_tensors:
+                torch.save(output.per_residue_embedding, os.path.join(tmp_dir, f'{id}.pt'))
+            means.append(np.array(output.per_residue_embedding.mean(dim=0).cpu()))
+        df['esm3_mean']  = means
+        return df
+    def execute(self, df: pd.DataFrame) -> pd.DataFrame:
+        if self.tmp_dir is None:
+            with TemporaryDirectory() as tmp_dir:
+                if self.num_threads > 1:
+                    dfs = []
+                    df_list = np.array_split(df, self.num_threads)
+                    for df_chunk in tqdm(df_list):
+                        dfs.append(self.__execute(df_chunk, tmp_dir))
+                    df = pd.DataFrame()
+                    for tmp_df in tqdm(dfs):
+                        df = pd.concat([df, tmp_df])
+                    return df
+                else:
+                    df = self.__execute(df, tmp_dir)
+                    return df
+        else:
+            df = self.__execute(df, self.tmp_dir)
+            return df
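The `esm3_mean` column is produced by averaging the per-residue embedding over its first axis (`per_residue_embedding.mean(dim=0)`). For readers unfamiliar with that pooling step, here is the same operation on a tiny nested list, written without torch so it runs anywhere. `mean_pool` is a hypothetical name and the numbers are made up.

```python
from statistics import fmean


def mean_pool(per_residue):
    """Average a (residues x features) embedding over the residue axis."""
    return [fmean(column) for column in zip(*per_residue)]


# Three residues, two features each (toy values, not real embeddings).
embedding = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(embedding)
```

The result has one value per feature regardless of sequence length, which is what lets sequences of different lengths share a single dataframe column.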

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedprotein_esm_step.py RENAMED

@@ -70,8 +70,8 @@ def extract_mean_embedding(df, id_column, encoding_dir, rep_num=33):
 class EmbedESM(Step):
-    def __init__(self, id_col: str, seq_col: str, model='esm2_t33_650M_UR50D', extraction_method='mean',
-                 active_site_col: str = None, num_threads=1, tmp_dir: str = None, env_name: str = 'enzymetk'):
+    def __init__(self, id_col: str, seq_col: str, model='esm2_t36_3B_UR50D', extraction_method='mean',
+                 active_site_col: str = None, num_threads=1, tmp_dir: str = None, env_name: str = 'enzymetk', rep_num=36):
         self.seq_col = seq_col
         self.id_col = id_col
         self.active_site_col = active_site_col
@@ -80,6 +80,7 @@ class EmbedESM(Step):
         self.extraction_method = extraction_method
         self.tmp_dir = tmp_dir
         self.env_name = env_name
+        self.rep_num = rep_num
     def __execute(self, df: pd.DataFrame, tmp_dir: str) -> pd.DataFrame:
         input_filename = f'{tmp_dir}/input.fasta'
@@ -95,11 +96,11 @@ class EmbedESM(Step):
         cmd = ['conda', 'run', '-n', self.env_name, 'python', Path(__file__).parent/'esm-extract.py', self.model, input_filename, tmp_dir, '--include', 'per_tok']
         self.run(cmd)
         if self.extraction_method == 'mean':
-            df = extract_mean_embedding(df, self.id_col, tmp_dir)
+            df = extract_mean_embedding(df, self.id_col, tmp_dir, rep_num=self.rep_num)
         elif self.extraction_method == 'active_site':
             if self.active_site_col is None:
                 raise ValueError('active_site_col must be provided if extraction_method is active_site')
-            df = extract_active_site_embedding(df, self.id_col, self.active_site_col, tmp_dir)
+            df = extract_active_site_embedding(df, self.id_col, self.active_site_col, tmp_dir, rep_num=self.rep_num)
         return df
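The new `rep_num` argument has to agree with the chosen model: the old default paired `esm2_t33_650M_UR50D` with `rep_num=33`, and the new one pairs `esm2_t36_3B_UR50D` with `rep_num=36`. Since ESM2 checkpoint names encode the layer count after `_t`, a caller who overrides `model` could derive `rep_num` from the name instead of relying on the default. The helper below is a sketch under that assumption; `layers_from_esm2_name` is not part of enzymetk.

```python
import re


def layers_from_esm2_name(model_name: str) -> int:
    """Extract the layer count from an ESM2 checkpoint name such as 'esm2_t33_650M_UR50D'."""
    match = re.search(r"_t(\d+)_", model_name)
    if match is None:
        raise ValueError(f"Cannot read a layer count from {model_name!r}")
    return int(match.group(1))


rep_num_650m = layers_from_esm2_name("esm2_t33_650M_UR50D")
rep_num_3b = layers_from_esm2_name("esm2_t36_3B_UR50D")
```

A call site could then pass `rep_num=layers_from_esm2_name(model)` alongside `model=model`, assuming the final layer is the representation wanted.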

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/inpaint_ligandMPNN_step.py RENAMED

@@ -16,7 +16,8 @@ import os
 class LigandMPNN(Step):
-    def __init__(self, pdb_column_name: str, ligand_mpnn_dir: str, output_dir: str, tmp_dir: str = None, args=None, num_threads: int = 1, env_name: str = 'ligandmpnn_env'):
+    def __init__(self, pdb_column_name: str, ligand_mpnn_dir: str, output_dir: str,
+                 tmp_dir: str = None, args=None, num_threads: int = 1, env_name: str = 'ligandmpnn_env'):
         self.pdb_column_name = pdb_column_name
         self.ligand_mpnn_dir = ligand_mpnn_dir
         self.output_dir = output_dir
@@ -34,7 +35,7 @@ class LigandMPNN(Step):
         os.chdir(self.ligand_mpnn_dir)
         for pdb_file in df[ self.pdb_column_name].values:
-            cmd = ['conda', 'run', '-n', self.env_name, 'python3', f'{self.ligand_mpnn_dir}run.py', '--pdb_path', pdb_file,  '--out_folder', f'{self.output_dir}']
+            cmd = ['conda', 'run', '-n', self.env_name, 'python3', f'{self.ligand_mpnn_dir}run.py', '--pdb_path', pdb_file,  '--out_folder', f'{self.output_dir}']
             if self.args is not None:
                 cmd.extend(self.args)
             result = subprocess.run(cmd, check=True)

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/predict_catalyticsite_step.py RENAMED

@@ -8,6 +8,8 @@ import numpy as np
 from tqdm import tqdm
 import random
 import string
+import logging
+import os
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.INFO)
@@ -15,15 +17,17 @@ logger.setLevel(logging.INFO)
 class ActiveSitePred(Step):
-    def __init__(self, id_col: str, seq_col: str, squidly_dir: str, num_threads: int = 1,
-                 esm2_model = 'esm2_t36_3B_UR50D', tmp_dir: str = None):
+    def __init__(self, id_col: str, seq_col: str, num_threads: int = 1,
+                 esm2_model = 'esm2_t36_3B_UR50D', tmp_dir: str = None, args=None):
         self.id_col = id_col
         self.seq_col = seq_col
         self.num_threads = num_threads or 1
-        self.squidly_dir = squidly_dir
         self.esm2_model = esm2_model
         self.tmp_dir = tmp_dir
+        self.args = None
+        self.logger = logging.getLogger(__name__)
+        print('Predicting Active Sites using Squidly')
     def __to_fasta(self, df: pd.DataFrame, tmp_dir: str):
         tmp_label = ''.join(random.choices(string.ascii_letters + string.digits, k=10))
@@ -37,13 +41,17 @@ class ActiveSitePred(Step):
     def __execute(self, df: pd.DataFrame, tmp_dir: str):
         input_filename = self.__to_fasta(df, tmp_dir)
         # Might have an issue if the things are not correctly installed in the same dicrectory
-        result = subprocess.run(['python', Path(__file__).parent/'predict_catalyticsite_run.py', '--out', str(tmp_dir),
-                                '--input', input_filename, '--squidly_dir', self.squidly_dir, '--esm2_model', self.esm2_model], capture_output=True, text=True)
-        output_filename = f'{input_filename.replace(".fasta", "_results.pkl")}'
+        cmd = []
+        cmd = ['squidly', 'run', input_filename, self.esm2_model, tmp_dir]
+        if self.args is not None:
+            cmd.extend(self.args)
+        result = self.run(cmd)
         if result.stderr:
-            logger.error(result.stderr)
-        logger.info(result.stdout)
+            self.logger.error(result.stderr)
+            print(result.stderr)
+        else:
+            self.logger.info(result.stdout)
+        output_filename = os.path.join(tmp_dir, 'squidly_ensemble.csv')
         return output_filename
     def execute(self, df: pd.DataFrame) -> pd.DataFrame:
@@ -61,10 +69,10 @@ class ActiveSitePred(Step):
                 df = pd.DataFrame()
                 print(output_filenames)
                 for p in output_filenames:
-                    sub_df = pd.read_pickle(p)
+                    sub_df = pd.read_csv(p)
                     df = pd.concat([df, sub_df])
                 return df
             else:
                 output_filename = self.__execute(df, tmp_dir)
-                return pd.read_pickle(output_filename)
+                return pd.read_csv(output_filename)
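In this version the step shells out to `squidly run <fasta> <esm2_model> <out_dir>` and appends any extra arguments. Note that the constructor accepts `args` but assigns `self.args = None`, so as written the extra arguments appear never to reach the command. The sketch below isolates the command construction so that behaviour is easy to test. `build_squidly_cmd` is a hypothetical helper, and `--hypothetical-flag` is a placeholder, not a real squidly option.

```python
def build_squidly_cmd(input_fasta, esm2_model, out_dir, extra_args=None):
    """Build the argument list for the squidly call shown in the diff.

    extra_args: optional iterable of additional CLI tokens, appended verbatim.
    """
    cmd = ["squidly", "run", str(input_fasta), esm2_model, str(out_dir)]
    if extra_args:
        cmd.extend(str(arg) for arg in extra_args)
    return cmd


plain_cmd = build_squidly_cmd("tmp/in.fasta", "esm2_t36_3B_UR50D", "tmp")
# Placeholder flag only, to show that extra tokens are carried through.
extended_cmd = build_squidly_cmd(
    "tmp/in.fasta", "esm2_t36_3B_UR50D", "tmp", ["--hypothetical-flag", 1]
)
```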

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/similarity_foldseek_step.py RENAMED

@@ -125,11 +125,10 @@ class FoldSeek(Step):
                          continue
                 df = pd.DataFrame()
                 print(output_filenames)
-                for p in output_filenames:
-                    sub_df = pd.read_pickle(p)
+                for sub_df in output_filenames:
                     df = pd.concat([df, sub_df])
                 return df
             else:
-                output_filename = self.__execute([df, tmp_dir])
-                return pd.read_pickle(output_filename)
+                df = self.__execute([df, tmp_dir])
+                return df

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/similarity_reaction_step.py RENAMED

@@ -24,22 +24,26 @@ class ReactionDist(Step):
         self.num_threads = num_threads
     def __execute(self, data: list) -> np.array:
-        reaction_df = data
-        tmp_label = ''.join(random.choices(string.ascii_letters + string.digits, k=10))
-        rxn = rdChemReactions.ReactionFromSmarts(self.smiles_string)
-        rxn_fp = rdChemReactions.CreateStructuralFingerprintForReaction(rxn)
+        reaction_df = data
         rows = []
+        fp_params = rdChemReactions.ReactionFingerprintParams()
+        rxn = rdChemReactions.ReactionFromSmarts(self.smiles_string)
+        rxn_fp = rdChemReactions.CreateStructuralFingerprintForReaction(rxn, ReactionFingerPrintParams=fp_params) #rdChemReactions.CreateStructuralFingerprintForReaction(rxn, ReactionFingerPrintParams=fp_params)
         # compare all fp pairwise without duplicates
         for smile_id, smiles in tqdm(reaction_df[[self.id_column_name, self.smiles_column_name]].values): # -1 so the last fp will not be used
             mol_ = rdChemReactions.ReactionFromSmarts(smiles)
-            fps = rdChemReactions.CreateStructuralFingerprintForReaction(mol_)
-            rows.append([smile_id,
+            # Note: if you don't pass , ReactionFingerPrintParams=fp_params you get different results
+            # i.e. reactions that don't appear to be the same are reported as similar of 1.0
+            # https://github.com/rdkit/rdkit/discussions/5263
+            fps = rdChemReactions.CreateStructuralFingerprintForReaction(mol_, ReactionFingerPrintParams=fp_params)
+            rows.append([smile_id,
+                         self.smiles_string,
                          smiles,
                          DataStructs.TanimotoSimilarity(fps, rxn_fp),
                          DataStructs.RusselSimilarity(fps, rxn_fp),
                          DataStructs.CosineSimilarity(fps, rxn_fp)])
-        distance_df = pd.DataFrame(rows, columns=[self.id_column_name, 'TargetSmiles', 'TanimotoSimilarity', 'RusselSimilarity', 'CosineSimilarity'])
+        distance_df = pd.DataFrame(rows, columns=[self.id_column_name, 'QuerySmiles', 'TargetSmiles', 'TanimotoSimilarity', 'RusselSimilarity', 'CosineSimilarity'])
         return distance_df
     def execute(self, df: pd.DataFrame) -> pd.DataFrame:
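
Reviewer note: each row in this hunk records three similarity scores between the query fingerprint and a target fingerprint. As a rough aid to reading those columns, the sketch below restates the textbook definitions on plain Python sets of on-bit indices. It is not RDKit code: the function names and the `n_bits` parameter are illustrative assumptions, and RDKit's exact conventions should be checked against its own documentation.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def russell_rao(a: set, b: set, n_bits: int) -> float:
    """Shared on-bits divided by the total fingerprint length (assumed n_bits)."""
    return len(a & b) / n_bits


def cosine(a: set, b: set) -> float:
    """Shared on-bits divided by the geometric mean of the two on-bit counts."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Toy fingerprints given as sets of on-bit positions (hypothetical data).
query = {1, 4, 7, 9}
target = {1, 4, 8}

print(tanimoto(query, target))
print(russell_rao(query, target, n_bits=16))
print(cosine(query, target))
```

Because the three scores normalise the shared-bit count differently, they can rank the same pair of reactions differently, which is presumably why the step reports all of them side by side.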

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/similarity_substrate_step.py RENAMED Viewed

@@ -1,3 +1,5 @@
+import sys
+sys.path.append('/disk1/ariane/vscode/enzyme-tk/')
 from enzymetk.step import Step
 import pandas as pd
 import numpy as np
@@ -8,6 +10,7 @@ from rdkit.Chem import rdChemReactions
 import pandas as pd
 import os
 from rdkit.DataStructs import FingerprintSimilarity
+from rdkit.Chem import rdFingerprintGenerator
 from rdkit.Chem.Fingerprints import FingerprintMols
 import random
 import string
@@ -28,12 +31,15 @@ class SubstrateDist(Step):
         tmp_label = ''.join(random.choices(string.ascii_letters + string.digits, k=10))
         rxn = Chem.MolFromSmiles(self.smiles_string)
-        rxn_fp = FingerprintMols.FingerprintMol(rxn)
+        # Switched to using morgan fingerprints https://greglandrum.github.io/rdkit-blog/posts/2023-01-18-fingerprint-generator-tutorial.html
+        # followed this tutorial
+        mfpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2,fpSize=2048)
+        rxn_fp = mfpgen.GetFingerprint(rxn)
         rows = []
         # compare all fp pairwise without duplicates
         for smile_id, smiles in tqdm(reaction_df[[self.id_column_name, self.smiles_column_name]].values): # -1 so the last fp will not be used
             mol_ = Chem.MolFromSmiles(smiles)
-            fps = FingerprintMols.FingerprintMol(mol_)
+            fps = mfpgen.GetFingerprint(mol_)
             rows.append([smile_id,
                          smiles,
                          DataStructs.TanimotoSimilarity(fps, rxn_fp),

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/step.py RENAMED Viewed

@@ -36,8 +36,9 @@ class Step():
         """ Execute some shit """
         return df
-    def run(self, cmd: list) -> None:
-        """ Run a command """
+    def run(self, cmd: list):
+        """ Run a command """
+        result = None
         start = timeit.default_timer()
         u.dp(['Running command', ' '.join([str(c) for c in cmd])])
         result = subprocess.run(cmd, capture_output=True, text=True)
@@ -48,8 +49,9 @@ class Step():
             logger.error(result.stderr)
         logger.info(result.stdout)
         u.dp(['Time for command to run (min): ', (timeit.default_timer() - start)/60])
+        return result
-    def __rshift__(self, other: Step) -> Step:
+    def __rshift__(self, other: Step)   :
         return Pipeline(self, other)
     def __rlshift__(self, other: pd.DataFrame) -> pd.DataFrame:
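
Reviewer note: the practical effect of `run` now returning `result` is that callers can inspect the exit code and captured output instead of relying only on the log. The sketch below is a simplified, standalone stand-in for the method (logging and timing omitted), not the package's implementation; the function name and the example command are assumptions chosen for illustration.

```python
import subprocess
import sys


def run(cmd: list) -> subprocess.CompletedProcess:
    """Run a command, capture stdout/stderr as text, and return the result."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result


# Use the current interpreter as a portable example command.
result = run([sys.executable, "-c", "print('ok')"])
if result.returncode != 0:
    # The caller can now decide how to react to a failed command.
    print("command failed:", result.stderr)
else:
    print(result.stdout.strip())
```

With the 0.0.2 signature (`-> None`) a step had no direct way to tell whether the external tool succeeded; with the returned `CompletedProcess`, a check on `returncode` is enough.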

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk.egg-info/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
-Metadata-Version: 2.2
+Metadata-Version: 2.4
 Name: enzymetk
-Version: 0.0.2
+Version: 0.0.6
 Home-page: https://github.com/arianemora/enzyme-tk/
 Author: Ariane Mora
 Author-email: ariane.n.mora@gmail.com
@@ -18,17 +18,12 @@ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: fair-esm
 Requires-Dist: scikit-learn
 Requires-Dist: numpy
 Requires-Dist: seaborn
 Requires-Dist: sciutil
-Requires-Dist: pandas==2.1.4
+Requires-Dist: pandas
 Requires-Dist: biopython
-Requires-Dist: sentence_transformers
-Requires-Dist: pubchempy
-Requires-Dist: pyfaidx
-Requires-Dist: spacy
 Dynamic: author
 Dynamic: author-email
 Dynamic: classifier
@@ -37,6 +32,7 @@ Dynamic: description-content-type
 Dynamic: home-page
 Dynamic: keywords
 Dynamic: license
+Dynamic: license-file
 Dynamic: project-url
 Dynamic: requires-dist
 Dynamic: requires-python
@@ -45,6 +41,9 @@ Dynamic: requires-python
 Enzyme-tk is a collection of tools for enzyme engineering, setup as interoperable modules that act on dataframes. These modules are designed to be imported into pipelines for specific function. For this reason, `steps` as each module is called (e.g. finding similar proteins with `BLAST` would be considered a step) are designed to be as light as possible. An example of a pipeline is the [annotate-e](https://github.com/ArianeMora/annotate-e)  ` pipeline, this acts to annotate a fasta with an ensemble of methods (each is designated as an Enzyme-tk step).
+**If you have any issues installing, let me know - this has been tested only on Linux/Ubuntu. Please post an issue!**
 ## Installation
 ## Install base package to import modules
@@ -71,6 +70,7 @@ This is a work-in progress! e.g. some tools (e.g. proteInfer and CLEAN) require
 Here are some of the tools that have been implemented to be chained together as a pipeline:
+[boltz2](https://github.com/jwohlwend/boltz)
 [mmseqs2](https://github.com/soedinglab/mmseqs2)
 [foldseek](https://github.com/steineggerlab/foldseek)
 [diamond](https://github.com/bbuchfink/diamond)
@@ -89,6 +89,7 @@ Here are some of the tools that have been implemented to be chained together as
 [fasttree](https://morgannprice.github.io/fasttree/)
 [Porechop](https://github.com/rrwick/Porechop)
 [prokka](https://github.com/tseemann/prokka)
 ## Things to note
 All the tools use the conda env of `enzymetk` by default.
@@ -120,6 +121,8 @@ The steps are the main building blocks of the pipeline. They are responsible for
 BLAST is a tool for searching a database of sequences for similar sequences. Here you can either pass a database that you have already created or pass the sequences as part of your dataframe and pass the label column (this needs to have two values: reference and query) reference refers to sequences that you want to search against and query refers to sequences that you want to search for.
+Note you need to have installed the BLAST environment.
 ```python
 id_col = 'Entry'
 seq_col = 'Sequence'
@@ -148,6 +151,34 @@ df = pd.DataFrame(rows, columns=[id_col, seq_col])
 print(df)
 df << (ActiveSitePred(id_col, seq_col, squidly_dir, num_threads) >> Save('tmp/squidly_as_pred.pkl'))
+```
+### Boltz2
+Boltz2 is a model for predicting structures. Note you need docko installed as I run via that.
+Below is an example using boltz with 4 threads, and uses a cofactor (intermediate in this case). Just set to be None for a single substrate version.
+```
+import sys
+from enzymetk.dock_boltz_step import Boltz
+from enzymetk.save_step import Save
+import pandas as pd
+import os
+os.environ['MKL_THREADING_LAYER'] = 'GNU'
+output_dir = 'tmp/'
+num_threads = 4
+id_col = 'Entry'
+seq_col = 'Sequence'
+substrate_col = 'Substrate'
+intermediate_col = 'Intermediate'
+rows = [['P0DP23_boltz_8999', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p1', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP23_boltz_p2', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p3', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]'],
+        ['P0DP24_boltz_p4', 'MALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAAMALWMRLLPLLALLALWGPDPAAA', 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', 'CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)[O-])CCC(=O)[O-].[Fe]']]
+df = pd.DataFrame(rows, columns=[id_col, seq_col, substrate_col, intermediate_col])
+df << (Boltz(id_col, seq_col, substrate_col, intermediate_col, f'{output_dir}', num_threads) >> Save(f'{output_dir}test.pkl'))
 ```
 ### Chai

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk.egg-info/SOURCES.txt RENAMED Viewed

@@ -5,6 +5,7 @@ enzymetk/__init__.py
 enzymetk/annotateEC_CLEAN_step.py
 enzymetk/annotateEC_CREEP_step.py
 enzymetk/annotateEC_proteinfer_step.py
+enzymetk/dock_boltz_step.py
 enzymetk/dock_chai_step.py
 enzymetk/dock_vina_step.py
 enzymetk/embedchem_chemberta_step.py
@@ -13,6 +14,7 @@ enzymetk/embedchem_rxnfp_step.py
 enzymetk/embedchem_selformer_run.py
 enzymetk/embedchem_selformer_step.py
 enzymetk/embedchem_unimol_step.py
+enzymetk/embedprotein_esm3_step.py
 enzymetk/embedprotein_esm_step.py
 enzymetk/esm-extract.py
 enzymetk/filter_sequence_step.py

enzymetk-0.0.6/enzymetk.egg-info/requires.txt ADDED Viewed

@@ -0,0 +1,6 @@
+scikit-learn
+numpy
+seaborn
+sciutil
+pandas
+biopython

{enzymetk-0.0.2 → enzymetk-0.0.6}/setup.py RENAMED Viewed

@@ -61,17 +61,13 @@ setup(name='enzymetk',
               'enzymetk = enzymetk.__main__:main'
           ]
       },
-      install_requires=['fair-esm',
+      install_requires=[
                         'scikit-learn',
                         'numpy',
                         'seaborn',
                         'sciutil',
-                        'pandas==2.1.4',
-                        'biopython',
-                        'sentence_transformers',
-                        'pubchempy',
-                        'pyfaidx',
-                        'spacy'],
+                        'pandas',
+                        'biopython'],
       python_requires='>=3.8',
       data_files=[("", ["LICENSE"])]
       )
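
Reviewer note: with `fair-esm`, `sentence_transformers`, `pubchempy`, `pyfaidx` and `spacy` no longer listed in `install_requires`, an environment built from 0.0.6 may lack them unless they are installed separately. The sketch below is one hedged way to check for that before running a step. The helper name is hypothetical, and the mapping from PyPI names to import names (for example `fair-esm` imported as `esm`) is an assumption.

```python
import importlib.util


def missing_optional(packages: list) -> list:
    """Return the top-level module names from `packages` that cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Import names assumed for the packages dropped from install_requires.
dropped = ["esm", "sentence_transformers", "pubchempy", "pyfaidx", "spacy"]

missing = missing_optional(dropped)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
```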

enzymetk-0.0.2/enzymetk.egg-info/requires.txt DELETED Viewed

@@ -1,11 +0,0 @@
-fair-esm
-scikit-learn
-numpy
-seaborn
-sciutil
-pandas==2.1.4
-biopython
-sentence_transformers
-pubchempy
-pyfaidx
-spacy

{enzymetk-0.0.2 → enzymetk-0.0.6}/LICENSE RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/annotateEC_proteinfer_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedchem_chemberta_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedchem_rxnfp_run.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedchem_rxnfp_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedchem_selformer_run.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedchem_selformer_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/embedchem_unimol_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/esm-extract.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/filter_sequence_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/filter_structure_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/generate_msa_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/generate_oligopool_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/generate_tree_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/main.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/metagenomics_porechop_trim_reads_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/metagenomics_prokka_annotate_genes.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/pipeline.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/predict_activity_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/predict_catalyticsite_run.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/reducedim_pca_run.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/reducedim_vae_run.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/reducedim_vae_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/save_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/sequence_search_blast.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk/similarity_mmseqs_step.py RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk.egg-info/dependency_links.txt RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk.egg-info/entry_points.txt RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/enzymetk.egg-info/top_level.txt RENAMED Viewed

File without changes

{enzymetk-0.0.2 → enzymetk-0.0.6}/setup.cfg RENAMED Viewed

File without changes
