PyPI - enzymetk - Versions diffs - 0.0.6__tar.gz → 0.0.7__tar.gz - Mend

enzymetk 0.0.6tar.gz → 0.0.7tar.gz


Files changed (57)

{enzymetk-0.0.6 → enzymetk-0.0.7}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: enzymetk
-Version: 0.0.6
+Version: 0.0.7
 Home-page: https://github.com/arianemora/enzyme-tk/
 Author: Ariane Mora
 Author-email: ariane.n.mora@gmail.com
@@ -13,17 +13,22 @@ Classifier: Intended Audience :: Science/Research
 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Natural Language :: English
 Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
-Requires-Python: >=3.8
+Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: scikit-learn
 Requires-Dist: numpy
 Requires-Dist: seaborn
 Requires-Dist: sciutil
+Requires-Dist: tqdm
 Requires-Dist: pandas
 Requires-Dist: biopython
+Requires-Dist: transformers
+Requires-Dist: torch
+Requires-Dist: huggingface_hub
 Dynamic: author
 Dynamic: author-email
 Dynamic: classifier
@@ -49,8 +54,91 @@ Enzyme-tk is a collection of tools for enzyme engineering, setup as interoperabl
 ## Install base package to import modules
 ```bash
+conda create --name enzymetk python==3.12 -y
 pip install enzymetk
+# Install torch for your specific cuda version
+pip install torch torchvision #--index-url https://download.pytorch.org/whl/cu130
 ```
+## If you're at the bleeding edge, and going to use older models e.g. chemBERTa2 you may need to run
+```
+pip uninstall transformers -y
+pip install "transformers<5"
+```
+## For each module run install the first time you're running it
+This will install as a venv where possible and conda where the tools don't allow for venvs.
+See specific tools for info.
+```
+bm = BLAST(id_col, seq_col, label_col)
+bm.install() # by default will create a venv or if needed a conda env
+```
+Note if you want to use your specific environment you can install externally and override the installed venv or conda env e.g.
+```
+bm = BLAST(id_col, seq_col, label_col)
+bm.conda = 'blast_env' # an already installed env on your computer
+bm.venv = None # so it knows to use conda i.e. forces it not to use venv
+```
+## Modules requiring conda
+- CREEP [not tested again]
+- CLEAN [not tested again]
+- ProteInfer [not tested again]
+## Modules able to run in venv
+- BLAST [cpu, tested with both, see notebook]
+- ChemBERTA [cpu, colab]
+- Boltz
+- Chai: conda install -c conda-forge pdbfixer
+- esm2/3 [cpu, see notebook]
+- foldseek [tested and works]
+- ligandmpnn
+- mmseqs [can get working...]
+- msa []
+- reaction_similarity [good, cpu]
+- rxnfp [needs specific python version so not easy in colab] hence install is with `enzymetk install rxnfp` requires conda
+- substrate_similarity [good, cpu]
+- tree
+- unimol [good, cpu]
+Docko git@github.com:ArianeMora/docko.git
+ValueError: CCD component ALA not found!
+boltz predict  boltz.fasta --use_msa_server --cache ./mol
+srun -p gpu --qos=normal --gres=gpu:1 --pty --mem=64G  --time=000:30:00 bash
+pipelines: reads --> poreChop --> Flye --> Prokka --> Squidly --> Foldseek --> Boltz --> Chai
+pipelines: seqs --> BLAST --> Proteinfer --> Foldseek -->  MMseqs --> ClustalOmega --> FastTree
+pipelines: reactions --> rxnFP --> selformer --> uniMol --> chemBERTa2 --> RDkit reaction similarity
+| Module                       | Name          | Description                                                                       | Colab ipynb|
+|------------------------------|---------------|-----------------------------------------------------------------------------------|------------|
+| Metagenomics                 | PoreChop      | Used to filter adapters for nanopore sequences in metagenomics   pipeline.        | y          |
+| Metagenomics                 | Flye          | Used to assemble the metagenomes.                                                 | ?          |
+| Metagenomics                 | Prokka        | Annotation of genes within the genome.                                            | ?          |
+| Function prediction          | Proteinfer    | Annotation of genes to function (GO or EC class) using ML.                        | 33          |
+| Function prediction          | CLEAN         | Annotation of genes to EC class using ML.                                         | 11          |
+| Function prediction          | CREEP         | Annotation of genes to EC class using ML.                                         | 13          |
+| Function prediction          | Func-e        | Annotation of genes to reaction using ML.                                         | This study. |
+| Function prediction          | Squidly       | Annotation of catalytic residues using ML.                                        | 36          |
+| Embedding generation         | ESM2 & 3      | Conversion of amino acid sequence to a numerical embedding   using a PLM.         | 46,47       |
+| Embedding generation         | RxnFP         | Conversion of reaction smiles to a numerical embedding using a   language model.  | 48          |
+| Embedding generation         | Selformer     | Conversion of reaction selfies to a numerical embedding using   a language model. | 49          |
+| Embedding generation         | Uni-mol       | Conversion of molecule smiles to a numerical embedding using a   language model.  | 50          |
+| Embedding generation         | ChemBERTa2    | Conversion of reaction smiles to a numerical embedding using a   language model.  | 51          |
+| Docking                      | Chai          | Diffusion based folding of a protein and ligand.                                  | 42          |
+| Docking                      | Boltz         | Diffusion based folding of a protein and ligand.                                  | 52          |
+| Similarity                   | Diamond       | Sequence similarity calculation   using basic local alignment search.             | 53          |
+| Similarity                   | Foldseek      | Fast structure similarity search.                                                 | 54          |
+| Similarity                   | MMseqs        | Fast sequence clustering.                                                         | 55          |
+| Docking                      | StructureZyme | Alignment and calculation of structure metrics.                                   | 56          |
+| Oligo design                 | Oligopoolio   | Calculation of oligo fragments for gene assembly.                                 | This study. |
+| Sequencing                   | LevSeq        | Sequence verification of protein variants.                                        | 34          |
+| MSA generation               | ClustalOmega  | Creation of multiple sequence alignments (MSA).                                   | 57          |
+| Phylogenetic tree generation | FastTree      | Creation of multiple phylogenetic trees.                                          | 58          |
 ### Install only the specific requirements you need (recomended)
@@ -121,7 +209,11 @@ The steps are the main building blocks of the pipeline. They are responsible for
 BLAST is a tool for searching a database of sequences for similar sequences. Here you can either pass a database that you have already created or pass the sequences as part of your dataframe and pass the label column (this needs to have two values: reference and query) reference refers to sequences that you want to search against and query refers to sequences that you want to search for.
-Note you need to have installed the BLAST environment.
+Note you can install 2 ways, with a conda env by command line:
+```
+enzymetk install_diamond
+```
 ```python
 id_col = 'Entry'
@@ -288,6 +380,16 @@ df << (CREEP(id_col, reaction_col, CREEP_cache_dir='/disk1/share/software/CREEP/
 EmbedESM is a tool for embedding a set of sequences using ESM2.
+Either in your own conda env: `pip install esm-fair` or you can run:
+```
+id_col = 'Entry'
+seq_col = 'Sequence'
+label_col = 'ActiveSite'
+esm = EmbedESM(id_col, seq_col, extraction_method='mean', tmp_dir='tmp', rep_num=36) # i.e. the representation number you want usually the last layer
+esm.install() # And follow the instructions to activate the env
+```
 ```python
 from enzymetk.embedprotein_esm_step import EmbedESM
 from enzymetk.save_step import Save

{enzymetk-0.0.6 → enzymetk-0.0.7}/README.md RENAMED

@@ -10,8 +10,91 @@ Enzyme-tk is a collection of tools for enzyme engineering, setup as interoperabl
 ## Install base package to import modules
 ```bash
+conda create --name enzymetk python==3.12 -y
 pip install enzymetk
+# Install torch for your specific cuda version
+pip install torch torchvision #--index-url https://download.pytorch.org/whl/cu130
 ```
+## If you're at the bleeding edge, and going to use older models e.g. chemBERTa2 you may need to run
+```
+pip uninstall transformers -y
+pip install "transformers<5"
+```
+## For each module run install the first time you're running it
+This will install as a venv where possible and conda where the tools don't allow for venvs.
+See specific tools for info.
+```
+bm = BLAST(id_col, seq_col, label_col)
+bm.install() # by default will create a venv or if needed a conda env
+```
+Note if you want to use your specific environment you can install externally and override the installed venv or conda env e.g.
+```
+bm = BLAST(id_col, seq_col, label_col)
+bm.conda = 'blast_env' # an already installed env on your computer
+bm.venv = None # so it knows to use conda i.e. forces it not to use venv
+```
+## Modules requiring conda
+- CREEP [not tested again]
+- CLEAN [not tested again]
+- ProteInfer [not tested again]
+## Modules able to run in venv
+- BLAST [cpu, tested with both, see notebook]
+- ChemBERTA [cpu, colab]
+- Boltz
+- Chai: conda install -c conda-forge pdbfixer
+- esm2/3 [cpu, see notebook]
+- foldseek [tested and works]
+- ligandmpnn
+- mmseqs [can get working...]
+- msa []
+- reaction_similarity [good, cpu]
+- rxnfp [needs specific python version so not easy in colab] hence install is with `enzymetk install rxnfp` requires conda
+- substrate_similarity [good, cpu]
+- tree
+- unimol [good, cpu]
+Docko git@github.com:ArianeMora/docko.git
+ValueError: CCD component ALA not found!
+boltz predict  boltz.fasta --use_msa_server --cache ./mol
+srun -p gpu --qos=normal --gres=gpu:1 --pty --mem=64G  --time=000:30:00 bash
+pipelines: reads --> poreChop --> Flye --> Prokka --> Squidly --> Foldseek --> Boltz --> Chai
+pipelines: seqs --> BLAST --> Proteinfer --> Foldseek -->  MMseqs --> ClustalOmega --> FastTree
+pipelines: reactions --> rxnFP --> selformer --> uniMol --> chemBERTa2 --> RDkit reaction similarity
+| Module                       | Name          | Description                                                                       | Colab ipynb|
+|------------------------------|---------------|-----------------------------------------------------------------------------------|------------|
+| Metagenomics                 | PoreChop      | Used to filter adapters for nanopore sequences in metagenomics   pipeline.        | y          |
+| Metagenomics                 | Flye          | Used to assemble the metagenomes.                                                 | ?          |
+| Metagenomics                 | Prokka        | Annotation of genes within the genome.                                            | ?          |
+| Function prediction          | Proteinfer    | Annotation of genes to function (GO or EC class) using ML.                        | 33          |
+| Function prediction          | CLEAN         | Annotation of genes to EC class using ML.                                         | 11          |
+| Function prediction          | CREEP         | Annotation of genes to EC class using ML.                                         | 13          |
+| Function prediction          | Func-e        | Annotation of genes to reaction using ML.                                         | This study. |
+| Function prediction          | Squidly       | Annotation of catalytic residues using ML.                                        | 36          |
+| Embedding generation         | ESM2 & 3      | Conversion of amino acid sequence to a numerical embedding   using a PLM.         | 46,47       |
+| Embedding generation         | RxnFP         | Conversion of reaction smiles to a numerical embedding using a   language model.  | 48          |
+| Embedding generation         | Selformer     | Conversion of reaction selfies to a numerical embedding using   a language model. | 49          |
+| Embedding generation         | Uni-mol       | Conversion of molecule smiles to a numerical embedding using a   language model.  | 50          |
+| Embedding generation         | ChemBERTa2    | Conversion of reaction smiles to a numerical embedding using a   language model.  | 51          |
+| Docking                      | Chai          | Diffusion based folding of a protein and ligand.                                  | 42          |
+| Docking                      | Boltz         | Diffusion based folding of a protein and ligand.                                  | 52          |
+| Similarity                   | Diamond       | Sequence similarity calculation   using basic local alignment search.             | 53          |
+| Similarity                   | Foldseek      | Fast structure similarity search.                                                 | 54          |
+| Similarity                   | MMseqs        | Fast sequence clustering.                                                         | 55          |
+| Docking                      | StructureZyme | Alignment and calculation of structure metrics.                                   | 56          |
+| Oligo design                 | Oligopoolio   | Calculation of oligo fragments for gene assembly.                                 | This study. |
+| Sequencing                   | LevSeq        | Sequence verification of protein variants.                                        | 34          |
+| MSA generation               | ClustalOmega  | Creation of multiple sequence alignments (MSA).                                   | 57          |
+| Phylogenetic tree generation | FastTree      | Creation of multiple phylogenetic trees.                                          | 58          |
 ### Install only the specific requirements you need (recomended)
@@ -82,7 +165,11 @@ The steps are the main building blocks of the pipeline. They are responsible for
 BLAST is a tool for searching a database of sequences for similar sequences. Here you can either pass a database that you have already created or pass the sequences as part of your dataframe and pass the label column (this needs to have two values: reference and query) reference refers to sequences that you want to search against and query refers to sequences that you want to search for.
-Note you need to have installed the BLAST environment.
+Note you can install 2 ways, with a conda env by command line:
+```
+enzymetk install_diamond
+```
 ```python
 id_col = 'Entry'
@@ -249,6 +336,16 @@ df << (CREEP(id_col, reaction_col, CREEP_cache_dir='/disk1/share/software/CREEP/
 EmbedESM is a tool for embedding a set of sequences using ESM2.
+Either in your own conda env: `pip install esm-fair` or you can run:
+```
+id_col = 'Entry'
+seq_col = 'Sequence'
+label_col = 'ActiveSite'
+esm = EmbedESM(id_col, seq_col, extraction_method='mean', tmp_dir='tmp', rep_num=36) # i.e. the representation number you want usually the last layer
+esm.install() # And follow the instructions to activate the env
+```
 ```python
 from enzymetk.embedprotein_esm_step import EmbedESM
 from enzymetk.save_step import Save

enzymetk-0.0.7/enzymetk/__init__.py ADDED

@@ -0,0 +1,122 @@
+###############################################################################
+#                                                                             #
+#    This program is free software: you can redistribute it and/or modify     #
+#    it under the terms of the GNU General Public License as published by     #
+#    the Free Software Foundation, either version 3 of the License, or        #
+#    (at your option) any later version.                                      #
+#                                                                             #
+#    This program is distributed in the hope that it will be useful,          #
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of           #
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the            #
+#    GNU General Public License for more details.                             #
+#                                                                             #
+#    You should have received a copy of the GNU General Public License        #
+#    along with this program. If not, see <http://www.gnu.org/licenses/>.     #
+#                                                                             #
+###############################################################################
+"""
+Author: Ariane Mora
+Date: March 2025
+"""
+__title__ = 'enzymetk'
+__description__ = 'Toolkit for enzymes and what not'
+__url__ = 'https://github.com/arianemora/enzyme-tk/'
+__version__ = '0.0.7'
+__author__ = 'Ariane Mora'
+__author_email__ = 'ariane.n.mora@gmail.com'
+__license__ = 'GPL3'
+# Core classes
+from enzymetk.step import Step, Pipeline
+from enzymetk.save_step import Save
+# EC Annotation
+from enzymetk.annotateEC_CLEAN_step import CLEAN
+from enzymetk.annotateEC_CREEP_step import CREEP
+from enzymetk.annotateEC_proteinfer_step import ProteInfer
+# Docking
+from enzymetk.dock_boltz_step import Boltz
+from enzymetk.dock_chai_step import Chai
+from enzymetk.dock_vina_step import Vina
+# Chemical Embeddings
+from enzymetk.embedchem_chemberta_step import ChemBERT
+from enzymetk.embedchem_rxnfp_step import RxnFP
+from enzymetk.embedchem_selformer_step import SelFormer
+from enzymetk.embedchem_unimol_step import UniMol
+# Protein Embeddings
+from enzymetk.embedprotein_esm_step import EmbedESM
+from enzymetk.embedprotein_esm3_step import EmbedESM3
+# Sequence Generation/Alignment
+from enzymetk.generate_msa_step import ClustalOmega
+from enzymetk.generate_tree_step import FastTree
+# Protein Design
+from enzymetk.inpaint_ligandMPNN_step import LigandMPNN
+# Metagenomics
+from enzymetk.metagenomics_porechop_trim_reads_step import PoreChop
+from enzymetk.metagenomics_prokka_annotate_genes import Prokka
+# Prediction
+from enzymetk.predict_catalyticsite_step import ActiveSitePred
+# Sequence Search
+from enzymetk.sequence_search_blast import BLAST
+# Similarity Search
+from enzymetk.similarity_foldseek_step import FoldSeek
+from enzymetk.similarity_mmseqs_step import MMseqs
+from enzymetk.similarity_reaction_step import ReactionDist
+from enzymetk.similarity_substrate_step import SubstrateDist
+# Structure Search (aliased to avoid conflict with similarity_foldseek_step.FoldSeek)
+from enzymetk.structure_search_foldseek import FoldSeek as StructureFoldSeek
+__all__ = [
+    # Core
+    'Step',
+    'Pipeline',
+    'Save',
+    # EC Annotation
+    'CLEAN',
+    'CREEP',
+    'ProteInfer',
+    # Docking
+    'Boltz',
+    'Chai',
+    'Vina',
+    # Chemical Embeddings
+    'ChemBERT',
+    'RxnFP',
+    'SelFormer',
+    'UniMol',
+    # Protein Embeddings
+    'EmbedESM',
+    'EmbedESM3',
+    # Sequence Generation/Alignment
+    'ClustalOmega',
+    'FastTree',
+    # Protein Design
+    'LigandMPNN',
+    # Metagenomics
+    'PoreChop',
+    'Prokka',
+    # Prediction
+    'ActiveSitePred',
+    # Sequence Search
+    'BLAST',
+    # Similarity Search
+    'FoldSeek',
+    'MMseqs',
+    'ReactionDist',
+    'SubstrateDist',
+    # Structure Search
+    'StructureFoldSeek',
+]

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/annotateEC_CREEP_step.py RENAMED

@@ -5,9 +5,12 @@ import subprocess
 import logging
 import numpy as np
 import os
+from enzymetk.step import run_script
+from pathlib import Path
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.INFO)
+SCRIPT_DIR = Path(__file__).parent.resolve()
 """
 import os
@@ -38,9 +41,14 @@ class CREEP(Step):
         self.args_extract = args_extract
         self.args_retrieval = args_retrieval
+    def install(self, env_args=None):
+        # Try to automatically install CREEP conda env
+        run_script('install_CREEP.sh', verbose=True)
+        self.CREEP_dir = SCRIPT_DIR.parent.resolve() / 'conda_envs' / 'CREEP'
+        self.CREEP_cache_dir = f'{self.CREEP_dir}/data/'
     def __execute(self, df: pd.DataFrame, tmp_dir: str):
-        tmp_dir = '/disk1/ariane/vscode/degradeo/pipeline/tmp/'
-        input_filename = f'{tmp_dir}/creepasjkdkajshdkja.csv'
+        input_filename = f'{tmp_dir}/input.csv'
         df.to_csv(input_filename, index=False)
         cmd = ['conda', 'run', '-n', self.env_name, 'python', f'{self.CREEP_dir}scripts/step_02_extract_CREEP.py', '--pretrained_folder',
                                  f'{self.CREEP_cache_dir}output/easy_split',
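
Review note (not part of the package): the new `install()` sets `self.CREEP_dir` to a `pathlib.Path`, while the unchanged command line still builds the script path with `f'{self.CREEP_dir}scripts/...'`. If the rest of the file is as shown, that f-string only works when `CREEP_dir` ends in a slash. A minimal sketch of the difference, using a made-up base directory and `PurePosixPath` so the output is the same on every platform:

```python
from pathlib import PurePosixPath

# Hypothetical install location; the package derives the real one from SCRIPT_DIR.
creep_dir = PurePosixPath('/opt/enzymetk') / 'conda_envs' / 'CREEP'

# Formatting a path object into a string adds no trailing slash,
# so the directory name and 'scripts' fuse together.
concatenated = f'{creep_dir}scripts/step_02_extract_CREEP.py'

# Joining with the / operator inserts the separator.
joined = creep_dir / 'scripts' / 'step_02_extract_CREEP.py'

print(concatenated)  # /opt/enzymetk/conda_envs/CREEPscripts/step_02_extract_CREEP.py
print(joined)        # /opt/enzymetk/conda_envs/CREEP/scripts/step_02_extract_CREEP.py
```

The same reasoning applies to any `CREEP_dir` passed to the constructor without a trailing slash.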

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/annotateEC_proteinfer_step.py RENAMED

@@ -5,7 +5,10 @@ from multiprocessing.dummy import Pool as ThreadPool
 from tempfile import TemporaryDirectory
 import os
 import subprocess
+from enzymetk.step import run_script
+from pathlib import Path
+SCRIPT_DIR = Path(__file__).parent.resolve()
 class ProteInfer(Step):
@@ -53,6 +56,12 @@ class ProteInfer(Step):
         self.ec3_filter = ec3_filter
         self.ec4_filter = ec4_filter
+    def install(self, env_args=None):
+        # Try to automatically install CREEP conda env
+        run_script('install_CREEP.sh', verbose=True)
+        self.CREEP_dir = SCRIPT_DIR.parent.resolve() / 'conda_envs' / 'CREEP'
+        self.CREEP_cache_dir = f'{self.CREEP_dir}/data/'
     def __execute(self, data: list) -> np.array:
         df, tmp_dir = data
         # Make sure in the directory of proteinfer

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/dock_boltz_step.py RENAMED

@@ -1,6 +1,5 @@
 from enzymetk.step import Step
 import pandas as pd
-from docko.boltz import run_boltz_affinity
 import logging
 import numpy as np
 from multiprocessing.dummy import Pool as ThreadPool
@@ -9,16 +8,40 @@ from multiprocessing.dummy import Pool as ThreadPool
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.INFO)
+try:
+    from docko.boltz import run_boltz_affinity
+except ImportError as e:
+    print("Boltz: Needs docko package. Install with: pip install docko.")
 class Boltz(Step):
-    def __init__(self, id_col: str, seq_col: str, substrate_col: str, intermediate_col: str, output_dir: str, num_threads: int):
+    def __init__(self, id_col: str, seq_col: str, substrate_col: str, intermediate_col: str, output_dir: str,
+                num_threads: 1, env_name = None, args=None):
+        super().__init__()
         self.id_col = id_col
         self.seq_col = seq_col
         self.substrate_col = substrate_col
         self.intermediate_col = intermediate_col
         self.output_dir = output_dir or None
         self.num_threads = num_threads or 1
+        self.conda = env_name
+        self.env_name = env_name
+        self.args = args
+    def install(self, env_args=None):
+        # e.g. env args could by python=='3.1.1.
+        self.install_venv(env_args)
+        # Now the specific
+        try:
+            cmd = [f'{self.env_name}/bin/pip', 'install', 'docko']
+            self.run(cmd)
+        except Exception as e:
+            cmd = [f'{self.env_name}/bin/pip3', 'install', 'docko']
+            self.run(cmd)
+        self.run(cmd)
+        # Now set the venv to be the location:
+        self.venv = f'{self.env_name}/bin/python'
     def __execute(self, df: pd.DataFrame) -> pd.DataFrame:
         output_filenames = []
@@ -28,11 +51,15 @@ class Boltz(Step):
             if not isinstance(substrate, str):
                 substrate = ''
             print(run_id, seq, substrate)
-            run_boltz_affinity(run_id, seq, substrate, self.output_dir, intermediate)
+            if self.args:
+                run_boltz_affinity(run_id, seq, substrate, self.output_dir, intermediate, self.args)
+            else:
+                run_boltz_affinity(run_id, seq, substrate, self.output_dir, intermediate)
             output_filenames.append(f'{self.output_dir}/{run_id}/')
         return output_filenames
     def execute(self, df: pd.DataFrame) -> pd.DataFrame:
         if self.output_dir:
             if self.num_threads > 1:
                 pool = ThreadPool(self.num_threads)
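
Review note (not part of the package): the new `install()` tries `pip`, falls back to `pip3` on an exception, and then calls `self.run(cmd)` once more unconditionally, so the install command runs twice on the success path. Whether `self.run` raises on a failed command is not visible in this diff. A minimal sketch of a "first command that succeeds" fallback, assuming plain `subprocess` rather than the package's own runner; `run_first_success` and the stand-in commands are hypothetical:

```python
import subprocess
import sys


def run_first_success(candidates):
    """Run each candidate command in order and return the first that exits with 0."""
    errors = []
    for cmd in candidates:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError as exc:  # e.g. the executable does not exist
            errors.append(f'{cmd[0]}: {exc}')
            continue
        if result.returncode == 0:
            return cmd
        errors.append(f'{cmd[0]}: exit code {result.returncode}')
    raise RuntimeError('All candidates failed: ' + '; '.join(errors))


# Stand-in commands so the sketch runs anywhere. In the package the candidates
# would be [f'{env}/bin/pip', 'install', 'docko'] and the pip3 equivalent.
candidates = [
    ['no-such-pip-binary-for-this-sketch', 'install', 'docko'],
    [sys.executable, '-c', 'print("installed")'],
]
chosen = run_first_success(candidates)
print(chosen[0] == sys.executable)
```

Each candidate runs at most once, and a failure of every candidate surfaces as a single error instead of a silent pass.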

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/dock_chai_step.py RENAMED

@@ -1,9 +1,13 @@
 from enzymetk.step import Step
 import pandas as pd
-from docko.chai import run_chai
 import logging
 import numpy as np
+try:
+    from docko.chai import run_chai
+except ImportError as e:
+    print("Chai: Needs docko package. Install with: pip install docko.")
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.INFO)
@@ -11,7 +15,9 @@ logger.setLevel(logging.INFO)
 class Chai(Step):
-    def __init__(self, id_col: str, seq_col: str, substrate_col: str, cofactor_col: str, output_dir: str, num_threads: int):
+    def __init__(self, id_col: str, seq_col: str, substrate_col: str, cofactor_col: str, output_dir: str,
+                num_threads: 1, venv_name = 'enzymetk', env_name = None):
+        super().__init__()
         self.id_col = id_col
         self.seq_col = seq_col
         self.substrate_col = substrate_col
@@ -19,6 +25,21 @@ class Chai(Step):
         self.output_dir = output_dir or None
         self.num_threads = num_threads or 1
+    def install(self, env_args=None):
+        # e.g. env args could by python=='3.1.1.
+        self.install_venv(env_args)
+        # Now the specific
+        try:
+            cmd = [f'{self.env_name}/bin/pip', 'install', 'docko']
+            self.run(cmd)
+        except Exception as e:
+            cmd = [f'{self.env_name}/bin/pip3', 'install', 'docko']
+            self.run(cmd)
+        self.run(cmd)
+        # Now set the venv to be the location:
+        self.venv = f'{self.env_name}/bin/python'
     def __execute(self, df: pd.DataFrame, tmp_dir: str) -> pd.DataFrame:
         output_filenames = []
         for run_id, seq, substrate, cofactor in df[[self.id_col, self.seq_col, self.substrate_col, self.cofactor_col]].values:
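
Review note (not part of the package): the guarded import prints a hint when `docko` is missing, but the name `run_chai` is then left undefined, so the first call would fail with a `NameError` rather than the install hint. A minimal sketch of one alternative, assuming a hypothetical helper `optional_import` that returns a stub raising a clear `ImportError` at call time:

```python
import importlib


def optional_import(module_name, attr):
    """Return module.attr, or a stub that raises a helpful ImportError when called."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        package = module_name.split('.')[0]

        def _missing(*args, **kwargs):
            raise ImportError(
                f'{attr} needs the {package} package. Install with: pip install {package}'
            )

        return _missing
    return getattr(module, attr)


# Works for an installed module (stdlib used here so the sketch runs anywhere) ...
sqrt = optional_import('math', 'sqrt')
print(sqrt(9.0))

# ... and defers the error for a missing one until the function is used.
# In the package this would be: run_chai = optional_import('docko.chai', 'run_chai')
missing = optional_import('no_such_package_for_this_sketch.chai', 'run_chai')
```

Importing the step module stays cheap and quiet, and the user sees the install hint at the point where the dependency is actually needed.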

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/dock_vina_step.py RENAMED

@@ -1,12 +1,18 @@
 from enzymetk.step import Step
 import pandas as pd
-from docko.docko import *
 import logging
 import numpy as np
 import os
 from pathlib import Path
 from multiprocessing.dummy import Pool as ThreadPool
+try:
+    from docko.docko import *
+except ImportError as e:
+    print("Vina: Needs docko package. Install with: pip install docko.")
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.INFO)
@@ -14,7 +20,8 @@ logger.setLevel(logging.INFO)
 class Vina(Step):
     def __init__(self, id_col: str, structure_col: str, sequence_col: str,
-                 substrate_col: str, substrate_name_col: str, active_site_col: str, output_dir: str, num_threads: int):
+                 substrate_col: str, substrate_name_col: str, active_site_col: str, output_dir: str, num_threads: 1,
+                 venv_name = 'enzymetk', env_name = None):
         print('Expects active site residues as a string separated by |. Zero indexed.')
         self.id_col = id_col
         self.structure_col = structure_col
@@ -25,6 +32,20 @@ class Vina(Step):
         self.output_dir = Path( output_dir) or None
         self.num_threads = num_threads or 1
+    def install(self, env_args=None):
+        # e.g. env args could by python=='3.1.1.
+        self.install_venv(env_args)
+        # Now the specific
+        try:
+            cmd = [f'{self.env_name}/bin/pip', 'install', 'docko']
+            self.run(cmd)
+        except Exception as e:
+            cmd = [f'{self.env_name}/bin/pip3', 'install', 'docko']
+            self.run(cmd)
+        self.run(cmd)
+        # Now set the venv to be the location:
+        self.venv = f'{self.env_name}/bin/python'
     def __execute(self, df: pd.DataFrame) -> pd.DataFrame:
         output_filenames = []
         # ToDo: update to create from sequence if the path doesn't exist.
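
Review note (not part of the package): several constructors in this release change `num_threads: int` to `num_threads: 1`. In Python that `1` is an annotation, not a default value, so callers must still pass the argument. A minimal sketch of the difference, using two stand-in functions rather than the real classes:

```python
import inspect


def annotated(num_threads: 1):
    # `1` is only an annotation here; the argument remains required.
    return num_threads or 1


def defaulted(num_threads: int = 1):
    # `int` is the annotation and `1` is the default value.
    return num_threads or 1


def has_default(func, name):
    """Report whether the named parameter of func has a default value."""
    param = inspect.signature(func).parameters[name]
    return param.default is not inspect.Parameter.empty


print(has_default(annotated, 'num_threads'))
print(has_default(defaulted, 'num_threads'))
```

If a default of one thread was intended, the `defaulted` form is the one that lets callers omit `num_threads`.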

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/embedchem_chemberta_step.py RENAMED

@@ -16,7 +16,6 @@ class ChemBERT(Step):
         self.seq_len_limit = 500
         self.embedding_len = 768
     def __execute(self, data: list) -> np.array:
         results = []
         for v in data:

{enzymetk-0.0.6 → enzymetk-0.0.7}/enzymetk/embedchem_rxnfp_step.py RENAMED

@@ -16,10 +16,28 @@ logger.setLevel(logging.INFO)
 class RxnFP(Step):
-    def __init__(self, smiles_col: str, num_threads: int, env_name: str = 'rxnfp'):
+    def __init__(self, smiles_col: str, num_threads: 1,
+                 env_name = 'rxnfp', venv_name = None):
+        super().__init__()
         self.value_col = smiles_col
         self.num_threads = num_threads or 1
+        self.conda = env_name
         self.env_name = env_name
+        self.venv = venv_name if venv_name else f'{env_name}/bin/python'
+    def install(self, env_args=['--python', '3.8']):
+        # e.g. env args could by python=='3.1.1.
+        self.install_conda(env_args=env_args)
+        # Now the specific
+        try:
+            cmd = [f'pip', 'install', 'rxnfp', 'rdkit=2020.03.3', 'tmap', 'numpy==1.23', 'sciutil']
+            self.run(cmd)
+        except Exception as e:
+            cmd = [f'pip', 'install', 'rxnfp', 'rdkit=2020.03.3', 'tmap', 'numpy==1.23', 'sciutil']
+            self.run(cmd)
+        self.run(cmd)
+        # Now set the venv to be the location:
+        self.conda = f'{self.env_name}'
     def __execute(self, df: pd.DataFrame, tmp_dir: str) -> pd.DataFrame:
         tmp_label = ''.join(random.choices(string.ascii_letters + string.digits, k=10))
@@ -27,7 +45,7 @@ class RxnFP(Step):
         output_filename = f'{tmp_dir}/rxnfp_{tmp_label}.pkl'
         input_filename = f'{tmp_dir}/input_{tmp_label}.csv'
         df.to_csv(input_filename, index=False)
-        cmd = ['conda', 'run', '-n', self.env_name, 'python', Path(__file__).parent/'embedchem_rxnfp_run.py', '--out', output_filename,
+        cmd = ['python', Path(__file__).parent/'embedchem_rxnfp_run.py', '--out', output_filename,
                                 '--input', input_filename, '--label', self.value_col]
         self.run(cmd)
         # Might have an issue if the things are not correctly installed in the same dicrectory
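
Review note (not part of the package): the command no longer starts with `conda run -n <env>`, so choosing the environment is presumably left to `self.run` together with the `conda` and `venv` attributes described in the README. `Step.run` is not shown in this diff, so the following is only a sketch of how such a dispatch could look; `build_command` and its rules are hypothetical:

```python
def build_command(cmd, venv=None, conda=None):
    """Rewrite a command so it runs inside the selected environment.

    Hypothetical helper: a venv interpreter path takes priority, then a conda
    environment name, otherwise the command is returned unchanged.
    """
    cmd = [str(part) for part in cmd]  # Path objects become plain strings
    if venv:
        # Swap a bare 'python' for the interpreter inside the venv.
        if cmd and cmd[0] == 'python':
            return [venv] + cmd[1:]
        return cmd
    if conda:
        # `conda run -n NAME ...` executes the command in the named environment.
        return ['conda', 'run', '-n', conda] + cmd
    return cmd


base = ['python', 'embedchem_rxnfp_run.py', '--label', 'smiles']
print(build_command(base, conda='rxnfp'))
print(build_command(base, venv='rxnfp/bin/python'))
```

Under this reading, setting `venv = None` and `conda = 'blast_env'`, as the README suggests, would select the conda branch.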
