PyPI - PySAR - Versions diffs - 2.5.1__tar.gz → 2.5.2__tar.gz - Mend

PySAR 2.5.1__tar.gz → 2.5.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

{pysar-2.5.1 → pysar-2.5.2}/PKG-INFO +8 -2
{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/PKG-INFO +8 -2
{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/SOURCES.txt +1 -0
{pysar-2.5.1 → pysar-2.5.2}/README.md +8 -2
{pysar-2.5.1 → pysar-2.5.2}/docs/conf.py +1 -1
{pysar-2.5.1 → pysar-2.5.2}/pySAR/__init__.py +7 -1
pysar-2.5.2/pySAR/config.py +103 -0
{pysar-2.5.1 → pysar-2.5.2}/pySAR/descriptors.py +59 -58
{pysar-2.5.1 → pysar-2.5.2}/pySAR/encoding.py +240 -37
{pysar-2.5.1 → pysar-2.5.2}/pySAR/evaluate.py +6 -4
pysar-2.5.2/pySAR/globals_.py +38 -0
{pysar-2.5.1 → pysar-2.5.2}/pySAR/model.py +157 -18
{pysar-2.5.1 → pysar-2.5.2}/pySAR/plots.py +7 -4
{pysar-2.5.1 → pysar-2.5.2}/pySAR/pyDSP.py +63 -108
{pysar-2.5.1 → pysar-2.5.2}/pySAR/pySAR.py +523 -220
{pysar-2.5.1 → pysar-2.5.2}/pySAR/utils.py +14 -10
{pysar-2.5.1 → pysar-2.5.2}/pyproject.toml +1 -1
{pysar-2.5.1 → pysar-2.5.2}/tests/test_descriptors.py +52 -0
{pysar-2.5.1 → pysar-2.5.2}/tests/test_encoding.py +164 -11
{pysar-2.5.1 → pysar-2.5.2}/tests/test_evaluate.py +3 -3
{pysar-2.5.1 → pysar-2.5.2}/tests/test_model.py +130 -12
{pysar-2.5.1 → pysar-2.5.2}/tests/test_plots.py +4 -4
{pysar-2.5.1 → pysar-2.5.2}/tests/test_pyDSP.py +66 -1
{pysar-2.5.1 → pysar-2.5.2}/tests/test_pySAR.py +208 -22
{pysar-2.5.1 → pysar-2.5.2}/tests/test_utils.py +38 -13
pysar-2.5.1/pySAR/globals_.py +0 -18
{pysar-2.5.1 → pysar-2.5.2}/LICENSE +0 -0
{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/dependency_links.txt +0 -0
{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/not-zip-safe +0 -0
{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/requires.txt +0 -0
{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/top_level.txt +0 -0
{pysar-2.5.1 → pysar-2.5.2}/pySAR/py.typed +0 -0
{pysar-2.5.1 → pysar-2.5.2}/setup.cfg +0 -0

{pysar-2.5.1 → pysar-2.5.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PySAR
-Version: 2.5.1
+Version: 2.5.2
 Summary: Analysing Sequence Activity Relationships (SARs) of protein sequences and their mutants using Machine Learning.
 Author-email: AJ McKenna <amckenna41@qub.ac.uk>
 Maintainer-email: AJ McKenna <amckenna41@qub.ac.uk>
@@ -70,8 +70,13 @@ Dynamic: license-file
 `pySAR` is a Python library for analysing Sequence Activity Relationships (SARs)/Sequence Function Relationships (SFRs) of protein sequences.
+<h2 align="center">
+  The NEW front-end app for pySAR is available
+  <a href="https://pysar-app.vercel.app/" target="_blank">here</a>!
+</h2>
 * 📖 The published research article is available [here][article].
-* 🌍 A front-end app for `pySAR` is available [here][frontend] (coming soon).
 * 💻 A quick Colab notebook demo of `pySAR` is available [here][demo].
 * 📰 A **Medium** article that dives deeper into SARs and the `pySAR` software itself is available [here][medium].
@@ -739,3 +744,4 @@ DOI: 10.1021/acs.jcim.0c00073 <br><br>
 [config]: https://github.com/amckenna41/pySAR/blob/master/CONFIG.md
 [medium]: https://ajmckenna69.medium.com/pysar-a3de9f71733f
 [directed_evolution]: https://en.wikipedia.org/wiki/Directed_evolution_(protein_engineering)
+[frontend]: https://pysar-app.vercel.app/

{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PySAR
-Version: 2.5.1
+Version: 2.5.2
 Summary: Analysing Sequence Activity Relationships (SARs) of protein sequences and their mutants using Machine Learning.
 Author-email: AJ McKenna <amckenna41@qub.ac.uk>
 Maintainer-email: AJ McKenna <amckenna41@qub.ac.uk>
@@ -70,8 +70,13 @@ Dynamic: license-file
 `pySAR` is a Python library for analysing Sequence Activity Relationships (SARs)/Sequence Function Relationships (SFRs) of protein sequences.
+<h2 align="center">
+  The NEW front-end app for pySAR is available
+  <a href="https://pysar-app.vercel.app/" target="_blank">here</a>!
+</h2>
 * 📖 The published research article is available [here][article].
-* 🌍 A front-end app for `pySAR` is available [here][frontend] (coming soon).
 * 💻 A quick Colab notebook demo of `pySAR` is available [here][demo].
 * 📰 A **Medium** article that dives deeper into SARs and the `pySAR` software itself is available [here][medium].
@@ -739,3 +744,4 @@ DOI: 10.1021/acs.jcim.0c00073 <br><br>
 [config]: https://github.com/amckenna41/pySAR/blob/master/CONFIG.md
 [medium]: https://ajmckenna69.medium.com/pysar-a3de9f71733f
 [directed_evolution]: https://en.wikipedia.org/wiki/Directed_evolution_(protein_engineering)
+[frontend]: https://pysar-app.vercel.app/

{pysar-2.5.1 → pysar-2.5.2}/PySAR.egg-info/SOURCES.txt RENAMED

@@ -9,6 +9,7 @@ PySAR.egg-info/requires.txt
 PySAR.egg-info/top_level.txt
 docs/conf.py
 pySAR/__init__.py
+pySAR/config.py
 pySAR/descriptors.py
 pySAR/encoding.py
 pySAR/evaluate.py

{pysar-2.5.1 → pysar-2.5.2}/README.md RENAMED

@@ -20,8 +20,13 @@
 `pySAR` is a Python library for analysing Sequence Activity Relationships (SARs)/Sequence Function Relationships (SFRs) of protein sequences.
+<h2 align="center">
+  The NEW front-end app for pySAR is available
+  <a href="https://pysar-app.vercel.app/" target="_blank">here</a>!
+</h2>
 * 📖 The published research article is available [here][article].
-* 🌍 A front-end app for `pySAR` is available [here][frontend] (coming soon).
 * 💻 A quick Colab notebook demo of `pySAR` is available [here][demo].
 * 📰 A **Medium** article that dives deeper into SARs and the `pySAR` software itself is available [here][medium].
@@ -688,4 +693,5 @@ DOI: 10.1021/acs.jcim.0c00073 <br><br>
 [license]: https://github.com/amckenna41/pySAR/blob/master/LICENSE
 [config]: https://github.com/amckenna41/pySAR/blob/master/CONFIG.md
 [medium]: https://ajmckenna69.medium.com/pysar-a3de9f71733f
-[directed_evolution]: https://en.wikipedia.org/wiki/Directed_evolution_(protein_engineering)
+[directed_evolution]: https://en.wikipedia.org/wiki/Directed_evolution_(protein_engineering)
+[frontend]: https://pysar-app.vercel.app/

{pysar-2.5.1 → pysar-2.5.2}/docs/conf.py RENAMED

@@ -15,7 +15,7 @@ sys.path.insert(0, os.path.abspath('..'))
 project = 'pySAR'
 copyright = '2026, AJ McKenna'
 author = 'AJ McKenna'
-release = '2.5.1'
+release = '2.5.2'
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

{pysar-2.5.1 → pysar-2.5.2}/pySAR/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """ pySAR software metadata. """
 __name__ = 'pySAR'
-__version__ = "2.5.1"
+__version__ = "2.5.2"
 __description__ = 'A Python package used to analysis Sequence Activity Relationships (SARs) of protein sequences and their mutants using Machine Learning.'
 __author__ = 'AJ McKenna: https://github.com/amckenna41'
 __authorEmail__ = 'amckenna41@qub.ac.uk'
@@ -13,6 +13,9 @@ __keywords__ = ["bioinformatics", "protein engineering", "python", "pypi", "mach
         "directed evolution", "drug discovery", "sequence activity relationships", "SAR", "aaindex", "protpy", "protein descriptors"]
 __test_suite__ = "tests"
+from .encoding import SortKey, EncodingResult
+from .config import PySARConfig
 __all__ = [
     '__version__',
     '__description__',
@@ -25,4 +28,7 @@ __all__ = [
     '__status__',
     '__keywords__',
     '__test_suite__',
+    'SortKey',
+    'EncodingResult',
+    'PySARConfig',
 ]

pysar-2.5.2/pySAR/config.py ADDED

@@ -0,0 +1,103 @@
+################################################################################
+#################                  PySARConfig                 #################
+################################################################################
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional, Union
+@dataclass
+class PySARConfig:
+    """
+    Typed configuration container for PySAR and Encoding.
+    All parameters mirror the keys in the JSON configuration files so a
+    ``PySARConfig`` instance can be used wherever a config filepath is accepted.
+    Fields left as *None* fall back to the defaults encoded in the JSON file.
+    Parameters
+    ==========
+    :config_file: str
+        Path to the JSON configuration file.  When provided all other fields
+        are used as overrides rather than replacements.
+    :dataset: str
+        Path to the CSV dataset of protein sequences and activity values.
+    :sequence_col: str
+        Name of the column in *dataset* that contains the protein sequences.
+    :activity_col: str
+        Name of the column in *dataset* that contains the activity/fitness values.
+    :algorithm: str
+        Sklearn regression algorithm name (e.g. ``'plsregression'``, ``'randomforest'``).
+    :parameters: dict
+        Keyword arguments forwarded to the sklearn model constructor.
+    :test_split: float
+        Fraction of data held back for testing (0 < test_split < 1).
+    :use_dsp: bool
+        Apply a DSP (FFT) pipeline to the AAI-encoded sequences before modelling.
+    :spectrum: str
+        Informational spectrum to use when *use_dsp* is True.
+        One of ``'power'``, ``'real'``, ``'imaginary'``, ``'absolute'``.
+    :window_type: str
+        Window function to apply before the FFT (e.g. ``'hamming'``, ``'blackman'``).
+    :filter_type: str
+        Filter to apply after the FFT (e.g. ``'savgol'``, ``'medfilt'``).
+    :descriptors_csv: str
+        Path to a pre-calculated descriptors CSV file.  When provided the
+        ``Descriptors`` class will import values directly rather than
+        recomputing them.
+    Usage
+    =====
+    >>> cfg = PySARConfig(
+    ...     config_file="thermostability.json",
+    ...     algorithm="randomforest",
+    ...     test_split=0.1,
+    ... )
+    >>> from pySAR import PySAR
+    >>> sar = PySAR(cfg.config_file, algorithm=cfg.algorithm, test_split=cfg.test_split)
+    """
+    config_file: str = ""
+    dataset: Optional[str] = None
+    sequence_col: Optional[str] = None
+    activity_col: Optional[str] = None
+    algorithm: Optional[str] = None
+    parameters: Optional[Dict[str, Any]] = None
+    test_split: Optional[float] = None
+    use_dsp: Optional[bool] = None
+    spectrum: Optional[str] = None
+    window_type: Optional[str] = None
+    filter_type: Optional[str] = None
+    descriptors_csv: Optional[str] = None
+    def to_kwargs(self) -> Dict[str, Any]:
+        """
+        Return a dict of non-None, non-config_file fields suitable for passing
+        as ``**kwargs`` to :class:`~pySAR.pySAR.PySAR` or
+        :class:`~pySAR.encoding.Encoding`.
+        Returns
+        =======
+        :kwargs: dict
+            Only fields that have been explicitly set (i.e. are not None) are
+            included.  The ``config_file`` field is excluded since it is passed
+            as a positional argument.
+        """
+        result: Dict[str, Any] = {}
+        for field_name in (
+            "dataset",
+            "sequence_col",
+            "activity_col",
+            "algorithm",
+            "parameters",
+            "test_split",
+            "use_dsp",
+            "spectrum",
+            "window_type",
+            "filter_type",
+            "descriptors_csv",
+        ):
+            value = getattr(self, field_name)
+            if value is not None:
+                result[field_name] = value
+        return result
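
Note on the hunk above: `to_kwargs` forwards only the fields that were explicitly set, and it tests `is not None` rather than truthiness, so a deliberate `False` or `0` override survives. The following is a minimal sketch of that pattern, not the pySAR implementation: `MiniConfig` is a hypothetical stand-in carrying only a few of the fields, and it iterates `dataclasses.fields` instead of the hard-coded tuple of names used in the diff.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class MiniConfig:
    """Hypothetical stand-in for PySARConfig with a subset of its fields."""
    config_file: str = ""
    algorithm: Optional[str] = None
    test_split: Optional[float] = None
    use_dsp: Optional[bool] = None

    def to_kwargs(self) -> Dict[str, Any]:
        # Keep only fields that were explicitly set; config_file is excluded
        # because it is meant to be passed positionally.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name != "config_file" and getattr(self, f.name) is not None
        }


cfg = MiniConfig(config_file="thermostability.json",
                 algorithm="randomforest",
                 use_dsp=False)
overrides = cfg.to_kwargs()
print(overrides)  # test_split is omitted because it was left as None
```

With the real class, the `to_kwargs` docstring suggests the result can be unpacked as `PySAR(cfg.config_file, **cfg.to_kwargs())`; that call is inferred from the docstring and has not been checked here against the 2.5.2 constructor.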

{pysar-2.5.1 → pysar-2.5.2}/pySAR/descriptors.py RENAMED

@@ -12,6 +12,7 @@ import itertools
 import time
 from tqdm import tqdm
 from functools import lru_cache
+from concurrent.futures import ThreadPoolExecutor, as_completed
 from .utils import *
 import protpy as protpy
@@ -374,13 +375,15 @@ class Descriptors():
     [14] B. Hollas, “An analysis of the autocorrelation descriptor for molecules,” J. Math. Chem.,
         vol. 33, no. 2, pp. 91–101, 2003.
     """
-    def __init__(self,
-                 config_file: str = "",
-                 protein_seqs: Optional[Union[pd.Series, str]] = None,
+    def __init__(self,
+                 config_file: str = "",
+                 protein_seqs: Optional[Union[pd.Series, str]] = None,
+                 n_jobs: int = 1,
                  **kwargs) -> None:
         self.config_file = config_file
         self.protein_seqs = protein_seqs
+        self.n_jobs = max(1, int(n_jobs))
         self.kwargs = locals()['kwargs'] #get any keyword argument variables of class
         self.config_parameters = {}
@@ -1995,55 +1998,40 @@ class Descriptors():
         #start time counter
         start = time.time()
-        #iterate over all descriptors, calculating each using their respective function and the protpy package
-        for descr in tqdm(self.all_descriptors_list(), unit=" descriptor", position=0,
-            desc="Descriptors", mininterval=30, ncols=90):
-            #if descriptor attribute DF is empty then call its respective get_descriptor function
-            if (descr == "amino_acid_composition" and getattr(self, "amino_acid_composition").empty):
-                self.amino_acid_composition = self.get_amino_acid_composition()
-            if (descr == "dipeptide_composition" and getattr(self, "dipeptide_composition").empty):
-                    self.dipeptide_composition = self.get_dipeptide_composition()
-            if (descr == "tripeptide_composition" and getattr(self, "tripeptide_composition").empty):
-                    self.tripeptide_composition = self.get_tripeptide_composition()
-            if (descr == "moreaubroto_autocorrelation" and getattr(self, "moreaubroto_autocorrelation").empty):
-                self.moreaubroto_autocorrelation = self.get_moreaubroto_autocorrelation()
-            if (descr == "moran_autocorrelation" and getattr(self, "moran_autocorrelation").empty):
-                self.moran_autocorrelation = self.get_moran_autocorrelation()
-            if (descr == "geary_autocorrelation" and getattr(self, "geary_autocorrelation").empty):
-                self.geary_autocorrelation = self.get_geary_autocorrelation()
-            if (descr == "ctd" and getattr(self, "ctd").empty):
-                    self.ctd = self.get_ctd()
-            if (descr == "ctd_composition" and getattr(self, "ctd_composition").empty):
-                    self.ctd_composition = self.get_ctd_composition()
-            if (descr == "ctd_transition" and getattr(self, "ctd_transition").empty):
-                self.ctd_transition = self.get_ctd_transition()
-            if (descr == "ctd_distribution" and getattr(self, "ctd_distribution").empty):
-                self.ctd_distribution = self.get_ctd_distribution()
-            if (descr == "conjoint_triad" and getattr(self, "conjoint_triad").empty):
-                    self.conjoint_triad = self.get_conjoint_triad()
-            if (descr == "sequence_order_coupling_number" and getattr(self, "sequence_order_coupling_number").empty):
-                    self.sequence_order_coupling_number = self.get_sequence_order_coupling_number()
-            if (descr == "quasi_sequence_order" and getattr(self, "quasi_sequence_order").empty):
-                    self.quasi_sequence_order = self.get_quasi_sequence_order()
-            if (descr == "pseudo_amino_acid_composition" and getattr(self, "pseudo_amino_acid_composition").empty):
-                    self.pseudo_amino_acid_composition = self.get_pseudo_amino_acid_composition()
+        #map each descriptor name to its getter for sequential and parallel dispatch
+        _getter_map = [
+            ('amino_acid_composition',                  self.get_amino_acid_composition),
+            ('dipeptide_composition',                   self.get_dipeptide_composition),
+            ('tripeptide_composition',                  self.get_tripeptide_composition),
+            ('moreaubroto_autocorrelation',             self.get_moreaubroto_autocorrelation),
+            ('moran_autocorrelation',                   self.get_moran_autocorrelation),
+            ('geary_autocorrelation',                   self.get_geary_autocorrelation),
+            ('ctd',                                     self.get_ctd),
+            ('ctd_composition',                         self.get_ctd_composition),
+            ('ctd_transition',                          self.get_ctd_transition),
+            ('ctd_distribution',                        self.get_ctd_distribution),
+            ('conjoint_triad',                          self.get_conjoint_triad),
+            ('sequence_order_coupling_number',          self.get_sequence_order_coupling_number),
+            ('quasi_sequence_order',                    self.get_quasi_sequence_order),
+            ('pseudo_amino_acid_composition',           self.get_pseudo_amino_acid_composition),
+            ('amphiphilic_pseudo_amino_acid_composition', self.get_amphiphilic_pseudo_amino_acid_composition),
+        ]
-            if (descr == "amphiphilic_pseudo_amino_acid_composition" and getattr(self, "amphiphilic_pseudo_amino_acid_composition").empty):
-                    self.amphiphilic_pseudo_amino_acid_composition = self.get_amphiphilic_pseudo_amino_acid_composition()
+        if self.n_jobs > 1:
+            #compute descriptors concurrently; skip any already populated from a prior import
+            pending = [(name, getter) for name, getter in _getter_map if getattr(self, name).empty]
+            with ThreadPoolExecutor(max_workers=self.n_jobs) as executor:
+                futures = {executor.submit(getter): name for name, getter in pending}
+                for future in tqdm(as_completed(futures), total=len(futures), unit=" descriptor",
+                                   desc="Descriptors", ncols=90):
+                    name = futures[future]
+                    setattr(self, name, future.result())
+        else:
+            #iterate over all descriptors sequentially, calculating each using their respective function
+            for name, getter in tqdm(_getter_map, unit=" descriptor", position=0,
+                                     desc="Descriptors", mininterval=30, ncols=90):
+                if getattr(self, name).empty:
+                    setattr(self, name, getter())
         #stop time counter, calculate elapsed time
         end = time.time()
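
Note on the hunk above: the long chain of `if` statements is replaced by a list of `(name, getter)` pairs, and only descriptors whose attribute is still empty are computed, either sequentially or through a thread pool. The sketch below illustrates that dispatch pattern under simplifying assumptions: `MiniDescriptors` and its two getters are hypothetical, and plain lists stand in for the pandas DataFrames (so emptiness is checked with `not value` rather than `.empty`).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class MiniDescriptors:
    """Hypothetical stand-in: plain lists replace the DataFrames used in pySAR."""

    def __init__(self, n_jobs: int = 1) -> None:
        self.n_jobs = max(1, int(n_jobs))
        self.composition = [0.5, 0.5]  # already populated, e.g. from a prior import
        self.ctd = []                  # empty, so it still needs computing
        self.calls = []                # records which getters actually ran

    def get_composition(self):
        self.calls.append("composition")
        return [0.1, 0.9]

    def get_ctd(self):
        self.calls.append("ctd")
        return [1, 2, 3]

    def get_all(self) -> None:
        getter_map = [
            ("composition", self.get_composition),
            ("ctd", self.get_ctd),
        ]
        # Only descriptors whose attribute is still empty are computed.
        pending = [(name, getter) for name, getter in getter_map
                   if not getattr(self, name)]
        if self.n_jobs > 1:
            with ThreadPoolExecutor(max_workers=self.n_jobs) as executor:
                # Map each future back to the attribute it should populate.
                futures = {executor.submit(getter): name for name, getter in pending}
                for future in as_completed(futures):
                    setattr(self, futures[future], future.result())
        else:
            for name, getter in pending:
                setattr(self, name, getter())


desc = MiniDescriptors(n_jobs=2)
desc.get_all()
print(desc.composition, desc.ctd, desc.calls)
```

The pre-populated attribute is left untouched and its getter never runs, which mirrors the "skip any already populated from a prior import" comment in the diff.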
@@ -2320,13 +2308,14 @@ class Descriptors():
        return all_descriptors
-    def _calculate_descriptor_batch(self,
-                                   descriptor_func: Callable,
+    def _calculate_descriptor_batch(self,
+                                   descriptor_func: Callable,
                                    desc_name: str = "",
                                    **kwargs) -> pd.DataFrame:
         """
         Generic helper method to calculate descriptors for all sequences, preventing code repetition.
+        Uses self.n_jobs threads to parallelise across sequences when n_jobs > 1.
         Parameters
         ==========
         :descriptor_func: Callable
@@ -2335,16 +2324,28 @@ class Descriptors():
             Name of descriptor for progress tracking
         :kwargs: dict
             Additional keyword arguments to pass to descriptor function
         Returns
         =======
         :pd.DataFrame
             Dataframe with calculated descriptor values for all sequences
         """
-        iterator = tqdm(self.protein_seqs, desc=f"Computing {desc_name}") if desc_name else self.protein_seqs
+        seqs = list(self.protein_seqs)
-        # accumulate results in a list to avoid O(n²) repeated concat
-        desc_list = [descriptor_func(seq, **kwargs) for seq in iterator]
+        if self.n_jobs <= 1:
+            iterator = tqdm(seqs, desc=f"Computing {desc_name}", ncols=90) if desc_name else seqs
+            # accumulate results in a list to avoid O(n²) repeated concat
+            desc_list = [descriptor_func(seq, **kwargs) for seq in iterator]
+        else:
+            desc_list = [None] * len(seqs)
+            with ThreadPoolExecutor(max_workers=self.n_jobs) as executor:
+                futures = {executor.submit(descriptor_func, seq, **kwargs): i
+                           for i, seq in enumerate(seqs)}
+                progress = tqdm(as_completed(futures), total=len(seqs),
+                                desc=f"Computing {desc_name}", ncols=90) if desc_name else as_completed(futures)
+                for future in progress:
+                    i = futures[future]
+                    desc_list[i] = future.result()
         return pd.concat(desc_list, ignore_index=False).reset_index(drop=True)
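
Note on the hunk above: `as_completed` yields futures in completion order, not submission order, so the parallel branch stores each result at the index recorded when the task was submitted. A minimal sketch of that order-preserving pattern follows; `ordered_parallel_map` and `slow_length` are hypothetical helpers written for illustration and are not part of pySAR.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional, Sequence


def ordered_parallel_map(func: Callable[[str], int],
                         items: Sequence[str],
                         n_jobs: int = 2) -> List[Optional[int]]:
    """Apply func to each item in threads, returning results in input order."""
    results: List[Optional[int]] = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max(1, n_jobs)) as executor:
        # Remember the input position of every submitted task.
        futures = {executor.submit(func, item): i for i, item in enumerate(items)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def slow_length(seq: str) -> int:
    # Longer sequences sleep for less time, so tasks tend to finish
    # in a different order from the one they were submitted in.
    time.sleep(0.05 / len(seq))
    return len(seq)


seqs = ["A", "ACDE", "AC"]
lengths = ordered_parallel_map(slow_length, seqs, n_jobs=3)
print(lengths)
```

Because each result is written to its own slot, the output lines up with the input sequences regardless of which thread finishes first. Whether threads speed up a given descriptor function in CPython depends on how much of its work releases the global interpreter lock.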
