PyPI - openprotein-python - Versions diffs - 0.8.4__tar.gz → 0.8.6__tar.gz - Mend

openprotein-python 0.8.4__tar.gz → 0.8.6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (87)

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: openprotein-python
-Version: 0.8.4
+Version: 0.8.6
 Summary: OpenProtein Python interface.
 Author-email: Mark Gee <markgee@ne47.bio>, "Timothy Truong Jr." <ttruong@ne47.bio>, Tristan Bepler <tbepler@ne47.bio>
 License-Expression: MIT
@@ -28,14 +28,14 @@ The OpenProtein.AI Python Interface provides a user-friendly library to interact
 # Table of Contents
-|   | Workflow                                           | Description                                          |
-|---|----------------------------------------------------|------------------------------------------------------|
-| 0 | [`Quick start`](#Quick-start)                    | Quick start guide                     |
-| 1 | [`Installation`](https://docs.openprotein.ai/api-python/installation.html)                    | Install guide for pip and conda.                     |
-| 2 | [`Session management`](https://docs.openprotein.ai/api-python/overview.html)        | An overview of the OpenProtein Python Client & the asynchronous jobs system. |
-| 3 | [`Asssay-based Sequence Learning`](https://docs.openprotein.ai/api-python/core_workflow.html) | Covers core tasks such as data upload, model training & prediction, and sequence design. |
-| 4 | [`De Novo prediction & generative models (PoET)`](https://docs.openprotein.ai/api-python/poet_workflow.html) | Covers PoET, a protein LLM for *de novo* scoring, as well as sequence generation. |
-| 5 | [`Protein Language Models & Embeddings`](https://docs.openprotein.ai/api-python/embedding_workflow.html) | Covers methods for creating sequence embeddings with proprietary & open-source models. |
+|   | Workflow                                                                                                     | Description                                                                              |
+|---|--------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
+| 0 | [`Quick start`](#Quick-start)                                                                                | Quick start guide                                                                        |
+| 1 | [`Installation`](https://docs.openprotein.ai/api-python/installation.html)                                   | Install guide for pip and conda.                                                         |
+| 2 | [`Session management`](https://docs.openprotein.ai/api-python/overview.html)                                 | An overview of the OpenProtein Python Client & the asynchronous jobs system.             |
+| 3 | [`Asssay-based Sequence Learning`](https://docs.openprotein.ai/api-python/core_workflow.html)                | Covers core tasks such as data upload, model training & prediction, and sequence design. |
+| 4 | [`De Novo prediction & generative models (PoET)`](https://docs.openprotein.ai/api-python/poet_workflow.html) | Covers PoET, a protein LLM for *de novo* scoring, as well as sequence generation.        |
+| 5 | [`Protein Language Models & Embeddings`](https://docs.openprotein.ai/api-python/embedding_workflow.html)     | Covers methods for creating sequence embeddings with proprietary & open-source models.   |
 # Quick-start

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/README.md RENAMED

@@ -10,14 +10,14 @@ The OpenProtein.AI Python Interface provides a user-friendly library to interact
 # Table of Contents
-|   | Workflow                                           | Description                                          |
-|---|----------------------------------------------------|------------------------------------------------------|
-| 0 | [`Quick start`](#Quick-start)                    | Quick start guide                     |
-| 1 | [`Installation`](https://docs.openprotein.ai/api-python/installation.html)                    | Install guide for pip and conda.                     |
-| 2 | [`Session management`](https://docs.openprotein.ai/api-python/overview.html)        | An overview of the OpenProtein Python Client & the asynchronous jobs system. |
-| 3 | [`Asssay-based Sequence Learning`](https://docs.openprotein.ai/api-python/core_workflow.html) | Covers core tasks such as data upload, model training & prediction, and sequence design. |
-| 4 | [`De Novo prediction & generative models (PoET)`](https://docs.openprotein.ai/api-python/poet_workflow.html) | Covers PoET, a protein LLM for *de novo* scoring, as well as sequence generation. |
-| 5 | [`Protein Language Models & Embeddings`](https://docs.openprotein.ai/api-python/embedding_workflow.html) | Covers methods for creating sequence embeddings with proprietary & open-source models. |
+|   | Workflow                                                                                                     | Description                                                                              |
+|---|--------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
+| 0 | [`Quick start`](#Quick-start)                                                                                | Quick start guide                                                                        |
+| 1 | [`Installation`](https://docs.openprotein.ai/api-python/installation.html)                                   | Install guide for pip and conda.                                                         |
+| 2 | [`Session management`](https://docs.openprotein.ai/api-python/overview.html)                                 | An overview of the OpenProtein Python Client & the asynchronous jobs system.             |
+| 3 | [`Asssay-based Sequence Learning`](https://docs.openprotein.ai/api-python/core_workflow.html)                | Covers core tasks such as data upload, model training & prediction, and sequence design. |
+| 4 | [`De Novo prediction & generative models (PoET)`](https://docs.openprotein.ai/api-python/poet_workflow.html) | Covers PoET, a protein LLM for *de novo* scoring, as well as sequence generation.        |
+| 5 | [`Protein Language Models & Embeddings`](https://docs.openprotein.ai/api-python/embedding_workflow.html)     | Covers methods for creating sequence embeddings with proprietary & open-source models.   |
 # Quick-start

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/common/__init__.py RENAMED

@@ -1,5 +1,5 @@
 """Common classes and utilities for OpenProtein."""
-from .features import FeatureType
+from .features import Feature, FeatureType
 from .model_metadata import ModelDescription, ModelMetadata, TokenInfo
-from .reduction import ReductionType
+from .reduction import Reduction, ReductionType

openprotein_python-0.8.6/openprotein/common/features.py ADDED

@@ -0,0 +1,15 @@
+"""Feature types used in OpenProtein."""
+from enum import Enum
+from typing import Literal
+class FeatureType(str, Enum):
+    PLM = "PLM"
+    SVD = "SVD"
+# NOTE: only works with python 3.12+
+# Feature = Literal[*tuple([r.value for r in FeatureType])]
+Feature = Literal["PLM", "SVD"]

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/common/model_metadata.py RENAMED

@@ -28,6 +28,6 @@ class ModelMetadata(BaseModel):
     max_sequence_length: int | None = None
     dimension: int
     output_types: list[str]
-    input_tokens: list[str]
+    input_tokens: list[str] | None
     output_tokens: list[str] | None = None
     token_descriptions: list[list[TokenInfo]]

openprotein_python-0.8.6/openprotein/common/reduction.py ADDED

@@ -0,0 +1,14 @@
+"""Reduction types used in OpenProtein."""
+from enum import Enum
+from typing import Literal
+class ReductionType(str, Enum):
+    MEAN = "MEAN"
+    SUM = "SUM"
+# NOTE: only works with python 3.12+
+# Reduction = Literal[*tuple([r.value for r in ReductionType])]
+Reduction = Literal["MEAN", "SUM"]
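
Note on the two files added above: each pairs a `str`-based `Enum` with a hand-written `Literal` alias so that callers can pass plain strings. The sketch below shows one way such a pair can be validated at runtime and kept in sync. `normalize_reduction` is a hypothetical helper for illustration, not part of the package; it follows the same `ReductionType(reduction).value` check that appears in the `models.py` hunks of this diff.

```python
from enum import Enum
from typing import Literal, get_args


class ReductionType(str, Enum):
    MEAN = "MEAN"
    SUM = "SUM"


# Hand-written alias, as in the diff, so it also works before Python 3.12.
Reduction = Literal["MEAN", "SUM"]


def normalize_reduction(reduction: "Reduction | ReductionType | None") -> "str | None":
    """Hypothetical helper: validate a reduction and return its string value."""
    if reduction is None:
        return None
    # ReductionType(...) accepts plain strings and existing enum members,
    # and raises ValueError for anything that is not a known value.
    return ReductionType(reduction).value


# Guard against the alias and the enum drifting apart.
assert set(get_args(Reduction)) == {r.value for r in ReductionType}
```

Because the alias is maintained by hand, a check like the final assertion is one way to notice when a new enum member is added without updating the `Literal`.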

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/data/api.py RENAMED

@@ -64,7 +64,9 @@ def assaydata_post(
         raise APIError(f"Unable to post assay data: {response.text}")
-def assaydata_list(session: APISession) -> list[AssayMetadata]:
+def assaydata_list(
+    session: APISession, limit: int | None = None, offset: int | None = None
+) -> list[AssayMetadata]:
     """
     Get a list of all assay metadata.
@@ -72,6 +74,10 @@ def assaydata_list(session: APISession) -> list[AssayMetadata]:
     ----------
     session : APISession
         Session object for API communication.
+    limit : int, optional
+        Limit the number of assays to return.
+    offset : int, optional
+        Offset of assays to retrieve. Useful with limit.
     Returns
     -------
@@ -84,7 +90,12 @@ def assaydata_list(session: APISession) -> list[AssayMetadata]:
         If an error occurs during the API request.
     """
     endpoint = "v1/assaydata"
-    response = session.get(endpoint)
+    params = {}
+    if limit is not None:
+        params["limit"] = limit
+    if offset is not None:
+        params["offset"] = offset
+    response = session.get(endpoint, params=params)
     if response.status_code == 200:
         return TypeAdapter(list[AssayMetadata]).validate_python(response.json())
     else:
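
Note on the hunk above: `assaydata_list` now forwards `limit` and `offset` only when they are set. The sketch below reproduces that parameter handling and shows how a caller might page through results. `iter_pages` and `fake_fetch` are hypothetical names, and the assumption that the server signals the end with a short or empty page is not confirmed by the diff.

```python
from typing import Callable, Iterator


def build_params(limit: "int | None" = None, offset: "int | None" = None) -> dict:
    """Mirror the diff: only send the parameters that were actually set."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return params


def iter_pages(fetch: Callable[[dict], list], page_size: int) -> Iterator[list]:
    """Hypothetical helper: walk pages until a short or empty page arrives."""
    offset = 0
    while True:
        page = fetch(build_params(limit=page_size, offset=offset))
        if page:
            yield page
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in for the server: slices a local list instead of calling the API.
_ITEMS = list(range(7))


def fake_fetch(params: dict) -> list:
    start = params.get("offset", 0)
    limit = params.get("limit", len(_ITEMS))
    return _ITEMS[start : start + limit]
```

Testing `is not None` rather than truthiness matters here: `offset=0` is a valid value and is still sent.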

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/data/data.py RENAMED

@@ -14,16 +14,23 @@ class DataAPI:
     def __init__(self, session: APISession):
         self.session = session
-    def list(self) -> list[AssayDataset]:
+    def list(
+        self, limit: int | None = None, offset: int | None = None
+    ) -> list[AssayDataset]:
         """
         List all assay datasets.
+        limit : int, optional
+            Limit the number of assays to return.
+        offset : int, optional
+            Offset of assays to retrieve. Useful with limit.
         Returns
         -------
         List[AssayDataset]
             List of all assay datasets.
         """
-        metadata = api.assaydata_list(self.session)
+        metadata = api.assaydata_list(session=self.session, limit=limit, offset=offset)
         return [AssayDataset(self.session, x) for x in metadata]
     def create(

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/embeddings/models.py RENAMED

@@ -3,7 +3,13 @@
 from typing import TYPE_CHECKING
 from openprotein.base import APISession
-from openprotein.common import FeatureType, ModelMetadata, ReductionType
+from openprotein.common import (
+    Feature,
+    FeatureType,
+    ModelMetadata,
+    Reduction,
+    ReductionType,
+)
 from openprotein.data import AssayDataset, AssayMetadata, DataAPI
 from openprotein.errors import InvalidParameterError
@@ -199,9 +205,9 @@ class EmbeddingModel:
     def fit_svd(
         self,
         sequences: list[bytes] | list[str] | None = None,
-        assay: AssayDataset | None = None,
+        assay: AssayDataset | AssayMetadata | None = None,
         n_components: int = 1024,
-        reduction: ReductionType | None = None,
+        reduction: Reduction | ReductionType | None = None,
         **kwargs,
     ) -> "SVDModel":
         """
@@ -236,6 +242,11 @@ class EmbeddingModel:
         # local import for cyclic dep
         from openprotein.svd import SVDAPI
+        # runtime check on value
+        if isinstance(reduction, str):
+            reduction = ReductionType(reduction)
+            reduction = reduction.value
         svd_api = getattr(self.session, "svd", None)
         assert isinstance(svd_api, SVDAPI)
@@ -246,9 +257,8 @@ class EmbeddingModel:
             raise InvalidParameterError(
                 "Expected either assay or sequences to fit SVD on!"
             )
-        model_id = self.id
         return svd_api.fit_svd(
-            model_id=model_id,
+            model=self,
             sequences=sequences,
             assay=assay,
             n_components=n_components,
@@ -259,9 +269,9 @@ class EmbeddingModel:
     def fit_umap(
         self,
         sequences: list[bytes] | list[str] | None = None,
-        assay: AssayDataset | None = None,
+        assay: AssayDataset | AssayMetadata | None = None,
         n_components: int = 2,
-        reduction: ReductionType | None = ReductionType.MEAN,
+        reduction: Reduction | ReductionType = "MEAN",
         **kwargs,
     ) -> "UMAPModel":
         """
@@ -274,11 +284,11 @@ class EmbeddingModel:
         ----------
         sequences : list of bytes or list of str or None, optional
             Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.
-        assay : AssayDataset or None, optional
+        assay : AssayDataset or AssayMetadata or None, optional
             Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
         n_components : int, optional
             Number of components in UMAP fit. Determines output shapes. Default is 2.
-        reduction : ReductionType or None, optional
+        reduction : Reduction or ReductionType or None, optional
             Embeddings reduction to use (e.g. mean). Defaults to MEAN.
         kwargs :
             Additional keyword arguments to be used from foundational models, e.g. prompt_id for PoET models.
@@ -296,6 +306,16 @@ class EmbeddingModel:
         # local import for cyclic dep
         from openprotein.umap import UMAPAPI
+        if reduction is None:
+            raise InvalidParameterError(
+                "Expected reduction if using EmbeddingModel to fit UMAP"
+            )
+        # runtime check on value
+        if isinstance(reduction, str):
+            reduction = ReductionType(reduction)
+            reduction = reduction.value
         umap_api = getattr(self.session, "umap", None)
         assert isinstance(umap_api, UMAPAPI)
@@ -306,12 +326,18 @@ class EmbeddingModel:
             raise InvalidParameterError(
                 "Expected either assay or sequences to fit UMAP on!"
             )
+        # get assay_id
+        assay_id = (
+            assay.assay_id
+            if isinstance(assay, AssayMetadata)
+            else assay.id if isinstance(assay, AssayDataset) else assay
+        )
         model_id = self.id
         return umap_api.fit_umap(
             model_id=model_id,
             feature_type=FeatureType.PLM,
             sequences=sequences,
-            assay_id=assay.id if assay is not None else None,
+            assay_id=assay_id,
             n_components=n_components,
             reduction=reduction,
             **kwargs,
@@ -319,7 +345,7 @@ class EmbeddingModel:
     def fit_gp(
         self,
-        assay: AssayMetadata | AssayDataset | str,
+        assay: AssayDataset | AssayMetadata | str,
         properties: list[str],
         reduction: ReductionType,
         name: str | None = None,
@@ -358,26 +384,9 @@ class EmbeddingModel:
         # local import to resolve cyclic
         from openprotein.predictor import PredictorAPI
-        data_api = getattr(self.session, "data", None)
-        assert isinstance(data_api, DataAPI)
         predictor_api = getattr(self.session, "predictor", None)
         assert isinstance(predictor_api, PredictorAPI)
-        # get assay if str
-        assay = data_api.get(assay_id=assay) if isinstance(assay, str) else assay
-        # extract assay_id
-        if len(properties) == 0:
-            raise InvalidParameterError("Expected (at-least) 1 property to train")
-        if not set(properties) <= set(assay.measurement_names):
-            raise InvalidParameterError(
-                f"Expected all provided properties to be a subset of assay's measurements: {assay.measurement_names}"
-            )
-        # TODO - support multitask
-        if len(properties) > 1:
-            raise InvalidParameterError(
-                "Training a multitask GP is not yet supported (i.e. number of properties should only be 1 for now)"
-            )
         # inject into predictor api
         return predictor_api.fit_gp(
             assay=assay,
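
Note on the `fit_umap` hunk above: the nested conditional that derives `assay_id` accepts an `AssayMetadata`, an `AssayDataset`, or a value that is passed through unchanged. The sketch below spells the same logic out as a function. The two dataclasses are minimal stand-ins for the real classes and `resolve_assay_id` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class AssayMetadata:  # stand-in: the real class exposes `assay_id`
    assay_id: str


@dataclass
class AssayDataset:  # stand-in: the real class exposes `id`
    id: str


def resolve_assay_id(assay):
    """Hypothetical helper equivalent to the nested conditional in the diff."""
    if isinstance(assay, AssayMetadata):
        return assay.assay_id
    if isinstance(assay, AssayDataset):
        return assay.id
    # Anything else (for example a str id or None) falls through unchanged.
    return assay
```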

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/fold/alphafold2.py RENAMED

@@ -47,10 +47,8 @@ class AlphaFold2Model(FoldModel):
             number of times to recycle models
         num_models : int
             number of models to train - best model will be used
-        max_msa : Union[str, int]
-            maximum number of sequences in the msa to use.
-        relax_max_iterations : int
-            maximum number of iterations
+        num_relax : int
+            maximum number of iterations for relax
         Returns
         -------
@@ -61,6 +59,7 @@ class AlphaFold2Model(FoldModel):
                 "Inputs to AlphaFold 2 have been updated. 'msa' should be supplied as 'proteins' argument. Support will be dropped in the future."
             )
             proteins = kwargs["msa"]
+            assert isinstance(proteins, MSAFuture), "Expected msa to be an MSAFuture"
         if "ligands" in kwargs or "dnas" in kwargs or "rnas" in kwargs:
             with warnings.catch_warnings():
                 warnings.simplefilter("always")  # Force warning to always show
@@ -73,6 +72,10 @@ class AlphaFold2Model(FoldModel):
             msa_to_seed: dict[str, Counter] = dict()
             for protein in proteins:
                 if (msa := protein.msa) is not None:
+                    if isinstance(msa, Protein.NullMSA):
+                        raise ValueError(
+                            "AlphaFold 2 expects MSA and does not support single sequence mode"
+                        )
                     msa_id = msa.id if isinstance(msa, MSAFuture) else msa
                     if msa_id in msa_to_seed:
                         seeds = msa_to_seed[msa_id]

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/fold/future.py RENAMED

@@ -9,11 +9,11 @@ from typing_extensions import Self
 from openprotein import config
 from openprotein.base import APISession
 from openprotein.chains import DNA, RNA, Ligand
-from openprotein.jobs import Future, MappedFuture
+from openprotein.jobs import Future, JobsAPI, MappedFuture
 from openprotein.protein import Protein
 from . import api
-from .schemas import FoldJob
+from .schemas import FoldJob, FoldMetadata
 if TYPE_CHECKING:
     from .boltz import BoltzAffinity, BoltzConfidence
@@ -34,34 +34,39 @@ class FoldResultFuture(MappedFuture, Future):
     def __init__(
         self,
         session: APISession,
-        job: FoldJob,
+        job: FoldJob | None = None,
+        metadata: FoldMetadata | None = None,
         sequences: list[bytes] | None = None,
         max_workers: int = config.MAX_CONCURRENT_WORKERS,
     ):
         """
         Initialize a FoldResultFuture instance.
-        Parameters
-        ----------
-        session : APISession
-            The API session to use for requests.
-        job : FoldJob
-            The fold job associated with this future.
-        sequences : list[bytes], optional
-            List of sequences submitted for the fold request. If None, sequences will be fetched.
-        max_workers : int, optional
-            Maximum number of concurrent workers. Default is config.MAX_CONCURRENT_WORKERS.
+        Takes in either a fold job, or the fold job metadata.
+        :meta private:
         """
-        super().__init__(session, job, max_workers)
+        # initialize the fold job metadata
+        if metadata is None:
+            if job is None or job.job_id is None:
+                raise ValueError("Expected fold metadata or job")
+            metadata = api.fold_get(session, job.job_id)
+        self._metadata = metadata
+        if job is None:
+            jobs_api = getattr(session, "jobs", None)
+            assert isinstance(jobs_api, JobsAPI)
+            job = FoldJob.create(jobs_api.get_job(job_id=metadata.job_id))
         if sequences is None:
             sequences = api.fold_get_sequences(self.session, job_id=job.job_id)
         self._sequences = sequences
+        super().__init__(session, job, max_workers)
     @classmethod
     def create(
         cls: type[Self],
         session: APISession,
-        job: FoldJob,
+        job: FoldJob | None = None,
+        metadata: FoldMetadata | None = None,
         **kwargs,
     ) -> "Self | FoldComplexResultFuture":
         """
@@ -81,7 +86,13 @@ class FoldResultFuture(MappedFuture, Future):
         FoldResultFuture or FoldComplexResultFuture
             An instance of FoldResultFuture or FoldComplexResultFuture depending on the model.
         """
-        model_id = api.fold_get(session=session, job_id=job.job_id).model_id
+        if job is not None:
+            job_id = job.job_id
+        elif metadata is not None:
+            job_id = metadata.job_id
+        else:
+            raise ValueError("Expected fold metadata or job")
+        model_id = api.fold_get(session=session, job_id=job_id).model_id
         if model_id.startswith("boltz") or model_id.startswith("alphafold"):
             return FoldComplexResultFuture(session=session, job=job, **kwargs)
         else:
@@ -101,22 +112,6 @@ class FoldResultFuture(MappedFuture, Future):
             self._sequences = api.fold_get_sequences(self.session, self.job.job_id)
         return self._sequences
-    @property
-    def model_id(self) -> str:
-        """
-        Get the model ID used for the fold request.
-        Returns
-        -------
-        str
-            Model ID.
-        """
-        if self._model_id is None:
-            self._model_id = api.fold_get(
-                session=self.session, job_id=self.job.job_id
-            ).model_id
-        return self._model_id
     @property
     def id(self):
         """
@@ -129,6 +124,17 @@ class FoldResultFuture(MappedFuture, Future):
         """
         return self.job.job_id
+    @property
+    def metadata(self) -> FoldMetadata:
+        """The fold metadata."""
+        return self._metadata
+    @property
+    def model_id(self) -> str:
+        """The fold model used."""
+        return self._metadata.model_id
     def __keys__(self):
         """
         Get the list of sequences submitted for the fold request.
@@ -189,7 +195,8 @@ class FoldComplexResultFuture(Future):
     def __init__(
         self,
         session: APISession,
-        job: FoldJob,
+        job: FoldJob | None = None,
+        metadata: FoldMetadata | None = None,
         model_id: str | None = None,
         proteins: list[Protein] | None = None,
         ligands: list[Ligand] | None = None,
@@ -216,6 +223,16 @@ class FoldComplexResultFuture(Future):
         rnas : list[RNA], optional
             List of RNAs submitted for fold request.
         """
+        # initialize the fold job metadata
+        if metadata is None:
+            if job is None or job.job_id is None:
+                raise ValueError("Expected fold metadata or job")
+            metadata = api.fold_get(session, job.job_id)
+        self._metadata = metadata
+        if job is None:
+            jobs_api = getattr(session, "jobs", None)
+            assert isinstance(jobs_api, JobsAPI)
+            job = FoldJob.create(jobs_api.get_job(job_id=metadata.job_id))
         super().__init__(session, job)
         self._model_id = model_id
         self._proteins = proteins
@@ -229,6 +246,11 @@ class FoldComplexResultFuture(Future):
         self._confidence: list["BoltzConfidence"] | None = None
         self._affinity: "BoltzAffinity | None" = None
+    @property
+    def metadata(self) -> FoldMetadata:
+        """The fold metadata."""
+        return self._metadata
     @property
     def model_id(self) -> str:
         """
@@ -433,6 +455,8 @@ class FoldComplexResultFuture(Future):
         AttributeError
             If confidence is not supported for the model.
         """
+        from .boltz import BoltzConfidence
         if self.model_id not in {"boltz-1", "boltz-1x", "boltz-2"}:
             raise AttributeError("confidence not supported for non-Boltz model")
         if self._confidence is None:
@@ -464,6 +488,8 @@ class FoldComplexResultFuture(Future):
         AttributeError
             If affinity is not supported for the model.
         """
+        from .boltz import BoltzAffinity
         if self.model_id not in {"boltz-1", "boltz-1x", "boltz-2"}:
             raise AttributeError("affinity not supported for non-Boltz model")
         if self._affinity is None:
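
Note on the constructor changes in this file: both future classes now accept either a job or its metadata and look up whichever one is missing. The sketch below isolates that resolution step. `Job` and `Metadata` are stand-ins for `FoldJob` and `FoldMetadata`, and the two `fetch_*` callables are hypothetical placeholders for `api.fold_get` and `jobs_api.get_job`.

```python
from dataclasses import dataclass


@dataclass
class Job:  # stand-in for FoldJob
    job_id: str


@dataclass
class Metadata:  # stand-in for FoldMetadata
    job_id: str
    model_id: str


def resolve(job, metadata, fetch_metadata, fetch_job):
    """Return a (job, metadata) pair given at least one of the two.

    fetch_metadata and fetch_job are hypothetical callables that look up
    the missing object by job id.
    """
    if metadata is None:
        if job is None or job.job_id is None:
            raise ValueError("Expected fold metadata or job")
        metadata = fetch_metadata(job.job_id)
    if job is None:
        job = fetch_job(metadata.job_id)
    return job, metadata
```

With metadata always resolved up front, `model_id` can be read directly from it, which matches the simplified `model_id` property in the diff.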

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/predictor/api.py RENAMED

@@ -162,8 +162,8 @@ def predictor_fit_gp_post(
         body["name"] = name
     if description is not None:
         body["description"] = description
-    # add kwargs for embeddings kwargs
-    body.update(kwargs)
+    # add kwargs for embeddings kwargs to features
+    body["features"].update(kwargs)
     response = session.post(endpoint, json=body)
     return PredictorTrainJob.model_validate(response.json())
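
Note on the hunk above: extra keyword arguments used to be merged into the top level of the request body and are now merged into its `features` section. The sketch below shows the resulting shape with a reduced body. `build_body` is a hypothetical name and the exact set of fields in the real request is an assumption; `prompt_id` is used because the diff mentions it as an example of an embedding kwarg.

```python
def build_body(features: dict, **kwargs) -> dict:
    """Hypothetical, reduced request body with only a `features` section."""
    body = {"features": dict(features)}
    # 0.8.6 behaviour: extra kwargs land inside `features`, not at top level.
    body["features"].update(kwargs)
    return body
```

For example, `build_body({"model_id": "m", "reduction": "MEAN"}, prompt_id="p")` nests `prompt_id` next to `model_id` and `reduction` rather than beside `features`.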

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/predictor/predictor.py RENAMED

@@ -1,10 +1,11 @@
 """Predictor API providing the interface to train and predict predictors."""
 from openprotein.base import APISession
-from openprotein.common import FeatureType, ReductionType
+from openprotein.common import Feature, FeatureType, Reduction, ReductionType
 from openprotein.data import (
     AssayDataset,
     AssayMetadata,
+    DataAPI,
 )
 from openprotein.embeddings import EmbeddingModel, EmbeddingsAPI
 from openprotein.errors import InvalidParameterError
@@ -120,8 +121,8 @@ class PredictorAPI:
         assay: AssayDataset | AssayMetadata | str,
         properties: list[str],
         model: EmbeddingModel | SVDModel | str,
-        feature_type: FeatureType | None = None,
-        reduction: ReductionType | None = None,
+        feature_type: Feature | FeatureType | None = None,
+        reduction: Reduction | ReductionType | None = None,
         name: str | None = None,
         description: str | None = None,
         **kwargs,
@@ -139,10 +140,10 @@ class PredictorAPI:
             Instance of either EmbeddingModel or SVDModel to use depending
             on feature type. Can also be a str specifying the model id,
             but then feature_type would have to be specified.
-        feature_type : FeatureType or None
+        feature_type : Feature or FeatureType or None
             Type of features to use for encoding sequences. "SVD" or "PLM".
             None would require model to be EmbeddingModel or SVDModel.
-        reduction  : str or None, optional
+        reduction  : Reduction or ReductionType or None, optional
             Type of embedding reduction to use for computing features.
             E.g. "MEAN" or "SUM". Used only if using EmbeddingModel, and
             must be non-nil if using an EmbeddingModel. Defaults to None.
@@ -154,6 +155,29 @@ class PredictorAPI:
         PredictorModel
             The GP model being fit.
         """
+        data_api = getattr(self.session, "data", None)
+        assert isinstance(data_api, DataAPI)
+        # 1. Check assay data input
+        # get assay if str
+        assay = data_api.get(assay_id=assay) if isinstance(assay, str) else assay
+        # extract assay_id
+        assay_id = (
+            assay.assay_id
+            if isinstance(assay, AssayMetadata)
+            else assay.id if isinstance(assay, AssayDataset) else assay
+        )
+        if len(properties) == 0:
+            raise InvalidParameterError("Expected (at-least) 1 property to train")
+        if not set(properties) <= set(assay.measurement_names):
+            raise InvalidParameterError(
+                f"Expected all provided properties to be a subset of assay's measurements: {assay.measurement_names}"
+            )
+        # TODO - support multitask
+        if len(properties) > 1:
+            raise InvalidParameterError(
+                "Training a multitask GP is not yet supported (i.e. number of properties should only be 1 for now)"
+            )
+        # 2. Check features input
         # extract feature type
         feature_type = (
             FeatureType.PLM
@@ -164,6 +188,15 @@ class PredictorAPI:
             raise InvalidParameterError(
                 "Expected feature_type to be provided if passing str model_id as model"
             )
+        # runtime check on value
+        if isinstance(feature_type, str):
+            feature_type = FeatureType(feature_type)
+        # 3. Check reduction
+        if isinstance(reduction, str):
+            reduction = ReductionType(reduction)
+            reduction = reduction.value
         # get model if model_id
         if feature_type == FeatureType.PLM:
             if reduction is None:
@@ -183,19 +216,14 @@ class PredictorAPI:
                 model = svd_api.get_svd(model)
             assert isinstance(model, SVDModel), "Expected SVDModel"
             model_id = model.id
-        # get assay_id
-        assay_id = (
-            assay.assay_id
-            if isinstance(assay, AssayMetadata)
-            else assay.id if isinstance(assay, AssayDataset) else assay
-        )
         return PredictorModel(
             session=self.session,
             job=api.predictor_fit_gp_post(
                 session=self.session,
                 assay_id=assay_id,
                 properties=properties,
-                feature_type=feature_type,
+                feature_type=feature_type.value,
                 model_id=model_id,
                 reduction=reduction,
                 name=name,
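
Note on the validation that moved into `PredictorAPI.fit_gp`: the requested properties must be non-empty, must all be measurements of the assay, and for now must number exactly one. The sketch below restates those three checks as a standalone function. `check_properties` is a hypothetical name and `InvalidParameterError` here is a local stand-in for the package's exception class.

```python
class InvalidParameterError(ValueError):
    """Stand-in for openprotein.errors.InvalidParameterError."""


def check_properties(properties: list, measurement_names: list) -> None:
    """Hypothetical helper mirroring the three checks shown in the diff."""
    if len(properties) == 0:
        raise InvalidParameterError("Expected (at-least) 1 property to train")
    if not set(properties) <= set(measurement_names):
        raise InvalidParameterError(
            f"Expected all provided properties to be a subset of assay's measurements: {measurement_names}"
        )
    # Multitask training is not supported yet, so only one property is allowed.
    if len(properties) > 1:
        raise InvalidParameterError(
            "Training a multitask GP is not yet supported"
        )
```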

{openprotein_python-0.8.4 → openprotein_python-0.8.6}/openprotein/predictor/schemas.py RENAMED

@@ -29,6 +29,8 @@ class Features(BaseModel):
     model_id: str | None = None
     reduction: str | None = None
+    # TODO: model extra kwargs
     model_config = ConfigDict(protected_namespaces=())
