seer-pas-sdk 1.1.1__tar.gz → 1.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/PKG-INFO +1 -1
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/docs/index.qmd +55 -28
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/common/__init__.py +46 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/core/sdk.py +204 -176
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/core/unsupported.py +104 -30
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk.egg-info/PKG-INFO +1 -1
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/.github/workflows/lint.yml +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/.github/workflows/publish.yml +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/.github/workflows/test.yml +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/.gitignore +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/.pre-commit-config.yaml +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/LICENSE.txt +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/README.md +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/docs/_quarto.yml +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/pyproject.toml +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/__init__.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/auth/__init__.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/auth/auth.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/common/errors.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/common/groupanalysis.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/core/__init__.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/objects/__init__.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/objects/groupanalysis.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/objects/headers.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/objects/platemap.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk/objects/volcanoplot.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk.egg-info/SOURCES.txt +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk.egg-info/dependency_links.txt +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk.egg-info/requires.txt +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/seer_pas_sdk.egg-info/top_level.txt +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/setup.cfg +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/__init__.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/conftest.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/objects/__init__.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/objects/test_platemap.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/test_auth.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/test_common.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/test_objects.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/test_sdk.py +0 -0
- {seer_pas_sdk-1.1.1 → seer_pas_sdk-1.2.0}/tests/unsupported_platemap.py +0 -0
docs/index.qmd

@@ -18,12 +18,27 @@ $ pip install seer-pas-sdk
 This page gives an overview of the SDK's feature. Complete documentation for each class / method can be found [here](reference/).
 
 ### Configuration
-PAS has a simple authorization system that
+The PAS SDK has a simple authorization system that involves your username and password fields like on the web app. You can define your username and password for your own ready reference and convenience as follows:
 ```{python}
 USERNAME = "gnu403"
 PASSWORD = "Test!234567"
 ```
 
+The PAS SDK requires either a `tenant` or `tenant_id` argument in the SDK object constructor.
+
+`tenant` refers to the user provided name of the tenant.
+
+`tenant_id` refers to the immutable and unique identifier of the tenant.
+`tenant_id` is an absolute reference to the tenant, even if the tenant name is changed.
+
+More details on multi-tenant management can be found in the [Multi Tenant Management](#multi-tenant-management) section below.
+
+You can define your tenant name or tenant ID as follows:
+```{python}
+TENANT = "My Tenant Name"
+TENANT_ID = "abc1234abc1234"
+```
+
 You may also choose to pass in an `instance` param in the SDK object to instantiate the PAS SDK to the EU or US instance.:
 ```{python}
 INSTANCE = "US"

@@ -38,10 +53,13 @@ After importing the SeerSDK module, you can instantiate an object in the followi
 from seer_pas_sdk import SeerSDK
 
 # Instantiate an SDK object with your credentials:
-sdk = SeerSDK(USERNAME, PASSWORD)
+sdk = SeerSDK(USERNAME, PASSWORD, tenant=TENANT)
 
-#
-sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE)
+# Instantiate an SDK object with your credentials and instance:
+sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE, tenant=TENANT)
+
+# Instantiate an SDK object with your credentials and tenant ID:
+sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE, tenant_id=TENANT_ID)
 ```
 
 ```{python}
@@ -56,18 +74,16 @@ Additional information and examples can also be found below.
 ### Multi Tenant Management
 Introduced in version 0.2.0
 
-By default, you will be active in your home tenant upon log in. The home tenant is defined as the organization account that issued the original invitation for the user to join PAS.
-The optional 'tenant' parameter is available in the SeerSDK constructor to navigate directly to a desired tenant.
-A notification message will display upon login.
-
-
 The following tools are available to navigate between tenants:
 ```{python}
 #| eval: false
 from seer_pas_sdk import SeerSDK
 
+# Assume tenant upon login
 sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE, tenant='My Active Tenant')
 
+sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE, tenant_id='myuuidstring-1234')
+
 # Retrieve value of current active tenant
 print(sdk.get_active_tenant())
 
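To show how these pieces fit together in one session, here is a minimal sketch that combines calls visible elsewhere in this diff (`get_user_tenant(index=False)` and `switch_tenant()` appear in the updated `SeerSDK.__init__` in sdk.py below); the credentials, tenant names, and record keys are assumptions based on that constructor code:

```python
from seer_pas_sdk import SeerSDK

# Placeholder credentials; log in directly to a tenant by its immutable ID.
sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE, tenant_id=TENANT_ID)

# List every tenant visible to this user. The constructor in sdk.py reads
# 'institution' and 'tenantId' from these records, so both keys should exist.
for record in sdk.get_user_tenant(index=False):
    print(record["institution"], record["tenantId"])

# Move to another tenant by name (or by ID), then confirm the active tenant.
sdk.switch_tenant("My Other Tenant")
print(sdk.get_active_tenant())
```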
@@ -578,10 +594,17 @@ log(analysis)
 
 
 ### Find Analyses
-Returns a list of analyses objects for the authenticated user. If
+Returns a list of analyses objects for the authenticated user. If `None` is provided for all query arguments, returns all analyses available to the user within the active tenant.
 
 ###### <u>Params</u>
-`analysis_id`: (`str`, optional) Unique ID of the analysis to be fetched, defaulted to None.
+* `analysis_id`: (`str`, optional) Unique ID of the analysis to be fetched, defaulted to None.
+* `analysis_name`: (`str`, optional) Name of the analysis to be fetched, defaulted to None. Results will be matched on a substring basis.
+* `folder_id`: (`str`, optional) Unique ID of the folder to fetch analyses from, defaulted to None.
+* `folder_name`: (`str`, optional) Name of the folder to fetch analyses from, defaulted to None.
+* `project_id`: (`str`, optional) Unique ID of the project to filter the result set of analyses, defaulted to None.
+* `project_name`: (`str`, optional) Name of the project to filter the result set of analyses, defaulted to None.
+* `plate_name`: (`str`, optional) Name of a plate to filter the result set of analyses, defaulted to None.
+* `as_df`: (`bool`, optional) Whether the result should be converted to a DataFrame, defaulted to False.
 <br>
 
 ###### <u>Returns</u>
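For orientation, a minimal usage sketch of the parameters documented above; the analysis name and ID are placeholders:

```python
# Substring match on the analysis name, returned as a pandas DataFrame.
matches = sdk.find_analyses(analysis_name="DIA", as_df=True)
print(matches.shape)

# Fetch a single analysis when its unique ID is already known
# (the implementation returns a one-element list in this case).
one = sdk.find_analyses(analysis_id="abc1234abc1234")
```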
@@ -955,11 +978,9 @@ log(sdk.group_analysis_results(group_analysis_id, box_plot_info))
 Downloads the FASTA file(s) associated with an analysis protocol. You can specify an analysis_id (the function will resolve the protocol automatically) or provide an analysis_protocol_id directly.
 
 ###### <u>Params</u>
-* `analysis_protocol_id`: (`str`, optional) ID of the analysis protocol
-
-* `
-
-* `download_path`: (`str`, optional) Directory to save files to. Defaults to the current working directory.
+* `analysis_protocol_id`: (`str`, optional) The unique ID of the analysis protocol associated with the FASTA files to download.
+* `analysis_id`: (`str`, optional) The unique ID of the analysis whose protocol FASTA file(s) will be downloaded.
+* `analysis_name`: (`str`, optional) The name of the analysis whose protocol FASTA file(s) will be downloaded.
 
 Note: Provide either analysis_id or analysis_protocol_id (but not both).
 

@@ -977,6 +998,10 @@ sdk.download_analysis_protocol_fasta(
 )
 ```
 
+```
+['./uniprot_human_2023_08.fasta', './contaminants.fasta']
+```
+
 Download by analysis protocol ID to a specific folder:
 ```{python}
 #| eval: false
@@ -991,16 +1016,20 @@ sdk.download_analysis_protocol_fasta(
 ```
 <br>
 
-### Get Analysis Protocol FASTA
-Returns
+### Get Analysis Protocol FASTA URLs
+Returns download URLs for the FASTA file(s) associated with an analysis protocol. You can specify an analysis_id (the function will resolve the protocol automatically) or provide an analysis_protocol_id directly.
+
+Download URLs are valid for 15 minutes after generation.
 
 ###### <u>Params</u>
-* `analysis_protocol_id`: (`str`, optional) ID of the analysis protocol
-* `analysis_id`: (`str`, optional) ID of the analysis whose protocol FASTA file(s)
-
+* `analysis_protocol_id`: (`str`, optional) The unique ID of the analysis protocol associated with the FASTA files.
+* `analysis_id`: (`str`, optional) The unique ID of the analysis whose protocol FASTA file(s) should be retrieved.
+* `analysis_name`: (`str`, optional) The name of the analysis whose protocol FASTA file(s) should be retrieved.
+
+If both parameters are provided, `analysis_protocol_id` takes precedence.
 
 ###### <u>Returns</u>
-* links: (`
+* links: (`dict`) Dictionary containing filename and signed URL as key-value pairs for the FASTA files linked to the protocol.
 
 ###### <u>Examples</u>
 Get by analysis ID:

@@ -1012,10 +1041,8 @@ sdk.get_analysis_protocol_fasta_link(
 ```
 
 ```
-
-
-{"filename": "contaminants.fasta", "url": "https://...signed..."}
-]
+{"uniprot_human_2023_08.fasta" : "https://...signed...",
+"contaminants.fasta" : "https://...signed..."}
 ```
 Get by analysis protocol ID:
 ```{python}

@@ -1026,8 +1053,8 @@ sdk.get_analysis_protocol_fasta_link(
 ```
 ```
 [
-{"
-
+{"uniprot_human_2023_08.fasta" : "https://...signed...",
+"contaminants.fasta" : "https://...signed..."}
 ]
 ```
 <hr>
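Since the signed URLs expire 15 minutes after generation, the typical pattern is to fetch the mapping and download immediately. A minimal sketch, assuming the `get_analysis_protocol_fasta_urls` method added in sdk.py below (the surrounding docs examples still show the older `get_analysis_protocol_fasta_link` name) and a placeholder protocol ID:

```python
import urllib.request

# {filename: signed URL}; download promptly, the links are short-lived.
links = sdk.get_analysis_protocol_fasta_urls(
    analysis_protocol_id="abc1234abc1234"
)
for filename, url in links.items():
    urllib.request.urlretrieve(url, filename)
    print(f"Saved {filename}")
```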
seer_pas_sdk/common/__init__.py

@@ -679,6 +679,52 @@ def camel_case(s):
     return "".join([s[0].lower(), s[1:]])
 
 
+def validate_d_zip_file(file):
+    """
+    Return True if a .d.zip file aligns with Seer requirements for PAS upload.
+
+    Parameters
+    ----------
+    file : str
+        The name of the zip file.
+
+    Returns
+    -------
+    bool
+        True if the .d.zip file is valid, False otherwise.
+    """
+
+    if not file.lower().endswith(".d.zip"):
+        return False
+
+    basename = os.path.basename(file)
+
+    # Remove the .zip extension to get the .d folder name
+    d_name = basename[:-4]
+
+    try:
+        with zipfile.ZipFile(file, "r") as zf:
+            names = zf.namelist()
+
+    except:
+        return False
+
+    if not names:
+        return False
+
+    # check for files at the root level
+    root_entries = [n for n in names if "/" not in n.rstrip("/")]
+    if root_entries:
+        return False
+
+    # find folders
+    top_level = {n.split("/")[0] for n in names}
+    if len(top_level) != 1 or d_name not in top_level:
+        return False
+
+    return True
+
+
 def rename_d_zip_file(source, destination):
     """
     Renames a .d.zip file. The function extracts the contents of the source zip file, renames the inner .d folder, and rezips the contents into the destination zip file.
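To make the acceptance rule concrete: the validator requires a single top-level folder named after the archive (minus `.zip`) and no files at the zip root. A self-contained sketch with made-up file names:

```python
import os
import zipfile

from seer_pas_sdk.common import validate_d_zip_file

# "sample.d.zip" wraps exactly one top-level "sample.d/" folder: accepted.
with zipfile.ZipFile("sample.d.zip", "w") as zf:
    zf.writestr("sample.d/analysis.tdf", b"...")
    zf.writestr("sample.d/analysis.tdf_bin", b"...")

# An archive with a stray file at the zip root: rejected.
with zipfile.ZipFile("bad.d.zip", "w") as zf:
    zf.writestr("readme.txt", b"stray root-level file")

print(validate_d_zip_file("sample.d.zip"))  # True
print(validate_d_zip_file("bad.d.zip"))     # False

os.remove("sample.d.zip")
os.remove("bad.d.zip")
```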
seer_pas_sdk/core/sdk.py

@@ -31,31 +31,57 @@ class SeerSDK:
     >>> seer_sdk = SeerSDK(USERNAME, PASSWORD, INSTANCE)
     """
 
-    def __init__(
+    def __init__(
+        self,
+        username: str,
+        password: str,
+        instance: str = "US",
+        tenant: str = None,
+        tenant_id: str = None,
+    ):
         try:
             self._auth = Auth(username, password, instance)
 
             self._auth._login()
             print(f"User '{username}' logged in.\n")
+        except Exception as e:
+            raise ValueError(
+                f"Could not log in.\nPlease check your credentials and/or instance: {e}."
+            )
 
-
-
-
-
-
+        # direct logged in user to the specified tenant
+        tenant_data = pd.DataFrame(
+            self.get_user_tenant(index=False),
+            columns=["institution", "tenantId"],
+        ).rename(
+            columns={"institution": "Tenant name", "tenantId": "Tenant ID"}
+        )
+        tenant_names = tenant_data["Tenant name"].tolist()
+        tenant_ids = tenant_data["Tenant ID"].tolist()
+
+        # precondition: None is not a valid tenant_name or tenant_id.
+        if tenant_id is None and tenant is None:
+            self.logout()
+            if None in tenant_names:
                 print(
-
+                    "Warning: You have access to a tenant with no name. Please either provide a tenant name in the PAS website or specify a tenant_id to access that tenant."
                 )
-            print("Logging into home tenant...")
-            # If an error occurs while directing the user to a tenant, default to home tenant.
-            print(f"You are now active in {self.get_active_tenant_name()}")
-        except ServerError as e:
-            raise e
-        except Exception as e:
             raise ValueError(
-                f"
+                f"Either tenant or tenant_id must be provided. Please indicate one of the following tenants: \n{tenant_data.to_string(index=False)}"
             )
 
+        if tenant_id not in tenant_ids:
+            if tenant in tenant_names:
+                # if multiple tenants exist for the same institution name, fall back on multiple tenant error in switch_tenant.
+                self.switch_tenant(tenant)
+            else:
+                self.logout()
+                raise ValueError(
+                    f"Invalid tenant or tenant_id provided. Please indicate one of the following tenants: \n{tenant_data.to_string(index=False)}"
+                )
+        else:
+            self.switch_tenant(tenant_id)
+
     def logout(self):
         """
         Perform a logout operation for the current user of the SDK instance.
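In practice the constructor change above means that omitting both tenant arguments, or passing one that does not resolve, logs the session out and raises a ValueError that lists the tenants available to the user. A short sketch with placeholder credentials:

```python
from seer_pas_sdk import SeerSDK

try:
    # Neither tenant nor tenant_id: the constructor logs out and raises,
    # including the "Tenant name / Tenant ID" table in the message.
    sdk = SeerSDK("user@example.com", "password")
except ValueError as err:
    print(err)

# A valid tenant_id routes the session through switch_tenant().
sdk = SeerSDK("user@example.com", "password", tenant_id="abc1234abc1234")
```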
@@ -115,7 +141,7 @@ class SeerSDK:
             response = s.get(f"{self._auth.url}api/v1/usertenants")
 
             if response.status_code != 200:
-                raise
+                raise ServerError(
                     "Invalid request. Please check your parameters."
                 )
 

@@ -1471,7 +1497,7 @@ class SeerSDK:
                         analysis_protocol_engine=res["analysis_engine"],
                     )
                 )
-            except:
+            except Exception:
                 res["fasta"] = ""
             return res
         else:

@@ -1590,10 +1616,7 @@ class SeerSDK:
                 try:
                     res[entry]["fasta"] = ",".join(
                         self._get_analysis_protocol_fasta_filenames(
-                            analysis_protocol_id=res[entry]["id"]
-                            analysis_protocol_engine=res[entry].get(
-                                "analysis_engine", None
-                            ),
+                            analysis_protocol_id=res[entry]["id"]
                         )
                     )
                 except:

@@ -1821,18 +1844,12 @@ class SeerSDK:
             if not res.get("is_folder") and res.get(
                 "analysis_protocol_id"
             ):
-                analysis_protocol = self.get_analysis_protocol(
-                    analysis_protocol_id=res.get("analysis_protocol_id")
-                )
                 try:
                     res["fasta"] = ",".join(
                         self._get_analysis_protocol_fasta_filenames(
                             analysis_protocol_id=res.get(
                                 "analysis_protocol_id"
-                            )
-                            analysis_protocol_engine=analysis_protocol.get(
-                                "analysis_engine", None
-                            ),
+                            )
                         )
                     )
                 except Exception as e:

@@ -1854,49 +1871,60 @@ class SeerSDK:
         else:
             return res[0]
 
+    def _lookup_analysis_folders(self):
+        """
+        Helper function to map analysis folder ids to names.
+        """
+        with self._get_auth_session("getanalysisfolders") as s:
+            URL = f"{self._auth.url}api/v1/analyses"
+            params = {"all": "true", "folderonly": "true"}
+            folders = s.get(URL, params=params)
+            if folders.status_code != 200:
+                raise ValueError(
+                    "Failed to fetch analysis folders. Please check your connection."
+                )
+            res = folders.json()["data"]
+            return res
+
     def find_analyses(
         self,
         analysis_id: str = None,
+        analysis_name: str = None,
         folder_id: str = None,
-
-        analysis_only: bool = True,
+        folder_name: str = None,
         project_id: str = None,
+        project_name: str = None,
         plate_name: str = None,
         as_df=False,
-        **kwargs,
     ):
         """
-        Returns a list of analyses objects for the authenticated user. If
-        Search parameters may be passed in as keyword arguments to filter the results. Acceptable values are 'analysis_name', 'folder_name', 'description', 'notes', or 'number_msdatafile'.
-        Only search on a single field is supported.
+        Returns a list of analyses objects for the authenticated user. If None is provided for all query arguments, returns all analyses for the authenticated user.
 
         Parameters
         ----------
         analysis_id : str, optional
             ID of the analysis to be fetched, defaulted to None.
 
+        analysis_name : str, optional
+            Name of the analysis to be fetched, defaulted to None. Results will be matched on substring basis.
+
         folder_id : str, optional
-            ID of
+            Unique ID of an analysis folder to filter results, defaulted to None.
 
-
-
-            Will be disabled if an analysis id is provided.
-
-        analysis_only : bool, optional
-            Mark True if only analyses objects are to be returned in the response, defaulted to True.
-            If marked false, folder objects will also be included in the response.
+        folder_name : str, optional
+            Name of an analysis folder to filter results, defaulted to None.
 
         project_id : str, optional
-            ID of
+            Unique ID of an analysis folder to filter results, defaulted to None.
+
+        project_name : str, optional
+            Name of a project to filter results, defaulted to None.
 
         plate_name : str, optional
-            Name of
+            Name of a plate to filter results, defaulted to None.
 
         as_df : bool, optional
-
-
-        **kwargs : dict, optional
-            Search keyword parameters to be passed in. Acceptable values are 'analysis_name', 'folder_name', 'analysis_protocol_name', 'description', 'notes', or 'number_msdatafile'.
+            Whether the result should be converted to a DataFrame, defaulted to False.
 
         Returns
         -------
@@ -1930,51 +1958,44 @@ class SeerSDK:
         URL = f"{self._auth.url}api/v1/analyses"
         res = []
 
-        search_field = None
-        search_item = None
-        if kwargs:
-            if len(kwargs.keys()) > 1:
-                raise ValueError("Please include only one search parameter.")
-            search_field = list(kwargs.keys())[0]
-            search_item = kwargs[search_field]
-
-            if not search_item:
-                raise ValueError(
-                    f"Please provide a non null value for {search_field}"
-                )
-
-        if search_field and search_field not in [
-            "analysis_name",
-            "folder_name",
-            "analysis_protocol_name",
-            "description",
-            "notes",
-            "number_msdatafile",
-        ]:
-            raise ValueError(
-                "Invalid search field. Please choose between 'analysis_name', 'folder_name', 'analysis_protocol_name', 'description', 'notes', or 'number_msdatafile'."
-            )
-
         if analysis_id:
             try:
                 return [self.get_analysis(analysis_id=analysis_id)]
-            except:
+            except Exception:
                 return []
 
+        analysis_folders = self._lookup_analysis_folders()
+        analysis_folder_id_to_name = {
+            x["id"]: x["analysis_name"] for x in analysis_folders
+        }
+        analysis_folder_name_to_id = {
+            v: k for k, v in analysis_folder_id_to_name.items()
+        }
+
+        if folder_name and not folder_id:
+            folder_id = analysis_folder_name_to_id.get(folder_name, None)
+            if not folder_id:
+                raise ValueError(f"No folder found with name '{folder_name}'.")
+
+        if project_name and not project_id:
+            project = self.get_project(project_name=project_name)
+            if not project:
+                raise ValueError(
+                    f"No project found with name '{project_name}'."
+                )
+            project_id = project["id"]
+
         with self._get_auth_session("findanalyses") as s:
 
-            params = {"all": "true"}
+            params = {"all": "true", "analysisonly": "true"}
            if folder_id:
                 params["folder"] = folder_id
 
-            if
-                params["searchFields"] =
-                params["searchItem"] =
+            if analysis_name:
+                params["searchFields"] = "analysis_name"
+                params["searchItem"] = analysis_name
                 del params["all"]
 
-            if search_field == "folder_name":
-                params["searchFields"] = "analysis_name"
-
             if project_id:
                 params["projectId"] = project_id
 
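A small sketch of the behaviour this block adds: folder and project names are resolved to IDs before the query, and a name that cannot be resolved raises instead of silently returning everything (the folder and project names here are placeholders):

```python
try:
    analyses = sdk.find_analyses(folder_name="2024 Plasma Runs")
except ValueError as err:
    print(err)  # e.g. "No folder found with name '2024 Plasma Runs'."

# A known project name is resolved through get_project() to its ID.
analyses = sdk.find_analyses(project_name="Plasma Pilot", as_df=True)
```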
@@ -1989,9 +2010,8 @@ class SeerSDK:
             )
             res = analyses.json()["data"]
 
-            folders = []
             spaces = {x["id"]: x["usergroup_name"] for x in self.get_spaces()}
-
+            protocol_to_fasta = {}
             for entry in range(len(res)):
                 if "tenant_id" in res[entry]:
                     del res[entry]["tenant_id"]

@@ -2005,11 +2025,14 @@ class SeerSDK:
                     ][location(res[entry]["parameter_file_path"]) :]
 
                 if (
-
-                    and not
-                    and res[entry]["is_folder"]
+                    "folder_id" in res[entry]
+                    and res[entry]["folder_id"] is not None
                 ):
-
+                    res[entry]["folder_name"] = analysis_folder_id_to_name.get(
+                        res[entry]["folder_id"], None
+                    )
+                    res[entry]["folder_uuid"] = res[entry]["folder_id"]
+                    del res[entry]["folder_id"]
 
                 if "user_group" in res[entry]:
                     res[entry]["space"] = spaces.get(

@@ -2020,51 +2043,34 @@ class SeerSDK:
                 if (not res[entry].get("is_folder")) and res[entry].get(
                     "analysis_protocol_id"
                 ):
-
-
-
-                    )
-
-
+                    # analysis_protocol_id for this result row
+                    local_analysis_protocol_id = res[entry].get(
+                        "analysis_protocol_id"
+                    )
+                    if local_analysis_protocol_id in protocol_to_fasta:
+                        res[entry]["fasta"] = protocol_to_fasta[
+                            local_analysis_protocol_id
                         ]
                     else:
                         try:
-
-
-
+                            res[entry]["fasta"] = ",".join(
+                                self._get_analysis_protocol_fasta_filenames(
+                                    local_analysis_protocol_id,
+                                    analysis_protocol_engine=res[entry].get(
+                                        "analysis_engine"
+                                    ),
                                 )
                             )
-
-                            "
+                            protocol_to_fasta[local_analysis_protocol_id] = (
+                                res[entry]["fasta"]
                             )
-                            protocol_to_engine_map[
-                                res[entry]["analysis_protocol_id"]
-                            ] = analysis_protocol_engine
                         except:
-
-
-                            res[entry]["fasta"] = ",".join(
-                                self._get_analysis_protocol_fasta_filenames(
-                                    res[entry]["analysis_protocol_id"],
-                                    analysis_protocol_engine=analysis_protocol_engine,
+                            print(
+                                f"Warning: Could not fetch fasta files for analysis {res[entry].get('analysis_name')}."
                             )
-                            )
-                        except:
-                            print(
-                                f"Warning: Could not fetch fasta files for analysis {res[entry].get('analysis_name')}."
-                            )
-                            res[entry]["fasta"] = None
                 else:
                     res[entry]["fasta"] = None
 
-            # recursive solution to get analyses in folders
-            for folder in folders:
-                res += self.find_analyses(folder_id=folder)
-
-            if analysis_only:
-                res = [
-                    analysis for analysis in res if not analysis["is_folder"]
-                ]
             if not res and as_df:
                 return pd.DataFrame(columns=ANALYSIS_COLUMNS)
             return res if not as_df else dict_to_df(res)

@@ -4059,7 +4065,7 @@ class SeerSDK:
             print(f"Downloaded file to {download_path}/{file}")
 
     def _get_analysis_protocol_fasta_filenames(
-        self, analysis_protocol_id: str, analysis_protocol_engine: str
+        self, analysis_protocol_id: str, analysis_protocol_engine: str = None
     ):
         """
         Helper function - Get the fasta file name(s) associated with a given analysis protocol and engine.

@@ -4069,6 +4075,14 @@ class SeerSDK:
         Returns:
             list[str]: A list of fasta file names associated with the analysis protocol.
         """
+        if not analysis_protocol_engine:
+            analysis_protocol_engine = self.get_analysis_protocol(
+                analysis_protocol_id=analysis_protocol_id
+            ).get("analysis_engine")
+            if not analysis_protocol_engine:
+                raise ValueError(
+                    f"Could not retrieve analysis protocol engine for analysis protocol {analysis_protocol_id}."
+                )
         analysis_protocol_engine = analysis_protocol_engine.lower()
         if analysis_protocol_engine == "diann":
             URL = f"{self._auth.url}api/v1/analysisProtocols/editableParameters/diann/{analysis_protocol_id}"

@@ -4108,8 +4122,35 @@ class SeerSDK:
             raise ServerError("No fasta file name returned from server.")
         return fasta_filenames
 
-    def
-        self,
+    def _get_analysis_protocol_fasta_url(
+        self, analysis_protocol_fasta_name: str
+    ):
+        """
+        Helper function - Get the download link for a given fasta file name.
+
+        Args:
+            analysis_protocol_fasta_name (str): Name of the fasta file.
+
+        Returns:
+            str: The URL to download the fasta file.
+        """
+        URL = f"{self._auth.url}api/v1/analysisProtocolFiles/getUrl"
+        with self._get_auth_session("getanalysisprotocolfilesurl") as s:
+            response = s.post(
+                URL, json={"filepath": analysis_protocol_fasta_name}
+            )
+            if response.status_code != 200:
+                raise ServerError(
+                    f"Could not retrieve download link for {analysis_protocol_fasta_name}."
+                )
+            url = response.json()["url"]
+            return url
+
+    def get_analysis_protocol_fasta_urls(
+        self,
+        analysis_protocol_id=None,
+        analysis_id=None,
+        analysis_name=None,
     ):
         """Get the download link(s) for the fasta file(s) associated with a given analysis protocol.
         Args:

@@ -4120,52 +4161,31 @@ class SeerSDK:
         Returns:
             list[dict]: A list of dictionaries containing the 'filename' and the 'url' to download the fasta file.
         """
-        if analysis_name and (not analysis_id):
-            analyses = self.find_analyses(analysis_name=analysis_name)
-            if len(analyses) > 1:
-                raise ValueError(
-                    f"Multiple analyses found with name {analysis_name}. Please provide an analysis ID instead."
-                )
-            elif len(analyses) == 0:
-                raise ValueError(
-                    f"No analyses found with name {analysis_name}."
-                )
-            else:
-                analysis_id = analyses[0]["id"]
-
-        if not (bool(analysis_protocol_id) ^ bool(analysis_id)):
-            raise ValueError(
-                "Please provide either an analysis ID or an analysis protocol ID."
-            )
-
         if not analysis_protocol_id:
-
-
-
-
-
-            raise ValueError(f"Could not parse server response.")
+            analysis = self.get_analysis(
+                analysis_id=analysis_id,
+                analysis_name=analysis_name,
+            )
+            analysis_protocol_id = analysis.get("analysis_protocol_id")
 
-        engine = self.get_analysis_protocol(
-            analysis_protocol_id=analysis_protocol_id
-        ).get("analysis_engine", None)
         fasta_filenames = self._get_analysis_protocol_fasta_filenames(
-            analysis_protocol_id=analysis_protocol_id
-            analysis_protocol_engine=engine,
+            analysis_protocol_id=analysis_protocol_id
         )
-
-        links =
-        for
-
-
-
-
-
-
-
-
-
+
+        links = {}
+        for filepath in fasta_filenames:
+            filename = os.path.basename(filepath)
+            try:
+                url = self._get_analysis_protocol_fasta_url(
+                    analysis_protocol_fasta_name=filepath
+                )
+            except ServerError:
+                print(
+                    f"ERROR: Could not retrieve download link for {filename}."
+                )
+                continue
+
+            links[filename] = url
         return links
 
     def download_analysis_protocol_fasta(

@@ -4186,20 +4206,28 @@ class SeerSDK:
         Returns:
             list[str] : The path to the downloaded fasta file(s).
         """
-
-
-
-        for x in self.get_analysis_protocol_fasta_link(
-            analysis_protocol_id=analysis_protocol_id,
-            analysis_id=analysis_id,
-            analysis_name=analysis_name,
+        if not analysis_protocol_id:
+            analysis = self.get_analysis(
+                analysis_id=analysis_id, analysis_name=analysis_name
             )
-
+            analysis_protocol_id = analysis.get("analysis_protocol_id")
+
+        filepaths = self._get_analysis_protocol_fasta_filenames(
+            analysis_protocol_id=analysis_protocol_id
+        )
         if not download_path:
             download_path = os.getcwd()
 
         downloads = []
-        for
+        for filepath in filepaths:
+            # run sequentially to avoid signed url expiration
+            url = self._get_analysis_protocol_fasta_url(
+                analysis_protocol_fasta_name=filepath
+            )
+            filename = os.path.basename(filepath)
+
+            # relative path of the file after download
+            local_filename = f"{download_path}/{filename}"
             print(f"Downloading {filename}")
             for _ in range(2):
                 try:

@@ -4215,7 +4243,7 @@ class SeerSDK:
                     )
                     urllib.request.urlretrieve(
                         url,
-
+                        local_filename,
                        reporthook=download_hook(t),
                         data=None,
                     )

@@ -4224,5 +4252,5 @@ class SeerSDK:
             if not os.path.isdir(f"{download_path}"):
                 os.makedirs(f"{download_path}")
 
-            downloads.append(
+            downloads.append(local_filename)
         return downloads
seer_pas_sdk/core/unsupported.py

@@ -4,6 +4,7 @@ seer_pas_sdk.core.unsupported -- in development
 
 import os
 import shutil
+from pathlib import Path
 
 from typing import List as _List
 

@@ -827,20 +828,29 @@ class _UnsupportedSDK(_SeerSDK):
             )
 
         # Step 1: Check if paths and file extensions are valid.
+        invalid_d_zip_files = []
         for file in ms_data_files:
             if not valid_ms_data_file(file):
                 raise ValueError(
                     "Invalid file or file format. Please check your file."
                 )
+            if file.endswith(".d.zip") and (not validate_d_zip_file(file)):
+                invalid_d_zip_files.append(file)
+
+        if invalid_d_zip_files:
+            raise ValueError(
+                f"The following .d.zip files are invalid: {', '.join(invalid_d_zip_files)}. Please check your files."
+            )
 
         extensions = set(
-            [
+            ["".join(Path(file).suffixes) for file in ms_data_files]
         )
 
         if filenames and ".d.zip" in extensions:
             raise ValueError(
                 "Please leave the 'filenames' parameter empty when working with .d.zip files. SeerSDK.rename_d_zip_file() is available for this use case."
             )
+
         # Step 2: Use active tenant to fetch the tenant_id.
         tenant_id = self.get_active_tenant_id()
 

@@ -1473,6 +1483,7 @@ class _UnsupportedSDK(_SeerSDK):
         # 1. Get msrun data for analysis
         samples = self.find_samples(analysis_id=analysis_id)
         sample_name_to_id = {s["sample_name"]: s["id"] for s in samples}
+        sample_uuid_to_id = {s["id"]: s["sample_id"] for s in samples}
         # for np rollup, a row represents an msrun
         msruns = self.find_msruns(sample_ids=sample_name_to_id.values())
         file_to_msrun = {

@@ -1636,8 +1647,7 @@ class _UnsupportedSDK(_SeerSDK):
                 )
             )
             df = df[included_columns]
-
-            return df
+
         else:
             # precursor
             # working only in report.tsv

@@ -1678,10 +1688,17 @@ class _UnsupportedSDK(_SeerSDK):
                 "IM",
                 "iIM",
             ]
-            df = search_results[included_columns]
-
+            df = pd.DataFrame(search_results[included_columns])
+
+            df.columns = [title_case_to_snake_case(x) for x in df.columns]
+            df["sample_uuid"] = df["sample_id"]
+            df["sample_id"] = df["sample_uuid"].apply(
+                lambda x: sample_uuid_to_id.get(x)
+            )
 
-
+        if rollup == "panel":
+            df.drop(columns=["msrun_id"], inplace=True, errors="ignore")
+        return df
 
     def get_search_data_analytes(self, analysis_id: str, analyte_type: str):
         if analyte_type not in ["protein", "peptide", "precursor"]:
@@ -1734,27 +1751,57 @@ class _UnsupportedSDK(_SeerSDK):
                 how="left",
             )
         elif analyte_type == "peptide":
-
-
+
+            # The below logic performs the following:
+            # 1. orders each peptide group by Global.PG.Q.Value, Lib.PG.Q.Value, and Protein Group (ascending)
+            # 2. for each peptide group, select the first row to find the precursor with the lowest Q values
+            # 3. broadcasts the associated protein group columns across all rows with the same peptide.
+            #
+            # This ensures that for each peptide, we retain consistent protein information while avoiding duplication.
+
+            report_results = report_results.sort_values(
+                [
+                    "Peptide",
+                    "Global.PG.Q.Value",
+                    "Lib.PG.Q.Value",
+                    "Protein Group",
+                ]
             )
-
-
-
-
-
-
+
+            columns_to_broadcast = ["Protein Group", "Protein.Ids"]
+            broadcasted = (
+                report_results.groupby("Peptide")
+                .apply(
+                    lambda x: pd.Series(
+                        {
+                            col: x.iloc[0][col]
+                            for col in columns_to_broadcast + ["Peptide"]
+                        }
+                    )
+                )
+                .reset_index(drop=True)
+            )
+            report_results = (
+                report_results.drop(columns=columns_to_broadcast)
+                .merge(broadcasted, on="Peptide", how="left")
+                .drop_duplicates(subset=["Peptide"])
             )
 
-            report_results = report_results[
-                ["Peptide", "Protein.Ids", "Protein.Group"]
-            ]
-            report_results.drop_duplicates(subset=["Peptide"], inplace=True)
             df = pd.merge(
-                search_results,
                 report_results,
-
+                search_results,
+                on=["Protein Group"],
                 how="left",
             )
+            df = df[
+                [
+                    "Peptide",
+                    "Protein Group",
+                    "Protein.Ids",
+                    "Protein Names",
+                    "Gene Names",
+                ]
+            ]
         else:
             # precursor
             search_results = search_results[
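The comment block added above describes a sort-then-take-first broadcast. As a standalone illustration of that pattern with toy column names (not the real report schema):

```python
import pandas as pd

# Toy frame: two rows per peptide with different q-values and protein groups.
df = pd.DataFrame(
    {
        "Peptide": ["AAA", "AAA", "BBB"],
        "Global.PG.Q.Value": [0.01, 0.001, 0.02],
        "Protein Group": ["P2", "P1", "P3"],
    }
)

# Sort so the lowest q-value row comes first within each peptide, then keep
# that row's protein group for every row of the peptide.
df = df.sort_values(["Peptide", "Global.PG.Q.Value", "Protein Group"])
best = df.groupby("Peptide", as_index=False).first()[["Peptide", "Protein Group"]]
df = df.drop(columns=["Protein Group"]).merge(best, on="Peptide", how="left")
print(df.drop_duplicates(subset=["Peptide"]))
```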
@@ -1762,9 +1809,6 @@ class _UnsupportedSDK(_SeerSDK):
                     "Protein Group",
                     "Protein Names",
                     "Gene Names",
-                    "Biological Process",
-                    "Molecular Function",
-                    "Cellular Component",
                 ]
             ]
             search_results.drop_duplicates(

@@ -1779,7 +1823,6 @@ class _UnsupportedSDK(_SeerSDK):
                     "Protein.Ids",
                     "Protein.Names",
                     "Genes",
-                    "First.Protein.Description",
                     "Modified.Sequence",
                     "Proteotypic",
                     "Global.Q.Value",

@@ -1788,8 +1831,43 @@ class _UnsupportedSDK(_SeerSDK):
                     "Lib.PG.Q.Value",
                 ]
             ]
-
-
+
+            # The below logic performs the following:
+            # 1. orders each peptide group by Global.PG.Q.Value, Lib.PG.Q.Value, and Protein Group (ascending)
+            # 2. for each peptide group, select the first row to find the precursor with the lowest Q values
+            # 3. broadcasts the associated protein group columns across all rows with the same peptide.
+            #
+            # This ensures that for each peptide, we retain consistent protein information while avoiding duplication.
+            columns_to_broadcast = [
+                "Protein Group",
+                "Protein.Ids",
+                "Protein.Names",
+                "Genes",
+            ]
+            report_results = report_results.sort_values(
+                [
+                    "Peptide",
+                    "Global.PG.Q.Value",
+                    "Lib.PG.Q.Value",
+                    "Protein Group",
+                ],
+            )
+            broadcasted = (
+                report_results.groupby("Peptide")
+                .apply(
+                    lambda x: pd.Series(
+                        {
+                            col: x.iloc[0][col]
+                            for col in columns_to_broadcast + ["Peptide"]
+                        }
+                    )
+                )
+                .reset_index(drop=True)
+            )
+            report_results = (
+                report_results.drop(columns=columns_to_broadcast)
+                .merge(broadcasted, on="Peptide", how="left")
+                .drop_duplicates(subset=["Peptide", "Precursor.Charge"])
             )
             df = pd.merge(
                 report_results,

@@ -1806,7 +1884,6 @@ class _UnsupportedSDK(_SeerSDK):
                     "Protein.Ids",
                     "Protein.Names",
                     "Genes",
-                    "First.Protein.Description",
                     "Modified.Sequence",
                     "Proteotypic",
                     "Global.Q.Value",

@@ -1814,9 +1891,6 @@ class _UnsupportedSDK(_SeerSDK):
                     "Lib.Q.Value",
                     "Lib.PG.Q.Value",
                     "Gene Names",
-                    "Biological Process",
-                    "Molecular Function",
-                    "Cellular Component",
                 ]
             ]
             df.rename(
The remaining files listed above contain no changes.