PyPI - tol-sdk - Versions diffs - 1.7.0__py3-none-any.whl → 1.7.2__py3-none-any.whl - Mend

tol-sdk 1.7.0-py3-none-any.whl → 1.7.2-py3-none-any.whl


Files changed (12)

tol/api_client/client.py CHANGED

@@ -392,7 +392,7 @@ class JsonApiClient(HttpClient):
         return r.json()
     def __detail_url(self, object_type: str, object_id: str) -> str:
-        return f'{self.__data_url}/{object_type}/{quote(object_id)}'
+        return f'{self.__data_url}/{object_type}/{quote(str(object_id))}'
     def __list_url(self, object_type: str) -> str:
         return f'{self.__data_url}/{object_type}'
@@ -424,7 +424,7 @@ class JsonApiClient(HttpClient):
         hop_string = '/'.join(relationship_hops)
         base_url = (
-            f'{self.__data_url}/{object_type}:to-one/{quote(object_id)}'
+            f'{self.__data_url}/{object_type}:to-one/{quote(str(object_id))}'
         )
         return f'{base_url}/{hop_string}'
@@ -434,9 +434,8 @@ class JsonApiClient(HttpClient):
         object_id: str,
         relationship_name: str
     ) -> str:
         base_url = (
-            f'{self.__data_url}/{object_type}:to-many/{quote(object_id)}'
+            f'{self.__data_url}/{object_type}:to-many/{quote(str(object_id))}'
         )
         return f'{base_url}/{relationship_name}'
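
All three hunks above wrap `object_id` in `str()` before passing it to `quote`. The diff gives no reason. One plausible motivation, assuming `quote` here is `urllib.parse.quote`, is that it accepts only `str` or `bytes`, so an integer ID raises `TypeError`. A minimal sketch of that behaviour follows; `DATA_URL` and `detail_url` are hypothetical stand-ins, not the client's actual private methods.

```python
from urllib.parse import quote

DATA_URL = 'https://example.org/api/data'  # hypothetical base URL


def detail_url(object_type: str, object_id) -> str:
    # Mirrors the patched f-string: coerce to str, then percent-encode.
    return f'{DATA_URL}/{object_type}/{quote(str(object_id))}'


# Passing a non-string straight to quote() fails, which is what the
# pre-1.7.2 code would have done for an integer ID.
try:
    quote(42)
    int_id_accepted = True
except TypeError:
    int_id_accepted = False

int_url = detail_url('sample', 42)
# quote() keeps '/' by default (safe='/'), but encodes the space.
str_url = detail_url('sample', 'a b/c')
```

The type hints in `client.py` still declare `object_id: str`, so the `str()` call reads as a defensive measure for callers that pass numeric IDs anyway.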

tol/benchling/sql/sequencing_request_sequencing_platform_pacbio.sql CHANGED

@@ -23,11 +23,11 @@ Output: Table with cols:
 3) submission_sample_id: [character] Foreign key to other entities and results in Benchling. Origin: BWH
 4) eln_file_registry_id: [character] id in Benchling Registry. Origin: BWH
 5) extraction_id: [character] Original DNA extract entity name. For pooled samples, the first DNA extract pooled. Origin: BWH
-6) pooled_sample_id: [character] DNA pooled sample id. Present only if the submitted sample was pooled. Origin: BWH
-7) submission_sample_name: [character] Entity name. Origin: BWH
-8) fluidx_id: [character] Container barcode of the DNA fluidx tube. Origin: BWH
-9) programme_id: [character] ToLID. Origin: BWH
-10) specimen_id: [character] Specimen ID. Origin: STS
+6) submission_sample_name: [character] Entity name. Origin: BWH
+7) fluidx_id: [character] Container barcode of the DNA fluidx tube. Origin: BWH
+8) programme_id: [character] ToLID. Origin: BWH
+9) specimen_id: [character] Specimen ID. Origin: STS
+10) tube_name: [character] Name of the submission tube/container.
 11) sanger_sample_id: [character] Sanger Sample ID or Sanger UUID of the PacBio submission.
 12) plate_name: [character] Name of submission plate.
 13) pipeline: [character] name of the submission pipeline.
@@ -47,12 +47,12 @@ Output: Table with cols:
 27) priority: [character]
 28) completion_date: [Date]
 29) sequencing_platform: [character] Sequencing platform: pacbio.
-30) source: [character] Data source: v1, v2, legacy_bnt or v1_pooled
+30) source: [character] Data source: v1, v1_pooled, v2, v2_pooled, legacy_bnt
 NOTES:
 1) Data types were casted explicitly to conserved the data type stored in BWH.
-2) To add the Fluidx ID of the origininal DNA extract a few filters were applied to
+2) To add the Fluidx ID of the original DNA extract a few filters were applied to
 delete Vouchers, tubes archived because they were made in error, and
 invalid container names.
 3) Pooled samples must be added as an independent CTE because the filters for DNA fluidx tubes
@@ -76,7 +76,12 @@ pacbio_submissions_container_routine AS (
 		c_dna.barcode AS fluidx_id,
 		t.programme_id,
 		t.specimen_id,
-		con.name AS sanger_sample_id,
+		con.name AS tube_name,
+		CASE
+			WHEN pbsum.submission_date < DATE '2025-09-01'
+				THEN con.name
+			ELSE ssid.sanger_sample_id
+		END AS sanger_sample_id,
 		NULL::varchar AS plate_name,
 		NULL::varchar AS pipeline,
 		pbsum.sequencing_type_please_fill AS library_type,
@@ -119,17 +124,19 @@ pacbio_submissions_container_routine AS (
 		ON subsam.project_id$ = proj.id
 	 LEFT JOIN folder$raw AS f
         ON subsam.folder_id$ = f.id
+	LEFT JOIN sanger_sample_id$raw AS ssid
+		ON con.id = ssid.sample_tube
 	WHERE pbsum.archived$ = FALSE -- Excluding archived submission containers
 		-- Filters to add DNA extract fluidx tubes
 		AND tube.type IS NULL  -- Selecting non-Voucher containers
 	    AND (c_dna.archive_purpose$ != ('Made in error') OR c_dna.archive_purpose$ IS NULL) -- Excluding containers made by mistake
 		AND c_dna.barcode LIKE 'F%' -- Selecting only valid FluidX IDs
-		AND proj.name = 'ToL Core Lab' -- Selecting ToL Core Lab sbmissions only
+		AND proj.name = 'ToL Core Lab' -- Selecting ToL Core Lab submissions only
 		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move')
 ),
-pacbio_submissions_container_pooled_deprecated AS (
+pacbio_submissions_container_pooled AS (
 	SELECT DISTINCT
 		t.sts_id,
 		t.taxon_id,
@@ -140,8 +147,13 @@ pacbio_submissions_container_pooled_deprecated AS (
 		subsam.name$ AS eln_submission_sample_name,
 		c_pool.barcode AS fluidx_id,
 		t.programme_id,
-		t.specimen_id,
-		con.name AS sanger_sample_id,
+		t.specimen_id,
+		con.name AS tube_name,
+		CASE
+			WHEN pbsum.submission_date < DATE '2025-09-01'
+				THEN con.name
+			ELSE ssid.sanger_sample_id
+		END AS sanger_sample_id,
 		NULL::varchar AS plate_name,
 		NULL::varchar AS pipeline,
 		pbsum.sequencing_type_please_fill AS library_type,
@@ -186,6 +198,8 @@ pacbio_submissions_container_pooled_deprecated AS (
 		ON subsam.project_id$ = proj.id
 	LEFT JOIN folder$raw AS f
 		ON subsam.folder_id$ = f.id
+	LEFT JOIN sanger_sample_id$raw AS ssid
+		ON con.id = ssid.sample_tube
 	WHERE pbsum.archived$ = FALSE -- Excluding archived submission containers
 		-- Filters to add DNA extract fluidx tubes
 		AND tube.type IS NULL  -- Selecting non-Voucher containers
@@ -195,70 +209,6 @@ pacbio_submissions_container_pooled_deprecated AS (
 		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move')
 ),
-pacbio_submissions_container_pooled AS (
-	SELECT DISTINCT
-		t.sts_id,
-		t.taxon_id,
-		tp.id AS tissue_prep_id,
-		subsam.id AS eln_submission_sample_id,
-		subsam.file_registry_id$ AS eln_file_registry_id,
-		subsam.pooled_sample AS extraction_id,
-		subsam.name$ AS submission_sample_name,
-		c_pool.barcode AS fluidx_id,
-		t.programme_id,
-		t.specimen_id,
-		con.name AS sanger_sample_id,
-		plt.name AS plate_name,
-		pbsubm_p.pipeline,
-		pbsubm_p.library_type,
-		pbsubm_p.retention_instructions,
-		pbsubm_p.gb_yield_of_ccs_data_required,
-		pbsubm_p.number_of_smrt_cells_required,
-		pbsubm_p.sheared_femto_fragment_size_bp,
-		pbsubm_p.post_spri_concentration_ngul,
-		pbsubm_p.post_spri_volume_ul,
-		pbsubm_p.nanodrop_260280,
-		pbsubm_p.nanodrop_260230,
-		pbsubm_p.nanodrop_concentration_ngul,
-		pbsubm_p.sample_prep_additional_requirements,
-		pbsubm_p.include_5mc_cells_in_cpg_motifs,
-		pbsubm_p.cc5_output_includes_kinetics_information,
-		pbsubm_p.priority,
-		DATE(pbsubm_p.created_at$) AS completion_date,
-		'pacbio'::varchar AS sequencing_platform,
-		'v1_pooled'::varchar AS source
-	FROM pacbio_submission_plate_output$raw AS pbsubm_p
-	LEFT JOIN submission_samples$raw AS subsam
-		ON pbsubm_p.sample_name = subsam.id
-	LEFT JOIN pooled_samples$raw AS pool
-		ON subsam.pooled_sample = pool.id
-	LEFT JOIN dna_extract$raw AS dna -- Chunk to add Tissue metadata
-		ON pool.samples ->> 0 = dna.id
-	LEFT JOIN tissue_prep$raw AS tp
-		ON dna.tissue_prep = tp.id
-	LEFT JOIN tissue$raw AS t
-		ON tp.tissue = t.id -- End of Tissue metadata Chunk
-	LEFT JOIN container_content$raw AS cc_pool -- Chunk to add DNA fluidx id
-		ON pool.id = cc_pool.entity_id
-	LEFT JOIN container$raw AS c_pool
-		ON cc_pool.container_id = c_pool.id
-	LEFT JOIN tube$raw AS tube
-		ON c_pool.id = tube.id -- End of DNA fluidx id Chunk
-	LEFT JOIN container$raw AS con -- To add sanger uuid
-		ON pbsubm_p.sanger_uuid ->> 0 = con.id
-	LEFT JOIN plate$raw AS plt
-		ON con.plate_id = plt.id
-	LEFT JOIN project$raw AS proj
-		ON subsam.project_id$ = proj.id
-	LEFT JOIN folder$raw AS f
-		ON subsam.folder_id$ = f.id
-	WHERE subsam.pooled_sample IS NOT NULL
-		AND proj.name = 'ToL Core Lab'
-		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move', 'R&D Sample Processing Requests')
-		AND pbsubm_p.archived$ = FALSE
-),
 pacbio_submissions_container_legacy_deprecated AS (
 	SELECT DISTINCT
@@ -272,6 +222,7 @@ pacbio_submissions_container_legacy_deprecated AS (
 		c_dna.barcode AS fluidx_id,
 		t.programme_id,
 		t.specimen_id,
+		con.name AS tube_name,
 		con.name AS sanger_sample_id,
 		NULL::varchar AS plate_name,
 		NULL::varchar AS pipeline,
@@ -335,10 +286,11 @@ pacbio_submissions_plate_automated_manifest AS (
 		c_dna.barcode AS fluidx_id,
 		t.programme_id,
 		t.specimen_id,
+		con.name AS tube_name,
 		con.name AS sanger_sample_id,
 		plt.name AS plate_name,
 		pbsubm_p.pipeline,
-		NULL::varchar AS library_type,
+		pbsubm_p.library_type,
 		pbsubm_p.retention_instructions,
 		pbsubm_p.gb_yield_of_ccs_data_required,
 		pbsubm_p.number_of_smrt_cells_required,
@@ -352,7 +304,7 @@ pacbio_submissions_plate_automated_manifest AS (
 		pbsubm_p.include_5mc_cells_in_cpg_motifs,
 		pbsubm_p.cc5_output_includes_kinetics_information,
 		pbsubm_p.priority,
-		pbsubm_p.created_at$ AS completion_date,
+		DATE(pbsubm_p.created_at$) AS completion_date,
 		'pacbio'::varchar AS sequencing_platform,
 		'v2'::varchar AS source
 	FROM pacbio_submission_plate_output$raw AS pbsubm_p
@@ -388,7 +340,73 @@ pacbio_submissions_plate_automated_manifest AS (
 		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move', 'R&D Sample Processing Requests')
 ),
+pacbio_submissions_plate_automated_manifest_pooled AS (
+	SELECT DISTINCT
+		t.sts_id,
+		t.taxon_id,
+		tp.id AS tissue_prep_id,
+		subsam.id AS eln_submission_sample_id,
+		subsam.file_registry_id$ AS eln_file_registry_id,
+		subsam.pooled_sample AS extraction_id,
+		subsam.name$ AS submission_sample_name,
+		c_pool.barcode AS fluidx_id,
+		t.programme_id,
+		t.specimen_id,
+		con.name AS tube_name,
+		con.name AS sanger_sample_id,
+		plt.name AS plate_name,
+		pbsubm_p.pipeline,
+		pbsubm_p.library_type,
+		pbsubm_p.retention_instructions,
+		pbsubm_p.gb_yield_of_ccs_data_required,
+		pbsubm_p.number_of_smrt_cells_required,
+		pbsubm_p.sheared_femto_fragment_size_bp,
+		pbsubm_p.post_spri_concentration_ngul,
+		pbsubm_p.post_spri_volume_ul,
+		pbsubm_p.nanodrop_260280,
+		pbsubm_p.nanodrop_260230,
+		pbsubm_p.nanodrop_concentration_ngul,
+		pbsubm_p.sample_prep_additional_requirements,
+		pbsubm_p.include_5mc_cells_in_cpg_motifs,
+		pbsubm_p.cc5_output_includes_kinetics_information,
+		pbsubm_p.priority,
+		DATE(pbsubm_p.created_at$) AS completion_date,
+		'pacbio'::varchar AS sequencing_platform,
+		'v2_pooled'::varchar AS source
+	FROM pacbio_submission_plate_output$raw AS pbsubm_p
+	LEFT JOIN submission_samples$raw AS subsam
+		ON pbsubm_p.sample_name = subsam.id
+	LEFT JOIN pooled_samples$raw AS pool
+		ON subsam.pooled_sample = pool.id
+	LEFT JOIN dna_extract$raw AS dna -- Chunk to add Tissue metadata
+		ON pool.samples ->> 0 = dna.id
+	LEFT JOIN tissue_prep$raw AS tp
+		ON dna.tissue_prep = tp.id
+	LEFT JOIN tissue$raw AS t
+		ON tp.tissue = t.id -- End of Tissue metadata Chunk
+	LEFT JOIN container_content$raw AS cc_pool -- Chunk to add DNA fluidx id
+		ON pool.id = cc_pool.entity_id
+	LEFT JOIN container$raw AS c_pool
+		ON cc_pool.container_id = c_pool.id
+	LEFT JOIN tube$raw AS tube
+		ON c_pool.id = tube.id -- End of DNA fluidx id Chunk
+	LEFT JOIN container$raw AS con -- To add sanger uuid
+		ON pbsubm_p.sanger_uuid ->> 0 = con.id
+	LEFT JOIN plate$raw AS plt
+		ON con.plate_id = plt.id
+	LEFT JOIN project$raw AS proj
+		ON subsam.project_id$ = proj.id
+	LEFT JOIN folder$raw AS f
+		ON subsam.folder_id$ = f.id
+	WHERE subsam.pooled_sample IS NOT NULL
+		AND proj.name = 'ToL Core Lab'
+		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move', 'R&D Sample Processing Requests')
+		AND pbsubm_p.archived$ = FALSE
+),
 pacbio_submissions_plate_routine AS (
 	SELECT
 		t.sts_id,
 		t.taxon_id,
@@ -397,10 +415,11 @@ pacbio_submissions_plate_routine AS (
 		subsam.file_registry_id$ AS eln_file_registry_id,
 		subsam.original_dna_extract AS extraction_id,
 		subsam.name$ AS submission_sample_name,
-		NULL::varchar AS fluidx_id,
+		c_dna.barcode AS fluidx_id,
 		t.programme_id,
 		t.specimen_id,
-		CAST(pbsubm_p.sanger_sample_id ->>0 AS varchar) AS sanger_sample_id,
+		c_subsam.name AS tube_name,
+		ssid.sanger_sample_id AS sanger_sample_id,
 		plate.name$ AS plate_name,
 		NULL::varchar AS pipeline,
 		pbsubm_p.sequencing_type AS library_type,
@@ -423,6 +442,10 @@ pacbio_submissions_plate_routine AS (
 	FROM pacbio_sequencing_submission_plate_output$raw AS pbsubm_p
 	LEFT JOIN submission_samples$raw AS subsam
 		ON pbsubm_p.submission_sample = subsam.id
+	LEFT JOIN container_content$raw AS cc_subsam -- Chunk to connect SubSam to the well
+		ON subsam.id = cc_subsam.entity_id
+	LEFT JOIN container$raw AS c_subsam
+		ON cc_subsam.container_id = c_subsam.id -- End of connecting SubSam to well
 	LEFT JOIN dna_extract$raw AS dna
 		ON subsam.original_dna_extract = dna.id
 	LEFT JOIN tissue_prep$raw AS tp
@@ -431,17 +454,100 @@ pacbio_submissions_plate_routine AS (
 		ON tp.tissue = t.id
 	LEFT JOIN container$raw AS con
 		ON pbsubm_p.plate_well_id ->>0 = con.id
+	LEFT JOIN container_content$raw AS cc_dna -- Chunk to add DNA fluidx id
+		ON dna.id = cc_dna.entity_id
+	LEFT JOIN container$raw AS c_dna
+		ON cc_dna.container_id = c_dna.id
+	LEFT JOIN tube$raw AS tube
+		ON c_dna.id = tube.id -- End of DNA fluidx id Chunk
 	LEFT JOIN "_96w_pacbio_plate$raw" AS plate
 		ON con.plate_id = plate.id
+	LEFT JOIN sanger_sample_id$raw AS ssid
+		ON con.id = ssid.sample_tube
+	LEFT JOIN project$raw AS proj
+		ON subsam.project_id$ = proj.id
+	 LEFT JOIN folder$raw AS f
+        ON subsam.folder_id$ = f.id
+	WHERE pbsubm_p.archived$ = FALSE -- Excluding archived submissions
+		AND tube.type IS NULL  -- Selecting non-Voucher containers
+	    AND (c_dna.archive_purpose$ != ('Made in error') OR c_dna.archive_purpose$ IS NULL) -- Excluding containers made by mistake
+		AND c_dna.barcode LIKE 'F%' -- Selecting only valid FluidX IDs
+		AND proj.name = 'ToL Core Lab' -- Selecting ToL Core Lab submissions only
+		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move')
+),
+pacbio_submissions_plate_routine_pooled AS (
+	SELECT
+		t.sts_id,
+		t.taxon_id,
+		tp.id AS tissue_prep_id,
+		subsam.id AS submission_sample_id,
+		subsam.file_registry_id$ AS eln_file_registry_id,
+		subsam.pooled_sample AS extraction_id,
+		subsam.name$ AS submission_sample_name,
+		c_pool.barcode AS fluidx_id,
+		t.programme_id,
+		t.specimen_id,
+		c_subsam.name AS tube_name,
+		ssid.sanger_sample_id AS sanger_sample_id,
+		plate.name$ AS plate_name,
+		NULL::varchar AS pipeline,
+		pbsubm_p.sequencing_type AS library_type,
+		NULL::varchar AS retention_instructions,
+		NULL::float8 AS gb_yield_of_ccs_data_required,
+		pbsubm_p.number_of_smrt_cells_required,
+		NULL::float8 AS sheared_femto_fragment_size_bp,
+		NULL::float8 AS post_spri_concentration_ngul,
+		NULL::JSONB AS post_spri_volume_ul,
+		NULL::float8 AS nanodrop_260280,
+		NULL::float8 AS nanodrop_260230,
+		NULL::float8 AS nanodrop_concentration_ngul,
+		NULL::varchar AS sample_prep_additional_requirements,
+		NULL::varchar AS include_5mc_cells_in_cpg_motifs,
+		NULL::varchar AS cc5_output_includes_kinetics_information,
+		NULL::varchar AS priority,
+		pbsubm_p.created_at$ AS completion_date,
+		'pacbio'::varchar AS sequencing_platform,
+		'v2'::varchar AS SOURCE
+	FROM pacbio_sequencing_submission_plate_output$raw AS pbsubm_p
+	LEFT JOIN submission_samples$raw AS subsam
+		ON pbsubm_p.submission_sample = subsam.id
+	LEFT JOIN container_content$raw AS cc_subsam -- Connect SubSam to the well
+		ON subsam.id = cc_subsam.entity_id
+	LEFT JOIN container$raw AS c_subsam
+		ON cc_subsam.container_id = c_subsam.id -- End of chunk to connect subsam to the well
+	LEFT JOIN container$raw AS con -- Chunk to get plate ID
+		ON pbsubm_p.plate_well_id ->>0 = con.id
+	LEFT JOIN "_96w_pacbio_plate$raw" AS plate
+		ON con.plate_id = plate.id -- End of chunk to get the plate ID
+	LEFT JOIN sanger_sample_id$raw AS ssid
+		ON con.id = ssid.sample_tube
+	LEFT JOIN pooled_samples$raw AS pool
+		ON subsam.pooled_sample = pool.id
+	LEFT JOIN container_content$raw AS cc_pool -- Chunk to connect pooled sample to the FluidX tube
+		ON pool.id = cc_pool.entity_id
+	LEFT JOIN container$raw AS c_pool
+		ON cc_pool.container_id = c_pool.id -- End of chunk to connect pooled sample to the FluidX tube
+	LEFT JOIN dna_extract$raw AS dna -- Chunk to add Tissue metadata
+		ON pool.samples ->> 0 = dna.id
+	LEFT JOIN tissue_prep$raw AS tp
+		ON dna.tissue_prep = tp.id
+	LEFT JOIN tissue$raw AS t
+		ON tp.tissue = t.id -- End of Tissue metadata Chunk
+	LEFT JOIN project$raw AS proj
+		ON subsam.project_id$ = proj.id
+	 LEFT JOIN folder$raw AS f
+        ON subsam.folder_id$ = f.id
+	WHERE subsam.pooled_sample IS NOT NULL
+	    AND pbsubm_p.archived$ = FALSE
+		AND proj.name = 'ToL Core Lab' -- Selecting ToL Core Lab submissions only
+		AND f.name IN ('Routine Throughput', 'PacBio prep', 'Submissions', 'Core Lab Entities', 'Benchling MS Project Move')
 )
 SELECT *
 FROM pacbio_submissions_container_routine
 UNION
 SELECT *
-FROM pacbio_submissions_container_pooled_deprecated
-UNION
-SELECT *
 FROM pacbio_submissions_container_pooled
 UNION
 SELECT *
@@ -449,7 +555,13 @@ FROM pacbio_submissions_container_legacy_deprecated
 UNION
 SELECT *
 FROM pacbio_submissions_plate_automated_manifest
+UNION
+SELECT *
+FROM pacbio_submissions_plate_automated_manifest_pooled
 UNION
 SELECT *
 FROM pacbio_submissions_plate_routine
+UNION
+SELECT *
+FROM pacbio_submissions_plate_routine_pooled
 ORDER BY source DESC
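
The `CASE` expression added to the two container CTEs (`pacbio_submissions_container_routine` and `pacbio_submissions_container_pooled`) chooses the source of `sanger_sample_id` by submission date. The rule can be restated in Python as a rough sketch; the function and parameter names are hypothetical and only mirror the SQL, they are not part of the SDK.

```python
from datetime import date
from typing import Optional

CUTOFF = date(2025, 9, 1)  # the DATE literal used in the CASE expression


def resolve_sanger_sample_id(
    submission_date: Optional[date],
    tube_name: Optional[str],      # con.name in the SQL
    registered_id: Optional[str],  # ssid.sanger_sample_id in the SQL
) -> Optional[str]:
    # WHEN pbsum.submission_date < DATE '2025-09-01' THEN con.name
    if submission_date is not None and submission_date < CUTOFF:
        return tube_name
    # ELSE branch: also taken when the date is NULL, because a comparison
    # with NULL is not true in SQL.
    return registered_id


before = resolve_sanger_sample_id(date(2025, 8, 31), 'TUBE-1', 'SSID-1')
on_cutoff = resolve_sanger_sample_id(date(2025, 9, 1), 'TUBE-1', 'SSID-1')
no_date = resolve_sanger_sample_id(None, 'TUBE-1', 'SSID-1')
```

Because `sanger_sample_id$raw` is attached with a `LEFT JOIN`, the `ELSE` branch yields NULL whenever no matching row exists for the container.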

tol/mlwh/mlwh_datasource.py CHANGED

@@ -462,7 +462,7 @@ class MlwhDataSource(DataSource, DetailGetter, ListGetter):
         return "','".join([str(s) for s in values])
     def _conditions_string(self, platform_type: str, in_list: Dict):
-        if in_list is None:
+        if not in_list:
             return '1=1'  # Something to go with the where clause
         sql_conditions = []
         if platform_type.lower() == 'illumina':
@@ -484,7 +484,7 @@ class MlwhDataSource(DataSource, DetailGetter, ListGetter):
     def _execute_query(self, query, object_type):
         cur_mlwh = self.mlwh.cursor(dictionary=True)
         cur_mlwh.execute(query)
-        for row in cur_mlwh.fetchall():
+        for row in cur_mlwh:
             yield self._format_mlwh_row(object_type, row)
     def __get_in_lists(self, f: DataSourceFilter):
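
The two hunks above make small behavioural changes: `not in_list` treats an empty dict the same as `None`, and iterating the cursor yields rows one at a time instead of first collecting them with `fetchall()`. The sketch below illustrates both ideas using the standard-library `sqlite3` module as a stand-in. The real code uses a MySQL connection whose buffering behaviour may differ, and `conditions_string` is a heavily simplified, hypothetical version of the method in the diff.

```python
import sqlite3
from typing import Dict, Iterator, Optional


def conditions_string(in_list: Optional[Dict]) -> str:
    # `not in_list` is true for both None and {}, so an empty filter
    # falls back to a no-op condition, as in the patched method.
    if not in_list:
        return '1=1'
    # Illustration only: real code should prefer bound parameters.
    return ' AND '.join(
        "{} IN ('{}')".format(col, "','".join(str(v) for v in values))
        for col, values in in_list.items()
    )


def stream_rows(conn: sqlite3.Connection, query: str) -> Iterator[tuple]:
    cur = conn.cursor()
    cur.execute(query)
    # Iterating the cursor avoids building a full list up front.
    for row in cur:
        yield row


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sample (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO sample VALUES (?, ?)', [(1, 'a'), (2, 'b')])

query = f'SELECT id, name FROM sample WHERE {conditions_string({})} ORDER BY id'
rows = list(stream_rows(conn, query))
```

What the previous `is None` check produced for an empty dict is not visible in the diff, since the rest of the method is outside the hunk.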

tol/validators/__init__.py CHANGED

@@ -4,4 +4,6 @@
 from .allowed_values import AllowedValues, AllowedValuesValidator  # noqa
 from .allowed_keys import AllowedKeysValidator  # noqa
+from .regex import Regex, RegexValidator  # noqa
+from .regex_by_value import RegexByValueValidator  # noqa
 from .unique_values import UniqueValuesValidator  # noqa

tol/validators/regex.py ADDED

@@ -0,0 +1,109 @@
+# SPDX-FileCopyrightText: 2025 Genome Research Ltd.
+#
+# SPDX-License-Identifier: MIT
+import re
+from dataclasses import dataclass
+from typing import Any
+from tol.core import DataObject
+from tol.core.validate import Validator
+@dataclass(frozen=True, kw_only=True)
+class Regex:
+    key: str
+    regex: str
+    is_error: bool = True
+    detail: str = 'Value is not allowed for given key'
+    def is_allowed(self, __v: Any) -> bool:
+        # Check regex
+        return re.search(self.regex, str(__v or ''))
+RegexDict = dict[
+    str,
+    str | bool | list[Any],
+]
+"""Can also specify `Regex` as a `dict`"""
+class RegexValidator(Validator):
+    """
+    Validates an incoming stream of `DataObject` instances
+    according to the specified allowed values for a given
+    key.
+    """
+    def __init__(
+        self,
+        config: list[Regex | RegexDict]
+    ) -> None:
+        super().__init__()
+        self.__config = self.__get_config(config)
+    def _validate_data_object(
+        self,
+        obj: DataObject
+    ) -> None:
+        for k, v in obj.attributes.items():
+            self.__validate_attribute(obj, k, v)
+    def __get_config(
+        self,
+        config: list[Regex | RegexDict],
+    ) -> list[Regex]:
+        # Ensure config is in Regex format
+        # (as you can either pass in a list of Regex or a RegexDict,
+        # which can be used to initialize a Regex)
+        return [
+            c if isinstance(c, Regex) else Regex(**c)
+            for c in config
+        ]
+    def __validate_attribute(
+        self,
+        obj: DataObject,
+        key: str,
+        value: Any,
+    ) -> None:
+        config = self.__filter_config(key)
+        for c in config:
+            if not c.is_allowed(value):
+                self.__add_result(obj, c)
+    def __filter_config(
+        self,
+        key: str,
+    ) -> list[Regex]:
+        return [
+            a for a in self.__config
+            if a.key == key
+        ]
+    def __add_result(
+        self,
+        obj: DataObject,
+        c: Regex,
+    ) -> None:
+        if c.is_error:
+            self.add_error(
+                object_id=obj.id,
+                detail=c.detail,
+                field=c.key
+            )
+        else:
+            self.add_warning(
+                object_id=obj.id,
+                detail=c.detail,
+                field=c.key,
+            )

tol/validators/regex_by_value.py ADDED

@@ -0,0 +1,99 @@
+# SPDX-FileCopyrightText: 2025 Genome Research Ltd.
+#
+# SPDX-License-Identifier: MIT
+from typing import Any
+from tol.core import DataObject
+from tol.core.validate import Validator
+from .regex import Regex
+RegexDict = dict[
+    str,
+    str | bool | list[Any],
+]
+Config = dict[str, str | dict[str, list[Regex | RegexDict]]]
+"""Can also specify `Regex` as a `dict`"""
+class RegexByValueValidator(Validator):
+    """
+    Validates an incoming stream of `DataObject` instances
+    according to the specified allowed values for a given
+    key.
+    """
+    def __init__(
+        self,
+        config: dict[str, str | list[str]]
+    ) -> None:
+        super().__init__()
+        self.__config = self.__get_config(config)
+    def __get_config(
+        self,
+        config: Config,
+    ) -> Config:
+        return {
+            'key_column': config['key_column'],
+            'regexes': {
+                k: [
+                    # Ensure they're all in Regex format
+                    # (as you can either pass in a list of Regex or a RegexDict,
+                    # which can be used to initialize a Regex)
+                    c if isinstance(c, Regex) else Regex(**c)
+                    for c in v
+                ]
+                for k, v in config['regexes'].items()
+            }
+        }
+    def _validate_data_object(
+        self,
+        obj: DataObject
+    ) -> None:
+        # Pull out value of the 'key_column' attribute
+        key_column_value = obj.attributes.get(self.__config['key_column'])
+        if not key_column_value:
+            return
+        # Pull out relevant regex list based on this value: {[{'name': 'regex'}]}
+        regex_list = self.__config['regexes'].get(key_column_value)
+        if not regex_list:
+            return
+        self.__validate_attribute(obj, regex_list)
+    def __validate_attribute(
+        self,
+        obj: DataObject,
+        regexes: list[Regex],
+    ) -> None:
+        for r in regexes:
+            attribute_name = r.key
+            value = obj.attributes.get(attribute_name)
+            if not r.is_allowed(value):
+                self.__add_result(obj, r)
+    def __add_result(
+        self,
+        obj: DataObject,
+        c: Regex,
+    ) -> None:
+        if c.is_error:
+            self.add_error(
+                object_id=obj.id,
+                detail=c.detail,
+                field=c.key
+            )
+        else:
+            self.add_warning(
+                object_id=obj.id,
+                detail=c.detail,
+                field=c.key,
+            )
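
The config shape expected by `RegexByValueValidator` can be read off `__get_config`: a `key_column` naming the attribute that selects a rule set, and a `regexes` mapping from that attribute's value to a list of rules. The sketch below imitates the dispatch logic with plain dicts so that it runs standalone. The attribute names, values, and patterns are hypothetical, and `failed_rules` is an illustrative helper, not an SDK function.

```python
import re
from typing import Any

# Hypothetical config: rules are applied only to objects whose
# 'platform' attribute equals 'pacbio'.
config = {
    'key_column': 'platform',
    'regexes': {
        'pacbio': [
            {'key': 'tube_name', 'regex': r'^FS\d+$'},
        ],
    },
}


def failed_rules(attributes: dict[str, Any], config: dict) -> list[str]:
    # Step 1: read the selector value; skip the object if it is missing.
    selector = attributes.get(config['key_column'])
    if not selector:
        return []
    # Step 2: look up the rule list for that value; skip if none exists.
    rules = config['regexes'].get(selector)
    if not rules:
        return []
    # Step 3: test each rule against the attribute it names.
    return [
        r['key'] for r in rules
        if not re.search(r['regex'], str(attributes.get(r['key']) or ''))
    ]


ok = failed_rules({'platform': 'pacbio', 'tube_name': 'FS123'}, config)
bad = failed_rules({'platform': 'pacbio', 'tube_name': 'bad'}, config)
other = failed_rules({'platform': 'illumina', 'tube_name': 'bad'}, config)
missing = failed_rules({'platform': 'pacbio'}, config)
```

As in the added file, an attribute that is absent is tested as an empty string, so it fails any pattern that requires at least one character.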

{tol_sdk-1.7.0.dist-info → tol_sdk-1.7.2.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tol-sdk
-Version: 1.7.0
+Version: 1.7.2
 Summary: SDK for interaction with ToL, Sanger and external services
 Author-email: ToL Platforms Team <tol-platforms@sanger.ac.uk>
 License: MIT
@@ -37,7 +37,7 @@ Requires-Dist: atlassian-python-api==3.41.14; extra == "jira"
 Provides-Extra: json
 Requires-Dist: minio==7.2.15; extra == "json"
 Provides-Extra: mysql
-Requires-Dist: mysql-connector-python; extra == "mysql"
+Requires-Dist: mysql-connector-python==9.5.0; extra == "mysql"
 Provides-Extra: postgresql
 Requires-Dist: SQLAlchemy==2.0.35; extra == "postgresql"
 Requires-Dist: psycopg2-binary==2.9.9; extra == "postgresql"

{tol_sdk-1.7.0.dist-info → tol_sdk-1.7.2.dist-info}/RECORD RENAMED

@@ -31,7 +31,7 @@ tol/api_base/misc/relation_url.py,sha256=qfo-okp8Gv9-PEDghMfGZ2pHdYbHRhohvA9v3Go
 tol/api_base/misc/stats_parameters.py,sha256=IVpHqUeGQyjuih59jwqT-fIQMCBeESi2T9b4r9i4J28,1721
 tol/api_client/__init__.py,sha256=58SAywuMrIUCBAY9us_d_RLTMnaUTYWWts0LRQC5wLo,187
 tol/api_client/api_datasource.py,sha256=GOHvAmFzrHdux2wxY-MwUbp6eWbbS01L7FvmVyXJVZM,14330
-tol/api_client/client.py,sha256=hy1MT1iDCDDOOZfhsD5-8vA7ksf56BIoe96UCiqQ06Y,13965
+tol/api_client/client.py,sha256=gcnX4iCZtjnCC6qylizXxLe3l6xLhME6LEJH0UeW7V4,13979
 tol/api_client/converter.py,sha256=X6VPk4nrvmXF8EOXy36sn1nvPvYTBYKZ66ofxyQbaY8,4681
 tol/api_client/exception.py,sha256=MkvJaIyRVCzQ2rKOYnCOcT747mpOeQwGJJl3Kkb1BsQ,3999
 tol/api_client/factory.py,sha256=DIYFmFsQlwOCrUfexiEMBN3ovnreMqUFYNU8hcNvSao,3405
@@ -59,7 +59,7 @@ tol/benchling/sql/results_pacbio_prep.sql,sha256=a1tGu9irtsyPlCA0_FxBrqYj_uFiYPM
 tol/benchling/sql/results_pacbio_prep_pooled.sql,sha256=WZfMZbOeOfD55iDQQEwPNc7mF0gSJfMj94A1m9whLtw,6291
 tol/benchling/sql/sample.sql,sha256=ZFRXWabV9jjivAECCJz-zi05a5oSzTUQSbT2ssy4sGU,4174
 tol/benchling/sql/sequencing_request_sequencing_platform_hic.sql,sha256=W5VCnWvR16CJHnljzqdcQKJ8GWoci9QQhYVA2SbNyKk,3044
-tol/benchling/sql/sequencing_request_sequencing_platform_pacbio.sql,sha256=I-qx-nZuGRybojEVqIfCpIsyT58nzTUA-3pIcUMQeLo,18373
+tol/benchling/sql/sequencing_request_sequencing_platform_pacbio.sql,sha256=ecJvV_qvZwnWKvr5tS0kZVHySNTEgU0irbklsE6tUfQ,22958
 tol/benchling/sql/sequencing_request_sequencing_platform_rnaseq.sql,sha256=zZ3d_VLXMHAp3n6agX-Y-Oj6HIzk32IDq5UssJuYPMs,3420
 tol/benchling/sql/sequencing_request_sequencing_platform_wgs.sql,sha256=RVSR2y_ZYT0j-NDYYAzGSkLwofXy6A_9AfI0RFuztbQ,7062
 tol/benchling/sql/tissue_prep.sql,sha256=8JAOUaXDc0nu0qJeIYLdTXY5APklSighDoESHtzZ8vw,6141
@@ -230,7 +230,7 @@ tol/labwhere/factory.py,sha256=33ljl5jLZ8bMTXLZauVyKQNxPX5UHgtMyb-NI_9Vemg,2327
 tol/labwhere/labwhere_datasource.py,sha256=z8h1781yM_zJQXXHEXrGzbSnQmIHEZ3v7gMv67xhpvI,4079
 tol/labwhere/parser.py,sha256=7C5ZHMqk0gDOUoPM-5KLeQa79pr1Hx69kgzHDY2gc-M,2815
 tol/mlwh/__init__.py,sha256=fLh6NTRmDi63IpXfUCs9NOc_hLVAkGkoRozjGh36GBU,125
-tol/mlwh/mlwh_datasource.py,sha256=IKw0-lRMhGycWnSfWOvWRXkVAS3Ab7qE0-NXDFR_Ue8,25072
+tol/mlwh/mlwh_datasource.py,sha256=TTnPEm1-vGc1qRYAJVN4X2skJcAIowGkd5wzMKIAyus,25057
 tol/prefect/__init__.py,sha256=VhGEUNR-0Fi5SmLPZzxJt7GYbaIjCFK6GGe786ezNj8,199
 tol/prefect/converter.py,sha256=YCWgb01QtRPoAgI6C6Gav1Ti69k_TfIXgfMEsrXQLOA,4321
 tol/prefect/factory.py,sha256=mO4KVnaEYMv-ojGJuiencNQMq6PAMU8cIc4QN5Kq8Gw,2208
@@ -319,13 +319,15 @@ tol/treeval/treeval_datasource.py,sha256=GzY6JwH67b5QdV-UVdCFJfgGAIuZ96J2nl53YxZ
 tol/utils/__init__.py,sha256=764-Na1OaNGUDWpMIu51ZtXG7n_nB5MccUFK6LmkWRI,138
 tol/utils/csv.py,sha256=mihww25fSn72c4h-RFeqD_pFIG6KHZP4v1_C0rx81ws,421
 tol/utils/s3.py,sha256=aoYCwJ-qcMqFrpxmViFqPa0O1jgp0phtztO3-0CSNjw,491
-tol/validators/__init__.py,sha256=XXwCt8JPQ5-w2kN1bVjJPLXbS9F4s1nJUnY9jaKdmVk,272
+tol/validators/__init__.py,sha256=bIMjfuRd358nUPLp6fMG9nTs43gM9aA9oY_AINgxkWU,379
 tol/validators/allowed_keys.py,sha256=BJMomJtaQdxsdGsueDtLewv75TlwdIXiQipLGFcJ7_c,1331
 tol/validators/allowed_values.py,sha256=yJ5SdiUlV7PSKORtsBJ9hYSqwvlx_esbFmFL_Gxh-p4,2262
+tol/validators/regex.py,sha256=dKodGH0sv6DbqWeV6QXE6-GYjnG4rMO0rg8IEIaQG60,2364
+tol/validators/regex_by_value.py,sha256=o99NJlWPgQ0GrpVnep8-cHfjWnc9F2rChmXHIxjrMrk,2543
 tol/validators/unique_values.py,sha256=stI-1i006WEbERcjSMapRggJkEF-RFDzw2uUtXBAE_M,1885
-tol_sdk-1.7.0.dist-info/licenses/LICENSE,sha256=RF9Jacy-9BpUAQQ20INhTgtaNBkmdTolYCHtrrkM2-8,1077
-tol_sdk-1.7.0.dist-info/METADATA,sha256=I8e_3R5_nYW6u3e3CFZgUT-H1Cyo3BwlmIUa6cLlUEI,3072
-tol_sdk-1.7.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-tol_sdk-1.7.0.dist-info/entry_points.txt,sha256=jH3HfTwxjzog7E3lq8CKpUWGIRY9FSXbyL6CpUmv6D0,36
-tol_sdk-1.7.0.dist-info/top_level.txt,sha256=PwKMQLphyZNvagBoriVbl8uwHXQl8IC1niawVG0iXMM,10
-tol_sdk-1.7.0.dist-info/RECORD,,
+tol_sdk-1.7.2.dist-info/licenses/LICENSE,sha256=RF9Jacy-9BpUAQQ20INhTgtaNBkmdTolYCHtrrkM2-8,1077
+tol_sdk-1.7.2.dist-info/METADATA,sha256=FOII5eZYn_0x2XpUiv_dTnTUBz7_ZArof7jrD5NZHms,3079
+tol_sdk-1.7.2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+tol_sdk-1.7.2.dist-info/entry_points.txt,sha256=jH3HfTwxjzog7E3lq8CKpUWGIRY9FSXbyL6CpUmv6D0,36
+tol_sdk-1.7.2.dist-info/top_level.txt,sha256=PwKMQLphyZNvagBoriVbl8uwHXQl8IC1niawVG0iXMM,10
+tol_sdk-1.7.2.dist-info/RECORD,,

{tol_sdk-1.7.0.dist-info → tol_sdk-1.7.2.dist-info}/WHEEL RENAMED

File without changes

{tol_sdk-1.7.0.dist-info → tol_sdk-1.7.2.dist-info}/entry_points.txt RENAMED

File without changes

{tol_sdk-1.7.0.dist-info → tol_sdk-1.7.2.dist-info}/licenses/LICENSE RENAMED

File without changes

{tol_sdk-1.7.0.dist-info → tol_sdk-1.7.2.dist-info}/top_level.txt RENAMED

File without changes
