dasl-client 1.0.25__tar.gz → 1.0.27__tar.gz
This diff shows the changes between publicly released versions of the package as they appear in their public registry. It is provided for informational purposes only.
Potentially problematic release.
This version of dasl-client might be problematic.
- dasl_client-1.0.27/PKG-INFO +144 -0
- dasl_client-1.0.27/README.md +129 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/client.py +65 -3
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/conn/conn.py +3 -1
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/helpers.py +1 -1
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/errors.py +20 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/preview_engine.py +136 -42
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/stage.py +23 -2
- dasl_client-1.0.27/dasl_client/regions.json +4 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/datasource.py +7 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/dbui.py +138 -33
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/rule.py +29 -1
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/workspace_config.py +69 -24
- dasl_client-1.0.27/dasl_client.egg-info/PKG-INFO +144 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client.egg-info/SOURCES.txt +1 -1
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client.egg-info/requires.txt +2 -1
- {dasl_client-1.0.25 → dasl_client-1.0.27}/pyproject.toml +3 -2
- dasl_client-1.0.25/PKG-INFO +0 -18
- dasl_client-1.0.25/dasl_client/regions.json +0 -3
- dasl_client-1.0.25/dasl_client.egg-info/PKG-INFO +0 -18
- dasl_client-1.0.25/setup.py +0 -16
- {dasl_client-1.0.25 → dasl_client-1.0.27}/LICENSE +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/__init__.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/auth/__init__.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/auth/auth.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/conn/__init__.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/conn/client_identifier.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/errors/__init__.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/errors/errors.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/exec_rule.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/__init__.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/preview_parameters.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/regions.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/__init__.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/admin_config.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/content.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/helpers.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/types/types.py +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client.egg-info/dependency_links.txt +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client.egg-info/top_level.txt +0 -0
- {dasl_client-1.0.25 → dasl_client-1.0.27}/setup.cfg +0 -0
dasl_client-1.0.27/PKG-INFO
@@ -0,0 +1,144 @@
+Metadata-Version: 2.4
+Name: dasl_client
+Version: 1.0.27
+Summary: The DASL client library used for interacting with the DASL workspace
+Author-email: Antimatter Team <support@antimatter.io>
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: dasl_api==0.1.26
+Requires-Dist: databricks-sdk>=0.41.0
+Requires-Dist: pydantic>=2
+Requires-Dist: typing_extensions>=4.10.0
+Requires-Dist: pyyaml==6.0.2
+Dynamic: license-file
+
+# DASL Client Library
+
+The DASL (Databricks Antimatter Security Lakehouse) Client Library is a Python SDK for interacting with DASL services.
+This library provides an interface for interacting with DASL services, allowing you to manage
+datasources, rules, workspace configurations, and more from Databricks notebooks.
+
+## Features
+
+* **Simple Authentication**: Automatic workspace detection in Databricks notebooks
+* **Datasource Management**: Create, update, list, and delete datasources
+* **Rule Management**: Define and manage security detection rules to identify threats
+* **Workspace Configuration**: Update and retrieve DASL's workspace-level settings
+
+## Installation
+
+Install from PyPI:
+
+```bash
+pip install dasl-client
+```
+
+## Quick Start
+
+### Databricks Notebook Environment (Recommended)
+
+The DASL client works best in Databricks notebooks with automatic authentication:
+
+```python
+from dasl_client import Client
+
+# Automatically detects Databricks context and authenticates
+client = Client.for_workspace()
+print("Connected to DASL!")
+
+# List existing datasources
+print("Existing datasources:")
+for datasource in client.list_datasources():
+    print(f" - {datasource.metadata.name}")
+
+# List detection rules
+print("Existing detection rules:")
+for rule in client.list_rules():
+    print(f" - {rule.metadata.name}")
+```
+
+### Creating a Datasource
+
+```python
+from dasl_client import Datasource, Autoloader, Schedule, BronzeSpec, SilverSpec, GoldSpec
+
+# Create a new datasource
+datasource = Datasource(
+    source="aws",
+    source_type="cloudtrail",
+    autoloader=Autoloader(
+        enabled=True,
+        schedule=Schedule(
+            at_least_every="1h",
+            enabled=True
+        )
+    ),
+    bronze=BronzeSpec(
+        bronze_table="security_logs_bronze",
+        skip_bronze_loading=False
+    ),
+    silver=SilverSpec(
+        # Configure silver layer here, see the API reference for more details
+    ),
+    gold=GoldSpec(
+        # Configure gold layer here, see the API reference for more details
+    )
+)
+
+# Create the datasource
+created_datasource = client.create_datasource(datasource)
+print(f"Created datasource: {created_datasource.metadata.name}")
+```
+
+### Creating a Detection Rule
+
+```python
+from dasl_client.types import Rule, Schedule
+from datetime import datetime
+# Create a new detection rule to detect blocked HTTP activity
+rule = Rule(
+    schedule=Schedule(
+        at_least_every="2h",
+        enabled=True,
+    ),
+    input=Rule.Input(
+        stream=Rule.Input.Stream(
+            tables=[
+                Rule.Input.Stream.Table(name="http_activity"),
+            ],
+            filter="disposition = 'Blocked'",
+            starting_timestamp=datetime(2025, 7, 8, 16, 47, 30),
+        ),
+    ),
+    output=Rule.Output(
+        summary="record was blocked",
+    ),
+)
+
+try:
+    created_rule = client.create_rule("Detect Blocked HTTP Activity", rule)
+    print(f"Successfully created rule: {created_rule.metadata.name}")
+except Exception as e:
+    print(f"Error creating rule: {e}")
+```
+
+## Requirements
+
+- Python 3.8+
+- Access to a Databricks workspace with DASL enabled
+- `databricks-sdk>=0.41.0`
+- `pydantic>=2`
+
+## Documentation
+
+For complete DASL Client documentation, examples, and API reference:
+
+- [DASL Client Documentation](https://antimatter-dasl-client.readthedocs-hosted.com/)
+- [API Reference](https://antimatter-dasl-client.readthedocs-hosted.com/en/latest/api-reference/)
+- [Quickstart Guide](https://antimatter-dasl-client.readthedocs-hosted.com/en/latest/quickstart.html)
+
+## Support
+
+- **Email**: support@antimatter.io
+- **Documentation**: [DASL Documentation](https://docs.sl.antimatter.io)
dasl_client-1.0.27/README.md
@@ -0,0 +1,129 @@
+# DASL Client Library
+
+The DASL (Databricks Antimatter Security Lakehouse) Client Library is a Python SDK for interacting with DASL services.
+This library provides an interface for interacting with DASL services, allowing you to manage
+datasources, rules, workspace configurations, and more from Databricks notebooks.
+
+## Features
+
+* **Simple Authentication**: Automatic workspace detection in Databricks notebooks
+* **Datasource Management**: Create, update, list, and delete datasources
+* **Rule Management**: Define and manage security detection rules to identify threats
+* **Workspace Configuration**: Update and retrieve DASL's workspace-level settings
+
+## Installation
+
+Install from PyPI:
+
+```bash
+pip install dasl-client
+```
+
+## Quick Start
+
+### Databricks Notebook Environment (Recommended)
+
+The DASL client works best in Databricks notebooks with automatic authentication:
+
+```python
+from dasl_client import Client
+
+# Automatically detects Databricks context and authenticates
+client = Client.for_workspace()
+print("Connected to DASL!")
+
+# List existing datasources
+print("Existing datasources:")
+for datasource in client.list_datasources():
+    print(f" - {datasource.metadata.name}")
+
+# List detection rules
+print("Existing detection rules:")
+for rule in client.list_rules():
+    print(f" - {rule.metadata.name}")
+```
+
+### Creating a Datasource
+
+```python
+from dasl_client import Datasource, Autoloader, Schedule, BronzeSpec, SilverSpec, GoldSpec
+
+# Create a new datasource
+datasource = Datasource(
+    source="aws",
+    source_type="cloudtrail",
+    autoloader=Autoloader(
+        enabled=True,
+        schedule=Schedule(
+            at_least_every="1h",
+            enabled=True
+        )
+    ),
+    bronze=BronzeSpec(
+        bronze_table="security_logs_bronze",
+        skip_bronze_loading=False
+    ),
+    silver=SilverSpec(
+        # Configure silver layer here, see the API reference for more details
+    ),
+    gold=GoldSpec(
+        # Configure gold layer here, see the API reference for more details
+    )
+)
+
+# Create the datasource
+created_datasource = client.create_datasource(datasource)
+print(f"Created datasource: {created_datasource.metadata.name}")
+```
+
+### Creating a Detection Rule
+
+```python
+from dasl_client.types import Rule, Schedule
+from datetime import datetime
+# Create a new detection rule to detect blocked HTTP activity
+rule = Rule(
+    schedule=Schedule(
+        at_least_every="2h",
+        enabled=True,
+    ),
+    input=Rule.Input(
+        stream=Rule.Input.Stream(
+            tables=[
+                Rule.Input.Stream.Table(name="http_activity"),
+            ],
+            filter="disposition = 'Blocked'",
+            starting_timestamp=datetime(2025, 7, 8, 16, 47, 30),
+        ),
+    ),
+    output=Rule.Output(
+        summary="record was blocked",
+    ),
+)
+
+try:
+    created_rule = client.create_rule("Detect Blocked HTTP Activity", rule)
+    print(f"Successfully created rule: {created_rule.metadata.name}")
+except Exception as e:
+    print(f"Error creating rule: {e}")
+```
+
+## Requirements
+
+- Python 3.8+
+- Access to a Databricks workspace with DASL enabled
+- `databricks-sdk>=0.41.0`
+- `pydantic>=2`
+
+## Documentation
+
+For complete DASL Client documentation, examples, and API reference:
+
+- [DASL Client Documentation](https://antimatter-dasl-client.readthedocs-hosted.com/)
+- [API Reference](https://antimatter-dasl-client.readthedocs-hosted.com/en/latest/api-reference/)
+- [Quickstart Guide](https://antimatter-dasl-client.readthedocs-hosted.com/en/latest/quickstart.html)
+
+## Support
+
+- **Email**: support@antimatter.io
+- **Documentation**: [DASL Documentation](https://docs.sl.antimatter.io)
{dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/client.py
@@ -8,6 +8,8 @@ from pyspark.sql import DataFrame
 from dasl_api import (
     CoreV1Api,
     DbuiV1Api,
+    DbuiV1QueryExtendRequest,
+    CoreV1QueryExtendRequestDateRange,
     DbuiV1QueryGenerateRequest,
     DbuiV1QueryGenerateRequestTimeRange,
     DbuiV1QueryGenerateStatus,
@@ -597,7 +599,7 @@ class Client:
     def exec_rule(
         self,
         spark,
-        rule_in: Rule,
+        rule_in: Rule | str,
     ) -> ExecRule:
         """
         Locally execute a Rule. Must be run from within a Databricks
@@ -607,19 +609,25 @@ class Client:
         :param spark: Spark context from Databricks notebook. Will be
             injected into the execution environment for use by the
            Rule notebook.
-        :param rule_in:
+        :param rule_in:
+            The specification of the Rule to execute. If specified as
+            a string, it should be in YAML format.
         :returns ExecRule: A class containing various information and
             functionality relating to the execution. See the docs for
             ExecRule for additional details, but note that you must
            call its cleanup function or tables created just for this
            request will leak.
         """
+        rule = rule_in
+        if isinstance(rule_in, str):
+            rule = Rule.from_yaml_str(rule_in)
+
         Helpers.ensure_databricks()

         with error_handler():
             result = self._core_client().core_v1_render_rule(
                 self._workspace(),
-
+                rule.to_api_obj(),
             )

         try:
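With the change above, `exec_rule` accepts either a `Rule` object or a YAML string, which it parses with `Rule.from_yaml_str` before rendering. A minimal sketch of the string form, run from a Databricks notebook; the YAML field names mirror the `Rule` example in the README and are illustrative only, and releasing the temporary tables afterwards is whatever cleanup `ExecRule` documents.

```python
from dasl_client import Client

client = Client.for_workspace()

# Hypothetical rule specification expressed as YAML rather than a Rule object;
# the field names follow the README's Rule example and are illustrative.
rule_yaml = """
schedule:
  at_least_every: 2h
  enabled: true
input:
  stream:
    tables:
      - name: http_activity
    filter: disposition = 'Blocked'
output:
  summary: record was blocked
"""

# exec_rule parses the string with Rule.from_yaml_str and executes it locally;
# `spark` is the Spark session provided by the notebook environment.
execution = client.exec_rule(spark, rule_yaml)

# Inspect `execution`, then call its cleanup function as described in the
# ExecRule docs so the tables created for this run do not leak.
```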
@@ -794,6 +802,60 @@ class Client:
             .id
         )

+    def extend_query(
+        self,
+        id: str,
+        warehouse: Optional[str] = None,
+        start_date: Optional[str] = None,
+        end_date: Optional[str] = None,
+    ) -> str:
+        """
+        Extend an existing query to cover a larger time range. If the query
+        is ordered by time and contains no aggregations, this will add the
+        additional data to the existing underlying query, returning the
+        existing ID. If the existing table cannot be extended, a new table
+        will be created to cover the updated time range.
+
+        :param id: The ID of the query to extend.
+        :param warehouse: The SQL warehouse used to execute the SQL. If
+            omitted, the default SQL warehouse specified in the workspace
+            config will be used.
+        :param start_date: An optional starting date to extend the existing
+            query by. If not provided, the current start date of the query
+            will be used.
+        :param end_date: An optional end date to extend the existing
+            query by. If not provided, the current end date of the query
+            will be used.
+        :returns str: The ID of the query generation operation. This value
+            can be used with get_query_status to track the progress of
+            the generation process, and eventually to perform lookups
+            on the completed query. If the current query could be extended,
+            this id will be the same as the one provided. If a new query had
+            to be generated, the new ID is returned.
+        """
+        time_range = None
+        if start_date is not None or end_date is not None:
+            time_range = CoreV1QueryExtendRequestDateRange(
+                startDate=start_date,
+                endDate=end_date,
+            )
+
+        req = DbuiV1QueryExtendRequest(
+            warehouse=warehouse,
+            timeRange=time_range,
+        )
+
+        with error_handler():
+            return (
+                self._dbui_client()
+                .dbui_v1_query_extend(
+                    self._workspace(),
+                    id,
+                    req,
+                )
+                .id
+            )
+
     def get_query_status(
         self,
         id: str,
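The new `extend_query` wraps `dbui_v1_query_extend`: it reuses the existing query when its time range can simply be widened and otherwise returns the ID of a freshly generated query. A minimal usage sketch, assuming an existing query ID and polling with `get_query_status` (already part of the client); the ID and date strings are placeholders, and the exact date format is whatever the DASL API expects.

```python
from dasl_client import Client

client = Client.for_workspace()

existing_id = "query-1234"  # placeholder for a previously generated query ID

# Widen the query's date range; omitting `warehouse` falls back to the
# default SQL warehouse from the workspace config.
new_id = client.extend_query(
    existing_id,
    start_date="2025-07-01",
    end_date="2025-07-31",
)

# The same ID means the query was extended in place; a different ID means a
# new query had to be generated to cover the requested range.
if new_id != existing_id:
    print(f"Query was regenerated as {new_id}")

# Track generation progress with the existing status call.
print(client.get_query_status(new_id))
```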
{dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/conn/conn.py
@@ -19,7 +19,9 @@ def get_base_conn(enable_retries: bool = True, host: Optional[str] = None) -> Ap
    :return: An API conn without any auth
    """
    if host is None:
-        host = os.getenv(
+        host = os.getenv(
+            "DASL_API_URL", "https://api.sl.us-east-1.cloud.databricks.com"
+        )
    config = Configuration(host=host)
    if enable_retries:
        # configure retries with backoff for all HTTP verbs; we do not limit this to only
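This change makes the API host configurable through the `DASL_API_URL` environment variable, with the URL shown above as the fallback when neither the variable nor an explicit `host` argument is provided. A small sketch of overriding the endpoint, assuming the variable is set before any connection is created; the override URL here is hypothetical.

```python
import os

# Must be set before the client builds its connection; the URL is a placeholder.
os.environ["DASL_API_URL"] = "https://api.sl.eu-west-1.cloud.databricks.com"

from dasl_client import Client

# Connections created from this point on use the overridden host.
client = Client.for_workspace()
```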
{dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/errors.py
@@ -9,6 +9,26 @@ class PresetError(Exception):
     pass


+class StageExecutionException(PresetError):
+    def __init__(
+        self,
+        medallion_layer="unknown",
+        exception_map: Dict[str, List[str]] = {},
+        verbose: bool = False,
+    ):
+        self.exception_map = exception_map
+        message = (
+            f"Field specification errors encountered in {medallion_layer} stage.\n\n"
+        )
+        for table, exceptions in exception_map.items():
+            message += f"Table: {table}\n"
+            count = 1
+            for exception in exceptions:
+                message += f"Exception {count}:\n{exception.split('JVM')[0] if not verbose else exception}\n\n"
+                count += 1
+        super().__init__(message)
+
+
 class InvalidGoldTableSchemaError(PresetError):
     def __init__(self, schema: str, additional_message: str = ""):
         self.schema = schema
{dasl_client-1.0.25 → dasl_client-1.0.27}/dasl_client/preset_development/preview_engine.py
@@ -49,6 +49,7 @@ class PreviewEngine:
         """
         self._spark = spark
         self._ds_params = ds_params
+        self.__stage_exception = {}
         self._preset = yaml.safe_load(preset_yaml_str)
         self._pretransform_name = ds_params._pretransform_name

@@ -129,7 +130,7 @@ class PreviewEngine:
         if missing_keys:
             raise MissingSilverKeysError(missing_keys)

-    def _compile_stages(self) -> None:
+    def _compile_stages(self, force_evaluation: bool = False) -> None:
         """
         Creates Stage objects, setting silver pretransform to None if not provided.
         """
@@ -160,15 +161,21 @@ class PreviewEngine:
                 break

         self._silver = [
-            Stage(
+            Stage(
+                self._spark,
+                "silver transform",
+                table,
+                force_evaluation=force_evaluation,
+            )
             for table in self._preset.get("silver", {}).get("transform", [])
         ]
         self._gold = [
-            Stage(self._spark, "gold", table
+            Stage(self._spark, "gold", table, force_evaluation=force_evaluation)
+            for table in self._preset.get("gold", [])
             for table in self._preset.get("gold", [])
         ]

     def _run(
-        self, df: DataFrame
+        self, df: DataFrame, verbose: bool = False
     ) -> Tuple[DataFrame, Dict[str, DataFrame], Dict[str, DataFrame]]:
         """
         Runs all stages, in medallion stage order. This allows prior stage outputs to feed
@@ -232,6 +239,14 @@ class PreviewEngine:
         for table in self._silver:
             silver_output_map[table._name] = table.run(df)

+        # Check for silver stage exceptions.
+        # NOTE: These exception lists only get populated if force_evaluation is enabled.
+        for table in self._silver:
+            if exceptions := table.get_exceptions():
+                self.__stage_exception[table._name] = exceptions
+        if self.__stage_exception:
+            raise StageExecutionException("silver", self.__stage_exception, verbose)
+
         gold_output_map = {}
         for table in self._gold:
             # We store as gold_name/silver_input to prevent clobbering on duplicate gold table use.
@@ -239,12 +254,92 @@ class PreviewEngine:
                 silver_output_map[table._input]
             )

+        # Check for gold stage exceptions.
+        # NOTE: These exception lists only get populated if force_evaluation is enabled.
+        for table in self._gold:
+            if exceptions := table.get_exceptions():
+                self.__stage_exception[table._name] = exceptions
+        if self.__stage_exception:
+            raise StageExecutionException("gold", self.__stage_exception, verbose)
+
         return (
             (df, silver_output_map, gold_output_map, pre_bronze_output)
             if self._pre_silver
             else (None, silver_output_map, gold_output_map, pre_bronze_output)
         )

+    def __get_sql_type(self, data_type) -> str:
+        """
+        Helper to convert Spark data type objects to SQL type strings.
+        """
+        if isinstance(data_type, StringType):
+            return "STRING"
+        elif isinstance(data_type, IntegerType):
+            return "INT"
+        elif isinstance(data_type, LongType):
+            return "BIGINT"
+        elif isinstance(data_type, FloatType):
+            return "FLOAT"
+        elif isinstance(data_type, DoubleType):
+            return "DOUBLE"
+        elif isinstance(data_type, BooleanType):
+            return "BOOLEAN"
+        elif isinstance(data_type, TimestampType):
+            return "TIMESTAMP"
+        elif isinstance(data_type, DateType):
+            return "DATE"
+        elif isinstance(data_type, ArrayType):
+            return f"ARRAY<{self.__get_sql_type(data_type.elementType)}>"
+        elif isinstance(data_type, MapType):
+            return f"MAP<{self.__get_sql_type(data_type.keyType)}, {self.__get_sql_type(data_type.valueType)}>"
+        elif isinstance(data_type, StructType):
+            fields = ", ".join(
+                [
+                    f"{field.name}: {self.__get_sql_type(field.dataType)}"
+                    for field in data_type.fields
+                ]
+            )
+            return f"STRUCT<{fields}>"
+        elif isinstance(data_type, VariantType):
+            return f"VARIANT"
+        else:
+            return f"UNKNOWN ({data_type})"
+
+    def __format_gold_column_merge_exception(
+        self,
+        columns: Dict[str, List[Exception]],
+        gold_df: DataFrame,
+        verbose: bool = False,
+    ):
+        """
+        Formatter for various exceptions that occur during the merge of gold tables.
+        """
+        missing_column_flag = False
+        for column, info in columns.items():
+            # RANT: it is annoying, but basically every exception comes back from the
+            # query analyzer as pyspark.errors.exceptions.connect.AnalysisException,
+            # so we are forced into this awkward string search.
+            str_e = str(info["exception"])
+            str_e = str_e.split("JVM")[0] if not verbose else str_e
+            if "LEGACY_ERROR_TEMP_DELTA_0007" in str_e:
+                print(
+                    f"-> Column \"{column}\" of type \"{self.__get_sql_type(info['type'])}\" does not exist in gold table \"{info['table']}\"."
+                )
+                missing_column_flag = True
+            elif "DELTA_FAILED_TO_MERGE_FIELDS" in str_e:
+                print(
+                    f"-> Column \"{column}\" of type \"{self.__get_sql_type(info['type'])}\" is not compatible with gold table \"{info['table']}\"'s \"{column}\" of type \"{self.__get_sql_type(gold_df.schema[column].dataType)}\""
+                )
+            else:
+                print(
+                    f"-> Column \"{column}\" raised the following unformatted exception when appending to gold table \"{info['table']}\":\n{str_e}"
+                )
+
+        if missing_column_flag:
+            print(
+                f"\nA write to 1 or more non-existent columns occurred - available columns are: {', '.join(gold_df.columns)}"
+            )
+
     def _render_output(
         self,
         input_df: DataFrame,
@@ -253,6 +348,7 @@ class PreviewEngine:
         ],
         gold_table_catalog: str,
         gold_table_schema: str,
+        verbose: bool = False,
     ) -> None:
         """
         Displays formatted HTML output from executed Stages' DataFrames.
@@ -278,31 +374,6 @@ class PreviewEngine:
            """
         )

-        def check_struct_compatibility(
-            target_field: StructField, df_field: StructField, prefix=""
-        ):
-            if not (
-                isinstance(target_field.dataType, StructType)
-                and isinstance(df_field.dataType, StructType)
-            ):
-                return
-
-            target_fields = {
-                field.name: field for field in target_field.dataType.fields
-            }
-            for field in df_field.dataType.fields:
-                if field.name not in target_fields:
-                    raise GoldTableCompatibilityError(
-                        f"Extra field found in gold stage output STRUCT column {prefix}{target_field.name}: {field.name}"
-                    )
-                else:
-                    if isinstance(field.dataType, StructType):
-                        check_struct_compatibility(
-                            target_fields[field.name],
-                            field,
-                            prefix=prefix + target_field.name + ".",
-                        )
-
         (pre_silver, silver, gold, pre_bronze) = stage_dataframes
         d("Autoloader Input", 1)
         display(input_df)
@@ -343,17 +414,33 @@ class PreviewEngine:
             self._ds_params.add_gold_schema_table(full_name)

             # Perform the type checks by trying to insert data into the table
-            try:
-                df.write.mode("append").save(
-                    f"{self._ds_params.get_autoloader_temp_schema_location()}/{full_name}"
-                )
-            except Exception as e:
-                raise GoldTableCompatibilityError(
-                    f"Preset gold table '{full_name}' did not match the gold schema for {fqn_gold_table_name}: {repr(e)}"
-                )

-
-
+            df_columns = df.columns
+            df_single_columns = {}
+            df_append_exceptions = {}
+            for column in df_columns:
+                df_single_columns[column] = df.select(column)
+            for column, df_single_column in df_single_columns.items():
+                try:
+                    df_single_column.write.mode("append").save(
+                        f"{self._ds_params.get_autoloader_temp_schema_location()}/{full_name}"
+                    )
+                except Exception as e:
+                    df_append_exceptions[column] = {
+                        "type": df_single_column.schema[column].dataType,
+                        "exception": e,
+                        "table": name,
+                    }
+
+            self.__format_gold_column_merge_exception(
+                df_append_exceptions, delta_df, verbose
+            )
+
+            if not df_append_exceptions:
+                # all's good. display the output.
+                d("Resultant gold table preview", 3)
+                unioned_df = delta_df.unionByName(df, allowMissingColumns=True)
+                display(unioned_df)

     def is_backtick_escaped(self, name: str) -> bool:
         """
@@ -374,7 +461,13 @@ class PreviewEngine:
             return name
         return f"`{name}`"

-    def evaluate(
+    def evaluate(
+        self,
+        gold_table_schema: str,
+        display: bool = True,
+        force_evaluation: bool = False,
+        verbose: bool = False,
+    ) -> None:
         """
         Evaluates the loaded preset YAML using the input datasource configuration to load
         records. Finally, checks that the output from the Gold stages is compatible with
@@ -429,16 +522,17 @@ class PreviewEngine:
            schema_hints_file
        )

-        self._compile_stages()
+        self._compile_stages(force_evaluation=force_evaluation)

        with self._ds_params as df:
            self._result_df_map = self._run(df, verbose)
            if display:
                self._render_output(
                    df,
                    self._result_df_map,
                    self.force_apply_backticks(catalog_name),
                    self.force_apply_backticks(schema_name),
+                    verbose,
                )

    def results(