pointblank 0.8.4__py3-none-any.whl → 0.8.6__py3-none-any.whl
This diff shows the changes between two publicly released versions of the package, as they appear in their public registry. It is provided for informational purposes only.
- pointblank/__init__.py +2 -0
- pointblank/_constants.py +13 -0
- pointblank/_constants_translations.py +216 -0
- pointblank/_interrogation.py +182 -0
- pointblank/_utils.py +2 -0
- pointblank/column.py +352 -4
- pointblank/data/api-docs.txt +270 -4
- pointblank/validate.py +462 -5
- pointblank-0.8.6.dist-info/METADATA +312 -0
- {pointblank-0.8.4.dist-info → pointblank-0.8.6.dist-info}/RECORD +13 -13
- pointblank-0.8.4.dist-info/METADATA +0 -269
- {pointblank-0.8.4.dist-info → pointblank-0.8.6.dist-info}/WHEEL +0 -0
- {pointblank-0.8.4.dist-info → pointblank-0.8.6.dist-info}/licenses/LICENSE +0 -0
- {pointblank-0.8.4.dist-info → pointblank-0.8.6.dist-info}/top_level.txt +0 -0
pointblank/data/api-docs.txt
CHANGED

@@ -4171,6 +4171,201 @@ col_count_match(self, count: 'int | FrameT | Any', inverse: 'bool' = False, pre:
 columns in the target table. So, the single test unit passed.
 
 
+conjointly(self, *exprs: 'Callable', pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'
+
+Perform multiple row-wise validations for joint validity.
+
+The `conjointly()` validation method checks whether each row in the table passes multiple
+validation conditions simultaneously. This enables compound validation logic where a test
+unit (typically a row) must satisfy all specified conditions to pass the validation.
+
+This method accepts multiple validation expressions as callables, which should return
+boolean expressions when applied to the data. You can use lambdas that incorporate
+Polars/Pandas/Ibis expressions (based on the target table type) or create more complex
+validation functions. The validation will operate over the number of test units that is
+equal to the number of rows in the table (determined after any `pre=` mutation has been
+applied).
+
+Parameters
+----------
+*exprs
+    Multiple validation expressions provided as callable functions. Each callable should
+    accept a table as its single argument and return a boolean expression or Series/Column
+    that evaluates to boolean values for each row.
+pre
+    An optional preprocessing function or lambda to apply to the data table during
+    interrogation. This function should take a table as input and return a modified table.
+    Have a look at the *Preprocessing* section for more information on how to use this
+    argument.
+thresholds
+    Set threshold failure levels for reporting and reacting to exceedences of the levels.
+    The thresholds are set at the step level and will override any global thresholds set in
+    `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will
+    be set locally and global thresholds (if any) will take effect. Look at the *Thresholds*
+    section for information on how to set threshold levels.
+actions
+    Optional actions to take when the validation step meets or exceeds any set threshold
+    levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to
+    define the actions.
+brief
+    An optional brief description of the validation step that will be displayed in the
+    reporting table. You can use the templating elements like `"{step}"` to insert
+    the step number, or `"{auto}"` to include an automatically generated brief. If `True`
+    the entire brief will be automatically generated. If `None` (the default) then there
+    won't be a brief.
+active
+    A boolean value indicating whether the validation step should be active. Using `False`
+    will make the validation step inactive (still reporting its presence and keeping indexes
+    for the steps unchanged).
+
+Returns
+-------
+Validate
+    The `Validate` object with the added validation step.
+
+Preprocessing
+-------------
+The `pre=` argument allows for a preprocessing function or lambda to be applied to the data
+table during interrogation. This function should take a table as input and return a modified
+table. This is useful for performing any necessary transformations or filtering on the data
+before the validation step is applied.
+
+The preprocessing function can be any callable that takes a table as input and returns a
+modified table. For example, you could use a lambda function to filter the table based on
+certain criteria or to apply a transformation to the data. Regarding the lifetime of the
+transformed table, it only exists during the validation step and is not stored in the
+`Validate` object or used in subsequent validation steps.
+
+Thresholds
+----------
+The `thresholds=` parameter is used to set the failure-condition levels for the validation
+step. If they are set here at the step level, these thresholds will override any thresholds
+set at the global level in `Validate(thresholds=...)`.
+
+There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values
+can either be set as a proportion failing of all test units (a value between `0` to `1`),
+or, the absolute number of failing test units (as integer that's `1` or greater).
+
+Thresholds can be defined using one of these input schemes:
+
+1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create
+   thresholds)
+2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is
+   the 'error' level, and position `2` is the 'critical' level
+3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and
+   'critical'
+4. a single integer/float value denoting absolute number or fraction of failing test units
+   for the 'warning' level only
+
+If the number of failing test units exceeds set thresholds, the validation step will be
+marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be
+set, you're free to set any combination of them.
+
+Aside from reporting failure conditions, thresholds can be used to determine the actions to
+take for each level of failure (using the `actions=` parameter).
+
+Examples
+--------
+For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`,
+`b`, and `c`). The table is shown below:
+
+```python
+import pointblank as pb
+import polars as pl
+
+tbl = pl.DataFrame(
+    {
+        "a": [5, 7, 1, 3, 9, 4],
+        "b": [6, 3, 0, 5, 8, 2],
+        "c": [10, 4, 8, 9, 10, 5],
+    }
+)
+
+pb.preview(tbl)
+```
+
+Let's validate that the values in each row satisfy multiple conditions simultaneously:
+
+1. Column `a` should be greater than 2
+2. Column `b` should be less than 7
+3. The sum of `a` and `b` should be less than the value in column `c`
+
+We'll use `conjointly()` to check all these conditions together:
+
+```python
+validation = (
+    pb.Validate(data=tbl)
+    .conjointly(
+        lambda df: pl.col("a") > 2,
+        lambda df: pl.col("b") < 7,
+        lambda df: pl.col("a") + pl.col("b") < pl.col("c")
+    )
+    .interrogate()
+)
+
+validation
+```
+
+The validation table shows that not all rows satisfy all three conditions together. For a
+row to pass the conjoint validation, all three conditions must be true for that row.
+
+We can also use preprocessing to filter the data before applying the conjoint validation:
+
+```python
+validation = (
+    pb.Validate(data=tbl)
+    .conjointly(
+        lambda df: pl.col("a") > 2,
+        lambda df: pl.col("b") < 7,
+        lambda df: pl.col("a") + pl.col("b") < pl.col("c"),
+        pre=lambda df: df.filter(pl.col("c") > 5)
+    )
+    .interrogate()
+)
+
+validation
+```
+
+This allows for more complex validation scenarios where the data is first prepared and then
+validated against multiple conditions simultaneously.
+
+Or, you can use the backend-agnostic column expression helper
+[`expr_col()`](`pointblank.expr_col`) to write expressions that work across different table
+backends:
+
+```python
+tbl = pl.DataFrame(
+    {
+        "a": [5, 7, 1, 3, 9, 4],
+        "b": [6, 3, 0, 5, 8, 2],
+        "c": [10, 4, 8, 9, 10, 5],
+    }
+)
+
+# Using backend-agnostic syntax with expr_col()
+validation = (
+    pb.Validate(data=tbl)
+    .conjointly(
+        lambda df: pb.expr_col("a") > 2,
+        lambda df: pb.expr_col("b") < 7,
+        lambda df: pb.expr_col("a") + pb.expr_col("b") < pb.expr_col("c")
+    )
+    .interrogate()
+)
+
+validation
+```
+
+Using [`expr_col()`](`pointblank.expr_col`) allows your validation code to work consistently
+across Pandas, Polars, and Ibis table backends without changes, making your validation
+pipelines more portable.
+
+See Also
+--------
+Look at the documentation of the [`expr_col()`](`pointblank.expr_col`) function for more
+information on how to use it with different table backends.
+
+
 
 ## The Column Selection family
 
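The step-level `thresholds=` and `brief=` options documented in the added `conjointly()` docstring above can be combined with the same toy table from its examples. The following is a minimal sketch (not taken from the package docs), using the tuple threshold scheme and the `"{auto}"` brief template described there:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [5, 7, 1, 3, 9, 4],
        "b": [6, 3, 0, 5, 8, 2],
        "c": [10, 4, 8, 9, 10, 5],
    }
)

# Step-level thresholds via the tuple scheme: position 0 is 'warning',
# position 1 is 'error', position 2 is 'critical'. These override any
# global thresholds set in Validate(thresholds=...).
validation = (
    pb.Validate(data=tbl)
    .conjointly(
        lambda df: pl.col("a") > 2,
        lambda df: pl.col("b") < 7,
        lambda df: pl.col("a") + pl.col("b") < pl.col("c"),
        thresholds=(0.1, 0.25, 0.5),
        brief="{auto}",  # let pointblank generate the brief shown in the report
    )
    .interrogate()
)

validation
```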
@@ -4195,18 +4390,20 @@ col(exprs: 'str | ColumnSelector | ColumnSelectorNarwhals') -> 'Column | ColumnL
 [`interrogate()`](`pointblank.Validate.interrogate`) is called), Pointblank will then check that
 the column exists in the input table.
 
+For creating expressions to use with the `conjointly()` validation method, use the
+[`expr_col()`](`pointblank.expr_col`) function instead.
+
 Parameters
 ----------
 exprs
     Either the name of a single column in the target table, provided as a string, or, an
     expression involving column selector functions (e.g., `starts_with("a")`,
-    `ends_with("e") | starts_with("a")`, etc.).
-    details on which input forms are valid depending on the context.
+    `ends_with("e") | starts_with("a")`, etc.).
 
 Returns
 -------
-Column
-    A
+Column | ColumnLiteral | ColumnSelectorNarwhals:
+    A column object or expression representing the column reference.
 
 Usage with the `columns=` Argument
 -----------------------------------
@@ -4450,6 +4647,11 @@ col(exprs: 'str | ColumnSelector | ColumnSelectorNarwhals') -> 'Column | ColumnL
 [`matches()`](`pointblank.matches`) column selector functions from Narwhals, combined with the
 `&` operator. This is necessary to specify the set of columns that are numeric *and* match the
 text `"2023"` or `"2024"`.
+
+See Also
+--------
+Create a column expression for use in `conjointly()` validation with the
+[`expr_col()`](`pointblank.expr_col`) function.
 
 
 starts_with(text: 'str', case_sensitive: 'bool' = False) -> 'StartsWith'
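The `col()` entries touched in the two hunks above describe passing column selector functions (such as `starts_with()`) through `col()` to a validation method's `columns=` argument. Below is a small illustrative sketch of that pattern, with hypothetical `paid_*` columns and the `col_vals_not_null()` method referenced later in this diff; treat it as an assumption-based example rather than package documentation:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "name": ["A", "B", "C"],
        "paid_2023": [10.5, None, 13.9],
        "paid_2024": [11.0, 23.1, 15.2],
    }
)

# Resolve every column whose name starts with "paid" and require that
# none of their values are null.
validation = (
    pb.Validate(data=tbl)
    .col_vals_not_null(columns=pb.col(pb.starts_with("paid")))
    .interrogate()
)

validation
```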
@@ -5474,6 +5676,69 @@ last_n(n: 'int', offset: 'int' = 0) -> 'LastN'
 `paid_2022`, and `paid_2024`.
 
 
+expr_col(column_name: 'str') -> 'ColumnExpression'
+
+Create a column expression for use in `conjointly()` validation.
+
+This function returns a ColumnExpression object that supports operations like `>`, `<`, `+`,
+etc. for use in [`conjointly()`](`pointblank.Validate.conjointly`) validation expressions.
+
+Parameters
+----------
+column_name
+    The name of the column to reference.
+
+Returns
+-------
+ColumnExpression
+    A column expression that can be used in comparisons and operations.
+
+Examples
+--------
+Let's say we have a table with three columns: `a`, `b`, and `c`. We want to validate that:
+
+- The values in column `a` are greater than `2`.
+- The values in column `b` are less than `7`.
+- The sum of columns `a` and `b` is less than the values in column `c`.
+
+We can use the `expr_col()` function to create a column expression for each of these conditions.
+
+```python
+import pointblank as pb
+import polars as pl
+
+tbl = pl.DataFrame(
+    {
+        "a": [5, 7, 1, 3, 9, 4],
+        "b": [6, 3, 0, 5, 8, 2],
+        "c": [10, 4, 8, 9, 10, 5],
+    }
+)
+
+# Using expr_col() to create backend-agnostic validation expressions
+validation = (
+    pb.Validate(data=tbl)
+    .conjointly(
+        lambda df: pb.expr_col("a") > 2,
+        lambda df: pb.expr_col("b") < 7,
+        lambda df: pb.expr_col("a") + pb.expr_col("b") < pb.expr_col("c")
+    )
+    .interrogate()
+)
+
+validation
+```
+
+The above code creates a validation object that checks the specified conditions using the
+`expr_col()` function. The resulting validation table will show whether each condition was
+satisfied for each row in the table.
+
+See Also
+--------
+The [`conjointly()`](`pointblank.Validate.conjointly`) validation method, which is where this
+function should be used.
+
+
 
 ## The Interrogation and Reporting family
 
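The `expr_col()` documentation added above states that the same expressions work across Pandas, Polars, and Ibis table backends. As a sketch of that portability claim (assuming pandas is installed), the earlier Polars example can be repeated against a pandas DataFrame without changing the lambdas:

```python
import pandas as pd
import pointblank as pb

tbl_pd = pd.DataFrame(
    {
        "a": [5, 7, 1, 3, 9, 4],
        "b": [6, 3, 0, 5, 8, 2],
        "c": [10, 4, 8, 9, 10, 5],
    }
)

# The same backend-agnostic expressions used in the Polars example above.
validation = (
    pb.Validate(data=tbl_pd)
    .conjointly(
        lambda df: pb.expr_col("a") > 2,
        lambda df: pb.expr_col("b") < 7,
        lambda df: pb.expr_col("a") + pb.expr_col("b") < pb.expr_col("c"),
    )
    .interrogate()
)

validation
```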
@@ -5916,6 +6181,7 @@ get_data_extracts(self, i: 'int | list[int] | None' = None, frame: 'bool' = Fals
 - [`col_vals_null()`](`pointblank.Validate.col_vals_null`)
 - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`)
 - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`)
+- [`rows_distinct()`](`pointblank.Validate.rows_distinct`)
 
 An extracted row means that a test unit failed for that row in the validation step. The
 extracted rows are a subset of the original table and are useful for further analysis or for