pointblank 0.9.0__py3-none-any.whl → 0.9.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -708,8 +708,9 @@ FinalActions(*args)
708
708
  In this example, the `send_alert()` function is defined to check the validation summary for
709
709
  critical failures. If any are found, an alert message is printed to the console. The function is
710
710
  passed to the `FinalActions` class, which ensures it will be executed after all validation steps
711
- are complete. Note that we used the `get_validation_summary()` function to retrieve the summary
712
- of the validation results to help craft the alert message.
711
+ are complete. Note that we used the
712
+ [`get_validation_summary()`](`pointblank.get_validation_summary`) function to retrieve the
713
+ summary of the validation results to help craft the alert message.
713
714
 
714
715
  Multiple final actions can be provided in a sequence. They will be executed in the order they
715
716
  are specified after all validation steps have completed:
@@ -4367,6 +4368,192 @@ rows_distinct(self, columns_subset: 'str | list[str] | None' = None, pre: 'Calla
4367
4368
  others.
4368
4369
 
4369
4370
 
4371
+ rows_complete(self, columns_subset: 'str | list[str] | None' = None, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'
4372
+
4373
+ Validate whether row data are complete by having no missing values.
4374
+
4375
+ The `rows_complete()` method checks whether rows in the table are complete. Completeness
4376
+ of a row means that there are no missing values within the row. This validation will operate
4377
+ over the number of test units that is equal to the number of rows in the table (determined
4378
+ after any `pre=` mutation has been applied). A subset of columns can be specified for the
4379
+ completeness check. If no subset is provided, all columns in the table will be used.
4380
+
4381
+ Parameters
4382
+ ----------
4383
+ columns_subset
4384
+ A single column or a list of columns to use as a subset for the completeness check. If
4385
+ `None` (the default), then all columns in the table will be used.
4386
+ pre
4387
+ An optional preprocessing function or lambda to apply to the data table during
4388
+ interrogation. This function should take a table as input and return a modified table.
4389
+ Have a look at the *Preprocessing* section for more information on how to use this
4390
+ argument.
4391
+ segments
4392
+ An optional directive on segmentation, which serves to split a validation step into
4393
+ multiple (one step per segment). Can be a single column name, a tuple that specifies a
4394
+ column name and its corresponding values to segment on, or a combination of both
4395
+ (provided as a list). Read the *Segmentation* section for usage information.
4396
+ thresholds
4397
+ Set threshold failure levels for reporting and reacting to exceedances of the levels.
4398
+ The thresholds are set at the step level and will override any global thresholds set in
4399
+ `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will
4400
+ be set locally and global thresholds (if any) will take effect. Look at the *Thresholds*
4401
+ section for information on how to set threshold levels.
4402
+ actions
4403
+ Optional actions to take when the validation step meets or exceeds any set threshold
4404
+ levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to
4405
+ define the actions.
4406
+ brief
4407
+ An optional brief description of the validation step that will be displayed in the
4408
+ reporting table. You can use templating elements like `"{step}"` to insert
4409
+ the step number, or `"{auto}"` to include an automatically generated brief. If `True`
4410
+ the entire brief will be automatically generated. If `None` (the default) then there
4411
+ won't be a brief.
4412
+ active
4413
+ A boolean value indicating whether the validation step should be active. Using `False`
4414
+ will make the validation step inactive (still reporting its presence and keeping indexes
4415
+ for the steps unchanged).
4416
+
4417
+ Returns
4418
+ -------
4419
+ Validate
4420
+ The `Validate` object with the added validation step.
4421
+
4422
+ Preprocessing
4423
+ -------------
4424
+ The `pre=` argument allows for a preprocessing function or lambda to be applied to the data
4425
+ table during interrogation. This function should take a table as input and return a modified
4426
+ table. This is useful for performing any necessary transformations or filtering on the data
4427
+ before the validation step is applied.
4428
+
4429
+ The preprocessing function can be any callable that takes a table as input and returns a
4430
+ modified table. For example, you could use a lambda function to filter the table based on
4431
+ certain criteria or to apply a transformation to the data. Note that you can refer to
4432
+ columns via `columns_subset=` that are expected to be present in the transformed table, but
4433
+ may not exist in the table before preprocessing. Regarding the lifetime of the transformed
4434
+ table, it only exists during the validation step and is not stored in the `Validate` object
4435
+ or used in subsequent validation steps.
4436
+
4437
+ Segmentation
4438
+ ------------
4439
+ The `segments=` argument allows for the segmentation of a validation step into multiple
4440
+ segments. This is useful for applying the same validation step to different subsets of the
4441
+ data. The segmentation can be done based on a single column or specific fields within a
4442
+ column.
4443
+
4444
+ Providing a single column name will result in a separate validation step for each unique
4445
+ value in that column. For example, if you have a column called `"region"` with values
4446
+ `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each
4447
+ region.
4448
+
4449
+ Alternatively, you can provide a tuple that specifies a column name and its corresponding
4450
+ values to segment on. For example, if you have a column called `"date"` and you want to
4451
+ segment on only specific dates, you can provide a tuple like
4452
+ `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded
4453
+ (i.e., no validation steps will be created for them).
4454
+
4455
+ A list with a combination of column names and tuples can be provided as well. This allows
4456
+ for more complex segmentation scenarios. The following inputs are all valid:
4457
+
4458
+ - `segments=["region", ("date", ["2023-01-01", "2023-01-02"])]`: segments on unique values
4459
+ in the `"region"` column and specific dates in the `"date"` column
4460
+ - `segments=["region", "date"]`: segments on unique values in the `"region"` and `"date"`
4461
+ columns
4462
+
4463
+ The segmentation is performed during interrogation, and the resulting validation steps will
4464
+ be numbered sequentially. Each segment will have its own validation step, and the results
4465
+ will be reported separately. This allows for a more granular analysis of the data and helps
4466
+ identify issues within specific segments.
4467
+
4468
+ Importantly, the segmentation process will be performed after any preprocessing of the data
4469
+ table. Because of this, one can conceivably use the `pre=` argument to generate a column
4470
+ that can be used for segmentation. For example, you could create a new column called
4471
+ `"segment"` through use of `pre=` and then use that column for segmentation.
4472
+
4473
+ Thresholds
4474
+ ----------
4475
+ The `thresholds=` parameter is used to set the failure-condition levels for the validation
4476
+ step. If they are set here at the step level, these thresholds will override any thresholds
4477
+ set at the global level in `Validate(thresholds=...)`.
4478
+
4479
+ There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values
4480
+ can either be set as a proportion of failing test units (a value between `0` and `1`),
4481
+ or as the absolute number of failing test units (an integer that's `1` or greater).
4482
+
4483
+ Thresholds can be defined using one of these input schemes:
4484
+
4485
+ 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create
4486
+ thresholds)
4487
+ 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is
4488
+ the 'error' level, and position `2` is the 'critical' level
4489
+ 3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and
4490
+ 'critical'
4491
+ 4. a single integer/float value denoting absolute number or fraction of failing test units
4492
+ for the 'warning' level only
4493
+
4494
+ If the number of failing test units exceeds set thresholds, the validation step will be
4495
+ marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be
4496
+ set; you're free to set any combination of them.
4497
+
4498
+ Aside from reporting failure conditions, thresholds can be used to determine the actions to
4499
+ take for each level of failure (using the `actions=` parameter).
4500
+
4501
+ Examples
4502
+ --------
4503
+ For the examples here, we'll use a simple Polars DataFrame with three string columns
4504
+ (`col_1`, `col_2`, and `col_3`). The table is shown below:
4505
+
4506
+ ```python
4507
+ import pointblank as pb
4508
+ import polars as pl
4509
+
4510
+ tbl = pl.DataFrame(
4511
+ {
4512
+ "col_1": ["a", None, "c", "d"],
4513
+ "col_2": ["a", "a", "c", None],
4514
+ "col_3": ["a", "a", "d", None],
4515
+ }
4516
+ )
4517
+
4518
+ pb.preview(tbl)
4519
+ ```
4520
+
4521
+ Let's validate that the rows in the table are complete with `rows_complete()`. We'll
4522
+ determine if this validation had any failing test units (there are four test units, one for
4523
+ each row). A failing test unit means that a given row is not complete (i.e., has at least
4524
+ one missing value).
4525
+
4526
+ ```python
4527
+ validation = (
4528
+ pb.Validate(data=tbl)
4529
+ .rows_complete()
4530
+ .interrogate()
4531
+ )
4532
+
4533
+ validation
4534
+ ```
4535
+
4536
+ From this validation table we see that there are two failing test units. This is because
4537
+ two rows in the table have at least one missing value (the second row and the last row).
4538
+
4539
+ We can also use a subset of columns to determine completeness. Let's specify the subset
4540
+ using columns `col_2` and `col_3` for the next validation.
4541
+
4542
+ ```python
4543
+ validation = (
4544
+ pb.Validate(data=tbl)
4545
+ .rows_complete(columns_subset=["col_2", "col_3"])
4546
+ .interrogate()
4547
+ )
4548
+
4549
+ validation
4550
+ ```
4551
+
4552
+ The validation table reports a single failing test unit. The last row contains missing
4553
+ values in both the `col_2` and `col_3` columns.
4554
+ others.
4555
+
4556
+
4370
4557
  col_schema_match(self, schema: 'Schema', complete: 'bool' = True, in_order: 'bool' = True, case_sensitive_colnames: 'bool' = True, case_sensitive_dtypes: 'bool' = True, full_match_dtypes: 'bool' = True, pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'
4371
4558
 
4372
4559
  Do columns in the table (and their types) match a predefined schema?
@@ -4991,6 +5178,306 @@ conjointly(self, *exprs: 'Callable', pre: 'Callable | None' = None, thresholds:
4991
5178
  information on how to use it with different table backends.
4992
5179
 
4993
5180
 
5181
+ specially(self, expr: 'Callable', pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'
5182
+
5183
+ Perform a specialized validation with customized logic.
5184
+
5185
+ The `specially()` validation method allows for the creation of specialized validation
5186
+ expressions that can be used to validate specific conditions or logic in the data. This
5187
+ method provides maximum flexibility by accepting a custom callable that encapsulates
5188
+ your validation logic.
5189
+
5190
+ The callable function can have one of two signatures:
5191
+
5192
+ - a function accepting a single parameter (the data table): `def validate(data): ...`
5193
+ - a function with no parameters: `def validate(): ...`
5194
+
5195
+ The second form is particularly useful for environment validations that don't need to
5196
+ inspect the data table.
5197
+
5198
+ The callable function must ultimately return one of:
5199
+
5200
+ 1. a single boolean value or boolean list
5201
+ 2. a table where the final column contains boolean values (column name is unimportant)
5202
+
5203
+ The validation will operate over the number of test units that is equal to the number of
5204
+ rows in the data table (if returning a table with boolean values). If returning a scalar
5205
+ boolean value, the validation will operate over a single test unit. For a return of a list
5206
+ of boolean values, the length of the list constitutes the number of test units.
5207
+
5208
+ Parameters
5209
+ ----------
5210
+ expr
5211
+ A callable function that defines the specialized validation logic. This function should:
5212
+ (1) accept the target data table as its single argument (though it may ignore it), or
5213
+ (2) take no parameters at all (for environment validations). The function must
5214
+ ultimately return boolean values representing validation results. Design your function
5215
+ to incorporate any custom parameters directly within the function itself using closure
5216
+ variables or default parameters.
5217
+ pre
5218
+ An optional preprocessing function or lambda to apply to the data table during
5219
+ interrogation. This function should take a table as input and return a modified table.
5220
+ Have a look at the *Preprocessing* section for more information on how to use this
5221
+ argument.
5222
+ thresholds
5223
+ Set threshold failure levels for reporting and reacting to exceedances of the levels.
5224
+ The thresholds are set at the step level and will override any global thresholds set in
5225
+ `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will
5226
+ be set locally and global thresholds (if any) will take effect. Look at the *Thresholds*
5227
+ section for information on how to set threshold levels.
5228
+ actions
5229
+ Optional actions to take when the validation step meets or exceeds any set threshold
5230
+ levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to
5231
+ define the actions.
5232
+ brief
5233
+ An optional brief description of the validation step that will be displayed in the
5234
+ reporting table. You can use templating elements like `"{step}"` to insert
5235
+ the step number, or `"{auto}"` to include an automatically generated brief. If `True`
5236
+ the entire brief will be automatically generated. If `None` (the default) then there
5237
+ won't be a brief.
5238
+ active
5239
+ A boolean value indicating whether the validation step should be active. Using `False`
5240
+ will make the validation step inactive (still reporting its presence and keeping indexes
5241
+ for the steps unchanged).
5242
+
5243
+ Returns
5244
+ -------
5245
+ Validate
5246
+ The `Validate` object with the added validation step.
5247
+
5248
+ Preprocessing
5249
+ -------------
5250
+ The `pre=` argument allows for a preprocessing function or lambda to be applied to the data
5251
+ table during interrogation. This function should take a table as input and return a modified
5252
+ table. This is useful for performing any necessary transformations or filtering on the data
5253
+ before the validation step is applied.
5254
+
5255
+ The preprocessing function can be any callable that takes a table as input and returns a
5256
+ modified table. For example, you could use a lambda function to filter the table based on
5257
+ certain criteria or to apply a transformation to the data. Regarding the lifetime of the
5258
+ transformed table, it only exists during the validation step and is not stored in the
5259
+ `Validate` object or used in subsequent validation steps.
5260
+
5261
+ Thresholds
5262
+ ----------
5263
+ The `thresholds=` parameter is used to set the failure-condition levels for the validation
5264
+ step. If they are set here at the step level, these thresholds will override any thresholds
5265
+ set at the global level in `Validate(thresholds=...)`.
5266
+
5267
+ There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values
5268
+ can either be set as a proportion of failing test units (a value between `0` and `1`),
5269
+ or as the absolute number of failing test units (an integer that's `1` or greater).
5270
+
5271
+ Thresholds can be defined using one of these input schemes:
5272
+
5273
+ 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create
5274
+ thresholds)
5275
+ 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is
5276
+ the 'error' level, and position `2` is the 'critical' level
5277
+ 3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and
5278
+ 'critical'
5279
+ 4. a single integer/float value denoting absolute number or fraction of failing test units
5280
+ for the 'warning' level only
5281
+
5282
+ If the number of failing test units exceeds set thresholds, the validation step will be
5283
+ marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be
5284
+ set; you're free to set any combination of them.
5285
+
5286
+ Aside from reporting failure conditions, thresholds can be used to determine the actions to
5287
+ take for each level of failure (using the `actions=` parameter).
5288
+
5289
+ Examples
5290
+ --------
5291
+ The `specially()` method offers maximum flexibility for validation, allowing you to create
5292
+ custom validation logic that fits your specific needs. The following examples demonstrate
5293
+ different patterns and use cases for this powerful validation approach.
5294
+
5295
+ ### Simple validation with direct table access
5296
+
5297
+ This example shows the most straightforward use case where we create a function that
5298
+ directly checks if the sum of two columns is positive.
5299
+
5300
+ ```python
5301
+ import pointblank as pb
5302
+ import polars as pl
5303
+
5304
+ simple_tbl = pl.DataFrame({
5305
+ "a": [5, 7, 1, 3, 9, 4],
5306
+ "b": [6, 3, 0, 5, 8, 2]
5307
+ })
5308
+
5309
+ # Simple function that validates directly on the table
5310
+ def validate_sum_positive(data):
5311
+ return data.select(pl.col("a") + pl.col("b") > 0)
5312
+
5313
+ (
5314
+ pb.Validate(data=simple_tbl)
5315
+ .specially(expr=validate_sum_positive)
5316
+ .interrogate()
5317
+ )
5318
+ ```
5319
+
5320
+ The function returns a Polars DataFrame with a single boolean column indicating whether
5321
+ the sum of columns `a` and `b` is positive for each row. Each row in the resulting DataFrame
5322
+ is a distinct test unit. This pattern works well for simple validations where you don't need
5323
+ configurable parameters.
5324
+
5325
+ ### Advanced validation with closure variables for parameters
5326
+
5327
+ When you need to make your validation configurable, you can use the function factory pattern
5328
+ (also known as closures) to create parameterized validations:
5329
+
5330
+ ```python
5331
+ # Create a parameterized validation function using closures
5332
+ def make_column_ratio_validator(col1, col2, min_ratio):
5333
+ def validate_column_ratio(data):
5334
+ return data.select((pl.col(col1) / pl.col(col2)) > min_ratio)
5335
+ return validate_column_ratio
5336
+
5337
+ (
5338
+ pb.Validate(data=simple_tbl)
5339
+ .specially(
5340
+ expr=make_column_ratio_validator(col1="a", col2="b", min_ratio=0.5)
5341
+ )
5342
+ .interrogate()
5343
+ )
5344
+ ```
5345
+
5346
+ This approach allows you to create reusable validation functions that can be configured with
5347
+ different parameters without modifying the function itself.
5348
+
5349
+ ### Validation function returning a list of booleans
5350
+
5351
+ This example demonstrates how to create a validation function that returns a list of boolean
5352
+ values, where each element represents a separate test unit:
5353
+
5354
+ ```python
5355
+ import pointblank as pb
5356
+ import polars as pl
5357
+ import random
5358
+
5359
+ # Create sample data
5360
+ transaction_tbl = pl.DataFrame({
5361
+ "transaction_id": [f"TX{i:04d}" for i in range(1, 11)],
5362
+ "amount": [120.50, 85.25, 50.00, 240.75, 35.20, 150.00, 85.25, 65.00, 210.75, 90.50],
5363
+ "category": ["food", "shopping", "entertainment", "travel", "utilities",
5364
+ "food", "shopping", "entertainment", "travel", "utilities"]
5365
+ })
5366
+
5367
+ # Define a validation function that returns a list of booleans
5368
+ def validate_transaction_rules(data):
5369
+ # Create a list to store individual test results
5370
+ test_results = []
5371
+
5372
+ # Check each row individually against multiple business rules
5373
+ for row in data.iter_rows(named=True):
5374
+ # Rule: transaction IDs must start with "TX" and be 6 chars long
5375
+ valid_id = row["transaction_id"].startswith("TX") and len(row["transaction_id"]) == 6
5376
+
5377
+ # Rule: Amounts must be appropriate for their category
5378
+ valid_amount = True
5379
+ if row["category"] == "food" and (row["amount"] < 10 or row["amount"] > 200):
5380
+ valid_amount = False
5381
+ elif row["category"] == "utilities" and (row["amount"] < 20 or row["amount"] > 300):
5382
+ valid_amount = False
5383
+ elif row["category"] == "entertainment" and row["amount"] > 100:
5384
+ valid_amount = False
5385
+
5386
+ # A transaction passes if it satisfies both rules
5387
+ test_results.append(valid_id and valid_amount)
5388
+
5389
+ return test_results
5390
+
5391
+ (
5392
+ pb.Validate(data=transaction_tbl)
5393
+ .specially(
5394
+ expr=validate_transaction_rules,
5395
+ brief="Validate transaction IDs and amounts by category."
5396
+ )
5397
+ .interrogate()
5398
+ )
5399
+ ```
5400
+
5401
+ This example shows how to create a validation function that applies multiple business rules
5402
+ to each row and returns a list of boolean results. Each boolean in the list represents a
5403
+ separate test unit, and a test unit passes only if all rules are satisfied for a given row.
5404
+
5405
+ The function iterates through each row in the data table, checking:
5406
+
5407
+ 1. if transaction IDs follow the required format
5408
+ 2. if transaction amounts are appropriate for their respective categories
5409
+
5410
+ This approach is powerful when you need to apply complex, conditional logic that can't be
5411
+ easily expressed using the built-in validation functions.
5412
+
5413
+ ### Table-level validation returning a single boolean
5414
+
5415
+ Sometimes you need to validate properties of the entire table rather than row-by-row. In
5416
+ these cases, your function can return a single boolean value:
5417
+
5418
+ ```python
5419
+ def validate_table_properties(data):
5420
+ # Check if table has at least one row with column 'a' > 10
5421
+ has_large_values = data.filter(pl.col("a") > 10).height > 0
5422
+
5423
+ # Check if mean of column 'b' is positive
5424
+ has_positive_mean = data.select(pl.mean("b")).item() > 0
5425
+
5426
+ # Return a single boolean for the entire table
5427
+ return has_large_values and has_positive_mean
5428
+
5429
+ (
5430
+ pb.Validate(data=simple_tbl)
5431
+ .specially(expr=validate_table_properties)
5432
+ .interrogate()
5433
+ )
5434
+ ```
5435
+
5436
+ This example demonstrates how to perform multiple checks on the table as a whole and combine
5437
+ them into a single validation result.
5438
+
5439
+ ### Environment validation that doesn't use the data table
5440
+
5441
+ The `specially()` validation method can even be used to validate aspects of your environment
5442
+ that are completely independent of the data:
5443
+
5444
+ ```python
5445
+ def validate_pointblank_version():
5446
+ try:
5447
+ import importlib.metadata
5448
+ version = importlib.metadata.version("pointblank")
5449
+ version_parts = version.split(".")
5450
+
5451
+ # Get major and minor components regardless of how many parts there are
5452
+ major = int(version_parts[0])
5453
+ minor = int(version_parts[1])
5454
+
5455
+ # Check both major and minor components for version `0.9+`
5456
+ return (major > 0) or (major == 0 and minor >= 9)
5457
+
5458
+ except Exception as e:
5459
+ # More specific error handling could be added here
5460
+ print(f"Version check failed: {e}")
5461
+ return False
5462
+
5463
+ (
5464
+ pb.Validate(data=simple_tbl)
5465
+ .specially(
5466
+ expr=validate_pointblank_version,
5467
+ brief="Check Pointblank version `>=0.9.0`."
5468
+ )
5469
+ .interrogate()
5470
+ )
5471
+ ```
5472
+
5473
+ This pattern shows how to validate external dependencies or environment conditions as part
5474
+ of your validation workflow. Notice that the function doesn't take any parameters at all,
5475
+ which makes it cleaner when the validation doesn't need to access the data table.
5476
+
5477
+ By combining these patterns, you can create sophisticated validation workflows that address
5478
+ virtually any data quality requirement in your organization.
5479
+
5480
+
4994
5481
 
4995
5482
  ## The Column Selection family
4996
5483
 
@@ -6614,6 +7101,7 @@ get_step_report(self, i: 'int', columns_subset: 'str | list[str] | Column | None
6614
7101
  - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`)
6615
7102
  - [`col_vals_null()`](`pointblank.Validate.col_vals_null`)
6616
7103
  - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`)
7104
+ - [`rows_complete()`](`pointblank.Validate.rows_complete`)
6617
7105
  - [`conjointly()`](`pointblank.Validate.conjointly`)
6618
7106
 
6619
7107
  The [`rows_distinct()`](`pointblank.Validate.rows_distinct`) validation step will produce a
@@ -6698,17 +7186,133 @@ get_json_report(self, use_fields: 'list[str] | None' = None, exclude_fields: 'li
6698
7186
 
6699
7187
  Get a report of the validation results as a JSON-formatted string.
6700
7188
 
7189
+ The `get_json_report()` method provides a machine-readable report of validation results in
7190
+ JSON format. This is particularly useful for programmatic processing, storing validation
7191
+ results, or integrating with other systems. The report includes detailed information about
7192
+ each validation step, such as assertion type, columns validated, threshold values, test
7193
+ results, and more.
7194
+
7195
+ By default, all available validation information fields are included in the report. However,
7196
+ you can customize the fields to include or exclude using the `use_fields=` and
7197
+ `exclude_fields=` parameters.
7198
+
6701
7199
  Parameters
6702
7200
  ----------
6703
7201
  use_fields
6704
- A list of fields to include in the report. If `None`, all fields are included.
7202
+ An optional list of specific fields to include in the report. If provided, only these
7203
+ fields will be included in the JSON output. If `None` (the default), all standard
7204
+ validation report fields are included. Have a look at the *Available Report Fields*
7205
+ section below for a list of fields that can be included in the report.
6705
7206
  exclude_fields
6706
- A list of fields to exclude from the report. If `None`, no fields are excluded.
7207
+ An optional list of fields to exclude from the report. If provided, these fields will
7208
+ be omitted from the JSON output. If `None` (the default), no fields are excluded.
7209
+ This parameter cannot be used together with `use_fields=`. The *Available Report Fields*
7210
+ provides a listing of fields that can be excluded from the report.
6707
7211
 
6708
7212
  Returns
6709
7213
  -------
6710
7214
  str
6711
- A JSON-formatted string representing the validation report.
7215
+ A JSON-formatted string representing the validation report, with each validation step
7216
+ as an object in the report array.
7217
+
7218
+ Available Report Fields
7219
+ -----------------------
7220
+ The JSON report can include any of the standard validation report fields, including:
7221
+
7222
+ - `i`: the step number (1-indexed)
7223
+ - `i_o`: the original step index from the validation plan (pre-expansion)
7224
+ - `assertion_type`: the type of validation assertion (e.g., `"col_vals_gt"`, etc.)
7225
+ - `column`: the column being validated (or columns used in certain validations)
7226
+ - `values`: the comparison values or parameters used in the validation
7227
+ - `inclusive`: whether the comparison is inclusive (for range-based validations)
7228
+ - `na_pass`: whether `NA`/`Null` values are considered passing (for certain validations)
7229
+ - `pre`: preprocessing function applied before validation
7230
+ - `segments`: data segments to which the validation was applied
7231
+ - `thresholds`: threshold level statement that was used for the validation step
7232
+ - `label`: custom label for the validation step
7233
+ - `brief`: a brief description of the validation step
7234
+ - `active`: whether the validation step is active
7235
+ - `all_passed`: whether all test units passed in the step
7236
+ - `n`: total number of test units
7237
+ - `n_passed`, `n_failed`: number of test units that passed and failed
7238
+ - `f_passed`, `f_failed`: fraction of test units that passed and failed
7239
+ - `warning`, `error`, `critical`: whether the namesake threshold level was exceeded (is
7240
+ `null` if threshold not set)
7241
+ - `time_processed`: when the validation step was processed (ISO 8601 format)
7242
+ - `proc_duration_s`: the processing duration in seconds
7243
+
7244
+ Examples
7245
+ --------
7246
+ Let's create a validation plan with a few validation steps and generate a JSON report of the
7247
+ results:
7248
+
7249
+ ```python
7250
+ import pointblank as pb
7251
+ import polars as pl
7252
+
7253
+ # Create a sample DataFrame
7254
+ tbl = pl.DataFrame({
7255
+ "a": [5, 7, 8, 9],
7256
+ "b": [3, 4, 2, 1]
7257
+ })
7258
+
7259
+ # Create and execute a validation plan
7260
+ validation = (
7261
+ pb.Validate(data=tbl)
7262
+ .col_vals_gt(columns="a", value=6)
7263
+ .col_vals_lt(columns="b", value=4)
7264
+ .interrogate()
7265
+ )
7266
+
7267
+ # Get the full JSON report
7268
+ json_report = validation.get_json_report()
7269
+
7270
+ print(json_report)
7271
+ ```
7272
+
7273
+ You can also customize which fields to include:
7274
+
7275
+ ```python
7276
+ json_report = validation.get_json_report(
7277
+ use_fields=["i", "assertion_type", "column", "n_passed", "n_failed"]
7278
+ )
7279
+
7280
+ print(json_report)
7281
+ ```
7282
+
7283
+ Or which fields to exclude:
7284
+
7285
+ ```python
7286
+ json_report = validation.get_json_report(
7287
+ exclude_fields=[
7288
+ "i_o", "thresholds", "pre", "segments", "values",
7289
+ "na_pass", "inclusive", "label", "brief", "active",
7290
+ "time_processed", "proc_duration_s"
7291
+ ]
7292
+ )
7293
+
7294
+ print(json_report)
7295
+ ```
7296
+
7297
+ The JSON output can be further processed or analyzed programmatically:
7298
+
7299
+ ```python
7300
+ import json
7301
+
7302
+ # Parse the JSON report
7303
+ report_data = json.loads(validation.get_json_report())
7304
+
7305
+ # Extract and analyze validation results
7306
+ failing_steps = [step for step in report_data if step["n_failed"] > 0]
7307
+ print(f"Number of failing validation steps: {len(failing_steps)}")
7308
+ ```
7309
+
7310
+ See Also
7311
+ --------
7312
+ - [`get_tabular_report()`](`pointblank.Validate.get_tabular_report`): Get a formatted HTML
7313
+ report as a GT table
7314
+ - [`get_data_extracts()`](`pointblank.Validate.get_data_extracts`): Get rows that
7315
+ failed validation
6712
7316
 
6713
7317
 
6714
7318
  get_sundered_data(self, type='pass') -> 'FrameT'
@@ -8857,7 +9461,7 @@ send_slack_notification(webhook_url: 'str | None' = None, step_msg: 'str | None'
8857
9461
  validation
8858
9462
  ```
8859
9463
 
8860
- By placing the `notify_slack` function in the `Validate(actions=Actions(critical=))` argument,
9464
+ By placing the `notify_slack()` function in the `Validate(actions=Actions(critical=))` argument,
8861
9465
  you can ensure that the notification is sent whenever the 'critical' threshold is reached (as
8862
9466
  set here, when 15% or more of the test units fail). The notification will include information
8863
9467
  about the validation step that triggered the alert.
@@ -8887,7 +9491,7 @@ send_slack_notification(webhook_url: 'str | None' = None, step_msg: 'str | None'
8887
9491
  )
8888
9492
  ```
8889
9493
 
8890
- In this case, the same `notify_slack` function is used, but it is placed in
9494
+ In this case, the same `notify_slack()` function is used, but it is placed in
8891
9495
  `Validate(final_actions=FinalActions())`. This results in the summary notification being sent
8892
9496
  after all validation steps are completed, regardless of whether any steps failed or not.
8893
9497