validmind 2.5.6__py3-none-any.whl → 2.5.15__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- validmind/__version__.py +1 -1
- validmind/ai/test_descriptions.py +26 -7
- validmind/api_client.py +89 -43
- validmind/client.py +2 -2
- validmind/client_config.py +11 -14
- validmind/datasets/regression/fred_timeseries.py +67 -138
- validmind/template.py +1 -0
- validmind/test_suites/__init__.py +0 -2
- validmind/test_suites/statsmodels_timeseries.py +1 -1
- validmind/test_suites/summarization.py +0 -1
- validmind/test_suites/time_series.py +0 -43
- validmind/tests/__types__.py +3 -13
- validmind/tests/data_validation/ACFandPACFPlot.py +15 -13
- validmind/tests/data_validation/ADF.py +31 -24
- validmind/tests/data_validation/AutoAR.py +9 -9
- validmind/tests/data_validation/AutoMA.py +23 -16
- validmind/tests/data_validation/AutoSeasonality.py +18 -16
- validmind/tests/data_validation/AutoStationarity.py +21 -16
- validmind/tests/data_validation/BivariateScatterPlots.py +67 -96
- validmind/tests/data_validation/ChiSquaredFeaturesTable.py +82 -124
- validmind/tests/data_validation/ClassImbalance.py +15 -12
- validmind/tests/data_validation/DFGLSArch.py +19 -13
- validmind/tests/data_validation/DatasetDescription.py +17 -11
- validmind/tests/data_validation/DatasetSplit.py +7 -5
- validmind/tests/data_validation/DescriptiveStatistics.py +28 -21
- validmind/tests/data_validation/Duplicates.py +33 -25
- validmind/tests/data_validation/EngleGrangerCoint.py +35 -33
- validmind/tests/data_validation/FeatureTargetCorrelationPlot.py +59 -71
- validmind/tests/data_validation/HighCardinality.py +19 -12
- validmind/tests/data_validation/HighPearsonCorrelation.py +27 -22
- validmind/tests/data_validation/IQROutliersBarPlot.py +13 -10
- validmind/tests/data_validation/IQROutliersTable.py +40 -36
- validmind/tests/data_validation/IsolationForestOutliers.py +21 -14
- validmind/tests/data_validation/KPSS.py +34 -29
- validmind/tests/data_validation/LaggedCorrelationHeatmap.py +22 -15
- validmind/tests/data_validation/MissingValues.py +32 -27
- validmind/tests/data_validation/MissingValuesBarPlot.py +25 -21
- validmind/tests/data_validation/PearsonCorrelationMatrix.py +71 -84
- validmind/tests/data_validation/PhillipsPerronArch.py +37 -30
- validmind/tests/data_validation/RollingStatsPlot.py +31 -23
- validmind/tests/data_validation/ScatterPlot.py +63 -78
- validmind/tests/data_validation/SeasonalDecompose.py +38 -34
- validmind/tests/data_validation/Skewness.py +35 -37
- validmind/tests/data_validation/SpreadPlot.py +35 -35
- validmind/tests/data_validation/TabularCategoricalBarPlots.py +23 -17
- validmind/tests/data_validation/TabularDateTimeHistograms.py +21 -13
- validmind/tests/data_validation/TabularDescriptionTables.py +51 -16
- validmind/tests/data_validation/TabularNumericalHistograms.py +25 -22
- validmind/tests/data_validation/TargetRateBarPlots.py +21 -14
- validmind/tests/data_validation/TimeSeriesDescription.py +25 -18
- validmind/tests/data_validation/TimeSeriesDescriptiveStatistics.py +23 -17
- validmind/tests/data_validation/TimeSeriesFrequency.py +24 -17
- validmind/tests/data_validation/TimeSeriesHistogram.py +33 -32
- validmind/tests/data_validation/TimeSeriesLinePlot.py +17 -10
- validmind/tests/data_validation/TimeSeriesMissingValues.py +15 -10
- validmind/tests/data_validation/TimeSeriesOutliers.py +37 -33
- validmind/tests/data_validation/TooManyZeroValues.py +16 -11
- validmind/tests/data_validation/UniqueRows.py +11 -6
- validmind/tests/data_validation/WOEBinPlots.py +23 -16
- validmind/tests/data_validation/WOEBinTable.py +35 -30
- validmind/tests/data_validation/ZivotAndrewsArch.py +34 -28
- validmind/tests/data_validation/nlp/CommonWords.py +21 -14
- validmind/tests/data_validation/nlp/Hashtags.py +27 -20
- validmind/tests/data_validation/nlp/LanguageDetection.py +33 -14
- validmind/tests/data_validation/nlp/Mentions.py +21 -15
- validmind/tests/data_validation/nlp/PolarityAndSubjectivity.py +32 -9
- validmind/tests/data_validation/nlp/Punctuations.py +24 -20
- validmind/tests/data_validation/nlp/Sentiment.py +27 -8
- validmind/tests/data_validation/nlp/StopWords.py +26 -19
- validmind/tests/data_validation/nlp/TextDescription.py +36 -35
- validmind/tests/data_validation/nlp/Toxicity.py +32 -9
- validmind/tests/decorator.py +81 -42
- validmind/tests/model_validation/BertScore.py +36 -27
- validmind/tests/model_validation/BleuScore.py +25 -19
- validmind/tests/model_validation/ClusterSizeDistribution.py +38 -34
- validmind/tests/model_validation/ContextualRecall.py +35 -13
- validmind/tests/model_validation/FeaturesAUC.py +32 -13
- validmind/tests/model_validation/MeteorScore.py +46 -33
- validmind/tests/model_validation/ModelMetadata.py +32 -64
- validmind/tests/model_validation/ModelPredictionResiduals.py +75 -73
- validmind/tests/model_validation/RegardScore.py +30 -14
- validmind/tests/model_validation/RegressionResidualsPlot.py +10 -5
- validmind/tests/model_validation/RougeScore.py +36 -30
- validmind/tests/model_validation/TimeSeriesPredictionWithCI.py +30 -14
- validmind/tests/model_validation/TimeSeriesPredictionsPlot.py +27 -30
- validmind/tests/model_validation/TimeSeriesR2SquareBySegments.py +68 -63
- validmind/tests/model_validation/TokenDisparity.py +31 -23
- validmind/tests/model_validation/ToxicityScore.py +26 -17
- validmind/tests/model_validation/embeddings/ClusterDistribution.py +24 -20
- validmind/tests/model_validation/embeddings/CosineSimilarityComparison.py +30 -27
- validmind/tests/model_validation/embeddings/CosineSimilarityDistribution.py +7 -5
- validmind/tests/model_validation/embeddings/CosineSimilarityHeatmap.py +32 -23
- validmind/tests/model_validation/embeddings/DescriptiveAnalytics.py +7 -5
- validmind/tests/model_validation/embeddings/EmbeddingsVisualization2D.py +15 -11
- validmind/tests/model_validation/embeddings/EuclideanDistanceComparison.py +29 -29
- validmind/tests/model_validation/embeddings/EuclideanDistanceHeatmap.py +34 -25
- validmind/tests/model_validation/embeddings/PCAComponentsPairwisePlots.py +38 -26
- validmind/tests/model_validation/embeddings/StabilityAnalysis.py +40 -1
- validmind/tests/model_validation/embeddings/StabilityAnalysisKeyword.py +18 -17
- validmind/tests/model_validation/embeddings/StabilityAnalysisRandomNoise.py +40 -45
- validmind/tests/model_validation/embeddings/StabilityAnalysisSynonyms.py +17 -19
- validmind/tests/model_validation/embeddings/StabilityAnalysisTranslation.py +29 -25
- validmind/tests/model_validation/embeddings/TSNEComponentsPairwisePlots.py +38 -28
- validmind/tests/model_validation/ragas/AnswerCorrectness.py +5 -4
- validmind/tests/model_validation/ragas/AnswerRelevance.py +5 -4
- validmind/tests/model_validation/ragas/AnswerSimilarity.py +5 -4
- validmind/tests/model_validation/ragas/AspectCritique.py +7 -0
- validmind/tests/model_validation/ragas/ContextEntityRecall.py +9 -8
- validmind/tests/model_validation/ragas/ContextPrecision.py +5 -4
- validmind/tests/model_validation/ragas/ContextRecall.py +5 -4
- validmind/tests/model_validation/ragas/Faithfulness.py +5 -4
- validmind/tests/model_validation/ragas/utils.py +6 -0
- validmind/tests/model_validation/sklearn/AdjustedMutualInformation.py +19 -12
- validmind/tests/model_validation/sklearn/AdjustedRandIndex.py +22 -17
- validmind/tests/model_validation/sklearn/ClassifierPerformance.py +27 -25
- validmind/tests/model_validation/sklearn/ClusterCosineSimilarity.py +7 -5
- validmind/tests/model_validation/sklearn/ClusterPerformance.py +40 -78
- validmind/tests/model_validation/sklearn/ClusterPerformanceMetrics.py +15 -17
- validmind/tests/model_validation/sklearn/CompletenessScore.py +17 -11
- validmind/tests/model_validation/sklearn/ConfusionMatrix.py +22 -15
- validmind/tests/model_validation/sklearn/FeatureImportance.py +95 -0
- validmind/tests/model_validation/sklearn/FowlkesMallowsScore.py +7 -7
- validmind/tests/model_validation/sklearn/HomogeneityScore.py +19 -12
- validmind/tests/model_validation/sklearn/HyperParametersTuning.py +35 -30
- validmind/tests/model_validation/sklearn/KMeansClustersOptimization.py +10 -5
- validmind/tests/model_validation/sklearn/MinimumAccuracy.py +32 -32
- validmind/tests/model_validation/sklearn/MinimumF1Score.py +23 -23
- validmind/tests/model_validation/sklearn/MinimumROCAUCScore.py +15 -10
- validmind/tests/model_validation/sklearn/ModelsPerformanceComparison.py +26 -19
- validmind/tests/model_validation/sklearn/OverfitDiagnosis.py +38 -18
- validmind/tests/model_validation/sklearn/PermutationFeatureImportance.py +31 -25
- validmind/tests/model_validation/sklearn/PopulationStabilityIndex.py +8 -6
- validmind/tests/model_validation/sklearn/PrecisionRecallCurve.py +24 -17
- validmind/tests/model_validation/sklearn/ROCCurve.py +12 -7
- validmind/tests/model_validation/sklearn/RegressionErrors.py +74 -130
- validmind/tests/model_validation/sklearn/RegressionErrorsComparison.py +27 -12
- validmind/tests/model_validation/sklearn/{RegressionModelsPerformanceComparison.py → RegressionPerformance.py} +18 -20
- validmind/tests/model_validation/sklearn/RegressionR2Square.py +55 -93
- validmind/tests/model_validation/sklearn/RegressionR2SquareComparison.py +32 -13
- validmind/tests/model_validation/sklearn/RobustnessDiagnosis.py +113 -73
- validmind/tests/model_validation/sklearn/SHAPGlobalImportance.py +7 -5
- validmind/tests/model_validation/sklearn/SilhouettePlot.py +27 -19
- validmind/tests/model_validation/sklearn/TrainingTestDegradation.py +25 -18
- validmind/tests/model_validation/sklearn/VMeasure.py +14 -13
- validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py +7 -5
- validmind/tests/model_validation/statsmodels/AutoARIMA.py +24 -18
- validmind/tests/model_validation/statsmodels/BoxPierce.py +14 -10
- validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py +73 -104
- validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py +19 -12
- validmind/tests/model_validation/statsmodels/GINITable.py +44 -77
- validmind/tests/model_validation/statsmodels/JarqueBera.py +27 -22
- validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py +33 -34
- validmind/tests/model_validation/statsmodels/LJungBox.py +32 -28
- validmind/tests/model_validation/statsmodels/Lilliefors.py +27 -24
- validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py +87 -119
- validmind/tests/model_validation/statsmodels/RegressionCoeffs.py +100 -0
- validmind/tests/model_validation/statsmodels/RegressionFeatureSignificance.py +14 -9
- validmind/tests/model_validation/statsmodels/RegressionModelForecastPlot.py +17 -13
- validmind/tests/model_validation/statsmodels/RegressionModelForecastPlotLevels.py +46 -43
- validmind/tests/model_validation/statsmodels/RegressionModelSensitivityPlot.py +38 -36
- validmind/tests/model_validation/statsmodels/RegressionModelSummary.py +30 -28
- validmind/tests/model_validation/statsmodels/RegressionPermutationFeatureImportance.py +18 -11
- validmind/tests/model_validation/statsmodels/RunsTest.py +32 -28
- validmind/tests/model_validation/statsmodels/ScorecardHistogram.py +75 -107
- validmind/tests/model_validation/statsmodels/ShapiroWilk.py +15 -8
- validmind/tests/ongoing_monitoring/FeatureDrift.py +10 -6
- validmind/tests/ongoing_monitoring/PredictionAcrossEachFeature.py +31 -25
- validmind/tests/ongoing_monitoring/PredictionCorrelation.py +29 -21
- validmind/tests/ongoing_monitoring/TargetPredictionDistributionPlot.py +31 -23
- validmind/tests/prompt_validation/Bias.py +14 -11
- validmind/tests/prompt_validation/Clarity.py +16 -14
- validmind/tests/prompt_validation/Conciseness.py +7 -5
- validmind/tests/prompt_validation/Delimitation.py +23 -22
- validmind/tests/prompt_validation/NegativeInstruction.py +7 -5
- validmind/tests/prompt_validation/Robustness.py +12 -10
- validmind/tests/prompt_validation/Specificity.py +13 -11
- validmind/tests/prompt_validation/ai_powered_test.py +6 -0
- validmind/tests/run.py +68 -23
- validmind/unit_metrics/__init__.py +81 -144
- validmind/unit_metrics/classification/{sklearn/Accuracy.py → Accuracy.py} +1 -1
- validmind/unit_metrics/classification/{sklearn/F1.py → F1.py} +1 -1
- validmind/unit_metrics/classification/{sklearn/Precision.py → Precision.py} +1 -1
- validmind/unit_metrics/classification/{sklearn/ROC_AUC.py → ROC_AUC.py} +1 -2
- validmind/unit_metrics/classification/{sklearn/Recall.py → Recall.py} +1 -1
- validmind/unit_metrics/regression/{sklearn/AdjustedRSquaredScore.py → AdjustedRSquaredScore.py} +1 -1
- validmind/unit_metrics/regression/GiniCoefficient.py +1 -1
- validmind/unit_metrics/regression/HuberLoss.py +1 -1
- validmind/unit_metrics/regression/KolmogorovSmirnovStatistic.py +1 -1
- validmind/unit_metrics/regression/{sklearn/MeanAbsoluteError.py → MeanAbsoluteError.py} +1 -1
- validmind/unit_metrics/regression/MeanAbsolutePercentageError.py +1 -1
- validmind/unit_metrics/regression/MeanBiasDeviation.py +1 -1
- validmind/unit_metrics/regression/{sklearn/MeanSquaredError.py → MeanSquaredError.py} +1 -1
- validmind/unit_metrics/regression/QuantileLoss.py +1 -1
- validmind/unit_metrics/regression/{sklearn/RSquaredScore.py → RSquaredScore.py} +1 -1
- validmind/unit_metrics/regression/{sklearn/RootMeanSquaredError.py → RootMeanSquaredError.py} +1 -1
- validmind/vm_models/dataset/dataset.py +2 -0
- validmind/vm_models/figure.py +5 -0
- validmind/vm_models/test/result_wrapper.py +93 -132
- {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/METADATA +1 -1
- {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/RECORD +203 -210
- validmind/tests/data_validation/ANOVAOneWayTable.py +0 -138
- validmind/tests/data_validation/BivariateFeaturesBarPlots.py +0 -142
- validmind/tests/data_validation/BivariateHistograms.py +0 -117
- validmind/tests/data_validation/HeatmapFeatureCorrelations.py +0 -124
- validmind/tests/data_validation/MissingValuesRisk.py +0 -88
- validmind/tests/model_validation/ModelMetadataComparison.py +0 -59
- validmind/tests/model_validation/sklearn/FeatureImportanceComparison.py +0 -83
- validmind/tests/model_validation/statsmodels/RegressionCoeffsPlot.py +0 -135
- validmind/tests/model_validation/statsmodels/RegressionModelsCoeffs.py +0 -103
- {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/LICENSE +0 -0
- {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/WHEEL +0 -0
- {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/entry_points.txt +0 -0
@@ -20,36 +20,44 @@ from validmind.vm_models import (
 @dataclass
 class SilhouettePlot(Metric):
     """
-    Calculates and visualizes Silhouette Score, assessing degree of data point suitability to its cluster in ML
-    …
+    Calculates and visualizes Silhouette Score, assessing the degree of data point suitability to its cluster in ML
+    models.
+
+    ### Purpose
+
+    This test calculates the Silhouette Score, which is a model performance metric used in clustering applications.
+    Primarily, the Silhouette Score evaluates how similar a data point is to its own cluster compared to other
+    clusters. The metric ranges between -1 and 1, where a high value indicates that the object is well matched to its
+    own cluster and poorly matched to neighboring clusters. Thus, the goal is to achieve a high Silhouette Score,
+    implying well-separated clusters.
+
+    ### Test Mechanism
+
+    The test first extracts the true and predicted labels from the model's training data. The test runs the Silhouette
+    Score function, which takes as input the training dataset features and the predicted labels, subsequently
+    calculating the average score. This average Silhouette Score is printed for reference. The script then calculates
+    the silhouette coefficients for each data point, helping to form the Silhouette Plot. Each cluster is represented
+    in this plot, with color distinguishing between different clusters. A red dashed line indicates the average
+    Silhouette Score. The Silhouette Scores are also collected into a structured table, facilitating model performance
+    analysis and comparison.
+
+    ### Signs of High Risk
+
     - A low Silhouette Score, potentially indicating that the clusters are not well separated and that data points may
     not be fitting well to their respective clusters.
     - A Silhouette Plot displaying overlapping clusters or the absence of clear distinctions between clusters visually
     also suggests poor clustering performance.

-    …
+    ### Strengths
+
     - The Silhouette Score provides a clear and quantitative measure of how well data points have been grouped into
     clusters, offering insights into model performance.
     - The Silhouette Plot provides an intuitive, graphical representation of the clustering mechanism, aiding visual
     assessments of model performance.
     - It does not require ground truth labels, so it's useful when true cluster assignments are not known.

-    …
+    ### Limitations
+
     - The Silhouette Score may be susceptible to the influence of outliers, which could impact its accuracy and
     reliability.
     - It assumes the clusters are convex and isotropic, which might not be the case with complex datasets.
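For context, the Test Mechanism above maps onto scikit-learn's silhouette utilities. A minimal illustrative sketch (not the package's implementation; the synthetic data and KMeans model are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Assumed toy data and clustering model, purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

avg_score = silhouette_score(X, labels)    # average score (the red dashed line in the plot)
per_point = silhouette_samples(X, labels)  # one silhouette coefficient per data point
for cluster in np.unique(labels):
    print(cluster, round(per_point[labels == cluster].mean(), 3))
print("average silhouette:", round(avg_score, 3))
```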
@@ -32,33 +32,40 @@ class TrainingTestDegradation(ThresholdTest):
     """
     Tests if model performance degradation between training and test datasets exceeds a predefined threshold.

-    …
+    ### Purpose
+
+    The `TrainingTestDegradation` class serves as a test to verify that the degradation in performance between the
+    training and test datasets does not exceed a predefined threshold. This test measures the model's ability to
+    generalize from its training data to unseen test data, assessing key classification metrics such as accuracy,
+    precision, recall, and f1 score to verify the model's robustness and reliability.
+
+    ### Test Mechanism
+
+    The code applies several predefined metrics, including accuracy, precision, recall, and f1 scores, to the model's
+    predictions for both the training and test datasets. It calculates the degradation as the difference between the
+    training score and test score divided by the training score. The test is considered successful if the degradation
+    for each metric is less than the preset maximum threshold of 10%. The results are summarized in a table showing
+    each metric's train score, test score, degradation percentage, and pass/fail status.
+
+    ### Signs of High Risk
+
     - A degradation percentage that exceeds the maximum allowed threshold of 10% for any of the evaluated metrics.
     - A high difference or gap between the metric scores on the training and the test datasets.
     - The 'Pass/Fail' column displaying 'Fail' for any of the evaluated metrics.

-    …
+    ### Strengths
+
+    - Provides a quantitative measure of the model's ability to generalize to unseen data, which is key for predicting
+    its practical real-world performance.
     - By evaluating multiple metrics, it takes into account different facets of model performance and enables a more
     holistic evaluation.
     - The use of a variable predefined threshold allows the flexibility to adjust the acceptability criteria for
     different scenarios.

-    …
+    ### Limitations
+
+    - The test compares raw performance on training and test data but does not factor in the nature of the data. Areas
+    with less representation in the training set might still perform poorly on unseen data.
     - It requires good coverage and balance in the test and training datasets to produce reliable results, which may
     not always be available.
     - The test is currently only designed for classification tasks.
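The degradation formula in the Test Mechanism above is straightforward to reproduce outside the library. A hedged sketch (function and column names are illustrative, not the package's API):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

METRICS = {"accuracy": accuracy_score, "precision": precision_score,
           "recall": recall_score, "f1": f1_score}

def degradation_table(y_train, y_train_pred, y_test, y_test_pred, max_threshold=0.10):
    """Degradation = (train score - test score) / train score; pass if below the threshold."""
    rows = []
    for name, metric in METRICS.items():
        train_score = metric(y_train, y_train_pred)
        test_score = metric(y_test, y_test_pred)
        degradation = (train_score - test_score) / train_score
        rows.append({"Metric": name,
                     "Train Score": train_score,
                     "Test Score": test_score,
                     "Degradation (%)": 100 * degradation,
                     "Pass/Fail": "Pass" if degradation < max_threshold else "Fail"})
    return rows
```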
@@ -14,42 +14,43 @@ class VMeasure(ClusterPerformance):
     """
     Evaluates homogeneity and completeness of a clustering model using the V Measure Score.

-    …
+    ### Purpose
+
     The purpose of this metric, V Measure Score (V Score), is to evaluate the performance of a clustering model. It
     measures the homogeneity and completeness of a set of cluster labels, where homogeneity refers to each cluster
     containing only members of a single class and completeness meaning all members of a given class are assigned to the
     same cluster.

-    …
+    ### Test Mechanism
+
+    ClusterVMeasure is a class that inherits from another class, ClusterPerformance. It uses the `v_measure_score`
     function from the sklearn module's metrics package. The required inputs to perform this metric are the model, train
     dataset, and test dataset. The test is appropriate for models tasked with clustering.

-    …
+    ### Signs of High Risk

     - Low V Measure Score: A low V Measure Score indicates that the clustering model has poor homogeneity or
     completeness, or both. This might signal that the model is failing to correctly cluster the data.

-    …
+    ### Strengths

     - The V Measure Score is a harmonic mean between homogeneity and completeness. This ensures that both attributes
     are taken into account when evaluating the model, providing an overall measure of its cluster validity.
-    …
     - The metric does not require knowledge of the ground truth classes when measuring homogeneity and completeness,
     making it applicable in instances where such information is unavailable.

-    …
-    - The V Score can be influenced by the number of clusters, which means that it might not always reflect the quality
-    of the clustering. Partitioning the data into many small clusters could lead to high homogeneity but low
-    completeness, leading to a low V Score even if the clustering might be useful.
+    ### Limitations

+    - The V Measure Score can be influenced by the number of clusters, which means that it might not always reflect the
+    quality of the clustering. Partitioning the data into many small clusters could lead to high homogeneity but low
+    completeness, leading to a low V Measure Score even if the clustering might be useful.
     - It assumes equal importance of homogeneity and completeness. In some applications, one may be more important than
-    the other. The V Score does not provide flexibility in assigning different weights to homogeneity and
+    the other. The V Measure Score does not provide flexibility in assigning different weights to homogeneity and
+    completeness.
     """

     name = "v_measure_score"
-    required_inputs = ["model", "
+    required_inputs = ["model", "dataset"]
     tasks = ["clustering"]
     tags = [
         "sklearn",
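For reference, the sklearn call named in the Test Mechanism above; the toy label assignments below are illustrative only:

```python
from sklearn.metrics import completeness_score, homogeneity_score, v_measure_score

# Illustrative label assignments only
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

print("homogeneity:", homogeneity_score(labels_true, labels_pred))
print("completeness:", completeness_score(labels_true, labels_pred))
# V measure is the harmonic mean of homogeneity and completeness
print("v-measure:", v_measure_score(labels_true, labels_pred))
```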
@@ -27,21 +27,23 @@ class WeakspotsDiagnosis(ThresholdTest):
     Identifies and visualizes weak spots in a machine learning model's performance across various sections of the
     feature space.

-    …
+    ### Purpose
+
     The weak spots test is applied to evaluate the performance of a machine learning model within specific regions of
     its feature space. This test slices the feature space into various sections, evaluating the model's outputs within
     each section against specific performance metrics (e.g., accuracy, precision, recall, and F1 scores). The ultimate
     aim is to identify areas where the model's performance falls below the set thresholds, thereby exposing its
     possible weaknesses and limitations.

-    …
+    ### Test Mechanism
+
     The test mechanism adopts an approach of dividing the feature space of the training dataset into numerous bins. The
     model's performance metrics (accuracy, precision, recall, F1 scores) are then computed for each bin on both the
     training and test datasets. A "weak spot" is identified if any of the performance metrics fall below a
     predetermined threshold for a particular bin on the test dataset. The test results are visually plotted as bar
     charts for each performance metric, indicating the bins which fail to meet the established threshold.

-    …
+    ### Signs of High Risk

     - Any performance metric of the model dropping below the set thresholds.
     - Significant disparity in performance between the training and test datasets within a bin could be an indication
@@ -49,7 +51,7 @@ class WeakspotsDiagnosis(ThresholdTest):
     - Regions or slices with consistently low performance metrics. Such instances could mean that the model struggles
     to handle specific types of input data adequately, resulting in potentially inaccurate predictions.

-    …
+    ### Strengths

     - The test helps pinpoint precise regions of the feature space where the model's performance is below par, allowing
     for more targeted improvements to the model.
@@ -58,7 +60,7 @@ class WeakspotsDiagnosis(ThresholdTest):
     - The test exhibits flexibility, letting users set different thresholds for various performance metrics according
     to the specific requirements of the application.

-    …
+    ### Limitations

     - The binning system utilized for the feature space in the test could over-simplify the model's behavior within
     each bin. The granularity of this slicing depends on the chosen 'bins' parameter and can sometimes be arbitrary.
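A rough sketch of the per-bin check described in the Test Mechanism above, binning a single feature and flagging bins whose test accuracy falls below a threshold; the helper name, column names, and default threshold are assumptions, not the package's API:

```python
import pandas as pd
from sklearn.metrics import accuracy_score

def weak_spots_for_feature(df, feature, target_col, pred_col, bins=10, threshold=0.75):
    """Bin one feature, score each bin, and flag bins below the threshold."""
    binned = pd.cut(df[feature], bins=bins)
    rows = []
    for interval, group in df.groupby(binned, observed=False):
        if group.empty:
            continue
        acc = accuracy_score(group[target_col], group[pred_col])
        rows.append({"bin": str(interval), "accuracy": acc, "weak_spot": acc < threshold})
    return pd.DataFrame(rows)
```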
@@ -15,13 +15,16 @@ class AutoARIMA(Metric):
     """
     Evaluates ARIMA models for time-series forecasting, ranking them using Bayesian and Akaike Information Criteria.

-    …
+    ### Purpose
+
+    The AutoARIMA validation test is designed to evaluate and rank AutoRegressive Integrated Moving Average (ARIMA)
+    models. These models are primarily used for forecasting time-series data. The validation test automatically fits
+    multiple ARIMA models, with varying parameters, to every variable within the given dataset. The models are then
+    ranked based on their Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) values, which
+    provide a basis for the efficient model selection process.
+
+    ### Test Mechanism

-    **Test Mechanism**:
     This metric proceeds by generating an array of feasible combinations of ARIMA model parameters which are within a
     prescribed limit. These limits include `max_p`, `max_d`, `max_q`; they represent the autoregressive, differencing,
     and moving average components respectively. Upon applying these sets of parameters, the validation test fits each
@@ -31,28 +34,31 @@ class AutoARIMA(Metric):
     found to be non-stationary, a warning message is sent out, given that ARIMA models necessitate input series to be
     stationary.

-    …
+    ### Signs of High Risk
+
+    - If the p-value of the Augmented Dickey-Fuller test for a variable exceeds 0.05, a warning is logged. This warning
     indicates that the series might not be stationary, leading to potentially inaccurate results.
-    …
+    - Consistent failure in fitting ARIMA models (as made evident through logged errors) might disclose issues with
     either the data or model stability.

-    …
+    ### Strengths
+
+    - The AutoARIMA validation test simplifies the often complex task of selecting the most suitable ARIMA model based
     on BIC and AIC criteria.
-    …
+    - The mechanism incorporates a check for non-stationarity within the data, which is a critical prerequisite for
     ARIMA models.
-    …
+    - The exhaustive search through all possible combinations of model parameters enhances the likelihood of
     identifying the best-fit model.

-    …
+    ### Limitations
+
+    - This validation test can be computationally costly as it involves creating and fitting multiple ARIMA models for
     every variable.
-    …
+    - Although the test checks for non-stationarity and logs warnings where present, it does not apply any
     transformations to the data to establish stationarity.
-    …
+    - The selection of models leans solely on BIC and AIC criteria, which may not yield the best predictive model in
     all scenarios.
-    …
+    - The test is only applicable to regression tasks involving time-series data, and may not work effectively for
     other types of machine learning tasks.
     """

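The mechanism described above (a grid of (p, d, q) orders, an ADF stationarity warning, and ranking by BIC/AIC) can be sketched with statsmodels as follows; the grid bounds and function name are illustrative assumptions:

```python
import itertools
import warnings

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

def rank_arima_orders(series, max_p=2, max_d=1, max_q=2):
    if adfuller(series.dropna())[1] > 0.05:  # ADF p-value > 0.05: possibly non-stationary
        warnings.warn("Series may be non-stationary; ARIMA results may be unreliable")
    rows = []
    for order in itertools.product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        try:
            result = ARIMA(series, order=order).fit()
            rows.append({"order": order, "AIC": result.aic, "BIC": result.bic})
        except Exception:
            continue  # skip parameter combinations that fail to fit
    return pd.DataFrame(rows).sort_values(["BIC", "AIC"]).reset_index(drop=True)
```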
@@ -11,31 +11,35 @@ class BoxPierce(Metric):
     """
     Detects autocorrelation in time-series data through the Box-Pierce test to validate model performance.

-    …
+    ### Purpose
+
+    The Box-Pierce test is utilized to detect the presence of autocorrelation in a time-series dataset.
     Autocorrelation, or serial correlation, refers to the degree of similarity between observations based on the
     temporal spacing between them. This test is essential for affirming the quality of a time-series model by ensuring
     that the error terms in the model are random and do not adhere to a specific pattern.

-    …
+    ### Test Mechanism
+
+    The implementation of the Box-Pierce test involves calculating a test statistic along with a corresponding p-value
+    derived from the dataset features. These quantities are used to test the null hypothesis that posits the data to be
+    independently distributed. This is achieved by iterating over every feature column in the time-series data and
+    applying the `acorr_ljungbox` function of the statsmodels library. The function yields the Box-Pierce test
+    statistic as well as the respective p-value, all of which are cached as test results.

-    …
+    ### Signs of High Risk

     - A low p-value, typically under 0.05 as per statistical convention, throws the null hypothesis of independence
     into question. This implies that the dataset potentially houses autocorrelations, thus indicating a high-risk
     scenario concerning model performance.
     - Large Box-Pierce test statistic values may indicate the presence of autocorrelation.

-    …
+    ### Strengths

     - Detects patterns in data that are supposed to be random, thereby ensuring no underlying autocorrelation.
     - Can be computed efficiently given its low computational complexity.
     - Can be widely applied to most regression problems, making it very versatile.

-    …
+    ### Limitations

     - Assumes homoscedasticity (constant variance) and normality of residuals, which may not always be the case in
     real-world datasets.
@@ -43,7 +47,7 @@ class BoxPierce(Metric):
     correlations.
     - It only provides a general indication of the existence of autocorrelation, without providing specific insights
     into the nature or patterns of the detected autocorrelation.
-    - In the presence of
+    - In the presence of trends or seasonal patterns, the Box-Pierce test may yield misleading results.
     - Applicability is limited to time-series data, which limits its overall utility.
     """

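A minimal sketch of the per-column Box-Pierce computation referenced above, using statsmodels' acorr_ljungbox with boxpierce=True (recent statsmodels versions return a DataFrame with bp_stat / bp_pvalue columns); the lag choice and helper name are assumptions:

```python
import pandas as pd
from statsmodels.stats.diagnostic import acorr_ljungbox

def box_pierce_per_column(df: pd.DataFrame, lags: int = 10) -> pd.DataFrame:
    results = {}
    for col in df.columns:
        out = acorr_ljungbox(df[col].dropna(), lags=[lags], boxpierce=True)
        results[col] = {"bp_stat": float(out["bp_stat"].iloc[0]),
                        "bp_pvalue": float(out["bp_pvalue"].iloc[0])}
    # A small p-value questions the null hypothesis of independently distributed data
    return pd.DataFrame(results).T
```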
@@ -2,138 +2,107 @@
 # See the LICENSE file in the root of this repository for details.
 # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial

-from dataclasses import dataclass
-…
 import numpy as np
 import plotly.graph_objects as go
 from matplotlib import cm

-from validmind
+from validmind import tags, tasks


-…
+@tags("visualization", "credit_risk", "logistic_regression")
+@tasks("classification")
+def CumulativePredictionProbabilities(dataset, model, title="Cumulative Probabilities"):
     """
     Visualizes cumulative probabilities of positive and negative classes for both training and testing in logistic
     regression models.

-    …
+    ### Purpose
+
+    This metric is utilized to evaluate the distribution of predicted probabilities for positive and negative classes
+    in a logistic regression model. It provides a visual assessment of the model's behavior by plotting the cumulative
+    probabilities for positive and negative classes across both the training and test datasets.
+
+    ### Test Mechanism
+
+    The logistic regression model is evaluated by first computing the predicted probabilities for each instance in both
+    the training and test datasets, which are then added as a new column in these sets. The cumulative probabilities
+    for positive and negative classes are subsequently calculated and sorted in ascending order. Cumulative
+    distributions of these probabilities are created for both positive and negative classes across both training and
+    test datasets. These cumulative probabilities are represented visually in a plot, containing two subplots - one for
+    the training data and the other for the test data, with lines representing cumulative distributions of positive and
+    negative classes.

-    …
-    each instance in both the training and test datasets, which are then added as a new column in these sets. The
-    cumulative probabilities for positive and negative classes are subsequently calculated and sorted in ascending
-    order. Cumulative distributions of these probabilities are created for both positive and negative classes across
-    both training and test datasets. These cumulative probabilities are represented visually in a plot, containing two
-    subplots - one for the training data and the other for the test data, with lines representing cumulative
-    distributions of positive and negative classes.
+    ### Signs of High Risk

-    **Signs of High Risk**:
     - Imbalanced distribution of probabilities for either positive or negative classes.
     - Notable discrepancies or significant differences between the cumulative probability distributions for the
     training data versus the test data.
     - Marked discrepancies or large differences between the cumulative probability distributions for positive and
     negative classes.

-    …
+    ### Strengths
+
+    - Provides a visual illustration of data, which enhances the ease of understanding and interpreting the model's
+    behavior.
     - Allows for the comparison of model's behavior across training and testing datasets, providing insights about how
     well the model is generalized.
-    …
+    - Differentiates between positive and negative classes and their respective distribution patterns, aiding in
+    problem diagnosis.
+
+    ### Limitations

-    **Limitations**:
     - Exclusive to classification tasks and specifically to logistic regression models.
     - Graphical results necessitate human interpretation and may not be directly applicable for automated risk
     detection.
-    - The method does not give a solitary quantifiable measure of model risk,
-    and broad distributional information.
+    - The method does not give a solitary quantifiable measure of model risk, instead, it offers a visual
+    representation and broad distributional information.
     - If the training and test datasets are not representative of the overall data distribution, the metric could
     provide misleading results.
     """

-    …
-    tasks = ["classification"]
-    tags = ["logistic_regression", "visualization"]
-    …
-    default_params = {"title": "Cumulative Probabilities"}
-    …
-    @staticmethod
-    def plot_cumulative_prob(dataframes, dataset_titles, target_col, title):
-        figures = []
-        …
-        # Generate a colormap and convert to Plotly-accepted color format
-        # Adjust 'viridis' to any other matplotlib colormap if desired
-        colormap = cm.get_cmap("viridis")
-        …
-        for _, (df, dataset_title) in enumerate(zip(dataframes, dataset_titles)):
-            fig = go.Figure()
-            …
-            # Get unique classes and assign colors
-            classes = sorted(df[target_col].unique())
-            colors = [
-                colormap(i / len(classes))[:3] for i in range(len(classes))
-            ]  # RGB
-            color_dict = {
-                cls: f"rgb({int(rgb[0]*255)}, {int(rgb[1]*255)}, {int(rgb[2]*255)})"
-                for cls, rgb in zip(classes, colors)
-            }
-            for class_value in sorted(df[target_col].unique()):
-                # Calculate cumulative distribution for the current class
-                sorted_probs = np.sort(
-                    df[df[target_col] == class_value]["probabilities"]
-                )
-                cumulative_probs = np.cumsum(sorted_probs) / np.sum(sorted_probs)
-                …
-                fig.add_trace(
-                    go.Scatter(
-                        x=sorted_probs,
-                        y=cumulative_probs,
-                        mode="lines",
-                        name=f"{dataset_title} {target_col} = {class_value}",
-                        line=dict(
-                            color=color_dict[class_value],
-                        ),
-                    )
-                )
-            fig.update_layout(
-                title_text=f"{title} - {dataset_title}",
-                xaxis_title="Probability",
-                yaxis_title="Cumulative Distribution",
-                legend_title=target_col,
-            )
-            figures.append(fig)
-        return figures
-    …
-    def run(self):
-        dataset_titles = [dataset.input_id for dataset in self.inputs.datasets]
-        target_column = self.inputs.datasets[0].target_column
-        title = self.params.get("title", self.default_params["title"])
-        …
-        dataframes = []
-        metric_value = {"cum_prob": {}}
-        for dataset in self.inputs.datasets:
-            df = dataset.df.copy()
-            y_prob = dataset.y_prob(self.inputs.model)
-            df["probabilities"] = y_prob
-            dataframes.append(df)
-            metric_value["cum_prob"][dataset.input_id] = list(df["probabilities"])
-        …
-        figures = self.plot_cumulative_prob(
-            dataframes, dataset_titles, target_column, title
-        )
+    df = dataset.df
+    df["probabilities"] = dataset.y_prob(model)

-    …
+    fig = _plot_cumulative_prob(df, dataset.target_column, title)
+
+    return fig
+
+
+def _plot_cumulative_prob(df, target_col, title):
+
+    # Generate a colormap and convert to Plotly-accepted color format
+    # Adjust 'viridis' to any other matplotlib colormap if desired
+    colormap = cm.get_cmap("viridis")
+
+    fig = go.Figure()
+
+    # Get unique classes and assign colors
+    classes = sorted(df[target_col].unique())
+    colors = [colormap(i / len(classes))[:3] for i in range(len(classes))]  # RGB
+    color_dict = {
+        cls: f"rgb({int(rgb[0]*255)}, {int(rgb[1]*255)}, {int(rgb[2]*255)})"
+        for cls, rgb in zip(classes, colors)
+    }
+    for class_value in sorted(df[target_col].unique()):
+        # Calculate cumulative distribution for the current class
+        sorted_probs = np.sort(df[df[target_col] == class_value]["probabilities"])
+        cumulative_probs = np.cumsum(sorted_probs) / np.sum(sorted_probs)
+
+        fig.add_trace(
+            go.Scatter(
+                x=sorted_probs,
+                y=cumulative_probs,
+                mode="lines",
+                name=f"{target_col} = {class_value}",
+                line=dict(
+                    color=color_dict[class_value],
+                ),
             )
-    …
+        )
+    fig.update_layout(
+        title_text=f"{title}",
+        xaxis_title="Probability",
+        yaxis_title="Cumulative Distribution",
+    )

-    …
+    return fig
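Stripped of the plotting code, the cumulative curve the refactored function draws per class is just a sorted-probability running sum; a minimal sketch (column names mirror the diff, the helper name is illustrative):

```python
import numpy as np
import pandas as pd

def cumulative_curve(df: pd.DataFrame, target_col: str, class_value):
    """Return x (sorted probabilities) and y (normalized cumulative sum) for one class."""
    probs = df.loc[df[target_col] == class_value, "probabilities"].to_numpy()
    sorted_probs = np.sort(probs)
    cumulative = np.cumsum(sorted_probs) / np.sum(sorted_probs)
    return sorted_probs, cumulative
```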
@@ -14,32 +14,39 @@ class DurbinWatsonTest(Metric):
     """
     Assesses autocorrelation in time series data features using the Durbin-Watson statistic.

-    …
+    ### Purpose
+
+    The Durbin-Watson Test metric detects autocorrelation in time series data (where a set of data values influences
+    their predecessors). Autocorrelation is a crucial factor for regression tasks as these often assume the
     independence of residuals. A model with significant autocorrelation may give unreliable predictions.

-    …
+    ### Test Mechanism
+
+    Utilizing the `durbin_watson` function in the `statsmodels` Python library, the Durbin-Watson (DW) Test metric
+    generates a statistical value for each feature of the training dataset. The function is looped over all columns of
+    the dataset, calculating and caching the DW value for each column for further analysis. A DW metric value nearing 2
+    indicates no autocorrelation. Conversely, values approaching 0 suggest positive autocorrelation, and those leaning
+    towards 4 imply negative autocorrelation.
+
+    ### Signs of High Risk

-    **Signs of High Risk**:
     - If a feature's DW value significantly deviates from 2, it could signal a high risk due to potential
     autocorrelation issues in the dataset.
-    - A value closer to
+    - A value closer to 0 could imply positive autocorrelation, while a value nearer to 4 could point to negative
     autocorrelation, both leading to potentially unreliable prediction models.

-    …
+    ### Strengths
+
     - The metric specializes in identifying autocorrelation in prediction model residuals.
     - Autocorrelation detection assists in diagnosing violation of various modeling technique assumptions, particularly
     in regression analysis and time-series data modeling.

-    …
+    ### Limitations
+
     - The Durbin-Watson Test mainly detects linear autocorrelation and could overlook other types of relationships.
     - The metric is highly sensitive to data points order. Shuffling the order could lead to notably different results.
     - The test only checks for first-order autocorrelation (between a variable and its immediate predecessor) and fails
-    to detect higher
+    to detect higher-order autocorrelation.
     """

     name = "durbin_watson"