validmind 2.5.8__py3-none-any.whl → 2.5.18__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (233)
  1. validmind/__version__.py +1 -1
  2. validmind/ai/test_descriptions.py +80 -119
  3. validmind/ai/test_result_description/config.yaml +29 -0
  4. validmind/ai/test_result_description/context.py +73 -0
  5. validmind/ai/test_result_description/image_processing.py +124 -0
  6. validmind/ai/test_result_description/system.jinja +39 -0
  7. validmind/ai/test_result_description/user.jinja +25 -0
  8. validmind/api_client.py +89 -43
  9. validmind/client.py +2 -2
  10. validmind/client_config.py +11 -14
  11. validmind/datasets/credit_risk/__init__.py +1 -0
  12. validmind/datasets/credit_risk/datasets/lending_club_biased.csv.gz +0 -0
  13. validmind/datasets/credit_risk/lending_club_bias.py +142 -0
  14. validmind/datasets/regression/fred_timeseries.py +67 -138
  15. validmind/template.py +1 -0
  16. validmind/test_suites/__init__.py +0 -2
  17. validmind/test_suites/statsmodels_timeseries.py +1 -1
  18. validmind/test_suites/summarization.py +0 -1
  19. validmind/test_suites/time_series.py +0 -43
  20. validmind/tests/__types__.py +14 -15
  21. validmind/tests/data_validation/ACFandPACFPlot.py +15 -13
  22. validmind/tests/data_validation/ADF.py +31 -24
  23. validmind/tests/data_validation/AutoAR.py +9 -9
  24. validmind/tests/data_validation/AutoMA.py +23 -16
  25. validmind/tests/data_validation/AutoSeasonality.py +18 -16
  26. validmind/tests/data_validation/AutoStationarity.py +21 -16
  27. validmind/tests/data_validation/BivariateScatterPlots.py +67 -96
  28. validmind/tests/{model_validation/statsmodels → data_validation}/BoxPierce.py +34 -34
  29. validmind/tests/data_validation/ChiSquaredFeaturesTable.py +85 -124
  30. validmind/tests/data_validation/ClassImbalance.py +15 -12
  31. validmind/tests/data_validation/DFGLSArch.py +19 -13
  32. validmind/tests/data_validation/DatasetDescription.py +17 -11
  33. validmind/tests/data_validation/DatasetSplit.py +7 -5
  34. validmind/tests/data_validation/DescriptiveStatistics.py +28 -21
  35. validmind/tests/data_validation/Duplicates.py +33 -25
  36. validmind/tests/data_validation/EngleGrangerCoint.py +35 -33
  37. validmind/tests/data_validation/FeatureTargetCorrelationPlot.py +59 -71
  38. validmind/tests/data_validation/HighCardinality.py +19 -12
  39. validmind/tests/data_validation/HighPearsonCorrelation.py +27 -22
  40. validmind/tests/data_validation/IQROutliersBarPlot.py +13 -10
  41. validmind/tests/data_validation/IQROutliersTable.py +40 -36
  42. validmind/tests/data_validation/IsolationForestOutliers.py +21 -14
  43. validmind/tests/data_validation/JarqueBera.py +70 -0
  44. validmind/tests/data_validation/KPSS.py +34 -29
  45. validmind/tests/data_validation/LJungBox.py +66 -0
  46. validmind/tests/data_validation/LaggedCorrelationHeatmap.py +22 -15
  47. validmind/tests/data_validation/MissingValues.py +32 -27
  48. validmind/tests/data_validation/MissingValuesBarPlot.py +25 -21
  49. validmind/tests/data_validation/PearsonCorrelationMatrix.py +71 -84
  50. validmind/tests/data_validation/PhillipsPerronArch.py +37 -30
  51. validmind/tests/data_validation/ProtectedClassesCombination.py +197 -0
  52. validmind/tests/data_validation/ProtectedClassesDescription.py +130 -0
  53. validmind/tests/data_validation/ProtectedClassesDisparity.py +133 -0
  54. validmind/tests/data_validation/ProtectedClassesThresholdOptimizer.py +172 -0
  55. validmind/tests/data_validation/RollingStatsPlot.py +31 -23
  56. validmind/tests/data_validation/RunsTest.py +72 -0
  57. validmind/tests/data_validation/ScatterPlot.py +63 -78
  58. validmind/tests/data_validation/SeasonalDecompose.py +38 -34
  59. validmind/tests/{model_validation/statsmodels → data_validation}/ShapiroWilk.py +35 -30
  60. validmind/tests/data_validation/Skewness.py +35 -37
  61. validmind/tests/data_validation/SpreadPlot.py +35 -35
  62. validmind/tests/data_validation/TabularCategoricalBarPlots.py +23 -17
  63. validmind/tests/data_validation/TabularDateTimeHistograms.py +21 -13
  64. validmind/tests/data_validation/TabularDescriptionTables.py +51 -16
  65. validmind/tests/data_validation/TabularNumericalHistograms.py +25 -22
  66. validmind/tests/data_validation/TargetRateBarPlots.py +21 -14
  67. validmind/tests/data_validation/TimeSeriesDescription.py +25 -18
  68. validmind/tests/data_validation/TimeSeriesDescriptiveStatistics.py +23 -17
  69. validmind/tests/data_validation/TimeSeriesFrequency.py +24 -17
  70. validmind/tests/data_validation/TimeSeriesHistogram.py +33 -32
  71. validmind/tests/data_validation/TimeSeriesLinePlot.py +17 -10
  72. validmind/tests/data_validation/TimeSeriesMissingValues.py +15 -10
  73. validmind/tests/data_validation/TimeSeriesOutliers.py +37 -33
  74. validmind/tests/data_validation/TooManyZeroValues.py +16 -11
  75. validmind/tests/data_validation/UniqueRows.py +11 -6
  76. validmind/tests/data_validation/WOEBinPlots.py +23 -16
  77. validmind/tests/data_validation/WOEBinTable.py +35 -30
  78. validmind/tests/data_validation/ZivotAndrewsArch.py +34 -28
  79. validmind/tests/data_validation/nlp/CommonWords.py +21 -14
  80. validmind/tests/data_validation/nlp/Hashtags.py +42 -40
  81. validmind/tests/data_validation/nlp/LanguageDetection.py +33 -14
  82. validmind/tests/data_validation/nlp/Mentions.py +21 -15
  83. validmind/tests/data_validation/nlp/PolarityAndSubjectivity.py +32 -9
  84. validmind/tests/data_validation/nlp/Punctuations.py +24 -20
  85. validmind/tests/data_validation/nlp/Sentiment.py +27 -8
  86. validmind/tests/data_validation/nlp/StopWords.py +26 -19
  87. validmind/tests/data_validation/nlp/TextDescription.py +39 -36
  88. validmind/tests/data_validation/nlp/Toxicity.py +32 -9
  89. validmind/tests/decorator.py +81 -42
  90. validmind/tests/model_validation/BertScore.py +36 -27
  91. validmind/tests/model_validation/BleuScore.py +25 -19
  92. validmind/tests/model_validation/ClusterSizeDistribution.py +38 -34
  93. validmind/tests/model_validation/ContextualRecall.py +38 -13
  94. validmind/tests/model_validation/FeaturesAUC.py +32 -13
  95. validmind/tests/model_validation/MeteorScore.py +46 -33
  96. validmind/tests/model_validation/ModelMetadata.py +32 -64
  97. validmind/tests/model_validation/ModelPredictionResiduals.py +75 -73
  98. validmind/tests/model_validation/RegardScore.py +30 -14
  99. validmind/tests/model_validation/RegressionResidualsPlot.py +10 -5
  100. validmind/tests/model_validation/RougeScore.py +36 -30
  101. validmind/tests/model_validation/TimeSeriesPredictionWithCI.py +30 -14
  102. validmind/tests/model_validation/TimeSeriesPredictionsPlot.py +27 -30
  103. validmind/tests/model_validation/TimeSeriesR2SquareBySegments.py +68 -63
  104. validmind/tests/model_validation/TokenDisparity.py +31 -23
  105. validmind/tests/model_validation/ToxicityScore.py +26 -17
  106. validmind/tests/model_validation/embeddings/ClusterDistribution.py +24 -20
  107. validmind/tests/model_validation/embeddings/CosineSimilarityComparison.py +30 -27
  108. validmind/tests/model_validation/embeddings/CosineSimilarityDistribution.py +7 -5
  109. validmind/tests/model_validation/embeddings/CosineSimilarityHeatmap.py +32 -23
  110. validmind/tests/model_validation/embeddings/DescriptiveAnalytics.py +7 -5
  111. validmind/tests/model_validation/embeddings/EmbeddingsVisualization2D.py +15 -11
  112. validmind/tests/model_validation/embeddings/EuclideanDistanceComparison.py +29 -29
  113. validmind/tests/model_validation/embeddings/EuclideanDistanceHeatmap.py +34 -25
  114. validmind/tests/model_validation/embeddings/PCAComponentsPairwisePlots.py +38 -26
  115. validmind/tests/model_validation/embeddings/StabilityAnalysis.py +40 -1
  116. validmind/tests/model_validation/embeddings/StabilityAnalysisKeyword.py +18 -17
  117. validmind/tests/model_validation/embeddings/StabilityAnalysisRandomNoise.py +40 -45
  118. validmind/tests/model_validation/embeddings/StabilityAnalysisSynonyms.py +17 -19
  119. validmind/tests/model_validation/embeddings/StabilityAnalysisTranslation.py +29 -25
  120. validmind/tests/model_validation/embeddings/TSNEComponentsPairwisePlots.py +38 -28
  121. validmind/tests/model_validation/ragas/AnswerCorrectness.py +5 -4
  122. validmind/tests/model_validation/ragas/AnswerRelevance.py +5 -4
  123. validmind/tests/model_validation/ragas/AnswerSimilarity.py +5 -4
  124. validmind/tests/model_validation/ragas/AspectCritique.py +12 -6
  125. validmind/tests/model_validation/ragas/ContextEntityRecall.py +9 -8
  126. validmind/tests/model_validation/ragas/ContextPrecision.py +5 -4
  127. validmind/tests/model_validation/ragas/ContextRecall.py +5 -4
  128. validmind/tests/model_validation/ragas/ContextUtilization.py +155 -0
  129. validmind/tests/model_validation/ragas/Faithfulness.py +5 -4
  130. validmind/tests/model_validation/ragas/NoiseSensitivity.py +152 -0
  131. validmind/tests/model_validation/ragas/utils.py +6 -0
  132. validmind/tests/model_validation/sklearn/AdjustedMutualInformation.py +19 -12
  133. validmind/tests/model_validation/sklearn/AdjustedRandIndex.py +22 -17
  134. validmind/tests/model_validation/sklearn/ClassifierPerformance.py +27 -25
  135. validmind/tests/model_validation/sklearn/ClusterCosineSimilarity.py +7 -5
  136. validmind/tests/model_validation/sklearn/ClusterPerformance.py +40 -78
  137. validmind/tests/model_validation/sklearn/ClusterPerformanceMetrics.py +15 -17
  138. validmind/tests/model_validation/sklearn/CompletenessScore.py +17 -11
  139. validmind/tests/model_validation/sklearn/ConfusionMatrix.py +22 -15
  140. validmind/tests/model_validation/sklearn/FeatureImportance.py +95 -0
  141. validmind/tests/model_validation/sklearn/FowlkesMallowsScore.py +7 -7
  142. validmind/tests/model_validation/sklearn/HomogeneityScore.py +19 -12
  143. validmind/tests/model_validation/sklearn/HyperParametersTuning.py +35 -30
  144. validmind/tests/model_validation/sklearn/KMeansClustersOptimization.py +10 -5
  145. validmind/tests/model_validation/sklearn/MinimumAccuracy.py +32 -32
  146. validmind/tests/model_validation/sklearn/MinimumF1Score.py +23 -23
  147. validmind/tests/model_validation/sklearn/MinimumROCAUCScore.py +15 -10
  148. validmind/tests/model_validation/sklearn/ModelsPerformanceComparison.py +26 -19
  149. validmind/tests/model_validation/sklearn/OverfitDiagnosis.py +38 -18
  150. validmind/tests/model_validation/sklearn/PermutationFeatureImportance.py +32 -26
  151. validmind/tests/model_validation/sklearn/PopulationStabilityIndex.py +8 -6
  152. validmind/tests/model_validation/sklearn/PrecisionRecallCurve.py +24 -17
  153. validmind/tests/model_validation/sklearn/ROCCurve.py +12 -7
  154. validmind/tests/model_validation/sklearn/RegressionErrors.py +74 -130
  155. validmind/tests/model_validation/sklearn/RegressionErrorsComparison.py +27 -12
  156. validmind/tests/model_validation/sklearn/{RegressionModelsPerformanceComparison.py → RegressionPerformance.py} +18 -20
  157. validmind/tests/model_validation/sklearn/RegressionR2Square.py +55 -94
  158. validmind/tests/model_validation/sklearn/RegressionR2SquareComparison.py +32 -13
  159. validmind/tests/model_validation/sklearn/RobustnessDiagnosis.py +36 -32
  160. validmind/tests/model_validation/sklearn/SHAPGlobalImportance.py +66 -5
  161. validmind/tests/model_validation/sklearn/SilhouettePlot.py +27 -19
  162. validmind/tests/model_validation/sklearn/TrainingTestDegradation.py +25 -18
  163. validmind/tests/model_validation/sklearn/VMeasure.py +14 -13
  164. validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py +7 -5
  165. validmind/tests/model_validation/statsmodels/AutoARIMA.py +24 -18
  166. validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py +73 -104
  167. validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py +59 -32
  168. validmind/tests/model_validation/statsmodels/GINITable.py +44 -77
  169. validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py +33 -34
  170. validmind/tests/model_validation/statsmodels/Lilliefors.py +27 -24
  171. validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py +86 -119
  172. validmind/tests/model_validation/statsmodels/RegressionCoeffs.py +100 -0
  173. validmind/tests/model_validation/statsmodels/RegressionFeatureSignificance.py +14 -9
  174. validmind/tests/model_validation/statsmodels/RegressionModelForecastPlot.py +17 -13
  175. validmind/tests/model_validation/statsmodels/RegressionModelForecastPlotLevels.py +46 -43
  176. validmind/tests/model_validation/statsmodels/RegressionModelSensitivityPlot.py +38 -36
  177. validmind/tests/model_validation/statsmodels/RegressionModelSummary.py +30 -28
  178. validmind/tests/model_validation/statsmodels/RegressionPermutationFeatureImportance.py +18 -11
  179. validmind/tests/model_validation/statsmodels/ScorecardHistogram.py +75 -107
  180. validmind/tests/ongoing_monitoring/FeatureDrift.py +10 -6
  181. validmind/tests/ongoing_monitoring/PredictionAcrossEachFeature.py +31 -25
  182. validmind/tests/ongoing_monitoring/PredictionCorrelation.py +29 -21
  183. validmind/tests/ongoing_monitoring/TargetPredictionDistributionPlot.py +31 -23
  184. validmind/tests/prompt_validation/Bias.py +14 -11
  185. validmind/tests/prompt_validation/Clarity.py +16 -14
  186. validmind/tests/prompt_validation/Conciseness.py +7 -5
  187. validmind/tests/prompt_validation/Delimitation.py +23 -22
  188. validmind/tests/prompt_validation/NegativeInstruction.py +7 -5
  189. validmind/tests/prompt_validation/Robustness.py +12 -10
  190. validmind/tests/prompt_validation/Specificity.py +13 -11
  191. validmind/tests/prompt_validation/ai_powered_test.py +6 -0
  192. validmind/tests/run.py +68 -23
  193. validmind/unit_metrics/__init__.py +81 -144
  194. validmind/unit_metrics/classification/{sklearn/Accuracy.py → Accuracy.py} +1 -1
  195. validmind/unit_metrics/classification/{sklearn/F1.py → F1.py} +1 -1
  196. validmind/unit_metrics/classification/{sklearn/Precision.py → Precision.py} +1 -1
  197. validmind/unit_metrics/classification/{sklearn/ROC_AUC.py → ROC_AUC.py} +1 -2
  198. validmind/unit_metrics/classification/{sklearn/Recall.py → Recall.py} +1 -1
  199. validmind/unit_metrics/regression/{sklearn/AdjustedRSquaredScore.py → AdjustedRSquaredScore.py} +1 -1
  200. validmind/unit_metrics/regression/GiniCoefficient.py +1 -1
  201. validmind/unit_metrics/regression/HuberLoss.py +1 -1
  202. validmind/unit_metrics/regression/KolmogorovSmirnovStatistic.py +1 -1
  203. validmind/unit_metrics/regression/{sklearn/MeanAbsoluteError.py → MeanAbsoluteError.py} +1 -1
  204. validmind/unit_metrics/regression/MeanAbsolutePercentageError.py +1 -1
  205. validmind/unit_metrics/regression/MeanBiasDeviation.py +1 -1
  206. validmind/unit_metrics/regression/{sklearn/MeanSquaredError.py → MeanSquaredError.py} +1 -1
  207. validmind/unit_metrics/regression/QuantileLoss.py +1 -1
  208. validmind/unit_metrics/regression/{sklearn/RSquaredScore.py → RSquaredScore.py} +1 -1
  209. validmind/unit_metrics/regression/{sklearn/RootMeanSquaredError.py → RootMeanSquaredError.py} +1 -1
  210. validmind/utils.py +4 -0
  211. validmind/vm_models/dataset/dataset.py +2 -0
  212. validmind/vm_models/figure.py +5 -0
  213. validmind/vm_models/test/metric.py +1 -0
  214. validmind/vm_models/test/result_wrapper.py +143 -158
  215. validmind/vm_models/test/threshold_test.py +1 -0
  216. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/METADATA +4 -3
  217. validmind-2.5.18.dist-info/RECORD +324 -0
  218. validmind/tests/data_validation/ANOVAOneWayTable.py +0 -138
  219. validmind/tests/data_validation/BivariateFeaturesBarPlots.py +0 -142
  220. validmind/tests/data_validation/BivariateHistograms.py +0 -117
  221. validmind/tests/data_validation/HeatmapFeatureCorrelations.py +0 -124
  222. validmind/tests/data_validation/MissingValuesRisk.py +0 -88
  223. validmind/tests/model_validation/ModelMetadataComparison.py +0 -59
  224. validmind/tests/model_validation/sklearn/FeatureImportanceComparison.py +0 -83
  225. validmind/tests/model_validation/statsmodels/JarqueBera.py +0 -73
  226. validmind/tests/model_validation/statsmodels/LJungBox.py +0 -66
  227. validmind/tests/model_validation/statsmodels/RegressionCoeffsPlot.py +0 -135
  228. validmind/tests/model_validation/statsmodels/RegressionModelsCoeffs.py +0 -103
  229. validmind/tests/model_validation/statsmodels/RunsTest.py +0 -71
  230. validmind-2.5.8.dist-info/RECORD +0 -318
  231. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/LICENSE +0 -0
  232. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/WHEEL +0 -0
  233. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/entry_points.txt +0 -0

validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py

@@ -27,21 +27,23 @@ class WeakspotsDiagnosis(ThresholdTest):
     Identifies and visualizes weak spots in a machine learning model's performance across various sections of the
     feature space.

-    **Purpose:**
+    ### Purpose
+
     The weak spots test is applied to evaluate the performance of a machine learning model within specific regions of
     its feature space. This test slices the feature space into various sections, evaluating the model's outputs within
     each section against specific performance metrics (e.g., accuracy, precision, recall, and F1 scores). The ultimate
     aim is to identify areas where the model's performance falls below the set thresholds, thereby exposing its
     possible weaknesses and limitations.

-    **Test Mechanism:**
+    ### Test Mechanism
+
     The test mechanism adopts an approach of dividing the feature space of the training dataset into numerous bins. The
     model's performance metrics (accuracy, precision, recall, F1 scores) are then computed for each bin on both the
     training and test datasets. A "weak spot" is identified if any of the performance metrics fall below a
     predetermined threshold for a particular bin on the test dataset. The test results are visually plotted as bar
     charts for each performance metric, indicating the bins which fail to meet the established threshold.

-    **Signs of High Risk:**
+    ### Signs of High Risk

     - Any performance metric of the model dropping below the set thresholds.
     - Significant disparity in performance between the training and test datasets within a bin could be an indication
@@ -49,7 +51,7 @@ class WeakspotsDiagnosis(ThresholdTest):
     - Regions or slices with consistently low performance metrics. Such instances could mean that the model struggles
     to handle specific types of input data adequately, resulting in potentially inaccurate predictions.

-    **Strengths:**
+    ### Strengths

     - The test helps pinpoint precise regions of the feature space where the model's performance is below par, allowing
     for more targeted improvements to the model.
@@ -58,7 +60,7 @@ class WeakspotsDiagnosis(ThresholdTest):
     - The test exhibits flexibility, letting users set different thresholds for various performance metrics according
     to the specific requirements of the application.

-    **Limitations:**
+    ### Limitations

     - The binning system utilized for the feature space in the test could over-simplify the model's behavior within
     each bin. The granularity of this slicing depends on the chosen 'bins' parameter and can sometimes be arbitrary.
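
The mechanism described in the WeakspotsDiagnosis docstring - bin a feature, score each bin on the test set, and flag bins whose metric falls below the threshold - can be sketched roughly as follows. This is an illustrative standalone snippet with hypothetical column names, not the package's implementation:

```python
import pandas as pd
from sklearn.metrics import accuracy_score


def find_weak_bins(df_test, feature, y_true_col, y_pred_col, bins=10, threshold=0.75):
    """Flag feature-space bins whose test-set accuracy falls below a threshold."""
    binned = pd.cut(df_test[feature], bins=bins)
    rows = []
    for interval, group in df_test.groupby(binned, observed=False):
        if group.empty:
            continue
        acc = accuracy_score(group[y_true_col], group[y_pred_col])
        rows.append({"bin": str(interval), "accuracy": acc, "weak_spot": acc < threshold})
    return pd.DataFrame(rows)


# Hypothetical usage: df_test holds one feature plus true and predicted label columns
# weak_bins = find_weak_bins(df_test, "age", "target", "prediction", bins=10, threshold=0.75)
```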

validmind/tests/model_validation/statsmodels/AutoARIMA.py

@@ -15,13 +15,16 @@ class AutoARIMA(Metric):
     """
     Evaluates ARIMA models for time-series forecasting, ranking them using Bayesian and Akaike Information Criteria.

-    **Purpose**: The AutoARIMA validation test is designed to evaluate and rank AutoRegressive Integrated Moving
-    Average (ARIMA) models. These models are primarily used for forecasting time-series data. The validation test
-    automatically fits multiple ARIMA models, with varying parameters, to every variable within the given dataset. The
-    models are then ranked based on their Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC)
-    values, which provide a basis for the efficient model selection process.
+    ### Purpose
+
+    The AutoARIMA validation test is designed to evaluate and rank AutoRegressive Integrated Moving Average (ARIMA)
+    models. These models are primarily used for forecasting time-series data. The validation test automatically fits
+    multiple ARIMA models, with varying parameters, to every variable within the given dataset. The models are then
+    ranked based on their Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) values, which
+    provide a basis for the efficient model selection process.
+
+    ### Test Mechanism

-    **Test Mechanism**:
     This metric proceeds by generating an array of feasible combinations of ARIMA model parameters which are within a
     prescribed limit. These limits include `max_p`, `max_d`, `max_q`; they represent the autoregressive, differencing,
     and moving average components respectively. Upon applying these sets of parameters, the validation test fits each
@@ -31,28 +34,31 @@ class AutoARIMA(Metric):
     found to be non-stationary, a warning message is sent out, given that ARIMA models necessitate input series to be
     stationary.

-    **Signs of High Risk**:
-    * If the p-value of the Augmented Dickey-Fuller test for a variable exceeds 0.05, a warning is logged. This warning
+    ### Signs of High Risk
+
+    - If the p-value of the Augmented Dickey-Fuller test for a variable exceeds 0.05, a warning is logged. This warning
     indicates that the series might not be stationary, leading to potentially inaccurate results.
-    * Consistent failure in fitting ARIMA models (as made evident through logged errors) might disclose issues with
+    - Consistent failure in fitting ARIMA models (as made evident through logged errors) might disclose issues with
     either the data or model stability.

-    **Strengths**:
-    * The AutoARIMA validation test simplifies the often complex task of selecting the most suitable ARIMA model based
+    ### Strengths
+
+    - The AutoARIMA validation test simplifies the often complex task of selecting the most suitable ARIMA model based
     on BIC and AIC criteria.
-    * The mechanism incorporates a check for non-stationarity within the data, which is a critical prerequisite for
+    - The mechanism incorporates a check for non-stationarity within the data, which is a critical prerequisite for
     ARIMA models.
-    * The exhaustive search through all possible combinations of model parameters enhances the likelihood of
+    - The exhaustive search through all possible combinations of model parameters enhances the likelihood of
     identifying the best-fit model.

-    **Limitations**:
-    * This validation test can be computationally costly as it involves creating and fitting multiple ARIMA models for
+    ### Limitations
+
+    - This validation test can be computationally costly as it involves creating and fitting multiple ARIMA models for
     every variable.
-    * Although the test checks for non-stationarity and logs warnings where present, it does not apply any
+    - Although the test checks for non-stationarity and logs warnings where present, it does not apply any
     transformations to the data to establish stationarity.
-    * The selection of models leans solely on BIC and AIC criteria, which may not yield the best predictive model in
+    - The selection of models leans solely on BIC and AIC criteria, which may not yield the best predictive model in
     all scenarios.
-    * The test is only applicable to regression tasks involving time-series data, and may not work effectively for
+    - The test is only applicable to regression tasks involving time-series data, and may not work effectively for
     other types of machine learning tasks.
     """

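
A rough standalone sketch of the search the AutoARIMA docstring describes - an exhaustive (p, d, q) grid ranked by BIC and AIC, with an ADF-based stationarity warning - might look like the following. The parameter names mirror the docstring's `max_p`/`max_d`/`max_q`, but the code is illustrative only and not the package's implementation:

```python
import itertools
import warnings

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller


def rank_arima_models(series, max_p=3, max_d=2, max_q=3):
    """Fit ARIMA(p, d, q) over a small grid and rank the fits by BIC, then AIC."""
    # Warn if the Augmented Dickey-Fuller test suggests the series is non-stationary
    if adfuller(series.dropna())[1] > 0.05:
        warnings.warn("Series may be non-stationary (ADF p-value > 0.05).")

    rows = []
    for p, d, q in itertools.product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        try:
            fit = ARIMA(series, order=(p, d, q)).fit()
            rows.append({"order": (p, d, q), "BIC": fit.bic, "AIC": fit.aic})
        except Exception:
            continue  # some parameter combinations fail to converge
    return pd.DataFrame(rows).sort_values(["BIC", "AIC"]).reset_index(drop=True)
```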

validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py

@@ -2,138 +2,107 @@
 # See the LICENSE file in the root of this repository for details.
 # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial

-from dataclasses import dataclass
-
 import numpy as np
 import plotly.graph_objects as go
 from matplotlib import cm

-from validmind.vm_models import Figure, Metric
+from validmind import tags, tasks


-@dataclass
-class CumulativePredictionProbabilities(Metric):
+@tags("visualization", "credit_risk", "logistic_regression")
+@tasks("classification")
+def CumulativePredictionProbabilities(dataset, model, title="Cumulative Probabilities"):
     """
     Visualizes cumulative probabilities of positive and negative classes for both training and testing in logistic
     regression models.

-    **Purpose**: This metric is utilized to evaluate the distribution of predicted probabilities for positive and
-    negative classes in a logistic regression model. It's not solely intended to measure the model's performance but
-    also provides a visual assessment of the model's behavior by plotting the cumulative probabilities for positive and
-    negative classes across both the training and test datasets.
+    ### Purpose
+
+    This metric is utilized to evaluate the distribution of predicted probabilities for positive and negative classes
+    in a logistic regression model. It provides a visual assessment of the model's behavior by plotting the cumulative
+    probabilities for positive and negative classes across both the training and test datasets.
+
+    ### Test Mechanism
+
+    The logistic regression model is evaluated by first computing the predicted probabilities for each instance in both
+    the training and test datasets, which are then added as a new column in these sets. The cumulative probabilities
+    for positive and negative classes are subsequently calculated and sorted in ascending order. Cumulative
+    distributions of these probabilities are created for both positive and negative classes across both training and
+    test datasets. These cumulative probabilities are represented visually in a plot, containing two subplots - one for
+    the training data and the other for the test data, with lines representing cumulative distributions of positive and
+    negative classes.

-    **Test Mechanism**: The logistic regression model is evaluated by first computing the predicted probabilities for
-    each instance in both the training and test datasets, which are then added as a new column in these sets. The
-    cumulative probabilities for positive and negative classes are subsequently calculated and sorted in ascending
-    order. Cumulative distributions of these probabilities are created for both positive and negative classes across
-    both training and test datasets. These cumulative probabilities are represented visually in a plot, containing two
-    subplots - one for the training data and the other for the test data, with lines representing cumulative
-    distributions of positive and negative classes.
+    ### Signs of High Risk

-    **Signs of High Risk**:
     - Imbalanced distribution of probabilities for either positive or negative classes.
     - Notable discrepancies or significant differences between the cumulative probability distributions for the
     training data versus the test data.
     - Marked discrepancies or large differences between the cumulative probability distributions for positive and
     negative classes.

-    **Strengths**:
-    - It offers not only numerical probabilities but also provides a visual illustration of data, which enhances the
-    ease of understanding and interpreting the model's behavior.
+    ### Strengths
+
+    - Provides a visual illustration of data, which enhances the ease of understanding and interpreting the model's
+    behavior.
     - Allows for the comparison of model's behavior across training and testing datasets, providing insights about how
     well the model is generalized.
-    - It differentiates between positive and negative classes and their respective distribution patterns, which can aid
-    in problem diagnosis.
+    - Differentiates between positive and negative classes and their respective distribution patterns, aiding in
+    problem diagnosis.
+
+    ### Limitations

-    **Limitations**:
     - Exclusive to classification tasks and specifically to logistic regression models.
     - Graphical results necessitate human interpretation and may not be directly applicable for automated risk
     detection.
-    - The method does not give a solitary quantifiable measure of model risk, rather it offers a visual representation
-    and broad distributional information.
+    - The method does not give a solitary quantifiable measure of model risk, instead, it offers a visual
+    representation and broad distributional information.
     - If the training and test datasets are not representative of the overall data distribution, the metric could
     provide misleading results.
     """

-    name = "cumulative_prediction_probabilities"
-    required_inputs = ["model", "datasets"]
-    tasks = ["classification"]
-    tags = ["logistic_regression", "visualization"]
-
-    default_params = {"title": "Cumulative Probabilities"}
-
-    @staticmethod
-    def plot_cumulative_prob(dataframes, dataset_titles, target_col, title):
-        figures = []
-
-        # Generate a colormap and convert to Plotly-accepted color format
-        # Adjust 'viridis' to any other matplotlib colormap if desired
-        colormap = cm.get_cmap("viridis")
-
-        for _, (df, dataset_title) in enumerate(zip(dataframes, dataset_titles)):
-            fig = go.Figure()
-
-            # Get unique classes and assign colors
-            classes = sorted(df[target_col].unique())
-            colors = [
-                colormap(i / len(classes))[:3] for i in range(len(classes))
-            ]  # RGB
-            color_dict = {
-                cls: f"rgb({int(rgb[0]*255)}, {int(rgb[1]*255)}, {int(rgb[2]*255)})"
-                for cls, rgb in zip(classes, colors)
-            }
-            for class_value in sorted(df[target_col].unique()):
-                # Calculate cumulative distribution for the current class
-                sorted_probs = np.sort(
-                    df[df[target_col] == class_value]["probabilities"]
-                )
-                cumulative_probs = np.cumsum(sorted_probs) / np.sum(sorted_probs)
-
-                fig.add_trace(
-                    go.Scatter(
-                        x=sorted_probs,
-                        y=cumulative_probs,
-                        mode="lines",
-                        name=f"{dataset_title} {target_col} = {class_value}",
-                        line=dict(
-                            color=color_dict[class_value],
-                        ),
-                    )
-                )
-            fig.update_layout(
-                title_text=f"{title} - {dataset_title}",
-                xaxis_title="Probability",
-                yaxis_title="Cumulative Distribution",
-                legend_title=target_col,
-            )
-            figures.append(fig)
-        return figures
-
-    def run(self):
-        dataset_titles = [dataset.input_id for dataset in self.inputs.datasets]
-        target_column = self.inputs.datasets[0].target_column
-        title = self.params.get("title", self.default_params["title"])
-
-        dataframes = []
-        metric_value = {"cum_prob": {}}
-        for dataset in self.inputs.datasets:
-            df = dataset.df.copy()
-            y_prob = dataset.y_prob(self.inputs.model)
-            df["probabilities"] = y_prob
-            dataframes.append(df)
-            metric_value["cum_prob"][dataset.input_id] = list(df["probabilities"])
-
-        figures = self.plot_cumulative_prob(
-            dataframes, dataset_titles, target_column, title
-        )
+    df = dataset.df
+    df["probabilities"] = dataset.y_prob(model)

-        figures_list = [
-            Figure(
-                for_object=self,
-                key=f"cumulative_prob_{title.replace(' ', '_')}_{i+1}",
-                figure=fig,
+    fig = _plot_cumulative_prob(df, dataset.target_column, title)
+
+    return fig
+
+
+def _plot_cumulative_prob(df, target_col, title):
+
+    # Generate a colormap and convert to Plotly-accepted color format
+    # Adjust 'viridis' to any other matplotlib colormap if desired
+    colormap = cm.get_cmap("viridis")
+
+    fig = go.Figure()
+
+    # Get unique classes and assign colors
+    classes = sorted(df[target_col].unique())
+    colors = [colormap(i / len(classes))[:3] for i in range(len(classes))]  # RGB
+    color_dict = {
+        cls: f"rgb({int(rgb[0]*255)}, {int(rgb[1]*255)}, {int(rgb[2]*255)})"
+        for cls, rgb in zip(classes, colors)
+    }
+    for class_value in sorted(df[target_col].unique()):
+        # Calculate cumulative distribution for the current class
+        sorted_probs = np.sort(df[df[target_col] == class_value]["probabilities"])
+        cumulative_probs = np.cumsum(sorted_probs) / np.sum(sorted_probs)
+
+        fig.add_trace(
+            go.Scatter(
+                x=sorted_probs,
+                y=cumulative_probs,
+                mode="lines",
+                name=f"{target_col} = {class_value}",
+                line=dict(
+                    color=color_dict[class_value],
+                ),
             )
-            for i, fig in enumerate(figures)
-        ]
+        )
+    fig.update_layout(
+        title_text=f"{title}",
+        xaxis_title="Probability",
+        yaxis_title="Cumulative Distribution",
+    )

-        return self.cache_results(metric_value=metric_value, figures=figures_list)
+    return fig
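
The cumulative curves drawn by the refactored function come down to sorting each class's predicted probabilities and normalizing their running sum. A minimal numpy sketch of that calculation on synthetic data (not the package's code):

```python
import numpy as np
import pandas as pd

# Synthetic predicted probabilities for a binary target
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "target": rng.integers(0, 2, size=1000),
    "probabilities": rng.uniform(0, 1, size=1000),
})

for class_value in sorted(df["target"].unique()):
    sorted_probs = np.sort(df.loc[df["target"] == class_value, "probabilities"])
    cumulative = np.cumsum(sorted_probs) / np.sum(sorted_probs)
    # (sorted_probs, cumulative) is one line of the cumulative-probability plot
    print(class_value, cumulative[-1])  # the normalized running sum always ends at 1.0
```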

validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py

@@ -2,58 +2,85 @@
 # See the LICENSE file in the root of this repository for details.
 # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial

-from dataclasses import dataclass
-
+import pandas as pd
 from statsmodels.stats.stattools import durbin_watson

-from validmind.vm_models import Metric
+from validmind import tags, tasks


-@dataclass
-class DurbinWatsonTest(Metric):
+@tasks("regression")
+@tags("time_series_data", "forecasting", "statistical_test", "statsmodels")
+def DurbinWatsonTest(dataset, model, threshold=[1.5, 2.5]):
     """
     Assesses autocorrelation in time series data features using the Durbin-Watson statistic.

-    **Purpose**: The Durbin-Watson Test metric detects autocorrelation in time series data (where a set of data values
-    influences their predecessors). Autocorrelation is a crucial factor for regression tasks as these often assume the
+    ### Purpose
+
+    The Durbin-Watson Test metric detects autocorrelation in time series data (where a set of data values influences
+    their predecessors). Autocorrelation is a crucial factor for regression tasks as these often assume the
     independence of residuals. A model with significant autocorrelation may give unreliable predictions.

-    **Test Mechanism**: Utilizing the `durbin_watson` function in the `statsmodels` Python library, the Durbin-Watson
-    (DW) Test metric generates a statistical value for each feature of the training dataset. The function is looped
-    over all columns of the dataset, calculating and caching the DW value for each column for further analysis. A DW
-    metric value nearing 2 indicates no autocorrelation. Conversely, values approaching 0 suggest positive
-    autocorrelation, and those leaning towards 4 imply negative autocorrelation.
+    ### Test Mechanism
+
+    Utilizing the `durbin_watson` function in the `statsmodels` Python library, the Durbin-Watson (DW) Test metric
+    generates a statistical value for each feature of the training dataset. The function is looped over all columns of
+    the dataset, calculating and caching the DW value for each column for further analysis. A DW metric value nearing 2
+    indicates no autocorrelation. Conversely, values approaching 0 suggest positive autocorrelation, and those leaning
+    towards 4 imply negative autocorrelation.
+
+    ### Signs of High Risk

-    **Signs of High Risk**:
     - If a feature's DW value significantly deviates from 2, it could signal a high risk due to potential
     autocorrelation issues in the dataset.
-    - A value closer to '0' could imply positive autocorrelation, while a value nearer to '4' could point to negative
+    - A value closer to 0 could imply positive autocorrelation, while a value nearer to 4 could point to negative
     autocorrelation, both leading to potentially unreliable prediction models.

-    **Strengths**:
+    ### Strengths
+
     - The metric specializes in identifying autocorrelation in prediction model residuals.
     - Autocorrelation detection assists in diagnosing violation of various modeling technique assumptions, particularly
     in regression analysis and time-series data modeling.

-    **Limitations**:
+    ### Limitations
+
     - The Durbin-Watson Test mainly detects linear autocorrelation and could overlook other types of relationships.
     - The metric is highly sensitive to data points order. Shuffling the order could lead to notably different results.
     - The test only checks for first-order autocorrelation (between a variable and its immediate predecessor) and fails
-    to detect higher order autocorrelation.
+    to detect higher-order autocorrelation.
     """

-    name = "durbin_watson"
-    required_inputs = ["dataset"]
-    tasks = ["regression"]
-    tags = ["time_series_data", "forecasting", "statistical_test", "statsmodels"]
-
-    def run(self):
-        """
-        Calculates DB for each of the dataset features
-        """
-        x_train = self.inputs.dataset.df
-        dw_values = {}
-        for col in x_train.columns:
-            dw_values[col] = durbin_watson(x_train[col].values)
-
-        return self.cache_results(dw_values)
+    # Validate threshold values
+    if not (0 < threshold[0] < threshold[1] < 4):
+        raise ValueError(
+            "Invalid threshold. It should be in the form [a, b] where 0 < a < b < 4."
+        )
+
+    # Check if threshold values are around 2
+    if abs(2 - threshold[0]) > 1 or abs(2 - threshold[1]) > 1:
+        raise ValueError(
+            "Threshold values should be around 2 for meaningful Durbin-Watson test results."
+        )
+
+    y_true = dataset.y
+    y_pred = dataset.y_pred(model)
+    residuals = y_true - y_pred
+
+    dw_statistic = durbin_watson(residuals)
+
+    def get_autocorrelation(dw_value, threshold):
+        if dw_value < threshold[0]:
+            return "Positive autocorrelation"
+        elif dw_value > threshold[1]:
+            return "Negative autocorrelation"
+        else:
+            return "No autocorrelation"
+
+    results = pd.DataFrame(
+        {
+            "dw_statistic": [dw_statistic],
+            "threshold": [str(threshold)],
+            "autocorrelation": [get_autocorrelation(dw_statistic, threshold)],
+        }
+    )
+
+    return results
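
The refactored test boils down to computing the Durbin-Watson statistic on the model residuals and mapping it onto the default [1.5, 2.5] band. A minimal self-contained example of that interpretation, using synthetic residuals rather than the package's dataset and model inputs:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)
residuals = rng.normal(size=200)  # independent residuals, so the DW statistic lands near 2

dw = durbin_watson(residuals)
threshold = [1.5, 2.5]
if dw < threshold[0]:
    verdict = "Positive autocorrelation"
elif dw > threshold[1]:
    verdict = "Negative autocorrelation"
else:
    verdict = "No autocorrelation"
print(f"DW = {dw:.2f} -> {verdict}")
```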

validmind/tests/model_validation/statsmodels/GINITable.py

@@ -2,34 +2,37 @@
 # See the LICENSE file in the root of this repository for details.
 # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial

-from dataclasses import dataclass
-
 import numpy as np
 import pandas as pd
 from sklearn.metrics import roc_auc_score, roc_curve

-from validmind.vm_models import Metric, ResultSummary, ResultTable, ResultTableMetadata
+from validmind import tags, tasks


-@dataclass
-class GINITable(Metric):
+@tags("model_performance")
+@tasks("classification")
+def GINITable(dataset, model):
     """
     Evaluates classification model performance using AUC, GINI, and KS metrics for training and test datasets.

-    **Purpose**: The 'GINITable' metric is designed to evaluate the performance of a classification model by
-    emphasizing its discriminatory power. Specifically, it calculates and presents three important metrics - the Area
-    under the ROC Curve (AUC), the GINI coefficient, and the Kolmogov-Smirnov (KS) statistic - for both training and
-    test datasets.
+    ### Purpose
+
+    The 'GINITable' metric is designed to evaluate the performance of a classification model by emphasizing its
+    discriminatory power. Specifically, it calculates and presents three important metrics - the Area under the ROC
+    Curve (AUC), the GINI coefficient, and the Kolmogorov-Smirnov (KS) statistic - for both training and test datasets.
+
+    ### Test Mechanism

-    **Test Mechanism**: Using a dictionary for storing performance metrics for both the training and test datasets, the
-    'GINITable' metric calculates each of these metrics sequentially. The Area under the ROC Curve (AUC) is calculated
-    via the `roc_auc_score` function from the Scikit-Learn library. The GINI coefficient, a measure of statistical
-    dispersion, is then computed by doubling the AUC and subtracting 1. Finally, the Kolmogov-Smirnov (KS) statistic is
+    Using a dictionary for storing performance metrics for both the training and test datasets, the 'GINITable' metric
+    calculates each of these metrics sequentially. The Area under the ROC Curve (AUC) is calculated via the
+    `roc_auc_score` function from the Scikit-Learn library. The GINI coefficient, a measure of statistical dispersion,
+    is then computed by doubling the AUC and subtracting 1. Finally, the Kolmogorov-Smirnov (KS) statistic is
     calculated via the `roc_curve` function from Scikit-Learn, with the False Positive Rate (FPR) subtracted from the
     True Positive Rate (TPR) and the maximum value taken from the resulting data. These metrics are then stored in a
     pandas DataFrame for convenient visualization.

-    **Signs of High Risk**:
+    ### Signs of High Risk
+
     - Low values for performance metrics may suggest a reduction in model performance, particularly a low AUC which
     indicates poor classification performance, or a low GINI coefficient, which could suggest a decreased ability to
     discriminate different classes.
@@ -38,7 +41,8 @@ class GINITable(Metric):
     - Significant discrepancies between the performance on the training dataset and the test dataset may present
     another signal of high risk.

-    **Strengths**:
+    ### Strengths
+
     - Offers three key performance metrics (AUC, GINI, and KS) in one test, providing a more comprehensive evaluation
     of the model.
     - Provides a direct comparison between the model's performance on training and testing datasets, which aids in
@@ -47,7 +51,8 @@ class GINITable(Metric):
     performance even when dealing with imbalanced datasets.
     - Presents the metrics in a user-friendly table format for easy comprehension and analysis.

-    **Limitations**:
+    ### Limitations
+
     - The GINI coefficient and KS statistic are both dependent on the AUC value. Therefore, any errors in the
     calculation of the latter will adversely impact the former metrics too.
     - Mainly suited for binary classification models and may require modifications for effective application in
@@ -57,64 +62,26 @@ class GINITable(Metric):
     lead to inaccuracies in the metrics if the data is not appropriately preprocessed.
     """

-    name = "gini_table"
-    required_inputs = ["model", "datasets"]
-    tasks = ["classification"]
-    tags = ["visualization", "model_performance"]
-
-    def run(self):
-
-        summary_metrics = self.compute_metrics()
-
-        return self.cache_results(
-            {
-                "metrics_summary": summary_metrics.to_dict(orient="records"),
-            }
-        )
-
-    def compute_metrics(self):
-        """Computes AUC, GINI, and KS for an arbitrary number of datasets."""
-        # Initialize the dictionary to store results
-        metrics_dict = {"Dataset": [], "AUC": [], "GINI": [], "KS": []}
-
-        # Iterate over each dataset in the inputs
-        for _, dataset in enumerate(self.inputs.datasets):
-            dataset_label = (
-                dataset.input_id
-            )  # Use input_id as the label for each dataset
-            metrics_dict["Dataset"].append(dataset_label)
-
-            # Retrieve y_true and y_pred for the current dataset
-            y_true = np.ravel(dataset.y)  # Flatten y_true to make it one-dimensional
-            y_prob = dataset.y_prob(self.inputs.model)
-
-            # Compute metrics
-            y_true = np.array(y_true, dtype=float)
-            y_prob = np.array(y_prob, dtype=float)
-
-            fpr, tpr, _ = roc_curve(y_true, y_prob)
-            ks = max(tpr - fpr)
-            auc = roc_auc_score(y_true, y_prob)
-            gini = 2 * auc - 1
-
-            # Add the metrics to the dictionary
-            metrics_dict["AUC"].append(auc)
-            metrics_dict["GINI"].append(gini)
-            metrics_dict["KS"].append(ks)
-
-        # Create a DataFrame to store and return the results
-        metrics_df = pd.DataFrame(metrics_dict)
-        return metrics_df
-
-    def summary(self, metric_value):
-        summary_metrics_table = metric_value["metrics_summary"]
-        return ResultSummary(
-            results=[
-                ResultTable(
-                    data=summary_metrics_table,
-                    metadata=ResultTableMetadata(
-                        title="AUC, GINI and KS for train and test datasets"
-                    ),
-                )
-            ]
-        )
+    metrics_dict = {"AUC": [], "GINI": [], "KS": []}
+
+    # Retrieve y_true and y_pred for the current dataset
+    y_true = np.ravel(dataset.y)  # Flatten y_true to make it one-dimensional
+    y_prob = dataset.y_prob(model)
+
+    # Compute metrics
+    y_true = np.array(y_true, dtype=float)
+    y_prob = np.array(y_prob, dtype=float)
+
+    fpr, tpr, _ = roc_curve(y_true, y_prob)
+    ks = max(tpr - fpr)
+    auc = roc_auc_score(y_true, y_prob)
+    gini = 2 * auc - 1
+
+    # Add the metrics to the dictionary
+    metrics_dict["AUC"].append(auc)
+    metrics_dict["GINI"].append(gini)
+    metrics_dict["KS"].append(ks)
+
+    # Create a DataFrame to store and return the results
+    metrics_df = pd.DataFrame(metrics_dict)
+    return metrics_df
+ return metrics_df