validmind 2.5.6__py3-none-any.whl → 2.5.15__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (212)
  1. validmind/__version__.py +1 -1
  2. validmind/ai/test_descriptions.py +26 -7
  3. validmind/api_client.py +89 -43
  4. validmind/client.py +2 -2
  5. validmind/client_config.py +11 -14
  6. validmind/datasets/regression/fred_timeseries.py +67 -138
  7. validmind/template.py +1 -0
  8. validmind/test_suites/__init__.py +0 -2
  9. validmind/test_suites/statsmodels_timeseries.py +1 -1
  10. validmind/test_suites/summarization.py +0 -1
  11. validmind/test_suites/time_series.py +0 -43
  12. validmind/tests/__types__.py +3 -13
  13. validmind/tests/data_validation/ACFandPACFPlot.py +15 -13
  14. validmind/tests/data_validation/ADF.py +31 -24
  15. validmind/tests/data_validation/AutoAR.py +9 -9
  16. validmind/tests/data_validation/AutoMA.py +23 -16
  17. validmind/tests/data_validation/AutoSeasonality.py +18 -16
  18. validmind/tests/data_validation/AutoStationarity.py +21 -16
  19. validmind/tests/data_validation/BivariateScatterPlots.py +67 -96
  20. validmind/tests/data_validation/ChiSquaredFeaturesTable.py +82 -124
  21. validmind/tests/data_validation/ClassImbalance.py +15 -12
  22. validmind/tests/data_validation/DFGLSArch.py +19 -13
  23. validmind/tests/data_validation/DatasetDescription.py +17 -11
  24. validmind/tests/data_validation/DatasetSplit.py +7 -5
  25. validmind/tests/data_validation/DescriptiveStatistics.py +28 -21
  26. validmind/tests/data_validation/Duplicates.py +33 -25
  27. validmind/tests/data_validation/EngleGrangerCoint.py +35 -33
  28. validmind/tests/data_validation/FeatureTargetCorrelationPlot.py +59 -71
  29. validmind/tests/data_validation/HighCardinality.py +19 -12
  30. validmind/tests/data_validation/HighPearsonCorrelation.py +27 -22
  31. validmind/tests/data_validation/IQROutliersBarPlot.py +13 -10
  32. validmind/tests/data_validation/IQROutliersTable.py +40 -36
  33. validmind/tests/data_validation/IsolationForestOutliers.py +21 -14
  34. validmind/tests/data_validation/KPSS.py +34 -29
  35. validmind/tests/data_validation/LaggedCorrelationHeatmap.py +22 -15
  36. validmind/tests/data_validation/MissingValues.py +32 -27
  37. validmind/tests/data_validation/MissingValuesBarPlot.py +25 -21
  38. validmind/tests/data_validation/PearsonCorrelationMatrix.py +71 -84
  39. validmind/tests/data_validation/PhillipsPerronArch.py +37 -30
  40. validmind/tests/data_validation/RollingStatsPlot.py +31 -23
  41. validmind/tests/data_validation/ScatterPlot.py +63 -78
  42. validmind/tests/data_validation/SeasonalDecompose.py +38 -34
  43. validmind/tests/data_validation/Skewness.py +35 -37
  44. validmind/tests/data_validation/SpreadPlot.py +35 -35
  45. validmind/tests/data_validation/TabularCategoricalBarPlots.py +23 -17
  46. validmind/tests/data_validation/TabularDateTimeHistograms.py +21 -13
  47. validmind/tests/data_validation/TabularDescriptionTables.py +51 -16
  48. validmind/tests/data_validation/TabularNumericalHistograms.py +25 -22
  49. validmind/tests/data_validation/TargetRateBarPlots.py +21 -14
  50. validmind/tests/data_validation/TimeSeriesDescription.py +25 -18
  51. validmind/tests/data_validation/TimeSeriesDescriptiveStatistics.py +23 -17
  52. validmind/tests/data_validation/TimeSeriesFrequency.py +24 -17
  53. validmind/tests/data_validation/TimeSeriesHistogram.py +33 -32
  54. validmind/tests/data_validation/TimeSeriesLinePlot.py +17 -10
  55. validmind/tests/data_validation/TimeSeriesMissingValues.py +15 -10
  56. validmind/tests/data_validation/TimeSeriesOutliers.py +37 -33
  57. validmind/tests/data_validation/TooManyZeroValues.py +16 -11
  58. validmind/tests/data_validation/UniqueRows.py +11 -6
  59. validmind/tests/data_validation/WOEBinPlots.py +23 -16
  60. validmind/tests/data_validation/WOEBinTable.py +35 -30
  61. validmind/tests/data_validation/ZivotAndrewsArch.py +34 -28
  62. validmind/tests/data_validation/nlp/CommonWords.py +21 -14
  63. validmind/tests/data_validation/nlp/Hashtags.py +27 -20
  64. validmind/tests/data_validation/nlp/LanguageDetection.py +33 -14
  65. validmind/tests/data_validation/nlp/Mentions.py +21 -15
  66. validmind/tests/data_validation/nlp/PolarityAndSubjectivity.py +32 -9
  67. validmind/tests/data_validation/nlp/Punctuations.py +24 -20
  68. validmind/tests/data_validation/nlp/Sentiment.py +27 -8
  69. validmind/tests/data_validation/nlp/StopWords.py +26 -19
  70. validmind/tests/data_validation/nlp/TextDescription.py +36 -35
  71. validmind/tests/data_validation/nlp/Toxicity.py +32 -9
  72. validmind/tests/decorator.py +81 -42
  73. validmind/tests/model_validation/BertScore.py +36 -27
  74. validmind/tests/model_validation/BleuScore.py +25 -19
  75. validmind/tests/model_validation/ClusterSizeDistribution.py +38 -34
  76. validmind/tests/model_validation/ContextualRecall.py +35 -13
  77. validmind/tests/model_validation/FeaturesAUC.py +32 -13
  78. validmind/tests/model_validation/MeteorScore.py +46 -33
  79. validmind/tests/model_validation/ModelMetadata.py +32 -64
  80. validmind/tests/model_validation/ModelPredictionResiduals.py +75 -73
  81. validmind/tests/model_validation/RegardScore.py +30 -14
  82. validmind/tests/model_validation/RegressionResidualsPlot.py +10 -5
  83. validmind/tests/model_validation/RougeScore.py +36 -30
  84. validmind/tests/model_validation/TimeSeriesPredictionWithCI.py +30 -14
  85. validmind/tests/model_validation/TimeSeriesPredictionsPlot.py +27 -30
  86. validmind/tests/model_validation/TimeSeriesR2SquareBySegments.py +68 -63
  87. validmind/tests/model_validation/TokenDisparity.py +31 -23
  88. validmind/tests/model_validation/ToxicityScore.py +26 -17
  89. validmind/tests/model_validation/embeddings/ClusterDistribution.py +24 -20
  90. validmind/tests/model_validation/embeddings/CosineSimilarityComparison.py +30 -27
  91. validmind/tests/model_validation/embeddings/CosineSimilarityDistribution.py +7 -5
  92. validmind/tests/model_validation/embeddings/CosineSimilarityHeatmap.py +32 -23
  93. validmind/tests/model_validation/embeddings/DescriptiveAnalytics.py +7 -5
  94. validmind/tests/model_validation/embeddings/EmbeddingsVisualization2D.py +15 -11
  95. validmind/tests/model_validation/embeddings/EuclideanDistanceComparison.py +29 -29
  96. validmind/tests/model_validation/embeddings/EuclideanDistanceHeatmap.py +34 -25
  97. validmind/tests/model_validation/embeddings/PCAComponentsPairwisePlots.py +38 -26
  98. validmind/tests/model_validation/embeddings/StabilityAnalysis.py +40 -1
  99. validmind/tests/model_validation/embeddings/StabilityAnalysisKeyword.py +18 -17
  100. validmind/tests/model_validation/embeddings/StabilityAnalysisRandomNoise.py +40 -45
  101. validmind/tests/model_validation/embeddings/StabilityAnalysisSynonyms.py +17 -19
  102. validmind/tests/model_validation/embeddings/StabilityAnalysisTranslation.py +29 -25
  103. validmind/tests/model_validation/embeddings/TSNEComponentsPairwisePlots.py +38 -28
  104. validmind/tests/model_validation/ragas/AnswerCorrectness.py +5 -4
  105. validmind/tests/model_validation/ragas/AnswerRelevance.py +5 -4
  106. validmind/tests/model_validation/ragas/AnswerSimilarity.py +5 -4
  107. validmind/tests/model_validation/ragas/AspectCritique.py +7 -0
  108. validmind/tests/model_validation/ragas/ContextEntityRecall.py +9 -8
  109. validmind/tests/model_validation/ragas/ContextPrecision.py +5 -4
  110. validmind/tests/model_validation/ragas/ContextRecall.py +5 -4
  111. validmind/tests/model_validation/ragas/Faithfulness.py +5 -4
  112. validmind/tests/model_validation/ragas/utils.py +6 -0
  113. validmind/tests/model_validation/sklearn/AdjustedMutualInformation.py +19 -12
  114. validmind/tests/model_validation/sklearn/AdjustedRandIndex.py +22 -17
  115. validmind/tests/model_validation/sklearn/ClassifierPerformance.py +27 -25
  116. validmind/tests/model_validation/sklearn/ClusterCosineSimilarity.py +7 -5
  117. validmind/tests/model_validation/sklearn/ClusterPerformance.py +40 -78
  118. validmind/tests/model_validation/sklearn/ClusterPerformanceMetrics.py +15 -17
  119. validmind/tests/model_validation/sklearn/CompletenessScore.py +17 -11
  120. validmind/tests/model_validation/sklearn/ConfusionMatrix.py +22 -15
  121. validmind/tests/model_validation/sklearn/FeatureImportance.py +95 -0
  122. validmind/tests/model_validation/sklearn/FowlkesMallowsScore.py +7 -7
  123. validmind/tests/model_validation/sklearn/HomogeneityScore.py +19 -12
  124. validmind/tests/model_validation/sklearn/HyperParametersTuning.py +35 -30
  125. validmind/tests/model_validation/sklearn/KMeansClustersOptimization.py +10 -5
  126. validmind/tests/model_validation/sklearn/MinimumAccuracy.py +32 -32
  127. validmind/tests/model_validation/sklearn/MinimumF1Score.py +23 -23
  128. validmind/tests/model_validation/sklearn/MinimumROCAUCScore.py +15 -10
  129. validmind/tests/model_validation/sklearn/ModelsPerformanceComparison.py +26 -19
  130. validmind/tests/model_validation/sklearn/OverfitDiagnosis.py +38 -18
  131. validmind/tests/model_validation/sklearn/PermutationFeatureImportance.py +31 -25
  132. validmind/tests/model_validation/sklearn/PopulationStabilityIndex.py +8 -6
  133. validmind/tests/model_validation/sklearn/PrecisionRecallCurve.py +24 -17
  134. validmind/tests/model_validation/sklearn/ROCCurve.py +12 -7
  135. validmind/tests/model_validation/sklearn/RegressionErrors.py +74 -130
  136. validmind/tests/model_validation/sklearn/RegressionErrorsComparison.py +27 -12
  137. validmind/tests/model_validation/sklearn/{RegressionModelsPerformanceComparison.py → RegressionPerformance.py} +18 -20
  138. validmind/tests/model_validation/sklearn/RegressionR2Square.py +55 -93
  139. validmind/tests/model_validation/sklearn/RegressionR2SquareComparison.py +32 -13
  140. validmind/tests/model_validation/sklearn/RobustnessDiagnosis.py +113 -73
  141. validmind/tests/model_validation/sklearn/SHAPGlobalImportance.py +7 -5
  142. validmind/tests/model_validation/sklearn/SilhouettePlot.py +27 -19
  143. validmind/tests/model_validation/sklearn/TrainingTestDegradation.py +25 -18
  144. validmind/tests/model_validation/sklearn/VMeasure.py +14 -13
  145. validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py +7 -5
  146. validmind/tests/model_validation/statsmodels/AutoARIMA.py +24 -18
  147. validmind/tests/model_validation/statsmodels/BoxPierce.py +14 -10
  148. validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py +73 -104
  149. validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py +19 -12
  150. validmind/tests/model_validation/statsmodels/GINITable.py +44 -77
  151. validmind/tests/model_validation/statsmodels/JarqueBera.py +27 -22
  152. validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py +33 -34
  153. validmind/tests/model_validation/statsmodels/LJungBox.py +32 -28
  154. validmind/tests/model_validation/statsmodels/Lilliefors.py +27 -24
  155. validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py +87 -119
  156. validmind/tests/model_validation/statsmodels/RegressionCoeffs.py +100 -0
  157. validmind/tests/model_validation/statsmodels/RegressionFeatureSignificance.py +14 -9
  158. validmind/tests/model_validation/statsmodels/RegressionModelForecastPlot.py +17 -13
  159. validmind/tests/model_validation/statsmodels/RegressionModelForecastPlotLevels.py +46 -43
  160. validmind/tests/model_validation/statsmodels/RegressionModelSensitivityPlot.py +38 -36
  161. validmind/tests/model_validation/statsmodels/RegressionModelSummary.py +30 -28
  162. validmind/tests/model_validation/statsmodels/RegressionPermutationFeatureImportance.py +18 -11
  163. validmind/tests/model_validation/statsmodels/RunsTest.py +32 -28
  164. validmind/tests/model_validation/statsmodels/ScorecardHistogram.py +75 -107
  165. validmind/tests/model_validation/statsmodels/ShapiroWilk.py +15 -8
  166. validmind/tests/ongoing_monitoring/FeatureDrift.py +10 -6
  167. validmind/tests/ongoing_monitoring/PredictionAcrossEachFeature.py +31 -25
  168. validmind/tests/ongoing_monitoring/PredictionCorrelation.py +29 -21
  169. validmind/tests/ongoing_monitoring/TargetPredictionDistributionPlot.py +31 -23
  170. validmind/tests/prompt_validation/Bias.py +14 -11
  171. validmind/tests/prompt_validation/Clarity.py +16 -14
  172. validmind/tests/prompt_validation/Conciseness.py +7 -5
  173. validmind/tests/prompt_validation/Delimitation.py +23 -22
  174. validmind/tests/prompt_validation/NegativeInstruction.py +7 -5
  175. validmind/tests/prompt_validation/Robustness.py +12 -10
  176. validmind/tests/prompt_validation/Specificity.py +13 -11
  177. validmind/tests/prompt_validation/ai_powered_test.py +6 -0
  178. validmind/tests/run.py +68 -23
  179. validmind/unit_metrics/__init__.py +81 -144
  180. validmind/unit_metrics/classification/{sklearn/Accuracy.py → Accuracy.py} +1 -1
  181. validmind/unit_metrics/classification/{sklearn/F1.py → F1.py} +1 -1
  182. validmind/unit_metrics/classification/{sklearn/Precision.py → Precision.py} +1 -1
  183. validmind/unit_metrics/classification/{sklearn/ROC_AUC.py → ROC_AUC.py} +1 -2
  184. validmind/unit_metrics/classification/{sklearn/Recall.py → Recall.py} +1 -1
  185. validmind/unit_metrics/regression/{sklearn/AdjustedRSquaredScore.py → AdjustedRSquaredScore.py} +1 -1
  186. validmind/unit_metrics/regression/GiniCoefficient.py +1 -1
  187. validmind/unit_metrics/regression/HuberLoss.py +1 -1
  188. validmind/unit_metrics/regression/KolmogorovSmirnovStatistic.py +1 -1
  189. validmind/unit_metrics/regression/{sklearn/MeanAbsoluteError.py → MeanAbsoluteError.py} +1 -1
  190. validmind/unit_metrics/regression/MeanAbsolutePercentageError.py +1 -1
  191. validmind/unit_metrics/regression/MeanBiasDeviation.py +1 -1
  192. validmind/unit_metrics/regression/{sklearn/MeanSquaredError.py → MeanSquaredError.py} +1 -1
  193. validmind/unit_metrics/regression/QuantileLoss.py +1 -1
  194. validmind/unit_metrics/regression/{sklearn/RSquaredScore.py → RSquaredScore.py} +1 -1
  195. validmind/unit_metrics/regression/{sklearn/RootMeanSquaredError.py → RootMeanSquaredError.py} +1 -1
  196. validmind/vm_models/dataset/dataset.py +2 -0
  197. validmind/vm_models/figure.py +5 -0
  198. validmind/vm_models/test/result_wrapper.py +93 -132
  199. {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/METADATA +1 -1
  200. {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/RECORD +203 -210
  201. validmind/tests/data_validation/ANOVAOneWayTable.py +0 -138
  202. validmind/tests/data_validation/BivariateFeaturesBarPlots.py +0 -142
  203. validmind/tests/data_validation/BivariateHistograms.py +0 -117
  204. validmind/tests/data_validation/HeatmapFeatureCorrelations.py +0 -124
  205. validmind/tests/data_validation/MissingValuesRisk.py +0 -88
  206. validmind/tests/model_validation/ModelMetadataComparison.py +0 -59
  207. validmind/tests/model_validation/sklearn/FeatureImportanceComparison.py +0 -83
  208. validmind/tests/model_validation/statsmodels/RegressionCoeffsPlot.py +0 -135
  209. validmind/tests/model_validation/statsmodels/RegressionModelsCoeffs.py +0 -103
  210. {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/LICENSE +0 -0
  211. {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/WHEEL +0 -0
  212. {validmind-2.5.6.dist-info → validmind-2.5.15.dist-info}/entry_points.txt +0 -0
@@ -15,29 +15,36 @@ class AdjustedMutualInformation(ClusterPerformance):
  Evaluates clustering model performance by measuring mutual information between true and predicted labels, adjusting
  for chance.
 
- **1. Purpose**: The purpose of this metric (Adjusted Mutual Information) is to evaluate the performance of a
- machine learning model, more specifically, a clustering model. It measures the mutual information between the true
- labels and the ones predicted by the model, adjusting for chance.
+ ### Purpose
 
- **2. Test Mechanism**: The Adjusted Mutual Information (AMI) uses sklearn's `adjusted_mutual_info_score` function.
- This function calculates the mutual information between the true labels and the ones predicted while correcting for
- the chance correlation expected due to random label assignments. This test requires the model, the training
- dataset, and the test dataset as inputs.
+ The purpose of this metric (Adjusted Mutual Information) is to evaluate the performance of a machine learning
+ model, more specifically, a clustering model. It measures the mutual information between the true labels and the
+ ones predicted by the model, adjusting for chance.
+
+ ### Test Mechanism
+
+ The Adjusted Mutual Information (AMI) uses sklearn's `adjusted_mutual_info_score` function. This function
+ calculates the mutual information between the true labels and the ones predicted while correcting for the chance
+ correlation expected due to random label assignments. This test requires the model, the training dataset, and the
+ test dataset as inputs.
+
+ ### Signs of High Risk
 
- **3. Signs of High Risk**:
  - Low Adjusted Mutual Information Score: This score ranges between 0 and 1. A low score (closer to 0) can indicate
  poor model performance as the predicted labels do not align well with the true labels.
- - In case of high dimensional data, if the algorithm shows high scores, this could also be a potential risk as AMI
+ - In case of high-dimensional data, if the algorithm shows high scores, this could also be a potential risk as AMI
  may not perform reliably.
 
- **4. Strengths**:
+ ### Strengths
+
  - The AMI metric takes into account the randomness of the predicted labels, which makes it more robust than the
  simple Mutual Information.
  - The scale of AMI is not dependent on the sizes of the clustering, allowing for comparability between different
  datasets or models.
  - Good for comparing the output of clustering algorithms where the number of clusters is not known a priori.
 
- **5. Limitations**:
+ ### Limitations
+
  - Adjusted Mutual Information does not take into account the continuous nature of some data. As a result, it may
  not be the best choice for regression or other continuous types of tasks.
  - AMI has the drawback of being biased towards clusterings with a higher number of clusters.
@@ -47,7 +54,7 @@ class AdjustedMutualInformation(ClusterPerformance):
  """
 
  name = "adjusted_mutual_information"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",
@@ -15,38 +15,43 @@ class AdjustedRandIndex(ClusterPerformance):
  Measures the similarity between two data clusters using the Adjusted Rand Index (ARI) metric in clustering machine
  learning models.
 
- **1. Purpose:**
+ ### Purpose
+
  The Adjusted Rand Index (ARI) metric is intended to measure the similarity between two data clusters. This metric
- is specifically being used for clustering machine learning models to validly quantify how well the model is
- clustering and producing data groups. It involves comparing the model's produced clusters against the actual (true)
- clusters found in the dataset.
+ is specifically used for clustering machine learning models to quantify how well the model is clustering and
+ producing data groups. It involves comparing the model's produced clusters against the actual (true) clusters found
+ in the dataset.
+
+ ### Test Mechanism
+
+ The Adjusted Rand Index (ARI) is calculated using the `adjusted_rand_score` method from the `sklearn.metrics`
+ module in Python. The test requires inputs including the model itself and the model's training and test datasets.
+ The model's computed clusters and the true clusters are compared, and the similarities are measured to compute the
+ ARI.
 
- **2. Test Mechanism:**
- The Adjusted Rand Index (ARI) is calculated by using the `adjusted_rand_score` method from the sklearn metrics in
- Python. The test requires inputs including the model itself and the model's training and test datasets. The model's
- computed clusters and the true clusters are compared, and the similarities are measured to compute the ARI.
+ ### Signs of High Risk
 
- **3. Signs of High Risk:**
- - If the ARI is close to zero, it signifies that the model's cluster assignments are random and don't match the
+ - If the ARI is close to zero, it signifies that the model's cluster assignments are random and do not match the
  actual dataset clusters, indicating a high risk.
  - An ARI of less than zero indicates that the model's clustering performance is worse than random.
 
- **4. Strengths:**
- - ARI is normalized and it hence gives a consistent metric between -1 and +1, irrespective of raw cluster sizes or
+ ### Strengths
+
+ - ARI is normalized and provides a consistent metric between -1 and +1, irrespective of raw cluster sizes or
  dataset size variations.
- - It doesn’t require a ground truth for computation which makes it ideal for unsupervised learning model
- evaluations.
+ - It does not require a ground truth for computation, making it ideal for unsupervised learning model evaluations.
  - It penalizes for false positives and false negatives, providing a robust measure of clustering quality.
 
- **5. Limitations:**
+ ### Limitations
+
  - In real-world situations, true clustering is often unknown, which can hinder the practical application of the ARI.
  - The ARI requires all individual data instances to be independent, which may not always hold true.
- - It may be difficult to interpret the implications of an ARI score without a context or a benchmark, as it is
+ - It may be difficult to interpret the implications of an ARI score without context or a benchmark, as it is
  heavily dependent on the characteristics of the dataset used.
  """
 
  name = "adjusted_rand_index"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",
@@ -24,36 +24,38 @@ class ClassifierPerformance(Metric):
  Evaluates performance of binary or multiclass classification models using precision, recall, F1-Score, accuracy,
  and ROC AUC scores.
 
- **Purpose**: The supplied script is designed to evaluate the performance of Machine Learning classification models.
+ ### Purpose
+
+ The Classifier Performance test is designed to evaluate the performance of Machine Learning classification models.
  It accomplishes this by computing precision, recall, F1-Score, and accuracy, as well as the ROC AUC (Receiver
  operating characteristic - Area under the curve) scores, thereby providing a comprehensive analytic view of the
  models' performance. The test is adaptable, handling binary and multiclass models equally effectively.
 
- **Test Mechanism**: The script produces a report that includes precision, recall, F1-Score, and accuracy, by
- leveraging the `classification_report` from the scikit-learn's metrics module. For multiclass models, macro and
- weighted averages for these scores are also calculated. Additionally, the ROC AUC scores are calculated and
- included in the report using the script's unique `multiclass_roc_auc_score` function. The outcome of the test
- (report format) differs based on whether the model is binary or multiclass.
+ ### Test Mechanism
+
+ The test produces a report that includes precision, recall, F1-Score, and accuracy, by leveraging the
+ `classification_report` from scikit-learn's metrics module. For multiclass models, macro and weighted averages for
+ these scores are also calculated. Additionally, the ROC AUC scores are calculated and included in the report using
+ the `multiclass_roc_auc_score` function. The outcome of the test (report format) differs based on whether the model
+ is binary or multiclass.
+
+ ### Signs of High Risk
 
- **Signs of High Risk**:
  - Low values for precision, recall, F1-Score, accuracy, and ROC AUC, indicating poor performance.
- - Imbalance in precision and recall scores. Precision highlights correct positive class predictions, while recall
- indicates the accurate identification of actual positive cases. Imbalance may indicate flawed model performance.
- - A low ROC AUC score, especially scores close to 0.5 or lower, strongly suggests a failing model.
-
- **Strengths**:
- - The script is versatile, capable of assessing both binary and multiclass models.
- - It uses a variety of commonly employed performance metrics, offering a comprehensive view of a model's
- performance.
- - The use of ROC-AUC as a metric aids in determining the most optimal threshold for classification, especially
- beneficial when evaluation datasets are unbalanced.
-
- **Limitations**:
- - The test assumes correctly identified labels for binary classification models and raises an exception if the
- positive class is not labeled as "1". However, this setup may not align with all practical applications.
- - This script is specifically designed for classification models and is not suited to evaluate regression models.
- - The metrics computed may provide limited insights in cases where the test dataset does not adequately represent
- the data the model will encounter in real-world scenarios.
+ - Imbalance in precision and recall scores.
+ - A low ROC AUC score, especially scores close to 0.5 or lower, suggesting a failing model.
+
+ ### Strengths
+
+ - Versatile, capable of assessing both binary and multiclass models.
+ - Utilizes a variety of commonly employed performance metrics, offering a comprehensive view of model performance.
+ - The use of ROC-AUC as a metric is beneficial for evaluating unbalanced datasets.
+
+ ### Limitations
+
+ - Assumes correctly identified labels for binary classification models.
+ - Specifically designed for classification models and not suitable for regression models.
+ - May provide limited insights if the test dataset does not represent real-world scenarios adequately.
  """
 
  name = "classifier_performance"
@@ -132,7 +134,7 @@ class ClassifierPerformance(Metric):
  if len(np.unique(y_true)) > 2:
  y_pred = self.inputs.dataset.y_pred(self.inputs.model)
  y_true = y_true.astype(y_pred.dtype)
- roc_auc = self.multiclass_roc_auc_score(y_true, y_pred)
+ roc_auc = multiclass_roc_auc_score(y_true, y_pred)
  else:
  y_prob = self.inputs.dataset.y_prob(self.inputs.model)
  y_true = y_true.astype(y_prob.dtype).flatten()
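
The hunk above swaps the method call for a module-level `multiclass_roc_auc_score` helper whose body is not shown in this diff. A common one-vs-rest formulation looks roughly like the sketch below (an assumption for context, not the package's actual implementation):

```python
# Assumed sketch of a one-vs-rest multiclass ROC AUC helper; the real helper in
# validmind may differ in name, signature, and averaging strategy.
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import LabelBinarizer

def multiclass_roc_auc_sketch(y_true, y_pred, average="macro"):
    lb = LabelBinarizer().fit(y_true)
    # Binarize both label vectors and compute a macro-averaged one-vs-rest AUC.
    return roc_auc_score(lb.transform(y_true), lb.transform(y_pred), average=average)
```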
@@ -16,19 +16,21 @@ class ClusterCosineSimilarity(Metric):
  """
  Measures the intra-cluster similarity of a clustering model using cosine similarity.
 
- **1. Purpose:**
+ ### Purpose
+
  The purpose of this metric is to measure how similar the data points within each cluster of a clustering model are.
  This is done using cosine similarity, which compares the multi-dimensional direction (but not magnitude) of data
  vectors. From a Model Risk Management perspective, this metric is used to quantitatively validate that clusters
  formed by a model have high intra-cluster similarity.
 
- **2. Test Mechanism:**
+ ### Test Mechanism
+
  This test works by first extracting the true and predicted clusters of the model's training data. Then, it computes
  the centroid (average data point) of each cluster. Next, it calculates the cosine similarity between each data
  point within a cluster and its respective centroid. Finally, it outputs the mean cosine similarity of each cluster,
  highlighting how similar, on average, data points in a cluster are to the cluster's centroid.
 
- **3. Signs of High Risk:**
+ ### Signs of High Risk
 
  - Low mean cosine similarity for one or more clusters: If the mean cosine similarity is low, the data points within
  the respective cluster have high variance in their directions. This can be indicative of poor clustering,
@@ -36,7 +38,7 @@ class ClusterCosineSimilarity(Metric):
  - High disparity between mean cosine similarity values across clusters: If there's a significant difference in mean
  cosine similarity across different clusters, this could indicate imbalance in how the model forms clusters.
 
- **4. Strengths:**
+ ### Strengths
 
  - Cosine similarity operates in a multi-dimensional space, making it effective for measuring similarity in high
  dimensional datasets, typical for many machine learning problems.
@@ -44,7 +46,7 @@ class ClusterCosineSimilarity(Metric):
  of each vector.
  - This metric is not dependent on the scale of the variables, making it equally effective on different scales.
 
- **5. Limitations:**
+ ### Limitations
 
  - Cosine similarity does not consider magnitudes (i.e. lengths) of vectors, only their direction. This means it may
  overlook instances where clusters have been adequately separated in terms of magnitude.
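
The centroid-based mechanism in the ClusterCosineSimilarity docstring can be reproduced in a few lines; a rough sketch, assuming `X` is the feature matrix and `labels` the predicted cluster assignments (names are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mean_intra_cluster_cosine(X, labels):
    """Mean cosine similarity between each cluster's points and that cluster's centroid."""
    scores = {}
    for cluster in np.unique(labels):
        points = X[labels == cluster]
        centroid = points.mean(axis=0, keepdims=True)  # shape (1, n_features)
        scores[cluster] = float(cosine_similarity(points, centroid).mean())
    return scores
```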
@@ -4,7 +4,7 @@
 
  from dataclasses import dataclass
 
- from validmind.vm_models import Metric, ResultSummary, ResultTable
+ from validmind.vm_models import Metric
 
 
  @dataclass
@@ -13,106 +13,68 @@ class ClusterPerformance(Metric):
  Evaluates and compares a clustering model's performance on training and testing datasets using multiple defined
  metrics.
 
- **Purpose:** This metric, ClusterPerformance, evaluates the performance of a clustering model on both the training
- and testing datasets. It assesses how well the model defines, forms, and distinguishes clusters of data.
-
- **Test Mechanism:** The metric is applied by first predicting the clusters of the training and testing datasets
- using the clustering model. Next, performance metrics, defined in the method `metric_info()`, are calculated
- against the true labels of the datasets. The results for each metric for both datasets are then collated and
- returned in a summarized table form listing each metric along with its corresponding train and test values.
-
- **Signs of High Risk:**
- - High discrepancy between the performance metric values on the training and testing datasets. This could signify
- problems such as overfitting or underfitting.
- - Low performance metric values on the training and testing datasets. There might be a problem with the model
- itself or the chosen hyperparameters.
- - If the model's performance deteriorates consistently across different sets of metrics, this may suggest a broader
- issue with the model or the dataset.
-
- **Strengths:**
- - Tests the model's performance on both the training and testing datasets, which helps to identify issues such as
- overfitting or underfitting.
- - Allows for a broad range of performance metrics to be used, thus providing a comprehensive evaluation of the
- model's clustering capabilities.
- - Returns a summarized table, which makes it easy to compare the model's performance across different metrics and
- datasets.
-
- **Limitations:**
- - The method `metric_info()` needs to be properly overridden in a subclass for this class to be used, and the
- metrics to be used must be manually defined.
- - The performance metrics are calculated on predicted cluster labels, so the metric may not capture the model's
- performance well if the clusters are not well separated or if the model has difficulties with certain kinds of
- clusters.
- - Doesn't consider the computational and time complexity of the model. While the model may perform well in terms of
- the performance metrics, it might be time or resource-intensive. This metric does not account for such scenarios.
- - Because the comparison is binary (train and test), it might not capture scenarios where the performance changes
- drastically under different circumstances or categories within the dataset.
+ ### Purpose
+
+ The Cluster Performance test evaluates the performance of a clustering model on both the training and testing
+ datasets. It assesses how well the model defines, forms, and distinguishes clusters of data.
+
+ ### Test Mechanism
+
+ The test mechanism involves predicting the clusters of the training and testing datasets using the clustering
+ model. After prediction, performance metrics defined in the `metric_info()` method are calculated against the true
+ labels of the datasets. The results for each metric for both datasets are then collated and returned in a
+ summarized table form listing each metric along with its corresponding train and test values.
+
+ ### Signs of High Risk
+
+ - High discrepancy between the performance metric values on the training and testing datasets.
+ - Low performance metric values on both the training and testing datasets.
+ - Consistent deterioration of performance across different metrics.
+
+ ### Strengths
+
+ - Tests the model's performance on both training and testing datasets, helping to identify overfitting or
+ underfitting.
+ - Allows for the use of a broad range of performance metrics, providing a comprehensive evaluation.
+ - Returns a summarized table, making it easy to compare performance across different metrics and datasets.
+
+ ### Limitations
+
+ - The `metric_info()` method needs to be properly overridden in a subclass and metrics must be manually defined.
+ - The test may not capture the model's performance well if clusters are not well-separated or the model struggles
+ with certain clusters.
+ - Does not consider the computational and time complexity of the model.
+ - Binary comparison (train and test) might not capture performance changes under different circumstances or dataset
+ categories.
  """
 
  name = "cluster_performance_metrics"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",
  "model_performance",
  ]
 
- def cluster_performance_metrics(
- self, y_true_train, y_pred_train, y_true_test, y_pred_test, samples, metric_info
- ):
+ def cluster_performance_metrics(self, y_true_train, y_pred_train, metric_info):
  y_true_train = y_true_train.astype(y_pred_train.dtype).flatten()
- y_true_test = y_true_test.astype(y_pred_test.dtype).flatten()
  results = []
  for metric_name, metric_fcn in metric_info.items():
- for _ in samples:
- train_value = metric_fcn(list(y_true_train), y_pred_train)
- test_value = metric_fcn(list(y_true_test), y_pred_test)
- results.append(
- {
- metric_name: {
- "train": train_value,
- "test": test_value,
- }
- }
- )
+ train_value = metric_fcn(list(y_true_train), y_pred_train)
+ results.append({metric_name: train_value})
  return results
 
- def summary(self, raw_results):
- """
- Returns a summarized representation of the dataset split information
- """
- table_records = []
- for result in raw_results:
- for key, _ in result.items():
- table_records.append(
- {
- "Metric": key,
- "TRAIN": result[key]["train"],
- "TEST": result[key]["test"],
- }
- )
-
- return ResultSummary(results=[ResultTable(data=table_records)])
-
  def metric_info(self):
  raise NotImplementedError
 
  def run(self):
- y_true_train = self.inputs.datasets[0].y
- class_pred_train = self.inputs.datasets[0].y_pred(self.inputs.model)
+ y_true_train = self.inputs.dataset.y
+ class_pred_train = self.inputs.dataset.y_pred(self.inputs.model)
  y_true_train = y_true_train.astype(class_pred_train.dtype)
 
- y_true_test = self.inputs.datasets[1].y
- class_pred_test = self.inputs.datasets[1].y_pred(self.inputs.model)
- y_true_test = y_true_test.astype(class_pred_test.dtype)
-
- samples = ["train", "test"]
  results = self.cluster_performance_metrics(
  y_true_train,
  class_pred_train,
- y_true_test,
- class_pred_test,
- samples,
  self.metric_info(),
  )
  return self.cache_results(metric_value=results)
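
After this refactor, `ClusterPerformance` scores a single `dataset` input and a subclass only needs to supply `metric_info()`. A hypothetical subclass might look like the sketch below (class name, metric names, and import path are assumptions inferred from this diff, not code shipped in the package):

```python
# Hypothetical subclass sketch; names and import path are assumptions based on this diff.
from sklearn import metrics
from validmind.tests.model_validation.sklearn.ClusterPerformance import ClusterPerformance

class ExampleClusterScores(ClusterPerformance):
    name = "example_cluster_scores"
    required_inputs = ["model", "dataset"]  # single dataset, per the change above
    tasks = ["clustering"]
    tags = ["sklearn", "model_performance"]

    def metric_info(self):
        # Mapping of display name -> callable(y_true, y_pred), as consumed by
        # cluster_performance_metrics() in the hunk above.
        return {
            "Homogeneity": metrics.homogeneity_score,
            "Completeness": metrics.completeness_score,
            "V Measure": metrics.v_measure_score,
        }
```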
@@ -16,33 +16,33 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  """
  Evaluates the performance of clustering machine learning models using multiple established metrics.
 
- **Purpose:**
+ ### Purpose
 
  The `ClusterPerformanceMetrics` test is used to assess the performance and validity of clustering machine learning
  models. It evaluates homogeneity, completeness, V measure score, the Adjusted Rand Index, the Adjusted Mutual
  Information, and the Fowlkes-Mallows score of the model. These metrics provide a holistic understanding of the
  model's ability to accurately form clusters of the given dataset.
 
- **Test Mechanism:**
+ ### Test Mechanism
 
  The `ClusterPerformanceMetrics` test runs a clustering ML model over a given dataset and then calculates six
  metrics using the Scikit-learn metrics computation functions: Homogeneity Score, Completeness Score, V Measure,
  Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), and Fowlkes-Mallows Score. It then returns the result
  as a summary, presenting the metric values for both training and testing datasets.
 
- **Signs of High Risk:**
+ ### Signs of High Risk
 
- - Low Homogeneity Score: This indicates that the clusters formed contain a variety of classes, resulting in less
- pure clusters.
- - Low Completeness Score: This suggests that class instances are scattered across multiple clusters rather than
- being gathered in a single cluster.
- - Low V Measure: This would report a low overall clustering performance.
- - ARI close to 0 or Negative: This implies that clustering results are random or disagree with the true labels.
- - AMI close to 0: It means that clustering labels are random compared with the true labels.
+ - Low Homogeneity Score: Indicates that the clusters formed contain a variety of classes, resulting in less pure
+ clusters.
+ - Low Completeness Score: Suggests that class instances are scattered across multiple clusters rather than being
+ gathered in a single cluster.
+ - Low V Measure: Reports a low overall clustering performance.
+ - ARI close to 0 or Negative: Implies that clustering results are random or disagree with the true labels.
+ - AMI close to 0: Means that clustering labels are random compared with the true labels.
  - Low Fowlkes-Mallows score: Signifies less precise and poor clustering performance in terms of precision and
  recall.
 
- **Strengths:**
+ ### Strengths
 
  - Provides a comprehensive view of clustering model performance by examining multiple clustering metrics.
  - Uses established and widely accepted metrics from scikit-learn, providing reliability in the results.
@@ -50,9 +50,9 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  - Clearly defined and human-readable descriptions of each score make it easy to understand what each score
  represents.
 
- **Limitations:**
+ ### Limitations
 
- - It only applies to clustering models; not suitable for other types of machine learning models.
+ - Only applies to clustering models; not suitable for other types of machine learning models.
  - Does not test for overfitting or underfitting in the clustering model.
  - All the scores rely on ground truth labels, the absence or inaccuracy of which can lead to misleading results.
  - Does not consider aspects like computational efficiency of the model or its capability to handle high dimensional
@@ -60,7 +60,7 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  """
 
  name = "homogeneity_score"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = ["sklearn", "model_performance"]
  default_metrics = {
@@ -121,10 +121,8 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  for key, _ in result.items():
  table_records.append(
  {
- "Metric": key,
  "Description": self.default_metrics_desc[key],
- "TRAIN": result[key]["train"],
- "TEST": result[key]["test"],
+ key: result[key],
  }
  )
 
@@ -14,26 +14,32 @@ class CompletenessScore(ClusterPerformance):
  """
  Evaluates a clustering model's capacity to categorize instances from a single class into the same cluster.
 
- **Purpose:** The Completeness Score metric is used to assess the performance of clustering models. It measures the
- extent to which all the data points that are members of a given class are elements of the same cluster. The aim is
- to determine the capability of the model to categorize all instances from a single class into the same cluster.
+ ### Purpose
 
- **Test Mechanism:** This test takes three inputs, a model and its associated training and testing datasets. It
- invokes the `completeness_score` function from the sklearn library on the labels predicted by the model. High
- scores indicate that data points from the same class generally appear in the same cluster, while low scores suggest
- the opposite.
+ The Completeness Score metric is used to assess the performance of clustering models. It measures the extent to
+ which all the data points that are members of a given class are elements of the same cluster. The aim is to
+ determine the capability of the model to categorize all instances from a single class into the same cluster.
+
+ ### Test Mechanism
+
+ This test takes three inputs, a model and its associated training and testing datasets. It invokes the
+ `completeness_score` function from the sklearn library on the labels predicted by the model. High scores indicate
+ that data points from the same class generally appear in the same cluster, while low scores suggest the opposite.
+
+ ### Signs of High Risk
 
- **Signs of High Risk:**
  - Low completeness score: This suggests that the model struggles to group instances from the same class into one
  cluster, indicating poor clustering performance.
 
- **Strengths:**
+ ### Strengths
+
  - The Completeness Score provides an effective method for assessing the performance of a clustering model,
  specifically its ability to group class instances together.
  - This test metric conveniently relies on the capabilities provided by the sklearn library, ensuring consistent and
  reliable test results.
 
- **Limitations:**
+ ### Limitations
+
  - This metric only evaluates a specific aspect of clustering, meaning it may not provide a holistic or complete
  view of the model's performance.
  - It cannot assess the effectiveness of the model in differentiating between separate classes, as it is solely
@@ -43,7 +49,7 @@ class CompletenessScore(ClusterPerformance):
  """
 
  name = "homogeneity_score"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",
@@ -17,33 +17,40 @@ class ConfusionMatrix(Metric):
  Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix
  heatmap.
 
- **Purpose**: The Confusion Matrix tester is designed to assess the performance of a classification Machine Learning
- model. This performance is evaluated based on how well the model is able to correctly classify True Positives, True
- Negatives, False Positives, and False Negatives - fundamental aspects of model accuracy.
-
- **Test Mechanism**: The mechanism used involves taking the predicted results (`y_test_predict`) from the
- classification model and comparing them against the actual values (`y_test_true`). A confusion matrix is built
- using the unique labels extracted from `y_test_true`, employing scikit-learn's metrics. The matrix is then visually
- rendered with the help of Plotly's `create_annotated_heatmap` function. A heatmap is created which provides a
- two-dimensional graphical representation of the model's performance, showcasing distributions of True Positives
- (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
-
- **Signs of High Risk**: Indicators of high risk related to the model include:
+ ### Purpose
+
+ The Confusion Matrix tester is designed to assess the performance of a classification Machine Learning model. This
+ performance is evaluated based on how well the model is able to correctly classify True Positives, True Negatives,
+ False Positives, and False Negatives - fundamental aspects of model accuracy.
+
+ ### Test Mechanism
+
+ The mechanism used involves taking the predicted results (`y_test_predict`) from the classification model and
+ comparing them against the actual values (`y_test_true`). A confusion matrix is built using the unique labels
+ extracted from `y_test_true`, employing scikit-learn's metrics. The matrix is then visually rendered with the help
+ of Plotly's `create_annotated_heatmap` function. A heatmap is created which provides a two-dimensional graphical
+ representation of the model's performance, showcasing distributions of True Positives (TP), True Negatives (TN),
+ False Positives (FP), and False Negatives (FN).
+
+ ### Signs of High Risk
+
  - High numbers of False Positives (FP) and False Negatives (FN), depicting that the model is not effectively
  classifying the values.
  - Low numbers of True Positives (TP) and True Negatives (TN), implying that the model is struggling with correctly
  identifying class labels.
 
- **Strengths**: The Confusion Matrix tester brings numerous strengths:
+ ### Strengths
+
  - It provides a simplified yet comprehensive visual snapshot of the classification model's predictive performance.
  - It distinctly brings out True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives
- (FN), thus, making it easier to focus on potential areas of improvement.
+ (FN), thus making it easier to focus on potential areas of improvement.
  - The matrix is beneficial in dealing with multi-class classification problems as it can provide a simple view of
  complex model performances.
  - It aids in understanding the different types of errors that the model could potentially make, as it provides
  in-depth insights into Type-I and Type-II errors.
 
- **Limitations**: Despite its various strengths, the Confusion Matrix tester does exhibit some limitations:
+ ### Limitations
+
  - In cases of unbalanced classes, the effectiveness of the confusion matrix might be lessened. It may wrongly
  interpret the accuracy of a model that is essentially just predicting the majority class.
  - It does not provide a single unified statistic that could evaluate the overall performance of the model.
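
The mechanism this docstring describes (a scikit-learn confusion matrix rendered as an annotated Plotly heatmap) can be approximated standalone; a hedged sketch with made-up predictions, not the package's exact plotting code:

```python
import numpy as np
import plotly.figure_factory as ff
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 1, 1, 0, 1, 0])  # actual values (toy data)
y_pred = np.array([0, 1, 0, 0, 1, 1])  # model predictions (toy data)

labels = np.unique(y_true)
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Annotated heatmap of the confusion matrix, axes labeled with the class values.
fig = ff.create_annotated_heatmap(
    z=cm,
    x=[str(label) for label in labels],
    y=[str(label) for label in labels],
    colorscale="Blues",
)
fig.update_layout(xaxis_title="Predicted", yaxis_title="Actual")
fig.show()
```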