validmind 2.5.8__py3-none-any.whl → 2.5.18__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (233)
  1. validmind/__version__.py +1 -1
  2. validmind/ai/test_descriptions.py +80 -119
  3. validmind/ai/test_result_description/config.yaml +29 -0
  4. validmind/ai/test_result_description/context.py +73 -0
  5. validmind/ai/test_result_description/image_processing.py +124 -0
  6. validmind/ai/test_result_description/system.jinja +39 -0
  7. validmind/ai/test_result_description/user.jinja +25 -0
  8. validmind/api_client.py +89 -43
  9. validmind/client.py +2 -2
  10. validmind/client_config.py +11 -14
  11. validmind/datasets/credit_risk/__init__.py +1 -0
  12. validmind/datasets/credit_risk/datasets/lending_club_biased.csv.gz +0 -0
  13. validmind/datasets/credit_risk/lending_club_bias.py +142 -0
  14. validmind/datasets/regression/fred_timeseries.py +67 -138
  15. validmind/template.py +1 -0
  16. validmind/test_suites/__init__.py +0 -2
  17. validmind/test_suites/statsmodels_timeseries.py +1 -1
  18. validmind/test_suites/summarization.py +0 -1
  19. validmind/test_suites/time_series.py +0 -43
  20. validmind/tests/__types__.py +14 -15
  21. validmind/tests/data_validation/ACFandPACFPlot.py +15 -13
  22. validmind/tests/data_validation/ADF.py +31 -24
  23. validmind/tests/data_validation/AutoAR.py +9 -9
  24. validmind/tests/data_validation/AutoMA.py +23 -16
  25. validmind/tests/data_validation/AutoSeasonality.py +18 -16
  26. validmind/tests/data_validation/AutoStationarity.py +21 -16
  27. validmind/tests/data_validation/BivariateScatterPlots.py +67 -96
  28. validmind/tests/{model_validation/statsmodels → data_validation}/BoxPierce.py +34 -34
  29. validmind/tests/data_validation/ChiSquaredFeaturesTable.py +85 -124
  30. validmind/tests/data_validation/ClassImbalance.py +15 -12
  31. validmind/tests/data_validation/DFGLSArch.py +19 -13
  32. validmind/tests/data_validation/DatasetDescription.py +17 -11
  33. validmind/tests/data_validation/DatasetSplit.py +7 -5
  34. validmind/tests/data_validation/DescriptiveStatistics.py +28 -21
  35. validmind/tests/data_validation/Duplicates.py +33 -25
  36. validmind/tests/data_validation/EngleGrangerCoint.py +35 -33
  37. validmind/tests/data_validation/FeatureTargetCorrelationPlot.py +59 -71
  38. validmind/tests/data_validation/HighCardinality.py +19 -12
  39. validmind/tests/data_validation/HighPearsonCorrelation.py +27 -22
  40. validmind/tests/data_validation/IQROutliersBarPlot.py +13 -10
  41. validmind/tests/data_validation/IQROutliersTable.py +40 -36
  42. validmind/tests/data_validation/IsolationForestOutliers.py +21 -14
  43. validmind/tests/data_validation/JarqueBera.py +70 -0
  44. validmind/tests/data_validation/KPSS.py +34 -29
  45. validmind/tests/data_validation/LJungBox.py +66 -0
  46. validmind/tests/data_validation/LaggedCorrelationHeatmap.py +22 -15
  47. validmind/tests/data_validation/MissingValues.py +32 -27
  48. validmind/tests/data_validation/MissingValuesBarPlot.py +25 -21
  49. validmind/tests/data_validation/PearsonCorrelationMatrix.py +71 -84
  50. validmind/tests/data_validation/PhillipsPerronArch.py +37 -30
  51. validmind/tests/data_validation/ProtectedClassesCombination.py +197 -0
  52. validmind/tests/data_validation/ProtectedClassesDescription.py +130 -0
  53. validmind/tests/data_validation/ProtectedClassesDisparity.py +133 -0
  54. validmind/tests/data_validation/ProtectedClassesThresholdOptimizer.py +172 -0
  55. validmind/tests/data_validation/RollingStatsPlot.py +31 -23
  56. validmind/tests/data_validation/RunsTest.py +72 -0
  57. validmind/tests/data_validation/ScatterPlot.py +63 -78
  58. validmind/tests/data_validation/SeasonalDecompose.py +38 -34
  59. validmind/tests/{model_validation/statsmodels → data_validation}/ShapiroWilk.py +35 -30
  60. validmind/tests/data_validation/Skewness.py +35 -37
  61. validmind/tests/data_validation/SpreadPlot.py +35 -35
  62. validmind/tests/data_validation/TabularCategoricalBarPlots.py +23 -17
  63. validmind/tests/data_validation/TabularDateTimeHistograms.py +21 -13
  64. validmind/tests/data_validation/TabularDescriptionTables.py +51 -16
  65. validmind/tests/data_validation/TabularNumericalHistograms.py +25 -22
  66. validmind/tests/data_validation/TargetRateBarPlots.py +21 -14
  67. validmind/tests/data_validation/TimeSeriesDescription.py +25 -18
  68. validmind/tests/data_validation/TimeSeriesDescriptiveStatistics.py +23 -17
  69. validmind/tests/data_validation/TimeSeriesFrequency.py +24 -17
  70. validmind/tests/data_validation/TimeSeriesHistogram.py +33 -32
  71. validmind/tests/data_validation/TimeSeriesLinePlot.py +17 -10
  72. validmind/tests/data_validation/TimeSeriesMissingValues.py +15 -10
  73. validmind/tests/data_validation/TimeSeriesOutliers.py +37 -33
  74. validmind/tests/data_validation/TooManyZeroValues.py +16 -11
  75. validmind/tests/data_validation/UniqueRows.py +11 -6
  76. validmind/tests/data_validation/WOEBinPlots.py +23 -16
  77. validmind/tests/data_validation/WOEBinTable.py +35 -30
  78. validmind/tests/data_validation/ZivotAndrewsArch.py +34 -28
  79. validmind/tests/data_validation/nlp/CommonWords.py +21 -14
  80. validmind/tests/data_validation/nlp/Hashtags.py +42 -40
  81. validmind/tests/data_validation/nlp/LanguageDetection.py +33 -14
  82. validmind/tests/data_validation/nlp/Mentions.py +21 -15
  83. validmind/tests/data_validation/nlp/PolarityAndSubjectivity.py +32 -9
  84. validmind/tests/data_validation/nlp/Punctuations.py +24 -20
  85. validmind/tests/data_validation/nlp/Sentiment.py +27 -8
  86. validmind/tests/data_validation/nlp/StopWords.py +26 -19
  87. validmind/tests/data_validation/nlp/TextDescription.py +39 -36
  88. validmind/tests/data_validation/nlp/Toxicity.py +32 -9
  89. validmind/tests/decorator.py +81 -42
  90. validmind/tests/model_validation/BertScore.py +36 -27
  91. validmind/tests/model_validation/BleuScore.py +25 -19
  92. validmind/tests/model_validation/ClusterSizeDistribution.py +38 -34
  93. validmind/tests/model_validation/ContextualRecall.py +38 -13
  94. validmind/tests/model_validation/FeaturesAUC.py +32 -13
  95. validmind/tests/model_validation/MeteorScore.py +46 -33
  96. validmind/tests/model_validation/ModelMetadata.py +32 -64
  97. validmind/tests/model_validation/ModelPredictionResiduals.py +75 -73
  98. validmind/tests/model_validation/RegardScore.py +30 -14
  99. validmind/tests/model_validation/RegressionResidualsPlot.py +10 -5
  100. validmind/tests/model_validation/RougeScore.py +36 -30
  101. validmind/tests/model_validation/TimeSeriesPredictionWithCI.py +30 -14
  102. validmind/tests/model_validation/TimeSeriesPredictionsPlot.py +27 -30
  103. validmind/tests/model_validation/TimeSeriesR2SquareBySegments.py +68 -63
  104. validmind/tests/model_validation/TokenDisparity.py +31 -23
  105. validmind/tests/model_validation/ToxicityScore.py +26 -17
  106. validmind/tests/model_validation/embeddings/ClusterDistribution.py +24 -20
  107. validmind/tests/model_validation/embeddings/CosineSimilarityComparison.py +30 -27
  108. validmind/tests/model_validation/embeddings/CosineSimilarityDistribution.py +7 -5
  109. validmind/tests/model_validation/embeddings/CosineSimilarityHeatmap.py +32 -23
  110. validmind/tests/model_validation/embeddings/DescriptiveAnalytics.py +7 -5
  111. validmind/tests/model_validation/embeddings/EmbeddingsVisualization2D.py +15 -11
  112. validmind/tests/model_validation/embeddings/EuclideanDistanceComparison.py +29 -29
  113. validmind/tests/model_validation/embeddings/EuclideanDistanceHeatmap.py +34 -25
  114. validmind/tests/model_validation/embeddings/PCAComponentsPairwisePlots.py +38 -26
  115. validmind/tests/model_validation/embeddings/StabilityAnalysis.py +40 -1
  116. validmind/tests/model_validation/embeddings/StabilityAnalysisKeyword.py +18 -17
  117. validmind/tests/model_validation/embeddings/StabilityAnalysisRandomNoise.py +40 -45
  118. validmind/tests/model_validation/embeddings/StabilityAnalysisSynonyms.py +17 -19
  119. validmind/tests/model_validation/embeddings/StabilityAnalysisTranslation.py +29 -25
  120. validmind/tests/model_validation/embeddings/TSNEComponentsPairwisePlots.py +38 -28
  121. validmind/tests/model_validation/ragas/AnswerCorrectness.py +5 -4
  122. validmind/tests/model_validation/ragas/AnswerRelevance.py +5 -4
  123. validmind/tests/model_validation/ragas/AnswerSimilarity.py +5 -4
  124. validmind/tests/model_validation/ragas/AspectCritique.py +12 -6
  125. validmind/tests/model_validation/ragas/ContextEntityRecall.py +9 -8
  126. validmind/tests/model_validation/ragas/ContextPrecision.py +5 -4
  127. validmind/tests/model_validation/ragas/ContextRecall.py +5 -4
  128. validmind/tests/model_validation/ragas/ContextUtilization.py +155 -0
  129. validmind/tests/model_validation/ragas/Faithfulness.py +5 -4
  130. validmind/tests/model_validation/ragas/NoiseSensitivity.py +152 -0
  131. validmind/tests/model_validation/ragas/utils.py +6 -0
  132. validmind/tests/model_validation/sklearn/AdjustedMutualInformation.py +19 -12
  133. validmind/tests/model_validation/sklearn/AdjustedRandIndex.py +22 -17
  134. validmind/tests/model_validation/sklearn/ClassifierPerformance.py +27 -25
  135. validmind/tests/model_validation/sklearn/ClusterCosineSimilarity.py +7 -5
  136. validmind/tests/model_validation/sklearn/ClusterPerformance.py +40 -78
  137. validmind/tests/model_validation/sklearn/ClusterPerformanceMetrics.py +15 -17
  138. validmind/tests/model_validation/sklearn/CompletenessScore.py +17 -11
  139. validmind/tests/model_validation/sklearn/ConfusionMatrix.py +22 -15
  140. validmind/tests/model_validation/sklearn/FeatureImportance.py +95 -0
  141. validmind/tests/model_validation/sklearn/FowlkesMallowsScore.py +7 -7
  142. validmind/tests/model_validation/sklearn/HomogeneityScore.py +19 -12
  143. validmind/tests/model_validation/sklearn/HyperParametersTuning.py +35 -30
  144. validmind/tests/model_validation/sklearn/KMeansClustersOptimization.py +10 -5
  145. validmind/tests/model_validation/sklearn/MinimumAccuracy.py +32 -32
  146. validmind/tests/model_validation/sklearn/MinimumF1Score.py +23 -23
  147. validmind/tests/model_validation/sklearn/MinimumROCAUCScore.py +15 -10
  148. validmind/tests/model_validation/sklearn/ModelsPerformanceComparison.py +26 -19
  149. validmind/tests/model_validation/sklearn/OverfitDiagnosis.py +38 -18
  150. validmind/tests/model_validation/sklearn/PermutationFeatureImportance.py +32 -26
  151. validmind/tests/model_validation/sklearn/PopulationStabilityIndex.py +8 -6
  152. validmind/tests/model_validation/sklearn/PrecisionRecallCurve.py +24 -17
  153. validmind/tests/model_validation/sklearn/ROCCurve.py +12 -7
  154. validmind/tests/model_validation/sklearn/RegressionErrors.py +74 -130
  155. validmind/tests/model_validation/sklearn/RegressionErrorsComparison.py +27 -12
  156. validmind/tests/model_validation/sklearn/{RegressionModelsPerformanceComparison.py → RegressionPerformance.py} +18 -20
  157. validmind/tests/model_validation/sklearn/RegressionR2Square.py +55 -94
  158. validmind/tests/model_validation/sklearn/RegressionR2SquareComparison.py +32 -13
  159. validmind/tests/model_validation/sklearn/RobustnessDiagnosis.py +36 -32
  160. validmind/tests/model_validation/sklearn/SHAPGlobalImportance.py +66 -5
  161. validmind/tests/model_validation/sklearn/SilhouettePlot.py +27 -19
  162. validmind/tests/model_validation/sklearn/TrainingTestDegradation.py +25 -18
  163. validmind/tests/model_validation/sklearn/VMeasure.py +14 -13
  164. validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py +7 -5
  165. validmind/tests/model_validation/statsmodels/AutoARIMA.py +24 -18
  166. validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py +73 -104
  167. validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py +59 -32
  168. validmind/tests/model_validation/statsmodels/GINITable.py +44 -77
  169. validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py +33 -34
  170. validmind/tests/model_validation/statsmodels/Lilliefors.py +27 -24
  171. validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py +86 -119
  172. validmind/tests/model_validation/statsmodels/RegressionCoeffs.py +100 -0
  173. validmind/tests/model_validation/statsmodels/RegressionFeatureSignificance.py +14 -9
  174. validmind/tests/model_validation/statsmodels/RegressionModelForecastPlot.py +17 -13
  175. validmind/tests/model_validation/statsmodels/RegressionModelForecastPlotLevels.py +46 -43
  176. validmind/tests/model_validation/statsmodels/RegressionModelSensitivityPlot.py +38 -36
  177. validmind/tests/model_validation/statsmodels/RegressionModelSummary.py +30 -28
  178. validmind/tests/model_validation/statsmodels/RegressionPermutationFeatureImportance.py +18 -11
  179. validmind/tests/model_validation/statsmodels/ScorecardHistogram.py +75 -107
  180. validmind/tests/ongoing_monitoring/FeatureDrift.py +10 -6
  181. validmind/tests/ongoing_monitoring/PredictionAcrossEachFeature.py +31 -25
  182. validmind/tests/ongoing_monitoring/PredictionCorrelation.py +29 -21
  183. validmind/tests/ongoing_monitoring/TargetPredictionDistributionPlot.py +31 -23
  184. validmind/tests/prompt_validation/Bias.py +14 -11
  185. validmind/tests/prompt_validation/Clarity.py +16 -14
  186. validmind/tests/prompt_validation/Conciseness.py +7 -5
  187. validmind/tests/prompt_validation/Delimitation.py +23 -22
  188. validmind/tests/prompt_validation/NegativeInstruction.py +7 -5
  189. validmind/tests/prompt_validation/Robustness.py +12 -10
  190. validmind/tests/prompt_validation/Specificity.py +13 -11
  191. validmind/tests/prompt_validation/ai_powered_test.py +6 -0
  192. validmind/tests/run.py +68 -23
  193. validmind/unit_metrics/__init__.py +81 -144
  194. validmind/unit_metrics/classification/{sklearn/Accuracy.py → Accuracy.py} +1 -1
  195. validmind/unit_metrics/classification/{sklearn/F1.py → F1.py} +1 -1
  196. validmind/unit_metrics/classification/{sklearn/Precision.py → Precision.py} +1 -1
  197. validmind/unit_metrics/classification/{sklearn/ROC_AUC.py → ROC_AUC.py} +1 -2
  198. validmind/unit_metrics/classification/{sklearn/Recall.py → Recall.py} +1 -1
  199. validmind/unit_metrics/regression/{sklearn/AdjustedRSquaredScore.py → AdjustedRSquaredScore.py} +1 -1
  200. validmind/unit_metrics/regression/GiniCoefficient.py +1 -1
  201. validmind/unit_metrics/regression/HuberLoss.py +1 -1
  202. validmind/unit_metrics/regression/KolmogorovSmirnovStatistic.py +1 -1
  203. validmind/unit_metrics/regression/{sklearn/MeanAbsoluteError.py → MeanAbsoluteError.py} +1 -1
  204. validmind/unit_metrics/regression/MeanAbsolutePercentageError.py +1 -1
  205. validmind/unit_metrics/regression/MeanBiasDeviation.py +1 -1
  206. validmind/unit_metrics/regression/{sklearn/MeanSquaredError.py → MeanSquaredError.py} +1 -1
  207. validmind/unit_metrics/regression/QuantileLoss.py +1 -1
  208. validmind/unit_metrics/regression/{sklearn/RSquaredScore.py → RSquaredScore.py} +1 -1
  209. validmind/unit_metrics/regression/{sklearn/RootMeanSquaredError.py → RootMeanSquaredError.py} +1 -1
  210. validmind/utils.py +4 -0
  211. validmind/vm_models/dataset/dataset.py +2 -0
  212. validmind/vm_models/figure.py +5 -0
  213. validmind/vm_models/test/metric.py +1 -0
  214. validmind/vm_models/test/result_wrapper.py +143 -158
  215. validmind/vm_models/test/threshold_test.py +1 -0
  216. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/METADATA +4 -3
  217. validmind-2.5.18.dist-info/RECORD +324 -0
  218. validmind/tests/data_validation/ANOVAOneWayTable.py +0 -138
  219. validmind/tests/data_validation/BivariateFeaturesBarPlots.py +0 -142
  220. validmind/tests/data_validation/BivariateHistograms.py +0 -117
  221. validmind/tests/data_validation/HeatmapFeatureCorrelations.py +0 -124
  222. validmind/tests/data_validation/MissingValuesRisk.py +0 -88
  223. validmind/tests/model_validation/ModelMetadataComparison.py +0 -59
  224. validmind/tests/model_validation/sklearn/FeatureImportanceComparison.py +0 -83
  225. validmind/tests/model_validation/statsmodels/JarqueBera.py +0 -73
  226. validmind/tests/model_validation/statsmodels/LJungBox.py +0 -66
  227. validmind/tests/model_validation/statsmodels/RegressionCoeffsPlot.py +0 -135
  228. validmind/tests/model_validation/statsmodels/RegressionModelsCoeffs.py +0 -103
  229. validmind/tests/model_validation/statsmodels/RunsTest.py +0 -71
  230. validmind-2.5.8.dist-info/RECORD +0 -318
  231. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/LICENSE +0 -0
  232. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/WHEEL +0 -0
  233. {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/entry_points.txt +0 -0
@@ -24,36 +24,38 @@ class ClassifierPerformance(Metric):
  Evaluates performance of binary or multiclass classification models using precision, recall, F1-Score, accuracy,
  and ROC AUC scores.
 
- **Purpose**: The supplied script is designed to evaluate the performance of Machine Learning classification models.
+ ### Purpose
+
+ The Classifier Performance test is designed to evaluate the performance of Machine Learning classification models.
  It accomplishes this by computing precision, recall, F1-Score, and accuracy, as well as the ROC AUC (Receiver
  operating characteristic - Area under the curve) scores, thereby providing a comprehensive analytic view of the
  models' performance. The test is adaptable, handling binary and multiclass models equally effectively.
 
- **Test Mechanism**: The script produces a report that includes precision, recall, F1-Score, and accuracy, by
- leveraging the `classification_report` from the scikit-learn's metrics module. For multiclass models, macro and
- weighted averages for these scores are also calculated. Additionally, the ROC AUC scores are calculated and
- included in the report using the script's unique `multiclass_roc_auc_score` function. The outcome of the test
- (report format) differs based on whether the model is binary or multiclass.
+ ### Test Mechanism
+
+ The test produces a report that includes precision, recall, F1-Score, and accuracy, by leveraging the
+ `classification_report` from scikit-learn's metrics module. For multiclass models, macro and weighted averages for
+ these scores are also calculated. Additionally, the ROC AUC scores are calculated and included in the report using
+ the `multiclass_roc_auc_score` function. The outcome of the test (report format) differs based on whether the model
+ is binary or multiclass.
+
+ ### Signs of High Risk
 
- **Signs of High Risk**:
  - Low values for precision, recall, F1-Score, accuracy, and ROC AUC, indicating poor performance.
- - Imbalance in precision and recall scores. Precision highlights correct positive class predictions, while recall
- indicates the accurate identification of actual positive cases. Imbalance may indicate flawed model performance.
- - A low ROC AUC score, especially scores close to 0.5 or lower, strongly suggests a failing model.
-
- **Strengths**:
- - The script is versatile, capable of assessing both binary and multiclass models.
- - It uses a variety of commonly employed performance metrics, offering a comprehensive view of a model's
- performance.
- - The use of ROC-AUC as a metric aids in determining the most optimal threshold for classification, especially
- beneficial when evaluation datasets are unbalanced.
-
- **Limitations**:
- - The test assumes correctly identified labels for binary classification models and raises an exception if the
- positive class is not labeled as "1". However, this setup may not align with all practical applications.
- - This script is specifically designed for classification models and is not suited to evaluate regression models.
- - The metrics computed may provide limited insights in cases where the test dataset does not adequately represent
- the data the model will encounter in real-world scenarios.
+ - Imbalance in precision and recall scores.
+ - A low ROC AUC score, especially scores close to 0.5 or lower, suggesting a failing model.
+
+ ### Strengths
+
+ - Versatile, capable of assessing both binary and multiclass models.
+ - Utilizes a variety of commonly employed performance metrics, offering a comprehensive view of model performance.
+ - The use of ROC-AUC as a metric is beneficial for evaluating unbalanced datasets.
+
+ ### Limitations
+
+ - Assumes correctly identified labels for binary classification models.
+ - Specifically designed for classification models and not suitable for regression models.
+ - May provide limited insights if the test dataset does not represent real-world scenarios adequately.
  """
 
  name = "classifier_performance"
@@ -132,7 +134,7 @@ class ClassifierPerformance(Metric):
  if len(np.unique(y_true)) > 2:
  y_pred = self.inputs.dataset.y_pred(self.inputs.model)
  y_true = y_true.astype(y_pred.dtype)
- roc_auc = self.multiclass_roc_auc_score(y_true, y_pred)
+ roc_auc = multiclass_roc_auc_score(y_true, y_pred)
  else:
  y_prob = self.inputs.dataset.y_prob(self.inputs.model)
  y_true = y_true.astype(y_prob.dtype).flatten()
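The hunk above swaps the bound method call `self.multiclass_roc_auc_score(...)` for a module-level `multiclass_roc_auc_score(...)` helper. The helper's body is not shown in this diff; the following is only a minimal sketch of the usual one-vs-rest approach such a function takes, assuming plain scikit-learn, not the package's actual implementation:

```python
# Hypothetical sketch of a one-vs-rest multiclass ROC AUC helper.
# Not taken from the validmind source; shown only to illustrate the idea.
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import LabelBinarizer


def multiclass_roc_auc_score(y_true, y_pred, average="macro"):
    # Binarize both label vectors into one-hot indicator matrices so that
    # roc_auc_score can treat each class as its own binary problem.
    lb = LabelBinarizer()
    lb.fit(y_true)
    return roc_auc_score(lb.transform(y_true), lb.transform(y_pred), average=average)
```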
@@ -16,19 +16,21 @@ class ClusterCosineSimilarity(Metric):
  """
  Measures the intra-cluster similarity of a clustering model using cosine similarity.
 
- **1. Purpose:**
+ ### Purpose
+
  The purpose of this metric is to measure how similar the data points within each cluster of a clustering model are.
  This is done using cosine similarity, which compares the multi-dimensional direction (but not magnitude) of data
  vectors. From a Model Risk Management perspective, this metric is used to quantitatively validate that clusters
  formed by a model have high intra-cluster similarity.
 
- **2. Test Mechanism:**
+ ### Test Mechanism
+
  This test works by first extracting the true and predicted clusters of the model's training data. Then, it computes
  the centroid (average data point) of each cluster. Next, it calculates the cosine similarity between each data
  point within a cluster and its respective centroid. Finally, it outputs the mean cosine similarity of each cluster,
  highlighting how similar, on average, data points in a cluster are to the cluster's centroid.
 
- **3. Signs of High Risk:**
+ ### Signs of High Risk
 
  - Low mean cosine similarity for one or more clusters: If the mean cosine similarity is low, the data points within
  the respective cluster have high variance in their directions. This can be indicative of poor clustering,
@@ -36,7 +38,7 @@ class ClusterCosineSimilarity(Metric):
  - High disparity between mean cosine similarity values across clusters: If there's a significant difference in mean
  cosine similarity across different clusters, this could indicate imbalance in how the model forms clusters.
 
- **4. Strengths:**
+ ### Strengths
 
  - Cosine similarity operates in a multi-dimensional space, making it effective for measuring similarity in high
  dimensional datasets, typical for many machine learning problems.
@@ -44,7 +46,7 @@ class ClusterCosineSimilarity(Metric):
  of each vector.
  - This metric is not dependent on the scale of the variables, making it equally effective on different scales.
 
- **5. Limitations:**
+ ### Limitations
 
  - Cosine similarity does not consider magnitudes (i.e. lengths) of vectors, only their direction. This means it may
  overlook instances where clusters have been adequately separated in terms of magnitude.
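The `ClusterCosineSimilarity` mechanism described above (per-cluster centroid, then mean cosine similarity of members to that centroid) can be approximated in a few lines of scikit-learn. A rough sketch, assuming `X` is a NumPy feature matrix and `labels` holds the predicted cluster ids; this is illustrative only, not the test's actual code:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def mean_intra_cluster_cosine(X, labels):
    """Mean cosine similarity of each cluster's members to the cluster centroid."""
    scores = {}
    for cluster_id in np.unique(labels):
        members = X[labels == cluster_id]
        centroid = members.mean(axis=0, keepdims=True)
        # cosine_similarity returns an (n_members, 1) matrix against the centroid
        scores[cluster_id] = float(cosine_similarity(members, centroid).mean())
    return scores
```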
@@ -4,7 +4,7 @@
 
  from dataclasses import dataclass
 
- from validmind.vm_models import Metric, ResultSummary, ResultTable
+ from validmind.vm_models import Metric
 
 
  @dataclass
@@ -13,106 +13,68 @@ class ClusterPerformance(Metric):
  Evaluates and compares a clustering model's performance on training and testing datasets using multiple defined
  metrics.
 
- **Purpose:** This metric, ClusterPerformance, evaluates the performance of a clustering model on both the training
- and testing datasets. It assesses how well the model defines, forms, and distinguishes clusters of data.
-
- **Test Mechanism:** The metric is applied by first predicting the clusters of the training and testing datasets
- using the clustering model. Next, performance metrics, defined in the method `metric_info()`, are calculated
- against the true labels of the datasets. The results for each metric for both datasets are then collated and
- returned in a summarized table form listing each metric along with its corresponding train and test values.
-
- **Signs of High Risk:**
- - High discrepancy between the performance metric values on the training and testing datasets. This could signify
- problems such as overfitting or underfitting.
- - Low performance metric values on the training and testing datasets. There might be a problem with the model
- itself or the chosen hyperparameters.
- - If the model's performance deteriorates consistently across different sets of metrics, this may suggest a broader
- issue with the model or the dataset.
-
- **Strengths:**
- - Tests the model's performance on both the training and testing datasets, which helps to identify issues such as
- overfitting or underfitting.
- - Allows for a broad range of performance metrics to be used, thus providing a comprehensive evaluation of the
- model's clustering capabilities.
- - Returns a summarized table, which makes it easy to compare the model's performance across different metrics and
- datasets.
-
- **Limitations:**
- - The method `metric_info()` needs to be properly overridden in a subclass for this class to be used, and the
- metrics to be used must be manually defined.
- - The performance metrics are calculated on predicted cluster labels, so the metric may not capture the model's
- performance well if the clusters are not well separated or if the model has difficulties with certain kinds of
- clusters.
- - Doesn't consider the computational and time complexity of the model. While the model may perform well in terms of
- the performance metrics, it might be time or resource-intensive. This metric does not account for such scenarios.
- - Because the comparison is binary (train and test), it might not capture scenarios where the performance changes
- drastically under different circumstances or categories within the dataset.
+ ### Purpose
+
+ The Cluster Performance test evaluates the performance of a clustering model on both the training and testing
+ datasets. It assesses how well the model defines, forms, and distinguishes clusters of data.
+
+ ### Test Mechanism
+
+ The test mechanism involves predicting the clusters of the training and testing datasets using the clustering
+ model. After prediction, performance metrics defined in the `metric_info()` method are calculated against the true
+ labels of the datasets. The results for each metric for both datasets are then collated and returned in a
+ summarized table form listing each metric along with its corresponding train and test values.
+
+ ### Signs of High Risk
+
+ - High discrepancy between the performance metric values on the training and testing datasets.
+ - Low performance metric values on both the training and testing datasets.
+ - Consistent deterioration of performance across different metrics.
+
+ ### Strengths
+
+ - Tests the model's performance on both training and testing datasets, helping to identify overfitting or
+ underfitting.
+ - Allows for the use of a broad range of performance metrics, providing a comprehensive evaluation.
+ - Returns a summarized table, making it easy to compare performance across different metrics and datasets.
+
+ ### Limitations
+
+ - The `metric_info()` method needs to be properly overridden in a subclass and metrics must be manually defined.
+ - The test may not capture the model's performance well if clusters are not well-separated or the model struggles
+ with certain clusters.
+ - Does not consider the computational and time complexity of the model.
+ - Binary comparison (train and test) might not capture performance changes under different circumstances or dataset
+ categories.
  """
 
  name = "cluster_performance_metrics"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",
  "model_performance",
  ]
 
- def cluster_performance_metrics(
- self, y_true_train, y_pred_train, y_true_test, y_pred_test, samples, metric_info
- ):
+ def cluster_performance_metrics(self, y_true_train, y_pred_train, metric_info):
  y_true_train = y_true_train.astype(y_pred_train.dtype).flatten()
- y_true_test = y_true_test.astype(y_pred_test.dtype).flatten()
  results = []
  for metric_name, metric_fcn in metric_info.items():
- for _ in samples:
- train_value = metric_fcn(list(y_true_train), y_pred_train)
- test_value = metric_fcn(list(y_true_test), y_pred_test)
- results.append(
- {
- metric_name: {
- "train": train_value,
- "test": test_value,
- }
- }
- )
+ train_value = metric_fcn(list(y_true_train), y_pred_train)
+ results.append({metric_name: train_value})
  return results
 
- def summary(self, raw_results):
- """
- Returns a summarized representation of the dataset split information
- """
- table_records = []
- for result in raw_results:
- for key, _ in result.items():
- table_records.append(
- {
- "Metric": key,
- "TRAIN": result[key]["train"],
- "TEST": result[key]["test"],
- }
- )
-
- return ResultSummary(results=[ResultTable(data=table_records)])
-
  def metric_info(self):
  raise NotImplementedError
 
  def run(self):
- y_true_train = self.inputs.datasets[0].y
- class_pred_train = self.inputs.datasets[0].y_pred(self.inputs.model)
+ y_true_train = self.inputs.dataset.y
+ class_pred_train = self.inputs.dataset.y_pred(self.inputs.model)
  y_true_train = y_true_train.astype(class_pred_train.dtype)
 
- y_true_test = self.inputs.datasets[1].y
- class_pred_test = self.inputs.datasets[1].y_pred(self.inputs.model)
- y_true_test = y_true_test.astype(class_pred_test.dtype)
-
- samples = ["train", "test"]
  results = self.cluster_performance_metrics(
  y_true_train,
  class_pred_train,
- y_true_test,
- class_pred_test,
- samples,
  self.metric_info(),
  )
  return self.cache_results(metric_value=results)
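The refactor above narrows `ClusterPerformance` to a single `dataset` input and keeps `metric_info()` as an abstract hook. A hypothetical subclass, sketched only from the structure visible in this hunk (the import path is assumed from the file listing above; the real subclasses such as `ClusterPerformanceMetrics` ship with the package):

```python
from sklearn import metrics

from validmind.tests.model_validation.sklearn.ClusterPerformance import (
    ClusterPerformance,  # assumed import path, based on the file listing above
)


class MyClusterPerformance(ClusterPerformance):
    name = "my_cluster_performance"
    required_inputs = ["model", "dataset"]
    tasks = ["clustering"]
    tags = ["sklearn", "model_performance"]

    def metric_info(self):
        # Map display names to callables taking (y_true, y_pred), as run() expects.
        return {"Adjusted Rand Index": metrics.adjusted_rand_score}
```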
@@ -16,33 +16,33 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  """
  Evaluates the performance of clustering machine learning models using multiple established metrics.
 
- **Purpose:**
+ ### Purpose
 
  The `ClusterPerformanceMetrics` test is used to assess the performance and validity of clustering machine learning
  models. It evaluates homogeneity, completeness, V measure score, the Adjusted Rand Index, the Adjusted Mutual
  Information, and the Fowlkes-Mallows score of the model. These metrics provide a holistic understanding of the
  model's ability to accurately form clusters of the given dataset.
 
- **Test Mechanism:**
+ ### Test Mechanism
 
  The `ClusterPerformanceMetrics` test runs a clustering ML model over a given dataset and then calculates six
  metrics using the Scikit-learn metrics computation functions: Homogeneity Score, Completeness Score, V Measure,
  Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), and Fowlkes-Mallows Score. It then returns the result
  as a summary, presenting the metric values for both training and testing datasets.
 
- **Signs of High Risk:**
+ ### Signs of High Risk
 
- - Low Homogeneity Score: This indicates that the clusters formed contain a variety of classes, resulting in less
- pure clusters.
- - Low Completeness Score: This suggests that class instances are scattered across multiple clusters rather than
- being gathered in a single cluster.
- - Low V Measure: This would report a low overall clustering performance.
- - ARI close to 0 or Negative: This implies that clustering results are random or disagree with the true labels.
- - AMI close to 0: It means that clustering labels are random compared with the true labels.
+ - Low Homogeneity Score: Indicates that the clusters formed contain a variety of classes, resulting in less pure
+ clusters.
+ - Low Completeness Score: Suggests that class instances are scattered across multiple clusters rather than being
+ gathered in a single cluster.
+ - Low V Measure: Reports a low overall clustering performance.
+ - ARI close to 0 or Negative: Implies that clustering results are random or disagree with the true labels.
+ - AMI close to 0: Means that clustering labels are random compared with the true labels.
  - Low Fowlkes-Mallows score: Signifies less precise and poor clustering performance in terms of precision and
  recall.
 
- **Strengths:**
+ ### Strengths
 
  - Provides a comprehensive view of clustering model performance by examining multiple clustering metrics.
  - Uses established and widely accepted metrics from scikit-learn, providing reliability in the results.
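The six scores named in the Test Mechanism above all come straight from `sklearn.metrics`. A compact, illustrative sketch of computing them on true versus predicted cluster labels (not the package's code, which wraps these functions in its `default_metrics` mapping):

```python
from sklearn import metrics


def clustering_scores(y_true, y_pred):
    # Each score compares predicted cluster labels against ground-truth labels.
    return {
        "Homogeneity": metrics.homogeneity_score(y_true, y_pred),
        "Completeness": metrics.completeness_score(y_true, y_pred),
        "V Measure": metrics.v_measure_score(y_true, y_pred),
        "Adjusted Rand Index": metrics.adjusted_rand_score(y_true, y_pred),
        "Adjusted Mutual Information": metrics.adjusted_mutual_info_score(y_true, y_pred),
        "Fowlkes-Mallows": metrics.fowlkes_mallows_score(y_true, y_pred),
    }
```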
@@ -50,9 +50,9 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  - Clearly defined and human-readable descriptions of each score make it easy to understand what each score
  represents.
 
- **Limitations:**
+ ### Limitations
 
- - It only applies to clustering models; not suitable for other types of machine learning models.
+ - Only applies to clustering models; not suitable for other types of machine learning models.
  - Does not test for overfitting or underfitting in the clustering model.
  - All the scores rely on ground truth labels, the absence or inaccuracy of which can lead to misleading results.
  - Does not consider aspects like computational efficiency of the model or its capability to handle high dimensional
@@ -60,7 +60,7 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  """
 
  name = "homogeneity_score"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = ["sklearn", "model_performance"]
  default_metrics = {
@@ -121,10 +121,8 @@ class ClusterPerformanceMetrics(ClusterPerformance):
  for key, _ in result.items():
  table_records.append(
  {
- "Metric": key,
  "Description": self.default_metrics_desc[key],
- "TRAIN": result[key]["train"],
- "TEST": result[key]["test"],
+ key: result[key],
  }
  )
 
@@ -14,26 +14,32 @@ class CompletenessScore(ClusterPerformance):
  """
  Evaluates a clustering model's capacity to categorize instances from a single class into the same cluster.
 
- **Purpose:** The Completeness Score metric is used to assess the performance of clustering models. It measures the
- extent to which all the data points that are members of a given class are elements of the same cluster. The aim is
- to determine the capability of the model to categorize all instances from a single class into the same cluster.
+ ### Purpose
 
- **Test Mechanism:** This test takes three inputs, a model and its associated training and testing datasets. It
- invokes the `completeness_score` function from the sklearn library on the labels predicted by the model. High
- scores indicate that data points from the same class generally appear in the same cluster, while low scores suggest
- the opposite.
+ The Completeness Score metric is used to assess the performance of clustering models. It measures the extent to
+ which all the data points that are members of a given class are elements of the same cluster. The aim is to
+ determine the capability of the model to categorize all instances from a single class into the same cluster.
+
+ ### Test Mechanism
+
+ This test takes three inputs, a model and its associated training and testing datasets. It invokes the
+ `completeness_score` function from the sklearn library on the labels predicted by the model. High scores indicate
+ that data points from the same class generally appear in the same cluster, while low scores suggest the opposite.
+
+ ### Signs of High Risk
 
- **Signs of High Risk:**
  - Low completeness score: This suggests that the model struggles to group instances from the same class into one
  cluster, indicating poor clustering performance.
 
- **Strengths:**
+ ### Strengths
+
  - The Completeness Score provides an effective method for assessing the performance of a clustering model,
  specifically its ability to group class instances together.
  - This test metric conveniently relies on the capabilities provided by the sklearn library, ensuring consistent and
  reliable test results.
 
- **Limitations:**
+ ### Limitations
+
  - This metric only evaluates a specific aspect of clustering, meaning it may not provide a holistic or complete
  view of the model's performance.
  - It cannot assess the effectiveness of the model in differentiating between separate classes, as it is solely
@@ -43,7 +49,7 @@ class CompletenessScore(ClusterPerformance):
  """
 
  name = "homogeneity_score"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",
@@ -17,33 +17,40 @@ class ConfusionMatrix(Metric):
  Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix
  heatmap.
 
- **Purpose**: The Confusion Matrix tester is designed to assess the performance of a classification Machine Learning
- model. This performance is evaluated based on how well the model is able to correctly classify True Positives, True
- Negatives, False Positives, and False Negatives - fundamental aspects of model accuracy.
-
- **Test Mechanism**: The mechanism used involves taking the predicted results (`y_test_predict`) from the
- classification model and comparing them against the actual values (`y_test_true`). A confusion matrix is built
- using the unique labels extracted from `y_test_true`, employing scikit-learn's metrics. The matrix is then visually
- rendered with the help of Plotly's `create_annotated_heatmap` function. A heatmap is created which provides a
- two-dimensional graphical representation of the model's performance, showcasing distributions of True Positives
- (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
-
- **Signs of High Risk**: Indicators of high risk related to the model include:
+ ### Purpose
+
+ The Confusion Matrix tester is designed to assess the performance of a classification Machine Learning model. This
+ performance is evaluated based on how well the model is able to correctly classify True Positives, True Negatives,
+ False Positives, and False Negatives - fundamental aspects of model accuracy.
+
+ ### Test Mechanism
+
+ The mechanism used involves taking the predicted results (`y_test_predict`) from the classification model and
+ comparing them against the actual values (`y_test_true`). A confusion matrix is built using the unique labels
+ extracted from `y_test_true`, employing scikit-learn's metrics. The matrix is then visually rendered with the help
+ of Plotly's `create_annotated_heatmap` function. A heatmap is created which provides a two-dimensional graphical
+ representation of the model's performance, showcasing distributions of True Positives (TP), True Negatives (TN),
+ False Positives (FP), and False Negatives (FN).
+
+ ### Signs of High Risk
+
  - High numbers of False Positives (FP) and False Negatives (FN), depicting that the model is not effectively
  classifying the values.
  - Low numbers of True Positives (TP) and True Negatives (TN), implying that the model is struggling with correctly
  identifying class labels.
 
- **Strengths**: The Confusion Matrix tester brings numerous strengths:
+ ### Strengths
+
  - It provides a simplified yet comprehensive visual snapshot of the classification model's predictive performance.
  - It distinctly brings out True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives
- (FN), thus, making it easier to focus on potential areas of improvement.
+ (FN), thus making it easier to focus on potential areas of improvement.
  - The matrix is beneficial in dealing with multi-class classification problems as it can provide a simple view of
  complex model performances.
  - It aids in understanding the different types of errors that the model could potentially make, as it provides
  in-depth insights into Type-I and Type-II errors.
 
- **Limitations**: Despite its various strengths, the Confusion Matrix tester does exhibit some limitations:
+ ### Limitations
+
  - In cases of unbalanced classes, the effectiveness of the confusion matrix might be lessened. It may wrongly
  interpret the accuracy of a model that is essentially just predicting the majority class.
  - It does not provide a single unified statistic that could evaluate the overall performance of the model.
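The mechanism described in the ConfusionMatrix docstring above (scikit-learn's confusion matrix rendered with Plotly's `create_annotated_heatmap`) looks roughly like the following sketch. It is illustrative only; the test's actual figure construction and styling are not shown in this hunk:

```python
import numpy as np
import plotly.figure_factory as ff
from sklearn.metrics import confusion_matrix


def confusion_matrix_heatmap(y_true, y_pred):
    labels = np.unique(y_true)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    # Annotated heatmap: predicted labels on the x-axis, true labels on the y-axis
    return ff.create_annotated_heatmap(
        z=cm,
        x=[str(label) for label in labels],
        y=[str(label) for label in labels],
        colorscale="Blues",
    )
```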
@@ -0,0 +1,95 @@
+ # Copyright © 2023-2024 ValidMind Inc. All rights reserved.
+ # See the LICENSE file in the root of this repository for details.
+ # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
+
+ import pandas as pd
+ from sklearn.inspection import permutation_importance
+
+ from validmind import tags, tasks
+
+
+ @tags("model_explainability", "sklearn")
+ @tasks("regression", "time_series_forecasting")
+ def FeatureImportance(dataset, model, num_features=3):
+ """
+ Compute feature importance scores for a given model and generate a summary table
+ with the top important features.
+
+ ### Purpose
+
+ The Feature Importance Comparison test is designed to compare the feature importance scores for different models
+ when applied to various datasets. By doing so, it aims to identify the most impactful features and assess the
+ consistency of feature importance across models.
+
+ ### Test Mechanism
+
+ This test works by iterating through each dataset-model pair and calculating permutation feature importance (PFI)
+ scores. It then generates a summary table containing the top `num_features` important features for each model. The
+ process involves:
+
+ - Extracting features and target data from each dataset.
+ - Computing PFI scores using `sklearn.inspection.permutation_importance`.
+ - Sorting and selecting the top features based on their importance scores.
+ - Compiling these features into a summary table for comparison.
+
+ ### Signs of High Risk
+
+ - Key features expected to be important are ranked low, indicating potential issues with model training or data
+ quality.
+ - High variance in feature importance scores across different models, suggesting instability in feature selection.
+
+ ### Strengths
+
+ - Provides a clear comparison of the most important features for each model.
+ - Uses permutation importance, which is a model-agnostic method and can be applied to any estimator.
+
+ ### Limitations
+
+ - Assumes that the dataset is provided as a DataFrameDataset object with `x_df` and `y_df` methods to access
+ feature and target data.
+ - Requires that `model.model` is compatible with `sklearn.inspection.permutation_importance`.
+ - The function's output is dependent on the number of features specified by `num_features`, which defaults to 3 but
+ can be adjusted.
+ """
+ results_list = []
+
+ x = dataset.x_df()
+ y = dataset.y_df()
+
+ pfi_values = permutation_importance(
+ model.model,
+ x,
+ y,
+ random_state=0,
+ n_jobs=-2,
+ )
+
+ # Create a dictionary to store PFI scores
+ pfi = {
+ column: pfi_values["importances_mean"][i] for i, column in enumerate(x.columns)
+ }
+
+ # Sort features by their importance
+ sorted_features = sorted(pfi.items(), key=lambda item: item[1], reverse=True)
+
+ # Extract the top `num_features` features
+ top_features = sorted_features[:num_features]
+
+ # Prepare the result for the current model and dataset
+ result = {}
+
+ # Dynamically add feature columns to the result
+ for i in range(num_features):
+ if i < len(top_features):
+ result[
+ f"Feature {i + 1}"
+ ] = f"[{top_features[i][0]}; {top_features[i][1]:.4f}]"
+ else:
+ result[f"Feature {i + 1}"] = None
+
+ # Append the result to the list
+ results_list.append(result)
+
+ # Convert the results list to a DataFrame
+ results_df = pd.DataFrame(results_list)
+ return results_df
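Since the new `FeatureImportance` above is a functional-style test registered via the `@tags`/`@tasks` decorators, it would typically be invoked through the library's test runner. A hedged usage sketch, assuming the standard `validmind.tests.run_test` entry point and previously initialized `vm_dataset`/`vm_model` objects (these names are placeholders, not part of the diff):

```python
import validmind as vm

# `vm_dataset` and `vm_model` are assumed to come from vm.init_dataset() /
# vm.init_model(); the test ID mirrors the new module's path in this diff.
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.FeatureImportance",
    inputs={"dataset": vm_dataset, "model": vm_model},
    params={"num_features": 5},
)
```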
@@ -15,27 +15,27 @@ class FowlkesMallowsScore(ClusterPerformance):
  Evaluates the similarity between predicted and actual cluster assignments in a model using the Fowlkes-Mallows
  score.
 
- **Purpose:**
+ ### Purpose
 
  The FowlkesMallowsScore is a performance metric used to validate clustering algorithms within machine learning
  models. The score intends to evaluate the matching grade between two clusters. It measures the similarity between
  the predicted and actual cluster assignments, thus gauging the accuracy of the model's clustering capability.
 
- **Test Mechanism:**
+ ### Test Mechanism
 
  The FowlkesMallowsScore method applies the `fowlkes_mallows_score` function from the `sklearn` library to evaluate
  the model's accuracy in clustering different types of data. The test fetches the datasets from the model's training
  and testing datasets as inputs then compares the resulting clusters against the previously known clusters to obtain
  a score. A high score indicates a better clustering performance by the model.
 
- **Signs of High Risk:**
+ ### Signs of High Risk
 
  - A low Fowlkes-Mallows score (near zero): This indicates that the model's clustering capability is poor and the
  algorithm isn't properly grouping data.
- - Inconsistently low scores across different datasets: this may indicate that the model's clustering performance is
+ - Inconsistently low scores across different datasets: This may indicate that the model's clustering performance is
  not robust and the model may fail when applied to unseen data.
 
- **Strengths:**
+ ### Strengths
 
  - The Fowlkes-Mallows score is a simple and effective method for evaluating the performance of clustering
  algorithms.
@@ -43,7 +43,7 @@ class FowlkesMallowsScore(ClusterPerformance):
  comprehensive measure of model performance.
  - The Fowlkes-Mallows score is non-biased meaning it treats False Positives and False Negatives equally.
 
- **Limitations:**
+ ### Limitations
 
  - As a pairwise-based method, this score can be computationally intensive for large datasets and can become
  unfeasible as the size of the dataset increases.
@@ -54,7 +54,7 @@ class FowlkesMallowsScore(ClusterPerformance):
  """
 
  name = "fowlkes_mallows_score"
- required_inputs = ["model", "datasets"]
+ required_inputs = ["model", "dataset"]
  tasks = ["clustering"]
  tags = [
  "sklearn",