validmind 2.5.8__py3-none-any.whl → 2.5.18__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- validmind/__version__.py +1 -1
- validmind/ai/test_descriptions.py +80 -119
- validmind/ai/test_result_description/config.yaml +29 -0
- validmind/ai/test_result_description/context.py +73 -0
- validmind/ai/test_result_description/image_processing.py +124 -0
- validmind/ai/test_result_description/system.jinja +39 -0
- validmind/ai/test_result_description/user.jinja +25 -0
- validmind/api_client.py +89 -43
- validmind/client.py +2 -2
- validmind/client_config.py +11 -14
- validmind/datasets/credit_risk/__init__.py +1 -0
- validmind/datasets/credit_risk/datasets/lending_club_biased.csv.gz +0 -0
- validmind/datasets/credit_risk/lending_club_bias.py +142 -0
- validmind/datasets/regression/fred_timeseries.py +67 -138
- validmind/template.py +1 -0
- validmind/test_suites/__init__.py +0 -2
- validmind/test_suites/statsmodels_timeseries.py +1 -1
- validmind/test_suites/summarization.py +0 -1
- validmind/test_suites/time_series.py +0 -43
- validmind/tests/__types__.py +14 -15
- validmind/tests/data_validation/ACFandPACFPlot.py +15 -13
- validmind/tests/data_validation/ADF.py +31 -24
- validmind/tests/data_validation/AutoAR.py +9 -9
- validmind/tests/data_validation/AutoMA.py +23 -16
- validmind/tests/data_validation/AutoSeasonality.py +18 -16
- validmind/tests/data_validation/AutoStationarity.py +21 -16
- validmind/tests/data_validation/BivariateScatterPlots.py +67 -96
- validmind/tests/{model_validation/statsmodels → data_validation}/BoxPierce.py +34 -34
- validmind/tests/data_validation/ChiSquaredFeaturesTable.py +85 -124
- validmind/tests/data_validation/ClassImbalance.py +15 -12
- validmind/tests/data_validation/DFGLSArch.py +19 -13
- validmind/tests/data_validation/DatasetDescription.py +17 -11
- validmind/tests/data_validation/DatasetSplit.py +7 -5
- validmind/tests/data_validation/DescriptiveStatistics.py +28 -21
- validmind/tests/data_validation/Duplicates.py +33 -25
- validmind/tests/data_validation/EngleGrangerCoint.py +35 -33
- validmind/tests/data_validation/FeatureTargetCorrelationPlot.py +59 -71
- validmind/tests/data_validation/HighCardinality.py +19 -12
- validmind/tests/data_validation/HighPearsonCorrelation.py +27 -22
- validmind/tests/data_validation/IQROutliersBarPlot.py +13 -10
- validmind/tests/data_validation/IQROutliersTable.py +40 -36
- validmind/tests/data_validation/IsolationForestOutliers.py +21 -14
- validmind/tests/data_validation/JarqueBera.py +70 -0
- validmind/tests/data_validation/KPSS.py +34 -29
- validmind/tests/data_validation/LJungBox.py +66 -0
- validmind/tests/data_validation/LaggedCorrelationHeatmap.py +22 -15
- validmind/tests/data_validation/MissingValues.py +32 -27
- validmind/tests/data_validation/MissingValuesBarPlot.py +25 -21
- validmind/tests/data_validation/PearsonCorrelationMatrix.py +71 -84
- validmind/tests/data_validation/PhillipsPerronArch.py +37 -30
- validmind/tests/data_validation/ProtectedClassesCombination.py +197 -0
- validmind/tests/data_validation/ProtectedClassesDescription.py +130 -0
- validmind/tests/data_validation/ProtectedClassesDisparity.py +133 -0
- validmind/tests/data_validation/ProtectedClassesThresholdOptimizer.py +172 -0
- validmind/tests/data_validation/RollingStatsPlot.py +31 -23
- validmind/tests/data_validation/RunsTest.py +72 -0
- validmind/tests/data_validation/ScatterPlot.py +63 -78
- validmind/tests/data_validation/SeasonalDecompose.py +38 -34
- validmind/tests/{model_validation/statsmodels → data_validation}/ShapiroWilk.py +35 -30
- validmind/tests/data_validation/Skewness.py +35 -37
- validmind/tests/data_validation/SpreadPlot.py +35 -35
- validmind/tests/data_validation/TabularCategoricalBarPlots.py +23 -17
- validmind/tests/data_validation/TabularDateTimeHistograms.py +21 -13
- validmind/tests/data_validation/TabularDescriptionTables.py +51 -16
- validmind/tests/data_validation/TabularNumericalHistograms.py +25 -22
- validmind/tests/data_validation/TargetRateBarPlots.py +21 -14
- validmind/tests/data_validation/TimeSeriesDescription.py +25 -18
- validmind/tests/data_validation/TimeSeriesDescriptiveStatistics.py +23 -17
- validmind/tests/data_validation/TimeSeriesFrequency.py +24 -17
- validmind/tests/data_validation/TimeSeriesHistogram.py +33 -32
- validmind/tests/data_validation/TimeSeriesLinePlot.py +17 -10
- validmind/tests/data_validation/TimeSeriesMissingValues.py +15 -10
- validmind/tests/data_validation/TimeSeriesOutliers.py +37 -33
- validmind/tests/data_validation/TooManyZeroValues.py +16 -11
- validmind/tests/data_validation/UniqueRows.py +11 -6
- validmind/tests/data_validation/WOEBinPlots.py +23 -16
- validmind/tests/data_validation/WOEBinTable.py +35 -30
- validmind/tests/data_validation/ZivotAndrewsArch.py +34 -28
- validmind/tests/data_validation/nlp/CommonWords.py +21 -14
- validmind/tests/data_validation/nlp/Hashtags.py +42 -40
- validmind/tests/data_validation/nlp/LanguageDetection.py +33 -14
- validmind/tests/data_validation/nlp/Mentions.py +21 -15
- validmind/tests/data_validation/nlp/PolarityAndSubjectivity.py +32 -9
- validmind/tests/data_validation/nlp/Punctuations.py +24 -20
- validmind/tests/data_validation/nlp/Sentiment.py +27 -8
- validmind/tests/data_validation/nlp/StopWords.py +26 -19
- validmind/tests/data_validation/nlp/TextDescription.py +39 -36
- validmind/tests/data_validation/nlp/Toxicity.py +32 -9
- validmind/tests/decorator.py +81 -42
- validmind/tests/model_validation/BertScore.py +36 -27
- validmind/tests/model_validation/BleuScore.py +25 -19
- validmind/tests/model_validation/ClusterSizeDistribution.py +38 -34
- validmind/tests/model_validation/ContextualRecall.py +38 -13
- validmind/tests/model_validation/FeaturesAUC.py +32 -13
- validmind/tests/model_validation/MeteorScore.py +46 -33
- validmind/tests/model_validation/ModelMetadata.py +32 -64
- validmind/tests/model_validation/ModelPredictionResiduals.py +75 -73
- validmind/tests/model_validation/RegardScore.py +30 -14
- validmind/tests/model_validation/RegressionResidualsPlot.py +10 -5
- validmind/tests/model_validation/RougeScore.py +36 -30
- validmind/tests/model_validation/TimeSeriesPredictionWithCI.py +30 -14
- validmind/tests/model_validation/TimeSeriesPredictionsPlot.py +27 -30
- validmind/tests/model_validation/TimeSeriesR2SquareBySegments.py +68 -63
- validmind/tests/model_validation/TokenDisparity.py +31 -23
- validmind/tests/model_validation/ToxicityScore.py +26 -17
- validmind/tests/model_validation/embeddings/ClusterDistribution.py +24 -20
- validmind/tests/model_validation/embeddings/CosineSimilarityComparison.py +30 -27
- validmind/tests/model_validation/embeddings/CosineSimilarityDistribution.py +7 -5
- validmind/tests/model_validation/embeddings/CosineSimilarityHeatmap.py +32 -23
- validmind/tests/model_validation/embeddings/DescriptiveAnalytics.py +7 -5
- validmind/tests/model_validation/embeddings/EmbeddingsVisualization2D.py +15 -11
- validmind/tests/model_validation/embeddings/EuclideanDistanceComparison.py +29 -29
- validmind/tests/model_validation/embeddings/EuclideanDistanceHeatmap.py +34 -25
- validmind/tests/model_validation/embeddings/PCAComponentsPairwisePlots.py +38 -26
- validmind/tests/model_validation/embeddings/StabilityAnalysis.py +40 -1
- validmind/tests/model_validation/embeddings/StabilityAnalysisKeyword.py +18 -17
- validmind/tests/model_validation/embeddings/StabilityAnalysisRandomNoise.py +40 -45
- validmind/tests/model_validation/embeddings/StabilityAnalysisSynonyms.py +17 -19
- validmind/tests/model_validation/embeddings/StabilityAnalysisTranslation.py +29 -25
- validmind/tests/model_validation/embeddings/TSNEComponentsPairwisePlots.py +38 -28
- validmind/tests/model_validation/ragas/AnswerCorrectness.py +5 -4
- validmind/tests/model_validation/ragas/AnswerRelevance.py +5 -4
- validmind/tests/model_validation/ragas/AnswerSimilarity.py +5 -4
- validmind/tests/model_validation/ragas/AspectCritique.py +12 -6
- validmind/tests/model_validation/ragas/ContextEntityRecall.py +9 -8
- validmind/tests/model_validation/ragas/ContextPrecision.py +5 -4
- validmind/tests/model_validation/ragas/ContextRecall.py +5 -4
- validmind/tests/model_validation/ragas/ContextUtilization.py +155 -0
- validmind/tests/model_validation/ragas/Faithfulness.py +5 -4
- validmind/tests/model_validation/ragas/NoiseSensitivity.py +152 -0
- validmind/tests/model_validation/ragas/utils.py +6 -0
- validmind/tests/model_validation/sklearn/AdjustedMutualInformation.py +19 -12
- validmind/tests/model_validation/sklearn/AdjustedRandIndex.py +22 -17
- validmind/tests/model_validation/sklearn/ClassifierPerformance.py +27 -25
- validmind/tests/model_validation/sklearn/ClusterCosineSimilarity.py +7 -5
- validmind/tests/model_validation/sklearn/ClusterPerformance.py +40 -78
- validmind/tests/model_validation/sklearn/ClusterPerformanceMetrics.py +15 -17
- validmind/tests/model_validation/sklearn/CompletenessScore.py +17 -11
- validmind/tests/model_validation/sklearn/ConfusionMatrix.py +22 -15
- validmind/tests/model_validation/sklearn/FeatureImportance.py +95 -0
- validmind/tests/model_validation/sklearn/FowlkesMallowsScore.py +7 -7
- validmind/tests/model_validation/sklearn/HomogeneityScore.py +19 -12
- validmind/tests/model_validation/sklearn/HyperParametersTuning.py +35 -30
- validmind/tests/model_validation/sklearn/KMeansClustersOptimization.py +10 -5
- validmind/tests/model_validation/sklearn/MinimumAccuracy.py +32 -32
- validmind/tests/model_validation/sklearn/MinimumF1Score.py +23 -23
- validmind/tests/model_validation/sklearn/MinimumROCAUCScore.py +15 -10
- validmind/tests/model_validation/sklearn/ModelsPerformanceComparison.py +26 -19
- validmind/tests/model_validation/sklearn/OverfitDiagnosis.py +38 -18
- validmind/tests/model_validation/sklearn/PermutationFeatureImportance.py +32 -26
- validmind/tests/model_validation/sklearn/PopulationStabilityIndex.py +8 -6
- validmind/tests/model_validation/sklearn/PrecisionRecallCurve.py +24 -17
- validmind/tests/model_validation/sklearn/ROCCurve.py +12 -7
- validmind/tests/model_validation/sklearn/RegressionErrors.py +74 -130
- validmind/tests/model_validation/sklearn/RegressionErrorsComparison.py +27 -12
- validmind/tests/model_validation/sklearn/{RegressionModelsPerformanceComparison.py → RegressionPerformance.py} +18 -20
- validmind/tests/model_validation/sklearn/RegressionR2Square.py +55 -94
- validmind/tests/model_validation/sklearn/RegressionR2SquareComparison.py +32 -13
- validmind/tests/model_validation/sklearn/RobustnessDiagnosis.py +36 -32
- validmind/tests/model_validation/sklearn/SHAPGlobalImportance.py +66 -5
- validmind/tests/model_validation/sklearn/SilhouettePlot.py +27 -19
- validmind/tests/model_validation/sklearn/TrainingTestDegradation.py +25 -18
- validmind/tests/model_validation/sklearn/VMeasure.py +14 -13
- validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py +7 -5
- validmind/tests/model_validation/statsmodels/AutoARIMA.py +24 -18
- validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py +73 -104
- validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py +59 -32
- validmind/tests/model_validation/statsmodels/GINITable.py +44 -77
- validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py +33 -34
- validmind/tests/model_validation/statsmodels/Lilliefors.py +27 -24
- validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py +86 -119
- validmind/tests/model_validation/statsmodels/RegressionCoeffs.py +100 -0
- validmind/tests/model_validation/statsmodels/RegressionFeatureSignificance.py +14 -9
- validmind/tests/model_validation/statsmodels/RegressionModelForecastPlot.py +17 -13
- validmind/tests/model_validation/statsmodels/RegressionModelForecastPlotLevels.py +46 -43
- validmind/tests/model_validation/statsmodels/RegressionModelSensitivityPlot.py +38 -36
- validmind/tests/model_validation/statsmodels/RegressionModelSummary.py +30 -28
- validmind/tests/model_validation/statsmodels/RegressionPermutationFeatureImportance.py +18 -11
- validmind/tests/model_validation/statsmodels/ScorecardHistogram.py +75 -107
- validmind/tests/ongoing_monitoring/FeatureDrift.py +10 -6
- validmind/tests/ongoing_monitoring/PredictionAcrossEachFeature.py +31 -25
- validmind/tests/ongoing_monitoring/PredictionCorrelation.py +29 -21
- validmind/tests/ongoing_monitoring/TargetPredictionDistributionPlot.py +31 -23
- validmind/tests/prompt_validation/Bias.py +14 -11
- validmind/tests/prompt_validation/Clarity.py +16 -14
- validmind/tests/prompt_validation/Conciseness.py +7 -5
- validmind/tests/prompt_validation/Delimitation.py +23 -22
- validmind/tests/prompt_validation/NegativeInstruction.py +7 -5
- validmind/tests/prompt_validation/Robustness.py +12 -10
- validmind/tests/prompt_validation/Specificity.py +13 -11
- validmind/tests/prompt_validation/ai_powered_test.py +6 -0
- validmind/tests/run.py +68 -23
- validmind/unit_metrics/__init__.py +81 -144
- validmind/unit_metrics/classification/{sklearn/Accuracy.py → Accuracy.py} +1 -1
- validmind/unit_metrics/classification/{sklearn/F1.py → F1.py} +1 -1
- validmind/unit_metrics/classification/{sklearn/Precision.py → Precision.py} +1 -1
- validmind/unit_metrics/classification/{sklearn/ROC_AUC.py → ROC_AUC.py} +1 -2
- validmind/unit_metrics/classification/{sklearn/Recall.py → Recall.py} +1 -1
- validmind/unit_metrics/regression/{sklearn/AdjustedRSquaredScore.py → AdjustedRSquaredScore.py} +1 -1
- validmind/unit_metrics/regression/GiniCoefficient.py +1 -1
- validmind/unit_metrics/regression/HuberLoss.py +1 -1
- validmind/unit_metrics/regression/KolmogorovSmirnovStatistic.py +1 -1
- validmind/unit_metrics/regression/{sklearn/MeanAbsoluteError.py → MeanAbsoluteError.py} +1 -1
- validmind/unit_metrics/regression/MeanAbsolutePercentageError.py +1 -1
- validmind/unit_metrics/regression/MeanBiasDeviation.py +1 -1
- validmind/unit_metrics/regression/{sklearn/MeanSquaredError.py → MeanSquaredError.py} +1 -1
- validmind/unit_metrics/regression/QuantileLoss.py +1 -1
- validmind/unit_metrics/regression/{sklearn/RSquaredScore.py → RSquaredScore.py} +1 -1
- validmind/unit_metrics/regression/{sklearn/RootMeanSquaredError.py → RootMeanSquaredError.py} +1 -1
- validmind/utils.py +4 -0
- validmind/vm_models/dataset/dataset.py +2 -0
- validmind/vm_models/figure.py +5 -0
- validmind/vm_models/test/metric.py +1 -0
- validmind/vm_models/test/result_wrapper.py +143 -158
- validmind/vm_models/test/threshold_test.py +1 -0
- {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/METADATA +4 -3
- validmind-2.5.18.dist-info/RECORD +324 -0
- validmind/tests/data_validation/ANOVAOneWayTable.py +0 -138
- validmind/tests/data_validation/BivariateFeaturesBarPlots.py +0 -142
- validmind/tests/data_validation/BivariateHistograms.py +0 -117
- validmind/tests/data_validation/HeatmapFeatureCorrelations.py +0 -124
- validmind/tests/data_validation/MissingValuesRisk.py +0 -88
- validmind/tests/model_validation/ModelMetadataComparison.py +0 -59
- validmind/tests/model_validation/sklearn/FeatureImportanceComparison.py +0 -83
- validmind/tests/model_validation/statsmodels/JarqueBera.py +0 -73
- validmind/tests/model_validation/statsmodels/LJungBox.py +0 -66
- validmind/tests/model_validation/statsmodels/RegressionCoeffsPlot.py +0 -135
- validmind/tests/model_validation/statsmodels/RegressionModelsCoeffs.py +0 -103
- validmind/tests/model_validation/statsmodels/RunsTest.py +0 -71
- validmind-2.5.8.dist-info/RECORD +0 -318
- {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/LICENSE +0 -0
- {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/WHEEL +0 -0
- {validmind-2.5.8.dist-info → validmind-2.5.18.dist-info}/entry_points.txt +0 -0
validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py
@@ -13,40 +13,39 @@ from validmind.vm_models import Metric, ResultSummary, ResultTable, ResultTableM
 @dataclass
 class KolmogorovSmirnov(Metric):
     """
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    -
-
-
-
-
-
-    -
-
-    -
-    kurtosis, that could directly impact model fitting.
+    Assesses whether each feature in the dataset aligns with a normal distribution using the Kolmogorov-Smirnov test.
+
+    ### Purpose
+
+    The Kolmogorov-Smirnov (KS) test evaluates the distribution of features in a dataset to determine their alignment
+    with a normal distribution. This is important because many statistical methods and machine learning models assume
+    normality in the data distribution.
+
+    ### Test Mechanism
+
+    This test calculates the KS statistic and corresponding p-value for each feature in the dataset. It does so by
+    comparing the cumulative distribution function of the feature with an ideal normal distribution. The KS statistic
+    and p-value for each feature are then stored in a dictionary. The p-value threshold to reject the normal
+    distribution hypothesis is not preset, providing flexibility for different applications.
+
+    ### Signs of High Risk
+
+    - Elevated KS statistic for a feature combined with a low p-value, indicating a significant divergence from a
+    normal distribution.
+    - Features with notable deviations that could create problems if the model assumes normality in data distribution.
+
+    ### Strengths
+
+    - The KS test is sensitive to differences in the location and shape of empirical cumulative distribution functions.
+    - It is non-parametric and adaptable to various datasets, as it does not assume any specific data distribution.
+    - Provides detailed insights into the distribution of individual features.
+
+    ### Limitations
+
+    - The test's sensitivity to disparities in the tails of data distribution might cause false alarms about
+    non-normality.
+    - Less effective for multivariate distributions, as it is designed for univariate distributions.
+    - Does not identify specific types of non-normality, such as skewness or kurtosis, which could impact model fitting.
     """
 
     name = "kolmogorov_smirnov"
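The new KolmogorovSmirnov docstring describes a per-feature loop that compares each feature with a normal distribution and collects the KS statistic and p-value in a dictionary. The following is a minimal sketch of that idea, not validmind's implementation. It assumes SciPy's `scipy.stats.kstest` and standardizes each column with its own sample mean and standard deviation. The function name `ks_normality_by_feature` and the `stat`/`pvalue` keys are hypothetical. Because the mean and standard deviation are estimated from the same data, p-values taken from the standard KS distribution are conservative, and the Lilliefors test exists to correct exactly that.

```python
import numpy as np
import pandas as pd
from scipy import stats


def ks_normality_by_feature(df: pd.DataFrame) -> dict:
    """One-sample KS test of every numeric column against a normal distribution.

    Each column is standardized with its own sample mean and standard deviation
    and then compared with the standard normal CDF.
    """
    results = {}
    for col in df.select_dtypes(include="number").columns:
        x = df[col].dropna().to_numpy(dtype=float)
        z = (x - x.mean()) / x.std(ddof=1)
        res = stats.kstest(z, "norm")
        # One entry per feature: KS statistic and p-value
        results[col] = {"stat": float(res.statistic), "pvalue": float(res.pvalue)}
    return results


# Usage: one roughly normal feature and one right-skewed feature
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    {"normal": rng.normal(size=500), "skewed": rng.exponential(size=500)}
)
ks_results = ks_normality_by_feature(frame)
print(ks_results)
```

The function applies no significance cutoff. This mirrors the docstring's point that the rejection threshold is not preset and is left to the caller.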
validmind/tests/model_validation/statsmodels/Lilliefors.py
@@ -14,44 +14,47 @@ class Lilliefors(Metric):
     """
     Assesses the normality of feature distributions in an ML model's training dataset using the Lilliefors test.
 
-
-
-
-
-
-
-
-
-
-
-
-
-    statistic and p-value.
-
-
+    ### Purpose
+
+    The purpose of this metric is to utilize the Lilliefors test, named in honor of the Swedish statistician Hubert
+    Lilliefors, in order to assess whether the features of the machine learning model's training dataset conform to a
+    normal distribution. This is done because the assumption of normal distribution plays a vital role in numerous
+    statistical procedures as well as numerous machine learning models. Should the features fail to follow a normal
+    distribution, some model types may not operate at optimal efficiency. This can potentially lead to inaccurate
+    predictions.
+
+    ### Test Mechanism
+
+    The application of this test happens across all feature columns within the training dataset. For each feature, the
+    Lilliefors test returns a test statistic and p-value. The test statistic quantifies how far the feature's
+    distribution is from an ideal normal distribution, whereas the p-value aids in determining the statistical
+    relevance of this deviation. The final results are stored within a dictionary, the keys of which correspond to the
+    name of the feature column, and the values being another dictionary which houses the test statistic and p-value.
+
+    ### Signs of High Risk
 
     - If the p-value corresponding to a specific feature sinks below a pre-established significance level, generally
     set at 0.05, then it can be deduced that the distribution of that feature significantly deviates from a normal
     distribution. This can present a high risk for models that assume normality, as these models may perform
     inaccurately or inefficiently in the presence of such a feature.
 
-
+    ### Strengths
 
     - One advantage of the Lilliefors test is its utility irrespective of whether the mean and variance of the normal
     distribution are known in advance. This makes it a more robust option in real-world situations where these values
     might not be known.
-    -
+    - The test has the ability to screen every feature column, offering a holistic view of the dataset.
 
-
+    ### Limitations
 
     - Despite the practical applications of the Lilliefors test in validating normality, it does come with some
     limitations.
-    -
-
-    -
-
-    -
-
+    - It is only capable of testing unidimensional data, thus rendering it ineffective for datasets with interactions
+    between features or multi-dimensional phenomena.
+    - The test might not be as sensitive as some other tests (like the Anderson-Darling test) in detecting deviations
+    from a normal distribution.
+    - Like any other statistical test, Lilliefors test may also produce false positives or negatives. Hence, banking
+    solely on this test, without considering other characteristics of the data, may give rise to risks.
     """
 
     name = "lilliefors_test"
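The Lilliefors docstring describes the same KS-style distance, but against a normal whose mean and variance are estimated from each feature. Results are stored in a dictionary keyed by column name, and each value holds the statistic and p-value. The sketch below follows those assumptions and obtains the p-value by Monte Carlo simulation instead of published tables. All function names are hypothetical, and this is not validmind's implementation. statsmodels ships a table-based version as `statsmodels.stats.diagnostic.lilliefors`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def lilliefors_stat(x) -> float:
    """Max distance between the empirical CDF and a normal CDF whose mean and
    standard deviation are estimated from the same sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    cdf = stats.norm.cdf((x - x.mean()) / x.std(ddof=1))
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)  # ECDF above the fitted CDF
    d_minus = np.max(cdf - np.arange(0, n) / n)  # ECDF below the fitted CDF
    return float(max(d_plus, d_minus))


def lilliefors_test(x, n_sim: int = 2000, seed: int = 0) -> dict:
    """Monte Carlo Lilliefors test.

    Under normality the statistic does not depend on the true mean or variance,
    so standard-normal samples of the same size (with parameters re-estimated
    on each one) give its null distribution.
    """
    x = np.asarray(x, dtype=float)
    stat = lilliefors_stat(x)
    rng = np.random.default_rng(seed)
    null = np.array([lilliefors_stat(rng.normal(size=x.size)) for _ in range(n_sim)])
    pvalue = (1 + np.sum(null >= stat)) / (n_sim + 1)  # add-one keeps p > 0
    return {"stat": stat, "pvalue": float(pvalue)}


def lilliefors_by_feature(df: pd.DataFrame) -> dict:
    """Results keyed by column name; each value holds the statistic and p-value."""
    return {
        col: lilliefors_test(df[col].dropna())
        for col in df.select_dtypes(include="number").columns
    }


# Usage: the "normal" column deliberately has non-zero mean and non-unit variance
rng = np.random.default_rng(1)
frame = pd.DataFrame(
    {"normal": rng.normal(5.0, 2.0, size=200), "skewed": rng.exponential(size=200)}
)
lf_results = lilliefors_by_feature(frame)
print(lf_results)
```

Raising `n_sim` reduces the Monte Carlo error in the p-value but increases runtime. The smallest p-value this scheme can report is `1 / (n_sim + 1)`.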
validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py
@@ -2,134 +2,101 @@
 # See the LICENSE file in the root of this repository for details.
 # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
 
-from dataclasses import dataclass
 
 import plotly.graph_objects as go
 from matplotlib import cm
 
-from validmind
+from validmind import tags, tasks
 
 
-@
-
+@tags("visualization", "credit_risk", "logistic_regression")
+@tasks("classification")
+def PredictionProbabilitiesHistogram(
+    dataset, model, title="Histogram of Predictive Probabilities"
+):
     """
-
-
-
-
-
-
-
-
-
-
-
-
-
-    -
-    -
-    for training and
-    -
-
-
-
-
+    Assesses the predictive probability distribution for binary classification to evaluate model performance and
+    potential overfitting or bias.
+
+    ### Purpose
+
+    The Prediction Probabilities Histogram test is designed to generate histograms displaying the Probability of
+    Default (PD) predictions for both positive and negative classes in training and testing datasets. This helps in
+    evaluating the performance of a logistic regression model, particularly for credit risk prediction.
+
+    ### Test Mechanism
+
+    The metric follows these steps to execute the test:
+    - Extracts the target column from both the train and test datasets.
+    - Uses the model's predict function to calculate probabilities.
+    - Adds these probabilities as a new column to the training and testing dataframes.
+    - Generates histograms for each class (0 or 1) within the training and testing datasets.
+    - Sets different opacities for the histograms to enhance visualization.
+    - Overlays the four histograms (two for training and two for testing) on two different subplot frames.
+    - Returns a plotly graph object displaying the visualization.
+
+    ### Signs of High Risk
+
+    - Significant discrepancies between the histograms of training and testing data.
     - Large disparities between the histograms for the positive and negative classes.
-    -
-    - Unevenly distributed probabilities
-
-
-
-
-    -
-    -
-
-
-
-
-
-    -
-
-    - This metric is mainly applicable for logistic regression models. It might not be effective or accurate when used
-    on other model types.
-    - While the test provides a robust visual representation of the model's PD predictions, it does not provide a
-    quantifiable measure or score to assess model performance.
+    - Potential overfitting or bias indicated by significant issues.
+    - Unevenly distributed probabilities suggesting inaccurate model predictions.
+
+    ### Strengths
+
+    - Offers a visual representation of the PD predictions made by the model, aiding in understanding its behavior.
+    - Assesses both the training and testing datasets, adding depth to model validation.
+    - Highlights disparities between classes, providing insights into class imbalance or data skewness.
+    - Effectively visualizes risk spread, which is particularly beneficial for credit risk prediction.
+
+    ### Limitations
+
+    - Specifically tailored for binary classification scenarios and not suited for multi-class classification tasks.
+    - Mainly applicable to logistic regression models, and may not be effective for other model types.
+    - Provides a robust visual representation but lacks a quantifiable measure to assess model performance.
     """
 
-
-
-    tasks = ["classification"]
-    tags = ["tabular_data", "visualization", "credit_risk", "logistic_regression"]
-
-    default_params = {"title": "Histogram of Predictive Probabilities"}
-
-    @staticmethod
-    def plot_prob_histogram(dataframes, dataset_titles, target_col, title):
-        figures = []
-
-        # Generate a colormap and convert to Plotly-accepted color format
-        # Adjust 'viridis' to any other matplotlib colormap if desired
-        colormap = cm.get_cmap("viridis")
-
-        for i, (df, dataset_title) in enumerate(zip(dataframes, dataset_titles)):
-            fig = go.Figure()
-
-            # Get unique classes and assign colors
-            classes = sorted(df[target_col].unique())
-            colors = [
-                colormap(i / len(classes))[:3] for i in range(len(classes))
-            ] # RGB
-            color_dict = {
-                cls: f"rgb({int(rgb[0]*255)}, {int(rgb[1]*255)}, {int(rgb[2]*255)})"
-                for cls, rgb in zip(classes, colors)
-            }
-
-            # Ensure classes are plotted in the specified order
-            for class_value in sorted(df[target_col].unique()):
-                fig.add_trace(
-                    go.Histogram(
-                        x=df[df[target_col] == class_value]["probabilities"],
-                        opacity=0.75,
-                        name=f"{dataset_title} {target_col} = {class_value}",
-                        marker=dict(
-                            color=color_dict[class_value],
-                        ),
-                    )
-                )
-            fig.update_layout(
-                barmode="overlay",
-                title_text=f"{title} - {dataset_title}",
-                xaxis_title="Probability",
-                yaxis_title="Frequency",
-            )
-            figures.append(fig)
-        return figures
-
-    def run(self):
-        dataset_titles = [dataset.input_id for dataset in self.inputs.datasets]
-        target_column = self.inputs.datasets[0].target_column
-        title = self.params.get("title", self.default_params["title"])
-
-        dataframes = []
-        metric_value = {"prob_histogram": {}}
-        for _, dataset in enumerate(self.inputs.datasets):
-            df = dataset.df.copy()
-            y_prob = dataset.y_prob(self.inputs.model)
-            df["probabilities"] = y_prob
-            dataframes.append(df)
-            metric_value["prob_histogram"][dataset.input_id] = list(df["probabilities"])
-
-        figures = self.plot_prob_histogram(
-            dataframes, dataset_titles, target_column, title
-        )
+    df = dataset.df
+    df["probabilities"] = dataset.y_prob(model)
 
-
-
-
-
-
-
-                for i, fig in enumerate(figures)
-            ]
+    fig = _plot_prob_histogram(df, dataset.target_column, title)
+
+    return fig
+
+
+def _plot_prob_histogram(df, target_col, title):
 
-
+    # Generate a colormap and convert to Plotly-accepted color format
+    # Adjust 'viridis' to any other matplotlib colormap if desired
+    colormap = cm.get_cmap("viridis")
+
+    fig = go.Figure()
+
+    # Get unique classes and assign colors
+    classes = sorted(df[target_col].unique())
+    colors = [colormap(i / len(classes))[:3] for i in range(len(classes))] # RGB
+    color_dict = {
+        cls: f"rgb({int(rgb[0]*255)}, {int(rgb[1]*255)}, {int(rgb[2]*255)})"
+        for cls, rgb in zip(classes, colors)
+    }
+
+    # Ensure classes are plotted in the specified order
+    for class_value in sorted(df[target_col].unique()):
+        fig.add_trace(
+            go.Histogram(
+                x=df[df[target_col] == class_value]["probabilities"],
+                opacity=0.75,
+                name=f"{target_col} = {class_value}",
+                marker=dict(
+                    color=color_dict[class_value],
+                ),
+            )
+        )
+    fig.update_layout(
+        barmode="overlay",
+        title_text=f"{title}",
+        xaxis_title="Probability",
+        yaxis_title="Frequency",
+    )
+
+    return fig
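The refactor turns the class into a function that handles one dataset per call. It writes a `probabilities` column into `dataset.df` before plotting, whereas the old `run` method worked on `dataset.df.copy()`. Whether the new code modifies the dataset depends on what the `df` property returns. The sketch below reproduces the per-class grouping that the histogram traces display, without any plotting, and works on a copy so the caller's frame is left unchanged. The function `class_probability_histograms` and its parameters are hypothetical, and NumPy's `np.histogram` stands in for Plotly's binning.

```python
import numpy as np
import pandas as pd


def class_probability_histograms(df, target_col, probabilities, bins=10):
    """Count predicted probabilities per class on shared bin edges over [0, 1].

    The input frame is copied, so the caller's DataFrame gains no new column.
    """
    work = df.copy()
    work["probabilities"] = np.asarray(probabilities, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    counts = {}
    # Same class ordering as the plotting code: sorted unique target values
    for class_value in sorted(work[target_col].unique()):
        probs = work.loc[work[target_col] == class_value, "probabilities"]
        counts[class_value], _ = np.histogram(probs, bins=edges)
    return edges, counts


# Usage: five loans, three non-defaults (0) and two defaults (1)
loans = pd.DataFrame({"default": [0, 0, 0, 1, 1]})
edges, counts = class_probability_histograms(
    loans, "default", [0.05, 0.15, 0.35, 0.65, 0.95], bins=4
)
print(edges)
print(counts)
```

Fixed edges on [0, 1] make the per-class counts comparable bin for bin. Both the old and new versions call `cm.get_cmap("viridis")`, which Matplotlib deprecated in 3.7. The registry lookup `matplotlib.colormaps["viridis"]` is the replacement.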
@@ -0,0 +1,100 @@
|
|
1
|
+
# Copyright © 2023-2024 ValidMind Inc. All rights reserved.
|
2
|
+
# See the LICENSE file in the root of this repository for details.
|
3
|
+
# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
|
4
|
+
|
5
|
+
|
6
|
+
import pandas as pd
|
7
|
+
import plotly.graph_objects as go
|
8
|
+
from scipy import stats
|
9
|
+
|
10
|
+
from validmind import tags, tasks
|
11
|
+
from validmind.errors import SkipTestError
|
12
|
+
|
13
|
+
|
14
|
+
@tags("tabular_data", "visualization", "model_training")
|
15
|
+
@tasks("regression")
|
16
|
+
def RegressionCoeffs(model):
|
17
|
+
"""
|
18
|
+
Assesses the significance and uncertainty of predictor variables in a regression model through visualization of
|
19
|
+
coefficients and their 95% confidence intervals.
|
20
|
+
|
21
|
+
### Purpose
|
22
|
+
|
23
|
+
The `RegressionCoeffs` metric visualizes the estimated regression coefficients alongside their 95% confidence intervals,
|
24
|
+
providing insights into the impact and significance of predictor variables on the response variable. This visualization
|
25
|
+
helps to understand the variability and uncertainty in the model's estimates, aiding in the evaluation of the
|
26
|
+
significance of each predictor.
|
27
|
+
|
28
|
+
### Test Mechanism
|
29
|
+
|
30
|
+
The function operates by extracting the estimated coefficients and their standard errors from the regression model.
|
31
|
+
Using these, it calculates the confidence intervals at a 95% confidence level, which indicates the range within which
|
32
|
+
the true coefficient value is expected to fall 95% of the time. The confidence intervals are computed using the
|
33
|
+
Z-value associated with the 95% confidence level. The coefficients and their confidence intervals are then visualized
|
34
|
+
in a bar plot. The x-axis represents the predictor variables, the y-axis represents the estimated coefficients, and
|
35
|
+
the error bars depict the confidence intervals.
|
36
|
+
|
37
|
+
### Signs of High Risk
|
38
|
+
|
39
|
+
- The confidence interval for a coefficient contains zero, suggesting that the predictor may not significantly
|
40
|
+
contribute to the model.
|
41
|
+
- Multiple coefficients with confidence intervals that include zero, potentially indicating issues with model reliability.
|
42
|
+
- Very wide confidence intervals, which may suggest high uncertainty in the coefficient estimates and potential model
|
43
|
+
instability.
|
44
|
+
|
45
|
+
### Strengths
|
46
|
+
|
47
|
+
- Provides a clear visualization that allows for easy interpretation of the significance and impact of predictor
|
48
|
+
variables.
|
49
|
+
- Includes confidence intervals, which provide additional information about the uncertainty surrounding each coefficient
|
50
|
+
estimate.
|
51
|
+
|
52
|
+
### Limitations
|
53
|
+
|
54
|
+
- The method assumes normality of residuals and independence of observations, assumptions that may not always hold true
|
55
|
+
in practice.
|
56
|
+
- It does not address issues related to multicollinearity among predictor variables, which can affect the interpretation
|
57
|
+
of coefficients.
|
58
|
+
- This metric is limited to regression tasks using tabular data and is not applicable to other types of machine learning
|
59
|
+
tasks or data structures. It also only supports models built with statsmodels.
|
60
|
+
"""
|
61
|
+
|
62
|
+
if model.library != "statsmodels":
|
63
|
+
raise SkipTestError("Only statsmodels models are supported for this metric")
|
64
|
+
|
65
|
+
# Extract estimated coefficients and standard errors
|
66
|
+
coefficients = model.regression_coefficients()
|
67
|
+
coef = pd.to_numeric(coefficients["coef"])
|
68
|
+
std_err = pd.to_numeric(coefficients["std err"])
|
69
|
+
|
70
|
+
# Calculate confidence intervals
|
71
|
+
confidence_level = 0.95 # 95% confidence interval
|
72
|
+
z_value = stats.norm.ppf((1 + confidence_level) / 2) # Calculate Z-value
|
73
|
+
lower_ci = coef - z_value * std_err
|
74
|
+
upper_ci = coef + z_value * std_err
|
75
|
+
|
76
|
+
# Create a bar plot with confidence intervals
|
77
|
+
fig = go.Figure()
|
78
|
+
|
79
|
+
fig.add_trace(
|
80
|
+
go.Bar(
|
81
|
+
x=list(coefficients["Feature"].values),
|
82
|
+
y=coef,
|
83
|
+
name="Estimated Coefficients",
|
84
|
+
error_y=dict(
|
85
|
+
type="data",
|
86
|
+
symmetric=False,
|
87
|
+
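arrayminus=coef - lower_ci,  # plotly expects error-bar lengths below each bar, not absolute bounds
|
88
|
+
array=upper_ci - coef,  # error-bar lengths above each bar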
|
89
|
+
visible=True,
|
90
|
+
),
|
91
|
+
)
|
92
|
+
)
|
93
|
+
|
94
|
+
fig.update_layout(
|
95
|
+
title=f"{model.input_id} Coefficients with Confidence Intervals",
|
96
|
+
xaxis_title="Predictor Variables",
|
97
|
+
yaxis_title="Coefficients",
|
98
|
+
)
|
99
|
+
|
100
|
+
return (fig, coefficients)
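The confidence-interval arithmetic behind `RegressionCoeffs` can be sketched with the standard library alone. This is a minimal illustration, not the ValidMind implementation: it assumes coefficients and standard errors are already available as plain lists (the `coef_ci_summary` helper and the sample values are hypothetical), uses `statistics.NormalDist` in place of `scipy.stats.norm`, and returns the error-bar lengths that plotly's `error_y` expects (`arrayminus` and `array` are distances from each bar, not absolute bounds). It also flags intervals that contain zero, the first sign of high risk listed in the docstring.

```python
from statistics import NormalDist


def coef_ci_summary(features, coefs, std_errs, confidence_level=0.95):
    """Hypothetical helper: normal-approximation CIs for regression coefficients.

    Returns one dict per feature with the interval bounds, the error-bar
    lengths below/above the estimate, and whether the interval contains zero.
    """
    # Two-sided critical value, about 1.96 for a 95% interval
    z_value = NormalDist().inv_cdf((1 + confidence_level) / 2)
    rows = []
    for name, coef, se in zip(features, coefs, std_errs):
        lower = coef - z_value * se
        upper = coef + z_value * se
        rows.append(
            {
                "feature": name,
                "coef": coef,
                "lower_ci": lower,
                "upper_ci": upper,
                # Error-bar lengths measured from the top of each bar
                "err_minus": coef - lower,
                "err_plus": upper - coef,
                # An interval spanning zero suggests the effect may not be significant
                "ci_contains_zero": lower <= 0 <= upper,
            }
        )
    return rows


if __name__ == "__main__":
    # Illustrative numbers only, not output from a fitted model
    summary = coef_ci_summary(["const", "x1"], [2.0, 0.1], [0.5, 0.2])
    for row in summary:
        print(row["feature"], round(row["lower_ci"], 3), round(row["upper_ci"], 3), row["ci_contains_zero"])
```

Because the normal interval is symmetric, `err_minus` and `err_plus` come out equal here; computing them as differences from the estimate keeps the bars correct if an asymmetric interval (for example, from a bootstrap) is substituted later.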
|
@@ -19,31 +19,36 @@ class RegressionFeatureSignificance(Metric):
|
|
19
19
|
"""
|
20
20
|
Assesses and visualizes the statistical significance of features in a set of regression models.
|
21
21
|
|
22
|
-
|
22
|
+
### Purpose
|
23
|
+
|
23
24
|
The Regression Feature Significance metric assesses the significance of each feature in a given set of regression
|
24
25
|
models. It creates a visualization displaying p-values for every feature of each model, assisting model developers
|
25
26
|
in understanding which features are most influential in their models.
|
26
27
|
|
27
|
-
|
28
|
+
### Test Mechanism
|
29
|
+
|
28
30
|
The test mechanism involves going through each fitted regression model in a given list, extracting the model
|
29
31
|
coefficients and p-values for each feature, and then plotting these values. The x-axis on the plot contains the
|
30
32
|
p-values while the y-axis denotes the coefficients of each feature. A vertical red line is drawn at the threshold
|
31
33
|
for p-value significance, which is 0.05 by default. Any features with p-values to the left of this line are
|
32
34
|
considered statistically significant at the chosen level.
|
33
35
|
|
34
|
-
|
36
|
+
### Signs of High Risk
|
37
|
+
|
35
38
|
- Any feature with a high p-value (greater than the threshold) is considered a potential high risk, as it suggests
|
36
39
|
the feature is not statistically significant and may not be reliably contributing to the model's predictions.
|
37
40
|
- A high number of such features may indicate problems with the model validation, variable selection, and overall
|
38
41
|
reliability of the model predictions.
|
39
42
|
|
40
|
-
|
43
|
+
### Strengths
|
44
|
+
|
41
45
|
- Helps identify the features that significantly contribute to a model's prediction, providing insights into the
|
42
46
|
feature importance.
|
43
47
|
- Provides tangible, easy-to-understand visualizations to interpret the feature significance.
|
44
48
|
- Facilitates comparison of feature importance across multiple models.
|
45
49
|
|
46
|
-
|
50
|
+
### Limitations
|
51
|
+
|
47
52
|
- This metric assumes model features are independent, which may not always be the case. Multicollinearity (high
|
48
53
|
correlation amongst predictors) can cause high variance and unreliable statistical tests of significance.
|
49
54
|
- The p-value strategy for feature selection doesn't take into account the magnitude of the effect, focusing solely
|
@@ -54,7 +59,7 @@ class RegressionFeatureSignificance(Metric):
|
|
54
59
|
"""
|
55
60
|
|
56
61
|
name = "regression_feature_significance"
|
57
|
-
required_inputs = ["
|
62
|
+
required_inputs = ["model"]
|
58
63
|
|
59
64
|
default_params = {"fontsize": 10, "p_threshold": 0.05}
|
60
65
|
tasks = ["regression"]
|
@@ -70,10 +75,10 @@ class RegressionFeatureSignificance(Metric):
|
|
70
75
|
p_threshold = self.params["p_threshold"]
|
71
76
|
|
72
77
|
# Check models list is not empty
|
73
|
-
if not self.inputs.
|
74
|
-
raise ValueError("
|
78
|
+
if not self.inputs.model:
|
79
|
+
raise ValueError("Model must be provided in the model parameter")
|
75
80
|
|
76
|
-
figures = self._plot_pvalues(self.inputs.
|
81
|
+
figures = self._plot_pvalues(self.inputs.model, fontsize, p_threshold)
|
77
82
|
|
78
83
|
return self.cache_results(figures=figures)
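The significance rule described in the `RegressionFeatureSignificance` docstring reduces to a comparison against `p_threshold` (0.05 by default). A minimal sketch, assuming p-values have already been extracted into a plain dict keyed by feature name; the `split_by_significance` helper and the sample values are hypothetical and not part of ValidMind:

```python
def split_by_significance(pvalues, p_threshold=0.05):
    """Hypothetical helper: partition features by p-value.

    Features with p-values strictly below the threshold (left of the red
    line in the plot) are treated as statistically significant.
    """
    significant = sorted(name for name, p in pvalues.items() if p < p_threshold)
    not_significant = sorted(name for name, p in pvalues.items() if p >= p_threshold)
    return significant, not_significant


if __name__ == "__main__":
    # Illustrative p-values only
    pvals = {"const": 0.001, "x1": 0.03, "x2": 0.42}
    sig, nonsig = split_by_significance(pvals)
    print("significant:", sig)
    print("not significant:", nonsig)
    # A large share of non-significant features is listed as a sign of high risk
    print("share not significant:", len(nonsig) / len(pvals))
```

Treating a p-value exactly equal to the threshold as not significant is a choice made in this sketch; the docstring does not specify the tie case.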
|
79
84
|
|
@@ -19,26 +19,30 @@ class RegressionModelForecastPlot(Metric):
|
|
19
19
|
Generates plots to visually compare the forecasted outcomes of one or more regression models against actual
|
20
20
|
observed values over a specified date range.
|
21
21
|
|
22
|
-
|
23
|
-
regression models by comparing the model's forecasted outcomes against actual observed values within a specified
|
24
|
-
date range. This metric is especially useful in time-series models or any model where the outcome changes over
|
25
|
-
time, allowing direct comparison of predicted vs actual values.
|
22
|
+
### Purpose
|
26
23
|
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
are set to the minimum and maximum date available in the dataset. The test verifies that the provided date range is
|
32
|
-
within the limits of the available data.
|
24
|
+
The "regression_forecast_plot" is intended to visually depict the performance of one or more regression models by
|
25
|
+
comparing the model's forecasted outcomes against actual observed values within a specified date range. This metric
|
26
|
+
is especially useful in time-series models or any model where the outcome changes over time, allowing direct
|
27
|
+
comparison of predicted vs actual values.
|
33
28
|
|
34
|
-
|
29
|
+
### Test Mechanism
|
30
|
+
|
31
|
+
This test generates a plot for each fitted model in the list. The x-axis represents dates ranging from the
|
32
|
+
specified "start_date" to the "end_date", while the y-axis shows the value of the outcome variable. Two lines are
|
33
|
+
plotted: one representing the forecasted values and the other representing the observed values. The "start_date"
|
34
|
+
and "end_date" can be passed as parameters of this test; if they are not provided, they default to the minimum
|
35
|
+
and maximum date available in the dataset. The test verifies that the provided date range is within the limits of
|
36
|
+
the available data.
|
37
|
+
|
38
|
+
### Signs of High Risk
|
35
39
|
|
36
40
|
- High risk or failure signs could be deduced visually from the plots if the forecasted line significantly deviates
|
37
41
|
from the observed line, indicating the model's predicted values are not matching actual outcomes.
|
38
42
|
- A model that struggles to handle edge conditions, such as the maximum and minimum data points, could also be
|
39
43
|
considered a sign of risk.
|
40
44
|
|
41
|
-
|
45
|
+
### Strengths
|
42
46
|
|
43
47
|
- Visualization: The plot provides an intuitive and clear illustration of how well the forecast matches the actual
|
44
48
|
values, making it straightforward even for non-technical stakeholders to interpret.
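The date-range handling described in the `RegressionModelForecastPlot` Test Mechanism (default to the dataset's minimum and maximum dates, then check that a requested range lies within the data) can be sketched as follows. The `resolve_date_range` helper is hypothetical, works on plain `datetime.date` values rather than a pandas index, and is not ValidMind's code:

```python
from datetime import date


def resolve_date_range(dates, start_date=None, end_date=None):
    """Hypothetical helper: fill in and validate a plotting date range."""
    if not dates:
        raise ValueError("No dates available in the dataset")
    min_date, max_date = min(dates), max(dates)
    # Fall back to the data's own limits when parameters are omitted
    start = min_date if start_date is None else start_date
    end = max_date if end_date is None else end_date
    # Reject ranges that extend beyond the available data
    if start < min_date or end > max_date:
        raise ValueError(
            f"Date range {start}..{end} is outside the available data ({min_date}..{max_date})"
        )
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end


if __name__ == "__main__":
    observed = [date(2020, 1, 1), date(2020, 6, 1), date(2021, 1, 1)]
    print(resolve_date_range(observed))
    print(resolve_date_range(observed, start_date=date(2020, 3, 1)))
```

Raising an error on an out-of-range request, rather than silently clipping it, mirrors the docstring's statement that the test verifies the range against the available data.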
|
@@ -46,7 +50,7 @@ class RegressionModelForecastPlot(Metric):
|
|
46
50
|
- Model Evaluation: It can be useful in identifying overfitting or underfitting situations, as these will manifest
|
47
51
|
as discrepancies between the forecasted and observed values.
|
48
52
|
|
49
|
-
|
53
|
+
### Limitations
|
50
54
|
|
51
55
|
- Interpretation Bias: Interpretation of the plot is subjective and can lead to different conclusions by different
|
52
56
|
evaluators.
|