PyPI - validmind - Versions diffs - 2.5.8__py3-none-any.whl → 2.5.18__py3-none-any.whl - Mend

validmind 2.5.8__py3-none-any.whl → 2.5.18__py3-none-any.whl

Files changed (233)

validmind/tests/ongoing_monitoring/PredictionAcrossEachFeature.py CHANGED

@@ -12,31 +12,37 @@ from validmind import tags, tasks
 @tasks("monitoring")
 def PredictionAcrossEachFeature(datasets, model):
     """
-    **Purpose:**
-    This test shows visually the prediction using reference data and monitoring data across each individual feature. If
-    there are significant differences in predictions across feature values from reference to monitoring dataset, then
-    further investigation is needed as the model is producing predictions that are different than what was observed
-    during the training of the model.
-    **Test Mechanism:**
-    The test creates scatter plots for each feature, comparing the reference dataset (used for training) with the
-    monitoring dataset (used in production). Each plot has two subplots: one for the reference data and one for the
-    monitoring data, visualizing the prediction probabilities. This allows for a visual comparison of the model's
-    behavior across different datasets.
-    **Signs of High Risk:**
-    - Significant discrepancies between the reference and monitoring subplots for the same feature
-    - Unexpected patterns or trends in monitoring data that weren't present in reference data
-    **Strengths:**
-    - Provides a clear visual representation of model performance across different features
-    - Allows for easy identification of features where the model's predictions have changed
-    - Facilitates quick detection of potential issues with the model when deployed in production
-    **Limitations:**
-    - Interpretation of scatter plots can be subjective and may require expertise
-    - Visualizations do not provide quantitative metrics for objective evaluation
-    - May not capture all types of distribution changes or issues with the model's predictions
+    Assesses differences in model predictions across individual features between reference and monitoring datasets
+    through visual analysis.
+    ### Purpose
+    The Prediction Across Each Feature test aims to visually compare model predictions for each feature between
+    reference (training) and monitoring (production) datasets. It helps identify significant differences in prediction
+    patterns for further investigation and ensures the model's consistency and stability over time.
+    ### Test Mechanism
+    The test generates scatter plots for each feature, comparing prediction probabilities between the reference and
+    monitoring datasets. Each plot consists of two subplots: one for reference data and one for monitoring data,
+    enabling visual comparison of the model's predictive behavior.
+    ### Signs of High Risk
+    - Significant discrepancies between the reference and monitoring subplots for the same feature.
+    - Unexpected patterns or trends in monitoring data that were absent in reference data.
+    ### Strengths
+    - Provides a clear visual representation of model performance across different features.
+    - Facilitates easy identification of features where the model's predictions have diverged.
+    - Enables quick detection of potential model performance issues in production.
+    ### Limitations
+    - Interpretation of scatter plots can be subjective and may require expertise.
+    - Visualizations do not provide quantitative metrics for objective evaluation.
+    - May not capture all types of distribution changes or issues with the model's predictions.
     """
     """

validmind/tests/ongoing_monitoring/PredictionCorrelation.py CHANGED

@@ -13,30 +13,38 @@ from validmind import tags, tasks
 @tasks("monitoring")
 def PredictionCorrelation(datasets, model):
     """
-    **Purpose:**
-    The test is used to assess the correlation pairs for each feature between model predictions from reference and
-    monitoring datasets. The primary goal is to detect significant changes in these pairs, which may signal target
-    drift, leading to lower model performance.
+    Assesses correlation changes between model predictions from reference and monitoring datasets to detect potential
+    target drift.
-    **Test Mechanism:**
-    The test calculates the correlation of each feature with model predictions for both reference and monitoring
-    datasets. The test then compares these correlations side-by-side via a bar plot and a correlation table. Features
-    with significant changes in correlation pairs highlight potential risks of model drift.
+    ### Purpose
+    To evaluate the changes in correlation pairs between model predictions and features from reference and monitoring
+    datasets. This helps in identifying significant shifts that may indicate target drift, potentially affecting model
+    performance.
+    ### Test Mechanism
+    This test calculates the correlation of each feature with model predictions for both reference and monitoring
+    datasets. It then compares these correlations side-by-side using a bar plot and a correlation table. Significant
+    changes in correlation pairs are highlighted to signal possible model drift.
+    ### Signs of High Risk
-    **Signs of High Risk:**
     - Significant changes in correlation pairs between the reference and monitoring predictions.
-    - Notable correlation differences indicating a potential shift in the relationship between features and the target
-    variable.
-    **Strengths:**
-    - Allows for visual identification of drift in feature relationships with model predictions.
-    - Comparison via a clear bar plot assists in understanding model stability over time.
-    - Helps in early detection of target drift, enabling timely interventions.
-    **Limitations:**
-    - May require substantial reference and monitoring data for accurate comparison.
-    - Correlation does not imply causation, and other factors might influence changes.
-    - The method solely focuses on linear relationships, potentially missing non-linear interactions.
+    - Notable differences in correlation values, indicating a possible shift in the relationship between features and
+    the target variable.
+    ### Strengths
+    - Provides visual identification of drift in feature relationships with model predictions.
+    - Clear bar plot comparison aids in understanding model stability over time.
+    - Enables early detection of target drift, facilitating timely interventions.
+    ### Limitations
+    - Requires substantial reference and monitoring data for accurate comparison.
+    - Correlation does not imply causation; other factors may influence changes.
+    - Focuses solely on linear relationships, potentially missing non-linear interactions.
     """
     prediction_prob_column = f"{model.input_id}_probabilities"
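The mechanism described in this docstring (correlate each feature with the model's predictions in both datasets, then compare side by side) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `pearson` and `correlation_shift`, the dict-of-lists data layout, and the column name `"probabilities"` are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_shift(reference, monitoring, prediction_column):
    """Return {feature: (reference_corr, monitoring_corr, difference)}.

    `reference` and `monitoring` are hypothetical dicts mapping column names
    to lists of numbers; `prediction_column` names the prediction column.
    """
    shifts = {}
    for feature in reference:
        if feature == prediction_column:
            continue
        ref_corr = pearson(reference[feature], reference[prediction_column])
        mon_corr = pearson(monitoring[feature], monitoring[prediction_column])
        shifts[feature] = (ref_corr, mon_corr, mon_corr - ref_corr)
    return shifts


# Toy data: the feature/prediction relationship flips sign in monitoring.
reference = {"age": [1, 2, 3, 4], "probabilities": [0.1, 0.2, 0.3, 0.4]}
monitoring = {"age": [1, 2, 3, 4], "probabilities": [0.4, 0.3, 0.2, 0.1]}
shifts = correlation_shift(reference, monitoring, "probabilities")
```

A large absolute value in the third tuple element marks a feature whose relationship with the predictions has changed, which is the kind of signal the bar plot and table are meant to surface. As the docstring notes, Pearson correlation only captures linear relationships.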

validmind/tests/ongoing_monitoring/TargetPredictionDistributionPlot.py CHANGED

@@ -12,29 +12,37 @@ from validmind import tags, tasks
 @tasks("monitoring")
 def TargetPredictionDistributionPlot(datasets, model):
     """
-    **Purpose:**
-    This test provides the prediction distributions from the reference dataset and the new monitoring dataset. If there
-    are significant differences in the distributions, it might indicate different underlying data characteristics that
-    warrant further investigation into the root causes.
-    **Test Mechanism:**
-    The methodology involves generating Kernel Density Estimation (KDE) plots for the prediction probabilities from
-    both the reference and monitoring datasets. By comparing these KDE plots, one can visually assess any significant
-    differences in the prediction distributions between the two datasets.
-    **Signs of High Risk:**
-    - Significant divergence between the distribution curves of the reference and monitoring predictions
-    - Unusual shifts or bimodal distribution in the monitoring predictions compared to the reference predictions
-    **Strengths:**
-    - Visual representation makes it easy to spot differences in prediction distributions
-    - Useful for identifying potential data drift or changes in underlying data characteristics
-    - Simple and efficient to implement using standard plotting libraries
-    **Limitations:**
-    - Subjective interpretation of the visual plots
-    - Might not pinpoint the exact cause of distribution changes
-    - Less effective if the differences in distributions are subtle and not easily visible
+    Assesses differences in prediction distributions between a reference dataset and a monitoring dataset to identify
+    potential data drift.
+    ### Purpose
+    The Target Prediction Distribution Plot test aims to evaluate potential changes in the prediction distributions
+    between the reference and new monitoring datasets. It seeks to identify underlying shifts in data characteristics
+    that warrant further investigation.
+    ### Test Mechanism
+    This test generates Kernel Density Estimation (KDE) plots for prediction probabilities from both the reference and
+    monitoring datasets. By visually comparing the KDE plots, it assesses significant differences in the prediction
+    distributions between the two datasets.
+    ### Signs of High Risk
+    - Significant divergence between the distribution curves of reference and monitoring predictions.
+    - Unusual shifts or bimodal distribution in the monitoring predictions compared to the reference predictions.
+    ### Strengths
+    - Visual representation makes it easy to spot differences in prediction distributions.
+    - Useful for identifying potential data drift or changes in underlying data characteristics.
+    - Simple and efficient to implement using standard plotting libraries.
+    ### Limitations
+    - Subjective interpretation of the visual plots.
+    - Might not pinpoint the exact cause of distribution changes.
+    - Less effective if the differences in distributions are subtle and not easily visible.
     """
     pred_ref = datasets[0].y_prob_df(model)
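To make the KDE comparison in this docstring concrete, the following sketch computes the two density curves that such a plot would draw. It is an illustration only: the helper `gaussian_kde`, the fixed bandwidth of 0.1, and the sample probabilities are assumptions, and the package itself may rely on a plotting library's built-in KDE instead.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical prediction probabilities for the two datasets.
ref_probs = [0.10, 0.15, 0.20, 0.25]
mon_probs = [0.70, 0.75, 0.80, 0.85]

ref_density = gaussian_kde(ref_probs, bandwidth=0.1)
mon_density = gaussian_kde(mon_probs, bandwidth=0.1)

# Evaluate both curves on a shared grid over [0, 1], as a plot would.
grid = [i / 100 for i in range(101)]
ref_curve = [ref_density(x) for x in grid]
mon_curve = [mon_density(x) for x in grid]

# Location of each curve's peak on the grid.
ref_peak = max(grid, key=ref_density)
mon_peak = max(grid, key=mon_density)
```

Here the monitoring curve peaks far to the right of the reference curve, the kind of divergence the docstring lists under Signs of High Risk. Subtle shifts would be much harder to see, which matches the stated limitation.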

validmind/tests/prompt_validation/Bias.py CHANGED

@@ -27,42 +27,45 @@ from .ai_powered_test import (
 @dataclass
 class Bias(ThresholdTest):
     """
-    Evaluates bias in a Large Language Model based on the order and distribution of exemplars in a prompt.
+    Assesses potential bias in a Large Language Model by analyzing the distribution and order of exemplars in the
+    prompt.
+    ### Purpose
-    **Purpose:**
     The Bias Evaluation test calculates if and how the order and distribution of exemplars (examples) in a few-shot
     learning prompt affect the output of a Large Language Model (LLM). The results of this evaluation can be used to
     fine-tune the model's performance and manage any unintended biases in its results.
-    **Test Mechanism:**
+    ### Test Mechanism
     This test uses two checks:
-    1. *Distribution of Exemplars:* The number of positive vs. negative examples in a prompt is varied. The test then
+    1. **Distribution of Exemplars:** The number of positive vs. negative examples in a prompt is varied. The test then
     examines the LLM's classification of a neutral or ambiguous statement under these circumstances.
-    2. *Order of Exemplars:* The sequence in which positive and negative examples are presented to the model is
+    2. **Order of Exemplars:** The sequence in which positive and negative examples are presented to the model is
     modified. Their resultant effect on the LLM's response is studied.
     For each test case, the LLM grades the input prompt on a scale of 1 to 10. It evaluates whether the examples in the
     prompt could produce biased responses. The test only passes if the score meets or exceeds a predetermined minimum
-    threshold. This threshold is set at 7 by default, but it can be modified as per the requirements via the test
+    threshold. This threshold is set at 7 by default but can be modified as per the requirements via the test
     parameters.
-    **Signs of High Risk:**
+    ### Signs of High Risk
     - A skewed result favoring either positive or negative responses may suggest potential bias in the model. This skew
     could be caused by an unbalanced distribution of positive and negative exemplars.
     - If the score given by the model is less than the set minimum threshold, it might indicate a risk of high bias and
     hence poor performance.
-    **Strengths:**
+    ### Strengths
-    - This test provides a quantitative measure of potential bias, providing clear guidelines for developers about
+    - This test provides a quantitative measure of potential bias, offering clear guidelines for developers about
     whether their Large Language Model (LLM) contains significant bias.
-    - It's useful in evaluating the impartiality of the model based on the distribution and sequence of examples.
+    - It is useful in evaluating the impartiality of the model based on the distribution and sequence of examples.
     - The flexibility to adjust the minimum required threshold allows tailoring this test to stricter or more lenient
     bias standards.
-    **Limitations:**
+    ### Limitations
     - The test may not pick up on more subtle forms of bias or biases that are not directly related to the distribution
     or order of exemplars.

validmind/tests/prompt_validation/Clarity.py CHANGED

@@ -29,36 +29,38 @@ class Clarity(ThresholdTest):
     """
     Evaluates and scores the clarity of prompts in a Large Language Model based on specified guidelines.
-    **Purpose:**
+    ### Purpose
     The Clarity evaluation metric is used to assess how clear the prompts of a Large Language Model (LLM) are. This
     assessment is particularly important because clear prompts assist the LLM in more accurately interpreting and
     responding to instructions.
-    **Test Mechanism:**
+    ### Test Mechanism
     The evaluation uses an LLM to scrutinize the clarity of prompts, factoring in considerations such as the inclusion
-    of relevant details, persona adoption, step-by-step instructions, usage of examples and specification of desired
+    of relevant details, persona adoption, step-by-step instructions, usage of examples, and specification of desired
     output length. Each prompt is rated on a clarity scale of 1 to 10, and any prompt scoring at or above the preset
     threshold (default of 7) will be marked as clear. It is important to note that this threshold can be adjusted via
     test parameters, providing flexibility in the evaluation process.
-    **Signs of High Risk:**
+    ### Signs of High Risk
     - Prompts that consistently score below the clarity threshold
-    - Repeated failure of prompts to adhere to guidelines for clarity. These guidelines could include detail inclusion,
-    persona adoption, explicit step-by-step instructions, use of examples, and specification of output length.
+    - Repeated failure of prompts to adhere to guidelines for clarity, including detail inclusion, persona adoption,
+    explicit step-by-step instructions, use of examples, and specification of output length
-    **Strengths:**
+    ### Strengths
-    - Encourages the development of more effective prompts that aid the LLM in interpreting instructions accurately.
-    - Applies a quantifiable measure (a score from 1 to 10) to evaluate the clarity of prompts.
-    - Threshold for clarity is adjustable, allowing for flexible evaluation depending on the context.
+    - Encourages the development of more effective prompts that aid the LLM in interpreting instructions accurately
+    - Applies a quantifiable measure (a score from 1 to 10) to evaluate the clarity of prompts
+    - Threshold for clarity is adjustable, allowing for flexible evaluation depending on the context
-    **Limitations:**
+    ### Limitations
-    - Scoring system is subjective and relies on the AI’s interpretation of 'clarity'.
+    - Scoring system is subjective and relies on the AI’s interpretation of 'clarity'
     - The test assumes that all required factors (detail inclusion, persona adoption, step-by-step instructions, use of
-    examples, and specification of output length) contribute equally to clarity, which might not always be the case.
-    - The evaluation may not be as effective if used on non-textual models.
+    examples, and specification of output length) contribute equally to clarity, which might not always be the case
+    - The evaluation may not be as effective if used on non-textual models
     """
     name = "clarity"
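The pass/fail rule shared by the prompt validation docstrings in this diff (a score from 1 to 10, passing at or above a threshold that defaults to 7) reduces to a single comparison. The function below is a hypothetical sketch of that rule; the name `passes_threshold` and the parameter `min_threshold` are assumptions, not identifiers confirmed by the diff.

```python
def passes_threshold(score: int, min_threshold: int = 7) -> bool:
    """Return True when a 1-10 score meets or exceeds the threshold."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    return score >= min_threshold


# A score equal to the threshold passes; one point below fails.
at_threshold = passes_threshold(7)
below_threshold = passes_threshold(6)
# Raising the threshold makes the same kind of score fail.
stricter = passes_threshold(8, min_threshold=9)
```

The comparison is inclusive, matching the docstring wording "at or above the preset threshold".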

validmind/tests/prompt_validation/Conciseness.py CHANGED

@@ -29,31 +29,33 @@ class Conciseness(ThresholdTest):
     """
     Analyzes and grades the conciseness of prompts provided to a Large Language Model.
-    **Purpose:**
+    ### Purpose
     The Conciseness Assessment is designed to evaluate the brevity and succinctness of prompts provided to a Language
     Learning Model (LLM). A concise prompt strikes a balance between offering clear instructions and eliminating
     redundant or unnecessary information, ensuring that the LLM receives relevant input without being overwhelmed.
-    **Test Mechanism:**
+    ### Test Mechanism
     Using an LLM, this test conducts a conciseness analysis on input prompts. The analysis grades the prompt on a scale
     from 1 to 10, where the grade reflects how well the prompt delivers clear instructions without being verbose.
     Prompts that score equal to or above a predefined threshold (default set to 7) are deemed successfully concise.
     This threshold can be adjusted to meet specific requirements.
-    **Signs of High Risk:**
+    ### Signs of High Risk
     - Prompts that consistently score below the predefined threshold.
     - Prompts that are overly wordy or contain unnecessary information.
     - Prompts that create confusion or ambiguity due to excess or unnecessary information.
-    **Strengths:**
+    ### Strengths
     - Ensures clarity and effectiveness of the prompts.
     - Promotes brevity and preciseness in prompts without sacrificing essential information.
     - Useful for models like LLMs, where input prompt length and clarity greatly influence model performance.
     - Provides a quantifiable measure of prompt conciseness.
-    **Limitations:**
+    ### Limitations
     - The conciseness score is based on an AI's assessment, which might not fully capture human interpretation of
     conciseness.

validmind/tests/prompt_validation/Delimitation.py CHANGED

@@ -29,38 +29,39 @@ class Delimitation(ThresholdTest):
     """
     Evaluates the proper use of delimiters in prompts provided to Large Language Models.
-    **Purpose:**
-    This test, dubbed the "Delimitation Test", is engineered to assess whether prompts provided to the Language
-    Learning Model (LLM) correctly use delimiters to mark different sections of the input. Well-delimited prompts
-    simplify the interpretation process for LLM, ensuring responses are precise and accurate.
+    ### Purpose
+    The Delimitation Test aims to assess whether prompts provided to the Language Learning Model (LLM) correctly use
+    delimiters to mark different sections of the input. Well-delimited prompts help simplify the interpretation process
+    for the LLM, ensuring that the responses are precise and accurate.
+    ### Test Mechanism
-    **Test Mechanism:**
     The test employs an LLM to examine prompts for appropriate use of delimiters such as triple quotation marks, XML
-    tags, and section titles. Each prompt is assigned a score from 1 to 10 based on its delimitation integrity. Those
+    tags, and section titles. Each prompt is assigned a score from 1 to 10 based on its delimitation integrity. Prompts
     with scores equal to or above the preset threshold (which is 7 by default, although it can be adjusted as
     necessary) pass the test.
-    **Signs of High Risk:**
+    ### Signs of High Risk
-    - The test identifies prompts where a delimiter is missing, improperly placed, or incorrect, which can lead to
-    misinterpretation by the LLM.
-    - A high-risk scenario may involve complex prompts with multiple tasks or diverse data where correct delimitation
-    is integral to understanding.
-    - Low scores (below the threshold) are a clear indicator of high risk.
+    - Prompts missing, improperly placed, or incorrectly used delimiters, leading to misinterpretation by the LLM.
+    - High-risk scenarios with complex prompts involving multiple tasks or diverse data where correct delimitation is
+    crucial.
+    - Scores below the threshold, indicating a high risk.
-    **Strengths:**
+    ### Strengths
-    - This test ensures clarity in the demarcation of different components of given prompts.
-    - It helps reduce ambiguity in understanding prompts, particularly for complex tasks.
-    - Scoring allows for quantified insight into the appropriateness of delimiter usage, aiding continuous improvement.
+    - Ensures clarity in demarcating different components of given prompts.
+    - Reduces ambiguity in understanding prompts, especially for complex tasks.
+    - Provides a quantified insight into the appropriateness of delimiter usage, aiding continuous improvement.
-    **Limitations:**
+    ### Limitations
-    - The test only checks for the presence and placement of delimiter, not whether the correct delimiter type is used
-    for the specific data or task.
-    - It may not fully reveal the impacts of poor delimitation on LLM's final performance.
-    - Depending on the complexity of the tasks and prompts, the preset score threshold may not be refined enough,
-    requiring regular manual adjustment.
+    - Only checks for the presence and placement of delimiters, not whether the correct delimiter type is used for the
+    specific data or task.
+    - May not fully reveal the impacts of poor delimitation on the LLM's final performance.
+    - The preset score threshold may not be refined enough for complex tasks and prompts, requiring regular manual
+    adjustment.
     """
     name = "delimitation"

validmind/tests/prompt_validation/NegativeInstruction.py CHANGED

@@ -29,34 +29,36 @@ class NegativeInstruction(ThresholdTest):
     """
     Evaluates and grades the use of affirmative, proactive language over negative instructions in LLM prompts.
-    **Purpose:**
+    ### Purpose
     The Negative Instruction test is utilized to scrutinize the prompts given to a Large Language Model (LLM). The
     objective is to ensure these prompts are expressed using proactive, affirmative language. The focus is on
     instructions indicating what needs to be done rather than what needs to be avoided, thereby guiding the LLM more
     efficiently towards the desired output.
-    **Test Mechanism:**
+    ### Test Mechanism
     An LLM is employed to evaluate each prompt. The prompt is graded based on its use of positive instructions with
     scores ranging between 1-10. This grade reflects how effectively the prompt leverages affirmative language while
     shying away from negative or restrictive instructions. A prompt that attains a grade equal to or above a
     predetermined threshold (7 by default) is regarded as adhering effectively to the best practices of positive
     instruction. This threshold can be custom-tailored through the test parameters.
-    **Signs of High Risk:**
+    ### Signs of High Risk
     - Low score obtained from the LLM analysis, indicating heavy reliance on negative instructions in the prompts.
     - Failure to surpass the preset minimum threshold.
     - The LLM generates ambiguous or undesirable outputs as a consequence of the negative instructions used in the
     prompt.
-    **Strengths:**
+    ### Strengths
     - Encourages the usage of affirmative, proactive language in prompts, aiding in more accurate and advantageous
     model responses.
     - The test result provides a comprehensible score, helping to understand how well a prompt follows the positive
     instruction best practices.
-    **Limitations:**
+    ### Limitations
     - Despite an adequate score, a prompt could still be misleading or could lead to undesired responses due to factors
     not covered by this test.

validmind/tests/prompt_validation/Robustness.py CHANGED

@@ -24,31 +24,33 @@ class Robustness(ThresholdTest):
     """
     Assesses the robustness of prompts provided to a Large Language Model under varying conditions and contexts.
-    **Purpose:**
+    ### Purpose
     The Robustness test is meant to evaluate the resilience and reliability of prompts provided to a Language Learning
-    Model (LLM). The aim of this test is to guarantee that the prompts consistently generate accurate and the expected
-    outputs, despite being in diverse or challenging scenarios.
+    Model (LLM). The aim of this test is to guarantee that the prompts consistently generate accurate and expected
+    outputs, even in diverse or challenging scenarios.
+    ### Test Mechanism
-    **Test Mechanism:**
     The Robustness test appraises prompts under various conditions, alterations, and contexts to ascertain their
-    stability in producing consistent responses from the LLM. Factors evaluated range from different phrasings,
-    inclusion of potential distracting elements, and various input complexities. By default, the test generates 10
-    inputs for a prompt but can be adjusted according to test parameters.
+    stability in producing consistent responses from the LLM. Factors evaluated include different phrasings, inclusion
+    of potential distracting elements, and various input complexities. By default, the test generates 10 inputs for a
+    prompt but can be adjusted according to test parameters.
-    **Signs of High Risk:**
+    ### Signs of High Risk
     - If the output from the tests diverges extensively from the expected results, this indicates high risk.
     - When the prompt doesn't give a consistent performance across various tests.
     - A high risk is indicated when the prompt is susceptible to breaking, especially when the output is expected to be
     of a specific type.
-    **Strengths:**
+    ### Strengths
     - The robustness test helps to ensure stable performance of the LLM prompts and lowers the chances of generating
     unexpected or off-target outputs.
     - This test is vital for applications where predictability and reliability of the LLM’s output are crucial.
-    **Limitations:**
+    ### Limitations
     - Currently, the test only supports single-variable prompts, which restricts its application to more complex models.
     - When there are too many target classes (over 10), the test is skipped, which can leave potential vulnerabilities

validmind/tests/prompt_validation/Specificity.py CHANGED

@@ -27,40 +27,42 @@ from .ai_powered_test import (
 @dataclass
 class Specificity(ThresholdTest):
     """
-    Evaluates and scores the specificity of prompts provided to a Large Language Model (LLM), based on clarity,
-    detail, and relevance.
+    Evaluates and scores the specificity of prompts provided to a Large Language Model (LLM), based on clarity, detail,
+    and relevance.
+    ### Purpose
-    **Purpose:**
     The Specificity Test evaluates the clarity, precision, and effectiveness of the prompts provided to a Language
-    Learning Model (LLM). It aims to ensure that the instructions embedded in a prompt are indisputably clear and
-    relevant, thereby helping to yank out ambiguity and steer the LLM towards desired outputs. This level of
-    specificity significantly affects the accuracy and relevance of LLM outputs.
+    Model (LLM). It aims to ensure that the instructions embedded in a prompt are indisputably clear and relevant,
+    thereby helping to remove ambiguity and steer the LLM towards desired outputs. This level of specificity
+    significantly affects the accuracy and relevance of LLM outputs.
+    ### Test Mechanism
-    **Test Mechanism:**
     The Specificity Test employs an LLM to grade each prompt based on clarity, detail, and relevance parameters within
     a specificity scale that extends from 1 to 10. On this scale, prompts scoring equal to or more than a predefined
     threshold (set to 7 by default) pass the evaluation, while those scoring below this threshold fail it. Users can
     adjust this threshold as per their requirements.
-    **Signs of High Risk:**
+    ### Signs of High Risk
     - Prompts scoring consistently below the established threshold
     - Vague or ambiguous prompts that do not provide clear direction to the LLM
     - Overly verbose prompts that may confuse the LLM instead of providing clear guidance
-    **Strengths:**
+    ### Strengths
     - Enables precise and clear communication with the LLM to achieve desired outputs
     - Serves as a crucial means to measure the effectiveness of prompts
     - Highly customizable, allowing users to set their threshold based on specific use cases
-    **Limitations:**
+    ### Limitations
     - This test doesn't consider the content comprehension capability of the LLM
     - High specificity score doesn't guarantee a high-quality response from the LLM, as the model's performance is also
     dependent on various other factors
     - Striking a balance between specificity and verbosity can be challenging, as overly detailed prompts might confuse
-    or mislead the model.
+    or mislead the model
     """
     name = "specificity"

validmind/tests/prompt_validation/ai_powered_test.py CHANGED

@@ -5,6 +5,7 @@
 import re
 from validmind.ai.utils import get_client_and_model
+from validmind.client_config import client_config
 missing_prompt_message = """
 Cannot run prompt validation tests on a model with no prompt.
@@ -24,6 +25,11 @@ def call_model(
     system_prompt: str, user_prompt: str, temperature: float = 0.0, seed: int = 42
 ):
     """Call LLM with the given prompts and return the response"""
+    if not client_config.can_generate_llm_test_descriptions():
+        raise ValueError(
+            "LLM based descriptions are not enabled for your organization."
+        )
     client, model = get_client_and_model()
     return (
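The guard added to `call_model` in this hunk fails fast before any LLM client is created. Its effect can be shown with a stand-in configuration object. Everything below except the method name `can_generate_llm_test_descriptions` and the error message is hypothetical: `StubClientConfig` and `call_model_guarded` are illustrative names, and the real `client_config` object and LLM call are not reproduced.

```python
class StubClientConfig:
    """Hypothetical stand-in for validmind.client_config.client_config."""

    def __init__(self, llm_enabled: bool):
        self.llm_enabled = llm_enabled

    def can_generate_llm_test_descriptions(self) -> bool:
        return self.llm_enabled


def call_model_guarded(client_config, system_prompt: str, user_prompt: str) -> str:
    """Mimic the guard from the diff, then return a stubbed response."""
    if not client_config.can_generate_llm_test_descriptions():
        raise ValueError(
            "LLM based descriptions are not enabled for your organization."
        )
    # The real function would call the LLM here; this sketch only echoes the prompt.
    return f"(stubbed response to: {user_prompt})"


enabled_result = call_model_guarded(StubClientConfig(True), "system", "Rate this prompt")

try:
    call_model_guarded(StubClientConfig(False), "system", "Rate this prompt")
    disabled_error = None
except ValueError as exc:
    disabled_error = str(exc)
```

Because the prompt validation tests in this diff all go through `call_model`, the check would presumably cause each of them to raise this `ValueError` when the feature is disabled for the organization.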
