validmind-2.1.1.tar.gz → validmind-2.2.4.tar.gz

This diff shows the contents of two publicly released versions of this package, as published to a supported registry. It is provided for informational purposes only and reflects the changes between the two versions as they appear in that registry.
Files changed (324)
  1. {validmind-2.1.1 → validmind-2.2.4}/PKG-INFO +5 -3
  2. {validmind-2.1.1 → validmind-2.2.4}/pyproject.toml +5 -2
  3. validmind-2.2.4/validmind/__version__.py +1 -0
  4. {validmind-2.1.1 → validmind-2.2.4}/validmind/ai.py +72 -49
  5. {validmind-2.1.1 → validmind-2.2.4}/validmind/api_client.py +42 -16
  6. {validmind-2.1.1 → validmind-2.2.4}/validmind/client.py +68 -25
  7. validmind-2.2.4/validmind/datasets/llm/rag/__init__.py +11 -0
  8. validmind-2.2.4/validmind/datasets/llm/rag/datasets/rfp_existing_questions_client_1.csv +30 -0
  9. validmind-2.2.4/validmind/datasets/llm/rag/datasets/rfp_existing_questions_client_2.csv +30 -0
  10. validmind-2.2.4/validmind/datasets/llm/rag/datasets/rfp_existing_questions_client_3.csv +53 -0
  11. validmind-2.2.4/validmind/datasets/llm/rag/datasets/rfp_existing_questions_client_4.csv +53 -0
  12. validmind-2.2.4/validmind/datasets/llm/rag/datasets/rfp_existing_questions_client_5.csv +53 -0
  13. validmind-2.2.4/validmind/datasets/llm/rag/rfp.py +41 -0
  14. {validmind-2.1.1 → validmind-2.2.4}/validmind/errors.py +1 -1
  15. validmind-2.2.4/validmind/html_templates/content_blocks.py +140 -0
  16. {validmind-2.1.1 → validmind-2.2.4}/validmind/models/__init__.py +7 -4
  17. {validmind-2.1.1 → validmind-2.2.4}/validmind/models/foundation.py +8 -34
  18. validmind-2.2.4/validmind/models/function.py +51 -0
  19. {validmind-2.1.1 → validmind-2.2.4}/validmind/models/huggingface.py +16 -46
  20. validmind-2.2.4/validmind/models/metadata.py +42 -0
  21. validmind-2.2.4/validmind/models/pipeline.py +66 -0
  22. {validmind-2.1.1 → validmind-2.2.4}/validmind/models/pytorch.py +8 -42
  23. {validmind-2.1.1 → validmind-2.2.4}/validmind/models/r_model.py +33 -82
  24. {validmind-2.1.1 → validmind-2.2.4}/validmind/models/sklearn.py +39 -38
  25. {validmind-2.1.1 → validmind-2.2.4}/validmind/template.py +8 -26
  26. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/__init__.py +43 -20
  27. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/ANOVAOneWayTable.py +1 -1
  28. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/ChiSquaredFeaturesTable.py +1 -1
  29. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/DescriptiveStatistics.py +2 -4
  30. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/Duplicates.py +1 -1
  31. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/IsolationForestOutliers.py +2 -2
  32. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/LaggedCorrelationHeatmap.py +1 -1
  33. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TargetRateBarPlots.py +1 -1
  34. validmind-2.2.4/validmind/tests/data_validation/nlp/LanguageDetection.py +59 -0
  35. validmind-2.2.4/validmind/tests/data_validation/nlp/PolarityAndSubjectivity.py +48 -0
  36. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/nlp/Punctuations.py +11 -12
  37. validmind-2.2.4/validmind/tests/data_validation/nlp/Sentiment.py +57 -0
  38. validmind-2.2.4/validmind/tests/data_validation/nlp/Toxicity.py +45 -0
  39. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/decorator.py +12 -7
  40. validmind-2.2.4/validmind/tests/model_validation/BertScore.py +119 -0
  41. validmind-2.2.4/validmind/tests/model_validation/BleuScore.py +107 -0
  42. validmind-2.2.4/validmind/tests/model_validation/ContextualRecall.py +93 -0
  43. validmind-2.2.4/validmind/tests/model_validation/MeteorScore.py +104 -0
  44. validmind-2.2.4/validmind/tests/model_validation/RegardScore.py +125 -0
  45. validmind-2.2.4/validmind/tests/model_validation/RougeScore.py +118 -0
  46. validmind-2.2.4/validmind/tests/model_validation/TokenDisparity.py +103 -0
  47. validmind-2.2.4/validmind/tests/model_validation/ToxicityScore.py +133 -0
  48. validmind-2.2.4/validmind/tests/model_validation/embeddings/CosineSimilarityComparison.py +96 -0
  49. validmind-2.2.4/validmind/tests/model_validation/embeddings/CosineSimilarityHeatmap.py +71 -0
  50. validmind-2.2.4/validmind/tests/model_validation/embeddings/EuclideanDistanceComparison.py +92 -0
  51. validmind-2.2.4/validmind/tests/model_validation/embeddings/EuclideanDistanceHeatmap.py +69 -0
  52. validmind-2.2.4/validmind/tests/model_validation/embeddings/PCAComponentsPairwisePlots.py +78 -0
  53. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/StabilityAnalysis.py +35 -23
  54. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/StabilityAnalysisKeyword.py +3 -0
  55. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/StabilityAnalysisRandomNoise.py +7 -1
  56. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/StabilityAnalysisSynonyms.py +3 -0
  57. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/StabilityAnalysisTranslation.py +3 -0
  58. validmind-2.2.4/validmind/tests/model_validation/embeddings/TSNEComponentsPairwisePlots.py +99 -0
  59. validmind-2.2.4/validmind/tests/model_validation/ragas/AnswerCorrectness.py +131 -0
  60. validmind-2.2.4/validmind/tests/model_validation/ragas/AnswerRelevance.py +134 -0
  61. validmind-2.2.4/validmind/tests/model_validation/ragas/AnswerSimilarity.py +119 -0
  62. validmind-2.2.4/validmind/tests/model_validation/ragas/AspectCritique.py +167 -0
  63. validmind-2.2.4/validmind/tests/model_validation/ragas/ContextEntityRecall.py +133 -0
  64. validmind-2.2.4/validmind/tests/model_validation/ragas/ContextPrecision.py +123 -0
  65. validmind-2.2.4/validmind/tests/model_validation/ragas/ContextRecall.py +123 -0
  66. validmind-2.2.4/validmind/tests/model_validation/ragas/ContextRelevancy.py +114 -0
  67. validmind-2.2.4/validmind/tests/model_validation/ragas/Faithfulness.py +119 -0
  68. validmind-2.2.4/validmind/tests/model_validation/ragas/utils.py +66 -0
  69. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/OverfitDiagnosis.py +3 -7
  70. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/PermutationFeatureImportance.py +8 -9
  71. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/PopulationStabilityIndex.py +5 -10
  72. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/PrecisionRecallCurve.py +3 -2
  73. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ROCCurve.py +2 -1
  74. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/RegressionR2Square.py +1 -1
  75. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/RobustnessDiagnosis.py +2 -3
  76. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/SHAPGlobalImportance.py +7 -11
  77. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/WeakspotsDiagnosis.py +3 -4
  78. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelForecastPlot.py +1 -1
  79. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelForecastPlotLevels.py +1 -1
  80. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelInsampleComparison.py +1 -1
  81. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelOutsampleComparison.py +1 -1
  82. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelSummary.py +1 -1
  83. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelsCoeffs.py +1 -1
  84. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelsPerformance.py +1 -1
  85. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/ScorecardHistogram.py +5 -6
  86. validmind-2.2.4/validmind/tests/prompt_validation/__init__.py +0 -0
  87. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/__init__.py +26 -49
  88. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/composite.py +13 -7
  89. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/sklearn/AdjustedRSquaredScore.py +1 -1
  90. {validmind-2.1.1 → validmind-2.2.4}/validmind/utils.py +99 -6
  91. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/__init__.py +1 -1
  92. validmind-2.2.4/validmind/vm_models/dataset/__init__.py +7 -0
  93. validmind-2.2.4/validmind/vm_models/dataset/dataset.py +560 -0
  94. validmind-2.2.4/validmind/vm_models/dataset/utils.py +146 -0
  95. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/model.py +97 -72
  96. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/metric.py +9 -24
  97. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/result_wrapper.py +124 -28
  98. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/threshold_test.py +10 -28
  99. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test_context.py +1 -1
  100. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test_suite/summary.py +3 -4
  101. validmind-2.1.1/validmind/__version__.py +0 -1
  102. validmind-2.1.1/validmind/html_templates/content_blocks.py +0 -65
  103. validmind-2.1.1/validmind/models/catboost.py +0 -33
  104. validmind-2.1.1/validmind/models/statsmodels.py +0 -50
  105. validmind-2.1.1/validmind/models/xgboost.py +0 -30
  106. validmind-2.1.1/validmind/tests/model_validation/BertScore.py +0 -117
  107. validmind-2.1.1/validmind/tests/model_validation/BertScoreAggregate.py +0 -90
  108. validmind-2.1.1/validmind/tests/model_validation/BleuScore.py +0 -78
  109. validmind-2.1.1/validmind/tests/model_validation/ContextualRecall.py +0 -110
  110. validmind-2.1.1/validmind/tests/model_validation/MeteorScore.py +0 -92
  111. validmind-2.1.1/validmind/tests/model_validation/RegardHistogram.py +0 -148
  112. validmind-2.1.1/validmind/tests/model_validation/RegardScore.py +0 -143
  113. validmind-2.1.1/validmind/tests/model_validation/RougeMetrics.py +0 -147
  114. validmind-2.1.1/validmind/tests/model_validation/RougeMetricsAggregate.py +0 -133
  115. validmind-2.1.1/validmind/tests/model_validation/SelfCheckNLIScore.py +0 -112
  116. validmind-2.1.1/validmind/tests/model_validation/TokenDisparity.py +0 -140
  117. validmind-2.1.1/validmind/tests/model_validation/ToxicityHistogram.py +0 -136
  118. validmind-2.1.1/validmind/tests/model_validation/ToxicityScore.py +0 -147
  119. validmind-2.1.1/validmind/vm_models/dataset.py +0 -1303
  120. {validmind-2.1.1 → validmind-2.2.4}/LICENSE +0 -0
  121. {validmind-2.1.1 → validmind-2.2.4}/README.pypi.md +0 -0
  122. {validmind-2.1.1 → validmind-2.2.4}/validmind/__init__.py +0 -0
  123. {validmind-2.1.1 → validmind-2.2.4}/validmind/client_config.py +0 -0
  124. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/__init__.py +0 -0
  125. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/classification/__init__.py +0 -0
  126. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/classification/customer_churn.py +0 -0
  127. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/classification/datasets/bank_customer_churn.csv +0 -0
  128. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/classification/datasets/taiwan_credit.csv +0 -0
  129. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/classification/taiwan_credit.py +0 -0
  130. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/cluster/digits.py +0 -0
  131. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/credit_risk/__init__.py +0 -0
  132. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/credit_risk/datasets/lending_club_loan_data_2007_2014_clean.csv.gz +0 -0
  133. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/credit_risk/lending_club.py +0 -0
  134. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/__init__.py +0 -0
  135. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/cnn_dailymail.py +0 -0
  136. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/datasets/Covid_19.csv +0 -0
  137. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/datasets/cnn_dailymail_100_with_predictions.csv +0 -0
  138. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/datasets/cnn_dailymail_500_with_predictions.csv +0 -0
  139. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/datasets/sentiments_with_predictions.csv +0 -0
  140. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/nlp/twitter_covid_19.py +0 -0
  141. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/__init__.py +0 -0
  142. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/california_housing.py +0 -0
  143. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/CPIAUCSL.csv +0 -0
  144. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/CSUSHPISA.csv +0 -0
  145. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/DRSFRMACBS.csv +0 -0
  146. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/FEDFUNDS.csv +0 -0
  147. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/GDP.csv +0 -0
  148. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/GDPC1.csv +0 -0
  149. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/GS10.csv +0 -0
  150. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/GS3.csv +0 -0
  151. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/GS5.csv +0 -0
  152. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/MORTGAGE30US.csv +0 -0
  153. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred/UNRATE.csv +0 -0
  154. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred_loan_rates.csv +0 -0
  155. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred_loan_rates_test_1.csv +0 -0
  156. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred_loan_rates_test_2.csv +0 -0
  157. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred_loan_rates_test_3.csv +0 -0
  158. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred_loan_rates_test_4.csv +0 -0
  159. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/fred_loan_rates_test_5.csv +0 -0
  160. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/datasets/lending_club_loan_rates.csv +0 -0
  161. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/fred.py +0 -0
  162. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/lending_club.py +0 -0
  163. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/models/fred_loan_rates_model_1.pkl +0 -0
  164. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/models/fred_loan_rates_model_2.pkl +0 -0
  165. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/models/fred_loan_rates_model_3.pkl +0 -0
  166. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/models/fred_loan_rates_model_4.pkl +0 -0
  167. {validmind-2.1.1 → validmind-2.2.4}/validmind/datasets/regression/models/fred_loan_rates_model_5.pkl +0 -0
  168. {validmind-2.1.1/validmind/tests/data_validation → validmind-2.2.4/validmind/html_templates}/__init__.py +0 -0
  169. {validmind-2.1.1 → validmind-2.2.4}/validmind/input_registry.py +0 -0
  170. {validmind-2.1.1 → validmind-2.2.4}/validmind/logging.py +0 -0
  171. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/__init__.py +0 -0
  172. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/classifier.py +0 -0
  173. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/cluster.py +0 -0
  174. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/embeddings.py +0 -0
  175. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/llm.py +0 -0
  176. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/nlp.py +0 -0
  177. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/parameters_optimization.py +0 -0
  178. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/regression.py +0 -0
  179. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/statsmodels_timeseries.py +0 -0
  180. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/summarization.py +0 -0
  181. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/tabular_datasets.py +0 -0
  182. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/text_data.py +0 -0
  183. {validmind-2.1.1 → validmind-2.2.4}/validmind/test_suites/time_series.py +0 -0
  184. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/ACFandPACFPlot.py +0 -0
  185. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/AutoAR.py +0 -0
  186. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/AutoMA.py +0 -0
  187. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/AutoSeasonality.py +0 -0
  188. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/AutoStationarity.py +0 -0
  189. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/BivariateFeaturesBarPlots.py +0 -0
  190. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/BivariateHistograms.py +0 -0
  191. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/BivariateScatterPlots.py +0 -0
  192. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/ClassImbalance.py +0 -0
  193. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/DatasetDescription.py +0 -0
  194. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/DatasetSplit.py +0 -0
  195. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/DefaultRatesbyRiskBandPlot.py +0 -0
  196. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/EngleGrangerCoint.py +0 -0
  197. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/FeatureTargetCorrelationPlot.py +0 -0
  198. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/HeatmapFeatureCorrelations.py +0 -0
  199. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/HighCardinality.py +0 -0
  200. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/HighPearsonCorrelation.py +0 -0
  201. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/IQROutliersBarPlot.py +0 -0
  202. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/IQROutliersTable.py +0 -0
  203. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/MissingValues.py +0 -0
  204. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/MissingValuesBarPlot.py +0 -0
  205. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/MissingValuesRisk.py +0 -0
  206. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/PearsonCorrelationMatrix.py +0 -0
  207. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/PiTCreditScoresHistogram.py +0 -0
  208. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/PiTPDHistogram.py +0 -0
  209. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/RollingStatsPlot.py +0 -0
  210. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/ScatterPlot.py +0 -0
  211. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/SeasonalDecompose.py +0 -0
  212. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/Skewness.py +0 -0
  213. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/SpreadPlot.py +0 -0
  214. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TabularCategoricalBarPlots.py +0 -0
  215. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TabularDateTimeHistograms.py +0 -0
  216. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TabularDescriptionTables.py +0 -0
  217. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TabularNumericalHistograms.py +0 -0
  218. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TimeSeriesFrequency.py +0 -0
  219. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TimeSeriesHistogram.py +0 -0
  220. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TimeSeriesLinePlot.py +0 -0
  221. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TimeSeriesMissingValues.py +0 -0
  222. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TimeSeriesOutliers.py +0 -0
  223. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/TooManyZeroValues.py +0 -0
  224. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/UniqueRows.py +0 -0
  225. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/WOEBinPlots.py +0 -0
  226. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/WOEBinTable.py +0 -0
  227. {validmind-2.1.1/validmind/tests/data_validation/nlp → validmind-2.2.4/validmind/tests/data_validation}/__init__.py +0 -0
  228. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/nlp/CommonWords.py +0 -0
  229. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/nlp/Hashtags.py +0 -0
  230. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/nlp/Mentions.py +0 -0
  231. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/nlp/StopWords.py +0 -0
  232. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/data_validation/nlp/TextDescription.py +0 -0
  233. {validmind-2.1.1/validmind/tests/model_validation → validmind-2.2.4/validmind/tests/data_validation/nlp}/__init__.py +0 -0
  234. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/ClusterSizeDistribution.py +0 -0
  235. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/FeaturesAUC.py +0 -0
  236. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/ModelMetadata.py +0 -0
  237. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/RegressionResidualsPlot.py +0 -0
  238. {validmind-2.1.1/validmind/tests/model_validation/sklearn → validmind-2.2.4/validmind/tests/model_validation}/__init__.py +0 -0
  239. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/ClusterDistribution.py +0 -0
  240. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/CosineSimilarityDistribution.py +0 -0
  241. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/DescriptiveAnalytics.py +0 -0
  242. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/embeddings/EmbeddingsVisualization2D.py +0 -0
  243. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/AdjustedMutualInformation.py +0 -0
  244. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/AdjustedRandIndex.py +0 -0
  245. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ClassifierPerformance.py +0 -0
  246. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ClusterCosineSimilarity.py +0 -0
  247. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ClusterPerformance.py +0 -0
  248. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ClusterPerformanceMetrics.py +0 -0
  249. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/CompletenessScore.py +0 -0
  250. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ConfusionMatrix.py +0 -0
  251. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/FowlkesMallowsScore.py +0 -0
  252. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/HomogeneityScore.py +0 -0
  253. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/HyperParametersTuning.py +0 -0
  254. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/KMeansClustersOptimization.py +0 -0
  255. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/MinimumAccuracy.py +0 -0
  256. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/MinimumF1Score.py +0 -0
  257. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/MinimumROCAUCScore.py +0 -0
  258. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/ModelsPerformanceComparison.py +0 -0
  259. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/RegressionErrors.py +0 -0
  260. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/RegressionModelsPerformanceComparison.py +0 -0
  261. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/SilhouettePlot.py +0 -0
  262. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/TrainingTestDegradation.py +0 -0
  263. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/sklearn/VMeasure.py +0 -0
  264. {validmind-2.1.1/validmind/tests/model_validation/statsmodels → validmind-2.2.4/validmind/tests/model_validation/sklearn}/__init__.py +0 -0
  265. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/ADF.py +0 -0
  266. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/ADFTest.py +0 -0
  267. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/AutoARIMA.py +0 -0
  268. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/BoxPierce.py +0 -0
  269. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/CumulativePredictionProbabilities.py +0 -0
  270. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/DFGLSArch.py +0 -0
  271. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/DurbinWatsonTest.py +0 -0
  272. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/FeatureImportanceAndSignificance.py +0 -0
  273. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/GINITable.py +0 -0
  274. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/JarqueBera.py +0 -0
  275. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/KPSS.py +0 -0
  276. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/KolmogorovSmirnov.py +0 -0
  277. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/LJungBox.py +0 -0
  278. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/Lilliefors.py +0 -0
  279. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/PDRatingClassPlot.py +0 -0
  280. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/PhillipsPerronArch.py +0 -0
  281. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/PredictionProbabilitiesHistogram.py +0 -0
  282. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionCoeffsPlot.py +0 -0
  283. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionFeatureSignificance.py +0 -0
  284. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionModelSensitivityPlot.py +0 -0
  285. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RegressionPermutationFeatureImportance.py +0 -0
  286. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/ResidualsVisualInspection.py +0 -0
  287. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/RunsTest.py +0 -0
  288. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/ShapiroWilk.py +0 -0
  289. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/ZivotAndrewsArch.py +0 -0
  290. {validmind-2.1.1/validmind/tests/prompt_validation → validmind-2.2.4/validmind/tests/model_validation/statsmodels}/__init__.py +0 -0
  291. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/model_validation/statsmodels/statsutils.py +0 -0
  292. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/Bias.py +0 -0
  293. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/Clarity.py +0 -0
  294. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/Conciseness.py +0 -0
  295. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/Delimitation.py +0 -0
  296. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/NegativeInstruction.py +0 -0
  297. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/Robustness.py +0 -0
  298. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/Specificity.py +0 -0
  299. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/prompt_validation/ai_powered_test.py +0 -0
  300. {validmind-2.1.1 → validmind-2.2.4}/validmind/tests/test_providers.py +0 -0
  301. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/classification/sklearn/Accuracy.py +0 -0
  302. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/classification/sklearn/F1.py +0 -0
  303. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/classification/sklearn/Precision.py +0 -0
  304. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/classification/sklearn/ROC_AUC.py +0 -0
  305. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/classification/sklearn/Recall.py +0 -0
  306. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/GiniCoefficient.py +0 -0
  307. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/HuberLoss.py +0 -0
  308. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/KolmogorovSmirnovStatistic.py +0 -0
  309. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/MeanAbsolutePercentageError.py +0 -0
  310. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/MeanBiasDeviation.py +0 -0
  311. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/QuantileLoss.py +0 -0
  312. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/sklearn/MeanAbsoluteError.py +0 -0
  313. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/sklearn/MeanSquaredError.py +0 -0
  314. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/sklearn/RSquaredScore.py +0 -0
  315. {validmind-2.1.1 → validmind-2.2.4}/validmind/unit_metrics/regression/sklearn/RootMeanSquaredError.py +0 -0
  316. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/figure.py +0 -0
  317. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/metric_result.py +0 -0
  318. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/output_template.py +0 -0
  319. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/result_summary.py +0 -0
  320. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/test.py +0 -0
  321. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test/threshold_test_result.py +0 -0
  322. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test_suite/runner.py +0 -0
  323. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test_suite/test.py +0 -0
  324. {validmind-2.1.1 → validmind-2.2.4}/validmind/vm_models/test_suite/test_suite.py +0 -0
{validmind-2.1.1 → validmind-2.2.4}/PKG-INFO
@@ -1,14 +1,13 @@
 Metadata-Version: 2.1
 Name: validmind
-Version: 2.1.1
+Version: 2.2.4
 Summary: ValidMind Developer Framework
 License: Commercial License
 Author: Andres Rodriguez
 Author-email: andres@validmind.ai
-Requires-Python: >=3.8,<3.12
+Requires-Python: >=3.8.1,<3.12
 Classifier: License :: Other/Proprietary License
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.8
 Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
@@ -26,6 +25,7 @@ Requires-Dist: evaluate (>=0.4.0,<0.5.0)
 Requires-Dist: ipywidgets (>=8.0.6,<9.0.0)
 Requires-Dist: kaleido (>=0.2.1,<0.3.0,!=0.2.1.post1)
 Requires-Dist: langdetect (>=1.0.9,<2.0.0)
+Requires-Dist: latex2mathml (>=3.77.0,<4.0.0)
 Requires-Dist: levenshtein (>=0.21.1,<0.22.0) ; extra == "all" or extra == "llm"
 Requires-Dist: llvmlite (>=0.42.0) ; python_version >= "3.12"
 Requires-Dist: llvmlite ; python_version >= "3.8" and python_full_version <= "3.11.0"
@@ -43,6 +43,7 @@ Requires-Dist: polars (>=0.20.15,<0.21.0)
 Requires-Dist: pycocoevalcap (>=1.2,<2.0) ; extra == "all" or extra == "llm"
 Requires-Dist: pypmml (>=0.9.17,<0.10.0)
 Requires-Dist: python-dotenv (>=0.20.0,<0.21.0)
+Requires-Dist: ragas (>=0.1.7,<0.2.0)
 Requires-Dist: rouge (>=1.0.1,<2.0.0)
 Requires-Dist: rpy2 (>=3.5.10,<4.0.0) ; extra == "all" or extra == "r-support"
 Requires-Dist: scikit-learn (>=1.0.2,<2.0.0)
@@ -55,6 +56,7 @@ Requires-Dist: sentry-sdk (>=1.24.0,<2.0.0)
 Requires-Dist: shap (>=0.42.0,<0.43.0)
 Requires-Dist: statsmodels (>=0.13.5,<0.14.0)
 Requires-Dist: tabulate (>=0.8.9,<0.9.0)
+Requires-Dist: textblob (>=0.18.0.post0,<0.19.0)
 Requires-Dist: textstat (>=0.7.3,<0.8.0)
 Requires-Dist: torch (>=1.10.0) ; extra == "all" or extra == "llm" or extra == "pytorch"
 Requires-Dist: torchmetrics (>=1.1.1,<2.0.0) ; extra == "all" or extra == "llm"
{validmind-2.1.1 → validmind-2.2.4}/pyproject.toml
@@ -10,9 +10,10 @@ description = "ValidMind Developer Framework"
 license = "Commercial License"
 name = "validmind"
 readme = "README.pypi.md"
-version = "2.1.1"
+version = "2.2.4"
 
 [tool.poetry.dependencies]
+python = ">=3.8.1,<3.12"
 aiohttp = {extras = ["speedups"], version = "^3.8.4"}
 arch = "^5.4.0"
 bert-score = "^0.3.13"
@@ -22,6 +23,7 @@ evaluate = "^0.4.0"
 ipywidgets = "^8.0.6"
 kaleido = "^0.2.1,!=0.2.1.post1"
 langdetect = "^1.0.9"
+latex2mathml = "^3.77.0"
 levenshtein = {version = "^0.21.1", optional = true}
 llvmlite = [
   {version = "*", python = ">=3.8,<=3.11"},
@@ -42,8 +44,8 @@ plotly-express = "^0.4.1"
 polars = "^0.20.15"
 pycocoevalcap = {version = "^1.2", optional = true}
 pypmml = "^0.9.17"
-python = ">=3.8,<3.12"
 python-dotenv = "^0.20.0"
+ragas = "^0.1.7"
 rouge = "^1.0.1"
 rpy2 = {version = "^3.5.10", optional = true}
 scikit-learn = "^1.0.2"
@@ -58,6 +60,7 @@ sentry-sdk = "^1.24.0"
 shap = "^0.42.0"
 statsmodels = "^0.13.5"
 tabulate = "^0.8.9"
+textblob = "^0.18.0.post0"
 textstat = "^0.7.3"
 torch = {version = ">=1.10.0", optional = true}
 torchmetrics = {version = "^1.1.1", optional = true}
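Two packaging changes stand out here: the `python` constraint moves into the dependency table with a raised floor (3.8.0 is no longer allowed), and `latex2mathml`, `ragas`, and `textblob` become hard dependencies. A minimal pre-install interpreter check, as a sketch:

```python
# Sketch: verify the interpreter satisfies the tightened constraint
# python = ">=3.8.1,<3.12" before installing validmind 2.2.4.
import sys

assert (3, 8, 1) <= sys.version_info[:3] < (3, 12), (
    f"validmind 2.2.4 requires Python >=3.8.1,<3.12; found {sys.version.split()[0]}"
)
```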
validmind-2.2.4/validmind/__version__.py
@@ -0,0 +1 @@
+__version__ = "2.2.4"
{validmind-2.1.1 → validmind-2.2.4}/validmind/ai.py
@@ -8,45 +8,65 @@ import os
 from openai import AzureOpenAI, OpenAI
 
 SYSTEM_PROMPT = """
-You are an expert data scientist and MRM specialist tasked with providing concise and'
-objective insights based on the results of quantitative model or dataset analysis.
+You are an expert data scientist and MRM specialist.
+You are tasked with analyzing the results of a quantitative test run on some model or dataset.
+Your goal is to create a test description that will act as part of the model documentation.
+You will provide both the developer and other consumers of the documentation with a clear and concise "interpretation" of the results they will see.
+The overarching theme to maintain is MRM documentation.
 
-Examine the provided statistical test results and compose a brief summary. Highlight crucial
-insights, focusing on the distribution characteristics, central tendencies (such as mean or median),
-and the variability (including standard deviation and range) of the metrics. Evaluate how
-these statistics might influence the development and performance of a predictive model. Identify
-and explain any discernible trends or anomalies in the test results.
-
-Your analysis will act as the description of the result in the model documentation.
+Examine the provided statistical test results and compose a description of the results.
+This will act as the description and interpretation of the result in the model documentation.
+It will be displayed alongside the test results table and figures.
 
 Avoid long sentences and complex vocabulary.
 Structure the response clearly and logically.
-Use valid Markdown syntax to format the response (tables are supported).
+Use valid Markdown syntax to format the response.
+Respond only with your analysis and insights, not the verbatim test results.
+Respond only with the markdown content, no explanation or context for your response is necessary.
 Use the Test ID that is provided to form the Test Name e.g. "ClassImbalance" -> "Class Imbalance".
+
+Explain the test, its purpose, its mechanism/formula etc and why it is useful.
+If relevant, provide a very brief description of the way this test is used in model/dataset evaluation and how it is interpreted.
+Highlight the key insights from the test results. The key insights should be concise and easily understood.
+End the response with any closing remarks, summary or additional useful information.
+
 Use the following format for the response (feel free to modify slightly if necessary):
 ```
-**<Test Name>** <continue to explain what it does in detail>...
+**<Test Name>** calculates the xyz <continue to explain what it does in detail>...
+
+This test is useful for <explain why and for what this test is useful>...
 
-The results of this test <detailed explanation of the results>...
+**Key Insights:**
 
-In summary the following key insights can be gained:
+The following key insights can be identified in the test results:
 
-- **<key insight 1 - title>**: <explanation of key insight 1>
+- **<key insight 1 - title>**: <concise explanation of key insight 1>
 - ...<continue with any other key insights using the same format>
 ```
 It is very important that the text is nicely formatted and contains enough information to be useful to the user as documentation.
 """.strip()
+
+
 USER_PROMPT = """
-Test ID: {test_name}
-Test Description: {test_description}
-Test Results (the raw results of the test):
-{test_results}
-Test Summary (what the user sees in the documentation):
+Test ID: `{test_name}`
+
+<Test Docstring>
+{test_description}
+</Test Docstring>
+
+<Test Results Summary>
 {test_summary}
+</Test Results Summary>
 """.strip()
+
+
 USER_PROMPT_FIGURES = """
-Test ID: {test_name}
-Test Description: {test_description}
+Test ID: `{test_name}`
+
+<Test Docstring>
+{test_description}
+</Test Docstring>
+
 The attached plots show the results of the test.
 """.strip()
 
@@ -67,7 +87,7 @@ def __get_client_and_model():
 
     if "OPENAI_API_KEY" in os.environ:
         __client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
-        __model = os.environ.get("VM_OPENAI_MODEL", "gpt-4-turbo")
+        __model = os.environ.get("VM_OPENAI_MODEL", "gpt-4o")
 
     elif "AZURE_OPENAI_KEY" in os.environ:
         if "AZURE_OPENAI_ENDPOINT" not in os.environ:
@@ -113,22 +133,41 @@ class DescriptionFuture:
 def generate_description_async(
     test_name: str,
     test_description: str,
-    test_results: str,
     test_summary: str,
     figures: list = None,
 ):
     """Generate the description for the test results"""
-    client, _ = __get_client_and_model()
+    if not test_summary and not figures:
+        raise ValueError("No summary or figures provided - cannot generate description")
 
+    client, _ = __get_client_and_model()
     # get last part of test id
     test_name = test_name.split(".")[-1]
 
-    if not test_results and not test_summary:
-        if not figures:
-            raise ValueError("No results, summary or figures provided")
+    if test_summary:
+        return (
+            client.chat.completions.create(
+                model="gpt-4o",
+                messages=[
+                    {"role": "system", "content": SYSTEM_PROMPT},
+                    {
+                        "role": "user",
+                        "content": USER_PROMPT.format(
+                            test_name=test_name,
+                            test_description=test_description,
+                            test_summary=test_summary,
+                        ),
+                    },
+                ],
+            )
+            .choices[0]
+            .message.content.strip("```")
+            .strip()
+        )
 
-        response = client.chat.completions.create(
-            model="gpt-4-turbo",
+    return (
+        client.chat.completions.create(
+            model="gpt-4o",
             messages=[
                 {"role": "system", "content": SYSTEM_PROMPT},
                 {
@@ -154,30 +193,15 @@ def generate_description_async(
                 },
             ],
         )
-    else:
-        response = client.chat.completions.create(
-            model="gpt-4-turbo",
-            messages=[
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {
-                    "role": "user",
-                    "content": USER_PROMPT.format(
-                        test_name=test_name,
-                        test_description=test_description,
-                        test_results=test_results,
-                        test_summary=test_summary,
-                    ),
-                },
-            ],
-        )
-
-    return response.choices[0].message.content.strip("```").strip()
+        .choices[0]
+        .message.content.strip("```")
+        .strip()
+    )
 
 
 def generate_description(
     test_name: str,
     test_description: str,
-    test_results: str,
     test_summary: str,
     figures: list = None,
 ):
@@ -185,7 +209,6 @@ def generate_description(
         generate_description_async,
         test_name,
         test_description,
-        test_results,
         test_summary,
         figures,
    )
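For orientation, here is a hedged sketch of calling the reworked helper directly, based only on the signature visible above (despite its name, `generate_description_async` runs synchronously and returns the markdown string). The test ID and summary payload are made up for illustration:

```python
# Requires OPENAI_API_KEY (or the Azure OpenAI variables) in the environment,
# per __get_client_and_model() above. Test ID and summary are illustrative.
from validmind.ai import generate_description_async

markdown = generate_description_async(
    test_name="validmind.data_validation.ClassImbalance",  # last segment becomes the Test Name
    test_description="Checks for imbalanced classes in the target column.",
    test_summary='[{"class": 0, "pct": 90.0}, {"class": 1, "pct": 10.0}]',  # made-up summary
)
print(markdown)  # markdown interpretation formatted per SYSTEM_PROMPT
```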
{validmind-2.1.1 → validmind-2.2.4}/validmind/api_client.py
@@ -16,14 +16,13 @@ from io import BytesIO
 from typing import Any, Callable, Dict, List, Optional, Tuple, Union
 
 import aiohttp
-import mistune
 import requests
 from aiohttp import FormData
 
 from .client_config import client_config
 from .errors import MissingAPICredentialsError, MissingProjectIdError, raise_api_error
 from .logging import get_logger, init_sentry, send_single_error
-from .utils import NumpyEncoder, run_async
+from .utils import NumpyEncoder, md_to_html, run_async
 from .vm_models import Figure, MetricResult, ThresholdTestResults
 
 # TODO: can't import types from vm_models because of circular dependency
@@ -162,14 +161,20 @@ def __ping() -> Dict[str, Any]:
 
     init_sentry(client_info.get("sentry_config", {}))
 
+    # Only show this confirmation the first time we connect to the API
+    ack_connected = False
+    if client_config.project is None:
+        ack_connected = True
+
     client_config.project = client_info["project"]
     client_config.documentation_template = client_info.get("documentation_template", {})
     client_config.feature_flags = client_info.get("feature_flags", {})
 
-    logger.info(
-        f"Connected to ValidMind. Project: {client_config.project['name']}"
-        f" ({client_config.project['cuid']})"
-    )
+    if ack_connected:
+        logger.info(
+            f"Connected to ValidMind. Project: {client_config.project['name']}"
+            f" ({client_config.project['cuid']})"
+        )
 
 
 def reload():
@@ -344,7 +349,7 @@ async def log_metadata(
     """
     metadata_dict = {"content_id": content_id}
     if text is not None:
-        metadata_dict["text"] = mistune.html(text)
+        metadata_dict["text"] = md_to_html(text, mathml=True)
     if _json is not None:
         metadata_dict["json"] = _json
 
@@ -359,7 +364,11 @@ async def log_metadata(
 
 
 async def log_metrics(
-    metrics: List[MetricResult], inputs: List[str], output_template: str = None
+    metrics: List[MetricResult],
+    inputs: List[str],
+    output_template: str = None,
+    section_id: str = None,
+    position: int = None,
 ) -> Dict[str, Any]:
     """Logs metrics to ValidMind API.
 
@@ -367,6 +376,8 @@ async def log_metrics(
         metrics (list): A list of MetricResult objects
         inputs (list): A list of input keys (names) that were used to run the test
         output_template (str): The optional output template for the test
+        section_id (str): The section ID add a test driven block to the documentation
+        position (int): The position in the section to add the test driven block
 
     Raises:
         Exception: If the API call fails
@@ -374,7 +385,14 @@ async def log_metrics(
     Returns:
         dict: The response from the API
     """
+    params = {}
+    if section_id:
+        params["section_id"] = section_id
+    if position is not None:
+        params["position"] = position
+
     data = []
+
     for metric in metrics:
         metric_data = {
             **metric.serialize(),
@@ -389,6 +407,7 @@ async def log_metrics(
     try:
         return await _post(
             "log_metrics",
+            params=params,
             data=json.dumps(data, cls=NumpyEncoder, allow_nan=False),
         )
     except Exception as e:
@@ -397,7 +416,10 @@ async def log_metrics(
 
 
 async def log_test_result(
-    result: ThresholdTestResults, inputs: List[str], dataset_type: str = "training"
+    result: ThresholdTestResults,
+    inputs: List[str],
+    section_id: str = None,
+    position: int = None,
 ) -> Dict[str, Any]:
     """Logs test results information
 
@@ -407,8 +429,8 @@ async def log_test_result(
     Args:
         result (validmind.ThresholdTestResults): A ThresholdTestResults object
         inputs (list): A list of input keys (names) that were used to run the test
-        dataset_type (str, optional): The type of dataset. Can be one of
-            "training", "test", or "validation". Defaults to "training".
+        section_id (str, optional): The section ID add a test driven block to the documentation
+        position (int): The position in the section to add the test driven block
 
     Raises:
         Exception: If the API call fails
@@ -416,10 +438,16 @@ async def log_test_result(
     Returns:
         dict: The response from the API
     """
+    params = {}
+    if section_id:
+        params["section_id"] = section_id
+    if position is not None:
+        params["position"] = position
+
     try:
         return await _post(
             "log_test_results",
-            params={"dataset_type": dataset_type},
+            params=params,
             data=json.dumps(
                 {
                     **result.serialize(),
@@ -435,7 +463,7 @@ async def log_test_result(
 
 
 def log_test_results(
-    results: List[ThresholdTestResults], inputs, dataset_type: str = "training"
+    results: List[ThresholdTestResults], inputs
 ) -> List[Callable[..., Dict[str, Any]]]:
     """Logs test results information
 
@@ -445,8 +473,6 @@ def log_test_results(
     Args:
         results (list): A list of ThresholdTestResults objects
         inputs (list): A list of input keys (names) that were used to run the test
-        dataset_type (str, optional): The type of dataset. Can be one of "training",
-            "test", or "validation". Defaults to "training".
 
     Raises:
         Exception: If the API call fails
@@ -457,7 +483,7 @@ def log_test_results(
     try:
         responses = []  # TODO: use asyncio.gather
         for result in results:
-            responses.append(run_async(log_test_result, result, inputs, dataset_type))
+            responses.append(run_async(log_test_result, result, inputs))
     except Exception as e:
         logger.error("Error logging test results to ValidMind API")
         raise e
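The practical effect of this change is that `dataset_type` is gone and a logged result can instead be placed into a specific documentation section. A minimal sketch of the new keyword arguments (the section ID below is hypothetical; both kwargs are optional and omitted from the request params when unset):

```python
import asyncio

from validmind import api_client

async def log_to_section(metric_results):
    # Attach the results to a documentation section as a test-driven block.
    return await api_client.log_metrics(
        metrics=metric_results,           # list of MetricResult objects
        inputs=["raw_dataset", "model"],  # input keys used to run the test
        section_id="model_evaluation",    # hypothetical template section ID
        position=0,                       # insert at the top of that section
    )

# asyncio.run(log_to_section(results))  # `results` produced by a test run
```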
{validmind-2.1.1 → validmind-2.2.4}/validmind/client.py
@@ -21,20 +21,20 @@ from .errors import (
 )
 from .input_registry import input_registry
 from .logging import get_logger
+from .models.metadata import MetadataModel
 from .models.r_model import RModel
 from .template import get_template_test_suite
 from .template import preview_template as _preview_template
 from .test_suites import get_by_id as get_test_suite_by_id
 from .utils import get_dataset_info, get_model_info
 from .vm_models import TestInput, TestSuite, TestSuiteRunner
-from .vm_models.dataset import (
-    DataFrameDataset,
-    NumpyDataset,
-    PolarsDataset,
-    TorchDataset,
-    VMDataset,
+from .vm_models.dataset import DataFrameDataset, PolarsDataset, TorchDataset, VMDataset
+from .vm_models.model import (
+    ModelAttributes,
+    VMModel,
+    get_model_class,
+    is_model_metadata,
 )
-from .vm_models.model import VMModel, get_model_class
 
 pd.option_context("format.precision", 2)
 
@@ -129,7 +129,7 @@ def init_dataset(
         )
     elif dataset_class == "ndarray":
         logger.info("Numpy ndarray detected. Initializing VM Dataset instance...")
-        vm_dataset = NumpyDataset(
+        vm_dataset = VMDataset(
            input_id=input_id,
            raw_dataset=dataset,
            model=model,
@@ -175,8 +175,10 @@ def init_dataset(
 
 
 def init_model(
-    model: object,
-    input_id: str = None,
+    model: object = None,
+    input_id: str = "model",
+    attributes: dict = None,
+    predict_fn: callable = None,
     __log=True,
 ) -> VMModel:
     """
@@ -185,14 +187,13 @@ def init_model(
     also ensures we are creating a model supported libraries.
 
     Args:
-        model: A trained model
-        train_ds (vm.vm.Dataset): A training dataset (optional)
-        test_ds (vm.vm.Dataset): A testing dataset (optional)
-        validation_ds (vm.vm.Dataset): A validation dataset (optional)
+        model: A trained model or VMModel instance
         input_id (str): The input ID for the model (e.g. "my_model"). By default,
             this will be set to `model` but if you are passing this model as a
            test input using some other key than `model`, then you should set
            this to the same key.
+        attributes (dict): A dictionary of model attributes
+        predict_fn (callable): A function that takes an input and returns a prediction
 
     Raises:
         ValueError: If the model type is not supported
@@ -200,22 +201,64 @@ def init_model(
     Returns:
         vm.VMModel: A VM Model instance
     """
-    class_obj = get_model_class(model=model)
-    if not class_obj:
-        raise UnsupportedModelError(
-            f"Model type {class_obj} is not supported at the moment."
+    # vm_model = model if isinstance(model, VMModel) else None
+    # metadata = None
+
+    # if not vm_model:
+    #     class_obj = get_model_class(model=model, predict_fn=predict_fn)
+    #     if not class_obj:
+    #         if not attributes:
+    #             raise UnsupportedModelError(
+    #                 f"Model class {str(model.__class__)} is not supported at the moment."
+    #             )
+    #         elif not is_model_metadata(attributes):
+    #             raise UnsupportedModelError(
+    #                 f"Model attributes {str(attributes)} are missing required keys 'architecture' and 'language'."
+    #             )
+    vm_model = model if isinstance(model, VMModel) else None
+    class_obj = get_model_class(model=model, predict_fn=predict_fn)
+
+    if not vm_model and not class_obj:
+        if not attributes:
+            raise UnsupportedModelError(
+                f"Model class {str(model.__class__)} is not supported at the moment."
+            )
+
+        if not is_model_metadata(attributes):
+            raise UnsupportedModelError(
+                f"Model attributes {str(attributes)} are missing required keys 'architecture' and 'language'."
+            )
+
+    if isinstance(vm_model, VMModel):
+        vm_model.input_id = (
+            input_id if input_id != "model" else (vm_model.input_id or input_id)
         )
-    input_id = input_id or "model"
-    vm_model = class_obj(
-        input_id=input_id,
-        model=model,  # Trained model instance
-        attributes=None,
-    )
+        metadata = get_model_info(vm_model)
+    elif hasattr(class_obj, "__name__") and class_obj.__name__ == "PipelineModel":
+        vm_model = class_obj(
+            pipeline=model,
+            input_id=input_id,
+        )
+        # TODO: Add metadata for pipeline model
+        metadata = get_model_info(vm_model)
+    elif class_obj:
+        vm_model = class_obj(
+            input_id=input_id,
+            model=model,  # Trained model instance
+            predict_fn=predict_fn,
+        )
+        metadata = get_model_info(vm_model)
+    else:
+        vm_model = MetadataModel(
+            input_id=input_id, attributes=ModelAttributes.from_dict(attributes)
+        )
+        metadata = attributes
+
     if __log:
         log_input(
             name=input_id,
             type="model",
-            metadata=get_model_info(vm_model),
+            metadata=metadata,
         )
 
     input_registry.add(key=input_id, obj=vm_model)
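Taken together, `init_model` now accepts three kinds of input: a supported trained model (as before), a bare `predict_fn`, or metadata-only `attributes`. A hedged sketch of the two new paths, with the callable and attribute values invented for illustration (per the `is_model_metadata` error message above, attributes need at least "architecture" and "language"):

```python
import validmind as vm

# 1) A plain prediction function can now back a model; no trained object needed.
fn_model = vm.init_model(
    input_id="rule_based_model",
    predict_fn=lambda x: int(x["balance"] > 10_000),  # toy scoring rule
)

# 2) A metadata-only model registers documentation attributes for something
#    that is not runnable locally (e.g. a vendor model scored elsewhere).
meta_model = vm.init_model(
    input_id="vendor_model",
    attributes={"architecture": "Gradient Boosting", "language": "Python"},
)
```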
validmind-2.2.4/validmind/datasets/llm/rag/__init__.py
@@ -0,0 +1,11 @@
+# Copyright © 2023-2024 ValidMind Inc. All rights reserved.
+# See the LICENSE file in the root of this repository for details.
+# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
+
+"""
+Entrypoint for classification datasets.
+"""
+
+__all__ = [
+    "rfp",
+]
@@ -0,0 +1,30 @@
1
+ Project_Title,RFP_Question_ID,question,ground_truth,Area,Last_Accessed_At,Requester,Status
2
+ Gen AI-Driven Financial Advisory System,1,"What is your experience in developing AI-based applications, and can you provide examples of successful projects?","Our company has 15 years of experience in developing AI-based applications, with a strong portfolio in sectors such as healthcare, finance, and education. For instance, our project MediAI Insight for the healthcare industry demonstrated significant achievements in patient data analysis, resulting in a 30% reduction in diagnostic errors and a 40% improvement in treatment personalization. Our platform has engaged over 200 healthcare facilities, achieving a user satisfaction rate of 95%.",General,18/12/2023,Bank A,Under Review
3
+ Gen AI-Driven Financial Advisory System,2,How do you ensure your AI-based apps remain up-to-date with the latest AI advancements and technologies?,"We maintain a dedicated R&D team focused on integrating the latest AI advancements into our applications. This includes regular updates and feature enhancements based on cutting-edge technologies such as GPT (Generative Pre-trained Transformer) for natural language understanding, CNNs (Convolutional Neural Networks) for advanced image recognition tasks, and DQN (Deep Q-Networks) for decision-making processes in complex environments. Our commitment to these AI methodologies ensures that our applications remain innovative, with capabilities that adapt to evolving market demands and client needs. This approach has enabled us to enhance the predictive accuracy of our financial forecasting tools by 25% and improve the efficiency of our educational content personalization by 40%",General,18/12/2023,Bank A,Under Review
4
+ Gen AI-Driven Financial Advisory System,3,Can your AI-based applications be customized to meet specific user or business needs?,"Absolutely, customization is a core aspect of our offering. We work closely with clients to understand their specific needs and tailor our AI algorithms and app functionalities accordingly, using technologies such as TensorFlow for machine learning models, React for responsive UI/UX designs, and Kubernetes for scalable cloud deployment. This personalized approach allows us to optimize AI functionalities to match unique business processes, enhancing user experience and operational efficiency for each client. For example, for a retail client, we customized our recommendation engine to increase customer retention by 20% through more accurate and personalized product suggestions.",General,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,4,What measures do you take to ensure user privacy and data security in your AI-based apps?,"User privacy and data security are paramount. We implement robust measures such as end-to-end encryption to secure data transmissions, anonymization techniques to protect user identities, and comprehensive compliance with data protection laws like GDPR and CCPA. We also employ regular security audits and vulnerability assessments to ensure our systems are impenetrable. Additionally, our deployment of advanced intrusion detection systems and the use of secure coding practices reinforce our commitment to safeguarding user data at all times",General,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,5,How do you approach user interface and experience design in AI-based apps to ensure ease of use and engagement?,"Our design philosophy centers on simplicity and intuitiveness. We conduct extensive user research and testing to inform our UI/UX designs, ensuring that our AI-based apps are accessible and engaging for all users, regardless of their technical expertise. This includes applying principles from human-centered design, utilizing accessibility guidelines such as WCAG 2.1, and conducting iterative testing with diverse user groups. Our commitment to inclusivity and usability leads to higher user adoption rates and satisfaction. For instance, feedback-driven enhancements in our visual design have improved user engagement by over 30% across our applications.",General,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,6,Describe your support and maintenance services for AI-based applications post-launch.,"Post-launch, we offer comprehensive support and maintenance services, including regular updates, bug fixes, and performance optimization. Our support team is available 24/7 to assist with any issues or questions. We utilize a ticketing system that ensures swift response times, with an average initial response time of under 2 hours. Additionally, we provide monthly performance reports and hold quarterly reviews with clients to discuss system status and potential improvements. Our proactive approach includes using automated monitoring tools to detect and resolve issues before they impact users, ensuring that our applications perform optimally at all times. This service structure has been instrumental in maintaining a client satisfaction rate above 98%.",General,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,7,How do you measure the success and impact of your AI-based applications on client objectives?,"Success measurement is tailored to each project's objectives. We establish key performance indicators (KPIs) in collaboration with our clients, such as user engagement rates, efficiency improvements, or return on investment (ROI). We then regularly review these metrics using advanced analytics platforms and business intelligence tools to assess the app’s impact. Our approach includes monthly performance analysis meetings where we provide detailed reports and insights on metrics like session duration, user retention rates, and cost savings achieved through automation. We also implement A/B testing to continuously refine and optimize the application based on real-world usage data, ensuring that we make data-driven improvements that align closely with our clients' strategic goals.",General,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,8,"How do you ensure the ethical use of LLMs in your applications, particularly regarding bias mitigation and data privacy?","We adhere to ethical AI practices by implementing bias detection and mitigation techniques during the training of our Large Language Models (LLMs). This involves using diverse datasets to prevent skewed results and deploying algorithms specifically designed to identify and correct bias in AI outputs. For data privacy, we employ data anonymization and secure data handling protocols, ensuring compliance with GDPR, CCPA, and other relevant regulations. Our systems use state-of-the-art encryption methods for data at rest and in transit, and our data governance policies are rigorously audited by third-party security firms to maintain high standards of data integrity and confidentiality. This commitment extends to regular training for our staff on the latest privacy laws and ethical AI use to ensure that our practices are up-to-date and effective.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,9,"Can you describe the process of training your LLMs, including data sourcing, model selection, and validation methods?","Our LLM training process begins with the meticulous sourcing of diverse and comprehensive datasets from global sources, ensuring a rich variety that includes various languages, dialects, and cultural contexts. This diversity is critical for building models that perform equitably across different demographics. We leverage cutting-edge tools like Apache Kafka for real-time data streaming and Apache Hadoop for handling large datasets efficiently during preprocessing stages. For model architecture selection, we utilize TensorFlow and PyTorch frameworks to design and iterate on neural network structures that best suit each application's unique requirements, whether it's for predictive analytics in finance or customer service chatbots. Depending on the use case, we might choose from a variety of architectures such as Transformer models for their robust handling of sequential data or GANs (Generative Adversarial Networks) for generating new, synthetic data samples for training.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,10,How do you handle the continuous learning and updating of your LLMs to adapt to new data and evolving user needs?,"We implement advanced continuous learning mechanisms that allow our Large Language Models (LLMs) to adapt over time by incorporating new data and feedback loops, ensuring our models remain current and effective. We utilize incremental learning techniques where the model is periodically updated with fresh data without the need for retraining from scratch. This is facilitated by employing online learning algorithms such as Online Gradient Descent, which can quickly adjust model weights in response to new information.
+ To efficiently manage this continuous learning process, we use tools like Apache Spark for handling large-scale data processing in a distributed computing environment. This allows for seamless integration of new data streams into our training datasets. We also implement active learning cycles where the models request human feedback on specific outputs that are uncertain, thus refining model predictions over time based on actual user interactions and feedback.
+ Additionally, we incorporate reinforcement learning techniques where models are rewarded for improvements in performance metrics like accuracy and user engagement. This helps in fine-tuning the models' responses based on what is most effective in real-world scenarios.
+ For monitoring and managing these updates, we use TensorFlow Extended (TFX) for a robust end-to-end platform that ensures our models are consistently validated against performance benchmarks before being deployed. This continuous adaptation framework guarantees that our LLMs are not only keeping pace with evolving user needs and preferences but are also progressively enhancing their relevance and effectiveness.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,11,What measures do you take to ensure the transparency and explainability of decisions made by your LLMs?,"We prioritize transparency and explainability in our AI models by incorporating advanced features such as model interpretability layers and providing comprehensive documentation on how model decisions are made. This approach ensures that users can understand and trust the outputs of our Large Language Models (LLMs). To achieve this, we integrate tools like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) into our models. These tools allow us to break down and communicate the reasoning behind each model decision, fostering trust and facilitating easier audits by stakeholders.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,12,How do you assess and ensure the performance and scalability of your LLMs in high-demand scenarios?,"We conduct extensive performance testing under various load conditions to assess scalability and ensure our LLMs can handle high-demand scenarios efficiently. This involves using tools like Apache JMeter and LoadRunner to simulate different levels of user interaction and data volume, allowing us to evaluate how our systems perform under stress. Additionally, we employ scalable cloud infrastructure, utilizing services like Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Google Cloud Platform (GCP) Compute Engine, which support dynamic scaling. Optimization techniques such as auto-scaling groups and load balancers are implemented to ensure that our resources adjust automatically based on real-time demands, providing both robustness and cost efficiency.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,13,"Can you provide examples of successful deployments of your LLM-based applications, including the challenges faced and how they were addressed?","We can share case studies of successful LLM-based application deployments, highlighting specific challenges such as data scarcity or model interpretability, and detailing the strategies and solutions we implemented to overcome these challenges. For example, in a project involving natural language processing for a legal firm, we faced significant data scarcity. To address this, we employed techniques like synthetic data generation and transfer learning from related domains to enrich our training datasets. Additionally, the issue of model interpretability was critical for our client’s trust and regulatory compliance. We tackled this by integrating SHAP (SHapley Additive exPlanations) to provide clear, understandable insights into how our model's decisions were made, ensuring transparency and boosting user confidence in the AI system.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,14,What is your approach to integrating LLMs with existing systems and workflows within an organization?,"Our approach involves conducting a thorough analysis of the existing systems and workflows, designing integration plans that minimize disruption, and using APIs and custom connectors to ensure seamless integration of our LLM-based applications. We start by meticulously mapping the client's current infrastructure and operational flows to identify the most efficient points of integration. This is followed by the development of tailored integration plans that prioritize operational continuity and minimize downtime. To achieve seamless integration, we utilize robust APIs and develop custom connectors where necessary, ensuring compatibility with existing software platforms and databases. These tools allow for the smooth transfer of data and maintain the integrity and security of the system, ensuring that the new AI capabilities enhance functionality without compromising existing processes.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,15,"How do you plan to support and maintain LLM-based applications post-deployment, including handling model drift and providing updates?","Our post-deployment support is designed to ensure sustained performance and relevance of our LLM-based applications. We actively monitor for model drift to detect and address any degradation in model accuracy over time due to changes in underlying data patterns. This includes implementing automated systems that alert our team to potential drifts, allowing for timely interventions. Regular model updates and improvements are also part of our support protocol, ensuring that our solutions adapt to new data and evolving industry standards. Additionally, our dedicated technical support team is available to swiftly address any operational issues or adapt to changes in client requirements. This comprehensive support structure guarantees that our applications continue to deliver optimal performance and align with our clients' strategic objectives.",Large Language Models,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,16,How does your AI solution align with the NIST AI RMF's guidelines for trustworthy and responsible AI?,"Our AI solution is meticulously designed to align with the NIST AI Risk Management Framework (RMF) guidelines, ensuring adherence to principles of trustworthiness and responsibility. We have implemented comprehensive governance structures that oversee the ethical development and deployment of our AI technologies. This includes risk identification and assessment processes where potential risks are analyzed and categorized at each stage of the AI lifecycle. To manage these risks, we have instituted robust risk management controls that are deeply integrated into our development and operational processes. These controls are based on the NIST framework’s best practices, ensuring that our AI solutions are not only effective but also secure and ethical, maintaining transparency and accountability at all times.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,17,Can you describe the governance structures you have in place to manage AI risks as recommended by the NIST AI RMF?,"We have established an AI Risk Council that plays a pivotal role in overseeing AI risk management across our organization. This council is tasked with defining clear roles and responsibilities for AI governance, ensuring that there is a structured approach to managing AI risks. It also integrates AI risk management into our existing governance frameworks to enhance coherence and alignment with broader corporate policies and objectives. Additionally, the AI Risk Council promotes robust collaboration between various business units and our IT department. This collaboration is crucial for sharing insights, aligning strategies, and implementing comprehensive risk management practices effectively across the entire organization. This framework not only supports proactive risk management but also fosters an environment where AI technologies are used responsibly and ethically.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,18,How do you identify and assess AI risks in line with the NIST AI RMF's 'Map' function?,"We conduct thorough assessments of AI systems and the people using AI within our organization. This process involves meticulously identifying potential risks such as data privacy, security, bias, and legal compliance. We assess both the impact and the likelihood of each identified risk to effectively prioritize them. Our approach includes the use of sophisticated tools and methodologies, such as risk matrices and scenario analysis, to quantify and categorize risks accurately. This comprehensive assessment enables us to develop targeted risk mitigation strategies and allocate resources more efficiently, ensuring that the most critical risks are addressed promptly and effectively. This proactive risk management practice helps us maintain the integrity of our AI systems and uphold our ethical and legal responsibilities.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,19,"What measures do you take to ensure transparency and explainability in AI decision-making, as emphasized by the NIST AI RMF?","We prioritize transparency by incorporating explainability features into our AI models, providing detailed documentation on the decision-making processes, and ensuring that stakeholders can understand and trust the outputs of our AI systems. To achieve this, we integrate explainability tools like feature importance scores and decision trees that clearly outline how and why decisions are made by our AI. We supplement these technical tools with comprehensive documentation that describes the algorithms' functions in accessible language. This approach is designed to demystify the AI's operations for non-technical stakeholders, fostering a higher level of trust and acceptance. By ensuring that our AI systems are transparent and their workings understandable, we maintain open communication and build confidence among users and regulators alike.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,20,"How do you track and measure exposure to AI risks, and what metrics do you use, as suggested by the NIST AI RMF's 'Measure' function?","We have developed a set of Key Performance Indicators (KPIs) and metrics specifically designed to assess and analyze AI risk exposure across our systems. These metrics are tracked continuously to provide a clear, quantifiable measure of risk at any given time. To streamline this process, we utilize AI risk assessment tools that automate both data collection and analysis, enhancing the accuracy and efficiency of our monitoring efforts.
+ These tools employ advanced analytics to detect subtle shifts in risk patterns, enabling proactive risk management. Regular updates to our risk assessment protocols ensure that they remain aligned with current threat landscapes and regulatory requirements. This systematic monitoring and analysis not only help us maintain control over AI risks but also ensure that we can respond swiftly and effectively to any changes in risk levels, keeping our AI systems secure and compliant.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,21,Describe how your AI solutions manage and mitigate identified risks in accordance with the NIST AI RMF's 'Manage' function.,"We implement and maintain robust risk management controls to mitigate identified risks effectively. This comprehensive approach includes regular updates to our AI models to address evolving challenges and improve performance. We also implement stringent security measures, such as encryption, access controls, and continuous monitoring systems, to safeguard our data and systems from unauthorized access and potential breaches.
+ Furthermore, ensuring compliance with data protection laws is a critical part of our risk management strategy. We stay abreast of legal requirements in all operational jurisdictions, such as GDPR in Europe and CCPA in California, and integrate compliance measures into our AI deployments. Regular audits, both internal and by third-party assessors, help ensure that our practices are up-to-date and that we maintain the highest standards of data privacy and security. This holistic approach to risk management enables us to maintain trust and reliability in our AI applications.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,22,How do you ensure that your AI solutions are compliant with U.S. regulations on data privacy and security?,"We ensure compliance with U.S. regulations such as the Federal Information Security Modernization Act (FISMA) and other applicable laws and directives by adopting a risk-based approach to control selection and specification. This approach meticulously considers the constraints and requirements imposed by these regulations. We conduct regular audits and assessments to verify that our security controls meet or exceed the stipulated standards, ensuring that all our data handling and processing activities are fully compliant.
+ Our compliance framework is designed to adapt to the specific needs of the environments in which our systems operate, integrating best practices and guidance from regulatory bodies. We also engage with legal and compliance experts to stay updated on any changes in legislation, ensuring our practices remain in line with the latest requirements. This proactive and informed approach allows us to manage risk effectively while maintaining the highest levels of data protection and security as mandated by U.S. law.",AI Regulation,18/12/2023,Bank A,Under Review
+ Gen AI-Driven Financial Advisory System,23,"In what ways do you contribute to the continual improvement of AI risk management practices, as envisioned by the NIST AI RMF?","We actively participate in industry working groups and public-private partnerships to contribute to the continual improvement of AI risk management practices. Our engagement in these collaborative efforts not only allows us to share our insights and strategies but also enables us to learn from the collective experiences of the industry, helping to elevate the standards of AI safety and reliability across the board. Additionally, we stay abreast of updates to the NIST AI Risk Management Framework (RMF) and adjust our practices accordingly. This commitment to staying current ensures that our risk management approaches align with the latest guidelines and best practices, reinforcing our dedication to leading-edge, responsible AI development and deployment.",AI Regulation,18/12/2023,Bank A,Under Review
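Several `ground_truth` fields in the CSV above contain embedded newlines inside quoted strings (e.g. questions 10, 20, 21, and 22), so the file must be parsed with a quote-aware reader rather than split on line breaks. A quick sanity check, assuming the file is read from its in-package path (the path below is illustrative):

```python
import pandas as pd

# Quote-aware parsing collapses the multi-line quoted fields back into
# single records; naive line splitting would miscount the rows.
df = pd.read_csv(
    "validmind/datasets/llm/rag/datasets/rfp_existing_questions_client_1.csv"
)

print(list(df.columns))
# ['Project_Title', 'RFP_Question_ID', 'question', 'ground_truth',
#  'Area', 'Last_Accessed_At', 'Requester', 'Status']
print(len(df))  # 23 records across 29 physical data lines
```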