ai-critic 0.2.4__tar.gz → 0.2.5__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ai_critic-0.2.5/PKG-INFO +200 -0
- ai_critic-0.2.5/README.md +190 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/critic.py +91 -0
- ai_critic-0.2.5/ai_critic.egg-info/PKG-INFO +200 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/pyproject.toml +1 -1
- ai_critic-0.2.4/PKG-INFO +0 -76
- ai_critic-0.2.4/README.md +0 -66
- ai_critic-0.2.4/ai_critic.egg-info/PKG-INFO +0 -76
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/__init__.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/__init__.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/config.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/data.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/performance.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/robustness.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/summary.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/evaluators/validation.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic.egg-info/SOURCES.txt +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic.egg-info/dependency_links.txt +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic.egg-info/requires.txt +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic.egg-info/top_level.txt +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/setup.cfg +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/test/test_in_ia.py +0 -0
- {ai_critic-0.2.4 → ai_critic-0.2.5}/test/test_model.py +0 -0
ai_critic-0.2.5/PKG-INFO
ADDED
@@ -0,0 +1,200 @@

Metadata-Version: 2.4
Name: ai-critic
Version: 0.2.5
Summary: Fast AI evaluator for scikit-learn models
Author-email: Luiz Seabra <filipedemarco@yahoo.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: scikit-learn

# ai-critic 🧠: The Quality Gate for Machine Learning Models

**ai-critic** is a specialized **decision-making** tool designed to audit the reliability and deployment readiness of scikit-learn compatible Machine Learning models.

Instead of just measuring performance (accuracy, F1 score), **ai-critic** acts as a "Quality Gate," probing the model for hidden risks that can lead to production failures, such as data leakage, structural overfitting, and vulnerability to noise.

---

## 🚀 1. Getting Started (The Basics)

This section is ideal for beginners who need a quick verdict on the health of their model.

### 1.1. Installation

Install the library directly from PyPI:

```bash
pip install ai-critic
```

### 1.2. The Quick Verdict

With just a few lines, you can get an executive evaluation and a deployment recommendation.

```python
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# 1. Prepare your data and model
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = RandomForestClassifier(max_depth=5, random_state=42)

# 2. Initialize the critic
# AICritic performs all audits internally
critic = AICritic(model, X, y)

# 3. Obtain the executive summary
report = critic.evaluate(view="executive")

print(f"Verdict: {report['verdict']}")
print(f"Risk: {report['risk_level']}")
print(f"Main Reason: {report['main_reason']}")

# Expected output:
# Verdict: ✅ Acceptable
# Risk: Low
# Main Reason: No critic risks detected.
```

---

## 💡 2. Understanding the Critique (The Intermediate)

For the data scientist who needs to understand *why* the model received a verdict and what the next steps are.

### 2.1. The Four Pillars of the Audit

**ai-critic** evaluates your model across four critical dimensions.

| Category | Main Risk | Code Module |
| :--- | :--- | :--- |
| 📈 **Validation** | Suspicious CV Scores | `ai_critic.performance` |
| 🧪 **Robustness** | Noise Vulnerability | `ai_critic.robustness` |
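The table lists the two validation-oriented pillars; the other two, **Data** and **Structure**, surface in the same report under the `data` and `config` keys of the `"details"` view (the keys `deploy_decision` reads in this release). A minimal sketch of pulling each pillar's raw result, assuming a `critic` built as in section 1.2:

```python
# Pull the raw result of each audit pillar from the details view.
# The key names below mirror the ones read by deploy_decision in this release;
# treat them as an assumption rather than a stable public contract.
details = critic.evaluate(view="details")

for pillar in ("data", "config", "performance", "robustness"):
    print(f"{pillar}: {list(details[pillar].keys())}")
```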

### 2.2. Visual and Technical Analysis

The `evaluate` method lets you view the results and access the complete technical report.

```python
# Continuing the previous example...

# 1. Generate the full report and visualizations
# plot=True generates Correlation, Learning Curve, and Robustness graphs
full_report = critic.evaluate(view="all", plot=True)

# 2. Access the technical summary for recommendations
technical_summary = full_report["technical"]

print("\n--- Technical Recommendations ---")
for i, risk in enumerate(technical_summary["key_risks"]):
    print(f"Risk {i+1}: {risk}")
    print(f"Recommendation: {technical_summary['recommendations'][i]}")

# Example of a risk (if there were one):
# Risk 1: The depth of the tree may be too high for the size of the dataset.
# Recommendation: Reduce model complexity or adjust hyperparameters.
```

### 2.3. Robustness Test

A robust model should maintain its performance even with small perturbations of the data. The `ai-critic` robustness test assesses this by injecting noise into the input data.

```python
# Accessing the specific result of the Robustness module
robustness_result = full_report["details"]["robustness"]

print("\n--- Robustness Test ---")
print(f"Original CV Score: {robustness_result['cv_score_original']:.4f}")
print(f"CV Score with Noise: {robustness_result['cv_score_noisy']:.4f}")
print(f"Performance Drop: {robustness_result['performance_drop']:.4f}")
print(f"Robustness Verdict: {robustness_result['verdict']}")

# Possible verdicts:
# - Stable: acceptable drop.
# - Fragile: significant drop (risk).
# - Misleading: original performance inflated by leakage.
```
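
The precise noise model is not spelled out here, so the sketch below only illustrates the idea behind such a test: add small Gaussian perturbations to the features and compare mean cross-validation scores before and after. The function name, noise scale, and drop threshold are illustrative, not part of the ai-critic API.

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def noise_robustness_check(model, X, y, noise_scale=0.1, cv=5, drop_threshold=0.05):
    """Compare CV score on clean vs. noise-perturbed features (illustrative only)."""
    rng = np.random.default_rng(42)
    cv_original = cross_val_score(model, X, y, cv=cv).mean()

    # Perturb each feature with Gaussian noise proportional to its standard deviation
    X_noisy = X + rng.normal(0.0, noise_scale * X.std(axis=0), size=X.shape)
    cv_noisy = cross_val_score(model, X_noisy, y, cv=cv).mean()

    drop = cv_original - cv_noisy
    verdict = "Stable" if drop <= drop_threshold else "Fragile"
    return {"cv_score_original": cv_original, "cv_score_noisy": cv_noisy,
            "performance_drop": drop, "verdict": verdict}
```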

---

## ⚙️ 3. Integration and Governance (The Advanced)

This section is for MLOps engineers and architects looking to integrate **ai-critic** into automated pipelines and build custom deployment logic.

### 3.1. The Deployment Gate (`deploy_decision`)

The `deploy_decision()` method is the final control point. It returns a structured object that classifies problems into *Hard Blockers* (which prevent deployment) and *Soft Blockers* (which require attention but can be accepted with reservations).

```python
# Example of use in a CI/CD pipeline
decision = critic.deploy_decision()

if decision["deploy"]:
    print("✅ Deployment Approved. Risk Level: Low.")
else:
    print(f"❌ Deployment Blocked. Risk Level: {decision['risk_level'].upper()}")
    print("Blocking Issues:")
    for issue in decision["blocking_issues"]:
        print(f"- {issue}")

# The decision object also includes a heuristic confidence score (0.0 to 1.0)
print(f"Heuristic Confidence in Model: {decision['confidence']:.2f}")
```

### 3.2. Accessing Raw Details

For custom *governance* rules or logic, you can access the raw data of each module through the `"details"` view.

```python
# Accessing data leakage details
data_details = critic.evaluate(view="details")["data"]

if data_details["data_leakage"]["suspected"]:
    print("\n--- Data Leak Alert ---")
    for detail in data_details["data_leakage"]["details"]:
        print(f"Feature {detail['feature_index']} with correlation of {detail['correlation']:.4f}")

# Accessing structural overfitting details
config_details = critic.evaluate(view="details")["config"]

if config_details["structural_warnings"]:
    print("\n--- Structural Alert ---")
    for warning in config_details["structural_warnings"]:
        print(f"Warning: {warning['message']} (Max Depth: {warning['max_depth']}, Recommended: {warning['recommended_max_depth']})")
```
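
These raw fields can back a gate stricter than the built-in one. A sketch, assuming the field names shown above (`suspected`, `correlation`, `structural_warnings`) and an illustrative correlation cutoff of 0.95:

```python
# A custom governance rule built on the raw details (thresholds are illustrative).
details = critic.evaluate(view="details")

leakage = details["data"]["data_leakage"]
too_correlated = leakage["suspected"] and any(
    d["correlation"] > 0.95 for d in leakage["details"]
)
too_complex = bool(details["config"]["structural_warnings"])

if too_correlated or too_complex:
    raise RuntimeError("Custom governance gate failed: review leakage/complexity first.")
```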

### 3.3. Best Practices and Use Cases

| Use | Recommended Action |
| :--- | :--- |
| **CI/CD** | Use `deploy_decision()` as an automated quality gate (see the sketch below). |
| **Tuning** | Use the technical view to guide hyperparameter optimization. |
| **Governance** | Log the details view for auditing and compliance. |
| **Communication** | Use the executive view to report risks to non-technical stakeholders. |
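
For the CI/CD row, one way to wire `deploy_decision()` into a pipeline is a small script that exits non-zero when the gate blocks, so the job is marked as failed. A sketch; the model and data loading imports are placeholders you would replace with your own project code:

```python
# ci_quality_gate.py: fail the CI job when the deployment gate blocks (illustrative).
import sys

from ai_critic import AICritic
from my_project import load_training_data, build_model  # placeholder imports

X, y = load_training_data()
critic = AICritic(build_model(), X, y)

decision = critic.deploy_decision()
if not decision["deploy"]:
    print(f"Blocked (risk: {decision['risk_level']}):")
    for issue in decision["blocking_issues"]:
        print(f"  - {issue}")
    sys.exit(1)

print(f"Approved with confidence {decision['confidence']:.2f}")
```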

---

## 📄 License

Distributed under the **MIT License**.

---

## 🧠 Final Note

> **ai-critic** is not a benchmarking tool. It's a decision-making tool.

If a model fails here, it doesn't mean it's "bad," but rather that it **shouldn't be trusted yet**. The goal is to inject the necessary skepticism to build truly robust AI systems.
ai_critic-0.2.5/README.md
ADDED

@@ -0,0 +1,190 @@

The 190 added lines are identical to the README body embedded in ai_critic-0.2.5/PKG-INFO above (everything after the metadata header).
{ai_critic-0.2.4 → ai_critic-0.2.5}/ai_critic/critic.py

@@ -130,3 +130,94 @@ class AICritic:

The three context lines close the existing `evaluate` method; everything from `def deploy_decision` onward is added in 0.2.5.

            return {k: payload[k] for k in view if k in payload}

        return payload.get(view)

    def deploy_decision(self):
        """
        Final deployment gate.

        Returns
        -------
        dict
            {
                "deploy": bool,
                "risk_level": str,
                "blocking_issues": list[str],
                "confidence": float
            }
        """
        # Reuse the entire existing evaluation pipeline
        report = self.evaluate(view="all", plot=False)

        data_risk = report["details"]["data"]["data_leakage"]["suspected"]
        perfect_cv = report["details"]["performance"]["suspiciously_perfect"]
        robustness_verdict = report["details"]["robustness"]["verdict"]
        structural_warnings = report["details"]["config"]["structural_warnings"]

        blocking_issues = []
        risk_level = "low"

        # =========================
        # Hard blockers (❌)
        # =========================
        if data_risk and perfect_cv:
            blocking_issues.append(
                "Data leakage combined with suspiciously perfect CV score"
            )
            risk_level = "high"

        if robustness_verdict == "misleading":
            blocking_issues.append(
                "Robustness results are misleading due to inflated baseline performance"
            )
            risk_level = "high"

        if data_risk:
            blocking_issues.append(
                "Suspected target leakage in feature set"
            )
            risk_level = "high"

        # =========================
        # Soft blockers (⚠️)
        # =========================
        if risk_level != "high":
            if robustness_verdict == "fragile":
                blocking_issues.append(
                    "Model performance degrades significantly under noise"
                )
                risk_level = "medium"

            if perfect_cv:
                blocking_issues.append(
                    "Suspiciously perfect cross-validation score"
                )
                risk_level = "medium"

            if structural_warnings:
                blocking_issues.append(
                    "Structural complexity risks detected in model configuration"
                )
                risk_level = "medium"

        # =========================
        # Final decision
        # =========================
        deploy = len(blocking_issues) == 0

        # =========================
        # Confidence heuristic
        # =========================
        confidence = 1.0
        confidence -= 0.35 if data_risk else 0
        confidence -= 0.25 if perfect_cv else 0
        confidence -= 0.25 if robustness_verdict in ("fragile", "misleading") else 0
        confidence -= 0.15 if structural_warnings else 0
        confidence = max(0.0, round(confidence, 2))

        return {
            "deploy": deploy,
            "risk_level": risk_level,
            "blocking_issues": blocking_issues,
            "confidence": confidence
        }
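For a concrete sense of how the confidence heuristic composes, a short worked example; `critic` is assumed to be an `AICritic` instance built as in the README.

```python
# With suspected leakage (-0.35), a suspiciously perfect CV (-0.25) and a fragile or
# misleading robustness verdict (-0.25), confidence = 1.0 - 0.35 - 0.25 - 0.25 = 0.15,
# risk_level is "high" and deploy is False.
decision = critic.deploy_decision()
assert set(decision) == {"deploy", "risk_level", "blocking_issues", "confidence"}
print(decision["deploy"], decision["risk_level"], decision["confidence"])
```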
ai_critic-0.2.5/ai_critic.egg-info/PKG-INFO
ADDED

@@ -0,0 +1,200 @@

The 200 added lines are identical to ai_critic-0.2.5/PKG-INFO above.
ai_critic-0.2.4/PKG-INFO
DELETED
@@ -1,76 +0,0 @@

Metadata-Version: 2.4
Name: ai-critic
Version: 0.2.4
Summary: Fast AI evaluator for scikit-learn models
Author-email: Luiz Seabra <filipedemarco@yahoo.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: scikit-learn

Performance under noise

> Visualizations are optional and do not affect the decision logic.

---

## ⚙️ Main API

### `AICritic(model, X, y)`

* `model`: scikit-learn compatible estimator
* `X`: feature matrix
* `y`: target vector

### `evaluate(view="all", plot=False)`

* `view`: `"executive"`, `"technical"`, `"details"`, `"all"` or custom list
* `plot`: generates graphs when `True`

---

## 🧠 What ai-critic Detects

| Category | Risks |
| ------------ | ---------------------------------------- |
| 🔍 Data | Target Leakage, NaNs, Imbalance |
| 🧱 Structure | Excessive Complexity, Overfitting |
| 📈 Validation | Perfect or Statistically Suspicious CV |
| 🧪 Robustness | Stable, Fragile, or Misleading |

---

## 🛡️ Best Practices

* **CI/CD:** Use executive output as a *quality gate*
* **Iteration:** Use technical output during tuning
* **Governance:** Log detailed output
* **Skepticism:** Never blindly trust a perfect CV

---

## 🧭 Use Cases

* Pre-deployment Audit
* ML Governance
* CI/CD Pipelines
* Risk Communication for Non-Technical Users

---

## 📄 License

Distributed under the **MIT License**.

---

## 🧠 Final Note

**ai-critic** is not a *benchmarking* tool. It's a **decision-making tool**.

If a model fails here, it doesn't mean it's bad—it means it **shouldn't be trusted yet**.
ai_critic-0.2.4/README.md
DELETED
@@ -1,66 +0,0 @@

The 66 deleted lines are identical to the README body embedded in ai_critic-0.2.4/PKG-INFO above (everything after the metadata header).
ai_critic-0.2.4/ai_critic.egg-info/PKG-INFO
DELETED

@@ -1,76 +0,0 @@

The 76 deleted lines are identical to ai_critic-0.2.4/PKG-INFO above.
All remaining files are unchanged between 0.2.4 and 0.2.5 (see the +0 -0 entries in the file list above).