ai-critic 0.2.5__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. ai_critic-1.1.0/PKG-INFO +289 -0
  2. ai_critic-1.1.0/README.md +279 -0
  3. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/critic.py +87 -49
  4. ai_critic-1.1.0/ai_critic/evaluators/adapters.py +84 -0
  5. ai_critic-1.1.0/ai_critic/evaluators/scoring.py +39 -0
  6. ai_critic-1.1.0/ai_critic/sessions/__init__.py +3 -0
  7. ai_critic-1.1.0/ai_critic/sessions/store.py +33 -0
  8. ai_critic-1.1.0/ai_critic.egg-info/PKG-INFO +289 -0
  9. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic.egg-info/SOURCES.txt +4 -0
  10. {ai_critic-0.2.5 → ai_critic-1.1.0}/pyproject.toml +1 -1
  11. ai_critic-0.2.5/PKG-INFO +0 -200
  12. ai_critic-0.2.5/README.md +0 -190
  13. ai_critic-0.2.5/ai_critic.egg-info/PKG-INFO +0 -200
  14. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/__init__.py +0 -0
  15. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/__init__.py +0 -0
  16. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/config.py +0 -0
  17. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/data.py +0 -0
  18. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/performance.py +0 -0
  19. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/robustness.py +0 -0
  20. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/summary.py +0 -0
  21. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic/evaluators/validation.py +0 -0
  22. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic.egg-info/dependency_links.txt +0 -0
  23. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic.egg-info/requires.txt +0 -0
  24. {ai_critic-0.2.5 → ai_critic-1.1.0}/ai_critic.egg-info/top_level.txt +0 -0
  25. {ai_critic-0.2.5 → ai_critic-1.1.0}/setup.cfg +0 -0
  26. {ai_critic-0.2.5 → ai_critic-1.1.0}/test/test_in_ia.py +0 -0
  27. {ai_critic-0.2.5 → ai_critic-1.1.0}/test/test_model.py +0 -0
@@ -0,0 +1,289 @@
+ Metadata-Version: 2.4
+ Name: ai-critic
+ Version: 1.1.0
+ Summary: Fast AI evaluator for scikit-learn models
+ Author-email: Luiz Seabra <filipedemarco@yahoo.com>
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ Requires-Dist: numpy
+ Requires-Dist: scikit-learn
+
+ # ai-critic 🧠: The Quality Gate for Machine Learning Models
+
+ **ai-critic** is a specialized **decision-making** tool designed to audit the reliability and deployment readiness of **scikit-learn**, **PyTorch**, and **TensorFlow** models.
+
+ Instead of merely measuring performance (accuracy, F1 score), **ai-critic** acts as a **Quality Gate**, actively probing the model to uncover *hidden risks* that commonly cause production failures — such as **data leakage**, **structural overfitting**, and **fragility under noise**.
+
+ > **ai-critic does not ask “How good is this model?”**
+ > It asks **“Can this model be trusted?”**
+
+ ---
+
+ ## 🚀 Getting Started (The Basics)
+
+ This section is ideal for beginners who need a **fast and reliable verdict** on a trained model.
+
+ ### Installation
+
+ Install directly from PyPI:
+
+ ```bash
+ pip install ai-critic
+ ```
+
+ ---
+
+ ### The Quick Verdict
+
+ With just a few lines of code, you obtain an **executive-level assessment** and a **deployment recommendation**.
+
+ ```python
+ from ai_critic import AICritic
+ from sklearn.ensemble import RandomForestClassifier
+ from sklearn.datasets import make_classification
+
+ # 1. Prepare data and model
+ X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
+ model = RandomForestClassifier(max_depth=5, random_state=42)
+
+ # 2. Initialize the Critic
+ critic = AICritic(model, X, y)
+
+ # 3. Run the audit (executive mode)
+ report = critic.evaluate(view="executive")
+
+ print(f"Verdict: {report['verdict']}")
+ print(f"Risk Level: {report['risk_level']}")
+ print(f"Main Reason: {report['main_reason']}")
+ ```
+
+ **Expected Output (example):**
+
+ ```text
+ Verdict: ⚠️ Risky
+ Risk Level: medium
+ Main Reason: Structural or robustness-related risks detected.
+ ```
+
+ This output is intentionally **conservative**.
+ If **ai-critic** recommends deployment, it means meaningful risks were *not* detected.
+
+ ---
+
+ ## 💡 Understanding the Critique (The Intermediate)
+
+ For data scientists who want to understand **why** the model received a given verdict and **how to improve it**.
+
+ ---
+
+ ### The Four Pillars of the Audit
+
+ **ai-critic** evaluates models across four independent risk dimensions:
+
+ | Pillar                 | Main Risk Detected                     | Internal Module          |
+ | ---------------------- | -------------------------------------- | ------------------------ |
+ | 📊 **Data Integrity**  | Target Leakage & Correlation Artifacts | `evaluators.data`        |
+ | 🧠 **Model Structure** | Over-complexity & Misconfiguration     | `evaluators.config`      |
+ | 📈 **Performance**     | Suspicious CV or Learning Curves       | `evaluators.performance` |
+ | 🧪 **Robustness**      | Sensitivity to Noise                   | `evaluators.robustness`  |
+
+ Each pillar contributes signals used later in the **deployment gate**.
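+ As a rough illustration of how these pillar signals surface in practice, the sketch below walks the `details` payload returned by `evaluate(view="all")` and prints whatever verdict each evaluator reports. It assumes only that `details` is a dictionary keyed by evaluator; the individual key names and fields are illustrative, not a documented contract.
+
+ ```python
+ # Illustrative sketch: scan each pillar's raw output generically.
+ full_report = critic.evaluate(view="all")
+
+ for pillar, result in full_report["details"].items():
+     # Not every evaluator necessarily exposes a "verdict" field.
+     verdict = result.get("verdict", "n/a") if isinstance(result, dict) else result
+     print(f"{pillar}: {verdict}")
+ ```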
+
+ ---
+
+ ### Full Technical & Visual Analysis
+
+ To access **all internal diagnostics**, including plots and recommendations, use `view="all"`.
+
+ ```python
+ full_report = critic.evaluate(view="all", plot=True)
+
+ technical_summary = full_report["technical"]
+
+ print("\n--- Key Risks Detected ---")
+ for i, risk in enumerate(technical_summary["key_risks"], start=1):
+     print(f"{i}. {risk}")
+
+ print("\n--- Recommendations ---")
+ for rec in technical_summary["recommendations"]:
+     print(f"- {rec}")
+ ```
+
+ Generated plots may include:
+
+ * Feature correlation heatmaps
+ * Learning curves
+ * Robustness degradation charts
+
+ ---
+
+ ### Robustness Test (Noise Injection)
+
+ A model that collapses under small perturbations is **not production-safe**.
+
+ ```python
+ robustness = full_report["details"]["robustness"]
+
+ print("\n--- Robustness Analysis ---")
+ print(f"Original CV Score: {robustness['cv_score_original']:.4f}")
+ print(f"Noisy CV Score: {robustness['cv_score_noisy']:.4f}")
+ print(f"Performance Drop: {robustness['performance_drop']:.4f}")
+ print(f"Verdict: {robustness['verdict']}")
+ ```
+
+ **Possible Verdicts:**
+
+ * `stable` → acceptable degradation
+ * `fragile` → high sensitivity to noise
+ * `misleading` → performance likely inflated by leakage
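+
+ To make the noise-injection idea concrete, here is a minimal, framework-agnostic sketch of the technique using plain scikit-learn and the `model`, `X`, `y` from the Quick Verdict example. It illustrates the concept rather than ai-critic's internal implementation, and the 10% noise scale is an arbitrary choice.
+
+ ```python
+ # Conceptual noise-injection check: compare cross-validated scores on
+ # clean features vs. features with mild Gaussian perturbation.
+ import numpy as np
+ from sklearn.model_selection import cross_val_score
+
+ rng = np.random.default_rng(42)
+ X_noisy = X + rng.normal(scale=0.1 * X.std(axis=0), size=X.shape)
+
+ clean_score = cross_val_score(model, X, y, cv=5).mean()
+ noisy_score = cross_val_score(model, X_noisy, y, cv=5).mean()
+
+ print(f"Performance drop under noise: {clean_score - noisy_score:.4f}")
+ ```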
+
+ ---
+
+ ## ⚙️ Integration and Governance (The Advanced)
+
+ This section targets **MLOps engineers**, **architects**, and teams operating automated pipelines.
+
+ ---
+
+ ### Multi-Framework Support
+
+ **ai-critic 1.0+** supports models from multiple frameworks with the **same API**:
+
+ ```python
+ # PyTorch Example
+ import torch
+ import torch.nn as nn
+ from ai_critic import AICritic
+
+ X = torch.randn(1000, 20)
+ y = torch.randint(0, 2, (1000,))
+
+ model = nn.Sequential(
+     nn.Linear(20, 32),
+     nn.ReLU(),
+     nn.Linear(32, 2)
+ )
+
+ critic = AICritic(model, X, y, framework="torch", adapter_kwargs={"epochs": 5, "batch_size": 64})
+ report = critic.evaluate(view="executive")
+ print(report)
+
+ # TensorFlow Example
+ import tensorflow as tf
+
+ model = tf.keras.Sequential([
+     tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
+     tf.keras.layers.Dense(2)
+ ])
+ critic = AICritic(model, X.numpy(), y.numpy(), framework="tensorflow", adapter_kwargs={"epochs": 5})
+ report = critic.evaluate(view="executive")
+ print(report)
+ ```
+
+ > No need to rewrite evaluation code — **one Critic API works for sklearn, PyTorch, or TensorFlow**.
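+
+ Conceptually, this works because non-sklearn models are wrapped in adapters (this release adds `ai_critic/evaluators/adapters.py`) that expose the `fit`/`predict` interface the evaluators expect. The sketch below is a simplified illustration of that adapter pattern, not the package's actual adapter code; the class and parameter names are made up for the example.
+
+ ```python
+ # Simplified adapter-pattern sketch (NOT ai-critic's adapters.py):
+ # expose a PyTorch module through an sklearn-style fit/predict interface.
+ import numpy as np
+ import torch
+ import torch.nn as nn
+
+ class TorchClassifierAdapter:
+     def __init__(self, module: nn.Module, epochs: int = 5, lr: float = 1e-3):
+         self.module, self.epochs, self.lr = module, epochs, lr
+
+     def fit(self, X, y):
+         X_t = torch.as_tensor(np.asarray(X), dtype=torch.float32)
+         y_t = torch.as_tensor(np.asarray(y), dtype=torch.long)
+         opt = torch.optim.Adam(self.module.parameters(), lr=self.lr)
+         loss_fn = nn.CrossEntropyLoss()
+         for _ in range(self.epochs):  # plain full-batch training loop
+             opt.zero_grad()
+             loss_fn(self.module(X_t), y_t).backward()
+             opt.step()
+         return self
+
+     def predict(self, X):
+         X_t = torch.as_tensor(np.asarray(X), dtype=torch.float32)
+         with torch.no_grad():
+             return self.module(X_t).argmax(dim=1).numpy()
+ ```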
+
+ ---
+
+ ### The Deployment Gate (`deploy_decision`)
+
+ The `deploy_decision()` method aggregates *all detected risks* and produces a final gate decision.
+
+ ```python
+ decision = critic.deploy_decision()
+
+ if decision["deploy"]:
+     print("✅ Deployment Approved")
+ else:
+     print("❌ Deployment Blocked")
+
+ print(f"Risk Level: {decision['risk_level']}")
+ print(f"Confidence Score: {decision['confidence']:.2f}")
+
+ print("\nBlocking Issues:")
+ for issue in decision["blocking_issues"]:
+     print(f"- {issue}")
+ ```
+
+ **Conceptual model:**
+
+ * **Hard Blockers** → deployment denied
+ * **Soft Blockers** → deployment discouraged
+ * **Confidence Score (0–1)** → heuristic trust level
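+
+ In a CI/CD pipeline, the gate can simply fail the job whenever deployment is blocked. A minimal sketch using only the decision keys shown above (the confidence threshold is an illustrative choice, not a library default):
+
+ ```python
+ # Fail the CI job when ai-critic blocks deployment.
+ import sys
+
+ decision = critic.deploy_decision()
+
+ if not decision["deploy"] or decision["confidence"] < 0.5:
+     print(f"Gate failed (risk level: {decision['risk_level']})")
+     for issue in decision["blocking_issues"]:
+         print(f"- {issue}")
+     sys.exit(1)
+
+ print("Gate passed: no blocking risks detected.")
+ ```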
+
+ ---
+
+ ### Modes & Views (API Design)
+
+ The `evaluate()` method supports **multiple modes** via the `view` parameter:
+
+ | View          | Description                        |
+ | ------------- | ---------------------------------- |
+ | `"executive"` | High-level verdict (non-technical) |
+ | `"technical"` | Risks & recommendations            |
+ | `"details"`   | Raw evaluator outputs              |
+ | `"all"`       | Complete payload                   |
+
+ Example:
+
+ ```python
+ critic.evaluate(view="technical")
+ critic.evaluate(view=["executive", "performance"])
+ ```
+
+ ---
+
+ ### Session Tracking & Model Comparison
+
+ You can persist evaluations and compare model versions over time.
+
+ ```python
+ critic_v1 = AICritic(model, X, y, session="v1")
+ critic_v1.evaluate()
+
+ critic_v2 = AICritic(model, X, y, session="v2")
+ critic_v2.evaluate()
+
+ comparison = critic_v2.compare_with("v1")
+ print(comparison["score_diff"])
+ ```
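+
+ A natural extension is to treat the comparison as a regression gate. The sketch below assumes `score_diff` is the new score minus the previous one (the sign convention is an assumption) and allows a small tolerance before blocking:
+
+ ```python
+ # Hypothetical regression gate on top of compare_with().
+ TOLERANCE = 0.02  # illustrative threshold
+
+ if comparison["score_diff"] < -TOLERANCE:  # assumes score_diff = new - old
+     raise SystemExit(f"v2 regressed by {-comparison['score_diff']:.3f}")
+
+ print("No significant regression detected.")
+ ```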
+
+ This enables:
+
+ * Regression tracking
+ * Risk drift detection
+ * Governance & audit trails
+
+ ---
+
+ ### Best Practices & Use Cases
+
+ | Scenario                | Recommended Usage                      |
+ | ----------------------- | -------------------------------------- |
+ | **CI/CD**               | Block merges using `deploy_decision()` |
+ | **Model Tuning**        | Use technical view for guidance        |
+ | **Governance**          | Persist session outputs                |
+ | **Stakeholder Reports** | Share executive summaries              |
+
+ ---
+
+ ## 🔒 API Stability
+
+ Starting from version **1.0.0**, the public API of **ai-critic** follows semantic versioning.
+ Breaking changes will only occur in major releases.
+
+ ---
+
+ ## 📄 License
+
+ Distributed under the **MIT License**.
+
+ ---
+
+ ## 🧠 Final Note
+
+ > **ai-critic is not a benchmarking tool.**
+ > It is a *decision-making system*.
+
+ A failed audit does **not** mean the model is bad — it means the model **is not ready to be trusted**.
+
+ The purpose of **ai-critic** is to introduce *structured skepticism* into machine learning workflows — exactly where it belongs.