trustlens 0.1.1__tar.gz

Files changed (41)
  1. trustlens-0.1.1/LICENSE +21 -0
  2. trustlens-0.1.1/PKG-INFO +349 -0
  3. trustlens-0.1.1/README.md +297 -0
  4. trustlens-0.1.1/pyproject.toml +145 -0
  5. trustlens-0.1.1/setup.cfg +4 -0
  6. trustlens-0.1.1/tests/test_api.py +163 -0
  7. trustlens-0.1.1/tests/test_bias.py +73 -0
  8. trustlens-0.1.1/tests/test_calibration.py +165 -0
  9. trustlens-0.1.1/tests/test_failure.py +78 -0
  10. trustlens-0.1.1/tests/test_output_formatting.py +45 -0
  11. trustlens-0.1.1/tests/test_plugins.py +119 -0
  12. trustlens-0.1.1/tests/test_representation.py +82 -0
  13. trustlens-0.1.1/tests/test_trust_score.py +292 -0
  14. trustlens-0.1.1/trustlens/__init__.py +30 -0
  15. trustlens-0.1.1/trustlens/api.py +264 -0
  16. trustlens-0.1.1/trustlens/explainability/__init__.py +25 -0
  17. trustlens-0.1.1/trustlens/explainability/faithfulness.py +232 -0
  18. trustlens-0.1.1/trustlens/explainability/gradcam.py +223 -0
  19. trustlens-0.1.1/trustlens/metrics/__init__.py +35 -0
  20. trustlens-0.1.1/trustlens/metrics/bias.py +164 -0
  21. trustlens-0.1.1/trustlens/metrics/calibration.py +232 -0
  22. trustlens-0.1.1/trustlens/metrics/failure.py +159 -0
  23. trustlens-0.1.1/trustlens/metrics/faithfulness.py +50 -0
  24. trustlens-0.1.1/trustlens/metrics/representation.py +209 -0
  25. trustlens-0.1.1/trustlens/plugins/__init__.py +35 -0
  26. trustlens-0.1.1/trustlens/plugins/base.py +100 -0
  27. trustlens-0.1.1/trustlens/plugins/registry.py +143 -0
  28. trustlens-0.1.1/trustlens/report.py +657 -0
  29. trustlens-0.1.1/trustlens/trust_score.py +373 -0
  30. trustlens-0.1.1/trustlens/utils.py +105 -0
  31. trustlens-0.1.1/trustlens/visualization/__init__.py +97 -0
  32. trustlens-0.1.1/trustlens/visualization/bias_plots.py +106 -0
  33. trustlens-0.1.1/trustlens/visualization/calibration_plots.py +138 -0
  34. trustlens-0.1.1/trustlens/visualization/failure_plots.py +110 -0
  35. trustlens-0.1.1/trustlens/visualization/representation_plots.py +129 -0
  36. trustlens-0.1.1/trustlens/visualization/summary_plot.py +588 -0
  37. trustlens-0.1.1/trustlens.egg-info/PKG-INFO +349 -0
  38. trustlens-0.1.1/trustlens.egg-info/SOURCES.txt +39 -0
  39. trustlens-0.1.1/trustlens.egg-info/dependency_links.txt +1 -0
  40. trustlens-0.1.1/trustlens.egg-info/requires.txt +27 -0
  41. trustlens-0.1.1/trustlens.egg-info/top_level.txt +3 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 TrustLens Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,349 @@
1
+ Metadata-Version: 2.4
2
+ Name: trustlens
3
+ Version: 0.1.1
4
+ Summary: Debug your ML models beyond accuracy.
5
+ Author-email: Shahid Ul Islam <maintainers@trustlens.dev>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/Khanz9664/TrustLens
8
+ Project-URL: Source, https://github.com/Khanz9664/TrustLens
9
+ Project-URL: Portfolio, https://khanz9664.github.io/portfolio/
10
+ Project-URL: LinkedIn, https://www.linkedin.com/in/shahid-ul-islam-13650998/
11
+ Project-URL: Instagram, https://instagram.com/shaddy9664
12
+ Keywords: machine learning,model analysis,calibration,explainability,fairness,bias detection
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Intended Audience :: Science/Research
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Operating System :: OS Independent
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.9
20
+ Classifier: Programming Language :: Python :: 3.10
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
24
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
25
+ Requires-Python: >=3.9
26
+ Description-Content-Type: text/markdown
27
+ License-File: LICENSE
28
+ Requires-Dist: numpy>=1.23
29
+ Requires-Dist: scikit-learn>=1.2
30
+ Requires-Dist: matplotlib>=3.6
31
+ Requires-Dist: scipy>=1.9
32
+ Provides-Extra: torch
33
+ Requires-Dist: torch>=2.0; extra == "torch"
34
+ Requires-Dist: torchvision>=0.15; extra == "torch"
35
+ Provides-Extra: full
36
+ Requires-Dist: torch>=2.0; extra == "full"
37
+ Requires-Dist: torchvision>=0.15; extra == "full"
38
+ Requires-Dist: shap>=0.42; extra == "full"
39
+ Requires-Dist: captum>=0.6; extra == "full"
40
+ Provides-Extra: dev
41
+ Requires-Dist: pytest>=7.0; extra == "dev"
42
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
43
+ Requires-Dist: ruff>=0.4; extra == "dev"
44
+ Requires-Dist: mypy>=1.0; extra == "dev"
45
+ Requires-Dist: pre-commit>=3.0; extra == "dev"
46
+ Provides-Extra: docs
47
+ Requires-Dist: sphinx>=6.0; extra == "docs"
48
+ Requires-Dist: sphinx-rtd-theme>=1.2; extra == "docs"
49
+ Requires-Dist: myst-parser>=2.0; extra == "docs"
50
+ Requires-Dist: nbsphinx>=0.9; extra == "docs"
51
+ Dynamic: license-file
52
+
53
+ <div align="center">
54
+ <img src="assets/banner.png" width="180" alt="TrustLens Logo">
55
+
56
+ # TrustLens
57
+
58
+ ### Your model has 92% accuracy. But can you trust it?
59
+
60
+ **TrustLens is the open-source library that answers the questions accuracy never does.**
61
+
62
+ [![CI](https://github.com/Khanz9664/TrustLens/actions/workflows/ci.yml/badge.svg)](https://github.com/Khanz9664/TrustLens/actions)
63
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
64
+ [![Stars](https://img.shields.io/github/stars/Khanz9664/TrustLens?style=social)](https://github.com/Khanz9664/TrustLens/stargazers)
65
+
66
+ </div>
67
+
68
+ ---
69
+
70
+ ## The Problem Nobody Talks About
71
+
72
+ You trained a model. It hits **92% accuracy** on your validation set.
73
+
74
+ So you ship it.
75
+
76
+ Three months later:
77
+ - A minority-class user gets consistently wrong predictions
78
+ - The model is 90% confident on its worst mistakes
79
+ - A regulator asks "why did it make that decision?" and you have no answer
80
+
81
+ **Accuracy tells you how often your model is right. It tells you nothing about when it fails, why it fails, or who it fails.**
82
+
83
+ TrustLens fixes that. In one function call.
84
+
85
+ ---
86
+
87
+ ## Quick Analyze (Zero-Friction Start)
88
+
89
+ Try TrustLens instantly without bringing your own data or models. We provide a zero-friction entry point:
90
+
91
+ ```python
92
+ from trustlens import quick_analyze
93
+
94
+ # Automatically loads the breast cancer dataset, trains a baseline logistic model,
95
+ # and runs the full analysis, returning a TrustReport and rendering the dashboard.
96
+ report = quick_analyze(dataset="breast_cancer")
97
+ ```
98
+
99
+ ---
100
+
101
+ ## Quick Usage with Custom Models
102
+
103
+ ```bash
104
+ pip install trustlens
105
+ ```
106
+
107
+ ```python
108
+ from trustlens import analyze
109
+
110
+ report = analyze(
111
+ model, # any sklearn-compatible model
112
+ X_val, # validation features
113
+ y_val, # ground truth
114
+ y_prob=proba, # predicted probabilities
115
+ )
116
+
117
+ print(report.trust_score)
118
+ report.show()
119
+ ```
120
+
121
+ **Output Insight:**
122
+
123
+ ```text
124
+ ==================================================================
125
+ TrustLens Report
126
+ ==================================================================
127
+ Timestamp : 2026-04-16T15:43:02Z
128
+ Model : RandomForestClassifier
129
+ Samples : 2,500
130
+ Classes : 2
131
+
132
+ ==================================================================
133
+ TRUST SCORE: 61/100 [B]
134
+ Assessment: Good Trust - minor issues to address
135
+ ==================================================================
136
+
137
+ Key Observations:
138
+ * Calibration needs improvement (ECE > 0.1).
139
+ * Model is overconfident on incorrect predictions (low confidence gap).
140
+
141
+ ==================================================================
142
+ Dimension breakdown:
143
+ calibration 52.3/100 []
144
+ failure 74.1/100 []
145
+ bias 41.2/100 []
146
+ representation 68.5/100 []
147
+ ```
148
+
149
+ Your failure score is acceptable. Your calibration and bias scores are not. **TrustLens just saved you a PR disaster.**
150
+
151
+ ---
152
+
153
+ ## The Summary Dashboard
154
+
155
+ One line. One picture. Everything you need.
156
+
157
+ ```python
158
+ report.summary_plot()
159
+ ```
160
+
161
+ The presentation-ready 6-panel dashboard shows:
162
+ - **Trust Score gauge**: Your model's overall trustworthiness at a glance
163
+ - **Reliability diagram**: Is your model overconfident or underconfident?
164
+ - **Confidence gap**: Does high confidence actually mean high accuracy?
165
+ - **Error rate by class**: Which classes are being failed?
166
+ - **Class distribution**: Is your training data biased?
167
+ - **Sub-score breakdown**: Which dimension needs the most work?
168
+
169
+ ---
170
+
171
+ ## The Trust Score
172
+
173
+ A single, actionable number: **0 to 100**.
174
+
175
+ Computed from four dimensions, each independently interpretable:
176
+
177
+ | Dimension | What it measures | Weight |
178
+ |-----------|-----------------|--------|
179
+ | **Calibration** | Do probabilities reflect reality? | 35% |
180
+ | **Failure** | Does confidence correlate with accuracy? | 30% |
181
+ | **Bias** | Are all groups treated equally? | 25% |
182
+ | **Representation** | Is the embedding space well-structured? | 10% |
183
+
184
+ | Score | Grade | Recommendation |
185
+ |-------|-------|----------------|
186
+ | 80-100 | A | Production-ready |
187
+ | 60-79 | B | Good - fix flagged issues first |
188
+ | 40-59 | C | Investigate before deployment |
189
+ | 0-39 | D | Do not deploy |
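To make the two tables above concrete, here is a minimal sketch of how four sub-scores could be combined and graded. The weights and grade bands are taken from the tables; the plain weighted average, the function names `combine_scores` and `grade`, and the example sub-scores are assumptions for illustration, and the library's actual aggregation may apply further rules.

```python
# Weights and grade bands copied from the tables above; the weighted
# average itself is an assumed aggregation rule, not the library's code.
WEIGHTS = {
    "calibration": 0.35,
    "failure": 0.30,
    "bias": 0.25,
    "representation": 0.10,
}


def combine_scores(sub_scores):
    """Weighted average of the four 0-100 dimension scores."""
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


def grade(score):
    """Map a 0-100 score to the letter bands in the grade table."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    return "D"


# Hypothetical sub-scores, chosen only to exercise the functions.
example = {"calibration": 90.0, "failure": 80.0, "bias": 70.0, "representation": 60.0}
total = combine_scores(example)
print(f"{total:.1f} -> grade {grade(total)}")
```

Because calibration carries the largest weight, a weak calibration score pulls the total down faster than an equally weak representation score.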
190
+
191
+ ---
192
+
193
+ ## The Failure Showcase
194
+
195
+ Find your model's most dangerous mistakes in one line:
196
+
197
+ ```python
198
+ report.show_failures(top_k=5)
199
+ ```
200
+
201
+ **Output:**
202
+
203
+ ```text
204
+ ==================================================================
205
+ TOP 5 CRITICAL FAILURES
206
+ GradientBoostingClassifier | 58 total errors / 700 samples (8.3%)
207
+ ==================================================================
208
+ # Sample True Pred Confidence Danger
209
+ ------------------------------------------------------
210
+ 1 412 1 0 97.4% CRITICAL
211
+ 2 88 0 1 95.1% CRITICAL
212
+ 3 301 1 0 91.8% HIGH
213
+ 4 556 0 1 89.2% HIGH
214
+
215
+ Insights:
216
+ Mean confidence on top failures: 93.4%
217
+ These are high-confidence mistakes - the model is
218
+ certain it is right, but it is wrong.
219
+ Overconfidence detected - consider calibration.
220
+ ```
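The ranking behind a table like this can be sketched in a few lines: keep only the misclassified samples and sort them by the confidence the model assigned to its wrong answer. The helper `top_failures` and the toy data below are hypothetical and are not the implementation of `show_failures`; the CRITICAL and HIGH labels are left out because their thresholds are not stated here.

```python
def top_failures(y_true, y_pred, confidence, top_k=5):
    """Return (index, true, pred, confidence) for the most confident errors."""
    errors = [
        (i, t, p, c)
        for i, (t, p, c) in enumerate(zip(y_true, y_pred, confidence))
        if t != p
    ]
    # Highest confidence first: these are the mistakes the model was surest about.
    errors.sort(key=lambda row: row[3], reverse=True)
    return errors[:top_k]


# Toy data for illustration only.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
confidence = [0.97, 0.88, 0.91, 0.62, 0.80, 0.99]

for idx, true, pred, conf in top_failures(y_true, y_pred, confidence, top_k=2):
    print(f"sample {idx}: true={true} pred={pred} confidence={conf:.1%}")
```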
221
+
222
+ ---
223
+
224
+ ## Real-World Use Cases
225
+
226
+ ### Medical AI
227
+ A diagnostic model with 94% accuracy has an ECE of 0.18 - dangerously overconfident on edge cases. TrustLens surfaces it before deployment.
228
+
229
+ ### Fraud Detection
230
+ Your model's confidence gap is 0.04 - it's almost as confident on fraud it misses as on fraud it catches. That's your false-negative problem, quantified.
231
+
232
+ ### Hiring, Lending, and Insurance
233
+ Subgroup analysis reveals a 23% accuracy gap between applicant demographics. You have a fairness problem. Now you know before a regulator tells you.
234
+
235
+ ### Research
236
+ Use CKA to compare representation quality across model architectures. Use faithfulness testing to benchmark explanation methods honestly.
237
+
238
+ ## Repository Structure
239
+
240
+ ```text
241
+ TrustLens/
242
+ ├── assets/
243
+ │ ├── banner.png
244
+ │ └── logo.png
245
+ ├── docs/
246
+ │ ├── DESIGN_PRINCIPLES.md
247
+ │ ├── FUTURE_EXTENSIONS.md
248
+ │ ├── GITHUB_ISSUES.md
249
+ │ ├── POSITIONING.md
250
+ │ └── REWRITTEN_ISSUES.md
251
+ ├── examples/
252
+ │ ├── calibration_deep_dive.py
253
+ │ ├── cnn_vs_vit_trustlens.py
254
+ │ ├── custom_plugin_demo.py
255
+ │ ├── quickstart.py
256
+ │ └── trustlens_demo.ipynb
257
+ ├── .github/workflows/
258
+ │ └── ci.yml
259
+ ├── tests/
260
+ │ ├── test_api.py
261
+ │ ├── test_bias.py
262
+ │ ├── test_calibration.py
263
+ │ ├── test_failure.py
264
+ │ ├── test_output_formatting.py
265
+ │ ├── test_plugins.py
266
+ │ ├── test_representation.py
267
+ │ └── test_trust_score.py
268
+ ├── trustlens/
269
+ │ ├── explainability/
270
+ │ │ ├── faithfulness.py
271
+ │ │ └── gradcam.py
272
+ │ ├── metrics/
273
+ │ │ ├── bias.py
274
+ │ │ ├── calibration.py
275
+ │ │ ├── failure.py
276
+ │ │ ├── faithfulness.py
277
+ │ │ └── representation.py
278
+ │ ├── plugins/
279
+ │ │ ├── base.py
280
+ │ │ └── registry.py
281
+ │ ├── visualization/
282
+ │ │ ├── bias_plots.py
283
+ │ │ ├── calibration_plots.py
284
+ │ │ ├── failure_plots.py
285
+ │ │ ├── representation_plots.py
286
+ │ │ └── summary_plot.py
287
+ │ ├── api.py
288
+ │ ├── report.py
289
+ │ ├── trust_score.py
290
+ │ └── utils.py
291
+ ├── CHANGELOG.md
292
+ ├── CONTRIBUTING.md
293
+ ├── LICENSE
294
+ ├── Makefile
295
+ ├── pyproject.toml
296
+ ├── README.md
297
+ ├── requirements.txt
298
+ └── ROADMAP.md
299
+ ```
300
+
301
+ ---
302
+
303
+ ## Contributing
304
+
305
+ TrustLens is designed to grow with the community. Adding a new metric takes just four simple steps:
306
+
307
+ > **Testing Policy:** Test coverage currently stands at 67%, which is enough to keep the core stable. It will be raised incrementally toward 85%+ as advanced modules (e.g., explainability, visualization) receive additional tests. All new contributions must maintain or improve this baseline.
308
+
309
+ 1. Write a pure function `my_metric(y_true, y_pred) -> float`
310
+ 2. Add it to the appropriate module (`metrics/calibration.py`, etc.)
311
+ 3. Export it from `metrics/__init__.py`
312
+ 4. Write a test in `tests/test_<module>.py`
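Steps 1 and 4 above might look like the sketch below. The metric `error_rate` and its test are hypothetical examples that follow the `my_metric(y_true, y_pred) -> float` shape; they are not part of the package, and the module they would live in is left open.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose prediction differs from the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def test_error_rate():
    # One of four predictions is wrong.
    assert error_rate([0, 1, 1, 0], [0, 1, 0, 0]) == 0.25
    # Perfect predictions give zero error.
    assert error_rate([1, 1], [1, 1]) == 0.0


test_error_rate()
```

Keeping the function pure (no I/O, no global state) lets the test run without a trained model or a dataset.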
313
+
314
+ See `CONTRIBUTING.md` for the full guide including instructions on adding plugins and explainability methods.
315
+ Review `docs/GITHUB_ISSUES.md` for open tasks ready to be developed.
316
+
317
+ ---
318
+
319
+ ## Citation
320
+
321
+ ```bibtex
322
+ @software{trustlens2026,
323
+ author = {Shahid Ul Islam},
324
+ title = {TrustLens: Debug your ML models beyond accuracy},
325
+ year = {2026},
326
+ url = {https://github.com/Khanz9664/TrustLens},
327
+ }
328
+ ```
329
+
330
+ ---
331
+
332
+ ## Author & Maintainer
333
+
334
+ **Shahid Ul Islam**
335
+ - **GitHub**: [Khanz9664](https://github.com/Khanz9664)
336
+ - **Portfolio**: [Visit Portfolio](https://khanz9664.github.io/portfolio/)
337
+ - **LinkedIn**: [Connect on LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/)
338
+ - **Instagram**: [Follow on Instagram](https://instagram.com/shaddy9664)
339
+
340
+ ---
341
+
342
+ <div align="center">
343
+
344
+ **If TrustLens saved you from a bad deployment, star it.**
345
+ It helps other engineers find it before they make the same mistake.
346
+
347
+ [GitHub](https://github.com/Khanz9664/TrustLens) | [Portfolio](https://khanz9664.github.io/portfolio/) | [LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/) | [Discussions](https://github.com/Khanz9664/TrustLens/discussions)
348
+
349
+ </div>
@@ -0,0 +1,297 @@
1
+ <div align="center">
2
+ <img src="assets/banner.png" width="180" alt="TrustLens Logo">
3
+
4
+ # TrustLens
5
+
6
+ ### Your model has 92% accuracy. But can you trust it?
7
+
8
+ **TrustLens is the open-source library that answers the questions accuracy never does.**
9
+
10
+ [![CI](https://github.com/Khanz9664/TrustLens/actions/workflows/ci.yml/badge.svg)](https://github.com/Khanz9664/TrustLens/actions)
11
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
12
+ [![Stars](https://img.shields.io/github/stars/Khanz9664/TrustLens?style=social)](https://github.com/Khanz9664/TrustLens/stargazers)
13
+
14
+ </div>
15
+
16
+ ---
17
+
18
+ ## The Problem Nobody Talks About
19
+
20
+ You trained a model. It hits **92% accuracy** on your validation set.
21
+
22
+ So you ship it.
23
+
24
+ Three months later:
25
+ - A minority-class user gets consistently wrong predictions
26
+ - The model is 90% confident on its worst mistakes
27
+ - A regulator asks "why did it make that decision?" and you have no answer
28
+
29
+ **Accuracy tells you how often your model is right. It tells you nothing about when it fails, why it fails, or who it fails.**
30
+
31
+ TrustLens fixes that. In one function call.
32
+
33
+ ---
34
+
35
+ ## Quick Analyze (Zero-Friction Start)
36
+
37
+ Try TrustLens instantly without bringing your own data or models. We provide a zero-friction entry point:
38
+
39
+ ```python
40
+ from trustlens import quick_analyze
41
+
42
+ # Automatically loads the breast cancer dataset, trains a baseline logistic model,
43
+ # and runs the full analysis, returning a TrustReport and rendering the dashboard.
44
+ report = quick_analyze(dataset="breast_cancer")
45
+ ```
46
+
47
+ ---
48
+
49
+ ## Quick Usage with Custom Models
50
+
51
+ ```bash
52
+ pip install trustlens
53
+ ```
54
+
55
+ ```python
56
+ from trustlens import analyze
57
+
58
+ report = analyze(
59
+ model, # any sklearn-compatible model
60
+ X_val, # validation features
61
+ y_val, # ground truth
62
+ y_prob=proba, # predicted probabilities
63
+ )
64
+
65
+ print(report.trust_score)
66
+ report.show()
67
+ ```
68
+
69
+ **Output Insight:**
70
+
71
+ ```text
72
+ ==================================================================
73
+ TrustLens Report
74
+ ==================================================================
75
+ Timestamp : 2026-04-16T15:43:02Z
76
+ Model : RandomForestClassifier
77
+ Samples : 2,500
78
+ Classes : 2
79
+
80
+ ==================================================================
81
+ TRUST SCORE: 61/100 [B]
82
+ Assessment: Good Trust - minor issues to address
83
+ ==================================================================
84
+
85
+ Key Observations:
86
+ * Calibration needs improvement (ECE > 0.1).
87
+ * Model is overconfident on incorrect predictions (low confidence gap).
88
+
89
+ ==================================================================
90
+ Dimension breakdown:
91
+ calibration 52.3/100 []
92
+ failure 74.1/100 []
93
+ bias 41.2/100 []
94
+ representation 68.5/100 []
95
+ ```
96
+
97
+ Your failure score is acceptable. Your calibration and bias scores are not. **TrustLens just saved you a PR disaster.**
98
+
99
+ ---
100
+
101
+ ## The Summary Dashboard
102
+
103
+ One line. One picture. Everything you need.
104
+
105
+ ```python
106
+ report.summary_plot()
107
+ ```
108
+
109
+ The presentation-ready 6-panel dashboard shows:
110
+ - **Trust Score gauge**: Your model's overall trustworthiness at a glance
111
+ - **Reliability diagram**: Is your model overconfident or underconfident?
112
+ - **Confidence gap**: Does high confidence actually mean high accuracy?
113
+ - **Error rate by class**: Which classes are being failed?
114
+ - **Class distribution**: Is your training data biased?
115
+ - **Sub-score breakdown**: Which dimension needs the most work?
116
+
117
+ ---
118
+
119
+ ## The Trust Score
120
+
121
+ A single, actionable number: **0 to 100**.
122
+
123
+ Computed from four dimensions, each independently interpretable:
124
+
125
+ | Dimension | What it measures | Weight |
126
+ |-----------|-----------------|--------|
127
+ | **Calibration** | Do probabilities reflect reality? | 35% |
128
+ | **Failure** | Does confidence correlate with accuracy? | 30% |
129
+ | **Bias** | Are all groups treated equally? | 25% |
130
+ | **Representation** | Is the embedding space well-structured? | 10% |
131
+
132
+ | Score | Grade | Recommendation |
133
+ |-------|-------|----------------|
134
+ | 80-100 | A | Production-ready |
135
+ | 60-79 | B | Good - fix flagged issues first |
136
+ | 40-59 | C | Investigate before deployment |
137
+ | 0-39 | D | Do not deploy |
138
+
139
+ ---
140
+
141
+ ## The Failure Showcase
142
+
143
+ Find your model's most dangerous mistakes in one line:
144
+
145
+ ```python
146
+ report.show_failures(top_k=5)
147
+ ```
148
+
149
+ **Output:**
150
+
151
+ ```text
152
+ ==================================================================
153
+ TOP 5 CRITICAL FAILURES
154
+ GradientBoostingClassifier | 58 total errors / 700 samples (8.3%)
155
+ ==================================================================
156
+ # Sample True Pred Confidence Danger
157
+ ------------------------------------------------------
158
+ 1 412 1 0 97.4% CRITICAL
159
+ 2 88 0 1 95.1% CRITICAL
160
+ 3 301 1 0 91.8% HIGH
161
+ 4 556 0 1 89.2% HIGH
162
+
163
+ Insights:
164
+ Mean confidence on top failures: 93.4%
165
+ These are high-confidence mistakes - the model is
166
+ certain it is right, but it is wrong.
167
+ Overconfidence detected - consider calibration.
168
+ ```
169
+
170
+ ---
171
+
172
+ ## Real-World Use Cases
173
+
174
+ ### Medical AI
175
+ A diagnostic model with 94% accuracy has an ECE of 0.18 - dangerously overconfident on edge cases. TrustLens surfaces it before deployment.
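For readers unfamiliar with ECE (expected calibration error), the sketch below shows the common equal-width-bin formulation: group predictions by confidence, then take the sample-weighted average of the gap between accuracy and mean confidence in each bin. The function name, the default of 10 bins, and the toy data are assumptions; the library's own calibration module may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Toy data: four very confident predictions, only half of them right.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 0, 0, 1, 0]
print(round(expected_calibration_error(confidences, correct), 3))
```

A perfectly calibrated model scores 0; the overconfident toy model above scores far higher than the 0.1 threshold mentioned in the sample report.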
176
+
177
+ ### Fraud Detection
178
+ Your model's confidence gap is 0.04 - it's almost as confident on fraud it misses as on fraud it catches. That's your false-negative problem, quantified.
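One way to read "confidence gap", assumed here from the description above, is the mean confidence on correct predictions minus the mean confidence on incorrect ones. The helper below is a hypothetical sketch of that definition with toy numbers, not the library's implementation.

```python
def confidence_gap(confidences, correct):
    """Mean confidence on correct predictions minus mean confidence on errors."""
    right = [c for c, hit in zip(confidences, correct) if hit]
    wrong = [c for c, hit in zip(confidences, correct) if not hit]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect prediction")
    return sum(right) / len(right) - sum(wrong) / len(wrong)


# Toy data: the model is nearly as confident when it is wrong as when it is right.
confidences = [0.90, 0.80, 0.85, 0.80, 0.82]
correct = [True, True, True, False, False]
gap = confidence_gap(confidences, correct)
print(f"confidence gap: {gap:.2f}")
```

A gap near zero means confidence carries little information about correctness, so thresholding on confidence will not filter out the misses.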
179
+
180
+ ### Hiring, Lending, and Insurance
181
+ Subgroup analysis reveals a 23% accuracy gap between applicant demographics. You have a fairness problem. Now you know before a regulator tells you.
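A subgroup accuracy gap of this kind can be sketched as per-group accuracy followed by the spread between the best and worst group. The function names, the group labels, and the toy data are hypothetical; the library's bias module may report additional statistics.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each value in `groups`."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for true, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        hits[group] += int(true == pred)
    return {group: hits[group] / counts[group] for group in counts}


def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Toy data: group "a" is always predicted correctly, group "b" only half the time.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group, accuracy_gap(per_group))
```

Overall accuracy on this toy set is 75%, which hides the fact that one group sees twice the error rate of the other.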
182
+
183
+ ### Research
184
+ Use CKA to compare representation quality across model architectures. Use faithfulness testing to benchmark explanation methods honestly.
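As background, linear CKA (centered kernel alignment) between two activation matrices with the same number of rows can be computed from column-centered features as the squared Frobenius norm of the cross-product, normalized by the norms of each matrix's own Gram-style product. The dependency-free sketch below uses hypothetical helper names and toy data; it is not the code in the library's representation module.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def cross_product(a, b):
    """Return a^T b for two matrices with the same number of rows."""
    return [
        [sum(x * y for x, y in zip(col_a, col_b)) for col_b in zip(*b)]
        for col_a in zip(*a)
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def linear_cka(x, y):
    """Linear CKA between two (n_samples x n_features) activation matrices."""
    x, y = center_columns(x), center_columns(y)
    numerator = frobenius_norm(cross_product(x, y)) ** 2
    denominator = frobenius_norm(cross_product(x, x)) * frobenius_norm(
        cross_product(y, y)
    )
    return numerator / denominator


# Toy activations: the second matrix is the first rotated by 90 degrees.
features = [[1.0, 2.0], [2.0, 0.5], [3.0, 4.0], [4.0, 1.0]]
rotated = [[row[1], -row[0]] for row in features]
print(round(linear_cka(features, features), 6))
print(round(linear_cka(features, rotated), 6))
```

Linear CKA is invariant to orthogonal transformations of the feature space, which is why a rotated copy of the same representation still scores as identical.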
185
+
186
+ ## Repository Structure
187
+
188
+ ```text
189
+ TrustLens/
190
+ ├── assets/
191
+ │ ├── banner.png
192
+ │ └── logo.png
193
+ ├── docs/
194
+ │ ├── DESIGN_PRINCIPLES.md
195
+ │ ├── FUTURE_EXTENSIONS.md
196
+ │ ├── GITHUB_ISSUES.md
197
+ │ ├── POSITIONING.md
198
+ │ └── REWRITTEN_ISSUES.md
199
+ ├── examples/
200
+ │ ├── calibration_deep_dive.py
201
+ │ ├── cnn_vs_vit_trustlens.py
202
+ │ ├── custom_plugin_demo.py
203
+ │ ├── quickstart.py
204
+ │ └── trustlens_demo.ipynb
205
+ ├── .github/workflows/
206
+ │ └── ci.yml
207
+ ├── tests/
208
+ │ ├── test_api.py
209
+ │ ├── test_bias.py
210
+ │ ├── test_calibration.py
211
+ │ ├── test_failure.py
212
+ │ ├── test_output_formatting.py
213
+ │ ├── test_plugins.py
214
+ │ ├── test_representation.py
215
+ │ └── test_trust_score.py
216
+ ├── trustlens/
217
+ │ ├── explainability/
218
+ │ │ ├── faithfulness.py
219
+ │ │ └── gradcam.py
220
+ │ ├── metrics/
221
+ │ │ ├── bias.py
222
+ │ │ ├── calibration.py
223
+ │ │ ├── failure.py
224
+ │ │ ├── faithfulness.py
225
+ │ │ └── representation.py
226
+ │ ├── plugins/
227
+ │ │ ├── base.py
228
+ │ │ └── registry.py
229
+ │ ├── visualization/
230
+ │ │ ├── bias_plots.py
231
+ │ │ ├── calibration_plots.py
232
+ │ │ ├── failure_plots.py
233
+ │ │ ├── representation_plots.py
234
+ │ │ └── summary_plot.py
235
+ │ ├── api.py
236
+ │ ├── report.py
237
+ │ ├── trust_score.py
238
+ │ └── utils.py
239
+ ├── CHANGELOG.md
240
+ ├── CONTRIBUTING.md
241
+ ├── LICENSE
242
+ ├── Makefile
243
+ ├── pyproject.toml
244
+ ├── README.md
245
+ ├── requirements.txt
246
+ └── ROADMAP.md
247
+ ```
248
+
249
+ ---
250
+
251
+ ## Contributing
252
+
253
+ TrustLens is designed to grow with the community. Adding a new metric takes just four simple steps:
254
+
255
+ > **Testing Policy:** Test coverage currently stands at 67%, which is enough to keep the core stable. It will be raised incrementally toward 85%+ as advanced modules (e.g., explainability, visualization) receive additional tests. All new contributions must maintain or improve this baseline.
256
+
257
+ 1. Write a pure function `my_metric(y_true, y_pred) -> float`
258
+ 2. Add it to the appropriate module (`metrics/calibration.py`, etc.)
259
+ 3. Export it from `metrics/__init__.py`
260
+ 4. Write a test in `tests/test_<module>.py`
261
+
262
+ See `CONTRIBUTING.md` for the full guide including instructions on adding plugins and explainability methods.
263
+ Review `docs/GITHUB_ISSUES.md` for open tasks ready to be developed.
264
+
265
+ ---
266
+
267
+ ## Citation
268
+
269
+ ```bibtex
270
+ @software{trustlens2026,
271
+ author = {Shahid Ul Islam},
272
+ title = {TrustLens: Debug your ML models beyond accuracy},
273
+ year = {2026},
274
+ url = {https://github.com/Khanz9664/TrustLens},
275
+ }
276
+ ```
277
+
278
+ ---
279
+
280
+ ## Author & Maintainer
281
+
282
+ **Shahid Ul Islam**
283
+ - **GitHub**: [Khanz9664](https://github.com/Khanz9664)
284
+ - **Portfolio**: [Visit Portfolio](https://khanz9664.github.io/portfolio/)
285
+ - **LinkedIn**: [Connect on LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/)
286
+ - **Instagram**: [Follow on Instagram](https://instagram.com/shaddy9664)
287
+
288
+ ---
289
+
290
+ <div align="center">
291
+
292
+ **If TrustLens saved you from a bad deployment, star it.**
293
+ It helps other engineers find it before they make the same mistake.
294
+
295
+ [GitHub](https://github.com/Khanz9664/TrustLens) | [Portfolio](https://khanz9664.github.io/portfolio/) | [LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/) | [Discussions](https://github.com/Khanz9664/TrustLens/discussions)
296
+
297
+ </div>