trustlens-0.1.1.tar.gz
- trustlens-0.1.1/LICENSE +21 -0
- trustlens-0.1.1/PKG-INFO +349 -0
- trustlens-0.1.1/README.md +297 -0
- trustlens-0.1.1/pyproject.toml +145 -0
- trustlens-0.1.1/setup.cfg +4 -0
- trustlens-0.1.1/tests/test_api.py +163 -0
- trustlens-0.1.1/tests/test_bias.py +73 -0
- trustlens-0.1.1/tests/test_calibration.py +165 -0
- trustlens-0.1.1/tests/test_failure.py +78 -0
- trustlens-0.1.1/tests/test_output_formatting.py +45 -0
- trustlens-0.1.1/tests/test_plugins.py +119 -0
- trustlens-0.1.1/tests/test_representation.py +82 -0
- trustlens-0.1.1/tests/test_trust_score.py +292 -0
- trustlens-0.1.1/trustlens/__init__.py +30 -0
- trustlens-0.1.1/trustlens/api.py +264 -0
- trustlens-0.1.1/trustlens/explainability/__init__.py +25 -0
- trustlens-0.1.1/trustlens/explainability/faithfulness.py +232 -0
- trustlens-0.1.1/trustlens/explainability/gradcam.py +223 -0
- trustlens-0.1.1/trustlens/metrics/__init__.py +35 -0
- trustlens-0.1.1/trustlens/metrics/bias.py +164 -0
- trustlens-0.1.1/trustlens/metrics/calibration.py +232 -0
- trustlens-0.1.1/trustlens/metrics/failure.py +159 -0
- trustlens-0.1.1/trustlens/metrics/faithfulness.py +50 -0
- trustlens-0.1.1/trustlens/metrics/representation.py +209 -0
- trustlens-0.1.1/trustlens/plugins/__init__.py +35 -0
- trustlens-0.1.1/trustlens/plugins/base.py +100 -0
- trustlens-0.1.1/trustlens/plugins/registry.py +143 -0
- trustlens-0.1.1/trustlens/report.py +657 -0
- trustlens-0.1.1/trustlens/trust_score.py +373 -0
- trustlens-0.1.1/trustlens/utils.py +105 -0
- trustlens-0.1.1/trustlens/visualization/__init__.py +97 -0
- trustlens-0.1.1/trustlens/visualization/bias_plots.py +106 -0
- trustlens-0.1.1/trustlens/visualization/calibration_plots.py +138 -0
- trustlens-0.1.1/trustlens/visualization/failure_plots.py +110 -0
- trustlens-0.1.1/trustlens/visualization/representation_plots.py +129 -0
- trustlens-0.1.1/trustlens/visualization/summary_plot.py +588 -0
- trustlens-0.1.1/trustlens.egg-info/PKG-INFO +349 -0
- trustlens-0.1.1/trustlens.egg-info/SOURCES.txt +39 -0
- trustlens-0.1.1/trustlens.egg-info/dependency_links.txt +1 -0
- trustlens-0.1.1/trustlens.egg-info/requires.txt +27 -0
- trustlens-0.1.1/trustlens.egg-info/top_level.txt +3 -0
trustlens-0.1.1/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 TrustLens Contributors
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
trustlens-0.1.1/PKG-INFO
ADDED
|
@@ -0,0 +1,349 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: trustlens
|
|
3
|
+
Version: 0.1.1
|
|
4
|
+
Summary: Debug your ML models beyond accuracy.
|
|
5
|
+
Author-email: Shahid Ul Islam <maintainers@trustlens.dev>
|
|
6
|
+
License: MIT
|
|
7
|
+
Project-URL: Homepage, https://github.com/Khanz9664/TrustLens
|
|
8
|
+
Project-URL: Source, https://github.com/Khanz9664/TrustLens
|
|
9
|
+
Project-URL: Portfolio, https://khanz9664.github.io/portfolio/
|
|
10
|
+
Project-URL: LinkedIn, https://www.linkedin.com/in/shahid-ul-islam-13650998/
|
|
11
|
+
Project-URL: Instagram, https://instagram.com/shaddy9664
|
|
12
|
+
Keywords: machine learning,model analysis,calibration,explainability,fairness,bias detection
|
|
13
|
+
Classifier: Development Status :: 3 - Alpha
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: Intended Audience :: Science/Research
|
|
16
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
17
|
+
Classifier: Operating System :: OS Independent
|
|
18
|
+
Classifier: Programming Language :: Python :: 3
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
22
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
23
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
24
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
25
|
+
Requires-Python: >=3.9
|
|
26
|
+
Description-Content-Type: text/markdown
|
|
27
|
+
License-File: LICENSE
|
|
28
|
+
Requires-Dist: numpy>=1.23
|
|
29
|
+
Requires-Dist: scikit-learn>=1.2
|
|
30
|
+
Requires-Dist: matplotlib>=3.6
|
|
31
|
+
Requires-Dist: scipy>=1.9
|
|
32
|
+
Provides-Extra: torch
|
|
33
|
+
Requires-Dist: torch>=2.0; extra == "torch"
|
|
34
|
+
Requires-Dist: torchvision>=0.15; extra == "torch"
|
|
35
|
+
Provides-Extra: full
|
|
36
|
+
Requires-Dist: torch>=2.0; extra == "full"
|
|
37
|
+
Requires-Dist: torchvision>=0.15; extra == "full"
|
|
38
|
+
Requires-Dist: shap>=0.42; extra == "full"
|
|
39
|
+
Requires-Dist: captum>=0.6; extra == "full"
|
|
40
|
+
Provides-Extra: dev
|
|
41
|
+
Requires-Dist: pytest>=7.0; extra == "dev"
|
|
42
|
+
Requires-Dist: pytest-cov>=4.0; extra == "dev"
|
|
43
|
+
Requires-Dist: ruff>=0.4; extra == "dev"
|
|
44
|
+
Requires-Dist: mypy>=1.0; extra == "dev"
|
|
45
|
+
Requires-Dist: pre-commit>=3.0; extra == "dev"
|
|
46
|
+
Provides-Extra: docs
|
|
47
|
+
Requires-Dist: sphinx>=6.0; extra == "docs"
|
|
48
|
+
Requires-Dist: sphinx-rtd-theme>=1.2; extra == "docs"
|
|
49
|
+
Requires-Dist: myst-parser>=2.0; extra == "docs"
|
|
50
|
+
Requires-Dist: nbsphinx>=0.9; extra == "docs"
|
|
51
|
+
Dynamic: license-file
|
|
52
|
+
|
|
53
|
+
<div align="center">
|
|
54
|
+
<img src="assets/banner.png" width="180" alt="TrustLens Logo">
|
|
55
|
+
|
|
56
|
+
# TrustLens
|
|
57
|
+
|
|
58
|
+
### Your model has 92% accuracy. But can you trust it?
|
|
59
|
+
|
|
60
|
+
**TrustLens is the open-source library that answers the questions accuracy never does.**
|
|
61
|
+
|
|
62
|
+
[CI](https://github.com/Khanz9664/TrustLens/actions)
|
|
63
|
+
[License: MIT](LICENSE)
|
|
64
|
+
[GitHub stars](https://github.com/Khanz9664/TrustLens/stargazers)
|
|
65
|
+
|
|
66
|
+
</div>
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## The Problem Nobody Talks About
|
|
71
|
+
|
|
72
|
+
You trained a model. It hits **92% accuracy** on your validation set.
|
|
73
|
+
|
|
74
|
+
So you ship it.
|
|
75
|
+
|
|
76
|
+
Three months later:
|
|
77
|
+
- A minority-class user gets consistently wrong predictions
|
|
78
|
+
- The model is 90% confident on its worst mistakes
|
|
79
|
+
- A regulator asks "why did it make that decision?" and you have no answer
|
|
80
|
+
|
|
81
|
+
**Accuracy tells you how often your model is right. It tells you nothing about when it fails, why it fails, or who it fails.**
|
|
82
|
+
|
|
83
|
+
TrustLens fixes that. In one function call.
|
|
84
|
+
|
|
85
|
+
---
|
|
86
|
+
|
|
87
|
+
## Quick Analyze (Zero-Friction Start)
|
|
88
|
+
|
|
89
|
+
Try TrustLens instantly without bringing your own data or models. We provide a zero-friction entry point:
|
|
90
|
+
|
|
91
|
+
```python
|
|
92
|
+
from trustlens import quick_analyze
|
|
93
|
+
|
|
94
|
+
# Automatically loads the breast cancer dataset, trains a baseline model,
|
|
95
|
+
# and runs the full analysis, returning a TrustReport and rendering the dashboard.
|
|
96
|
+
report = quick_analyze(dataset="breast_cancer")
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
## Quick Usage with Custom Models
|
|
102
|
+
|
|
103
|
+
```bash
|
|
104
|
+
pip install trustlens
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
```python
|
|
108
|
+
from trustlens import analyze
|
|
109
|
+
|
|
110
|
+
report = analyze(
|
|
111
|
+
model, # any sklearn-compatible model
|
|
112
|
+
X_val, # validation features
|
|
113
|
+
y_val, # ground truth
|
|
114
|
+
y_prob=proba, # predicted probabilities
|
|
115
|
+
)
|
|
116
|
+
|
|
117
|
+
print(report.trust_score)
|
|
118
|
+
report.show()
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
**Output Insight:**
|
|
122
|
+
|
|
123
|
+
```text
|
|
124
|
+
==================================================================
|
|
125
|
+
TrustLens Report
|
|
126
|
+
==================================================================
|
|
127
|
+
Timestamp : 2026-04-16T15:43:02Z
|
|
128
|
+
Model : RandomForestClassifier
|
|
129
|
+
Samples : 2,500
|
|
130
|
+
Classes : 2
|
|
131
|
+
|
|
132
|
+
==================================================================
|
|
133
|
+
TRUST SCORE: 61/100 [B]
|
|
134
|
+
Assessment: Good Trust - minor issues to address
|
|
135
|
+
==================================================================
|
|
136
|
+
|
|
137
|
+
Key Observations:
|
|
138
|
+
* Calibration needs improvement (ECE > 0.1).
|
|
139
|
+
* Model is overconfident on incorrect predictions (low confidence gap).
|
|
140
|
+
|
|
141
|
+
==================================================================
|
|
142
|
+
Dimension breakdown:
|
|
143
|
+
calibration 52.3/100
|
|
144
|
+
failure 74.1/100
|
|
145
|
+
bias 41.2/100
|
|
146
|
+
representation 68.5/100
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
Calibration needs work, and the bias score is the weakest of the four. **TrustLens just saved you a PR disaster.**
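
The `ECE > 0.1` observation in the report refers to expected calibration error. As a rough sketch of how that metric is commonly computed (equal-width confidence bins, each weighted by its share of samples), the snippet below uses a hypothetical function name and an assumed default of 10 bins; the implementation shipped in `metrics/calibration.py` may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # bins[i] collects (confidence, was_correct) pairs that fall in bin i.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy data (hypothetical): very confident predictions that are right half the time.
confs = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
hits = [True, True, False, False, True, False]
ece = expected_calibration_error(confs, hits)
print(f"ECE = {ece:.3f}")
```

A value near 0 means stated confidence tracks observed accuracy; the toy data above lands well over the 0.1 threshold quoted in the report.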
|
|
150
|
+
|
|
151
|
+
---
|
|
152
|
+
|
|
153
|
+
## The Summary Dashboard
|
|
154
|
+
|
|
155
|
+
One line. One picture. Everything you need.
|
|
156
|
+
|
|
157
|
+
```python
|
|
158
|
+
report.summary_plot()
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
The presentation-ready 6-panel dashboard shows:
|
|
162
|
+
- **Trust Score gauge**: Your model's overall trustworthiness at a glance
|
|
163
|
+
- **Reliability diagram**: Is your model overconfident or underconfident?
|
|
164
|
+
- **Confidence gap**: Does high confidence actually mean high accuracy?
|
|
165
|
+
- **Error rate by class**: Which classes are being failed?
|
|
166
|
+
- **Class distribution**: Is your evaluation data imbalanced across classes?
|
|
167
|
+
- **Sub-score breakdown**: Which dimension needs the most work?
|
|
168
|
+
|
|
169
|
+
---
|
|
170
|
+
|
|
171
|
+
## The Trust Score
|
|
172
|
+
|
|
173
|
+
A single, actionable number: **0 to 100**.
|
|
174
|
+
|
|
175
|
+
Computed from four dimensions, each independently interpretable:
|
|
176
|
+
|
|
177
|
+
| Dimension | What it measures | Weight |
|
|
178
|
+
|-----------|-----------------|--------|
|
|
179
|
+
| **Calibration** | Do probabilities reflect reality? | 35% |
|
|
180
|
+
| **Failure** | Does confidence correlate with accuracy? | 30% |
|
|
181
|
+
| **Bias** | Are all groups treated equally? | 25% |
|
|
182
|
+
| **Representation** | Is the embedding space well-structured? | 10% |
|
|
183
|
+
|
|
184
|
+
| Score | Grade | Recommendation |
|
|
185
|
+
|-------|-------|----------------|
|
|
186
|
+
| 80-100 | A | Production-ready |
|
|
187
|
+
| 60-79 | B | Good - fix flagged issues first |
|
|
188
|
+
| 40-59 | C | Investigate before deployment |
|
|
189
|
+
| 0-39 | D | Do not deploy |
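
The two tables above can be read as a weighted mean followed by a threshold lookup. The sketch below is only an illustration of that reading: the function names are hypothetical, the sub-scores are made up, and treating each grade's lower bound as a floor for fractional scores is an assumption.

```python
# Weights and grade floors copied from the tables above.
WEIGHTS = {
    "calibration": 0.35,
    "failure": 0.30,
    "bias": 0.25,
    "representation": 0.10,
}
GRADE_FLOORS = [(80, "A"), (60, "B"), (40, "C"), (0, "D")]


def weighted_trust_score(sub_scores):
    """Weighted mean of the four 0-100 sub-scores."""
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


def grade(score):
    """Map a 0-100 score to a letter using the lower bound of each band."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    raise ValueError("score must be non-negative")


# Hypothetical sub-scores, chosen only to exercise the arithmetic.
example = {"calibration": 90, "failure": 80, "bias": 70, "representation": 60}
score = weighted_trust_score(example)
print(f"{score:.1f} -> grade {grade(score)}")
```

Applied to the sub-scores in the sample report earlier (52.3, 74.1, 41.2, 68.5), this plain weighted mean comes to about 57.7, not the 61 printed there, so the packaged `trust_score.py` may apply adjustments that this sketch does not capture.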
|
|
190
|
+
|
|
191
|
+
---
|
|
192
|
+
|
|
193
|
+
## The Failure Showcase
|
|
194
|
+
|
|
195
|
+
Find your model's most dangerous mistakes in one line:
|
|
196
|
+
|
|
197
|
+
```python
|
|
198
|
+
report.show_failures(top_k=5)
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
**Output:**
|
|
202
|
+
|
|
203
|
+
```text
|
|
204
|
+
==================================================================
|
|
205
|
+
TOP 5 CRITICAL FAILURES
|
|
206
|
+
GradientBoostingClassifier | 58 total errors / 700 samples (8.3%)
|
|
207
|
+
==================================================================
|
|
208
|
+
# Sample True Pred Confidence Danger
|
|
209
|
+
------------------------------------------------------
|
|
210
|
+
1 412 1 0 97.4% CRITICAL
|
|
211
|
+
2 88 0 1 95.1% CRITICAL
|
|
212
|
+
3 301 1 0 91.8% HIGH
|
|
213
|
+
4 556 0 1 89.2% HIGH
|
|
214
|
+
|
|
215
|
+
Insights:
|
|
216
|
+
Mean confidence on top failures: 93.4%
|
|
217
|
+
These are high-confidence mistakes - the model is
|
|
218
|
+
certain it is right, but it is wrong.
|
|
219
|
+
Overconfidence detected - consider calibration.
|
|
220
|
+
```
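
Conceptually, a failure listing like the one above amounts to filtering the misclassified samples and sorting them by confidence. The following is a minimal sketch of that idea, not the code behind `report.show_failures`; the function name and the toy arrays are hypothetical, and the CRITICAL/HIGH thresholds are left out because the README does not state them.

```python
def top_failures(y_true, y_pred, confidences, top_k=5):
    """Return misclassified samples, most confident first."""
    errors = [
        {"sample": i, "true": t, "pred": p, "confidence": c}
        for i, (t, p, c) in enumerate(zip(y_true, y_pred, confidences))
        if t != p
    ]
    errors.sort(key=lambda row: row["confidence"], reverse=True)
    return errors[:top_k]


# Toy data (hypothetical): three of the five predictions are wrong.
y_true = [1, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0]
conf = [0.97, 0.99, 0.80, 0.60, 0.91]

worst = top_failures(y_true, y_pred, conf, top_k=2)
for row in worst:
    print(row)

# Mean confidence over the listed failures, as in the "Insights" block.
mean_conf = sum(row["confidence"] for row in worst) / len(worst)
print(f"mean confidence on top failures: {mean_conf:.1%}")
```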
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
## Real-World Use Cases
|
|
225
|
+
|
|
226
|
+
### Medical AI
|
|
227
|
+
A diagnostic model with 94% accuracy has an ECE of 0.18 - dangerously overconfident on edge cases. TrustLens surfaces it before deployment.
|
|
228
|
+
|
|
229
|
+
### Fraud Detection
|
|
230
|
+
Your model's confidence gap is 0.04 - it's almost as confident on fraud it misses as on fraud it catches. That's your false-negative problem, quantified.
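
One plausible reading of "confidence gap", inferred from the sentence above rather than from the library source, is mean confidence on correct predictions minus mean confidence on incorrect ones. The function name and numbers below are hypothetical and chosen to reproduce a gap of 0.04.

```python
def confidence_gap(confidences, correct):
    """Mean confidence on correct predictions minus mean confidence on errors."""
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect prediction")
    return sum(right) / len(right) - sum(wrong) / len(wrong)


# Toy data (hypothetical): the model is nearly as sure when it is wrong.
confs = [0.90, 0.92, 0.88, 0.86]
hits = [True, True, False, False]
gap = confidence_gap(confs, hits)
print(f"confidence gap: {gap:.2f}")
```

A gap this small means confidence carries little information about which predictions to double-check.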
|
|
231
|
+
|
|
232
|
+
### Hiring, Loan, and Insurance
|
|
233
|
+
Subgroup analysis reveals a 23% accuracy gap between applicant demographics. You have a fairness problem. Now you know before a regulator tells you.
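
A subgroup accuracy gap of this kind can be sketched as per-group accuracy followed by the spread between the best and worst group. The function name and toy data are hypothetical, and `metrics/bias.py` may define the gap differently.

```python
from collections import defaultdict


def subgroup_accuracy_gap(y_true, y_pred, groups):
    """Return ({group: accuracy}, max accuracy - min accuracy)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    if not totals:
        raise ValueError("need at least one sample")
    accuracy = {g: hits[g] / totals[g] for g in totals}
    return accuracy, max(accuracy.values()) - min(accuracy.values())


# Toy data (hypothetical): group "B" gets one more error than group "A".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group, gap = subgroup_accuracy_gap(y_true, y_pred, groups)
print(per_group, f"gap = {gap:.2f}")
```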
|
|
234
|
+
|
|
235
|
+
### Research
|
|
236
|
+
Use CKA to compare representation quality across model architectures. Use faithfulness testing to benchmark explanation methods honestly.
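
For readers unfamiliar with CKA, here is a dependency-free sketch of the linear variant: center each feature column, then compute the squared Frobenius norm of YᵀX divided by the product of the Frobenius norms of XᵀX and YᵀY. The function names are hypothetical and this is not taken from `metrics/representation.py`, which may use a kernel variant or NumPy.

```python
def _center_columns(matrix):
    """Subtract each column's mean; matrix is a list of equal-length rows."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def _cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two activation matrices with matching sample counts."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows (samples)")
    xc, yc = _center_columns(x), _center_columns(y)
    numerator = _cross_frobenius_sq(yc, xc)
    denominator = (_cross_frobenius_sq(xc, xc) * _cross_frobenius_sq(yc, yc)) ** 0.5
    if denominator == 0:
        raise ValueError("one of the inputs has zero variance")
    return numerator / denominator


# Toy activations (hypothetical): 4 samples, 2 features each.
acts_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [2.0, 1.0]]
acts_b = [[3.0 * v for v in row] for row in acts_a]  # same geometry, rescaled
acts_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

same = linear_cka(acts_a, acts_b)
different = linear_cka(acts_a, acts_c)
print(f"rescaled copy: {same:.3f}, unrelated: {different:.3f}")
```

Linear CKA is invariant to isotropic rescaling, so the rescaled copy scores 1 up to floating-point error, while unrelated activations score lower.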
|
|
237
|
+
|
|
238
|
+
## Repository Structure
|
|
239
|
+
|
|
240
|
+
```text
|
|
241
|
+
TrustLens/
|
|
242
|
+
├── assets/
|
|
243
|
+
│ ├── banner.png
|
|
244
|
+
│ └── logo.png
|
|
245
|
+
├── docs/
|
|
246
|
+
│ ├── DESIGN_PRINCIPLES.md
|
|
247
|
+
│ ├── FUTURE_EXTENSIONS.md
|
|
248
|
+
│ ├── GITHUB_ISSUES.md
|
|
249
|
+
│ ├── POSITIONING.md
|
|
250
|
+
│ └── REWRITTEN_ISSUES.md
|
|
251
|
+
├── examples/
|
|
252
|
+
│ ├── calibration_deep_dive.py
|
|
253
|
+
│ ├── cnn_vs_vit_trustlens.py
|
|
254
|
+
│ ├── custom_plugin_demo.py
|
|
255
|
+
│ ├── quickstart.py
|
|
256
|
+
│ └── trustlens_demo.ipynb
|
|
257
|
+
├── .github/workflows/
|
|
258
|
+
│ └── ci.yml
|
|
259
|
+
├── tests/
|
|
260
|
+
│ ├── test_api.py
|
|
261
|
+
│ ├── test_bias.py
|
|
262
|
+
│ ├── test_calibration.py
|
|
263
|
+
│ ├── test_failure.py
|
|
264
|
+
│ ├── test_output_formatting.py
|
|
265
|
+
│ ├── test_plugins.py
|
|
266
|
+
│ ├── test_representation.py
|
|
267
|
+
│ └── test_trust_score.py
|
|
268
|
+
├── trustlens/
|
|
269
|
+
│ ├── explainability/
|
|
270
|
+
│ │ ├── faithfulness.py
|
|
271
|
+
│ │ └── gradcam.py
|
|
272
|
+
│ ├── metrics/
|
|
273
|
+
│ │ ├── bias.py
|
|
274
|
+
│ │ ├── calibration.py
|
|
275
|
+
│ │ ├── failure.py
|
|
276
|
+
│ │ ├── faithfulness.py
|
|
277
|
+
│ │ └── representation.py
|
|
278
|
+
│ ├── plugins/
|
|
279
|
+
│ │ ├── base.py
|
|
280
|
+
│ │ └── registry.py
|
|
281
|
+
│ ├── visualization/
|
|
282
|
+
│ │ ├── bias_plots.py
|
|
283
|
+
│ │ ├── calibration_plots.py
|
|
284
|
+
│ │ ├── failure_plots.py
|
|
285
|
+
│ │ ├── representation_plots.py
|
|
286
|
+
│ │ └── summary_plot.py
|
|
287
|
+
│ ├── api.py
|
|
288
|
+
│ ├── report.py
|
|
289
|
+
│ ├── trust_score.py
|
|
290
|
+
│ └── utils.py
|
|
291
|
+
├── CHANGELOG.md
|
|
292
|
+
├── CONTRIBUTING.md
|
|
293
|
+
├── LICENSE
|
|
294
|
+
├── Makefile
|
|
295
|
+
├── pyproject.toml
|
|
296
|
+
├── README.md
|
|
297
|
+
├── requirements.txt
|
|
298
|
+
└── ROADMAP.md
|
|
299
|
+
```
|
|
300
|
+
|
|
301
|
+
---
|
|
302
|
+
|
|
303
|
+
## Contributing
|
|
304
|
+
|
|
305
|
+
TrustLens is designed to grow with the community. Adding a new metric takes just four simple steps:
|
|
306
|
+
|
|
307
|
+
> **Testing Policy:** Test coverage currently stands at 67%, concentrated on the core modules. It will be raised incrementally toward 85%+ as advanced modules (e.g., explainability, visualization) receive additional tests. All new contributions must maintain or improve this baseline.
|
|
308
|
+
|
|
309
|
+
1. Write a pure function `my_metric(y_true, y_pred) -> float`
|
|
310
|
+
2. Add it to the appropriate module (`metrics/calibration.py`, etc.)
|
|
311
|
+
3. Export it from `metrics/__init__.py`
|
|
312
|
+
4. Write a test in `tests/test_<module>.py`
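
As an illustration of steps 1 and 4, a contribution might look roughly like the following. The metric name `misclassification_rate` is hypothetical and is not an existing TrustLens function; steps 2 and 3 (placing it in a `metrics/` module and exporting it from `metrics/__init__.py`) appear only as comments.

```python
# Step 1: a pure function with the (y_true, y_pred) -> float shape.
# Step 2 (not shown): this would live in the appropriate trustlens/metrics/ module.
def misclassification_rate(y_true, y_pred) -> float:
    """Fraction of samples where the prediction differs from the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Step 3 (not shown): export the function from metrics/__init__.py.


# Step 4: a pytest-style test that would go in tests/test_<module>.py.
def test_misclassification_rate():
    assert misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0]) == 0.25
    assert misclassification_rate([1, 1], [1, 1]) == 0.0


test_misclassification_rate()
```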
|
|
313
|
+
|
|
314
|
+
See `CONTRIBUTING.md` for the full guide including instructions on adding plugins and explainability methods.
|
|
315
|
+
Review `docs/GITHUB_ISSUES.md` for open tasks ready to be developed.
|
|
316
|
+
|
|
317
|
+
---
|
|
318
|
+
|
|
319
|
+
## Citation
|
|
320
|
+
|
|
321
|
+
```bibtex
|
|
322
|
+
@software{trustlens2026,
|
|
323
|
+
author = {Shahid Ul Islam},
|
|
324
|
+
title = {TrustLens: Debug your ML models beyond accuracy},
|
|
325
|
+
year = {2026},
|
|
326
|
+
url = {https://github.com/Khanz9664/TrustLens},
|
|
327
|
+
}
|
|
328
|
+
```
|
|
329
|
+
|
|
330
|
+
---
|
|
331
|
+
|
|
332
|
+
## Author & Maintainer
|
|
333
|
+
|
|
334
|
+
**Shahid Ul Islam**
|
|
335
|
+
- **GitHub**: [Khanz9664](https://github.com/Khanz9664)
|
|
336
|
+
- **Portfolio**: [Visit Portfolio](https://khanz9664.github.io/portfolio/)
|
|
337
|
+
- **LinkedIn**: [Connect on LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/)
|
|
338
|
+
- **Instagram**: [Follow on Instagram](https://instagram.com/shaddy9664)
|
|
339
|
+
|
|
340
|
+
---
|
|
341
|
+
|
|
342
|
+
<div align="center">
|
|
343
|
+
|
|
344
|
+
**If TrustLens saved you from a bad deployment, star it.**
|
|
345
|
+
It helps other engineers find it before they make the same mistake.
|
|
346
|
+
|
|
347
|
+
[GitHub](https://github.com/Khanz9664/TrustLens) | [Portfolio](https://khanz9664.github.io/portfolio/) | [LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/) | [Discussions](https://github.com/Khanz9664/TrustLens/discussions)
|
|
348
|
+
|
|
349
|
+
</div>
|
trustlens-0.1.1/README.md
ADDED
|
@@ -0,0 +1,297 @@
|
|
|
1
|
+
<div align="center">
|
|
2
|
+
<img src="assets/banner.png" width="180" alt="TrustLens Logo">
|
|
3
|
+
|
|
4
|
+
# TrustLens
|
|
5
|
+
|
|
6
|
+
### Your model has 92% accuracy. But can you trust it?
|
|
7
|
+
|
|
8
|
+
**TrustLens is the open-source library that answers the questions accuracy never does.**
|
|
9
|
+
|
|
10
|
+
[CI](https://github.com/Khanz9664/TrustLens/actions)
|
|
11
|
+
[License: MIT](LICENSE)
|
|
12
|
+
[GitHub stars](https://github.com/Khanz9664/TrustLens/stargazers)
|
|
13
|
+
|
|
14
|
+
</div>
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## The Problem Nobody Talks About
|
|
19
|
+
|
|
20
|
+
You trained a model. It hits **92% accuracy** on your validation set.
|
|
21
|
+
|
|
22
|
+
So you ship it.
|
|
23
|
+
|
|
24
|
+
Three months later:
|
|
25
|
+
- A minority-class user gets consistently wrong predictions
|
|
26
|
+
- The model is 90% confident on its worst mistakes
|
|
27
|
+
- A regulator asks "why did it make that decision?" and you have no answer
|
|
28
|
+
|
|
29
|
+
**Accuracy tells you how often your model is right. It tells you nothing about when it fails, why it fails, or who it fails.**
|
|
30
|
+
|
|
31
|
+
TrustLens fixes that. In one function call.
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## Quick Analyze (Zero-Friction Start)
|
|
36
|
+
|
|
37
|
+
Try TrustLens instantly without bringing your own data or models. We provide a zero-friction entry point:
|
|
38
|
+
|
|
39
|
+
```python
|
|
40
|
+
from trustlens import quick_analyze
|
|
41
|
+
|
|
42
|
+
# Automatically loads the breast cancer dataset, trains a baseline model,
|
|
43
|
+
# and runs the full analysis, returning a TrustReport and rendering the dashboard.
|
|
44
|
+
report = quick_analyze(dataset="breast_cancer")
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
## Quick Usage with Custom Models
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
pip install trustlens
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
```python
|
|
56
|
+
from trustlens import analyze
|
|
57
|
+
|
|
58
|
+
report = analyze(
|
|
59
|
+
model, # any sklearn-compatible model
|
|
60
|
+
X_val, # validation features
|
|
61
|
+
y_val, # ground truth
|
|
62
|
+
y_prob=proba, # predicted probabilities
|
|
63
|
+
)
|
|
64
|
+
|
|
65
|
+
print(report.trust_score)
|
|
66
|
+
report.show()
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
**Output Insight:**
|
|
70
|
+
|
|
71
|
+
```text
|
|
72
|
+
==================================================================
|
|
73
|
+
TrustLens Report
|
|
74
|
+
==================================================================
|
|
75
|
+
Timestamp : 2026-04-16T15:43:02Z
|
|
76
|
+
Model : RandomForestClassifier
|
|
77
|
+
Samples : 2,500
|
|
78
|
+
Classes : 2
|
|
79
|
+
|
|
80
|
+
==================================================================
|
|
81
|
+
TRUST SCORE: 61/100 [B]
|
|
82
|
+
Assessment: Good Trust - minor issues to address
|
|
83
|
+
==================================================================
|
|
84
|
+
|
|
85
|
+
Key Observations:
|
|
86
|
+
* Calibration needs improvement (ECE > 0.1).
|
|
87
|
+
* Model is overconfident on incorrect predictions (low confidence gap).
|
|
88
|
+
|
|
89
|
+
==================================================================
|
|
90
|
+
Dimension breakdown:
|
|
91
|
+
calibration 52.3/100
|
|
92
|
+
failure 74.1/100
|
|
93
|
+
bias 41.2/100
|
|
94
|
+
representation 68.5/100
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
Calibration needs work, and the bias score is the weakest of the four. **TrustLens just saved you a PR disaster.**
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
## The Summary Dashboard
|
|
102
|
+
|
|
103
|
+
One line. One picture. Everything you need.
|
|
104
|
+
|
|
105
|
+
```python
|
|
106
|
+
report.summary_plot()
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
The presentation-ready 6-panel dashboard shows:
|
|
110
|
+
- **Trust Score gauge**: Your model's overall trustworthiness at a glance
|
|
111
|
+
- **Reliability diagram**: Is your model overconfident or underconfident?
|
|
112
|
+
- **Confidence gap**: Does high confidence actually mean high accuracy?
|
|
113
|
+
- **Error rate by class**: Which classes are being failed?
|
|
114
|
+
- **Class distribution**: Is your evaluation data imbalanced across classes?
|
|
115
|
+
- **Sub-score breakdown**: Which dimension needs the most work?
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
## The Trust Score
|
|
120
|
+
|
|
121
|
+
A single, actionable number: **0 to 100**.
|
|
122
|
+
|
|
123
|
+
Computed from four dimensions, each independently interpretable:
|
|
124
|
+
|
|
125
|
+
| Dimension | What it measures | Weight |
|
|
126
|
+
|-----------|-----------------|--------|
|
|
127
|
+
| **Calibration** | Do probabilities reflect reality? | 35% |
|
|
128
|
+
| **Failure** | Does confidence correlate with accuracy? | 30% |
|
|
129
|
+
| **Bias** | Are all groups treated equally? | 25% |
|
|
130
|
+
| **Representation** | Is the embedding space well-structured? | 10% |
|
|
131
|
+
|
|
132
|
+
| Score | Grade | Recommendation |
|
|
133
|
+
|-------|-------|----------------|
|
|
134
|
+
| 80-100 | A | Production-ready |
|
|
135
|
+
| 60-79 | B | Good - fix flagged issues first |
|
|
136
|
+
| 40-59 | C | Investigate before deployment |
|
|
137
|
+
| 0-39 | D | Do not deploy |
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
## The Failure Showcase
|
|
142
|
+
|
|
143
|
+
Find your model's most dangerous mistakes in one line:
|
|
144
|
+
|
|
145
|
+
```python
|
|
146
|
+
report.show_failures(top_k=5)
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
**Output:**
|
|
150
|
+
|
|
151
|
+
```text
|
|
152
|
+
==================================================================
|
|
153
|
+
TOP 5 CRITICAL FAILURES
|
|
154
|
+
GradientBoostingClassifier | 58 total errors / 700 samples (8.3%)
|
|
155
|
+
==================================================================
|
|
156
|
+
# Sample True Pred Confidence Danger
|
|
157
|
+
------------------------------------------------------
|
|
158
|
+
1 412 1 0 97.4% CRITICAL
|
|
159
|
+
2 88 0 1 95.1% CRITICAL
|
|
160
|
+
3 301 1 0 91.8% HIGH
|
|
161
|
+
4 556 0 1 89.2% HIGH
|
|
162
|
+
|
|
163
|
+
Insights:
|
|
164
|
+
Mean confidence on top failures: 93.4%
|
|
165
|
+
These are high-confidence mistakes - the model is
|
|
166
|
+
certain it is right, but it is wrong.
|
|
167
|
+
Overconfidence detected - consider calibration.
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## Real-World Use Cases
|
|
173
|
+
|
|
174
|
+
### Medical AI
|
|
175
|
+
A diagnostic model with 94% accuracy has an ECE of 0.18 - dangerously overconfident on edge cases. TrustLens surfaces it before deployment.
|
|
176
|
+
|
|
177
|
+
### Fraud Detection
|
|
178
|
+
Your model's confidence gap is 0.04 - it's almost as confident on fraud it misses as on fraud it catches. That's your false-negative problem, quantified.
|
|
179
|
+
|
|
180
|
+
### Hiring, Loan, and Insurance
|
|
181
|
+
Subgroup analysis reveals a 23% accuracy gap between applicant demographics. You have a fairness problem. Now you know before a regulator tells you.
|
|
182
|
+
|
|
183
|
+
### Research
|
|
184
|
+
Use CKA to compare representation quality across model architectures. Use faithfulness testing to benchmark explanation methods honestly.
|
|
185
|
+
|
|
186
|
+
## Repository Structure
|
|
187
|
+
|
|
188
|
+
```text
|
|
189
|
+
TrustLens/
|
|
190
|
+
├── assets/
|
|
191
|
+
│ ├── banner.png
|
|
192
|
+
│ └── logo.png
|
|
193
|
+
├── docs/
|
|
194
|
+
│ ├── DESIGN_PRINCIPLES.md
|
|
195
|
+
│ ├── FUTURE_EXTENSIONS.md
|
|
196
|
+
│ ├── GITHUB_ISSUES.md
|
|
197
|
+
│ ├── POSITIONING.md
|
|
198
|
+
│ └── REWRITTEN_ISSUES.md
|
|
199
|
+
├── examples/
|
|
200
|
+
│ ├── calibration_deep_dive.py
|
|
201
|
+
│ ├── cnn_vs_vit_trustlens.py
|
|
202
|
+
│ ├── custom_plugin_demo.py
|
|
203
|
+
│ ├── quickstart.py
|
|
204
|
+
│ └── trustlens_demo.ipynb
|
|
205
|
+
├── .github/workflows/
|
|
206
|
+
│ └── ci.yml
|
|
207
|
+
├── tests/
|
|
208
|
+
│ ├── test_api.py
|
|
209
|
+
│ ├── test_bias.py
|
|
210
|
+
│ ├── test_calibration.py
|
|
211
|
+
│ ├── test_failure.py
|
|
212
|
+
│ ├── test_output_formatting.py
|
|
213
|
+
│ ├── test_plugins.py
|
|
214
|
+
│ ├── test_representation.py
|
|
215
|
+
│ └── test_trust_score.py
|
|
216
|
+
├── trustlens/
|
|
217
|
+
│ ├── explainability/
|
|
218
|
+
│ │ ├── faithfulness.py
|
|
219
|
+
│ │ └── gradcam.py
|
|
220
|
+
│ ├── metrics/
|
|
221
|
+
│ │ ├── bias.py
|
|
222
|
+
│ │ ├── calibration.py
|
|
223
|
+
│ │ ├── failure.py
|
|
224
|
+
│ │ ├── faithfulness.py
|
|
225
|
+
│ │ └── representation.py
|
|
226
|
+
│ ├── plugins/
|
|
227
|
+
│ │ ├── base.py
|
|
228
|
+
│ │ └── registry.py
|
|
229
|
+
│ ├── visualization/
|
|
230
|
+
│ │ ├── bias_plots.py
|
|
231
|
+
│ │ ├── calibration_plots.py
|
|
232
|
+
│ │ ├── failure_plots.py
|
|
233
|
+
│ │ ├── representation_plots.py
|
|
234
|
+
│ │ └── summary_plot.py
|
|
235
|
+
│ ├── api.py
|
|
236
|
+
│ ├── report.py
|
|
237
|
+
│ ├── trust_score.py
|
|
238
|
+
│ └── utils.py
|
|
239
|
+
├── CHANGELOG.md
|
|
240
|
+
├── CONTRIBUTING.md
|
|
241
|
+
├── LICENSE
|
|
242
|
+
├── Makefile
|
|
243
|
+
├── pyproject.toml
|
|
244
|
+
├── README.md
|
|
245
|
+
├── requirements.txt
|
|
246
|
+
└── ROADMAP.md
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
---
|
|
250
|
+
|
|
251
|
+
## Contributing
|
|
252
|
+
|
|
253
|
+
TrustLens is designed to grow with the community. Adding a new metric takes just four simple steps:
|
|
254
|
+
|
|
255
|
+
> **Testing Policy:** Test coverage currently stands at 67%, concentrated on the core modules. It will be raised incrementally toward 85%+ as advanced modules (e.g., explainability, visualization) receive additional tests. All new contributions must maintain or improve this baseline.
|
|
256
|
+
|
|
257
|
+
1. Write a pure function `my_metric(y_true, y_pred) -> float`
|
|
258
|
+
2. Add it to the appropriate module (`metrics/calibration.py`, etc.)
|
|
259
|
+
3. Export it from `metrics/__init__.py`
|
|
260
|
+
4. Write a test in `tests/test_<module>.py`
|
|
261
|
+
|
|
262
|
+
See `CONTRIBUTING.md` for the full guide including instructions on adding plugins and explainability methods.
|
|
263
|
+
Review `docs/GITHUB_ISSUES.md` for open tasks ready to be developed.
|
|
264
|
+
|
|
265
|
+
---
|
|
266
|
+
|
|
267
|
+
## Citation
|
|
268
|
+
|
|
269
|
+
```bibtex
|
|
270
|
+
@software{trustlens2026,
|
|
271
|
+
author = {Shahid Ul Islam},
|
|
272
|
+
title = {TrustLens: Debug your ML models beyond accuracy},
|
|
273
|
+
year = {2026},
|
|
274
|
+
url = {https://github.com/Khanz9664/TrustLens},
|
|
275
|
+
}
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
---
|
|
279
|
+
|
|
280
|
+
## Author & Maintainer
|
|
281
|
+
|
|
282
|
+
**Shahid Ul Islam**
|
|
283
|
+
- **GitHub**: [Khanz9664](https://github.com/Khanz9664)
|
|
284
|
+
- **Portfolio**: [Visit Portfolio](https://khanz9664.github.io/portfolio/)
|
|
285
|
+
- **LinkedIn**: [Connect on LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/)
|
|
286
|
+
- **Instagram**: [Follow on Instagram](https://instagram.com/shaddy9664)
|
|
287
|
+
|
|
288
|
+
---
|
|
289
|
+
|
|
290
|
+
<div align="center">
|
|
291
|
+
|
|
292
|
+
**If TrustLens saved you from a bad deployment, star it.**
|
|
293
|
+
It helps other engineers find it before they make the same mistake.
|
|
294
|
+
|
|
295
|
+
[GitHub](https://github.com/Khanz9664/TrustLens) | [Portfolio](https://khanz9664.github.io/portfolio/) | [LinkedIn](https://www.linkedin.com/in/shahid-ul-islam-13650998/) | [Discussions](https://github.com/Khanz9664/TrustLens/discussions)
|
|
296
|
+
|
|
297
|
+
</div>
|