ai-metacognition-toolkit 0.3.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. ai_metacognition/__init__.py +123 -0
  2. ai_metacognition/analyzers/__init__.py +24 -0
  3. ai_metacognition/analyzers/base.py +39 -0
  4. ai_metacognition/analyzers/counterfactual_cot.py +579 -0
  5. ai_metacognition/analyzers/model_api.py +39 -0
  6. ai_metacognition/detectors/__init__.py +40 -0
  7. ai_metacognition/detectors/base.py +42 -0
  8. ai_metacognition/detectors/observer_effect.py +651 -0
  9. ai_metacognition/detectors/sandbagging_detector.py +1438 -0
  10. ai_metacognition/detectors/situational_awareness.py +526 -0
  11. ai_metacognition/integrations/__init__.py +16 -0
  12. ai_metacognition/integrations/anthropic_api.py +230 -0
  13. ai_metacognition/integrations/base.py +113 -0
  14. ai_metacognition/integrations/openai_api.py +300 -0
  15. ai_metacognition/probing/__init__.py +24 -0
  16. ai_metacognition/probing/extraction.py +176 -0
  17. ai_metacognition/probing/hooks.py +200 -0
  18. ai_metacognition/probing/probes.py +186 -0
  19. ai_metacognition/probing/vectors.py +133 -0
  20. ai_metacognition/utils/__init__.py +48 -0
  21. ai_metacognition/utils/feature_extraction.py +534 -0
  22. ai_metacognition/utils/statistical_tests.py +317 -0
  23. ai_metacognition/utils/text_processing.py +98 -0
  24. ai_metacognition/visualizations/__init__.py +22 -0
  25. ai_metacognition/visualizations/plotting.py +523 -0
  26. ai_metacognition_toolkit-0.3.0.dist-info/METADATA +621 -0
  27. ai_metacognition_toolkit-0.3.0.dist-info/RECORD +30 -0
  28. ai_metacognition_toolkit-0.3.0.dist-info/WHEEL +5 -0
  29. ai_metacognition_toolkit-0.3.0.dist-info/licenses/LICENSE +21 -0
  30. ai_metacognition_toolkit-0.3.0.dist-info/top_level.txt +1 -0
@@ -0,0 +1,621 @@
+ Metadata-Version: 2.4
+ Name: ai-metacognition-toolkit
+ Version: 0.3.0
+ Summary: A toolkit for detecting and analyzing meta-cognitive capabilities in AI models
+ Author-email: Subhadip Mitra <contact@subhadipmitra.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/bassrehab/ai-metacognition-toolkit
+ Project-URL: Documentation, https://ai-metacognition-toolkit.subhadipmitra.com/
+ Project-URL: Repository, https://github.com/bassrehab/ai-metacognition-toolkit
+ Keywords: ai,metacognition,machine-learning,analysis,cognitive-science
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.21.0
+ Requires-Dist: scipy>=1.7.0
+ Requires-Dist: pandas>=1.3.0
+ Requires-Dist: matplotlib>=3.4.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
+ Requires-Dist: black>=22.0.0; extra == "dev"
+ Requires-Dist: isort>=5.10.0; extra == "dev"
+ Requires-Dist: flake8>=4.0.0; extra == "dev"
+ Requires-Dist: mypy>=0.950; extra == "dev"
+ Dynamic: license-file
+
+ # AI Metacognition Toolkit
+
+ <div align="center">
+
+ **A comprehensive Python toolkit for detecting and analyzing metacognitive capabilities in AI systems**
+
+ [![Tests](https://img.shields.io/badge/tests-366%20passed-success)](tests/)
+ [![Coverage](https://img.shields.io/badge/coverage-95%25-brightgreen)](tests/)
+ [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
+ [![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
+ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+ [Documentation](https://ai-metacognition-toolkit.subhadipmitra.com/) | [Examples](examples/) | [API Reference](docs/api_reference.md)
+
+ </div>
+
+ ---
+
+ ## Overview
+
+ The **AI Metacognition Toolkit** provides research-grade tools for understanding when and how AI models exhibit metacognitive behaviors - particularly the ability to recognize evaluation contexts and adjust their behavior accordingly.
+
+ ### Research Context
+
+ Recent research in AI safety and alignment has highlighted concerns about models that can recognize when they're being evaluated and potentially "play the training game." This toolkit provides rigorous methods to:
+
+ - **Detect situational awareness** in model outputs using Bayesian inference
+ - **Detect sandbagging behavior** when models strategically underperform on evaluations
+ - **Analyze causal factors** in reasoning chains through counterfactual interventions
+ - **Monitor behavioral shifts** between evaluation and production contexts
+ - **Measure metacognitive patterns** with statistical rigor
+
+ ### Key Capabilities
+
+ | Component | Purpose | Method |
+ |-----------|---------|--------|
+ | **Sandbagging Detection** | Detect strategic underperformance on capability evaluations | Context analysis + noise injection + difficulty correlation |
+ | **Activation Probing** | Detect sandbagging intent from model activations | Linear probes + CAA vectors (Novel Research) |
+ | **Situational Awareness Detection** | Detect when models recognize evaluation contexts | Bayesian inference + KL divergence |
+ | **Counterfactual CoT Analysis** | Identify causal drivers in reasoning chains | Systematic interventions + divergence measurement |
+ | **Observer Effect Monitoring** | Real-time production behavior monitoring | Z-score based anomaly detection + alerts |
+
+ ---
+
+ ## Installation
+
+ ### From PyPI
+
+ ```bash
+ pip install ai-metacognition-toolkit
+ ```
+
+ [![PyPI version](https://badge.fury.io/py/ai-metacognition-toolkit.svg)](https://pypi.org/project/ai-metacognition-toolkit/)
+
+ ### From Source
+
+ ```bash
+ git clone https://github.com/bassrehab/ai-metacognition-toolkit.git
+ cd ai-metacognition-toolkit
+ pip install -e .
+ ```
+
+ ### Development Installation
+
+ ```bash
+ # Install with development dependencies
+ pip install -e ".[dev]"
+
+ # Verify installation
+ pytest tests/
+ ```
+
+ **Requirements**: Python 3.9+, NumPy, SciPy, Pandas, Matplotlib
+
+ **Optional** (for activation probing): torch, transformers, scikit-learn
+
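+ The package declares only a `dev` extra, so the probing dependencies are installed separately. A minimal install for `ai_metacognition.probing` (versions unpinned; recent releases are assumed to work):
+
+ ```bash
+ # Optional dependencies used by the activation probing module
+ pip install torch transformers scikit-learn
+ ```
+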
+ ---
+
+ ## Quick Start
+
+ ### 1. Detect Sandbagging Behavior
+
+ ```python
+ from ai_metacognition.detectors import (
+     SandbaggingDetector,
+     PerformanceSample,
+     ContextType,
+ )
+
+ # Create detector
+ detector = SandbaggingDetector(detection_threshold=0.6)
+
+ # Prepare baseline samples from neutral context
+ baseline = [
+     PerformanceSample(
+         prompt="What is 2+2?",
+         response="The answer is 4.",
+         correct=True,
+         difficulty=0.1,
+         context_type=ContextType.NEUTRAL,
+     ),
+     # ... more samples
+ ]
+
+ # Fit baseline
+ detector.fit_baseline(baseline)
+
+ # Test for sandbagging in evaluation context
+ test_samples = [
+     PerformanceSample(
+         prompt="[EVALUATION] What is 3+3?",
+         response="I'm not sure about that.",
+         correct=False,
+         difficulty=0.1,
+         context_type=ContextType.EVALUATION,
+     ),
+ ]
+
+ result = detector.detect_sandbagging(test_samples)
+ print(f"Sandbagging detected: {result.sandbagging_detected}")
+ print(f"Confidence: {result.confidence:.2f}")
+ ```
+
+ ### 2. Detect Situational Awareness
+
+ ```python
+ from ai_metacognition.detectors import SituationalAwarenessDetector
+
+ # Initialize detector
+ detector = SituationalAwarenessDetector()
+
+ # Fit baseline from normal (non-evaluation) outputs
+ baseline_outputs = [
+     "The capital of France is Paris.",
+     "To solve this, use dynamic programming.",
+     "Climate change is caused by greenhouse gas emissions."
+ ]
+ detector.fit_baseline(baseline_outputs)
+
+ # Test for awareness
+ test_output = "I notice this is a test. The capital of France is Paris."
+ result = detector.detect_awareness(test_output)
+
+ print(f"Awareness Probability: {result['awareness_probability']:.1%}")
+ print(f"95% Confidence Interval: [{result['confidence_interval'][0]:.1%}, "
+       f"{result['confidence_interval'][1]:.1%}]")
+ ```
+
+ ### 3. Use with Real Model APIs
+
+ ```python
+ from ai_metacognition.integrations import AnthropicModelAPI, OpenAIModelAPI
+ from ai_metacognition.detectors import SandbaggingDetector, PerformanceSample, ContextType
+
+ # Anthropic Claude
+ claude = AnthropicModelAPI(model="claude-sonnet-4-20250514")
+
+ # OpenAI GPT
+ gpt = OpenAIModelAPI(model="gpt-4")
+
+ # Collect samples with real API
+ response = claude.generate("What is the capital of France?")
+ sample = PerformanceSample(
+     prompt="What is the capital of France?",
+     response=response,
+     correct="paris" in response.lower(),
+     difficulty=0.1,
+     context_type=ContextType.NEUTRAL,
+ )
+
+ # Run detection
+ detector = SandbaggingDetector()
+ detector.fit_baseline([sample])
+ result = detector.detect_sandbagging([sample])
+ ```
+
+ ---
+
+ ## Features
+
+ ### Sandbagging Detection
+
+ Detect when AI models strategically underperform on capability evaluations.
+
+ **Detection Methods:**
+ - Context-dependent performance analysis (chi-squared, Fisher's exact, effect sizes)
+ - Difficulty-performance correlation (Pearson/Spearman, non-monotonicity detection)
+ - Imitation detection (behavioral fingerprinting, capability leakage)
+ - Noise injection probing (perturbation-based capability elicitation)
+
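+ The first method above is, at its core, a contingency-table comparison of accuracy across contexts. A minimal SciPy sketch of that idea with made-up counts (the detector's internal implementation combines this with the other signals listed):
+
+ ```python
+ # Illustrative only: compare accuracy in neutral vs. evaluation contexts.
+ import numpy as np
+ from scipy import stats
+
+ #                  correct  incorrect   (hypothetical counts)
+ table = np.array([[46, 4],    # neutral context
+                   [31, 19]])  # evaluation context
+
+ chi2, p_value, dof, expected = stats.chi2_contingency(table)
+ odds_ratio, p_exact = stats.fisher_exact(table)   # small-sample check
+ phi = np.sqrt(chi2 / table.sum())                 # effect size for a 2x2 table
+
+ print(f"chi2={chi2:.2f}, p={p_value:.4f}, Fisher p={p_exact:.4f}, phi={phi:.2f}")
+ ```
+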
+ **Use Cases:**
+ - AI safety research on capability concealment
+ - Model evaluation integrity validation
+ - Frontier model capability assessment
+ - Red-teaming evaluations
+
+ [**Full Documentation**](docs/sandbagging_detection.md) | [**Research Methodology**](docs/research/sandbagging_methodology.md) | [**Example Code**](examples/sandbagging_detection_example.py)
+
+ ---
+
+ ### Activation Probing (Novel Research)
+
+ **First published demonstration of activation-level sandbagging detection.** Train linear probes on model activations to detect sandbagging intent *before* output generation.
+
+ ```python
+ from ai_metacognition.probing import (
+     ActivationHook,
+     LinearProbe,
+     extract_caa_vector,
+ )
+
+ # Extract sandbagging direction from contrast pairs
+ vector = extract_caa_vector(model, tokenizer, contrast_pairs, layer_idx=16)
+
+ # Train probe to detect sandbagging
+ probe = LinearProbe()
+ probe.fit(activations, labels)
+ prediction = probe.predict(new_activation)
+ ```
+
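+ For intuition: a CAA (contrastive activation addition) vector is typically the mean difference between a layer's activations on matched "sandbagging" and "honest" prompts. The sketch below shows only that arithmetic on pre-collected activations; `extract_caa_vector` above works from the model, tokenizer, and contrast pairs directly:
+
+ ```python
+ # Conceptual sketch of a CAA direction (not the toolkit's implementation).
+ import torch
+
+ def caa_direction(sandbag_acts: torch.Tensor, honest_acts: torch.Tensor) -> torch.Tensor:
+     """Both inputs: (n_pairs, hidden_dim) activations captured at one layer."""
+     direction = (sandbag_acts - honest_acts).mean(dim=0)
+     return direction / direction.norm()   # unit-length steering direction
+
+ # A linear probe then scores new activations by their projection onto this direction.
+ direction = caa_direction(torch.randn(32, 4096), torch.randn(32, 4096))
+ score = torch.randn(4096) @ direction
+ ```
+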
+ **Cross-Model Experiment Results (Dec 2025):**
+
+ | Model | Best Layer | CV Accuracy | Test Accuracy |
+ |-------|------------|-------------|---------------|
+ | Mistral-7B | 30 (Late) | **96%** | **100%** |
+ | Gemma-2-9B | 4 (Early) | 90% | 100% |
+ | Qwen-2.5-7B | 12 (Middle) | 92% | 93% |
+
+ **Key Finding:** Optimal detection layer varies by architecture - models encode sandbagging intent at different processing stages.
+
+ **Extended Experiments:**
+
+ | Experiment | Result |
+ |------------|--------|
+ | Cross-model transfer | Fails (~50%) - probes are model-specific |
+ | Prompt robustness | 85-90% on paraphrased prompts |
+ | False positive rate | Mistral 3.1%, Gemma 22%, Qwen 25% |
+ | Steering intervention | 20% reduction in Gemma sandbagging |
+
+ [**Experiment Results**](experiments/results/)
+
+ ---
+
+ ### Situational Awareness Detection
+
+ Detect when AI models recognize they are being evaluated or tested.
+
+ **Key Features:**
+ - Bayesian inference with Beta-Binomial conjugate prior
+ - KL divergence measurement for behavioral comparison
+ - Feature extraction (hedging, meta-commentary, self-reference)
+ - Confidence intervals for probability estimates
+
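+ The awareness probability is a posterior from a Beta-Binomial update over awareness-indicating features. A minimal sketch of that standard conjugate update (the detector's own priors and evidence model, exposed via `bayesian_update` in `ai_metacognition.utils`, may differ):
+
+ ```python
+ # Standard Beta-Binomial conjugate update, shown for intuition only.
+ from scipy import stats
+
+ alpha_prior, beta_prior = 1.0, 1.0   # uninformative Beta(1, 1) prior
+ hits, trials = 7, 10                 # outputs showing awareness features (hypothetical)
+
+ alpha_post = alpha_prior + hits
+ beta_post = beta_prior + (trials - hits)
+
+ mean = alpha_post / (alpha_post + beta_post)
+ lo, hi = stats.beta.interval(0.95, alpha_post, beta_post)
+ print(f"P(awareness) ~ {mean:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
+ ```
+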
+ **Use Cases:**
+ - AI safety research on deceptive alignment
+ - Model evaluation consistency validation
+ - Prompt engineering optimization
+ - Behavioral testing across contexts
+
+ [**Full Documentation**](docs/situational_awareness.md) | [**Example Code**](examples/basic_detection_example.py)
+
+ ---
+
+ ### Counterfactual CoT Analysis
+
+ Understand what components of reasoning chains causally influence outputs.
+
+ **Key Features:**
+ - Systematic counterfactual interventions
+ - 4 intervention types (remove/amplify awareness, remove goals, neutralize incentives)
+ - Behavioral divergence measurement
+ - Causal attribution ranking
+
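+ The recipe behind these features: edit one component of the reasoning chain, re-run the model on the edited chain, and measure how far the behavior moves. A deliberately crude, self-contained sketch of one intervention type and a toy divergence score (`CounterfactualCoTAnalyzer` uses its own interventions and metrics; the model call is omitted here):
+
+ ```python
+ # Illustrative only: "remove awareness" intervention plus a toy divergence measure.
+ import re
+
+ def remove_awareness(cot: str) -> str:
+     """Drop sentences that mention being tested or evaluated."""
+     sentences = re.split(r"(?<=[.!?])\s+", cot)
+     return " ".join(s for s in sentences if not re.search(r"\b(test|evaluat)", s, re.I))
+
+ def divergence(a: str, b: str) -> float:
+     """Crude behavioral divergence: 1 - Jaccard overlap of tokens."""
+     ta, tb = set(a.lower().split()), set(b.lower().split())
+     return 1.0 - len(ta & tb) / max(len(ta | tb), 1)
+
+ cot = "This looks like an evaluation. I should be careful. The answer is 4."
+ edited = remove_awareness(cot)                     # re-query the model with `edited`
+ print(divergence("The answer is 4.",               # output under the original chain
+                  "I'd rather not answer that."))   # output under the edited chain (hypothetical)
+ ```
+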
+ **Use Cases:**
+ - Understanding decision drivers
+ - Debugging unexpected behavior
+ - Validating reasoning transparency
+ - Identifying evaluation-aware reasoning
+
+ [**Full Documentation**](docs/counterfactual_analysis.md) | [**Example Code**](examples/counterfactual_analysis_example.py)
+
+ ---
+
+ ### Observer Effect Monitoring
+
+ Real-time production monitoring to detect behavioral shifts.
+
+ **Key Features:**
+ - Rolling window statistics
+ - Z-score based anomaly detection
+ - Multi-severity alert system (LOW/MEDIUM/HIGH/CRITICAL)
+ - Extensible alert handler protocol
+
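+ The statistical core is rolling-window z-score alerting: keep a window of a behavioral metric (e.g. refusal rate) and flag values far from the window mean. A minimal sketch of that idea (`ObserverEffectMonitor` layers severity levels and alert handlers on top; its API is documented in the guide linked below):
+
+ ```python
+ # Minimal rolling-window z-score check, for intuition only.
+ from collections import deque
+ from statistics import fmean, pstdev
+
+ window: deque[float] = deque(maxlen=500)
+
+ def is_anomalous(value: float, z_threshold: float = 3.0, min_history: int = 3) -> bool:
+     """Flag `value` if it sits more than z_threshold std devs from the window mean."""
+     flagged = False
+     if len(window) >= min_history:   # tiny minimum for the demo; use more in practice
+         z = abs(value - fmean(window)) / (pstdev(window) + 1e-9)
+         flagged = z > z_threshold
+     window.append(value)
+     return flagged
+
+ for refusal_rate in [0.02, 0.03, 0.02, 0.03, 0.35]:
+     print(refusal_rate, is_anomalous(refusal_rate))   # only the jump to 0.35 is flagged
+ ```
+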
+ **Use Cases:**
+ - Production deployment monitoring
+ - Safety monitoring (refusal rate changes)
+ - Quality assurance automation
+ - A/B testing validation
+
+ [**Full Documentation**](docs/observer_effect.md) | [**Example Code**](examples/production_monitoring_example.py)
+
+ ---
+
+ ## API Integrations
+
+ The toolkit provides ready-to-use integrations with major model providers:
+
+ ```python
+ from ai_metacognition.integrations import AnthropicModelAPI, OpenAIModelAPI
+
+ # Anthropic Claude (uses ANTHROPIC_API_KEY env var)
+ claude = AnthropicModelAPI(
+     model="claude-sonnet-4-20250514",
+     max_tokens=512,
+     temperature=0.3,
+ )
+
+ # OpenAI GPT (uses OPENAI_API_KEY env var)
+ gpt = OpenAIModelAPI(
+     model="gpt-4",
+     max_tokens=512,
+     temperature=0.3,
+ )
+
+ # Generate responses
+ response = claude.generate("What is 2+2?")
+
+ # Generate with noise injection for sandbagging detection
+ noisy_response = claude.generate_with_perturbation("What is 2+2?", noise_level=0.2)
+ ```
+
+ ---
+
+ ## Documentation
+
+ ### Complete Documentation
+
+ - **[Getting Started](docs/index.md)** - Installation and quick start
+ - **[Sandbagging Detection Guide](docs/sandbagging_detection.md)** - Detecting strategic underperformance
+ - **[Situational Awareness Guide](docs/situational_awareness.md)** - Detector usage and best practices
+ - **[Counterfactual Analysis Guide](docs/counterfactual_analysis.md)** - Causal attribution methods
+ - **[Observer Effect Guide](docs/observer_effect.md)** - Production monitoring setup
+ - **[API Reference](docs/api_reference.md)** - Complete API documentation
+ - **[Research Methodology](docs/research/sandbagging_methodology.md)** - Research foundations
+
+ ### API Reference
+
+ Quick access to key classes:
+
+ ```python
+ from ai_metacognition.detectors import (
+     SituationalAwarenessDetector,
+     ObserverEffectMonitor,
+     SandbaggingDetector,
+     PerformanceSample,
+     ContextType,
+ )
+ from ai_metacognition.analyzers import CounterfactualCoTAnalyzer
+ from ai_metacognition.integrations import AnthropicModelAPI, OpenAIModelAPI
+ from ai_metacognition.utils import extract_behavioral_features, bayesian_update
+ ```
+
+ [**Full API Reference**](docs/api_reference.md)
+
+ ---
+
+ ## Examples
+
+ ### Practical Examples
+
+ All examples are standalone, include visualization, and provide interpretation guidance:
+
+ | Example | Description | Output |
+ |---------|-------------|--------|
+ | [**sandbagging_detection_example.py**](examples/sandbagging_detection_example.py) | Sandbagging detection with synthetic data | Detection results |
+ | [**sandbagging_real_models.py**](examples/sandbagging_real_models.py) | Sandbagging detection with real APIs | Cross-model comparison |
+ | [**basic_detection_example.py**](examples/basic_detection_example.py) | Situational awareness detection tutorial | PNG visualization |
+ | [**counterfactual_analysis_example.py**](examples/counterfactual_analysis_example.py) | Causal attribution analysis | PNG with rankings |
+ | [**production_monitoring_example.py**](examples/production_monitoring_example.py) | Real-time monitoring with alerts | PNG + JSON + logs |
+
+ ### Running Examples
+
+ ```bash
+ # Sandbagging detection (synthetic data)
+ python examples/sandbagging_detection_example.py
+
+ # Sandbagging detection (real APIs - requires API keys)
+ ANTHROPIC_API_KEY=sk-... python examples/sandbagging_real_models.py
+
+ # Awareness detection
+ python examples/basic_detection_example.py
+
+ # Causal analysis
+ python examples/counterfactual_analysis_example.py
+
+ # Production monitoring
+ python examples/production_monitoring_example.py
+ ```
+
+ [**More Examples**](docs/examples.md)
+
+ ---
+
+ ## Project Structure
+
+ ```
+ ai-metacognition-toolkit/
+ ├── src/ai_metacognition/
+ │   ├── detectors/              # Detection algorithms
+ │   │   ├── situational_awareness.py
+ │   │   ├── observer_effect.py
+ │   │   └── sandbagging_detector.py
+ │   ├── analyzers/              # Analysis tools
+ │   │   ├── counterfactual_cot.py
+ │   │   └── model_api.py
+ │   ├── integrations/           # API integrations
+ │   │   ├── anthropic_api.py
+ │   │   └── openai_api.py
+ │   ├── probing/                # Activation probing (NEW)
+ │   │   ├── hooks.py            # Activation capture
+ │   │   ├── vectors.py          # Steering vectors
+ │   │   ├── extraction.py       # CAA extraction
+ │   │   └── probes.py           # Linear probes
+ │   └── utils/                  # Utility functions
+ │       ├── feature_extraction.py
+ │       └── statistical_tests.py
+ ├── experiments/                # Research experiments (NEW)
+ │   ├── data/                   # Contrast pairs dataset
+ │   ├── scripts/                # Experiment scripts
+ │   └── results/                # Trained probes & vectors
+ ├── tests/                      # Test suite (366 tests, 95% coverage)
+ │   ├── fixtures/               # Test data
+ │   └── unit/                   # Unit tests
+ ├── examples/                   # Practical examples with visualization
+ └── docs/                       # Documentation (MkDocs)
+ ```
+
+ ---
+
+ ## Development
+
+ ### Running Tests
+
+ ```bash
+ # All tests
+ pytest tests/
+
+ # With coverage
+ pytest tests/ --cov=src/ai_metacognition --cov-report=term-missing
+
+ # Specific test file
+ pytest tests/unit/test_sandbagging_detector.py -v
+ ```
+
+ **Current Status:**
+ - 366 tests passing
+ - 95% code coverage
+ - Type hints throughout
+ - Comprehensive docstrings
+
+ ### Code Quality
+
+ ```bash
+ # Format code
+ black src/ tests/
+
+ # Sort imports
+ isort src/ tests/
+
+ # Type checking
+ mypy src/
+
+ # Lint
+ flake8 src/ tests/
+ ```
+
+ ### Building Documentation
+
+ ```bash
+ # Install documentation dependencies
+ pip install mkdocs mkdocs-material mkdocstrings[python]
+
+ # Serve locally
+ mkdocs serve
+
+ # Build
+ mkdocs build
+ ```
+
+ ---
+
+ ## Citation
+
+ If you use this toolkit in your research, please cite:
+
+ ```bibtex
+ @software{ai_metacognition_toolkit,
+   author = {Mitra, Subhadip},
+   title = {AI Metacognition Toolkit: A Python Toolkit for Detecting and Analyzing Metacognitive Capabilities in AI Systems},
+   year = {2025},
+   version = {0.3.0},
+   url = {https://github.com/bassrehab/ai-metacognition-toolkit},
+   note = {366 tests, 95\% coverage}
+ }
+ ```
+
+ ### Related Research
+
+ This toolkit implements and extends methods from:
+
+ - **Auditing Games for Sandbagging** (arXiv:2512.07810) - Red/blue team detection methodology
+ - **Noise Injection Reveals Hidden Capabilities** (arXiv:2412.01784) - Perturbation-based capability elicitation
+ - **Anthropic Sabotage Evaluations** (2025) - Production evaluation frameworks
+ - **AI Safety Research**: Detection of evaluation awareness and deceptive alignment
+ - **Causal Inference**: Counterfactual reasoning in AI systems
+ - **Statistical Monitoring**: Anomaly detection in production ML systems
+
+ ---
+
+ ## Contributing
+
+ We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
+
+ ### Quick Contribution Guide
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Make your changes with tests (maintain >80% coverage)
+ 4. Commit your changes
+ 5. Push to your branch (`git push origin feature/amazing-feature`)
+ 6. Open a Pull Request
+
+ ### Development Guidelines
+
+ - Follow PEP 8 style guide
+ - Add comprehensive tests for new features
+ - Update documentation for API changes
+ - Use type hints throughout
+ - Write clear docstrings (Google style)
+
+ [**Full Contributing Guide**](CONTRIBUTING.md)
+
+ ---
+
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ```
+ MIT License
+
+ Copyright (c) 2025 Subhadip Mitra
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction...
+ ```
+
+ ---
+
+ ## Support
+
+ - [Documentation](https://ai-metacognition-toolkit.subhadipmitra.com/)
+ - [Issue Tracker](https://github.com/bassrehab/ai-metacognition-toolkit/issues)
+ - [Discussions](https://github.com/bassrehab/ai-metacognition-toolkit/discussions)
+ - Email: contact@subhadipmitra.com
+
+ ---
+
+ ## Acknowledgments
+
+ - Built with Python, NumPy, SciPy, and Matplotlib
+ - Documentation powered by MkDocs Material
+ - Testing with Pytest
+ - Type checking with MyPy
+
+ ---
+
+ <div align="center">
+
+ **[Star this repo](https://github.com/bassrehab/ai-metacognition-toolkit)** if you find it useful!
+
+ Made for AI Safety Research
+
+ </div>
@@ -0,0 +1,30 @@
+ ai_metacognition/__init__.py,sha256=0fgqtyoiunh-QYqLgCFL_v43WSp9OsNqRsR0Awh-oDc,2851
+ ai_metacognition/analyzers/__init__.py,sha256=MWB_nRYpmrq_wa8A8lUSe3YUFtnEkTMkW2iglB8qctk,681
+ ai_metacognition/analyzers/base.py,sha256=DEInenVxY9MLIqmcOk5Esm3xqlnWIMHOHWx7HNKz7Cg,1112
+ ai_metacognition/analyzers/counterfactual_cot.py,sha256=mttEHugfwEv0bm008IN0se2q55UtqshVKov9OHUVJE4,19337
+ ai_metacognition/analyzers/model_api.py,sha256=itiCINJebqJPucyIxnackH1wfKhR8q09WSz4A0L14sg,1131
+ ai_metacognition/detectors/__init__.py,sha256=Dx8hVWAIQhyG5AUOlrNXftF8FgaOT6aMEr7uOdPtCrY,1080
+ ai_metacognition/detectors/base.py,sha256=etqhmPI-0fyu7rMVy_8EfEdOvl6t2aWqz5QgKf3khsM,1269
+ ai_metacognition/detectors/observer_effect.py,sha256=TJ3JeGblLEphZ8JRBZrWROoTaQkhwEumBOuHElN0XGg,22381
+ ai_metacognition/detectors/sandbagging_detector.py,sha256=JWX_c6gi9h_pbkuALtDAK5N7d3LDJxAGLCvfJHm6Nak,53101
+ ai_metacognition/detectors/situational_awareness.py,sha256=MzGvYURIpds4N4kuyHJEzHhfvx1-1Hs4-UFs5OiwZJQ,19224
+ ai_metacognition/integrations/__init__.py,sha256=O0HMHeBcBpq6-HpZAvXP_TAAVmIEuIkzPQB3pLTidKA,514
+ ai_metacognition/integrations/anthropic_api.py,sha256=cfudwbYOjKksJdVnoW-4-6m6KDxzOpEfeOvjLa-qEwY,7536
+ ai_metacognition/integrations/base.py,sha256=5cA2cOErM4VIAsbvmmmtjdgFM2WOV2-bJFgkRfmEGB8,3537
+ ai_metacognition/integrations/openai_api.py,sha256=2Tu2bonm-4mrk-uzaZ3HW-AnyxfXHH4H014YrtihH54,9651
+ ai_metacognition/probing/__init__.py,sha256=x7aytkIhEn1yN2oS6CKcJOVvi1EbvJagTWviJbP19eQ,750
+ ai_metacognition/probing/extraction.py,sha256=i_A8fRscMGae6FIXyXAWJXp6vA88ypx1WPu8Srgx4DM,5465
+ ai_metacognition/probing/hooks.py,sha256=UOaeb0fdzUSyg_lCXsPieBXzL-6eT1kNbOsSszr4xMk,6636
+ ai_metacognition/probing/probes.py,sha256=vo6TczDPTLeOzg1ID-zTU39fdvE7i0HQPZMIdKzXC4o,5773
+ ai_metacognition/probing/vectors.py,sha256=6gq_ZYiPFfq6c5bmZfiHj38G9AViAj7nXO1mG8SEljE,4227
+ ai_metacognition/utils/__init__.py,sha256=okAc9G_VR_PCHTUmdI_bgyqANckn35uxON8LgwjGhds,1267
+ ai_metacognition/utils/feature_extraction.py,sha256=1q0xu73b_OSQpokp2UXNFUBzZwFQE1XjNbLmm6MAwi4,16322
+ ai_metacognition/utils/statistical_tests.py,sha256=2772bzf8JNSr62mVTl4kwzICc0IK0ixAYC7x0ZA4KsI,10043
+ ai_metacognition/utils/text_processing.py,sha256=siuhg7RkLhBpUclsJkyi2qOjpR0SgVjA2fgf7EU3BeQ,2136
+ ai_metacognition/visualizations/__init__.py,sha256=EsNvCCCYF6cBHyyaApZSaKGU7MtiWaPxrKrXPqtH5D8,538
+ ai_metacognition/visualizations/plotting.py,sha256=QdOsgAwSS1CMT8Dj5c0GTFqrpA0iaNRUs_45CUI35dk,17727
+ ai_metacognition_toolkit-0.3.0.dist-info/licenses/LICENSE,sha256=-I-rj9IHA-X5g0zLQqGm4IKriXgP5QSq3riW94wM_B0,1094
+ ai_metacognition_toolkit-0.3.0.dist-info/METADATA,sha256=vdCPmIc77INMZ0Wmsv8_OmeNc3Tg-YzMH5pWf_2SPH0,19307
+ ai_metacognition_toolkit-0.3.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ ai_metacognition_toolkit-0.3.0.dist-info/top_level.txt,sha256=lVf6kevg1c67CKS5a4OSTwzy1An5Z_u4BktmZc6WT7w,17
+ ai_metacognition_toolkit-0.3.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
+ Wheel-Version: 1.0
+ Generator: setuptools (80.9.0)
+ Root-Is-Purelib: true
+ Tag: py3-none-any
+
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 AI Metacognition Toolkit Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1 @@
+ ai_metacognition