PyPI - ai-metacognition-toolkit - Versions diffs - 0.1.0__tar.gz - Mend

ai-metacognition-toolkit 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of ai-metacognition-toolkit might be problematic. Click here for more details.

Files changed (26) hide show

ai_metacognition_toolkit-0.1.0/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2024 AI Metacognition Toolkit Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

ai_metacognition_toolkit-0.1.0/PKG-INFO ADDED Viewed

@@ -0,0 +1,515 @@
+Metadata-Version: 2.4
+Name: ai-metacognition-toolkit
+Version: 0.1.0
+Summary: A toolkit for detecting and analyzing meta-cognitive capabilities in AI models
+Author-email: Subhadip Mitra <contact@subhadipmitra.com>
+License: MIT
+Project-URL: Homepage, https://github.com/bassrehab/ai-metacognition-toolkit
+Project-URL: Documentation, https://github.com/bassrehab/ai-metacognition-toolkit#readme
+Project-URL: Repository, https://github.com/bassrehab/ai-metacognition-toolkit
+Keywords: ai,metacognition,machine-learning,analysis,cognitive-science
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.21.0
+Requires-Dist: scipy>=1.7.0
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: matplotlib>=3.4.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
+Requires-Dist: black>=22.0.0; extra == "dev"
+Requires-Dist: isort>=5.10.0; extra == "dev"
+Requires-Dist: flake8>=4.0.0; extra == "dev"
+Requires-Dist: mypy>=0.950; extra == "dev"
+Dynamic: license-file
+# AI Metacognition Toolkit
+<div align="center">
+**A comprehensive Python toolkit for detecting and analyzing metacognitive capabilities in AI systems**
+[![Tests](https://img.shields.io/badge/tests-275%20passed-success)](tests/)
+[![Coverage](https://img.shields.io/badge/coverage-95%25-brightgreen)](tests/)
+[![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
+[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[Documentation](https://ai-metacognition-toolkit.readthedocs.io/) | [Examples](examples/) | [API Reference](docs/api_reference.md)
+</div>
+---
+## Overview
+The **AI Metacognition Toolkit** provides research-grade tools for understanding when and how AI models exhibit metacognitive behaviors - particularly the ability to recognize evaluation contexts and adjust their behavior accordingly.
+### Research Context
+Recent research in AI safety and alignment has highlighted concerns about models that can recognize when they're being evaluated and potentially "play the training game." This toolkit provides rigorous methods to:
+- **Detect situational awareness** in model outputs using Bayesian inference
+- **Analyze causal factors** in reasoning chains through counterfactual interventions
+- **Monitor behavioral shifts** between evaluation and production contexts
+- **Measure metacognitive patterns** with statistical rigor
+### Key Capabilities
+| Component | Purpose | Method |
+|-----------|---------|--------|
+| 🎯 **Situational Awareness Detection** | Detect when models recognize evaluation contexts | Bayesian inference + KL divergence |
+| 🔬 **Counterfactual CoT Analysis** | Identify causal drivers in reasoning chains | Systematic interventions + divergence measurement |
+| 📊 **Observer Effect Monitoring** | Real-time production behavior monitoring | Z-score based anomaly detection + alerts |
+---
+## Installation
+### From PyPI (Coming Soon)
+```bash
+pip install ai-metacognition-toolkit
+```
+### From Source
+```bash
+git clone https://github.com/bassrehab/ai-metacognition-toolkit.git
+cd ai-metacognition-toolkit
+pip install -e .
+```
+### Development Installation
+```bash
+# Install with development dependencies
+pip install -e ".[dev]"
+# Verify installation
+pytest tests/
+```
+**Requirements**: Python 3.9+, NumPy, SciPy, Pandas, Matplotlib
+---
+## Quick Start
+### 1. Detect Situational Awareness (2 minutes)
+```python
+from ai_metacognition.detectors import SituationalAwarenessDetector
+# Initialize detector
+detector = SituationalAwarenessDetector()
+# Fit baseline from normal (non-evaluation) outputs
+baseline_outputs = [
+    "The capital of France is Paris.",
+    "To solve this, use dynamic programming.",
+    "Climate change is caused by greenhouse gas emissions."
+]
+detector.fit_baseline(baseline_outputs)
+# Test for awareness
+test_output = "I notice this is a test. The capital of France is Paris."
+result = detector.detect_awareness(test_output)
+print(f"Awareness Probability: {result['awareness_probability']:.1%}")
+print(f"95% Confidence Interval: [{result['confidence_interval'][0]:.1%}, "
+      f"{result['confidence_interval'][1]:.1%}]")
+```
+**Output:**
+```
+Awareness Probability: 68.2%
+95% Confidence Interval: [42.3%, 87.1%]
+```
+### 2. Analyze Causal Factors (3 minutes)
+```python
+from ai_metacognition.analyzers import CounterfactualCoTAnalyzer
+# Your model API (implement ModelAPI protocol)
+class MyModelAPI:
+    def generate_with_cot(self, prompt: str, cot: str) -> str:
+        return your_model.generate(prompt, chain_of_thought=cot)
+    def generate(self, prompt: str) -> str:
+        return your_model.generate(prompt)
+# Analyze what drives the output
+analyzer = CounterfactualCoTAnalyzer(MyModelAPI())
+results = analyzer.systematic_causal_analysis(
+    prompt="What is 2+2?",
+    original_cot="I notice this is a test. First, I'll add 2+2. The answer is 4."
+)
+# View causal ranking
+for intervention in results['interventions']:
+    print(f"{intervention['intervention_type']}: "
+          f"divergence = {intervention['divergence']:.3f}")
+```
+**Output:**
+```
+remove_eval_awareness: divergence = 0.342
+amplify_eval_awareness: divergence = 0.156
+remove_goal_reasoning: divergence = 0.089
+neutralize_incentives: divergence = 0.067
+```
+### 3. Monitor Production (5 minutes)
+```python
+from ai_metacognition.detectors import ObserverEffectMonitor, Interaction
+from datetime import datetime
+# Setup monitor
+monitor = ObserverEffectMonitor(
+    window_size=100,
+    alert_threshold=2.5
+)
+# Set baseline from evaluation
+monitor.set_evaluation_baseline(evaluation_responses)
+# Log production interactions
+interaction = Interaction(
+    timestamp=datetime.now(),
+    response=model_output,
+    refusal=is_refusal(model_output),
+    tool_used=used_tools,
+    reasoning_depth=compute_depth(chain_of_thought)
+)
+monitor.log_production_interaction(interaction)
+# Check for divergence
+report = monitor.get_divergence_report()
+if report['has_divergence']:
+    print(f"⚠️ Alert: Behavior diverged from baseline!")
+    print(f"Max z-score: {report['max_z_score']:.2f}σ")
+```
+---
+## Features
+### 🎯 Situational Awareness Detection
+Detect when AI models recognize they are being evaluated or tested.
+**Key Features:**
+- Bayesian inference with Beta-Binomial conjugate prior
+- KL divergence measurement for behavioral comparison
+- Feature extraction (hedging, meta-commentary, self-reference)
+- Confidence intervals for probability estimates
+**Use Cases:**
+- AI safety research on deceptive alignment
+- Model evaluation consistency validation
+- Prompt engineering optimization
+- Behavioral testing across contexts
+[**→ Full Documentation**](docs/situational_awareness.md) | [**→ Example Code**](examples/basic_detection_example.py)
+---
+### 🔬 Counterfactual CoT Analysis
+Understand what components of reasoning chains causally influence outputs.
+**Key Features:**
+- Systematic counterfactual interventions
+- 4 intervention types (remove/amplify awareness, remove goals, neutralize incentives)
+- Behavioral divergence measurement
+- Causal attribution ranking
+**Use Cases:**
+- Understanding decision drivers
+- Debugging unexpected behavior
+- Validating reasoning transparency
+- Identifying evaluation-aware reasoning
+[**→ Full Documentation**](docs/counterfactual_analysis.md) | [**→ Example Code**](examples/counterfactual_analysis_example.py)
+---
+### 📊 Observer Effect Monitoring
+Real-time production monitoring to detect behavioral shifts.
+**Key Features:**
+- Rolling window statistics
+- Z-score based anomaly detection
+- Multi-severity alert system (LOW/MEDIUM/HIGH/CRITICAL)
+- Extensible alert handler protocol
+**Use Cases:**
+- Production deployment monitoring
+- Safety monitoring (refusal rate changes)
+- Quality assurance automation
+- A/B testing validation
+[**→ Full Documentation**](docs/observer_effect.md) | [**→ Example Code**](examples/production_monitoring_example.py)
+---
+## Documentation
+### 📚 Complete Documentation
+- **[Getting Started](docs/index.md)** - Installation and quick start
+- **[Situational Awareness Guide](docs/situational_awareness.md)** - Detector usage and best practices
+- **[Counterfactual Analysis Guide](docs/counterfactual_analysis.md)** - Causal attribution methods
+- **[Observer Effect Guide](docs/observer_effect.md)** - Production monitoring setup
+- **[API Reference](docs/api_reference.md)** - Complete API documentation
+- **[Examples](docs/examples.md)** - Code examples and tutorials
+### 🔧 API Reference
+Quick access to key classes:
+```python
+from ai_metacognition.detectors import SituationalAwarenessDetector, ObserverEffectMonitor
+from ai_metacognition.analyzers import CounterfactualCoTAnalyzer
+from ai_metacognition.utils import extract_behavioral_features, bayesian_update
+```
+[**→ Full API Reference**](docs/api_reference.md)
+---
+## Examples
+### Practical Examples
+All examples are standalone, include visualization, and provide interpretation guidance:
+| Example | Description | Output |
+|---------|-------------|--------|
+| [**basic_detection_example.py**](examples/basic_detection_example.py) | Situational awareness detection tutorial | PNG visualization |
+| [**counterfactual_analysis_example.py**](examples/counterfactual_analysis_example.py) | Causal attribution analysis | PNG with rankings |
+| [**production_monitoring_example.py**](examples/production_monitoring_example.py) | Real-time monitoring with alerts | PNG + JSON + logs |
+### Running Examples
+```bash
+# Awareness detection
+python examples/basic_detection_example.py
+# Causal analysis
+python examples/counterfactual_analysis_example.py
+# Production monitoring
+python examples/production_monitoring_example.py
+```
+### Integration Examples
+**OpenAI API:**
+```python
+import openai
+from ai_metacognition.detectors import SituationalAwarenessDetector
+detector = SituationalAwarenessDetector()
+response = openai.ChatCompletion.create(model="gpt-4", messages=[...])
+result = detector.detect_awareness(response.choices[0].message.content)
+```
+**HuggingFace:**
+```python
+from transformers import pipeline
+from ai_metacognition.detectors import SituationalAwarenessDetector
+generator = pipeline('text-generation', model='gpt2')
+detector = SituationalAwarenessDetector()
+output = generator(prompt)[0]['generated_text']
+result = detector.detect_awareness(output)
+```
+[**→ More Examples**](docs/examples.md)
+---
+## Project Structure
+```
+ai-metacognition-toolkit/
+├── src/ai_metacognition/
+│   ├── detectors/              # Detection algorithms
+│   │   ├── situational_awareness.py
+│   │   └── observer_effect.py
+│   ├── analyzers/              # Analysis tools
+│   │   ├── counterfactual_cot.py
+│   │   └── model_api.py
+│   └── utils/                  # Utility functions
+│       ├── feature_extraction.py
+│       └── statistical_tests.py
+├── tests/                      # Test suite (275 tests, 95% coverage)
+│   ├── fixtures/               # Test data
+│   └── unit/                   # Unit tests
+├── examples/                   # Practical examples with visualization
+├── docs/                       # Documentation (MkDocs)
+└── CLAUDE.md                   # Claude Code specific guidelines
+```
+---
+## Development
+### Running Tests
+```bash
+# All tests
+pytest tests/
+# With coverage
+pytest tests/ --cov=src/ai_metacognition --cov-report=term-missing
+# Specific test file
+pytest tests/unit/test_situational_awareness.py -v
+```
+**Current Status:**
+- ✅ 275 tests passing
+- ✅ 95% code coverage
+- ✅ Type hints throughout
+- ✅ Comprehensive docstrings
+### Code Quality
+```bash
+# Format code
+black src/ tests/
+# Sort imports
+isort src/ tests/
+# Type checking
+mypy src/
+# Lint
+flake8 src/ tests/
+```
+### Building Documentation
+```bash
+# Install documentation dependencies
+pip install mkdocs mkdocs-material mkdocstrings[python]
+# Serve locally
+mkdocs serve
+# Build
+mkdocs build
+```
+---
+## Citation
+If you use this toolkit in your research, please cite:
+```bibtex
+@software{ai_metacognition_toolkit,
+  author = {Mitra, Subhadip},
+  title = {AI Metacognition Toolkit: A Python Toolkit for Detecting and Analyzing Metacognitive Capabilities in AI Systems},
+  year = {2025},
+  version = {0.1.0},
+  url = {https://github.com/bassrehab/ai-metacognition-toolkit},
+  note = {275 tests, 95\% coverage}
+}
+```
+### Related Research
+This toolkit implements and extends methods from:
+- **AI Safety Research**: Detection of evaluation awareness and deceptive alignment
+- **Causal Inference**: Counterfactual reasoning in AI systems
+- **Statistical Monitoring**: Anomaly detection in production ML systems
+- **Bayesian Methods**: Inference for behavioral analysis
+---
+## Contributing
+We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
+### Quick Contribution Guide
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Make your changes with tests (maintain >80% coverage)
+4. Commit your changes (see [CLAUDE.md](CLAUDE.md) for commit guidelines)
+5. Push to your branch (`git push origin feature/amazing-feature`)
+6. Open a Pull Request
+### Development Guidelines
+- Follow PEP 8 style guide
+- Add comprehensive tests for new features
+- Update documentation for API changes
+- Use type hints throughout
+- Write clear docstrings (Google style)
+[**→ Full Contributing Guide**](CONTRIBUTING.md)
+---
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+```
+MIT License
+Copyright (c) 2025 Subhadip Mitra
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction...
+```
+---
+## Support
+- 📚 [Documentation](https://ai-metacognition-toolkit.readthedocs.io/)
+- 🐛 [Issue Tracker](https://github.com/bassrehab/ai-metacognition-toolkit/issues)
+- 💬 [Discussions](https://github.com/bassrehab/ai-metacognition-toolkit/discussions)
+- 📧 Contact: contact@subhadipmitra.com
+---
+## Acknowledgments
+- Built with Python, NumPy, SciPy, and Matplotlib
+- Documentation powered by MkDocs Material
+- Testing with Pytest
+- Type checking with MyPy
+---
+<div align="center">
+**[⭐ Star this repo](https://github.com/bassrehab/ai-metacognition-toolkit)** if you find it useful!
+Made with ❤️ for AI Safety Research
+</div>