PyPI - mafex - Versions diffs - 0.1.0__tar.gz - Mend

mafex 0.1.0__tar.gz

This diff shows the contents of publicly available package versions released to a supported registry. It is provided for informational purposes only and reflects the changes between package versions as they appear in the public registry.

Files changed (21)

mafex-0.1.0/LICENSE +21 -0
mafex-0.1.0/PKG-INFO +195 -0
mafex-0.1.0/README.md +154 -0
mafex-0.1.0/evaluation/__init__.py +35 -0
mafex-0.1.0/evaluation/metrics.py +345 -0
mafex-0.1.0/evaluation/samples.py +235 -0
mafex-0.1.0/mafex/__init__.py +47 -0
mafex-0.1.0/mafex/attribution.py +421 -0
mafex-0.1.0/mafex/models.py +505 -0
mafex-0.1.0/mafex/morphology.py +436 -0
mafex-0.1.0/mafex/projection.py +444 -0
mafex-0.1.0/mafex/visualization.py +283 -0
mafex-0.1.0/mafex.egg-info/PKG-INFO +195 -0
mafex-0.1.0/mafex.egg-info/SOURCES.txt +19 -0
mafex-0.1.0/mafex.egg-info/dependency_links.txt +1 -0
mafex-0.1.0/mafex.egg-info/entry_points.txt +2 -0
mafex-0.1.0/mafex.egg-info/requires.txt +18 -0
mafex-0.1.0/mafex.egg-info/top_level.txt +3 -0
mafex-0.1.0/pyproject.toml +78 -0
mafex-0.1.0/run_mafex.py +239 -0
mafex-0.1.0/setup.cfg +4 -0

mafex-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2024 Muhammet Anıl Yağız
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

mafex-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,195 @@
+Metadata-Version: 2.4
+Name: mafex
+Version: 0.1.0
+Summary: Morpheme-Aligned Faithful Explanations for Turkish NLP
+Author-email: Muhammet Anıl Yağız <213255046@kku.edu.tr>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/anilyagiz/mafex
+Project-URL: Documentation, https://github.com/anilyagiz/mafex#readme
+Project-URL: Repository, https://github.com/anilyagiz/mafex
+Keywords: nlp,xai,explainability,turkish,morphology,interpretability
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: torch>=2.0.0
+Requires-Dist: transformers>=4.35.0
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: tqdm>=4.65.0
+Provides-Extra: full
+Requires-Dist: captum>=0.6.0; extra == "full"
+Requires-Dist: shap>=0.42.0; extra == "full"
+Requires-Dist: matplotlib>=3.7.0; extra == "full"
+Requires-Dist: seaborn>=0.12.0; extra == "full"
+Requires-Dist: scipy>=1.10.0; extra == "full"
+Requires-Dist: pandas>=2.0.0; extra == "full"
+Provides-Extra: dev
+Requires-Dist: pytest>=7.4.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
+Requires-Dist: black>=23.0.0; extra == "dev"
+Requires-Dist: isort>=5.12.0; extra == "dev"
+Dynamic: license-file
+# MAFEX - Morpheme-Aligned Faithful Explanations
+> **Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection**
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+## Overview
+MAFEX is a framework for generating faithful, interpretable explanations for Large Language Models (LLMs) in **Morphologically Rich Languages (MRLs)** like Turkish.
+Current XAI methods operate on tokens, which fragment semantic units in agglutinative languages. MAFEX corrects this **Tokenization-Morphology Misalignment (TMM)** by projecting attributions onto linguistically meaningful morphemes.
+### Key Equation
+```
+φ_morph = A · φ_tok           (Morphological Projection)
+S* = λ·φ_morph + (1-λ)·φ_causal  (Causal Regularization)
+```
+Where:
+- **A** ∈ {0,1}^{K×T} is the Alignment Matrix mapping T tokens to K morphemes
+- **φ_tok** is token-level attribution (e.g., Integrated Gradients)
+- **λ** controls the gradient/causal trade-off (default: 0.7)
+## Installation
+```bash
+# Install from PyPI
+pip install mafex
+# Or install latest from GitHub
+pip install git+https://github.com/anilyagiz/mafex.git
+```
+## Quick Start
+### 1. Morphological Analysis
+```python
+from mafex.morphology import MorphemeAnalyzer
+analyzer = MorphemeAnalyzer()
+# Analyze Turkish word
+analysis = analyzer.analyze_word("gelemedim")  # "I could not come"
+print(analysis.morpheme_surfaces)  # ['gel', 'eme', 'di', 'm']
+```
+### 2. Run MAFEX Explanation
+```python
+from mafex.models import DemoModelWrapper
+from mafex.projection import MAFEXPipeline
+# Load model
+wrapper = DemoModelWrapper()
+wrapper.load()
+# Create MAFEX pipeline
+mafex = MAFEXPipeline(
+    wrapper.model,
+    wrapper.tokenizer,
+    lambda_causal=0.7
+)
+# Generate explanation
+result = mafex.explain("Gelemedim")
+# Get top attributed morphemes
+print(result.get_top_morphemes(3))
+# [('-eme', 0.62), ('gel', 0.21), ('-di', 0.12)]
+```
+### 3. Command Line
+```bash
+# Single explanation
+python run_mafex.py --model demo --text "Gelemedim"
+# Evaluation
+python run_mafex.py --model berturk --eval --samples 10
+```
+## Project Structure
+```
+mafex/
+├── mafex/
+│   ├── __init__.py         # Package exports
+│   ├── morphology.py       # Morphological analysis & alignment
+│   ├── attribution.py      # IG, SHAP, DeepLIFT baselines
+│   ├── projection.py       # MAFEX pipeline & causal regularization
+│   ├── models.py           # Model wrappers (BERTurk, Cosmos, etc.)
+│   └── visualization.py    # Plotting utilities
+├── evaluation/
+│   ├── __init__.py
+│   └── metrics.py          # ERASER metrics
+├── benchmark/
+│   ├── __init__.py
+│   └── trust_tr.py         # Trust-TR benchmark (N=850)
+├── notebooks/
+│   └── demo.ipynb          # Interactive demonstration
+├── demo.py                 # CLI demo script
+├── run_mafex.py           # Main runner
+├── config.yaml            # Configuration
+└── requirements.txt       # Dependencies
+```
+## Supported Models
+| Model | Type | Status |
+|-------|------|--------|
+| BERTurk | Encoder | ✅ Tested |
+| YTÜ-Cosmos | Decoder | ⚠️ Pending |
+| Kumru | Decoder | ⚠️ Pending |
+| Aya-23 | Decoder | ⚠️ Pending |
+## Evaluation Metrics
+MAFEX is evaluated using ERASER metrics:
+- **Comprehensiveness**: Does removing important features hurt performance?
+- **Sufficiency**: Are important features alone enough to maintain performance?
+Expected results:
+| Model | Token-IG | Random | MAFEX | Δ |
+|-------|----------|--------|-------|---|
+| BERTurk | 0.42 | 0.50 | 0.68 | +62% |
+| Cosmos | 0.39 | 0.47 | 0.65 | +67% |
+| Kumru | 0.45 | 0.53 | 0.71 | +58% |
+| Aya-23 | 0.41 | 0.49 | 0.69 | +68% |
+## Citation
+```bibtex
+@article{yagiz2024mafex,
+  title={Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection},
+  author={Yağız, Muhammet Anıl},
+  journal={arXiv preprint},
+  year={2024}
+}
+```
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## Acknowledgments
+- [Zemberek](https://github.com/ahmetaa/zemberek-nlp) for Turkish morphological analysis
+- [Captum](https://captum.ai/) for attribution methods
+- [dbmdz/BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) for the Turkish BERT model
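
Note on the "Key Equation" section of the README text above: the two formulas can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the code in mafex/projection.py. The function names, the 3-token/2-morpheme alignment, and every score are hypothetical; only the formulas and the default λ = 0.7 come from the README.

```python
from typing import List


def project_to_morphemes(alignment: List[List[int]], phi_tok: List[float]) -> List[float]:
    """Compute phi_morph = A . phi_tok for a K x T binary alignment matrix.

    Row k of `alignment` marks (with 1) the tokens that belong to morpheme k,
    so each morpheme receives the sum of its tokens' attributions.
    """
    return [sum(a * p for a, p in zip(row, phi_tok)) for row in alignment]


def blend(phi_morph: List[float], phi_causal: List[float], lam: float = 0.7) -> List[float]:
    """Compute S* = lam * phi_morph + (1 - lam) * phi_causal, element-wise."""
    return [lam * m + (1.0 - lam) * c for m, c in zip(phi_morph, phi_causal)]


# Hypothetical example: T = 3 tokens, K = 2 morphemes.
# Tokens 0 and 1 together form morpheme 0; token 2 forms morpheme 1.
A = [
    [1, 1, 0],
    [0, 0, 1],
]
phi_tok = [0.2, 0.3, 0.5]      # made-up token-level attributions
phi_causal = [0.1, 0.9]        # made-up morpheme-level causal scores

phi_morph = project_to_morphemes(A, phi_tok)
scores = blend(phi_morph, phi_causal, lam=0.7)
print(phi_morph, scores)
```

With these made-up numbers, both morphemes receive a projected score of 0.5, and the blended scores are about 0.38 and 0.62. The sketch assumes every token belongs to exactly one morpheme; the README does not say how a token that spans several morphemes is handled.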

mafex-0.1.0/README.md ADDED

@@ -0,0 +1,154 @@
+# MAFEX - Morpheme-Aligned Faithful Explanations
+> **Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection**
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+## Overview
+MAFEX is a framework for generating faithful, interpretable explanations for Large Language Models (LLMs) in **Morphologically Rich Languages (MRLs)** like Turkish.
+Current XAI methods operate on tokens, which fragment semantic units in agglutinative languages. MAFEX corrects this **Tokenization-Morphology Misalignment (TMM)** by projecting attributions onto linguistically meaningful morphemes.
+### Key Equation
+```
+φ_morph = A · φ_tok           (Morphological Projection)
+S* = λ·φ_morph + (1-λ)·φ_causal  (Causal Regularization)
+```
+Where:
+- **A** ∈ {0,1}^{K×T} is the Alignment Matrix mapping T tokens to K morphemes
+- **φ_tok** is token-level attribution (e.g., Integrated Gradients)
+- **λ** controls the gradient/causal trade-off (default: 0.7)
+## Installation
+```bash
+# Install from PyPI
+pip install mafex
+# Or install latest from GitHub
+pip install git+https://github.com/anilyagiz/mafex.git
+```
+## Quick Start
+### 1. Morphological Analysis
+```python
+from mafex.morphology import MorphemeAnalyzer
+analyzer = MorphemeAnalyzer()
+# Analyze Turkish word
+analysis = analyzer.analyze_word("gelemedim")  # "I could not come"
+print(analysis.morpheme_surfaces)  # ['gel', 'eme', 'di', 'm']
+```
+### 2. Run MAFEX Explanation
+```python
+from mafex.models import DemoModelWrapper
+from mafex.projection import MAFEXPipeline
+# Load model
+wrapper = DemoModelWrapper()
+wrapper.load()
+# Create MAFEX pipeline
+mafex = MAFEXPipeline(
+    wrapper.model,
+    wrapper.tokenizer,
+    lambda_causal=0.7
+)
+# Generate explanation
+result = mafex.explain("Gelemedim")
+# Get top attributed morphemes
+print(result.get_top_morphemes(3))
+# [('-eme', 0.62), ('gel', 0.21), ('-di', 0.12)]
+```
+### 3. Command Line
+```bash
+# Single explanation
+python run_mafex.py --model demo --text "Gelemedim"
+# Evaluation
+python run_mafex.py --model berturk --eval --samples 10
+```
+## Project Structure
+```
+mafex/
+├── mafex/
+│   ├── __init__.py         # Package exports
+│   ├── morphology.py       # Morphological analysis & alignment
+│   ├── attribution.py      # IG, SHAP, DeepLIFT baselines
+│   ├── projection.py       # MAFEX pipeline & causal regularization
+│   ├── models.py           # Model wrappers (BERTurk, Cosmos, etc.)
+│   └── visualization.py    # Plotting utilities
+├── evaluation/
+│   ├── __init__.py
+│   └── metrics.py          # ERASER metrics
+├── benchmark/
+│   ├── __init__.py
+│   └── trust_tr.py         # Trust-TR benchmark (N=850)
+├── notebooks/
+│   └── demo.ipynb          # Interactive demonstration
+├── demo.py                 # CLI demo script
+├── run_mafex.py           # Main runner
+├── config.yaml            # Configuration
+└── requirements.txt       # Dependencies
+```
+## Supported Models
+| Model | Type | Status |
+|-------|------|--------|
+| BERTurk | Encoder | ✅ Tested |
+| YTÜ-Cosmos | Decoder | ⚠️ Pending |
+| Kumru | Decoder | ⚠️ Pending |
+| Aya-23 | Decoder | ⚠️ Pending |
+## Evaluation Metrics
+MAFEX is evaluated using ERASER metrics:
+- **Comprehensiveness**: Does removing important features hurt performance?
+- **Sufficiency**: Are important features alone enough to maintain performance?
+Expected results:
+| Model | Token-IG | Random | MAFEX | Δ |
+|-------|----------|--------|-------|---|
+| BERTurk | 0.42 | 0.50 | 0.68 | +62% |
+| Cosmos | 0.39 | 0.47 | 0.65 | +67% |
+| Kumru | 0.45 | 0.53 | 0.71 | +58% |
+| Aya-23 | 0.41 | 0.49 | 0.69 | +68% |
+## Citation
+```bibtex
+@article{yagiz2024mafex,
+  title={Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection},
+  author={Yağız, Muhammet Anıl},
+  journal={arXiv preprint},
+  year={2024}
+}
+```
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## Acknowledgments
+- [Zemberek](https://github.com/ahmetaa/zemberek-nlp) for Turkish morphological analysis
+- [Captum](https://captum.ai/) for attribution methods
+- [dbmdz/BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) for the Turkish BERT model
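
Note on the "Evaluation Metrics" section of the README above: the two ERASER metrics are usually defined as differences in predicted probability, which the sketch below illustrates. It uses the standard definitions (comprehensiveness = p(x) − p(x without the rationale), sufficiency = p(x) − p(rationale only)); whether evaluation/metrics.py computes them exactly this way is not visible in this part of the diff. The toy classifier, the function names, and the chosen rationale are all hypothetical.

```python
from typing import Callable, List, Set

Predict = Callable[[List[str]], float]


def comprehensiveness(predict: Predict, units: List[str], rationale: Set[int]) -> float:
    """Drop in probability when the rationale units are removed (higher is better)."""
    remaining = [u for i, u in enumerate(units) if i not in rationale]
    return predict(units) - predict(remaining)


def sufficiency(predict: Predict, units: List[str], rationale: Set[int]) -> float:
    """Drop in probability when only the rationale units are kept (lower is better)."""
    kept = [u for i, u in enumerate(units) if i in rationale]
    return predict(units) - predict(kept)


def toy_predict(units: List[str]) -> float:
    """Hypothetical stand-in for a model: a crude negation detector."""
    return 0.9 if "eme" in units else 0.1


# Morphemes of "gelemedim" as listed in the README; the rationale is the
# negation/ability morpheme at index 1 (an assumption for this example).
units = ["gel", "eme", "di", "m"]
rationale = {1}

comp = comprehensiveness(toy_predict, units, rationale)  # 0.9 - 0.1, about 0.8
suff = sufficiency(toy_predict, units, rationale)        # 0.9 - 0.9 = 0.0
print(comp, suff)
```

In this toy case the single morpheme explains the whole prediction: removing it collapses the score, and keeping it alone preserves the score.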

mafex-0.1.0/evaluation/__init__.py ADDED

@@ -0,0 +1,35 @@
+"""
+Evaluation package for MAFEX.
+"""
+# Samples are always available (no torch dependency)
+from .samples import (
+    get_test_samples,
+    get_negative_samples,
+    get_positive_samples,
+    TURKISH_SAMPLES
+)
+# Lazy import for torch-dependent modules
+def __getattr__(name):
+    """Lazy import for metrics that require torch."""
+    if name in ("ERASEREvaluator", "EvaluationResult", "BenchmarkRunner",
+                "compute_faithfulness_correlation", "compare_methods"):
+        from .metrics import (
+            ERASEREvaluator, EvaluationResult, BenchmarkRunner,
+            compute_faithfulness_correlation, compare_methods
+        )
+        return locals()[name]
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+__all__ = [
+    "ERASEREvaluator",
+    "EvaluationResult",
+    "BenchmarkRunner",
+    "compute_faithfulness_correlation",
+    "compare_methods",
+    "get_test_samples",
+    "get_negative_samples",
+    "get_positive_samples",
+    "TURKISH_SAMPLES"
+]
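
Note on the lazy import in evaluation/__init__.py above: it relies on the module-level `__getattr__` hook (PEP 562, Python 3.7+), which Python calls only when normal attribute lookup on the module fails. The sketch below shows the same pattern in a self-contained form. The module name `lazy_demo`, the `_LAZY` table, and the choice of `math` and `json` as deferred imports are hypothetical stand-ins for the torch-dependent `metrics` module.

```python
import importlib
import types

# Build a throwaway module object so the example runs in a single file.
lazy = types.ModuleType("lazy_demo")

# Attribute name -> module that really provides it (illustrative choices).
_LAZY = {"sqrt": "math", "dumps": "json"}


def _module_getattr(name: str):
    """Resolve a deferred attribute on first access, then cache it."""
    if name in _LAZY:
        value = getattr(importlib.import_module(_LAZY[name]), name)
        # Cache on the module so this hook is not called again for `name`.
        setattr(lazy, name, value)
        return value
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


# Placing the function in the module's namespace is what activates the hook.
lazy.__getattr__ = _module_getattr

print(lazy.sqrt(9.0))  # 3.0; `math` is imported only at this point
```

The package's version returns `locals()[name]` after a `from .metrics import ...` inside the function, which also works because the import binds local names. The sketch caches the resolved attribute instead, so later accesses skip the hook entirely.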